Convolutional Neural Networks – Convolutional neural networks (CNNs) are motivated by the architecture of the cerebral cortex, the higher-level brain structure responsible for complex cognitive functions.

Convolutional Neural Networks

These systems remain far from the complexity and capabilities that characterize the human brain. For the current discussion, however, we shall forgo any discussion of the biological side of neuroscience, given my lack of expertise and academic knowledge in the biological sciences. This blog post focuses on the artificial side only, in plain English and at a very high level for ease of understanding.

Convolutional neural networks (CNN) – Outlook

Convolutional neural networks (CNNs) are based on simple principles of science and math, even though they may seem complicated to some people. CNNs are a class of machine learning models designed to handle grid-structured data, particularly images, and they are primarily used for image and video analysis. They automatically learn and extract hierarchical features from visual data using convolutional layers.

CNNs are highly successful in tasks like image classification, object detection, and image generation. Applying a filter to a region of the input produces an activation, which is a simple operation. Applying the same filter repeatedly across the whole input produces a feature map, which shows where the detected feature occurs in the input and how strong it is.

A common application of CNNs is in computer vision tasks such as image classification, object detection, and image segmentation.

CNNs are a class of neural networks that have proven very effective in image recognition, processing, and classification. In this article, we give an intuitive explanation of convolutional neural networks (CNNs) at a high level and in simple language.

Convolutional neural networks are a special kind of multi-layer neural network.

The process of a CNN entails four crucial stages: convolution, pooling, flattening, and full connection, each of which is explored below. Decide on the parameters, apply filters using strides, and include padding if necessary. Apply convolution to the image, followed by a ReLU activation on the resulting matrix. This fundamental process is crucial to a CNN's success; if it is not executed correctly, the entire process is doomed to fail.

What is Deep Learning?

AILabPage defines deep learning as “undoubtedly a mind-blowing synchronization technique applied on the basis of three foundation pillars: large data, computing power, skills (enriched algorithms), and experience, which practically has no limits”.

Deep learning is a subfield of machine learning. It is concerned with algorithms based on artificial neural networks, which are themselves inspired by the structure and function of the human brain (inspired only, please). Deep learning is sometimes used too casually to predict the unpredictable. In our opinion, “we are all so busy creating artificial intelligence by using a combination of non-biological neural networks and natural intelligence rather than exploring what we have in hand.”

Deep learning is practised by specialists with a complex skill set, with the aim of achieving better results from the same dataset than could be achieved without it. It makes remarkable attempts to mimic the mechanics of natural intelligence (NI), i.e., the biological neuron system.

Deep learning treats the learning of data representations as its foundational approach. It demands a complex skill set because of the methods it uses for training: learning in deep learning is based on “learning data representations” rather than on the “task-specific algorithms” used by other methods.

“I think people need to understand that deep learning is making a lot of things, behind the scenes, much better” – Geoffrey Hinton

Human Brain: The brain is a critical point of discussion for everyone and a puzzle for all time. How our brain is designed and how it functions cannot be covered in this post, as I am nowhere close to being a neuroscientist. Out of curiosity, though, I am tempted to compare artificial neural networks with the human brain (with the help of talk shows on such topics).

It fascinates me how the human brain is able to decode technologies, numbers, and puzzles, handle entertainment, understand science, move the body into modes such as pleasure or aggression, create art, and so on. How does the brain train itself to name an object after looking at just 2-3 images, when ANNs need millions of them?

What are Convolutional Neural Networks

Convolutional neural networks (CNNs) might look like magic to many, but in reality they are just simple science and mathematics. CNNs are a class of neural networks that have proven very effective in image recognition, and in most cases they are applied to image processing.

CNNs have seen huge adoption and success in computer vision applications, but mainly with supervised learning; unsupervised learning with CNNs has received comparatively little attention.

This network is a variation of the multilayer perceptron, specialized for processing and classification. It is a deep learning algorithm that takes an image as input, assigns learnable weights and biases to the objects in it, and is finally able to differentiate one image from another.

As per Wikipedia: “In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.”

CNNs have existed for several decades, but they have proven very powerful only when trained on large labelled datasets, which requires fast computers (e.g., GPUs).

The artificial intelligence solutions built on CNNs are transforming how businesses and developers create user experiences and solve real-world problems. CNNs are sometimes described as an application of neuroscience to machine learning. They employ a mathematical operation known as “convolution,” which is a specialized kind of linear operation.

Convolutional neural network applications include high-calibre AI systems such as AI-based robots, virtual assistants, and self-driving cars. For image processing, each filter scans through the image and generates its own feature map. Adding more filtering layers, and thereby more feature maps, generally lets a deeper CNN learn more abstract features. Other common applications include:

  • Image Processing
    • Recognition
    • Classification
    • Video labelling
    • Text analysis
  • Speech Recognition
    • Natural language processing
    • Text classification processing

Convolutional neural networks have helped solve many problems that could otherwise have remained unsolved.

Other common applications of CNNs include emotion recognition and estimating age and gender. Among deep learning models, the best known are convolutional neural networks and recurrent neural networks.

Data Processing – Convolutional Neural Network

CNNs process data that has a grid-like topology. Such data is called grid-like because neighbouring data points are spatially correlated, and the network processes the data by exploiting that correlation.

  • 1D Grid – Time series data – Takes samples at regular time intervals
  • 2D Grid – Image data – Grid of pixels
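As a small illustration of the 1D case, here is a minimal pure-Python sketch (no framework assumed) that slides a kernel over a time series. The readings and the difference kernel are hypothetical values chosen for the example, and, like most deep learning libraries, the function computes cross-correlation (the kernel is not flipped).

```python
def conv1d(series, kernel):
    """'Valid' 1D convolution (cross-correlation, no kernel flip)."""
    k = len(kernel)
    # Each output value is the dot product of the kernel with a window of
    # k neighbouring samples, so only nearby time steps interact.
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


# Hypothetical readings sampled at regular time intervals.
readings = [1.0, 2.0, 4.0, 7.0, 11.0]
# A simple difference kernel: responds to how fast the signal rises.
print(conv1d(readings, [-1.0, 1.0]))  # [1.0, 2.0, 3.0, 4.0]
```

The 2D (image) case works the same way, with the window sliding over rows and columns instead of time steps.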

These neural networks use convolution in place of general matrix multiplication in at least one of their layers. Convolution leverages three ideas:

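  • Equivariant representations – If the input is shifted (translated), the output shifts in the same way.
  • Sparse interactions – Each output unit is connected only to a small local region of the input, because the kernel is smaller than the input. This allows the network to efficiently describe complicated interactions between many variables using simple building blocks.
  • Parameter sharing – The same parameters are used for more than one function in a model; a kernel's weights are reused at every position of the input.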

These properties were introduced to improve machine learning systems. Convolution also makes it possible to work with inputs of variable size.
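The three properties can be checked with a toy sketch. The code below is an illustration under simple assumptions (a 1D signal, "valid" convolution, pure Python, hypothetical helper names): it shows that shifting the input shifts the output, and it compares the parameter count of a fully connected layer against a 3x3 convolution with 16 filters when mapping a 32x32x3 image to a 32x32x16 volume, which is where sparse interactions and parameter sharing pay off.

```python
def conv1d(x, w):
    # 'Valid' 1D convolution: one shared kernel w slides over every position of x.
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(len(x) - len(w) + 1)]


signal = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = [0] + signal[:-1]  # the same pattern moved one step to the right
kernel = [1, 2, 1]

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)
print(out)          # [1, 5, 8, 5, 1, 0]
print(out_shifted)  # [0, 1, 5, 8, 5, 1]
# Equivariance: the output moves by the same one step as the input.
assert out_shifted[1:] == out[:-1]


def dense_params(n_in, n_out):
    # Fully connected: one weight per (input, output) pair, plus one bias per output.
    return n_in * n_out + n_out


def conv2d_params(kh, kw, c_in, filters):
    # Convolution: each filter has a kh x kw x c_in kernel shared across all positions.
    return kh * kw * c_in * filters + filters


# Mapping a 32x32x3 image to a 32x32x16 volume:
print(dense_params(32 * 32 * 3, 32 * 32 * 16))  # 50348032
print(conv2d_params(3, 3, 3, 16))               # 448
```

The equivariance check holds here because the pattern stays away from the borders; near the edges, padding choices can break exact equivariance.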

In practice, though, these networks have a significant limitation at the API level: a typical CNN classifier accepts a fixed-size input (e.g., an image) and produces a fixed-size output (e.g., a vector of class probabilities), and the computation between them is performed through a fixed number of layers.

Some History – Convolutional Neural Networks (CNNs)

ConvNets have been successful in identifying faces, objects, and traffic signs, as well as powering vision in robots and self-driving cars. Yann LeCun's LeNet-5 got its name after many successful iterations since 1988. LeNet was one of the very first convolutional neural networks, and it helped push deep learning forward.

In 2012, Alex Krizhevsky used a convolutional neural network (AlexNet) to win the ImageNet competition, and since then all the big companies have been racing to adopt CNNs. CNNs are among the most influential innovations in the computer vision field. Earlier, in the 1990s, the LeNet architecture was used mainly for character recognition tasks such as reading zip codes and digits.

Image Processing – Human vs Computers

For humans, the recognition of objects is the first skill we learn right from birth. A newborn baby starts recognizing faces as Papa, Mumma, etc. By the time you turn into an adult, recognition becomes effortless and a kind of automated process.

Human behavior for processing images is very different from that of machines. Humans give a label to each image automatically by just looking around and immediately characterizing the scene and giving each object a label without even consciously noticing.

For computers, recognizing objects is more complex: they see everything as an input and an output, where the output comes as a class or a set of classes. This is known as image processing, which we discuss in detail in the next section. In computers, CNNs perform image recognition and image classification.

This is very useful in object detection, face recognition, and various text classification tasks with word embeddings. In simple words, computer vision is the ability to automatically understand any image or video based on its visual elements and patterns.

Inputs and Outputs – How it works

Our focus in this post will be on image processing only

A CNN model has to be trained and then tested. Each input image passes through a series of convolution layers with filters (kernels), pooling layers, and fully connected (FC) layers, and the network then applies the softmax function (a generalization of the logistic function that “squashes” a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values between 0 and 1 that sum to 1) to classify the object with probabilistic values between 0 and 1. For this to work, every image in a CNN is represented as a matrix of pixel values.
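The softmax step can be sketched in a few lines of plain Python. This is an illustrative implementation, not the one inside any particular framework, and the raw class scores (logits) are hypothetical.

```python
import math


def softmax(scores):
    # Subtracting the largest score avoids overflow in exp() and does not
    # change the result, because softmax is invariant to adding a constant.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw class scores (logits) for four classes.
probs = softmax([1.0, 0.5, 4.0, 0.2])
print([round(p, 3) for p in probs])  # each value lies between 0 and 1
print(round(sum(probs), 6))          # 1.0
```

The class with the largest raw score always receives the largest probability, so softmax changes the scale of the scores but never their ranking.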

The convolutional neural network classifies an input image into categories, e.g., dog, cat, deer, lion, or bird.

Convolutional Neural Networks

The convolution and pooling layers act as feature extractors from the input image, while the fully connected layer acts as a classifier. In the figure above, on receiving an image of a deer as input, the network correctly assigns it the highest probability (0.94) among all four categories. The probabilities in the output layer always sum to one. There are four main operations in the ConvNet shown in the image above:

  1. Convolution
  2. Non Linearity (ReLU)
  3. Pooling or Sub Sampling
  4. Classification (Fully Connected Layer)

These operations are the basic building blocks of every convolutional neural network, so understanding how they work is an important step to developing a sound understanding of ConvNets, or CNNs.

Convolutional Neural Networks – Architecture

Convolutional neural networks are built from layers, as shown in the picture above. A simple ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations into another through a differentiable function. We use four main types of layers to build our ConvNet architectures:

Convolutional Layer, ReLU, Pooling, and Fully Connected Layer.


Data Preparation: Prepare the input data for the CNN. Divide the data into training, validation, and test sets, and preprocess the images, e.g., by resizing them, normalizing pixel values, and augmenting the dataset with transformations like rotations or flips.

  • Split the dataset into training, validation, and test sets.
  • Preprocess the images, such as resizing and normalizing pixel values.
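A minimal sketch of these two steps in plain Python follows. The 70/15/15 split, the fixed seed, and the helper names are assumptions for illustration; real pipelines usually do this with a framework's dataset utilities and also apply augmentation.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    items = list(samples)
    random.Random(seed).shuffle(items)  # reproducible shuffle before splitting
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test


def normalize(pixels):
    # Scale 8-bit pixel values (0..255) into the range [0, 1].
    return [p / 255.0 for p in pixels]


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
print(normalize([0, 51, 255]))          # [0.0, 0.2, 1.0]
```

Shuffling before splitting matters when the raw data is ordered (e.g., grouped by class); otherwise the test set could contain classes the model never saw during training.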

Initialisation: Initialise all filters and parameters/weights with random values.
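One common way to pick those random values is Glorot (Xavier) uniform initialisation, which is also the default kernel initialiser of Keras's Conv2D layer (biases default to zeros). The pure-Python sketch below is only an illustration of that scheme for a 3x3 kernel with 3 input channels and 16 filters; the function name is hypothetical.

```python
import math
import random


def glorot_uniform_kernel(kh, kw, c_in, filters, seed=0):
    """Random kernel of shape (kh, kw, c_in, filters) plus zero biases."""
    rng = random.Random(seed)
    fan_in = kh * kw * c_in
    fan_out = kh * kw * filters
    # Glorot/Xavier uniform: draw every weight from U(-limit, limit).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    kernel = [[[[rng.uniform(-limit, limit) for _ in range(filters)]
                for _ in range(c_in)]
               for _ in range(kw)]
              for _ in range(kh)]
    biases = [0.0] * filters  # biases commonly start at zero
    return kernel, biases, limit


kernel, biases, limit = glorot_uniform_kernel(3, 3, 3, 16)
print(round(limit, 4))  # 0.1873
```

Keeping the initial weights small and scaled to the layer size helps activations neither vanish nor explode in the first forward passes.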

Convolution Layer 

The first layer of a CNN is a convolutional layer. It applies a set of filters (also called kernels) to the input image, i.e., the image of a deer, which holds the raw pixel values of the training image. Let's define a convolutional layer with specified parameters, including the number of filters, filter size, and padding. In the example above, an image (deer) of width 32, height 32, and with three colour channels (R, G, and B) is used.

The image goes through the forward propagation step, and the network finds the output probabilities for each class. The convolutional layer preserves the spatial relationship between pixels by learning image features using small squares of input data.

  • Apply the convolution operation to the input image using the defined layer.
  • Add a bias term to the convolved output.
  • Apply an activation function, such as ReLU, to introduce non-linearity.
  • This computes the output of neurons that are connected to local regions in the input, which may result in a volume such as [32x32x16] for 16 filters.
  • The size of the feature map is controlled by three parameters:
    • Depth – Number of filters used for the convolution operation.
    • Stride – Number of pixels by which the filter matrix slides over the input matrix.
    • Padding – Filling the border of the input matrix with zeros, so the filter can also be applied to the border pixels and the output size can be controlled.
  • Let's assume the output probabilities for the image above are [0.2, 0.1, 0.3, 0.4].
  • Calculate the total error at the output layer by summing over all 4 classes:
    • Total Error = ∑ ½ (target probability – output probability)²
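The total-error formula can be worked through with the assumed output probabilities [0.2, 0.1, 0.3, 0.4]. The target vector is an assumption: we take the deer to be the third of the four classes, so its target probability is 1 and the others are 0.

```python
def total_error(targets, outputs):
    # Total Error = sum over classes of 1/2 * (target probability - output probability)^2
    return sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))


outputs = [0.2, 0.1, 0.3, 0.4]  # the assumed output probabilities from the text
targets = [0.0, 0.0, 1.0, 0.0]  # assumption: the deer is the third of the 4 classes
print(round(total_error(targets, outputs), 4))  # 0.35
```

By hand: ½(0.04 + 0.01 + 0.49 + 0.16) = ½(0.70) = 0.35. Training aims to push this number towards 0 by adjusting the filters and weights.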

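import tensorflow as tf

# Input: a 32x32 RGB image, as in the deer example (a symbolic Keras input)
input_image = tf.keras.Input(shape=(32, 32, 3))

# Define a convolutional layer: 32 filters of size 3x3; 'same' padding keeps the 32x32 size
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), padding='same', activation='relu')

# Apply the convolution operation
conv_output = conv_layer(input_image)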

Each filter scans across the image, computing dot products between its weights and local patches of the image. This process generates feature maps that capture different image features.
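To see the scanning step without a framework, here is a minimal single-channel sketch in plain Python. It is not how TensorFlow implements Conv2D internally; it only shows the same arithmetic (cross-correlation with optional stride and zero padding). The 4x4 image and the 2x2 vertical-edge filter are hypothetical.

```python
def conv2d(image, kernel, bias=0.0, stride=1, padding=0):
    """Single-channel 2D convolution (cross-correlation, as in CNN libraries)."""
    if padding:
        # Surround the image with `padding` rows/columns of zeros.
        width = len(image[0]) + 2 * padding
        image = ([[0.0] * width for _ in range(padding)]
                 + [[0.0] * padding + list(row) + [0.0] * padding for row in image]
                 + [[0.0] * width for _ in range(padding)])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Dot product between the filter weights and the local image patch.
            total = sum(image[r + a][c + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(total + bias)
        feature_map.append(row)
    return feature_map


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_filter = [[-1, 1],
               [-1, 1]]  # hypothetical 2x2 vertical-edge detector
print(conv2d(image, edge_filter))
# [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map lights up (value 2.0) only in the column where the edge sits, which is exactly the "where and how strong" information a feature map carries.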

Rectified Linear Unit (ReLU) Layer

Activation Function: Apply a non-linear activation function, such as ReLU (Rectified Linear Unit), to introduce non-linearity into the model. ReLU sets negative values to zero and keeps positive values unchanged, helping the network learn more complex representations.

This layer applies an element-wise activation function. ReLU is typically used after every convolution operation. It is applied per pixel and replaces all negative pixel values in the feature map with zero. This leaves the size of the volume unchanged ([32x32x16]). ReLU is a non-linear operation.
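A minimal plain-Python sketch of ReLU on a small, hypothetical feature map shows both properties described above: negatives become zero and the shape is unchanged.

```python
def relu(feature_map):
    # Element-wise: negatives become 0, positives pass through unchanged,
    # so the shape of the feature map stays the same.
    return [[max(0.0, v) for v in row] for row in feature_map]


feature_map = [[-3.0, 1.5],
               [0.0, -0.5]]
print(relu(feature_map))  # [[0.0, 1.5], [0.0, 0.0]]
```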

Pooling Layer

Pooling is also called subsampling or downsampling. The pooling layer performs a downsampling operation along the spatial dimensions (width and height), resulting in a volume such as [16x16x16]; i.e., it reduces the dimensionality of each feature map but retains the most important information.
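Where do sizes like [32x32x16] and [16x16x16] come from? Along each spatial dimension, a window of size F moved with stride S over an input of size W with padding P gives an output of floor((W − F + 2P) / S) + 1. A small sketch of that formula (the function name is illustrative):

```python
def output_size(w, f, stride=1, padding=0):
    # Spatial size along one dimension after a convolution or pooling window
    return (w - f + 2 * padding) // stride + 1


# 3x3 convolution, stride 1, padding 1 ("same"): 32 -> 32
print(output_size(32, 3, stride=1, padding=1))  # 32
# 2x2 max pooling, stride 2: 32 -> 16, so [32x32x16] becomes [16x16x16]
print(output_size(32, 2, stride=2))             # 16
# 5x5 convolution without padding: 32 -> 28
print(output_size(32, 5))                       # 28
```

Depth is not part of this formula: a convolution's output depth equals its number of filters, and pooling keeps the depth of its input.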

Max pooling operation on a rectified feature map.

So remember: the pooling layer downsamples the feature maps generated by the convolutional layers.

  • Define a pooling layer with specified parameters, such as pool size and stride.
  • Apply the pooling operation to the output of the previous convolutional layer.

#Define a pooling layer

pool_layer = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))

#Apply pooling operation

pooled_output = pool_layer(conv_output)

Common pooling operations include max pooling, where the maximum value in each region is retained, or average pooling, where the average value is taken. Pooling reduces spatial dimensions while preserving important features.
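Both pooling variants can be sketched in plain Python with a 2x2 window and stride 2 on a small, hypothetical rectified feature map. This is an illustration of the arithmetic, not Keras's MaxPooling2D implementation.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    out = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the size x size window starting at (r, c).
            window = [feature_map[r + a][c + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


rectified = [[1, 1, 2, 4],
             [5, 6, 7, 8],
             [3, 2, 1, 0],
             [1, 2, 3, 4]]
print(pool2d(rectified))              # [[6, 8], [3, 4]]
print(pool2d(rectified, mode="avg"))  # [[3.25, 5.25], [2.0, 2.0]]
```

Each 2x2 block collapses to a single number, so a 4x4 map becomes 2x2, the same halving that turns [32x32x16] into [16x16x16].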

Fully Connected Layer – Convolutional Neural Networks

In the fully connected layer, each node is connected to every node in the adjacent layer. The FC layer computes the class scores with a traditional multilayer perceptron that uses a softmax activation function in the output layer. It results in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10.

After several convolutional and pooling layers, the flattened feature maps are passed to fully connected layers. These layers connect every neuron to every neuron in the previous and subsequent layers. They learn complex patterns and relationships between features and make final predictions.

  • Flatten the output from the last pooling layer to a 1D vector.
  • Define fully connected layers with specified number of neurons and activation functions.
  • Connect the flattened output to the fully connected layers.

#Flatten the output from the last pooling layer

flattened_output = tf.keras.layers.Flatten()(pooled_output)

#Define fully connected layers

fc_layer1 = tf.keras.layers.Dense(units=64, activation='relu')
fc_layer2 = tf.keras.layers.Dense(units=10, activation='softmax')

#Connect the flattened output to fully connected layers

fc_output1 = fc_layer1(flattened_output)
fc_output2 = fc_layer2(fc_output1)

The main job of this layer is to take the input volume produced by the preceding Conv, ReLU, or pooling layer and arrange the output as an N-dimensional vector, where N is the number of classes the program has to choose from.
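What Flatten and Dense do can be sketched in plain Python on a tiny 2x2x2 volume standing in for the last pooling output. The weights, biases, and the choice of N = 3 classes are hypothetical; in a real network these weights are learned.

```python
def flatten(volume):
    # volume[h][w][c] -> a 1D list, read row by row, channel values last
    return [v for row in volume for pixel in row for v in pixel]


def dense(x, weights, biases):
    # weights[k] holds one weight per input value for output neuron k
    return [sum(w * xi for w, xi in zip(w_k, x)) + b
            for w_k, b in zip(weights, biases)]


# A tiny 2x2x2 volume standing in for the last pooling output.
volume = [[[1.0, 0.0], [2.0, 1.0]],
          [[0.0, 3.0], [1.0, 1.0]]]
x = flatten(volume)
print(x)  # [1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 1.0]

# Hypothetical weights and biases for N = 3 classes.
weights = [[0.1] * 8, [0.0] * 8, [1.0] + [0.0] * 7]
biases = [0.0, 0.5, 0.0]
scores = dense(x, weights, biases)
print([round(s, 2) for s in scores])  # [0.9, 0.5, 1.0]
```

Every output neuron sees every flattened input value, which is what "fully connected" means; passing these N scores through a softmax would turn them into class probabilities.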

Activation Function (Again)

We now apply an activation function again, such as ReLU or sigmoid, to the outputs of the hidden fully connected layers to introduce non-linearity and ensure the model can learn complex mappings.

Output Layer

The final layer of the CNN is the output layer, which produces the network’s predictions. The number of neurons in this layer depends on the specific task, such as binary classification, multi-class classification, or regression. The activation function used in the output layer depends on the task as well.

Loss Function

Define a loss function to measure the discrepancy between the predicted values and the actual values. For classification tasks, common loss functions include cross-entropy loss, while for regression tasks, mean squared error (MSE) loss is often used.

#Define the loss function

loss_function = tf.keras.losses.CategoricalCrossentropy()
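For a single sample with a one-hot target, categorical cross-entropy reduces to minus the log of the probability assigned to the true class. The sketch below computes that per-sample value in plain Python; Keras's CategoricalCrossentropy then averages such values over a batch. The probability vectors are hypothetical, with 0.94 echoing the deer example and the deer assumed to be the third class.

```python
import math


def categorical_cross_entropy(one_hot_target, probs, eps=1e-12):
    # -sum(y * log(p)); with a one-hot target only the true-class term is non-zero.
    # eps guards against log(0).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, probs))


target = [0, 0, 1, 0]  # assumption: the deer is the third class
print(round(categorical_cross_entropy(target, [0.03, 0.01, 0.94, 0.02]), 4))  # 0.0619
print(round(categorical_cross_entropy(target, [0.2, 0.1, 0.3, 0.4]), 4))      # 1.204
```

A confident correct prediction (0.94) gives a small loss, while spreading probability over wrong classes gives a much larger one.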


Now it’s time for us to choose an optimizer, such as stochastic gradient descent (SGD) or Adam, to update the network’s weights based on the gradients computed during backpropagation. The optimizer helps us to minimize the loss function and improve our network’s performance. So let’s compile the model with the chosen optimizer and loss function.

#Define the optimizer

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

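# Build a Keras model from the layers above, then compile it
# (the 'accuracy' metric lets model.evaluate() return both loss and accuracy)

model = tf.keras.Model(inputs=input_image, outputs=fc_output2)
model.compile(optimizer=optimizer, loss=loss_function, metrics=['accuracy'])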

*We defined Adam as the optimizer, with a learning rate of 0.001.
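The core idea every such optimizer builds on is the gradient step w ← w − learning_rate × gradient. The sketch below shows only plain SGD on a toy one-parameter loss L(w) = (w − 3)², whose gradient is 2(w − 3); it is not Adam, which additionally keeps per-parameter adaptive step sizes and momentum. The toy loss and the learning rate of 0.1 are assumptions for illustration.

```python
def sgd_step(weights, grads, learning_rate=0.001):
    # Plain SGD update: w <- w - learning_rate * dLoss/dw
    return [w - learning_rate * g for w, g in zip(weights, grads)]


# Toy loss L(w) = (w - 3)^2, with gradient 2 * (w - 3); its minimum is at w = 3.
w = [0.0]
for _ in range(100):
    grad = [2 * (w[0] - 3.0)]
    w = sgd_step(w, grad, learning_rate=0.1)
print(round(w[0], 4))  # 3.0
```

In a CNN the same update is applied to every filter weight and bias at once, with the gradients supplied by backpropagation.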


Train the CNN by iteratively feeding batches of training data through the network, computing the loss, and updating the weights through backpropagation. This process continues for a defined number of epochs or until a stopping criterion is met.

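# Train the model (x_train, y_train, x_val and y_val are assumed to be prepared already, with one-hot labels)
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_val, y_val))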

Evaluation – The moment of Truth

Now it's time to evaluate our trained CNN on the validation or test set to measure its performance. Typical evaluation metrics include accuracy, precision, recall, and F1 score; which ones matter depends on the specific task, which in our case is recognizing the deer. Below, we evaluate the trained model on our test set.

#Evaluate the model

test_loss, test_accuracy = model.evaluate(x_test, y_test)
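Precision, recall, and F1 are not returned by the evaluate call above, but they are easy to compute from predictions. The plain-Python sketch below treats "deer" as the positive class and everything else as negative (a one-vs-rest assumption); the labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="deer"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


y_true = ["deer", "deer", "cat", "dog", "deer", "cat"]
y_pred = ["deer", "cat", "cat", "deer", "deer", "deer"]
print([round(m, 4) for m in binary_metrics(y_true, y_pred)])  # [0.5, 0.5, 0.6667, 0.5714]
```

Here 2 of the 4 "deer" predictions are right (precision 0.5) and 2 of the 3 real deer are found (recall 0.6667); F1 is their harmonic mean.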

This step-by-step process provides a high-level overview of how convolutional neural networks work. However, keep in mind that implementing and training a CNN can involve more technical details, such as hyperparameter tuning, regularization techniques, and handling overfitting, which require further understanding and expertise in deep learning.

As technology continues to advance, CNNs are expected to play a crucial role in advancing image recognition, object detection, and other visual understanding tasks, revolutionizing industries such as healthcare, autonomous driving, and security.

Training a CNN involves feeding batches of data through the model, computing the loss, and updating the weights using optimization techniques. The trained CNN can then be evaluated on validation or test sets and used to make predictions on new, unseen data.

The combination of convolutional layers, pooling layers, and fully connected layers allows CNNs to capture intricate patterns and relationships in images, making them invaluable for various computer vision tasks. With the availability of deep learning frameworks like TensorFlow, implementing CNNs has become more accessible, enabling researchers and practitioners to leverage the power of CNNs in their applications.

Convolutional Neural Networks – Real-life Business Use Cases

Many modern companies use CNNs as the backbone of their business; e.g., Pinterest uses them for home feed personalization, and Instagram uses them for its search infrastructure. Three of the biggest use cases are listed below.

• Automatic Tagging Algorithms: Tagging, or social bookmarking, refers to the action of associating a relevant keyword or phrase with an entity (e.g., a document, image, or video). Our experiments showed that automatic tagging needs an effective time-frequency representation, and that more complex models benefit from more training data.
• Photo Search: Google search can find images that are similar to a user's image or text input; it works well in the Chrome app. Google's algorithms rely on more than 200 unique signals, or “clues,” that make it possible to guess what you are searching for. Examples of such attributes are websites, the age of content, IP-address-based regions, and PageRank. Sadly, the results can be highly biased by skin colour. You can give it a try, though.
• Product Recommendations: Large-scale recommender systems are in use in almost every e-commerce, retail, video-on-demand, or music-streaming business. Algorithms in recommender systems are typically classified into two categories: content-based and collaborative filtering methods, although modern recommenders combine both approaches.

Books & Other Material Referred

• Open Internet & AILabPage member's hands-on lab work.
• LeNet-5 documentation.

Points to Note:

All credits, if any, remain with the original contributors only. In this post we have covered convolutional neural networks, a class of deep learning models. The last post was on supervised machine learning. In the next post, I will talk about reinforcement machine learning.

Feedback & Further Questions

Do you have any questions about deep learning or machine learning? Leave a comment or ask your question via email. I will try my best to answer it.

Conclusion – This article has tried to explain the fundamental ideas behind convolutional neural networks in plain language. Convolutional Neural Networks (CNNs) are powerful deep learning models specifically designed for image recognition and processing tasks. By leveraging the hierarchical structure of convolutional layers, pooling layers, and fully connected layers, CNNs excel at extracting meaningful features from images and making accurate predictions. The step-by-step process outlined above demonstrates the key components and operations involved in building a CNN, from data preparation and convolutional layers to pooling, fully connected layers, and optimization.

============================ About the Author =======================

Thank you all for spending your time reading this post. Please share your opinions, comments, critiques, agreement, or disagreement. For more details about posts, subjects, and relevance, please read the disclaimer.

====================================================================

Posted by V Sharma

A Technology Specialist boasting 22+ years of exposure to Fintech, Insuretech, and Investtech with proficiency in Data Science, Advanced Analytics, AI (Machine Learning, Neural Networks, Deep Learning), and Blockchain (Trust Assessment, Tokenization, Digital Assets). Demonstrated effectiveness in Mobile Financial Services (Cross Border Remittances, Mobile Money, Mobile Banking, Payments), IT Service Management, Software Engineering, and Mobile Telecom (Mobile Data, Billing, Prepaid Charging Services). Proven success in launching start-ups and new business units - domestically and internationally - with hands-on exposure to engineering and business strategy. "A fervent Physics enthusiast with a self-proclaimed avocation for photography" in my spare time.


    1. STEWARS at

      Excellent post, you made such a complex subject easy to understand. Thank you


    2. Hi,
      This is the latest booming technology, and your post helped me learn more about it.
      Wonderfully illustrated information. I thank you for that. No doubt it will be very useful for my future projects. Would like to see some other posts on the same subject!

      Thank you for sharing…


    3. Simple and easy to understand …


    4. Sebastian Mouyana at

      This is very high-level info without much detail to learn.
      Do we lose any information when using a feature detector at the Convolution + Pooling layers, which act as feature extractors?


    5. Ronald Chikanya at

      I am a student and a worker at the same time, and I loved your narrative. Convolutional Neural Networks are very similar to ordinary Neural Networks from the previous. Please help by answering in detail how the flow to the FCL happens; please explain it bit by bit.


    6. James Sopit at

      Great post! Thanks so much for the work; people like me really appreciate it. I have a few questions, if you can answer please:

      1 – What makes convolutional filters in the first convolutional layer “unique”?
      2 – Do all 5×5 filters have the same behaviour?
      3 – Are they just being passed through different non-linear functions or something?
      4 – Why don’t they produce the same representations?
      5 – What informs such decisions?


    10. Natalia Polish at

      Great post. I think you are missing the downsides, which are below; you may add them to your post if you want. CNNs have several drawbacks.

      Convolutional neural networks, like any neural network model, are computationally expensive. This can be overcome with better computing hardware such as GPUs and neuromorphic chips.


    11. Giorgio Zabierowski at

      CNNs have issues with class imbalance and overfitting when there are many classes (+/- 50 classes).
      Something that a lot of people are concerned about is that no theory gives bounds on the number of layers to be used; therefore, it is usually a matter of trial and error.


    13. George Chirwa at

      This is super basic but very informative and useful info for people to start with. I would love to see part 2 and subsequent parts with more details. Seriously, you are helping students and professionals a lot; keep writing and keep learning. Many, many thanks from London School


    14. […] Convolutional neural network – CNN’s are inspired by the structure of the brain but our focus will not be on neural science here as we do not have any expertise or academic knowledge in any of the biological aspects. We are going artificial in this post. CNN’s are a class of Neural Networks that have proven very effective in areas of image recognition, processing, and classification. In this article, we will explore and discuss our intuitive explanation of convolutional neural networks (CNN’s) on a high level and in simple language. […]


    15. Your amazing insightful information entails much to me and especially to my peers. Thanks a ton; from all of us. ExcelR Machine Learning Course Pune


    16. […] Convolutional Neural Networks – CNN a neural network with some convolutional and other layers. The convolutional layer has a number of filters that do a convolutional operation. In other words, CNN’s are a class of Neural Networks that have proven very effective in areas of image recognition processing, and classification. […]


    17. Your amazing insightful information entails much to me and especially to my peers. Thanks a ton; from all of us. ExcelR Machine Learning Course


    18. […] can be used to overcome the shortcoming of CNN’s? etc. I guess if you read this post on “Convolutional Neural Networks“; you will find out the […]


    19. […] Convolutional Neural Network Architecture. Source […]


    20. […] Read the complete article at: […]


    21. […] Advisor: Dr David (Wei) DaiCourse: CS 591ABSTRACTThe project was done in the Spring 2021 semester for the Advanced Artificial Intelligence (CS 591) Class. The project topic was “Sound Classification using Deep Learning”. I choose this project because Sound Classification is one of the most generally used applications in Audio Deep Learning. Deep learning is a subset of machine learning in which multi-layered neural networks are modeled to work like the human brain – ‘learn’ from a large amount of data. The use of deep learning in an automation environment is growing. Such as personal security to critical surveillance, classifying music clips to identify the genre of the music, or classifying short utterances by a set of speakers to identify the speaker based on the voice. The learning capabilities of the deep learning architectures can be used to develop sound classification systems to overcome the efficiency issues of the traditional systems. The project demonstrates the use of a deep learning algorithm, and CNN (Conventional neural network) to find the accuracy of the sound. Furthermore, python programming and its libraries, google collab was used to implement sound classification.6. INTRODUCTIONThe project goal is to differentiate the various type of sound (Urban Sound 8K dataset) using a Deep learning algorithm and visualizing them in the electromagnetic spectrum. Audio/Sound signals are all around us. Humans can differentiate, recognize sound through their and hearing senses, imagine how it will feel when a computer differentiates the sound using machine learning algorithms. There is a growing interest in sound classification for different scenarios. For example, fire alarm detection for hearing impaired people, engine sound analysis for maintenance and patient monitoring in hospitals, etc. 
The project applies deep learning techniques to classify environmental sounds, focusing specifically on urban sounds. The results report the classification accuracy: the higher the accuracy, the better the predictions.
6.1 Keywords
1. Deep learning algorithm for sound classification, evaluated by accuracy.
2. Google Colab, Python programming, packages, and libraries.
3. Dataset: Urban Sound 8K, with a CSV file describing the sound files.
4. The dataset contains 8732 sound excerpts (≤ 4 s) of urban sounds from 10 classes.
5. Visualization of sound predictions and accuracy as spectrograms using matplotlib.
7. COLLECTIONS OF DATA FROM THE DATASET
7.1 Dataset
The project starts by collecting the dataset, downloaded from the UrbanSounds website.
Fig 5
7.2 Segregation of Data into Various Folders
The data is segregated into various folders, and then the metadata (a CSV file) is prepared.
Fig 6
8. WRITING CODE AND DATA MODELING/TRAINING
The code is written in Python in a Jupyter notebook, using the Librosa, Keras, and pandas packages, with matplotlib for visualization. The dataset is used to build and train a deep neural network for prediction. The model is compiled with three parameters; before training, its accuracy is 6.9834%.
Fig 6
8.1 Design Diagram of the Project
Fig 7
9. RESULT
The Urban Sound 8K dataset was used for training; its metadata, stored in CSV files, contains information about each audio file. The dataset contains 8732 sound excerpts (≤ 4 s) of urban sounds from 10 classes: 1. Air Conditioner, 2. Car Horn, 3. Dog Bark, 4. Drilling, 5. Engine Idling, 6. Gun Shot, 7. Jackhammer, 8. Siren, 9. Street Music, 10. Children Playing. The sound excerpts are digital audio files in .wav format. Sound waves are digitized by sampling them at a fixed rate (typically 44.1 kHz for CD-quality audio, i.e. 44,100 samples per second). The audio files are processed with the helpers.wavfilehelper module, which reads and writes .wav audio files.
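To make the sampling-rate figure concrete, the following plain-Python sketch (an illustration, not code from the project) samples a pure tone at 44.1 kHz. The 440 Hz tone and the helper name sample_tone are assumptions chosen for illustration. A 4-second clip, the maximum excerpt length in the dataset, yields 4 × 44,100 = 176,400 samples.

```python
import math

SAMPLE_RATE = 44_100  # CD-quality rate: 44,100 samples per second


def sample_tone(freq_hz: float, duration_s: float, rate: int = SAMPLE_RATE) -> list[float]:
    """Hypothetical helper: digitize a pure sine tone of `freq_hz` hertz by
    measuring its amplitude `rate` times per second for `duration_s` seconds."""
    n_samples = round(duration_s * rate)
    # Sample n is taken at time n / rate seconds.
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


# A 4-second excerpt, the longest clip length in the dataset described above.
samples = sample_tone(440.0, 4.0)
print(len(samples))  # 176400
```

Real recordings are not pure tones, but the arithmetic is the same: the number of samples is the clip duration multiplied by the sampling rate.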
The features extracted from all 8732 audio files are stored in a pandas DataFrame with two columns, ‘feature’ and ‘class_label’. The model is a Keras Sequential model whose architecture consists of four Conv2D convolution layers followed by the final output layer.
Fig 8
After the deep learning model was implemented and trained on the dataset, it reached a training accuracy of 93.43% and a testing accuracy of 89.24%, which is considered good accuracy. Better data gives higher accuracy and better predictions. In this project, the gunshot sound was distinct and was predicted with high accuracy among the 10 classes.
Fig 9: Police Siren
Fig 10: Gun Shot
10. CONCLUSION/FUTURE WORK
Although the project was implemented successfully, it took some time to gather information and set up the environment because of the large dataset files and the new data science tools. The project taught me more about using Python and its libraries in data science (machine learning, deep learning, and AI). It also gave me a clear understanding of the important role AI can play in sound classification and prediction, which is useful for tasks like fire alarm detection for hearing-impaired people, engine sound analysis for maintenance, and patient monitoring in hospitals. Future work would examine extending the sound classification project to real-time streaming audio and real-world sounds. Real-world audio is complex because it contains various background sounds, different target-sound volume levels, and different types of echoes. The model and the MFCC (Mel-Frequency Cepstral Coefficient) feature extraction would need to run with low latency, keeping up with the audio buffer thread without delays. Based on what I learned from the project, I would suggest that future students gain a clear understanding of the project requirements, such as the information, programming tools, and resources needed, before starting the project.
This is one of the best projects I have done so far, and I would suggest collecting a good dataset and aiming for high accuracy on the selected sound classes to get better predictions.
REFERENCES
[1] F. Rong, “Audio Classification Method Based on Machine Learning,” 2016 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Changsha, China, 2016, pp. 81-84, doi: 10.1109/ICITBS.2016.98.
[2] Z. Kons and O. Toledo-Ronen, “Audio Event Classification Using Deep Neural Networks,” Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2013, pp. 1482-1486.
[3] D. Malowany, “Audio Classification,” Towards Data Science, Oct. 18, 2020.
[4] M. Smales, “Sound Classification using Deep Learning,” Feb. 26, 2019.
[5] K. Jaiswal and D. Kalpeshbhai Patel, “Sound Classification Using Convolutional Neural Networks,” 2018 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), 2018, pp. 81-84, doi: 10.1109/CCEM.2018.00021.
Fruits, Vegetables and Deep Learning Processing Image Datasets with Neural Convolutional Networks using PyTorch
Nagendra Mokara
December 2021
Southeast Missouri State University
Advisor: Dr Wei Dai
Course: Advanced Artificial Intelligence
ABSTRACT
Artificial Intelligence (AI) is a significant technological achievement that is currently in widespread use. Deep learning has a wide range of applications because of its ability to build robust representations from images. A Convolutional Neural Network (CNN) is a deep learning model that takes an input image, learns which aspects or objects in the image are important, and can distinguish between them. CNNs are the most widely used deep learning architecture for image classification. To improve our results, we applied a variety of automated preprocessing steps to the fruit and vegetable photos.
A CNN model requires far less preprocessing than other deep learning techniques for classification. The learning capabilities of deep learning architectures can also be used to improve sound classification and address its efficiency problems. In this project, a CNN is built whose layers classify images into different categories.
Keywords: convolutional neural network; deep learning; image classification.
11. INTRODUCTION
Almost everything can be assigned to a category, or class, and we humans enjoy examining things. Classification is common in business, where daily routines require the analysis of parts, installations, assemblies, and products. To automate this sorting process, people have developed techniques such as Machine Learning (ML), Neural Networks (NN), and Deep Learning (DL), among other algorithms. Deep learning is one of the topics we will look into. It is an AI technique that loosely mimics how the human brain processes data and creates patterns in order to make decisions. Classifying images of fruits and vegetables by eye is extremely difficult, so we use PyTorch to apply deep learning to image datasets. Using these datasets, we build a CNN model for image detection and classification. For the purposes of this study, a custom CNN is introduced and then compared to a ResNet CNN.
12. BACKGROUND AND RELATED WORK
Convolutional Neural Networks, like other deep learning architectures, were inspired by the human brain and how it processes information. CNNs are a subset of deep learning: a type of neural network that excels at image processing, recognition, and classification. As the title of this article suggests, a CNN model is needed for this task.
A CNN filters data using convolutional layers: the input data (or a feature map) is convolved with a kernel (filter) to produce a transformed feature map. The three basic components of a CNN are the input layer, the hidden layers (from one up to as many as the application requires), and the output layer. A CNN differs from a standard neural network in that its layers are structured in three dimensions (width, height, and depth). The hidden layers are made up of convolution, pooling, normalization, and fully connected layers.
Fig: General Representation of CNN Model (Input Image → Stage 1, Stage 2, ..., Stage n, each with Convolution, ReLU (rectified linear units), MaxPooling, and Kernels → Fully Connected Layers → Output → Label → Loss Function; Forward Propagation and Backward Propagation).
In other words, a CNN is a deep learning algorithm that takes images as input, detects patterns in them in a variety of ways, and outputs a prediction that distinguishes one image from another.
13. IMPLEMENTATION
The main purpose of this project is to classify fruits and vegetables using a CNN and the PyTorch library. Because the purpose is to explore image classification, the “Fruits 360” dataset is used.
This dataset, which is available on Kaggle, contains images of fruits and vegetables with the following key properties:
· Total number of images: 90483.
· Training set: 67692 images (one fruit or vegetable per image).
· Test set: 22688 images (one fruit or vegetable per image).
· Multi-fruit set: 103 images (each image contains multiple fruits, or fruit classes).
· Number of classes: 131 (fruits and vegetables).
· Image size: 100×100 pixels.
· Dataset size: 700 MB.
To add this dataset in Kaggle, open the sidebar, choose the Add Data option, then search for the fruits360 dataset and add it. Because our study covers a large dataset, we need a GPU to run our models quickly; new Kaggle users get the first 40 hours free. Internet access must also be turned on in Kaggle’s settings.
Now that the dataset has been added, we move on to the programming. First, we load the dataset’s directory paths and check that each directory has the same number of classes. We then list all of the classes, and the images in each class folder, under the root directory.
When building real-world machine learning models, it is standard practice to split the dataset into three parts:
· Training set: used to train the model, i.e. compute the loss and adjust the model’s weights using gradient descent.
· Validation set: used to evaluate the model during development, tune hyperparameters (for example, the learning rate), and select the best version of the model.
· Test set: used to compare different models or modeling approaches and to report the model’s final accuracy.
While images are loaded from the training dataset, randomized data augmentations apply transformations at random. Each image is padded by 10 pixels and then flipped horizontally 50% of the time.
Finally, a random rotation of up to 20 degrees is applied. Because the transformations are applied randomly and dynamically each time an image is loaded, the model sees a slightly different version of each image in every epoch.
Training involves a large amount of data, and computers have limited resources, so processing all 67692 training images at once would be impossible. Data loaders are therefore required; fortunately, PyTorch provides them. They feed the images to the model in batches.
Next, we develop our own version of a CNN. Before going into the details of each model, we create an ImageClassificationBase class and an accuracy function. The accuracy function evaluates the model’s performance: a natural measure is the fraction of labels that are predicted correctly. The architecture of this custom CNN uses residual blocks and batch normalization, which allows the custom CNN to be compared with the ResNet model (ResNet stands for residual neural network; ResNet models are available pre-trained on the ImageNet dataset). In a residual block, the original input is added back to the output feature map obtained by passing the input through one or more convolutional layers. Batch normalization, as the name implies, normalizes the inputs of a layer across each batch (to zero mean and unit variance), which reduces the time it takes to train the neural network.
14. RESULTS
After creating the custom model, we train it on the data, and then train the ResNet CNN model, which works similarly. During training, the learning rate, training loss, validation loss, and validation accuracy are recorded. Our models must reach more than 90% accuracy to be used for predictions. The trained models can then be used to generate predictions on the validation dataset.
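As a rough illustration of the accuracy function described above (not the report's own PyTorch code), the sketch below computes the fraction of correctly predicted labels from per-class scores in plain Python. In PyTorch the same idea is usually written with torch.max over the class dimension; the score values here are made up for illustration.

```python
def accuracy(outputs: list[list[float]], labels: list[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label.

    outputs: one row of class scores (e.g. logits) per image.
    labels:  the true class index for each image.
    """
    # Predicted class = index of the largest score in each row (an argmax).
    preds = [max(range(len(row)), key=row.__getitem__) for row in outputs]
    correct = sum(pred == label for pred, label in zip(preds, labels))
    return correct / len(labels)


# Hypothetical scores for 4 images over 3 classes.
scores = [
    [2.0, 0.1, 0.3],  # predicted class 0
    [0.2, 1.5, 0.1],  # predicted class 1
    [0.3, 0.2, 0.9],  # predicted class 2
    [1.2, 0.4, 0.8],  # predicted class 0
]
true_labels = [0, 1, 2, 2]

acc = accuracy(scores, true_labels)
print(acc)         # 0.75
print(acc > 0.90)  # False: below the report's 90% threshold
```

Accuracy treats all classes alike and does not show which classes are confused with each other; a per-class breakdown or confusion matrix is more informative for that.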
Both models achieved greater than 90% accuracy, so both meet the threshold for making predictions.
14.1 Screenshots of Results
Fig 1: Image of Cantaloupe 1 (22).
Fig 2: Image of Apple Braeburn (0).
Fig 3: Accuracy vs. number of epochs for the CNN model.
Fig 4: Loss vs. number of epochs for the CNN model.
Fig 5: Learning rate vs. batch number for the CNN model.
Fig 6: (a) and (b) show accuracy and loss vs. number of epochs for the ResNet CNN model.
Fig 7: Learning rate vs. batch number.
Fig 8: Prediction by the trained ResNet CNN model.
Fig 9: Prediction of “Avocado ripe” by the trained CNN model.
15. FUTURE WORK
Even though the outcomes are excellent, there are still areas where improvements can be made:
· Reduce the training and validation losses of the ResNet model.
· Reduce the time it takes to train the custom model.
· Evaluate the custom model’s performance on a different dataset.
16. CONCLUSION
In this work, we compare two Convolutional Neural Network (CNN) architectures for image classification: the author’s custom model and a ResNet model available in PyTorch. The results show that, despite its longer training time, the custom model outperformed the ResNet model: the custom model was 99.21% accurate, whereas the ResNet model was only 92.45% accurate. Unlike the ResNet model, the custom model was also able to reduce training and validation losses.
REFERENCES
[1] F. Aguilar, “Fruits, Vegetables and Deep Learning,” Level Up Coding (Medium), July 19, 2020.
[2] V. Sharma, “Deep Learning — Introduction to Convolutional Neural Networks,” 2018. […]
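The residual block and batch normalization described in the Fruits 360 report can be illustrated with a small NumPy sketch. This is a simplified stand-in, not the report's PyTorch model: it assumes single-channel feature maps, one hand-written 3×3 convolution, batch statistics over the whole batch, and no learnable batch-norm parameters; the function names are hypothetical. A typical ResNet block uses two convolutions with batch normalization after each, so treat this as the core idea only.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 2D convolution (cross-correlation, as CNN layers compute it)
    with zero padding so the output keeps the input's height and width.
    Assumes an odd-sized kernel."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def batch_norm(batch: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize a batch of single-channel feature maps to zero mean and unit
    variance, using statistics over the whole batch. The learnable scale and
    shift of a real batch-norm layer are omitted for brevity."""
    return (batch - batch.mean()) / np.sqrt(batch.var() + eps)


def residual_block(batch: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Simplified residual block: ReLU(x + BN(conv(x))) for each map x in the batch.
    The input is added back to the convolved output (the skip connection),
    which only works because 'same' padding keeps the shapes equal."""
    conv_out = np.stack([conv2d_same(x, kernel) for x in batch])
    return np.maximum(batch + batch_norm(conv_out), 0.0)


rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 6, 6))   # 4 hypothetical single-channel 6x6 feature maps
kernel = np.full((3, 3), 1.0 / 9.0)  # hypothetical 3x3 averaging filter
out = residual_block(batch, kernel)
print(out.shape)  # (4, 6, 6)
```

Because the skip connection passes the input through unchanged, the convolutional path only has to learn a correction to its input, which is the intuition behind residual networks.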


    22. […] Convolutional Neural Networks. […]


    23. […] –  While BMs were useful in the past, certain deep learning architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have taken precedence over others because of their […]


    26. […] programs help to analyze images, sequence data and generate new content. These programs are called Convolutional neural networks (CNNs), Recurrent neural networks (RNNs), and Generative adversarial networks (GANs). These networks are […]


    27. Jasmine Rosewell at

      Well said “CNNs have revolutionized the field of computer vision and are essential in applications ranging from image classification and object detection to facial recognition and medical image analysis.”

