Convolutional Neural Networks – A CNN is a neural network made up of convolutional layers and other layers. A convolutional layer has several filters that perform a convolution operation. In other words, CNNs are a class of neural networks that have proven very effective in areas such as image recognition, processing, and classification.
Convolutional neural networks (CNNs) are also a distinct multilayer neural network architecture. They are designed so that each filter is convolved with the input volume, producing an activation map made up of neurons.
Layers of a Convolutional Neural Network (CNN)
CNNs are built by stacking individual blocks (layers), each performing a specific task, in an order that achieves the overall goal. Each of these layers has its own structure, functionality, benefits, and shortcomings. Some of the common layers in CNNs are listed below.
- Convolutional Layer
- Non-Linearity Layer
- Rectification Layer
- Rectified Linear Units (ReLU)
- Pooling Layer
- Fully Connected Layer
In this article, we explore only the Convolutional Layer, explaining it intuitively and in detail, but in simple language.
Convolutional Layer – An Outlook
Convolutional Layer – This is the first layer of a convolutional neural network (CNN) and one of its main building blocks. When a computer looks at a picture, it starts with the individual dots that make up the image. These dots are called pixels. The computer uses them to find the important parts of the image that help it understand what's going on in the picture.
The unprocessed pixel data is the raw information the computer starts with before it can figure out what's in the picture. This layer uses small square filters to capture different parts of the image while still keeping track of how the pixels are arranged. In this way, a convolutional neural network detects what's in the picture and builds a simplified map (a feature map) of the things it finds.
Something very interesting happened in 2017: “Transformers”. No, I'm not talking about a new music album by a Japanese band. Transformers are a type of neural network architecture that came into the spotlight then and have been gaining popularity ever since. They solve the parallelization problem of recurrent sequence models by relying entirely on attention mechanisms, without recurrence or convolutions, which speeds up tasks such as translating one sequence into another. Anyway, Transformers are not part of this post; I will discuss them at length in a later blog post.
In short, the convolutional layer is the most important layer; it contains a set of filters whose parameters need to be learned.
Convolution is simply a mathematical operation that takes two inputs, such as an image matrix and a filter (kernel) whose parameters need to be learned, and merges these two sets of information into one output.
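The convolution described above can be sketched in plain Python. This is a minimal illustration, assuming a single-channel image, a stride of 1, and no padding; the function name `convolve2d` and the example values are hypothetical, chosen only for illustration. Like most deep learning libraries, it computes cross-correlation (the filter is not flipped), which is what CNN literature usually calls convolution.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Computes cross-correlation (no kernel flip), as CNN layers usually do.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Multiply the filter element-wise with the patch it covers and sum.
            total = 0
            for m in range(kh):
                for n in range(kw):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Each output value is the sum of the element-wise products between the filter and the 3x3 patch of the image it currently covers, so a 5x5 image and a 3x3 filter give a 3x3 feature map.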
The CNN Layers
The indispensable feature of convolutional neural networks (CNNs) is the convolutional layer, which is widely regarded as their fundamental component and gives CNNs their name. This layer performs a computational procedure known as a “convolution”.
After all variables (such as the filter weights) are initialized with random values, an image of a deer is fed into the network, with a width of 32, a height of 32, and three color channels (R, G, and B).
The network then goes through the forward propagation step (convolution, ReLU, and pooling operations, followed by the fully connected layer) and computes the output probability for each class.
- Let's assume the output probabilities for the image above are [0.2, 0.1, 0.3, 0.4].
- The size of the feature map is controlled by three parameters:
- Depth – the number of filters used for the convolution operation.
- Stride – the number of pixels by which the filter matrix slides over the input matrix.
- Padding – it is often convenient to pad the input matrix with zeros around the border, so that the filter can also be applied to the border elements.
- Calculate the total error at the output layer by summing over all 4 classes:
- Total Error = ∑ ½ (target probability – output probability)²
- Compute the output of neurons that are connected to local regions in the input. With 16 filters, this may result in a volume such as [32x32x16].
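The feature map size and the total error from the list above can be computed with two small helpers. This is a sketch assuming a square filter and the common output-size formula (W - F + 2P) / S + 1, where W is the input width, F the filter size, P the padding, and S the stride; the helper names and the target vector [0, 0, 0, 1] (that is, assuming the fourth class is the correct one) are hypothetical.

```python
def feature_map_shape(width, height, filter_size, num_filters, stride=1, padding=0):
    """Return (width, height, depth) of a convolution's output volume.

    Uses (W - F + 2P) // S + 1 along each spatial axis; depth equals the
    number of filters. Integer division floors any fractional result.
    """
    out_w = (width - filter_size + 2 * padding) // stride + 1
    out_h = (height - filter_size + 2 * padding) // stride + 1
    return out_w, out_h, num_filters


def total_error(targets, outputs):
    """Total Error = sum over classes of 1/2 * (target - output)^2."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))


# A 32x32x3 input with 16 filters of size 3x3, stride 1, zero-padding 1.
shape = feature_map_shape(32, 32, filter_size=3, num_filters=16, stride=1, padding=1)
print(shape)  # (32, 32, 16)

# Output probabilities from the list above; the target is a hypothetical one-hot label.
outputs = [0.2, 0.1, 0.3, 0.4]
targets = [0, 0, 0, 1]
error = total_error(targets, outputs)
print(error)  # approximately 0.25
```

With stride 1 and padding 1, a 3x3 filter preserves the 32x32 spatial size, which is one way a [32x32x16] volume can arise from 16 filters; doubling the stride to 2 would halve it to 16x16.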
In a convolutional network, the image is split into small local regions (local receptive fields), each processed by perceptron-like units, and the results are finally compressed into feature maps of size m₂ × m₃. If an input image has three channels, then the filter applied to it must have three channels as well. In short, a filter must always have the same number of channels as the input, often referred to as its “depth”.
This map therefore stores information about where the feature occurs in the image and how well that region corresponds to the filter. Each filter is thus trained spatially with respect to the positions in the volume it is applied to. The height and width of the filters are smaller than those of the input volume.
The output volume of the convolutional layer is obtained by stacking the activation maps of all filters along the depth dimension.
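Stacking activation maps along the depth dimension can be sketched as follows. This minimal example assumes a channels-first layout ([channels][height][width]), stride 1, and no padding; the function name `conv_layer` and the example values are hypothetical.

```python
def conv_layer(volume, filters):
    """Convolve a [C][H][W] input volume with K filters of shape [C][kh][kw].

    Returns a [K][H - kh + 1][W - kw + 1] output volume: one activation map
    per filter, stacked along the depth dimension.
    """
    channels, height, width = len(volume), len(volume[0]), len(volume[0][0])
    output_volume = []
    for f in filters:
        # A filter must have the same number of channels (depth) as the input.
        if len(f) != channels:
            raise ValueError("filter depth must match input depth")
        kh, kw = len(f[0]), len(f[0][0])
        activation_map = [
            [
                # Sum the element-wise products over every channel of the patch.
                sum(
                    volume[c][i + m][j + n] * f[c][m][n]
                    for c in range(channels)
                    for m in range(kh)
                    for n in range(kw)
                )
                for j in range(width - kw + 1)
            ]
            for i in range(height - kh + 1)
        ]
        output_volume.append(activation_map)
    return output_volume


# A 3-channel 5x5 input of ones and two 3x3x3 filters.
volume = [[[1] * 5 for _ in range(5)] for _ in range(3)]
all_ones = [[[1] * 3 for _ in range(3)] for _ in range(3)]
red_only = [[[1] * 3 for _ in range(3)]] + [[[0] * 3 for _ in range(3)] for _ in range(2)]
out = conv_layer(volume, [all_ones, red_only])
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 3
```

Two filters give an output depth of 2, regardless of the input's three channels; each filter collapses all input channels into a single activation map.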
Convolutional Layer – Operation
The convolutional layer, as mentioned above, consists of sets of filters, or kernels. Their key job is to carry out the convolution operation in the first part of the layer. Each filter looks at only a small subset (a local region) of the input data at a time.
The operations performed by this layer are linear multiplications that extract features, such as edges, from the input image.
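Edge extraction by convolution can be illustrated with a small example. This is a sketch assuming a grayscale 6x6 image whose left half is bright (10) and right half dark (0), and a common 3x3 vertical-edge filter; the values are illustrative, not taken from this post.

```python
def convolve2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation, as used in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


# Bright left half, dark right half: a vertical edge in the middle.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Vertical-edge filter: responds where intensity drops from left to right.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

for row in convolve2d(image, vertical_edge):
    print(row)  # each row: [0, 30, 30, 0]
```

The large values in the middle columns mark where the edge is, while flat regions of the image produce 0.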
The convolution at this layer is a linear operation. As in traditional neural networks, it mainly involves multiplying weights with the input (and adding a bias), and the output volume is obtained by stacking the activation maps of all filters along the depth dimension.
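The link between convolution and the weight multiplication of a traditional neural network can be made explicit by flattening each image patch into a vector and taking its dot product with the flattened filter, a technique often called im2col. This is a minimal sketch assuming a single channel, stride 1, and no padding; the function names and values are hypothetical.

```python
def im2col(image, kh, kw):
    """Flatten every kh x kw patch of `image` into a row (stride 1, no padding)."""
    return [
        [image[i + m][j + n] for m in range(kh) for n in range(kw)]
        for i in range(len(image) - kh + 1)
        for j in range(len(image[0]) - kw + 1)
    ]


def conv_as_dot_products(image, kernel, bias=0):
    kh, kw = len(kernel), len(kernel[0])
    weights = [w for row in kernel for w in row]  # one neuron's weight vector
    out_w = len(image[0]) - kw + 1
    # Each output is w . x + b, like a dense-layer neuron, except that x is a
    # local patch and the same w is shared by every position.
    flat = [sum(w * x for w, x in zip(weights, patch)) + bias for patch in im2col(image, kh, kw)]
    # Reshape the flat list of outputs back into a 2D feature map.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

fmap = conv_as_dot_products(image, kernel)
print(fmap)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]

# Linearity (with zero bias): doubling the input doubles the output.
doubled = conv_as_dot_products([[2 * v for v in row] for row in image], kernel)
print(doubled == [[2 * v for v in row] for row in fmap])  # True
```

Because the operation is linear, a non-linear activation such as ReLU is normally applied afterwards; otherwise stacked convolutional layers would collapse into a single linear map.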
A filter could be tuned to almost anything. For instance, if the objective is to recognize a traffic sign, as in the pictures below, one filter could be associated with detecting signs, and this filter's output would indicate how strongly a sign appears in each part of the image.
Another point is that the feature map tells us how many times, and in which locations, a feature occurs. Because the same filter is reused at every location, the network must learn far fewer weights than a usual fully connected neural network. This also means that when the location of a feature changes, the network is not thrown off.
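The idea that shifting a feature does not throw the network off can be checked with a small experiment: because the same filter is applied at every position, moving a pattern in the input simply moves its response in the feature map (a property usually called translation equivariance). This sketch assumes a 6x6 image containing a 2x2 block of ones and a matching 2x2 filter; `make_image` and `peak` are hypothetical helpers.

```python
def convolve2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation, as used in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def make_image(top, left, size=6):
    """Zero image with a 2x2 block of ones whose top-left corner is (top, left)."""
    img = [[0] * size for _ in range(size)]
    for m in range(2):
        for n in range(2):
            img[top + m][left + n] = 1
    return img


def peak(feature_map):
    """Return (strongest response, (row, col) where it occurs)."""
    return max((v, (i, j)) for i, row in enumerate(feature_map) for j, v in enumerate(row))


block_filter = [[1, 1], [1, 1]]  # 4 shared weights, reused at every position

print(peak(convolve2d(make_image(0, 0), block_filter)))  # (4, (0, 0))
print(peak(convolve2d(make_image(2, 3), block_filter)))  # (4, (2, 3))
```

The strongest response has the same value in both cases; only its location changes, and the filter still has just four weights no matter how large the image is.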
Convolutional Neural Networks Adoption
CNNs have had huge adoption and success in computer vision applications, but mainly in supervised learning; unsupervised learning with CNNs has received comparatively little attention. A convolutional layer generally has far fewer weights than a fully connected (dense) layer, because each filter's weights are shared across all positions of the input. Its linear output is then usually passed through a non-linear activation function.
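The difference in the number of weights can be made concrete with a quick count. This sketch assumes the 32x32x3 input and 16 filters mentioned earlier, with the filters assumed here to be 3x3, and compares them with a hypothetical fully connected layer producing an output of the same size (32x32x16); the helper names are illustrative.

```python
def conv_params(filter_size, in_channels, num_filters):
    """Each filter has filter_size * filter_size * in_channels shared weights plus one bias."""
    return num_filters * (filter_size * filter_size * in_channels + 1)


def dense_params(num_inputs, num_outputs):
    """Every input connects to every output, plus one bias per output."""
    return num_inputs * num_outputs + num_outputs


conv = conv_params(filter_size=3, in_channels=3, num_filters=16)
dense = dense_params(num_inputs=32 * 32 * 3, num_outputs=32 * 32 * 16)
print(conv)   # 448
print(dense)  # 50348032
```

Weight sharing is what keeps the convolutional count independent of the image size, whereas the dense count grows with both the input and output sizes.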
In summary, the convolutional layer detects local combinations of features from the previous layer and maps their appearance to a feature map.
The layer convolves the input by moving the filters along the input vertically and horizontally, computing the dot product of the weights and the input, and then adding a bias term.
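The sliding dot product with a bias, together with the stride and padding parameters listed earlier, can be sketched as below. This is a minimal single-channel illustration; the function name `conv2d` and its parameters are hypothetical and not the API of any particular library.

```python
def conv2d(image, kernel, bias=0, stride=1, padding=0):
    """Single-channel convolution with zero-padding, stride, and a bias term."""
    # Surround the image with `padding` rows and columns of zeros.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * width for _ in range(padding)]

    kh, kw = len(kernel), len(kernel[0])
    output = []
    # Move the filter vertically (i) and horizontally (j) in steps of `stride`.
    for i in range(0, len(padded) - kh + 1, stride):
        row = []
        for j in range(0, len(padded[0]) - kw + 1, stride):
            # Dot product of the weights with the covered patch, plus the bias.
            dot = sum(padded[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            row.append(dot + bias)
        output.append(row)
    return output


image = [[1] * 4 for _ in range(4)]
kernel = [[1] * 3 for _ in range(3)]
out = conv2d(image, kernel, bias=1, stride=2, padding=1)
print(out)  # [[5, 7], [7, 10]]
```

Positions near the padded border see fewer real pixels, which is why the corner value (5) is smaller than the centre-most one (10).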
Beyond this layer, the architecture of CNNs gets a little more complex, progressing from low-level to high-level features. With enough layers, a CNN builds an understanding of the images in the dataset that, in some respects, resembles how a human sees them.
Dynamic Convolution Layer vs Conventional Convolution Layer
Convolutional networks are not limited to a single convolutional layer. Conventionally, the first layer is responsible for capturing low-level features such as color, edges, and gradient orientation, while later layers build on these to capture higher-level features.
- Conventional Convolution Layer – This layer receives a single input, a feature map, and computes its output by convolving its filters across the feature maps from the previous layer.
- Dynamic Convolution Layer – This layer receives two inputs: the first is a feature map from the previous layer, and the second is a filter.
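The difference between the two layer types can be sketched as follows. This is an illustration under stated assumptions: in the conventional layer the filter is a fixed, learned parameter, while in the dynamic layer the filter is a second input, produced here by a hypothetical `generate_filter` function that mixes a vertical-edge and a horizontal-edge filter according to a conditioning pair. All names and values are illustrative.

```python
def convolve2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation, as used in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


VERTICAL = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
HORIZONTAL = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]


class ConventionalConvLayer:
    """One input (the feature map); the filter is a stored, learned parameter."""

    def __init__(self, kernel):
        self.kernel = kernel

    def __call__(self, feature_map):
        return convolve2d(feature_map, self.kernel)


def generate_filter(condition):
    """Hypothetical filter generator: a weighted mix of two fixed filters."""
    a, b = condition
    return [[a * v + b * h for v, h in zip(vr, hr)] for vr, hr in zip(VERTICAL, HORIZONTAL)]


def dynamic_conv_layer(feature_map, kernel):
    """Two inputs: the feature map and a filter supplied at run time."""
    return convolve2d(feature_map, kernel)


feature_map = [[1, 1, 0, 0] for _ in range(4)]  # a vertical edge

conventional = ConventionalConvLayer(VERTICAL)
print(conventional(feature_map))  # [[3, 3], [3, 3]]

# The same dynamic layer behaves differently depending on the filter it receives.
print(dynamic_conv_layer(feature_map, generate_filter((1, 0))))  # [[3, 3], [3, 3]]
print(dynamic_conv_layer(feature_map, generate_filter((0, 1))))  # [[0, 0], [0, 0]]
```

In a real dynamic convolution network the filter would typically be produced by a learned sub-network rather than a fixed mixing rule; the mixing here only shows that the filter can change from one input to the next.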
The network architecture here is quite similar to the network architecture of whole image synthesis.
Points to Note:
All credits, if any, remain with the original contributors only. In this post, we have covered the convolutional neural network, a kind of deep learning model. The last post was on Supervised Machine Learning. The next upcoming post will talk about Reinforcement Machine Learning.
Feedback & Further Questions
Do you have any questions about Deep Learning or Machine Learning? Leave a comment or ask your question via email. I will try my best to answer it.
Conclusion – Convolutional Neural Networks have a different architecture from regular neural networks. The convolutional layer contains a set of filters whose parameters need to be learned. Building a Convolutional Neural Network typically involves four major steps: Convolution, Pooling, Flattening, and Full Connection.
Choosing the parameters, applying filters with strides and padding if required, performing convolution on the image, and applying ReLU activation to the resulting matrix: this is the core process in a CNN, and if you get it wrong, the whole joy is over then and there.
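The core process summarized above, together with the four steps named in the conclusion, can be sketched end to end. This is a toy, single-channel illustration assuming stride 1, no padding, 2x2 max pooling, and made-up fully connected weights; all function names and values are hypothetical, and a real network would learn the filter and dense weights during training.

```python
def convolve2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation, as used in CNNs."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def relu(fmap):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    return [
        [
            max(fmap[i + m][j + n] for m in range(size) for n in range(size))
            for j in range(0, len(fmap[0]) - size + 1, stride)
        ]
        for i in range(0, len(fmap) - size + 1, stride)
    ]


def flatten(fmap):
    """Turn a 2D map into a 1D feature vector."""
    return [v for row in fmap for v in row]


def fully_connected(x, weights, biases):
    """One output per weight row: w . x + b."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b for w_row, b in zip(weights, biases)]


image = [[10, 10, 0, 0, 10, 10] for _ in range(6)]  # dark band in the middle
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]  # vertical-edge filter

conv = convolve2d(image, kernel)  # 4x4, every row [30, 30, -30, -30]
activated = relu(conv)  # every row [30, 30, 0, 0]
pooled = max_pool(activated)  # [[30, 0], [30, 0]]
features = flatten(pooled)  # [30, 0, 30, 0]
scores = fully_connected(features, weights=[[1, 0, 1, 0], [0, 1, 0, 1]], biases=[0, -1])
print(scores)  # [60, -1]
```

ReLU removes the negative responses from the dark-to-bright edge, pooling shrinks the 4x4 map to 2x2, and flattening turns it into the vector the fully connected layer consumes.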
============================ About the Author =======================
Read about the author at: About Me
Thank you all for spending your time reading this post. Please share your opinions, comments, critiques, agreements, or disagreements. For more details about posts, subjects, and relevance, please read the disclaimer.