A Convolutional Neural Network (CNN) is a neural network made up of convolutional layers and other layers. A convolutional layer has a number of filters that perform the convolution operation. CNNs are a class of neural networks that have proven very effective in image recognition, image processing, and classification.
CNNs are also a special kind of multi-layer neural network in which each filter is convolved with the input volume to compute an activation map made of neurons.
Layers of a Convolutional Neural Network (CNN)
CNNs are built by concatenating individual blocks, each of which performs a different task, in a particular order.
These blocks or layers, each with its own structure, functionality, benefits, and shortcomings, make up what is called a convolutional neural network. Some of the layers in CNNs are listed below.
- Convolutional Layer
- Non-Linearity Layer
- Rectification Layer
- Rectified Linear Units (ReLU)
- Pooling Layer
- Fully Connected Layer
In this article, we will give an intuitive explanation of the Convolutional Layer only, in detail but in simple language.
Convolutional Layer – An Outlook
Convolutional Layer – This is the first layer and one of the main building blocks of a Convolutional Neural Network (CNN). It takes the raw pixel values of the training image as input and extracts features from them. This layer preserves the spatial relationship between pixels by learning image features using small squares of input data.
Something very interesting happened in 2017: “Transformers”. No, we are not talking about a new album by a Japanese band. Transformers are a type of neural network architecture that came into the spotlight that year and has been gaining popularity ever since. Transformers address the problem of parallelization by relying on attention mechanisms rather than processing a sequence one step at a time. This boosts speed, i.e. it addresses the problem of how fast the model can translate from one sequence to another. Transformers are not part of this post; we will discuss them at length in a later blog post.
In short, the convolutional layer is the most important layer, and it contains a set of filters whose parameters need to be learned.
Convolution is simply a mathematical operation that takes two inputs, an image matrix and a filter (one of the set whose parameters need to be learned), and merges the two sets of information into one output.
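To make the operation concrete, here is a minimal NumPy sketch of a single filter sliding over a small single-channel image. The function name `convolve2d`, the 5x5 image, and the 3x3 vertical-edge filter are illustrative assumptions and are not taken from any library. As in most deep learning frameworks, the filter is not flipped, so strictly speaking this computes a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter and the patch under it, then sum.
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Hypothetical 5x5 image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)
```

The feature map is zero where the filter sees only the flat dark region and positive where it overlaps the edge, which is the sense in which the operation merges the image and the filter into one output.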
The CNN Layers
The convolutional layer can be called the main layer or epicentre of a convolutional neural network, and CNNs take their name from it. This layer performs an operation called a “convolution”.
After all filters and parameters/weights are initialised with random values, the network takes a training image as input, for example a picture of a deer with width 32, height 32, and three colour channels (R, G, B). The image goes through the forward propagation step, and the network outputs a probability for each class.
- Let's assume the output probabilities for the image above are [0.2, 0.1, 0.3, 0.4].
- The size of the feature map is controlled by three parameters:
  - Depth – the number of filters used for the convolution operation.
  - Stride – the number of pixels by which the filter matrix slides over the input matrix.
  - Padding – it is often convenient to pad the input matrix with zeros around the border, so that the filter can also be applied to the bordering elements of the input.
- The total error at the output layer is calculated by summing over all 4 classes:
  Total Error = ∑ ½ (target probability – output probability)²
- The layer computes the output of neurons that are connected to local regions in the input. This may result in a volume such as [32x32x16] for 16 filters.
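The numbers in the list above can be checked with a short sketch. The output-size formula (W - F + 2P) / S + 1 is the standard one for convolutions. The function names, the 3x3 filter size, and the one-hot target vector are assumptions made for illustration, since the text gives only the output probabilities.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Width (or height) of the feature map: (W - F + 2P) // S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

def total_error(target, output):
    """Sum over all classes of 0.5 * (target - output)^2."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))

# 32x32 input with assumed 3x3 filters, stride 1, and one pixel of zero padding.
side = conv_output_size(32, filter_size=3, stride=1, padding=1)
num_filters = 16  # depth of the output volume
print((side, side, num_filters))

output = [0.2, 0.1, 0.3, 0.4]  # output probabilities from the text
target = [0.0, 0.0, 0.0, 1.0]  # assumption: the image belongs to the fourth class
print(total_error(target, output))
```

With these assumed settings the spatial size stays at 32, which matches the [32x32x16] volume mentioned above. A larger stride or no padding would shrink the feature map.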
Because of the convolution, the image is split into small regions, creating local receptive fields, and these are finally compressed into feature maps. For an input image with 3 channels, a filter applied to it must have 3 channels as well. In short, a filter must always have the same number of channels as the input; this number is often referred to as the “depth”.
Thus, the feature map stores the information about where the feature occurs in the image and how well it corresponds to the filter. Hence, each filter is trained spatially with respect to the position in the volume it is applied to. The height and width of the filters are smaller than those of the input volume.
The output volume of the convolutional layer is obtained by stacking the activation maps of all filters along the depth dimension.
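Here is a sketch of the stacking step under stated assumptions: a random 32x32x3 array stands in for the image, the 16 filters are random 3x3x3 arrays rather than learned ones, the stride is 1, and the input is zero-padded by one pixel. The function name `conv_layer` is hypothetical.

```python
import numpy as np

def conv_layer(volume, filters, padding=0):
    """Convolve each filter with the input volume and stack the activation maps.

    volume:  (H, W, C) input volume
    filters: (N, F, F, C) bank of N filters; C must match the input depth
    returns: (H_out, W_out, N) output volume (stride 1)
    """
    n, f, _, c = filters.shape
    assert c == volume.shape[2], "filter depth must equal input depth"
    # Zero-pad height and width only; the channel axis is left untouched.
    padded = np.pad(volume, ((padding, padding), (padding, padding), (0, 0)))
    out_h = padded.shape[0] - f + 1
    out_w = padded.shape[1] - f + 1
    maps = []
    for k in range(n):
        activation = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                # One filter covers all channels of the patch at once.
                activation[i, j] = np.sum(padded[i:i + f, j:j + f, :] * filters[k])
        maps.append(activation)
    # One activation map per filter, stacked along the depth dimension.
    return np.stack(maps, axis=-1)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                # stand-in for a 32x32 RGB image
filters = rng.standard_normal((16, 3, 3, 3))   # 16 assumed filters of depth 3
output_volume = conv_layer(image, filters, padding=1)
print(output_volume.shape)  # (32, 32, 16)
```

Each filter produces one 32x32 activation map, so the depth of the output volume equals the number of filters, not the number of input channels.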
Convolutional Layer – Operation
As mentioned above, the convolutional layer consists of a set of filters, also called kernels. Their key job is to carry out the convolution operation in the first part of the layer. Each filter takes a subset of the input data at a time.
The operations performed by this layer are linear multiplications, with the objective of extracting features such as edges from the input image.
The convolution operation at this layer is linear: as in a traditional neural network, it mostly involves multiplying weights with the input. The output volume is obtained by stacking the activation maps of all filters along the depth dimension.
A filter can be associated with anything. For instance, if the objective is to recognise traffic signs, one filter could be associated with a particular sign, and that filter would give us an indication of how strongly the sign seems to appear in an image.
The feature map also shows how many times and in what locations the feature occurs. Because the same filter is applied across the whole image, the number of weights the network must learn is reduced compared with an ordinary neural network. It also means that a change in the location of these features does not throw the neural network off.
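The last point can be illustrated with a small sketch. The assumptions are ours: a hypothetical 3x3 "plus" pattern is placed at two different positions in an otherwise empty 8x8 image, and the same filter (here simply a copy of the pattern) is applied to both. The code uses `sliding_window_view`, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def feature_map(image, kernel):
    """Apply one filter at every position (stride 1, no padding)."""
    windows = sliding_window_view(image, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Hypothetical feature to detect; the filter is a copy of it.
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=float)

def image_with_pattern(row, col, size=8):
    """Empty image with the pattern's top-left corner at (row, col)."""
    image = np.zeros((size, size))
    image[row:row + 3, col:col + 3] = pattern
    return image

map_a = feature_map(image_with_pattern(1, 1), pattern)
map_b = feature_map(image_with_pattern(4, 3), pattern)

# Location and strength of the strongest response in each feature map.
peak_a = tuple(int(v) for v in np.unravel_index(np.argmax(map_a), map_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(np.argmax(map_b), map_b.shape))
print(peak_a, map_a.max())
print(peak_b, map_b.max())
```

The strongest response has the same value in both cases and simply moves with the pattern, so one set of nine weights is enough to find the feature anywhere in the image.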
Convolutional Neural Networks Adoption
CNNs have seen huge adoption and success in computer vision applications, mainly with supervised learning; unsupervised learning with CNNs has received far less attention. A convolutional layer generally has far fewer weights than a fully connected (dense) layer, and it is usually followed by a non-linear activation function.
In summary, a convolutional layer detects local associations of features from the previous layer and maps their appearance to a feature map.
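A rough parameter count shows how large the difference in weights can be. The layer sizes below are assumptions chosen to match the 32x32x3 input and 16 filters used earlier in this post, with 3x3 filters; the function names are hypothetical.

```python
def conv_params(filter_size, in_channels, num_filters):
    """Weights of every filter plus one bias per filter."""
    return num_filters * (filter_size * filter_size * in_channels + 1)

def dense_params(num_inputs, num_outputs):
    """One weight per input-output pair plus one bias per output."""
    return num_outputs * (num_inputs + 1)

# Convolutional layer: 16 filters of size 3x3 over 3 input channels.
conv = conv_params(filter_size=3, in_channels=3, num_filters=16)

# Dense layer mapping the same 32x32x3 input to a 32x32x16 output.
dense = dense_params(num_inputs=32 * 32 * 3, num_outputs=32 * 32 * 16)

print(conv)
print(dense)
```

Under these assumptions the convolutional layer needs a few hundred parameters, while the dense layer needs tens of millions to produce an output of the same size.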
The layer convolves the input by moving the filters along the input vertically and horizontally and computing the dot product of the weights and the input and then adding a bias term.
Beyond this layer, the architecture of a CNN gets a little more complex, as deeper layers build up from low-level to high-level features. In this way the network forms an understanding of the images in the dataset that is loosely similar to how a human does.
Dynamic Convolution Layer vs Conventional Convolution Layer
Convolutional networks are not limited to a single convolutional layer. The first layer is responsible for capturing low-level features such as colour, edges, and gradient orientation.
- Conventional Convolution Layer – This layer receives a single input, the feature maps from the previous layer, and computes its output by convolving its filters across them.
- Dynamic Convolution Layer – This layer receives two inputs: the first is a feature map from the previous layer and the second is a filter.
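The difference between the two layers can be sketched as a difference in what each layer is given at call time. Everything named here is hypothetical: the two classes, the averaging filter standing in for a learned one, and the random array standing in for a filter supplied from outside the layer. The code uses `sliding_window_view`, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve(feature_map, kernel):
    """Apply one filter at every position (stride 1, no padding)."""
    windows = sliding_window_view(feature_map, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

class ConventionalConvLayer:
    """Owns its filter; the only input at call time is the feature map."""

    def __init__(self, kernel):
        self.kernel = kernel  # stays the same for every input

    def __call__(self, feature_map):
        return convolve(feature_map, self.kernel)

class DynamicConvLayer:
    """Owns no filter; the filter arrives as a second input."""

    def __call__(self, feature_map, kernel):
        return convolve(feature_map, kernel)

rng = np.random.default_rng(0)
feature_map = rng.random((6, 6))
learned_kernel = np.ones((3, 3)) / 9.0           # hypothetical learned filter
supplied_kernel = rng.standard_normal((3, 3))    # hypothetical filter passed in as input

conventional = ConventionalConvLayer(learned_kernel)
dynamic = DynamicConvLayer()
print(conventional(feature_map).shape)
print(dynamic(feature_map, supplied_kernel).shape)
```

The convolution itself is identical in both cases. What changes is that the dynamic layer can be given a different filter for every input, whereas the conventional layer applies the same stored filter each time.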
The network architecture here is quite similar to the network architecture of whole image synthesis.
Points to Note:
All credit, if any, remains with the original contributors. In this post we have covered the convolutional neural network, a kind of machine learning model. The last post was on Supervised Machine Learning. The next post will talk about Reinforcement Machine Learning.
Feedback & Further Question
Do you have any questions about Deep Learning or Machine Learning? Leave a comment or ask your question via email. I will try my best to answer it.
Conclusion – Convolutional Neural Networks have a different architecture from regular Neural Networks. The convolutional layer contains a set of filters whose parameters need to be learned. The process of building a Convolutional Neural Network typically involves four major steps, i.e. Convolution, Pooling, Flattening, and Full connection.
Choosing the parameters, applying the filters with strides (and padding if required), performing the convolution on the image, and applying ReLU activation to the resulting matrix together make up the core process in a CNN. If you get this wrong, the whole joy is over then and there.
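The four steps can be sketched end to end in NumPy. This is a toy forward pass under assumptions of ours: a random 8x8 single-channel image, one random 3x3 filter, 2x2 max pooling, and a fully connected layer with 4 outputs and random weights. Nothing here is trained, and the function names are hypothetical. The code uses `sliding_window_view`, which requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolution(image, kernel):
    """Apply one filter at every position (stride 1, no padding)."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(vector, weights, bias):
    """Weighted sum of all inputs for each output unit."""
    return weights @ vector + bias

rng = np.random.default_rng(0)
image = rng.random((8, 8))            # assumed toy input
kernel = rng.standard_normal((3, 3))  # assumed untrained filter

step1 = relu(convolution(image, kernel))         # convolution + ReLU -> (6, 6)
step2 = max_pool(step1)                          # pooling            -> (3, 3)
step3 = step2.flatten()                          # flattening         -> (9,)
weights = rng.standard_normal((4, step3.size))   # assumed 4 output classes
bias = np.zeros(4)
scores = fully_connected(step3, weights, bias)   # full connection    -> (4,)

print(step1.shape, step2.shape, step3.shape, scores.shape)
```

In a real network the filter and the fully connected weights would be learned, and there would usually be several convolution and pooling stages before the flattening step.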