Deep Learning – Introduction to Convolutional Neural Networks

Convolutional neural network – In this article, we give an intuitive, high-level explanation of convolutional neural networks (CNNs). CNNs are inspired by the structure of the brain, but our focus here will not be on neuroscience, as we do not specialise in any biological aspect. We are going artificial in this post.

Convolutional neural networks are a special kind of multi-layer neural network.

What are Convolutional Neural Networks

Convolutional neural networks (CNNs) – They might look like magic to many, but in reality they are just simple science and mathematics. CNNs are a class of neural networks that have proven very effective in areas such as image recognition, processing and classification.

Everything You Need to Know About Convolutional Neural Networks.


As per Wiki – In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analysing visual imagery.

The artificial intelligence solutions built on CNNs are transforming how businesses and developers create user experiences and solve real-world problems. CNNs are also known as an application of neuroscience to machine learning. They employ a mathematical operation known as “convolution”, which is a specialised kind of linear operation.

Applications of convolutional neural networks include high-calibre AI systems such as AI-based robots, virtual assistants, and self-driving cars. Other common applications include:

  • Image Processing
    • Recognition
    • Classification
    • Video labelling
    • Text analysis
  • Speech Recognition
    • Natural language processing
    • Text classification processing



Data Processing in Convolutional Neural Networks

CNNs process data that has a grid-like topology, meaning that the data points sit on a regular grid and neighbouring data points are spatially correlated.

  • 1D Grid – Time series data – Takes samples at regular time intervals
  • 2D Grid – Image data – Grid of pixels

These neural networks use convolution in place of general matrix multiplication in at least one of their layers. Convolution leverages three ideas:

  • Equivariant Representations – This simply means that if the input changes, the output changes in the same way. For example, shifting the input shifts the output by the same amount.
  • Sparse Interactions – The kernel is smaller than the input, so each output value depends only on a small local region of the input rather than on every input value.
  • Parameter Sharing – Using the same parameter for more than one function in a model. The same kernel weights are applied at every position of the input.
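The three ideas above can be seen in a few lines of code. The following is a minimal sketch in plain Python, not taken from any library: the `conv1d` helper, the spike signal and the kernel values are all made up for illustration. Like most CNN libraries, it slides the kernel without flipping it (strictly speaking a cross-correlation).

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over a 1D signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        # Sparse interaction: each output looks at only k neighbouring inputs.
        # Parameter sharing: the same kernel weights are reused at every position.
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical time series with one spike, and the same series shifted by two steps.
signal = [0, 0, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 0, 0, 0]
kernel = [1, 2, 3]  # only three weights, whatever the signal length

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

print(out)          # [3, 2, 1, 0, 0, 0]
print(out_shifted)  # [0, 0, 3, 2, 1, 0]
```

Shifting the input by two steps shifts the output by two steps, which is the equivariance property, and the model needs only three weights regardless of how long the signal is.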

Together, these properties improve a machine learning system by reducing the number of parameters and computations it needs. Convolution also allows for working with inputs of variable size and lets the network efficiently describe complicated interactions between many variables from simple building blocks.

A significant limitation of these neural networks is that their interface is constrained. The input (e.g. an image) and the output (e.g. class probabilities) are both fixed-size vectors. Even the computation is performed by mapping one to the other using a fixed number of layers.


Some History of Convolutional Neural Networks (CNNs)

ConvNets have been successful in identifying faces, objects and traffic signs, apart from powering vision in robots and self-driving cars. Yann LeCun’s pioneering architecture was named LeNet5 after many previous successful iterations since 1988. LeNet was one of the very first convolutional neural networks and helped push deep learning forward.

In 2012, Alex Krizhevsky used a convolutional neural network to win the ImageNet competition, and ever since then all the big companies have been racing to adopt the technique. CNNs are among the most influential innovations in the field of computer vision. Back in the 1990s, the LeNet architecture was used mainly for character recognition tasks such as reading zip codes and digits.


Image Processing – Human vs Computers

For humans, recognising objects is one of the first skills we learn, right from birth. A newborn baby soon starts recognising faces such as Papa and Mumma. By the time we turn into adults, recognition has become an effortless, almost automated process.

Humans process images very differently from machines. Just by looking around, we immediately characterise the scene and give each object a label, without even consciously noticing.

For computers, recognising objects is more complex: they see an image only as an input of numbers, and the output comes as a class or a set of classes. This is known as image processing, which we will discuss in detail in the next section. In computers, CNNs perform image recognition and image classification. They are very useful in object detection and face recognition, and have also been successful in various text classification tasks using word embeddings.

So, in simple words, computer vision is the ability to automatically understand any image or video based on visual elements and patterns.


Inputs and Outputs – How it works

Our focus in this post will be on image processing only.

CNN models need to be trained and tested. Each input image passes through a series of convolution layers with filters (kernels), pooling layers and fully connected (FC) layers, and finally a softmax function is applied to classify an object with probabilistic values between 0 and 1. Softmax is a generalisation of the logistic function that “squashes” a K-dimensional vector of arbitrary real values into a K-dimensional vector of real values in the range (0, 1) that add up to 1. For this processing to work, every image in a CNN is represented as a matrix of pixel values.
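To make the softmax step concrete, here is a minimal sketch in plain Python. The raw class scores are invented for illustration and do not come from a trained network. Subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four classes.
scores = [1.0, 2.0, 0.5, 4.0]
probs = softmax(scores)

print(probs)       # four values, each between 0 and 1
print(sum(probs))  # approximately 1.0
```

The class with the largest raw score always ends up with the largest probability, so softmax changes the scale of the scores but not their ranking.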

The convolutional neural network classifies an input image into categories, e.g. dog, cat, deer, lion or bird.

Everything You Need to Know About Convolutional Neural Networks

The convolution and pooling layers act as feature extractors from the input image, while the fully connected layer acts as a classifier. In the figure above, on receiving a deer image as input, the network correctly assigns the highest probability to it (0.94) among all four categories. The sum of all probabilities in the output layer should be one. There are four main operations in the ConvNet shown in the image above:

  1. Convolution
  2. Non Linearity (ReLU)
  3. Pooling or Sub Sampling
  4. Classification (Fully Connected Layer)

These operations are the basic building blocks of every convolutional neural network, so understanding how they work is an important step towards developing a sound understanding of ConvNets (CNNs).


Convolutional Neural Networks – Architecture

A convolutional neural network is built from the layers mentioned in the picture above. A simple ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations to another through a differentiable function. We use four main types of layers to build the ConvNet architecture above:

Convolutional Layer, ReLU, Pooling, and Fully Connected Layer.



Training starts with the initialisation of all filters and parameters (weights) with random values.


Convolution Layer 

This layer takes the raw pixel values of the training image as input. In the example above, an image (deer) of width 32, height 32, and with three colour channels (R, G, B) is used. It goes through the forward propagation step, and the network finds the output probabilities for each class. This layer preserves the spatial relationship between pixels by learning image features using small squares of input data.
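As a rough illustration of how a filter slides over small squares of the input, here is a minimal single-channel sketch in plain Python. The `conv2d` helper, the 5x5 binary image and the 3x3 filter are all assumptions made for this example. Real layers work on multi-channel volumes and learn their filter values during training. The `stride` and `padding` arguments correspond to the parameters that control the feature map size.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D image and return the feature map."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Zero-pad the border of the input.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    # Output size: (input - kernel + 2 * padding) // stride + 1
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    return [
        [
            sum(kernel[i][j] * padded[r * stride + i][c * stride + j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 5x5 image and 3x3 filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)                       # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
print(conv2d(image, kernel, stride=2))   # [[4, 4], [2, 4]]
print(len(conv2d(image, kernel, padding=1)))  # 5 rows: padding keeps the size
```

By the same output-size formula, a 3x3 filter with stride 1 and padding 1 keeps a 32x32 input at 32x32, which is consistent with a [32x32x16] volume when 16 filters are used.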

  • The layer computes the output of neurons that are connected to local regions in the input. This may result in a volume such as [32x32x16] for 16 filters.
  • The size of the feature map is controlled by three parameters.
    • Depth – Number of filters used for the convolution operation.
    • Stride – Number of pixels by which the filter matrix slides over the input matrix.
    • Padding – It is sometimes convenient to pad the input matrix with zeros around the border.
  • Let’s assume the output probabilities for the image above are [0.2, 0.1, 0.3, 0.4].
  • The total error at the output layer is calculated by summing over all 4 classes.
    • Total Error = ∑ ½ (target probability – output probability)²
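The total error formula from the list above can be checked with a short sketch. The output probabilities are the ones assumed in the text. The target vector is an assumption made for this example, since the text does not say which of the four classes is the correct one.

```python
def total_error(target, output):
    """Total Error = sum over classes of 1/2 * (target - output)^2."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))

output = [0.2, 0.1, 0.3, 0.4]  # output probabilities assumed in the text
target = [0, 0, 0, 1]          # assumption: the fourth class is the correct one

error = total_error(target, output)
print(error)  # approximately 0.25
```

During training, this error is what the network tries to reduce by adjusting its filters and weights.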


Rectified Linear Unit (ReLU) Layer

This layer applies an element-wise activation function. ReLU is a non-linear operation that is used after every convolution operation. It is applied per pixel and replaces all negative pixel values in the feature map with zero. This leaves the size of the volume unchanged ([32x32x16]).
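A minimal sketch of ReLU on a small, made-up feature map shows both properties: negative values become zero and the shape stays the same.

```python
def relu(feature_map):
    """Replace every negative value with zero, element by element."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical feature map produced by a convolution.
feature_map = [
    [3, -1, 2],
    [-5, 0, 4],
]

rectified = relu(feature_map)
print(rectified)  # [[3, 0, 2], [0, 0, 4]]
```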


Pooling Layer

Pooling is also called subsampling or downsampling. The pooling layer performs a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x16], i.e. it reduces the dimensionality of each feature map but retains the most important information.

  • Max Pooling – Takes the largest value from each window of the rectified feature map.
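Max pooling can be sketched in a few lines of plain Python. The 4x4 rectified feature map below is made up, and the 2x2 window with stride 2 is a common choice that is assumed here, not something fixed by the method.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x4 rectified feature map.
rectified = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool(rectified)
print(pooled)  # [[6, 8], [3, 4]]
```

Width and height are halved, in the same way that a 32x32 feature map becomes 16x16, while the strongest response in each window is kept.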


Fully Connected Layer

In a fully connected layer, each node is connected to every node in the adjacent layer. The FC layer computes the class scores with a traditional multi-layer perceptron that uses a softmax activation function in the output layer. It results in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10.

The main job of this layer is to take the input volume coming out of the preceding conv, ReLU or pool layer and arrange it into an N-dimensional vector, where N is the number of classes that the program has to choose from.
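The flow into the fully connected layer can be sketched as flatten, weighted sum, then softmax. Everything below is an assumption made for illustration: the pooled values, the three classes, and the hand-picked weights and biases, which a real network would learn during training.

```python
import math

def flatten(volume):
    """Unroll a nested list of any depth into one flat vector."""
    flat = []
    for item in volume:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

def fully_connected(inputs, weights, biases):
    """Every output node is a weighted sum of every input node, plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pooled feature map and weights for N = 3 classes.
pooled = [[6, 8], [3, 4]]
weights = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
]
biases = [0.0, 0.0, 0.0]

vector = flatten(pooled)                           # [6, 8, 3, 4]
scores = fully_connected(vector, weights, biases)  # one score per class
probs = softmax(scores)                            # N probabilities summing to 1
print(vector, scores, probs)
```

The output has exactly N entries whatever the shape of the incoming volume, as long as each weight row has one weight per flattened input value.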


Convolutional Neural Networks – Real life Business Use Cases

Many modern companies use CNNs as the backbone of their business, e.g. Pinterest uses them for home feed personalisation and Instagram for its search infrastructure. Three of the biggest use cases are listed below.

  • Automatic Tagging Algorithms – Tagging, or social bookmarking, refers to the action of associating a relevant keyword or phrase with an entity (e.g. a document, image, or video). Our experiment showed us that an effective time-frequency representation is important for automatic tagging and that more complex models benefit from more training data.
  • Photo Search – Finding images that are similar to ones the user supplies, or that match a text query, is available in Google search results. It works well in the Chrome app. Google’s algorithms rely on more than 200 unique signals or “clues” that make it possible to guess what you are searching for. The attributes used include websites, age of content, IP-address-based region and PageRank. Sadly, the results can be highly biased by skin colour. You can give it a try.
  • Product Recommendations – Large-scale recommender systems are in use in almost every e-commerce, retail, video-on-demand, or music streaming business. Algorithms in recommender systems are typically classified into two categories, content-based and collaborative filtering methods, although modern recommenders combine both approaches.


Books Referred & Other material referred

  • Open Internet & AILabPage members hands on lab work
  • LeNet5  documentation


Points to Note:

All credit, if any, remains with the original contributors only. We have covered convolutional neural networks in this post. The last post was on supervised machine learning. The next post will talk about reinforcement machine learning.


Feedback & Further Question

Do you have any questions about deep learning or machine learning? Leave a comment or ask your question via email. I will try my best to answer it.


Conclusion – This post was an attempt to explain the main concepts behind convolutional neural networks in simple terms. A CNN is a neural network with some convolutional layers and some other layers. A convolutional layer has a number of filters that perform the convolution operation. The process of building a CNN always involves four major steps, i.e. convolution, pooling, flattening and full connection, which were covered in detail. Choosing the parameters, applying filters with strides (and padding if required), performing convolution on the image and applying ReLU activation to the matrix is the core process in a CNN, and if you get this wrong the whole joy is over then and there.


Thank you all for spending your time reading this post. Please share your opinions, comments, criticisms, agreements or disagreements. For more details about posts, subjects and relevance, please read the disclaimer.


11 replies »

  1. Hi,
    This is the latest booming technology, and your post helped me learn more about it.
    Wonderfully illustrated information. I thank you for that. No doubt it will be very useful for my future projects. I would like to see some other posts on the same subject!

    Thank you for sharing…

  2. This is very high-level info, without many details to learn from.
    Do we lose any information when using a feature detector at the Convolution + Pooling layers, which act as feature extractors?

  3. I am a student and a worker at the same time, and I loved your narrative. Convolutional Neural Networks are very similar to ordinary Neural Networks from the previous. Please help to answer in detail how the flow to the FCL happens; please let me know bit by bit.

  4. Great post! Thanks so much for the work; people like me really appreciate it. I have a few questions though, if you can answer them please:

    1 – What makes convolutional filters in the first convolutional layer “unique”?
    2 – Do all 5×5 filters have the same behaviour?
    3 – Are they just being passed through different non-linear functions or something?
    4 – Why don’t they produce the same representations?
    5 – What informs such decisions?
