Transformers are a type of neural network architecture that has gained significant popularity because of the strong results they achieve across a wide range of tasks. Deep learning, widely recognized as a powerful tool, has significantly transformed the way we work. Big players such as OpenAI and DeepMind use Transformers in their applications; DeepMind's AlphaStar is one example. By relying on attention, the Transformer speeds up training while preserving accuracy.

It is fair to say that on certain tasks, Transformers outperform the Google Neural Machine Translation (GNMT) model.


Introduction to the Transformer

A remarkable neural network model named the “Transformer” was introduced in 2017 by a Google-led team in the paper Attention Is All You Need. A TensorFlow implementation is available as part of the Tensor2Tensor package. The Transformer neural network (TNN) has proven its effectiveness and worth, especially for natural language processing (NLP) tasks. TNNs are not going to replace every use of RNNs, which build on work published by David Rumelhart and colleagues in 1986. Unfortunately, RNNs have serious limitations: when they are trained on long sequences, gradients tend to either explode out of control or vanish to nothing. Long short-term memory (LSTM) networks came to the rescue to mitigate this shortcoming.
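To get a feel for why long sequences cause trouble, here is a minimal sketch. It assumes a deliberately simplified scalar, linear recurrence (real RNNs use weight matrices and nonlinear activations), and the function name and the weights 0.5 and 1.5 are illustrative choices only.

```python
def gradient_through_time(recurrent_weight: float, num_steps: int) -> float:
    """Gradient of the last hidden state with respect to the first one
    for the toy recurrence h_t = recurrent_weight * h_(t-1)."""
    gradient = 1.0
    for _ in range(num_steps):
        # Chain rule: backpropagating through one time step multiplies
        # the gradient by the recurrent weight once more.
        gradient *= recurrent_weight
    return gradient


# Illustrative values: a weight below 1 shrinks the gradient,
# a weight above 1 blows it up, and both effects grow with sequence length.
vanishing = gradient_through_time(0.5, 50)
exploding = gradient_through_time(1.5, 50)
print(f"weight 0.5 over 50 steps: {vanishing:.3e}")
print(f"weight 1.5 over 50 steps: {exploding:.3e}")
```

The same repeated multiplication happens with matrices in a real RNN, which is roughly why training on long sequences becomes unstable.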

AILabPage defines the Transformer as: “A special kind of computer software code that can acquire knowledge in self-learning mode. It uses a distinctive attention mechanism, focusing on specific segments of the input data in order to identify the significant elements.”

Transformers are mainly applied in the domains of computer vision and natural language processing.

Thanks to advancements in machine learning algorithms, cheaper and smaller storage, more computing power at lower cost, and an explosion in data generation of all kinds, new deep learning models are being introduced at a pace that is difficult to keep track of. The beauty of TNNs lies in the value they add to neural networks through heavy use of parallelization.

When machines are able to learn to classify and analyze data (any kind of data) by themselves, we can safely say, “Yes, we have achieved a small percentage of our deep learning goals.” Deep learning-powered tools can recognize images that contain dogs, cats, or any other object without being told explicitly what a dog or cat looks like. They are even going to the next level, recognizing species, biological specifics, and other details in the image such as a table, chair, carpet, or room size. Which details can be recognized, and how, is getting more advanced almost every day. Besides the COVID pandemic, the year 2020 also raised questions about how much hype and how much reality there is in the AI industry. Let’s discuss these in brief below.

Deep Learning and Human Brain

Deep learning is a subfield of machine learning. It is concerned with algorithms based on artificial neural networks, whose structure and function are inspired by the human brain (inspired only, please). Deep learning is often used, with too much ease, to predict the unpredictable. In our opinion, “we are all so busy creating artificial intelligence by using a combination of non-biological neural networks and natural intelligence rather than exploring what we have in hand.”

AILabPage defines Deep learning as “An innovative approach that is undoubtedly remarkable and extremely speedy and relies on three crucial elements: substantial volumes of information, tremendous computational capability, and cutting-edge algorithms and proficiency”. The potential of deep learning seems to have no boundaries.

Deep learning requires a specialist with a complex skill set to achieve far better results from the same data set. It is loosely inspired by the NI (natural intelligence) mechanics of the biological neuron system. The skill set is complex because of the methods used for training: learning in deep learning is based on “learning data representations” rather than on “task-specific algorithms,” which is the case for other methods.

“I think people need to understand that deep learning is making a lot of things, behind the scenes, much better” – Geoffrey Hinton

Human Brain: This is a critical point of discussion for everyone and an enduring puzzle as well. How our brain is designed and how it functions can’t be covered in this post, as I am nowhere near being a neuroscientist. Out of curiosity, though, I am tempted to compare artificial neural networks with the human brain (with the help of talk shows on such topics).

It’s fascinating to me how the human brain can decode technologies, numbers, and puzzles, handle entertainment, understand science, appreciate art, and switch the body into modes such as pleasure or aggression. How does the brain train itself to name a certain object after looking at just 2-3 images, when ANNs need millions of them?

What Exactly is a Transformer?

The Transformer is among the most useful deep learning models today, helpful for everything from question answering to grammar correction and many more tasks. The Transformer is in the same state that convolutional neural networks were in around 2012: its architecture is still going through transformation (which should be for the better). The good part is that it’s already out of incubation.

Like recurrent neural networks (RNNs), Transformers are designed to handle sequential data. The good news is that, unlike RNNs, the Transformer does not require the sequence to be processed one element at a time in order, which allows far more parallelization and reduces training time. Because so much of the computation runs in parallel, Transformers make good use of high-performance parallel hardware. The speed of the Transformer's adoption speaks to the rapid rate of progress in machine learning and artificial intelligence.


In the above example, the Transformer is represented as a black box. The entire Thai sequence is processed simultaneously in a feed-forward manner, resulting in a transformed output tensor. In the picture, the output sequence is shorter than the input sequence. For NLP tasks, word order, spacing (Thai sentences have no spaces between words), and sentence length may vary substantially depending on the input language.

Consider the English text below: translating it to Thai would require almost three times more effort with an RNN than with a Transformer.

“Accomplishments are those that get stuck delightfully in your memories and distinguish you from others; remember, it’s not about winning over others all the time or competing with others, as that can lead to fiascos sometimes. It’s all about your own achievements and rewards to yourself for being staunch, tranquil, and harmonious.”

Remember, as mentioned above, unlike other NLP architectures such as recurrent neural networks and LSTMs, Transformers have no recurrent connections and thus keep no real memory of previous states. Instead, they perceive the entire sequence simultaneously.
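Here is a rough sketch of how a whole sequence can be looked at in one go, using scaled dot-product self-attention. It is a simplification under stated assumptions: the token vectors serve directly as queries, keys, and values (real Transformers apply learned projections and multiple heads first), positional encodings are left out, and the names and toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: no learned projections, so each token vector acts as
    its own query, key, and value.
    """
    dim = len(tokens[0])
    outputs = []
    # Each iteration depends only on the inputs, never on a previous
    # iteration, so all positions could be computed in parallel.
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append([
            sum(weight * value[i] for weight, value in zip(weights, tokens))
            for i in range(dim)
        ])
    return outputs


toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative vectors
attended = self_attention(toy_tokens)
print(attended)
```

Notice that no hidden state is carried from one position to the next: every output row is a weighted mix of all input rows.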

Recurrent neural networks and Transformers

Like recurrent neural networks, Transformers are designed to handle sequential data, but with a much more efficient and powerful method.

Recurrent neural networks are a linear architectural variant of recursive networks. They have a “memory,” which sets them apart from other neural networks: a hidden state that carries information about what was calculated in previous steps. An RNN uses the same parameters for each input, as it performs the same task on every element of the sequence to produce the output.

Transformers address many of the issues of RNNs, in particular because they do not require the sequential data to be processed in order. So you really don’t need to worry if you dive straight into Transformers without studying other neural networks first. Transformers are the latest trendy deep learning model for dealing with sequences, most prominently in machine translation.
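The “memory” and shared parameters can be sketched in a few lines. This is a toy under assumptions: inputs, weights, and hidden state are single numbers rather than vectors and matrices, and the function name and weight values are chosen only for illustration.

```python
import math


def rnn_forward(inputs, w_input, w_hidden, bias):
    """Run a toy scalar RNN over a sequence and return every hidden state."""
    hidden = 0.0  # the "memory" starts out empty
    states = []
    for x in inputs:
        # The same three parameters are reused at every time step, and
        # each step needs the hidden state produced by the step before it.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states


# Only the first input is non-zero, yet later states are non-zero too:
# the hidden state carries a fading trace of what was seen earlier.
states = rnn_forward([1.0, 0.0, 0.0], w_input=0.5, w_hidden=0.8, bias=0.0)
print(states)
```

Because each step waits for the previous one, this loop cannot be parallelized across time steps, which is the bottleneck the Transformer avoids.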

Let’s Look at Transformers in a Little Depth

We will pick up the same example as above: a Thai sentence (the source language) that the machine translation tool will translate into another language (English).

“ผมต้องการ PS5” (I need a PS5)

Following the Transformer architecture, we can zoom in a little on this black box to see an encoding component, a decoding component, and the connections between the two.


  • Encoders, all identical in structure (two sub-layers: self-attention –> feedforward neural network)
  • Decoders, all identical in structure (three sub-layers: self-attention –> encoder-decoder attention –> feedforward neural network)
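The encoder structure in the list above (self-attention followed by a feedforward network, repeated several times) can be sketched as follows. Treat it as an outline under assumptions, not a faithful implementation: residual connections, layer normalization, learned attention projections, and multiple heads are all omitted, the weights are random rather than trained, and the sizes (model dimension 4, hidden dimension 8, 5 layers) are arbitrary illustrative choices.

```python
import math
import random


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(tokens):
    """Sub-layer 1: every token attends to every token (no projections)."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens))
                        for i in range(dim)])
    return outputs


def feed_forward(token, w1, w2):
    """Sub-layer 2: a two-layer network with ReLU, applied to one token."""
    hidden = [max(0.0, sum(x * w for x, w in zip(token, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]


def encoder_layer(tokens, w1, w2):
    attended = self_attention(tokens)
    return [feed_forward(token, w1, w2) for token in attended]


def encoder_stack(tokens, layers):
    """Feed the output of each encoder layer into the next one."""
    for w1, w2 in layers:
        tokens = encoder_layer(tokens, w1, w2)
    return tokens


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
model_dim, hidden_dim, num_layers = 4, 8, 5  # illustrative sizes
layers = [(random_matrix(hidden_dim, model_dim, rng),
           random_matrix(model_dim, hidden_dim, rng))
          for _ in range(num_layers)]
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(model_dim)] for _ in range(3)]
encoded = encoder_stack(tokens, layers)
print(len(encoded), "tokens, each of size", len(encoded[0]))
```

Every layer has the same structure but its own weights, and the shape of the sequence (3 tokens of size 4 here) is preserved from the bottom of the stack to the top.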

Both form stacks with the same number of levels, i.e., if the encoder stack has 5 levels, then the decoder stack will also have 5. The encoder’s input first enters the self-attention layer, and the output of that layer becomes the input to the feedforward layer. The decoder works the same way, with one addition: between those two layers sits an encoder-decoder attention layer that helps the decoder focus on relevant parts of the input sentence.
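The extra decoder layer mentioned above can be sketched as attention in which the queries come from the decoder while the keys and values come from the encoder output. As before, this is a simplified illustration: learned projections and masking are left out, and the vectors and names are invented for the example.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns one context vector per query."""
    dim = len(keys[0])
    contexts = []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in keys]
        weights = softmax(scores)  # how strongly this query looks at each key
        contexts.append([sum(w * v[i] for w, v in zip(weights, values))
                         for i in range(len(values[0]))])
    return contexts


# Illustrative toy vectors: three encoded source tokens, two decoder positions.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]

# Encoder-decoder attention: the decoder asks, the encoder output answers.
context = attention(decoder_states, encoder_outputs, encoder_outputs)
print(context)
```

Each decoder position ends up with a context vector weighted towards the source tokens most similar to its query, which is the sense in which the decoder “focuses” on relevant parts of the input.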


Points to Note:

In the paper Fine-Tuning Language Models from Human Preferences, OpenAI demonstrated how the Transformer model GPT-2 can be fine-tuned to generate remarkably humanlike text. All credit, if any, remains with the original contributors. This post covered the Transformer, a kind of deep learning model. The last post was on Supervised Machine Learning. The next post will talk about Deep Reinforcement Learning.

Feedback & Further Questions

Do you have any questions about Deep Learning or Machine Learning? Leave a comment or ask your question via email. I will try my best to answer it.


Conclusion - This post was an attempt to explain the main concepts behind the Transformer. It also tried to outline recent key advancements in the technology and to provide insight into areas in which deep learning can improve investigation. A CNN is a neural network with some convolutional layers and some other layers. A convolutional layer has a number of filters that perform the convolution operation.

Building a CNN involves four major steps: convolution, pooling, flattening, and full connection. Choosing parameters, applying filters with strides, adding padding if required, performing the convolution on the image, and applying ReLU activation to the resulting matrix form the core process in a CNN; if you get this wrong, the whole joy ends then and there.

============================ About the Author =======================

Read about the author at About Me.

Thank you all for spending your time reading this post. Please share your opinions, comments, criticism, agreement, or disagreement. For more details about posts, subjects, and relevance, please read the disclaimer.


Posted by V Sharma

A Technology Specialist boasting 22+ years of exposure to Fintech, Insuretech, and Investtech with proficiency in Data Science, Advanced Analytics, AI (Machine Learning, Neural Networks, Deep Learning), and Blockchain (Trust Assessment, Tokenization, Digital Assets). Demonstrated effectiveness in Mobile Financial Services (Cross Border Remittances, Mobile Money, Mobile Banking, Payments), IT Service Management, Software Engineering, and Mobile Telecom (Mobile Data, Billing, Prepaid Charging Services). Proven success in launching start-ups and new business units - domestically and internationally - with hands-on exposure to engineering and business strategy. "A fervent Physics enthusiast with a self-proclaimed avocation for photography" in my spare time.
