Recurrent Neural Networks – A familiar use of RNNs is next-word prediction: when you type into Google or Facebook, these interfaces can predict the word you are about to type. RNNs have loops that allow information to persist, which makes them fairly good at modeling sequence data. Architecturally, recurrent neural networks are a linear (chain-structured) variant of recursive networks.
This post is a high-level overview meant to build a basic understanding. Don't expect too much if you are a PhD or master's student; we focus on the intuition behind RNNs instead. The aim is to give you enough comfort to start digging deeper into RNNs.
Artificial Neural Networks – What Are They?
In 1943, McCulloch and Pitts designed the first neural network. Artificial neural networks are modelled on a simplified version of the neurons in the human brain.
As per Wikipedia, “A recurrent neural network is a class of artificial neural network where connections between nodes form a directed graph along a sequence.” This allows it to exhibit temporal dynamic behavior for a time sequence.
There are several kinds of Neural Networks in deep learning.
- Multi-Layer Perceptron
- Radial Basis Network
- Recurrent Neural Networks
- Generative Adversarial Networks
- Convolutional Neural Networks
As per AILabPage – “Artificial neural networks (ANNs) are biologically inspired computing code, written with a number of simple, highly interconnected processing elements, that simulates how the human brain processes information.”
A point to note: artificial neural networks are quite different from ordinary computer programs, so please don't take the definition above too literally. Neural networks consist of input and output layers and at least one hidden layer.
Training neural networks can be hard, complex, and time-consuming, for reasons well known to data scientists. One of the major sources of this hardship is the weights: in neural networks, the weights are highly interdependent across the hidden layers. The three main steps to train a neural network are:
- Forward pass: make a prediction.
- Compare the prediction to the ground truth using a loss function.
- Use the error value to do backpropagation and update the weights.
The algorithm used to train an ANN depends on two basic concepts: first, reduce the sum of squared errors to an acceptable value; second, have reliable data to train the network under supervision.
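The three steps above can be sketched in plain Python for a single linear neuron. Everything here (the toy data, the learning rate, the initial weight) is purely illustrative, not a real framework's API:

```python
# A minimal sketch of the three training steps for one linear neuron,
# fitting y = 2x with a squared-error loss. All values are illustrative.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w = 0.0    # the single weight we are training
lr = 0.1   # learning rate

for epoch in range(200):
    for x, target in data:
        pred = w * x                     # 1. forward pass: make a prediction
        loss = (pred - target) ** 2      # 2. compare to ground truth (loss function)
        grad = 2 * (pred - target) * x   # 3. use the error for backpropagation
        w -= lr * grad                   # gradient-descent weight update

print(round(w, 3))  # the weight converges toward 2.0
```

Real networks repeat exactly this loop, just with millions of interdependent weights instead of one, which is where the hardship mentioned above comes from.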
Recurrent Neural Networks – Introduction
Recurrent neural networks are not especially new; they were developed in the 1980s.
- RNNs take a time series as input and can provide a time series as output.
- They have at least one connection cycle.
One of the most distinctive properties of RNNs is the Universal Approximation Property (UAP): they can approximate virtually any dynamical system. This unique property makes recurrent neural networks feel almost magical.
There is a strong perception that training recurrent neural networks is super complex, difficult, expensive, and time-consuming. As a matter of fact, after some hands-on time in our lab, our experience is just the opposite; common wisdom here is the reverse of reality. The robustness and scalability of RNNs are exciting compared with traditional neural networks, and even with convolutional neural networks.
Recurrent neural networks are special compared to other neural networks. Non-recurrent network APIs carry many constraints and limitations (though sometimes RNNs do too). A non-recurrent network takes:
- Input – a fixed-size vector: for example, an image or a character
- Output – a fixed-size vector: for example, a probability distribution over classes
- Computation – a fixed number of layers / computational steps
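To see why RNNs escape the fixed-size constraint, here is a minimal sketch of a single-unit recurrent cell in plain Python. The weight values are illustrative placeholders, not trained parameters:

```python
import math

# A single-unit recurrent cell with illustrative (untrained) weights.
# The same three parameters are re-applied once per time step, so the
# cell handles sequences of any length -- no fixed input size required.
w_in, w_rec, bias = 0.5, 0.9, 0.0

def rnn_forward(sequence, h=0.0):
    """Run the cell over a variable-length sequence; return the final state."""
    for x in sequence:
        # the hidden state h carries information from all previous steps
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h

# The same cell processes inputs of different lengths:
print(rnn_forward([1.0]))                  # length-1 sequence
print(rnn_forward([1.0, 0.5, -0.2, 2.0]))  # length-4 sequence
```

The loop is the whole trick: instead of a fixed number of computational steps, the cell unrolls once per input element, sharing its weights across time.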
Before we go any deeper, we need to answer the question: what kinds of problems can be solved with recurrent neural networks?
Real-Life Examples – Recurrent Neural Networks
RNNs show an excellent, dynamic ability to deal with various input and output types. Before we go deeper, let's look at the real-life examples below.
- Varying inputs & fixed output – speech/text recognition and sentiment classification – Classifying tweets or Facebook comments into positive and negative sentiment becomes easy here; on today's social media, that can be a big relief for filtering out purely negative comments. The inputs vary in length, while the output is of a fixed length.
- Fixed input & varying outputs – image recognition (captioning) – This describes the content of an image. The image is a single input, but the caption is a sequence of words of varying length: “kid riding a bike”, “children playing in a park”, or “two girls dancing”, etc.
- Varying inputs & varying outputs – machine translation – Translating one language into another word by word from a dictionary would be a tedious task for humans, but Google's online translation handles full text, taking care of sentiment, length, and meaning in context for each language. This is the case of varying inputs as well as varying outputs.
As the cases above show, RNNs can map inputs to outputs of varying types and lengths, and this generality underlies their applications.
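As a toy illustration of the first case above (varying inputs, fixed output), the sketch below reads a variable-length list of word sentiment scores and emits one fixed-size output, a probability. All weights are made-up placeholders, not a trained model:

```python
import math

# A tiny many-to-one classifier sketch: a variable-length sequence of
# word scores goes in, a single sentiment probability comes out.
# The weights below are illustrative placeholders, not trained values.
w_in, w_rec, w_out = 1.0, 0.5, 2.0

def sentiment_probability(word_scores):
    h = 0.0
    for score in word_scores:                 # varying-length input
        h = math.tanh(w_in * score + w_rec * h)
    return 1 / (1 + math.exp(-w_out * h))     # one fixed-size output

print(sentiment_probability([0.9, 0.7, 0.8]) > 0.5)  # positive-leaning scores
print(sentiment_probability([-0.8, -0.9]) < 0.5)     # negative-leaning scores
```

A real sentiment model would learn these weights from labelled comments, but the shape of the computation (many steps in, one value out) is the same.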
Recurrent Neural Networks & Sequence Data
As we know by now, RNNs are considered fairly good at modeling sequence data. Let's understand sequential data with an example: while playing cricket, we predict where the ball is moving and run in that direction. In the same way, a recurrent network takes into account both the current input it sees and what it has perceived previously in time.
This happens without conscious guessing or calculation, because our brain is programmed so well that we don't even realise why we run in the ball's direction.
If we look at a recording of the ball's movement afterwards, we have enough data to understand and match our action. So this is a sequence: a particular order in which one thing follows another. With this information, we can see that the ball is moving to the right. Sequence data can be obtained from:
- Audio files – audio is a natural sequence. Audio clips can be broken down into spectrograms and fed into an RNN.
- Text files – text is another form of sequence; it can be broken into characters or words (remember search engines guessing your next word or character).
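As a small example of turning text into sequence data, the hypothetical helper below breaks a string into characters and maps each to an integer index; real pipelines would typically follow this with one-hot encoding or an embedding layer:

```python
# Break text into a sequence of characters and map each to an integer
# index, building the vocabulary on the fly. This is a common first step
# before feeding text into an RNN (the helper name is illustrative).
def text_to_sequence(text):
    vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
    return [vocab[ch] for ch in text], vocab

indices, vocab = text_to_sequence("banana")
print(indices)  # each character replaced by its vocabulary index
print(vocab)    # the character-to-index mapping
```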
Based on the examples above, we can now comfortably say that RNNs are good at processing sequence data for predictions. One core reason RNNs are gaining attention and popularity is that they let us operate over sequences of vectors for input and output, not just fixed-size vectors. On the downside, RNNs suffer from short-term memory.
Use cases – Recurrent Neural Networks
Let's understand some of the use cases of recurrent neural networks. Numerous exciting applications have become easier, more advanced, and more fun because of RNNs. Some of them are below:
- Music synthesis
- Speech, text recognition & sentiment classification
- Image recognition (captioning)
- Machine Translation – Language translation
- Chatbots & NLP
- Stock predictions
The goal is to understand how to build and train recurrent neural networks (RNNs), along with commonly used variants such as GRUs and LSTMs.
There are lots of free and paid courses available on the internet. At AILabPage we also conduct hands-on classroom trainings in our labs for deep learning enthusiasts. These courses can help you solve natural language problems, including text synthesis; ultimately you will have the opportunity to build deep learning projects with cutting-edge, industry-relevant content.
RNN models have been proven to perform extremely well on temporal data. They have several variants, including LSTMs (long short-term memory), GRUs (gated recurrent units), and bidirectional RNNs. Building models for natural language, audio files, and other sequence data has become a lot easier with these sequence algorithms.
Vanishing and Exploding Gradient Problem
Deep neural networks have a major issue with gradients: they are very unstable, tending either to explode or to vanish quickly in the earlier layers. The vanishing gradient problem emerged in the 1990s as a major obstacle to RNN performance: as weights are adjusted to decrease the error, the updates reaching the early layers become so small that the network ceases to learn at an early stage.
This problem was a major setback for the popularity of RNNs. Values in RNNs can explode or vanish for the simple reason that the network remembers previous values; those remembered values cause the current values to keep increasing or decreasing until they take over the computation, and the whole network effectively grinds to a halt.
For example, a neuron might get stuck in a loop where it keeps multiplying the previous number by a new number, which heads toward infinity if all the numbers are greater than one, or collapses to zero if any number is zero.
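This runaway multiplication is easy to demonstrate numerically. The toy function below multiplies the same recurrent factor once per unrolled time step, which is exactly the pattern backpropagation through time produces:

```python
# A minimal numeric sketch of the vanishing/exploding gradient effect:
# backpropagating through T time steps multiplies T copies of the same
# recurrent factor, so the product either shrinks to ~0 or blows up.
def gradient_through_time(factor, steps):
    grad = 1.0
    for _ in range(steps):
        grad *= factor      # one multiplication per unrolled time step
    return grad

print(gradient_through_time(0.9, 50))   # vanishes: ~0.005
print(gradient_through_time(1.1, 50))   # explodes: ~117
```

Even factors very close to 1 diverge badly over 50 steps, which is why gating mechanisms (LSTMs, GRUs) were invented to keep this product under control.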
Not Covered here
Topics we have not covered in this post, but which are extremely important for getting a stronger grip on RNNs, are below:
- Sequential Memory
- Backpropagation Through Time (BPTT) in a recurrent neural network
- LSTMs and GRUs
Points to Note:
All credits, if any, remain with the original contributors. We have covered the basics of recurrent neural networks. RNNs are all about modeling units in sequence, which makes them a perfect fit for natural language processing (NLP) tasks, though such tasks often struggle to find the best combination of CNN and RNN algorithms for extracting information.
Books + Other readings Referred
- Research through open internet, news portals, white papers and imparted knowledge via live conferences & lectures.
- Lab and hands on experience of @AILabPage (Self taught learners group) members.
- This useful pdf on NLP parsing with Recursive NN.
- Amazing information in this pdf as well.
Feedback & Further Question
Do you have any questions about deep learning or machine learning? Leave a comment or ask your question via email. I will try my best to answer it.
Conclusion – I particularly think that getting to know the types of machine learning algorithms helps to see a somewhat clearer picture. The answer to the question “Which machine learning algorithm should I use?” is always “It depends.” It depends on the size, quality, and nature of the data, and on the objective behind interrogating it: the more we probe the data, the more useful information comes out. It depends on how the math of the algorithm was translated into instructions for the computer you are using, and on how much time you have. At AILabPage we say machine learning is crystal clear and as easy as eating ice cream. It is not only for PhD aspirants; it is for you, us, and everyone.
======================= About the Author =======================
Read about Author at : About Me
Thank you all for spending your time reading this post. Please share your opinions, comments, critiques, agreements, or disagreements. For more details about posts, subjects, and relevance, please read the disclaimer.
Categories: Neural Networks