Reinforcement Learning (RL) has a twofold nature: it is both a learning problem and a distinct subfield of machine learning. As the emerging third pillar alongside supervised and unsupervised learning, RL is gaining significance for data scientists. By combining learning and decision-making, RL equips machines to interact with dynamic environments, learn from experience, and optimize their actions to achieve desired outcomes.
This paradigm opens new possibilities for building intelligent systems capable of tackling complex challenges across many fields, reinforcing RL's place as a fundamental component of modern machine learning.
Some Basics – Reinforcement Learning
Reinforcement learning (RL) is more general than supervised or unsupervised learning: it learns to achieve a goal from interaction with the environment, or simply from rewards and punishments. In other words, the algorithm learns how to react to its environment. Of RL's methods, TD-learning seems closest to how humans learn in this kind of situation, but Q-learning and others have their own advantages.
Reinforcement learning can be referred to as a learning problem and a subfield of machine learning at the same time. As a learning problem, it refers to learning to control a system so as to maximize some numerical value, which represents a long-term objective.
Although RL has been around for many years as the third pillar of machine learning, it is now becoming increasingly important for data scientists to know when and how to implement it. Its growing recognition as an equal player alongside the other two machine learning types reflects its rising importance in AI.
Some of RL's goals and typical applications are listed below.
- Decision Process
- Reward/Penalty System
- Recommendation System
In AILabPage terms, "reinforcement learning is the process of getting mature, or attaining maturity, in anything and everything we do". The corrections we make while learning and working give us a far greater sense of accomplishment. For example, when you learn to ride a bicycle, the reward comes from maintaining balance, and the punishment comes when you lose it. Similarly, reinforcement learning algorithms adjust their behaviour through corrections in either a negative or positive direction.
What is Reinforcement Learning
Before we get into the what and why of RL, let's look at its history and how it originated. From the best research I could find, the term was coined in the 1980s during research on animal behaviour — in particular, how some newborn animals learn to stand, run, and survive in a given environment. Reward here is a means of survival gained through learning, and punishment can be compared with being eaten by others.
Reinforcement learning can be understood through the concepts of agents, environments, states, actions, and rewards. It is an area of machine learning where there is no answer key, yet the RL agent still has to decide how to act to perform its task. The agent, inspired by behaviorist psychology, decides which actions to take in an environment to maximize some notion of cumulative reward. In the absence of existing training data, the agent learns from experience, collecting training examples as it goes:
- this action was good
- that action was bad
We cannot learn to drive via reinforcement learning in the real world, because failure cannot be tolerated when safety is a concern.
The learner is not told explicitly which action to take but is expected to discover, by trying, which action yields the most lucrative result in the form of reward. Typically, an RL setup is composed of two components: an agent and an environment.
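The agent–environment loop described above can be sketched in a few lines of Python. Everything here — the `LineWorld` environment, its states and rewards — is a made-up toy for illustration, not any standard API:

```python
import random

class LineWorld:
    """Toy environment: the agent walks on positions 0..4 and
    earns a reward of +1 for reaching position 4 (the goal)."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.state = max(0, min(4, self.state + action))
        reward = 1 if self.state == 4 else 0
        done = self.state == 4
        return self.state, reward, done

random.seed(0)
env = LineWorld()
state, done, total_reward = 0, False, 0
while not done:
    action = random.choice([-1, 1])      # a purely random policy, for illustration
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)  # 1: the episode ends once the agent stumbles onto the goal
```

A learning agent would replace the random choice with a policy that it improves from the rewards it observes.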
Reinforcement Learning Algorithms
The sheer number of RL algorithms is hard to pin down, and a comprehensive comparison between all of them is not even a thinkable task. Below is a one- or two-line description of some of the widely used RL algorithms. Please note that these will each be described in full chapters, with calculations, code, and examples, in subsequent posts.
- Q-Learning – a model-free, off-policy RL algorithm based on the well-known Bellman equation. The agent learns an evaluation function over states and actions (the Q-function), and the policy that always picks the highest-valued action is called the greedy policy.
- Policy Iteration – alternates between evaluating the current policy and improving it, until the policy stops changing.
- Value Iteration – repeatedly applies the Bellman optimality update to the value function, then reads the policy off the converged values.
- State-Action-Reward-State-Action (SARSA) – almost a replica of Q-learning; the one real difference is that SARSA is an on-policy algorithm, updating from the action the agent actually takes rather than the greedy one.
- Deep Q Network (DQN) – DQN's main ability is to estimate values for unseen states, which a tabular Q-learning agent cannot do. It gets rid of Q-learning's two-dimensional array by introducing a neural network.
- Deep Deterministic Policy Gradient (DDPG) – to cope with action spaces that are too large (or continuous), the ideas behind DQN are refined into DDPG.
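As a concrete illustration of the tabular Q-learning update described above, here is a minimal sketch on a hypothetical four-state corridor. The environment, learning rate, discount, and episode count are all illustrative choices, not part of any standard library:

```python
import random
from collections import defaultdict

# Hypothetical tiny MDP: states 0..3, goal at state 3
def step(state, action):  # action: 0 = left, 1 = right
    next_state = max(0, min(3, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3

alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = defaultdict(float)               # Q[(state, action)], initialised to 0

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy behaviour policy: mostly exploit, sometimes explore
        if random.random() < epsilon:
            action = random.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # off-policy update: bootstrap from the greedy (max) next action
        best_next = max(Q[(next_state, 0)], Q[(next_state, 1)])
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print(round(Q[(2, 1)], 2))  # 1.0: moving right from state 2 reaches the goal
```

Note the `max` over next actions in the update — that is exactly what makes Q-learning off-policy; SARSA would instead plug in the action it actually takes next.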
Some More Complex Algorithms
- Trust Region Policy Optimization (TRPO) – delivers consistently high performance, but its computation and implementation are extremely complicated.
- Proximal Policy Optimization (PPO, OpenAI version) – PPO proposes a clipped surrogate objective function.
- Back-propagation – I am keeping this separate from DQN. Reinforcement learning may use machine learning models that include a neural network, and wherever a neural network is involved, back-propagation may be used in training.
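PPO's clipped surrogate objective can be illustrated for a single sample. This sketch shows only the clipping arithmetic; in a real implementation the probability ratio and the advantage estimate would come from a full policy-gradient training loop:

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO's clipped objective for one sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# If the new policy moves too far (ratio 1.5) on a positive advantage,
# the objective is capped at (1 + eps) * A, removing the incentive to move further.
print(clipped_surrogate(1.5, 2.0))   # 2.4, i.e. clipped at 1.2 * 2.0
print(clipped_surrogate(1.0, 2.0))   # 2.0, inside the trust region, unclipped
```

The cap is what lets PPO take multiple gradient steps on the same batch without the policy collapsing, achieving much of TRPO's stability at a fraction of its complexity.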
Comparison of Discussed Algorithms
Deciding which RL algorithm to apply to a specific task can be quite a head-spinning experience. I am attempting to provide an introduction to some of the well-known algorithms here.
Reinforcement Learning Process Flow
Reinforcement learning is most useful when there is no supervised learning set but there are reinforcement signals. Learning comes from interactions, which are highly influenced by goals. An action is evaluated and gets rewarded or punished. Most of the RL algorithms follow this pattern. In the following paragraphs, I will briefly talk about some terms used in RL to facilitate our discussion in the next section.
- Action (A): all the possible moves that the agent can make.
- State (S): the current situation returned by the environment.
- Reward (R): the immediate return sent back by the environment to evaluate the last action.
- Policy (π): the agent's strategy for determining the next action based on the current state.
- Value (V): the expected long-term return with discount, as opposed to the short-term reward.
- Q-value or action-value (Q): similar to Value, except that it takes an extra parameter, the current action "a".
Vπ(s) is defined as the expected long-term return of the current state “s” under policy π. Qπ(s, a) refers to the long-term return of the current state “s”, taking action “a” under policy π.
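The "expected long-term return with discount" behind Vπ(s) can be made concrete. Given a sequence of rewards, the discounted return is computed as below (a minimal sketch; the choice of γ = 0.9 is arbitrary):

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G = r0 + gamma*r1 + gamma^2*r2 + ...
    by accumulating from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A reward of 1 received two steps in the future is worth gamma**2 today:
print(discounted_return([0, 0, 1]))  # 0.9**2 * 1 = 0.81 (up to float rounding)
```

Vπ(s) is then the expectation of this quantity over trajectories that start in s and follow π; Qπ(s, a) is the same expectation when the first action is fixed to a.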
Reinforcement Learning vs Supervised Learning vs Unsupervised Learning
Reinforcement learning addresses a very broad and relevant question: how can we learn to survive in our environment?
- There are many extensions to speed up learning.
- There have been many successful real-world applications.
Semi-supervised learning, which is essentially a combination of supervised and unsupervised learning, can also be compared with RL. It differs from reinforcement learning in that it learns a direct mapping from inputs to outputs, whereas reinforcement learning does not.
Reinforcement Learning – When to use
The answer to the question above is not simple (trust me, though this could be purely my own opinion). Which kind of ML algorithm you should use depends not so much on your problem as on your dataset.
Real life business use cases for Reinforcement Learning
Some major domains where RL has been applied are as follows:
- Robotics – a robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it memorizes the object, gains knowledge, and trains itself to do the job with ever greater speed and precision.
- FinTech – leveraging reinforcement learning to evaluate trading strategies, alongside supervised learning, can turn out to be a robust tool for training systems to optimize financial objectives.
- Game Theory and Multi-Agent Interaction – reinforcement learning and games have a long and mutually beneficial common history. The best-known examples are AlphaGo and chess.
There are a lot of other industries and areas where this set of skills is in use and changing the game, like computer networking, inventory management, vehicle navigation, and many more. Partially observable Markov decision processes (POMDPs), which extend MDPs to settings where the agent cannot fully observe the state, have received a lot of attention in the reinforcement learning community. There is still so much unexplored, and with the current craze for data science and machine learning, applied reinforcement learning is certainly a breakthrough.
Points to Note:
All credits, if any, remain with the original contributor only. We have covered reinforcement machine learning in this post, where we reward and punish algorithms for their predictions and control decisions. The technique needs very little pre-existing data; the rest is gathered as the agent interacts. The last posts, on supervised machine learning and unsupervised machine learning, got some decent feedback, and I would love to hear some feedback here too. Our next post will talk about reinforcement learning and Markov decision processes.
This is the last post in the subseries "Machine Learning Type" under the master series "Machine Learning Explained". The next subseries, "Machine Learning Algorithms Demystified," is coming up. This post talks only about reinforcement machine learning; the previous posts on supervised learning and unsupervised learning are available.
Books + Other readings Referred
- Research through the open internet, news portals, white papers, and imparted knowledge via live conferences and lectures.
- Lab and hands-on experience of @AILabPage (Self-taught learners group) members.
- Machine Learning – An Introduction
- Reinforcement Learning – An Introduction
- Asynchronous Methods for Deep Reinforcement Learning
- Data-efficient Deep Reinforcement Learning for Dexterous Manipulation
Feedback & Further Question
Do you have any questions about reinforcement learning or machine learning? Leave a comment or ask your question via email; I will try my best to answer it.
Conclusion: Reinforcement learning addresses the problem of learning control strategies for autonomous agents with little or no pre-existing data. RL algorithms are powerful where collecting and labelling a large set of sample patterns would cost more than letting the agent gather its own experience.
An RL system learns continuously, so it keeps getting better at the task at hand. Learning chess can be a tedious task under supervised learning, but RL handles the same task swiftly: the trial-and-error method, attempting the task with the goal of maximizing long-term reward, can show better results here. Reinforcement learning is closely related to dynamic programming approaches to Markov decision processes (MDPs).
============================ About the Author =======================
Read about Author at : About Me
Thank you all for spending your time reading this post. Please share your feedback, comments, critiques, agreements, or disagreements. For more details about posts, subjects, and relevance, please read the disclaimer.