Reinforcement Learning (RL) – A more general form of machine learning than supervised or unsupervised learning. It learns from interaction with an environment to achieve a goal, or put simply, it learns from rewards and punishments. This style of learning is inspired by behaviourist psychology.
Reinforcement Learning (RL) – History
As far as I could find, the term came into use in the 1980s, around research on animal behaviour, especially how new-born animals learn to stand, run, and survive in their environment. The reward is survival through learning, and the punishment can be compared with being eaten by others.
Reinforcement learning (RL) can be understood through the concepts of agents, environments, states, actions and rewards. The training examples it collects take the form:
- this action was good
- that action was bad
The learner is not told explicitly which action to take; it is expected to discover, by trial and error, which action yields the most reward. Typically, an RL setup is composed of two components, an agent and an environment. In other words, the algorithm learns to react to the environment. TD-learning seems to be closest to how humans learn in this type of situation, but Q-learning and other methods have their own advantages.
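The trial-and-error loop between agent and environment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CoinFlipEnv` class, its reward probabilities, and the `epsilon` exploration rate are all made up for the example and do not come from any library.

```python
import random


class CoinFlipEnv:
    """Hypothetical one-state environment: action 1 is rewarded more often than action 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward +1 ("this action was good") or -1 ("that action was bad").
        p_good = 0.8 if action == 1 else 0.2
        return 1 if self.rng.random() < p_good else -1


def run_trial_and_error(episodes=2000, epsilon=0.1, seed=0):
    """Agent that learns the average reward of each action purely from feedback."""
    env = CoinFlipEnv(seed)
    rng = random.Random(seed + 1)
    totals = [0.0, 0.0]  # sum of rewards received per action
    counts = [0, 0]      # number of times each action was tried
    for _ in range(episodes):
        # Explore at random with probability epsilon (or while an action is untried),
        # otherwise exploit the action with the best average reward so far.
        if rng.random() < epsilon or 0 in counts:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: totals[a] / counts[a])
        reward = env.step(action)
        totals[action] += reward
        counts[action] += 1
    return [totals[a] / counts[a] for a in range(2)]


averages = run_trial_and_error()
print(averages)  # the estimate for action 1 should end up clearly above action 0
```

The agent is never told that action 1 is better; it finds that out from the rewards alone, which is the point of the setup described above.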
Formulating a Reinforcement Learning (RL) Problem
Reinforcement Learning – Enables an agent (including a human) to learn in an interactive environment by trial and error, using feedback from its own actions and experiences. The key terms used to formulate a reinforcement learning (RL) problem are as follows.
- Environment: The world, physical or simulated, in which the agent operates.
- Action (A): The set of all possible moves that the agent can make.
- State (S): The current situation of the agent, as returned by the environment.
- Reward (R): The immediate feedback sent back by the environment to evaluate the last action.
- Policy (π): The agent’s strategy for choosing the next action based on the current state. It is a mapping from the agent’s states to actions.
- Value (V): The expected long-term return with discounting, as opposed to the short-term reward. It is the future reward the agent would expect to receive starting from a particular state.
- Q-value or action-value (Q): Similar to Value, except that it takes an extra parameter, the current action a.
Vπ(s) is defined as the expected long-term return from the current state s under policy π. Qπ(s, a) is the expected long-term return from the current state s when taking action a and following policy π afterwards.
This learning style is concerned with how an agent ought to take actions. It learns from interaction with the environment to meet a goal, or put simply, from rewards and punishments.
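To make "long-term return with discount" concrete, here is a small sketch of how a discounted return is computed for one sequence of rewards. The discount factor name `gamma` and the example rewards are assumptions for illustration; Vπ(s) would be the average of this quantity over many trajectories that start in s and follow π.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma once per time step into the future."""
    g = 0.0
    # Work backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# A reward of 1 that arrives two steps from now is worth 0.5 * 0.5 * 1 today.
print(discounted_return([0, 0, 1], gamma=0.5))  # 0.25
```

A `gamma` close to 0 makes the agent short-sighted, while a `gamma` close to 1 makes distant rewards count almost as much as immediate ones.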
Demystifying – Reinforcement Learning (RL)
Reinforcement Learning (RL) algorithms learn to react to the environment. TD-learning and Q-learning are two of the best-known algorithms in this area. I remember reading a book on reinforcement learning some years back with a focus on “Intelligent Machines”. Three methods of reinforcement learning were discussed there in some depth, as below.
- Q-Learning – This is commonly used in the model-free approach. The value update rule is the core of the Q-learning algorithm. The policy it learns about is greedy: each update uses the best available action in the next state. Q-learning and TD-learning are related but not the same; Q-learning is a TD method applied to action values. It is a form of reinforcement learning in which the agent learns an evaluation function over states and actions, building on the ideas of policy and value iteration.
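- Temporal Difference Learning – TD-learning seems to be closest to how humans learn in this type of situation: estimates are updated step by step from other estimates, rather than only once the final outcome is known.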
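- Model-Based – The agent learns or is given a model of the MDP (its transitions and rewards) and plans with it. It works best when such a model is available or can be learned; when it can’t, model-free methods such as Q-learning are the usual choice.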
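The Q-learning value update mentioned in the list above can be sketched on a toy problem. Everything about the environment here is an assumption made for illustration: a hypothetical chain of five states where only reaching the last state pays a reward, along with the hyperparameters `alpha`, `gamma` and `epsilon`.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal (hypothetical toy chain)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour; ties between actions are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the greedy (max) value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # [1, 1, 1, 1]: always move right, towards the goal
```

No model of the chain is used anywhere in `q_learning`; the agent only sees the transitions it experiences, which is what makes the approach model-free.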
Reinforcement Learning (RL) has been around for many years as the third pillar of Machine Learning. It is becoming increasingly important for Data Scientists to know when and how to apply it. RL is applied to problems such as decision processes, reward/penalty systems and recommendation systems.
Reinforcement Learning Algorithms
In the picture below you will find a one- or two-line description of the widely used RL algorithms. Please note these will be described in full chapters, with calculations and examples, in later posts. How will their limitations play out when the same algorithms are run on quantum computers? It will be fascinating to see how each model is created and parameterized.
In this article we have given an overview of the major algorithms in reinforcement learning. Each algorithm will be explained in detail in upcoming posts with formulas, graphics, Python code and live examples.
Some more complex algorithms
- Trust Region Policy Optimization (TRPO) – It has consistently high performance, but its computation and implementation are extremely complicated.
- Proximal Policy Optimization (PPO, OpenAI version) – PPO proposes a clipped surrogate objective function, which limits how far a single update can move the policy.
- Back Propagation – I am keeping it separate from DQN. Reinforcement learning uses machine learning models, including neural networks, to build a model. When a neural network is used, backprop can be used to train it within reinforcement learning.
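For PPO, the clipped surrogate objective can be written down for a single sample in a few lines. This is a sketch of the per-sample term only, not a full training loop; the names `ratio`, `advantage` and `clip_eps` are my own labels, and 0.2 is just a commonly used clip value.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO-clip term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# ratio = new_policy_prob / old_policy_prob for the action that was actually taken.
# Good action (positive advantage): the gain is capped once the ratio exceeds 1 + eps.
print(clipped_surrogate(1.5, advantage=2.0))   # about 2.4 instead of 3.0
# Bad action (negative advantage): pushing the ratio below 1 - eps earns nothing extra.
print(clipped_surrogate(0.5, advantage=-1.0))  # about -0.8 instead of -0.5
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the new policy far away from the old one in a single update.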
Markov decision processes (MDP)
Reinforcement learning is closely related to dynamic programming approaches to Markov decision processes (MDPs). An MDP assumes the agent can observe the state fully; partially observable problems are modelled with partially observable MDPs (POMDPs), which have received a lot of attention in the reinforcement learning community. There is still much left unexplored, and with the current craze for data science and machine learning, applied reinforcement learning is certainly a breakthrough area.
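As a rough illustration of the dynamic programming side, here is value iteration on a tiny MDP whose transitions and rewards are fully known. The two-state MDP, its state and action names, and the numbers in it are invented for the example.

```python
# Hypothetical two-state MDP: transitions[state][action] = [(prob, next_state, reward), ...]
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(1.0, "high", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 1.0)],
        "work": [(0.5, "high", 2.0), (0.5, "low", 2.0)],
    },
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Best expected one-step reward plus discounted value of the next state.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration(transitions)
print(values)  # "high" is worth more than "low"
```

Unlike the trial-and-error methods, this needs the full transition table up front, which is exactly what an RL agent usually does not have.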
Reinforcement Learning vs Supervised Learning vs Unsupervised Learning
Reinforcement learning addresses a very broad and relevant question: how can we learn to survive in our environment?
- There are many extensions to speed up learning.
- There have been many successful real world applications.
Semi-supervised learning, which is essentially a combination of supervised and unsupervised learning, can also be compared with RL. It differs from reinforcement learning in that it learns a direct mapping from inputs to known outputs, whereas reinforcement learning has no such mapping and must rely on reward signals.
When to use Reinforcement Learning
The answer to the question above is not simple (trust me, though this could be purely my own opinion). Which kind of ML algorithm you should use depends less on your problem than on your dataset.
Real life business use cases for Reinforcement Learning
Some major domains where RL has been applied are as follows:
- Robotics – A robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it memorizes the object, gains knowledge, and trains itself to do the job with great speed and precision.
- FinTech – Leveraging reinforcement learning to evaluate trading strategies can be a good approach alongside supervised learning. It can turn out to be a robust tool for training systems to optimize financial objectives.
- Game Theory and Multi-Agent Interaction – Reinforcement learning and games have a long and mutually beneficial common history. The best examples are AlphaGo and chess.
There are a lot of other industries and areas where this kind of learning is in use and changing the game, such as Computer Networking, Inventory Management, Vehicle Navigation and many more.
Points to Note:
All credit, if any, remains with the original contributors. We have covered reinforcement learning in this post, where algorithms are rewarded and punished for their predictions and controls. The data available here is either very limited or still to be collected. The last posts on Supervised Machine Learning and Unsupervised Machine Learning got some decent feedback. Our next post will talk about Reinforcement Learning – Markov Decision Processes.
Books & Other Material Referred
- Open Internet & AILabPage (group of self-taught engineers) members hands-on lab work.
- Reinforcement Machine Learning – An Introduction
Feedback & Further Questions
Do you have any questions about Deep Learning or Machine Learning? Leave a comment or ask your question via email. I will try my best to answer it.
Conclusion – Reinforcement Learning addresses the problem of learning control strategies for autonomous agents with little or no data. RL algorithms are powerful in machine learning because collecting and labelling a large set of sample patterns can cost more than the data itself. Learning the game of chess can be a tedious task under supervised learning, but RL can handle the same task well. Its trial-and-error method, with the goal of maximizing long-term reward as it attempts the task, can show better results here.
Thank you all for spending your time reading this post. Please share your feedback, comments, criticism, agreement or disagreement. For more details about posts, subjects and relevance, please read the disclaimer.