Reinforcement Learning

Prerna Aditi
Jun 29, 2018


Reinforcement learning is a machine learning approach for goal-directed learning and decision-making, inspired by behavioral psychology. In this approach, a machine learns from direct interaction with its environment rather than from a predefined labeled dataset. The idea is that a software agent or machine learns by acting in its environment and receiving rewards for the actions it performs, and so works out which behavior performs best in a given context. The agent can be, for example, a self-driving car or a chess-playing program. It receives a reward when it acts well, such as driving safely to the destination or winning a game, and a penalty when it acts badly, such as going off the road or being checkmated. The agent makes its decisions so as to maximize its rewards and minimize its penalties, for example with methods based on dynamic programming. The advantage for artificial intelligence is that a program can learn without a programmer telling the agent which actions to take.
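To make the interaction loop concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `LineWorld` is a hypothetical toy "road" with positions 0 to 4, where reaching position 4 stands for arriving at the destination (reward +1) and reaching position 0 stands for going off the road (penalty -1). The `reset`/`step` naming is just a common convention, not something this article prescribes.

```python
import random


class LineWorld:
    """Hypothetical toy environment: a road with positions 0..4.

    Position 4 is the destination (+1); position 0 is off the road (-1).
    """

    def __init__(self):
        self.position = 2

    def reset(self):
        self.position = 2
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.position += action
        if self.position == 4:
            return self.position, 1.0, True    # reward, episode over
        if self.position == 0:
            return self.position, -1.0, True   # penalty, episode over
        return self.position, 0.0, False       # nothing happened yet


def run_episode(env, choose_action, max_steps=100):
    """Let the agent act until the episode ends; return the total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
random_return = run_episode(LineWorld(), lambda state: rng.choice([-1, 1]))
always_right_return = run_episode(LineWorld(), lambda state: 1)
```

The agent here does not learn anything yet: it only acts and collects rewards. A learning agent would use those rewards to change how `choose_action` behaves over time.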

As we know, we learn by interacting with our environment through everyday experience. Imagine a child in a room. The child sees a burning candle and approaches it. Close to the candle, the child feels its warmth and feels good, so the child concludes that fire is a positive thing. But when the child touches the flame, it burns the child's hand. Now the child learns that fire is positive only at some distance, where it gives warmth, and that it can burn when it is too close.

This is how we humans learn through interaction with the environment. Reinforcement learning is simply a computational approach to acquiring knowledge by performing actions.

Why do we need Reinforcement Learning?

· Less human effort: The learner (machine) is not told which actions to take. Instead, it has to find out on its own, by trying the actions, which of them yield the most reward. Therefore, there is little need for a human expert.

· No need for complex rules: Less time is spent designing a solution because there’s no need to write complex rules, as we do with expert systems.

· Through reinforcement learning, our machine or software agent learns by interacting with its environment and from the feedback it gets in each situation. This learning is not a one-time step: the agent can keep adapting as time passes.

· It helps in increasing the efficiency of a tool or program.

In the most interesting and challenging cases, an action affects not only the immediate reward but also the next situation and, through it, all later rewards. These two characteristics, trial-and-error search and delayed reward, are the distinguishing features of reinforcement learning.

As said above, in reinforcement learning the software agent has to select actions that maximize the reward over the long term. In practice, this is done by learning to estimate the value of each state. The estimate is updated by propagating back the reward and the value of the next state. Once all states and actions have been tried often enough, an optimal policy can be defined: in each state, choose the action that leads to the next state with the highest value.
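One common way to propagate value back from the next state is a temporal-difference (TD) update. The sketch below is a minimal illustration under stated assumptions, not the only way to do it: the four-state chain, the reward of 1 for reaching the last state, the learning rate `alpha` and the discount factor `gamma` are all hypothetical choices made for this example.

```python
def td0_chain(num_states=4, episodes=200, alpha=0.5, gamma=0.9):
    """Estimate state values on a chain 0 -> 1 -> ... -> goal with TD(0).

    The agent always moves one state to the right. Entering the goal
    (the last state) gives reward 1; every other step gives reward 0.
    """
    values = [0.0] * num_states   # the goal is terminal, so its value stays 0
    goal = num_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            next_state = state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Target: immediate reward plus discounted value of the next state.
            td_target = reward + gamma * values[next_state]
            # Move the current estimate a fraction alpha toward the target.
            values[state] += alpha * (td_target - values[state])
            state = next_state
    return values


values = td0_chain()
```

After enough episodes, the state right before the goal is valued close to 1, and each earlier state is worth `gamma` times the value of the state after it. That is the reward of the final step being propagated backwards through the chain.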

Reinforcement learning is basically the interaction between two components: the environment and the agent (the one that learns). The learning agent has two mechanisms:

1. Exploration: When the learning agent acts on a trial-and-error basis, it is known as exploration.

2. Exploitation: When it acts according to the knowledge it has already gained from the environment, it is termed exploitation.
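A simple and widely used way to balance the two mechanisms above is the epsilon-greedy rule: with a small probability the agent explores by picking a random action, and otherwise it exploits by picking the action it currently believes is best. The sketch below assumes the agent already holds an estimated value for each action; the function name, the numbers in `estimates` and the value of `epsilon` are hypothetical.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: try any action, regardless of what we know so far.
        return rng.randrange(len(action_values))
    # Exploitation: pick the action with the highest estimated value.
    return max(range(len(action_values)), key=lambda a: action_values[a])


rng = random.Random(42)
estimates = [0.2, 0.8, 0.5]   # hypothetical value estimates for three actions

greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
choices = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
```

With `epsilon=0.0` the agent only exploits and always returns action 1, the one with the highest estimate. With `epsilon=0.1` it still picks action 1 most of the time but occasionally tries the others, which is what lets it correct a wrong estimate.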

The game of chess is one of the best ways to understand reinforcement learning. In chess, the machine makes a decision by making a move. It then finds out whether the move was appropriate, and a reward is given on that basis. Suppose a good move earns +5 points and a bad move earns -5 points. Over time, the machine learns which moves are good and chooses its moves accordingly. This is how reinforcement learning works.

The mathematical framework for defining a solution in a reinforcement learning scenario is called a Markov Decision Process (MDP). It consists of:

  • Set of states, S
  • Set of actions, A
  • Reward function, R
  • Policy, π
  • Value, V

We take an action (A) to move from one state (S) to the next, on the way from the start state to the end state. In return, we get a reward (R) for each action, which can be positive or negative. The way we choose our actions defines our policy (π), and the rewards we collect define our value (V). We need to choose the policy that maximizes the expected reward:

E[r_t | π, s_t]

for all possible values of S at a time t, where r_t is the reward and s_t is the state at time t.
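The pieces of an MDP can be written down directly as data. The sketch below is a hypothetical three-state example made up for illustration: the state names, action names and reward numbers are assumptions, and the transitions are deterministic to keep it short. It also takes an optional discount factor `gamma`, which the list above does not include.

```python
# Hypothetical three-state MDP; names and numbers are illustrative only.
STATES = ["start", "middle", "end"]   # S
ACTIONS = ["short", "long"]           # A

# (state, action) -> (next_state, reward); this table plays the role of R.
TRANSITIONS = {
    ("start", "short"): ("end", 2.0),
    ("start", "long"): ("middle", -1.0),
    ("middle", "short"): ("end", 5.0),
    ("middle", "long"): ("end", 5.0),
}


def policy_value(policy, state="start", gamma=1.0, max_steps=10):
    """V: total (optionally discounted) reward from `state` under `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if state == "end":
            break
        state, reward = TRANSITIONS[(state, policy[state])]
        total += discount * reward
        discount *= gamma
    return total


# Two candidate policies (π): each maps a state to an action.
greedy_now = {"start": "short", "middle": "short"}   # takes the immediate reward
patient = {"start": "long", "middle": "short"}       # accepts a penalty first

v_greedy = policy_value(greedy_now)    # 2.0
v_patient = policy_value(patient)      # -1.0 + 5.0 = 4.0
best = max([greedy_now, patient], key=policy_value)
```

Here the policy that accepts a small penalty first ends up with the higher value, which is the delayed-reward effect described earlier. Real problems have far too many policies to list one by one, which is why agents learn value estimates instead of enumerating policies.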

Applications of Reinforcement Learning:

1. PC Games: Reinforcement learning is used in PC games such as Assassin’s Creed and chess, where the enemies change their moves and approach based on your performance.

2. Robotics: Reinforcement learning is used in robotics, where a robot can learn a task through trial and error instead of following hand-written instructions.

3. AlphaGo: Go is a board game that originated in China and is considered more complex than chess. Researchers at DeepMind created a program named ‘AlphaGo’ that competed against world champion players of the game and won.

4. Autonomous vehicles: An autonomous vehicle is one that is able to navigate without human help by sensing its environment.

Apart from these, there are many more applications of reinforcement learning.