Reinforcement Learning for Absolute Beginners

This post continues the Machine Learning for Absolute Beginners series. In the previous post, we gave a brief introduction to the three categories of machine learning. Now we will dive deep into one of those categories: reinforcement learning.

Let’s start with a basic introduction. Reinforcement learning helps an agent learn in an interactive environment by trial and error, using feedback from its own actions and experiences. The agent takes an action in a particular situation (a state) of the environment. That action may or may not change the state of the environment. In return, the environment gives the agent a reward, which may be positive or negative depending on the action performed.

In reinforcement learning, the agent learns automatically from feedback, without any labeled data, unlike in supervised learning. Since there is no labeled data, the agent can learn only from its own experience. The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by collecting as much positive reward as possible. Let’s discuss one example.

Example: Suppose there is an AI agent inside a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing actions. Based on those actions, the state of the agent changes, and the agent receives a reward or a penalty as feedback.
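The maze example can be sketched in a few lines of Python. This is only a minimal illustration, not a full reinforcement learning algorithm: the maze layout, the reward values (+10 for the diamond, -1 for hitting a wall, -0.1 for an ordinary move), and the names `MAZE`, `step`, and `run_episode` are all assumptions made for this sketch. The agent here does not learn yet; it just picks random actions so that the action, state, and reward loop is visible.

```python
import random

# Hypothetical 3x3 maze: 'S' start, 'D' diamond, '#' wall, '.' free cell.
MAZE = [
    "S.#",
    ".#.",
    "..D",
]

# Each action moves the agent by (row change, column change).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(MAZE) and 0 <= new_col < len(MAZE[0])
    if not inside or MAZE[new_row][new_col] == "#":
        # Blocked move: the state does not change, and the agent is penalized.
        return state, -1.0, False
    if MAZE[new_row][new_col] == "D":
        # Diamond found: large positive reward, and the episode ends.
        return (new_row, new_col), 10.0, True
    # Ordinary move: small penalty (assumed) to discourage wandering.
    return (new_row, new_col), -0.1, False


def run_episode(max_steps=200, seed=0):
    """Let a random agent act until it finds the diamond or runs out of steps."""
    rng = random.Random(seed)
    state = (0, 0)  # position of 'S'
    total_reward = 0.0
    for t in range(max_steps):
        action = rng.choice(list(ACTIONS))          # agent chooses an action
        state, reward, done = step(state, action)   # environment responds
        total_reward += reward
        if done:
            return t + 1, total_reward, True
    return max_steps, total_reward, False


steps, total_reward, found = run_episode()
print(f"steps={steps} total_reward={total_reward:.1f} found_diamond={found}")
```

Note that a move into a wall leaves the state unchanged, which matches the earlier point that an action may or may not affect the state of the environment.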

In the example above, the AI agent performs one action and collects a positive or negative reward. It then performs a second action and collects a second reward, and so on. But rewards do not always arrive this neatly. Sometimes in reinforcement learning, the reward comes late. Let’s discuss another example.

Example: Consider an AI agent driving a vehicle that hits another vehicle by accident. The agent’s vehicle is moving at 100 km/h in a city, and just before hitting the other vehicle, the agent applies the brake. So its last action before the accident is braking. If we give a negative reward for the braking action, the agent learns that applying the brake is a bad action, because the accident happened right after it. The agent would conclude that braking caused the accident, when the actual cause was speeding. Accelerating was one action, but its reward arrived later, after another action had been taken. The agent should be able to learn that driving fast is what needs to change.

So the real challenge in reinforcement learning is the reward function: how do we design it so that the right actions get the credit or the blame? We will discuss later how this issue is resolved. But first, we need to understand the basic terminology used in reinforcement learning.
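To make the delayed-reward problem in the driving example concrete, here is a small sketch of one common idea: the discounted return, where each action is scored by its own reward plus a discounted share of every reward that follows it. This is an illustration only and not necessarily the approach this series uses later. The function name `discounted_returns`, the discount factor `gamma = 0.9`, and the episode with a single crash penalty of -100 are all assumptions.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: the immediate reward is zero until the crash,
# which happens right after the agent brakes.
actions = ["accelerate", "accelerate", "brake"]
rewards = [0.0, 0.0, -100.0]

for action, g in zip(actions, discounted_returns(rewards)):
    print(f"{action:<10} return = {g:.1f}")
```

With these numbers, the two accelerate actions receive returns of about -81 and -90 even though their immediate rewards were zero, so the crash is no longer blamed on braking alone. Discounting by itself does not identify speeding as the cause, though. It only passes later consequences back to earlier actions; the agent still has to compare many episodes to see that accelerating leads to worse outcomes than driving slowly.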
