In one of my previous posts, I explained what Imitation Learning is; you can check out that post here. Although Imitation Learning (IL) and Reinforcement Learning (RL) look more or less the same, there are some well-defined differences between them. In this blog post, I will talk about those differences.
Feedback Mechanism
Both Imitation Learning and Reinforcement Learning use feedback during training to improve performance at the task. However, the nature of that feedback differs between the two.
Imitation Learning
In Imitation Learning, the model learns by imitating a teacher/trainer. Training is done by demonstrating the correct actions to the model. For example, a self-driving car model can be trained by showing it how the trainer behaves while driving, and the model is expected to do the same on its own once training is over. Since the model tries to imitate the instructor, this is called Imitation Learning. The key point is that the feedback takes the form of a demonstration of the task.
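To make the idea of learning from demonstrations concrete, here is a minimal sketch of tabular behavioral cloning, one common form of Imitation Learning. The states, actions, and demonstration data are all hypothetical; the point is only that the trainer's (state, action) pairs serve directly as supervision.

```python
from collections import Counter, defaultdict

# Hypothetical demonstrations: (state, action) pairs recorded while
# the trainer performs the task. Discrete states/actions keep the
# example simple; real systems would use continuous features.
demonstrations = [
    ("clear_road", "accelerate"),
    ("clear_road", "accelerate"),
    ("obstacle_ahead", "brake"),
    ("curve_left", "steer_left"),
    ("obstacle_ahead", "brake"),
    ("curve_left", "steer_left"),
]

def fit_imitation_policy(demos):
    """Tabular behavioral cloning: for each state, imitate the action
    the trainer demonstrated most often in that state."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

policy = fit_imitation_policy(demonstrations)
print(policy["obstacle_ahead"])  # brake
```

Notice that the feedback is "dense": every demonstrated step tells the model exactly what the correct action was in that state.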
Reinforcement Learning
In Reinforcement Learning, the model learns on its own through a combination of exploration and exploitation of its environment, without the external help of a teacher/trainer. The model trains itself by trying to maximize a scalar reward (a single number). For example, a model can learn to drive a car on its own by maximizing its rewards, say by avoiding crashes and traffic violations. As opposed to Imitation Learning, the feedback here is a scalar quantity.
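The exploration/exploitation loop with a scalar reward can be sketched with tabular Q-learning on a toy environment. Everything here (the one-dimensional "road", the goal state, the hyperparameters) is an illustrative assumption, not a real driving setup.

```python
import random

# A toy 1-D "road": states 0..4, start at state 0, goal at state 4.
# The only feedback is a scalar reward: +1 on reaching the goal,
# 0 everywhere else -- no demonstration of which action is correct.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # step size, discount, exploration rate

for episode in range(200):
    state, done = 0, False
    while not done:
        # Exploration vs. exploitation: random action with probability
        # epsilon, otherwise the action with the highest learned value.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy should move right toward the goal.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(GOAL)))
```

Note how indirect the signal is: the agent must stumble onto the goal by exploration before any useful value information exists to exploit.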
The two methods also differ in their learning efficiency.
Learning Efficiency
Imitation Learning
During an Imitation Learning training episode, the trainer can only use actions that are in the action space of the model. Imitation Learning is most helpful when it is easy and natural for the trainer to provide demonstrations, i.e., when the action spaces of the model and the trainer overlap completely or at least significantly. For example, the action space of a self-driving model and that of a human driver are exactly the same: both have access to the same actions, such as accelerating, braking, and steering. However, this is not always the case. The number of training episodes required with Imitation Learning is smaller than with Reinforcement Learning because of the 'dense' nature of feedback in Imitation Learning.
Reinforcement Learning
In Reinforcement Learning, there is no need for a trainer to provide demonstrations; there is, however, still a need to specify the rules of the reward mechanism. Setting up a reward mechanism is usually easier than providing demonstrations, especially when it is unnatural for the trainer to demonstrate because their action space differs greatly from the model's. Reinforcement Learning, on the other hand, requires more training episodes: the learning efficiency of the model is reduced by the sparse nature of scalar rewards.
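The contrast between "providing demonstrations" and "specifying reward rules" can be seen in how little code a reward specification takes. The function below is a hypothetical reward rule for the driving example; the penalty values and the progress term are illustrative assumptions, not a tuned reward design.

```python
def driving_reward(crashed: bool, broke_law: bool, distance_gained: float) -> float:
    """Hypothetical reward rules for a driving agent: rather than
    demonstrating how to drive, we only state what outcomes are worth.
    The constants here are arbitrary, chosen for illustration."""
    if crashed:
        return -100.0          # large penalty ends the episode's value
    reward = 0.01 * distance_gained  # small shaping term for progress
    if broke_law:
        reward -= 10.0         # penalty for a traffic violation
    return reward

# A typical timestep yields almost no signal -- this is the sparsity
# that slows Reinforcement Learning down.
print(driving_reward(crashed=False, broke_law=False, distance_gained=5.0))
```

Writing these few rules is far easier than driving thousands of demonstration miles, but each timestep carries much less information than a demonstrated action.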
The graph below compares IL and RL on learning efficiency and on the ease and naturalness of feedback.
Therefore, the key differences between Imitation Learning and Reinforcement Learning can be summarized as follows:
- In Imitation Learning, the feedback is a demonstration of the actual task, whereas, in Reinforcement Learning, the feedback is a scalar reward.
- The learning efficiency of Imitation Learning is much higher than that of Reinforcement Learning.
- Imitation Learning is more suited when it is easy and natural for the trainer to provide demonstrations. Otherwise, Reinforcement Learning is more suited.
References
Interactive Learning from Activity Description, Nguyen et al., ICML 2021.