What is Self Supervised Learning and Why is everyone talking about it?

It doesn’t really matter whether it’s a podcast on Artificial Intelligence, a top-rated Machine Learning conference, or a Community Discussion on Deep Learning. Today everyone in the field of Artificial Intelligence and Machine Learning is talking about Self Supervised Learning. Everything ranging from advancements in the field to its application and results is a hot topic. In this short blog post, I will talk (very) briefly about Self Supervised Learning, and the reason why everyone is so excited about it.

Featured image: Image by Free-Photos from Pixabay.

Self Supervised Learning

Self Supervised Learning is a subset of Unsupervised Learning. The term supervision implies that labels exist for the data on which a model is trained. However, unlike Supervised Learning, where the data is annotated by humans, in Self Supervised Learning the labels are derived from the data itself.

Let’s consider an example. Say you have a lot of unlabeled data (images). You can randomly rotate each image by 0, 90, 180, or 270 degrees and train your model to predict this rotation. Even though you do not have labels for the images, you applied the rotation yourself, so you know the rotation of each image. And in training the model to predict the rotation, the model learns something useful about the data. In this case, we created the labels from the data itself. This model can then be fine-tuned on a small set of labeled data. This is an example of a pretext task used for Self Supervised Learning; a minimal code sketch follows below. One of my previous blog posts is about pretext training, which you can read over here.
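To make this concrete, here is a minimal sketch of the rotation pretext task, assuming PyTorch and torchvision are available; the ResNet-18 backbone, the CIFAR-10 images, and the hyperparameters are placeholders chosen for illustration, not a prescribed setup.

```python
# A minimal sketch of the rotation-prediction pretext task (assumed setup:
# PyTorch + torchvision; the backbone and dataset are illustrative choices).
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

def rotate_batch(images):
    """Rotate each image by a random multiple of 90 degrees.

    The rotation index (0, 1, 2, 3 -> 0, 90, 180, 270 degrees) becomes the
    label, so no human annotation is needed.
    """
    rot_labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack(
        [torch.rot90(img, k=int(k), dims=(1, 2)) for img, k in zip(images, rot_labels)]
    )
    return rotated, rot_labels

# Backbone with a 4-way head for the pretext task.
model = torchvision.models.resnet18()
model.fc = nn.Linear(model.fc.in_features, 4)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Any pool of unlabeled images works; CIFAR-10 is used here only as a
# convenient image source, and its human-annotated class labels are ignored.
dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor()
)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

for images, _ in loader:                      # discard the real labels
    rotated, rot_labels = rotate_batch(images)
    loss = criterion(model(rotated), rot_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After the pretext training converges, the 4-way rotation head can be swapped for a task-specific classifier and the backbone fine-tuned on the small labeled set.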

There are many other pretext tasks on which you can train your Neural Network. I have written a blog on the most commonly used pretext tasks for training Computer Vision models on images, which you can read over here.

Why is Self Supervised Learning so Hot Right Now?

Everyone in the field of Machine Learning knows very well that Supervised Learning cannot scale. The human effort and cost required to label datasets are far too high, and it simply isn’t a sustainable way forward. At the same time, there is no shortage of data itself in this era of big data. This calls for innovation and change in the techniques used to train models. It is exactly this gap between the abundance of data and the scarcity of labeled data that Self Supervised Learning aims to narrow, and maybe, just maybe, eliminate entirely.

The non-scalable nature of Supervised Learning has been pointed out by many pioneers of Machine Learning such as Jitendra Malik, Yann LeCun, and Ishan Misra. Self Supervised Learning promises to address these scaling issues of Supervised Learning, and that is why it is such a hot topic across all circles of Artificial Intelligence: everyone wants to get in on the action.

The famous ImageNet dataset, for example, which is used for benchmarking the performance of Neural Networks, took 22 human years to label. There are other issues with manually labeling a dataset too. One of them is the lack of proper labels for real-world objects, because categorization is difficult and the hierarchical structure of real-world objects is poorly defined. For example, how easy would it be to distinguish a table from a dining table? Given the variation in the sizes and shapes of both, it is nearly impossible to label them consistently because there is significant overlap between the two. Additionally, Self Supervised Learning can be useful when dealing with data that has multiple modalities (multiple modes of measuring data, e.g. a video with sound is multimodal), because correlations between the modalities can be learned without having to label everything manually, as in the sketch below.
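As one hedged illustration of this idea (not drawn from any specific paper), here is a minimal sketch of learning the audio-visual correspondence in a batch of paired video and audio clips with an InfoNCE-style contrastive loss; the tiny encoders, feature dimensions, and random tensors are stand-ins for real feature extractors and real data.

```python
# A minimal sketch of self-supervised audio-visual correspondence learning.
# Assumed setup: PyTorch; encoders and feature sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Tiny stand-in encoder mapping pooled features to a normalized embedding."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(video_emb, audio_emb, temperature=0.07):
    """InfoNCE-style loss: the audio from the same clip is the positive pair,
    every other audio in the batch serves as a negative -- no labels needed."""
    logits = video_emb @ audio_emb.t() / temperature
    targets = torch.arange(video_emb.size(0))
    return F.cross_entropy(logits, targets)

video_enc = Encoder(in_dim=512)   # e.g. pooled frame features
audio_enc = Encoder(in_dim=128)   # e.g. pooled spectrogram features

# Random tensors standing in for a batch of paired video/audio features.
video_feats = torch.randn(32, 512)
audio_feats = torch.randn(32, 128)

loss = contrastive_loss(video_enc(video_feats), audio_enc(audio_feats))
loss.backward()   # gradients flow into both encoders
```

The supervisory signal here is simply the fact that a clip’s video frames and its audio occurred together, which is exactly the kind of label that comes for free with the data.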