In the past few years, deep learning has been the subject of huge media hype. We are regularly promised a future of autonomous cars, chatbots, robots, virtual assistants and whatnot: a future in which we humans will have little or no role, and all our jobs will be transferred to robots and intelligent systems. Some have also expressed concerns about the survival of the human race in such a future. But what is Deep Learning?
To answer this question, we first need to step into the history of Artificial Intelligence.
Artificial Intelligence
Artificial Intelligence was born in the early 1950s, after a few pioneers in the field of Computer Science questioned whether computers could be made to ‘think’. Artificial Intelligence is a general field of study that deals with the development of ‘intelligent systems’ that can automate intelligent tasks typically performed by humans. This ‘intelligent’ behavior can be achieved by any means possible, such as knowledge, search, logic, etc. For example, an intelligent computer program to play chess in the early days of AI was developed by writing hard-coded rules based on human knowledge, so there was no ‘learning’ involved. This approach of hard-coding rules based on human knowledge is called symbolic AI. John Haugeland gave it the name GOFAI (Good Old-Fashioned Artificial Intelligence) in his book ‘Artificial Intelligence: The Very Idea’. Although symbolic AI worked well on well-defined, logical problems, problems such as image classification, natural language processing and speech recognition were beyond its scope.
Machine Learning
Machine Learning is a subset of Artificial Intelligence that does not rely on rules hard-coded by humans. Machine learning is a new paradigm in which the system is ‘trained’: it is presented with some data and the corresponding ‘answers’, and the system tries to ‘learn’ the patterns in the data. This can be illustrated with the flow chart represented below.
For example, if you want to build a system that can classify whether an email is spam or not, you start by showing the model some spam emails and some non-spam emails along with their labels or ‘answers’ (spam or not spam). The system learns some rules about how to classify an email as spam or not; for example, it may learn that the presence of words such as ‘unsubscribe’ mostly means that the email is spam. It then uses these learned rules to classify a new email as spam or not.
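The spam example can be sketched in a few lines of Python. This is only a toy illustration, not how real spam filters work: the training emails, the function names `learn_spam_words` and `classify`, and the very crude rule (a word counts as a spam word if it appears in spam emails and never in non-spam ones) are all assumptions made for this sketch.

```python
# Hypothetical toy training data: (email text, label) pairs marked by a human.
TRAINING_EMAILS = [
    ("win a free prize now click to unsubscribe", "spam"),
    ("limited offer unsubscribe at any time", "spam"),
    ("meeting moved to friday afternoon", "not spam"),
    ("please review the attached report", "not spam"),
]


def learn_spam_words(examples):
    """Return the words seen in spam emails but never in non-spam emails."""
    spam_words, other_words = set(), set()
    for text, label in examples:
        words = set(text.lower().split())
        if label == "spam":
            spam_words |= words
        else:
            other_words |= words
    return spam_words - other_words


def classify(text, spam_words):
    """Label an email as spam if it contains any learned spam word."""
    words = set(text.lower().split())
    return "spam" if words & spam_words else "not spam"


# The "rules" are learned from the labelled examples, not written by hand.
spam_words = learn_spam_words(TRAINING_EMAILS)
print(classify("click here to unsubscribe", spam_words))
print(classify("report for the friday meeting", spam_words))
```

Notice that nobody told the program that ‘unsubscribe’ is suspicious; it picked that up from the labelled examples. Showing it different examples would give it different rules.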
To train a Machine Learning system, along with the data and its target values, we also need a way to measure how well our system is performing. We can do this by using either a function that tells us how well our system is performing (a fitness function) or one that tells us how badly it is performing (a loss function). The value from this function is used as feedback to the system. So if the system is not working the way it should, changes are made to it so that it works better. This very process of adjusting the system so that it works correctly is what we mean by ‘learning’. Since this is a very important concept, we will go through it again step by step and then take an example.
The system takes some data and some ‘answers’, based on which it creates a model, and using this model it tries to predict the ‘answers’ for some new data. The accuracy of the system is measured using a fitness or loss function, and the value from that function is used as feedback to the model. The model is then changed slightly so that it predicts the ‘answers’ more accurately, and in this way its accuracy tends to improve with every iteration. Now let’s understand this with the example of a system that learns to classify an email as spam or not. We first provide the system with some ‘data’ (emails) and some ‘answers’ (whether they are spam or not, as marked by a human). By going through this data, the system creates a model. Now we use this model to classify some new data, that is, some new emails.
The answers predicted by the model are compared with the actual answers provided by the human. We measure the accuracy of the model using a function, say, the percentage of emails that the model classifies correctly. The model is then changed slightly so that when we present it with more data, say a new set of emails, it classifies them more accurately than the previous time.
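The feedback loop described above can be sketched in code. To keep it short, this sketch swaps the spam example for an even simpler problem: learning a single number `w` so that `w * x` matches the ‘answers’. It uses mean squared error as the loss function and gradient descent as the way of ‘changing the model slightly’. Both are common choices but are assumptions of this sketch, as are the data and the names `predict`, `loss`, `loss_gradient` and `train`.

```python
# Hypothetical toy data: the 'answers' follow the pattern y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def predict(w, x):
    """The model: a single adjustable parameter w."""
    return w * x


def loss(w):
    """Mean squared error: a measure of how badly the model is performing."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_gradient(w):
    """Derivative of the loss with respect to w (tells us which way to adjust)."""
    return sum(2 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(steps=100, learning_rate=0.01):
    w = 0.0                  # start with a model that knows nothing
    history = [loss(w)]      # record the loss after every adjustment
    for _ in range(steps):
        # Feedback: nudge w slightly in the direction that reduces the loss.
        w -= learning_rate * loss_gradient(w)
        history.append(loss(w))
    return w, history


w, history = train()
print(f"learned w = {w:.3f}")
print(f"loss before training = {history[0]:.3f}, after = {history[-1]:.6f}")
```

Each pass through the loop is one round of ‘measure, then adjust’. The recorded loss values shrink from one iteration to the next, which is the ‘learning’ described above.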
Deep Learning
Deep Learning is a subfield of Machine Learning which uses a ‘layered’ architecture to learn representations. Each successive layer in the layered architecture works with more meaningful and sophisticated representations. We will talk about what that implies and how that works in just a few moments.
The word ‘deep’ in deep learning refers to the many successive layers of representation present in a model. In deep learning, the depth of a model is the number of layers in the model. Depth can range from 2 or 3 layers to a few tens or, in some cases, even a few hundred. When the depth of the model is 2 or less, the model is often called a shallow model and is associated with shallow learning. Deep Learning deals only with models that have a layered architecture with more than 2 layers.
Each layer works on the output provided by the layer before it, detecting patterns in that output. As a result, each subsequent layer works with more complicated, sophisticated and meaningful patterns than the layers before it. This makes deep learning models excellent for working with complex data. For example, consider a deep learning model for image classification. The first layer might detect simple patterns such as straight lines or curves. The second layer might detect more complex patterns formed from the simple patterns detected in the first layer, say lines forming a rectangle or curves forming a circle or an ellipse. The successive layers keep detecting more and more complex patterns in the data until the model is able to classify images.
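The idea that each layer works on the output of the previous one can be sketched as follows. This is a minimal illustration only: the weights are hand-picked rather than learned, the ReLU activation is one common choice among several, and the names `dense`, `forward` and `LAYERS` are made up for this sketch.

```python
def relu(values):
    """Keep positive values and replace negative ones with zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: every unit takes a weighted sum of the inputs, then applies ReLU."""
    sums = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return relu(sums)


# Hypothetical hand-picked (weights, biases) for a model of depth 3.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.5]),  # layer 2: 2 inputs -> 2 units
    ([[1.0, -1.0]], [0.0]),                   # layer 3: 2 inputs -> 1 unit
]


def forward(inputs, layers):
    """Pass the data through the layers in order, keeping every layer's output."""
    activations = inputs
    outputs = []
    for weights, biases in layers:
        # Each layer sees only what the previous layer produced.
        activations = dense(activations, weights, biases)
        outputs.append(activations)
    return outputs


outputs = forward([2.0, 1.0], LAYERS)
print("depth of the model:", len(LAYERS))
for number, output in enumerate(outputs, start=1):
    print(f"output of layer {number}: {output}")
```

The raw input is only ever seen by the first layer. Every later layer builds on what the layer before it produced, which is what lets deeper layers represent more complex patterns once the weights are learned from data.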
These layered architectures are modelled and learned by something known as a neural network: a stack of layers on top of each other, where each layer consists of many units called neurons. The words ‘neuron’ and ‘neural network’ are a reference to biology, and many concepts in deep learning are inspired by the neurons in the human brain, hence the names. It is very important to note, however, that the neurons in deep learning models do not replicate the neurons present in the human brain, nor are neural network models a model of the brain.
In deep learning, too, the various parameters of the model are changed iteratively, based on the feedback from a fitness or loss function, so that the accuracy of the model improves over time. The precise structure and working of neural networks are not discussed in this post and will be covered in another post.