Importance of Feature Engineering in Machine Learning

I am often asked about the importance of feature engineering in machine learning, so I decided to write a post about it. Every beginner in the field has questions about feature engineering and why it matters. "Wouldn't it work better if I just put all the data into the model and let it do the learning on its own?" is a question in every beginner's mind. So in this post I will discuss the importance of feature engineering in machine learning.

There is an old saying: "Garbage in, garbage out." It holds true for machine learning algorithms too. How well your system can 'learn' depends on the amount of relevant, 'good' data it receives rather than the amount of irrelevant, 'bad' data. And this is where feature engineering comes into play. Feature engineering helps us select or create features that let our system learn effectively.

Before we discuss the different techniques of feature engineering, let's take an example of how an irrelevant feature can mislead a model. Say we have to create a machine learning model that predicts students' final test marks from features such as hours studied per week and weekly test marks, among others. How much do you think a feature like the student's name influences the marks scored in the final exam? Not at all. But suppose the data we give the system to 'learn' from happens to show that all students whose names start with 'S' have good marks whereas all students whose names start with 'T' have bad marks (which does happen, purely by accident). Since our system simply learns patterns from the data, it will learn this spurious pattern too, concluding that students whose names start with 'S' perform better on tests than those whose names start with 'T'. Irrelevant features like this prevent the system from properly learning the true representations in the data. Hence, to improve the system's learning, it is essential to remove all the irrelevant features from the data. Now that the importance of feature engineering is clear, let's discuss its various forms. Feature engineering can involve three things:

  • Feature Selection- In feature selection we select the features that are most relevant and remove all the irrelevant ones, then train our system on the relevant features only. One way to see how relevant a feature is, is to measure its correlation with the target variable, i.e. the output we are trying to predict. Features with high correlation (whether positive or negative) with the output are kept for training the model, whereas features with low correlation are removed. Removing the student's name from the data before feeding it to the system for training, as in the example above, is an example of feature selection.
  • Feature Extraction- In feature extraction we create new, useful features by combining features already present in the data. This can be done with a dimensionality-reduction algorithm such as Principal Component Analysis (PCA). For example, when predicting the price of a house we might use features such as the total area of the house and the number of bedrooms. We can combine these two features into a new one: the average area per bedroom (let's assume for this example that other rooms such as the kitchen, bathrooms, and living areas occupy only a small share of the total area). This new feature could have a high correlation with the price of the house, and the price may well depend on it more than on either of the two original features. Hence we add this feature to the data before feeding it to the system.
  • Creating new features by gathering new data- We obtain new features by collecting new data. This is usually done only when feature selection and feature extraction fail to give good enough results.
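The correlation-based feature selection described above can be sketched in a few lines of pandas. This is a minimal illustration on synthetic data: the column names, the 0.2 threshold, and the fabricated student dataset are all assumptions for the example, not from a real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic student data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 200
hours_studied = rng.uniform(0, 20, n)
weekly_test = rng.uniform(0, 100, n)
# Final marks depend on the two real features plus noise.
final_marks = 2 * hours_studied + 0.5 * weekly_test + rng.normal(0, 5, n)
# An irrelevant feature, like the first letter of the name.
name_starts_with_s = rng.integers(0, 2, n)

df = pd.DataFrame({
    "hours_studied": hours_studied,
    "weekly_test": weekly_test,
    "name_starts_with_s": name_starts_with_s,
    "final_marks": final_marks,
})

# Keep features whose absolute correlation with the target
# exceeds a chosen threshold (0.2 here, an arbitrary choice).
correlations = df.corr()["final_marks"].drop("final_marks")
selected = correlations[correlations.abs() > 0.2].index.tolist()
print(selected)  # the irrelevant name feature should fall below the threshold
```

In practice the threshold is a judgment call, and correlation only captures linear relationships; it is a first filter, not a complete selection method.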
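The feature extraction step can likewise be sketched for the house-price example: one hand-crafted combined feature (average area per bedroom) and one PCA-style feature computed directly with NumPy's SVD. The data and column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic housing data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "total_area": rng.uniform(500, 3000, n),
    "bedrooms": rng.integers(1, 6, n),
})

# Hand-crafted combined feature: average area per bedroom
# (assuming, as in the text, bedrooms dominate the total area).
df["area_per_bedroom"] = df["total_area"] / df["bedrooms"]

# PCA-style extraction: project the original two features onto
# their first principal component, yielding one combined feature.
X = df[["total_area", "bedrooms"]].to_numpy()
X_centered = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(X_centered, full_matrices=False)
df["pc1"] = X_centered @ vt[0]  # first principal component scores

print(df.head())
```

Note that PCA is scale-sensitive: here `total_area` dominates because its variance is much larger, so in real use the features would typically be standardized first.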

I hope this post gave you some insight into the usefulness of feature engineering and the various ways it is achieved.