Decision Trees are powerful Machine Learning models that can be used for both classification and regression tasks. This makes them one of the most important topics for a Machine Learning interview, and a good grasp of Decision Trees is essential for anyone aspiring to the role of Machine Learning Engineer or Data Scientist. In one of my previous articles, I discussed some of the questions asked during a Machine Learning interview; that post focused solely on questions related to Decision Trees. This article is a follow-up to it and covers some more Machine Learning interview questions related to Decision Trees.
When creating a Decision Tree from the Training Data, how is the attribute decided for splitting a non-leaf node?
When growing a Decision Tree, the usefulness of a split is calculated for each attribute in turn. The attribute that gives the most useful split is selected for splitting the non-leaf node. Various quantitative measures exist to determine the usefulness of a split; the most commonly used ones are the decrease in Gini Impurity and Information Gain (i.e., the decrease in Entropy).
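The selection step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular library does it: it assumes categorical attributes with one child node per attribute value, uses the decrease in Gini Impurity as the measure, and the helper names (`gini`, `impurity_decrease`, `best_attribute`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini Impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def impurity_decrease(rows, labels, attribute):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    # Group the labels by the value each row takes for this attribute.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted


def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split reduces impurity the most."""
    return max(attributes, key=lambda a: impurity_decrease(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

Here splitting on "outlook" reduces the impurity from 0.5 to 0, while splitting on "windy" leaves it unchanged, so "outlook" is chosen. Swapping `gini` for an entropy function would turn the same sketch into an Information Gain calculation.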
How is Gini Impurity for a Node calculated?
The Gini Impurity of a node is 1 minus the sum of the squared proportions of each class present at that node.
Mathematically, the Gini Impurity of a node is calculated with the following formula:
$$G = 1 - \sum_{i=1}^{k} p_i^2$$
where $p_i$ is the proportion of samples at the node that belong to class $i$, and $k$ is the number of classes.
For example, if a node contains 50 samples, of which 10 belong to one class (say class A) and 40 belong to another class (say class B), then the Gini Impurity of the node is calculated as:
$$G = 1 - \left(\tfrac{10}{50}\right)^2 - \left(\tfrac{40}{50}\right)^2 = 1 - 0.04 - 0.64 = 0.32$$
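The same calculation can be checked with a short Python sketch. The function name `gini_impurity` is hypothetical, and treating an empty node as pure is an assumption made only to keep the function total.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 minus the sum of squared class proportions in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# The node from the example: 10 samples of class A and 40 of class B.
node = ["A"] * 10 + ["B"] * 40
print(round(gini_impurity(node), 4))  # 0.32
```

A node containing a single class gives 0, the minimum possible value.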
How is Entropy calculated?
Mathematically, the Entropy of a node is calculated using the following formula:
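$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where $p_i$ is the proportion of samples at the node that belong to class $i$, and $k$ is the number of classes. The base-2 logarithm, the usual convention for Decision Trees, is assumed here.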
For example, if a node contains 50 samples, of which 10 belong to one class (say class A) and 40 belong to another class (say class B), then the Entropy of the node is calculated as:
$$H = -\left(\tfrac{10}{50}\log_2\tfrac{10}{50} + \tfrac{40}{50}\log_2\tfrac{40}{50}\right) \approx 0.722$$
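As a quick check, the Entropy of this node can be computed with a small Python sketch. The function name `entropy` is hypothetical, and the base-2 logarithm is an assumption (it gives the result in bits; a different base only rescales the value).

```python
import math
from collections import Counter


def entropy(labels):
    """Return the sum over classes of -p * log2(p) for the labels at a node."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    # Only classes that actually occur are counted, so log2(0) never arises.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# The node from the example: 10 samples of class A and 40 of class B.
node = ["A"] * 10 + ["B"] * 40
print(round(entropy(node), 4))  # 0.7219
```

With two classes, the Entropy ranges from 0 for a pure node to 1 for a node split evenly between the classes.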