Decision Trees are powerful Machine Learning models that can be used for both classification and regression tasks. This makes them one of the most important topics for a Machine Learning interview, and a good grasp of Decision Trees is essential for anyone aspiring to the role of Machine Learning Engineer or Data Scientist. In one of my previous articles, I discussed some of the questions asked during a Machine Learning interview; that post focused solely on questions related to Decision Trees. This article is a follow-up to it, and I will talk about some more Machine Learning interview questions related to Decision Trees.
When creating a Decision Tree from the Training Data, how is the attribute decided for splitting a non-leaf node?
When growing a Decision Tree, the usefulness of splitting on each attribute is calculated, and the attribute that gives the most useful split is selected for splitting the non-leaf node. Various quantitative measures exist to determine the usefulness of a split. The most commonly used ones are the decrease in Gini Impurity and Information Gain (i.e., the decrease in Entropy).
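The selection step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular library does it: it assumes categorical attributes, creates one child node per distinct attribute value, and scores each split by the decrease in Gini Impurity (one minus the sum of squared class fractions). The function names and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(rows, labels, attribute):
    """Parent impurity minus the size-weighted impurity of the children
    obtained by grouping the samples on each distinct value of `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    n = len(labels)
    weighted_children = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted_children

def best_attribute(rows, labels, attributes):
    """Return the attribute whose split gives the largest decrease in impurity."""
    return max(attributes, key=lambda a: gini_decrease(rows, labels, a))

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

Here splitting on "outlook" produces two pure children (a decrease of 0.5), while splitting on "windy" leaves the impurity unchanged, so "outlook" is chosen.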
How is Gini Impurity for a Node calculated?
The Gini Impurity of a node is calculated as 1 minus the sum of the squared proportions of each class present at that node.
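Mathematically, the Gini Impurity of a node is calculated with the following formula, where $K$ is the number of classes and $p_i$ is the fraction of the samples at the node that belong to class $i$:
$$G = 1 - \sum_{i=1}^{K} p_i^2$$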
For example, suppose a particular node has 50 samples, of which 10 belong to one class (say class A) and 40 belong to another class (say class B). The Gini Impurity of the node is then:
$$G = 1 - \left(\frac{10}{50}\right)^2 - \left(\frac{40}{50}\right)^2 = 1 - 0.04 - 0.64 = 0.32$$
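The Gini Impurity of this 50-sample node can be checked with a short Python sketch. The helper name `gini_impurity` and its input format (a list of per-class sample counts) are assumptions made for illustration.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # an empty node is treated as pure in this sketch
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# 10 samples of class A and 40 samples of class B
node_gini = gini_impurity([10, 40])
print(round(node_gini, 4))  # 0.32
```

A pure node (all samples in one class) gives 0, and a two-class node split 50/50 gives the maximum of 0.5.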
How is Entropy calculated?
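Mathematically, the Entropy of a node is calculated with the following formula, where $K$ is the number of classes and $p_i$ is the fraction of the samples at the node that belong to class $i$. The logarithm is assumed here to be base 2, the usual convention, which gives Entropy in bits; classes with $p_i = 0$ contribute nothing to the sum:
$$H = -\sum_{i=1}^{K} p_i \log_2 p_i$$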
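For example, suppose a particular node has 50 samples, of which 10 belong to one class (say class A) and 40 belong to another class (say class B). Using base-2 logarithms, the Entropy of the node is then:
$$H = -\frac{10}{50}\log_2\frac{10}{50} - \frac{40}{50}\log_2\frac{40}{50} \approx 0.464 + 0.258 \approx 0.722$$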
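The Entropy of the same node can be checked with a short Python sketch. It assumes base-2 logarithms (Entropy measured in bits) and skips empty classes, following the convention that a class with zero samples contributes nothing; the helper name `entropy` and its count-list input are illustrative choices.

```python
import math

def entropy(class_counts):
    """Entropy (in bits) of a node, given the number of samples in each class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count > 0:  # empty classes contribute nothing to the sum
            p = count / total
            result -= p * math.log2(p)
    return result

# 10 samples of class A and 40 samples of class B
print(round(entropy([10, 40]), 4))  # 0.7219
```

A pure node gives an Entropy of 0, and a two-class node split 50/50 gives the maximum of 1 bit.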