Machine Learning Interview Questions- Decision Tree (Set-1)

Machine Learning and Data Science are among the fastest-growing professions in the world. The demand for Machine Learning Engineers and Data Scientists has risen sharply in recent years. In addition, the basics of Machine Learning and Data Science are required in roles such as Data Analyst, Analyst, and Data Engineer.


Decision trees are powerful and widely used Machine Learning models, and a good understanding of how they work is essential for any Machine Learning Engineer or Data Scientist. For this reason, questions about decision trees come up often in interviews, which makes the Decision Tree an important topic to prepare.

A Machine Learning Interview can be really intimidating, especially so because of the depth and breadth of the topics you are supposed to know. In this article, we will be going through some of the most asked questions related to Decision Trees in a Machine Learning interview.

What is a Decision Tree?

Decision Trees are a class of non-parametric Machine Learning algorithms that fall under the paradigm of Supervised Learning. They are very powerful algorithms and can fit complex datasets with ease. They can be used for both Classification and Regression tasks.

What are the algorithms for building a Decision Tree model?

Several algorithms can be used to build a Decision Tree model. Some of the most popular are ID3, C4.5 (the successor of ID3), ASSISTANT, and CART (Classification And Regression Trees). The Scikit-Learn library uses an optimized version of CART to build its Decision Tree models.
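As a minimal sketch, a CART-based tree can be built in Scikit-Learn with `DecisionTreeClassifier` (the iris dataset here is only for illustration):

```python
# Scikit-Learn's DecisionTreeClassifier implements an optimized
# version of the CART algorithm.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="gini" is CART's default splitting measure;
# "entropy" is also available.
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy of the unregularized tree
```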

What are the different ways to Regularize a Decision Tree Model and prevent it from over-fitting the training data?

Decision trees are very powerful Machine Learning models. If not regularized, a Decision Tree will fit the training data very closely, causing overfitting. For this reason, it is very important to regularize Decision Tree models.

Decision Trees can be regularized by-

  1. Early Stopping when growing the Decision Tree.
  2. Building the complete Decision Tree and Pruning it later.

Generally, Pruning produces better results than early stopping.

When using Early Stopping, the following methods can be used to reduce overfitting of the decision tree-

  • Decreasing the depth of the Decision Tree.
  • Increasing the minimum number of samples in a node to split it.
  • Increasing the minimum number of samples in a leaf node.
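The three early-stopping controls above map directly to Scikit-Learn hyperparameters (a sketch, assuming Scikit-Learn as the library; the parameter values are illustrative only):

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument corresponds to one early-stopping regularizer:
clf = DecisionTreeClassifier(
    max_depth=4,           # decrease the depth of the tree
    min_samples_split=10,  # min samples a node must hold before it may split
    min_samples_leaf=5,    # min samples that must remain in each leaf
)
```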

When using Pruning to avoid overfitting, the Decision Tree is first grown completely and allowed to overfit the training data. Later, the 'unnecessary' nodes are removed: a node's children are pruned if the decrease in Gini impurity or entropy provided by the split is not statistically significant.
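Note that Scikit-Learn's built-in post-pruning uses minimal cost-complexity pruning (the `ccp_alpha` parameter) rather than a statistical-significance test; significance-based pruning appears in other implementations such as C4.5. A sketch of cost-complexity pruning, with the iris dataset assumed for illustration:

```python
# Grow the full tree, then prune it with an increasing ccp_alpha:
# larger alphas remove more 'unnecessary' nodes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Effective alphas at which subtrees get pruned away, in increasing order.
path = full_tree.cost_complexity_pruning_path(X, y)

# Refit with a large alpha (second-largest here) to get a heavily pruned tree.
pruned = DecisionTreeClassifier(random_state=42,
                                ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
print(pruned.tree_.node_count, "<", full_tree.tree_.node_count)
```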

How do you address the issue of under-fitting in Decision Trees?

Under-fitting occurs when the model is not sophisticated enough to identify patterns in the input data and map them to outputs. It results in low accuracy on both the training and test datasets. Under-fitting in decision trees is rather rare, but it can be addressed by taking one or more of the following steps-

  • If possible, increase the size of the training dataset.
  • Increasing the depth of the Decision Tree.
  • Decreasing the minimum number of samples a node must hold before it can split.
  • Decreasing the minimum number of samples in a leaf node.
  • Lowering the impurity threshold so that nodes keep splitting even when the reduction in Entropy or Gini impurity is small.
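A sketch of the idea in Scikit-Learn: an over-constrained tree (depth 1) under-fits even the training data, and relaxing the depth constraint recovers accuracy. The iris dataset and the specific depths are illustrative assumptions only:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A depth-1 tree can only make one split and under-fits.
shallow = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X, y)

# Relaxing the constraint lets the tree capture more structure.
deeper = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)
print(shallow.score(X, y), "<", deeper.score(X, y))
```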

What are some of the advantages of using Decision Tree models?

The following are some of the key advantages of using Decision Trees-

  • They are powerful Machine Learning algorithms that can be used for both Classification and Regression.
  • They are ‘White Box’ Machine Learning models that are easy to visualize and interpret. It is possible to represent them as a series of nested if-else statements.
  • Though Decision Trees can handle continuous-valued data as well, they tend to do really well with categorical data.
  • Decision Trees are robust to errors and missing data.
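The 'White Box' point above can be demonstrated directly: Scikit-Learn's `export_text` helper prints a fitted tree as nested if/else-style rules (a sketch, with the iris dataset assumed for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Each indented "|---" branch is one if/else condition on a feature.
rules = export_text(clf, feature_names=load_iris().feature_names)
print(rules)
```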

What are some of the drawbacks or limitations of using Decision Tree models?

Decision Trees have the following drawbacks or limitations-

  • Decision Trees are sensitive to the rotation of data, primarily because their decision boundaries are orthogonal to the feature axes. This problem can be mitigated by performing Principal Component Analysis (PCA) on the data before feeding it to the Decision Tree algorithm. PCA also reduces the dimensionality of the input data, which speeds up the process of building the Decision Tree.
  • Decision Trees overfit the training data very easily.
  • They are very sensitive to small variations in data.
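The PCA-before-tree idea from the first point can be sketched as a Scikit-Learn `Pipeline` (the iris dataset and the choice of two components are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# PCA rotates the data onto its principal axes (and reduces dimensionality)
# before the tree draws its axis-orthogonal splits.
model = make_pipeline(PCA(n_components=2),
                      DecisionTreeClassifier(random_state=42))
model.fit(X, y)
print(model.score(X, y))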

Follow Up Articles

Machine Learning Interview Questions- Decision Tree (Set-2)