Decision Trees – A Complete Guide

Ankit Tomar, June 28, 2025

Decision Trees are one of the most intuitive and interpretable models in machine learning. They are widely used in both classification and regression problems due to their simplicity and flexibility. Below, we cover their internal workings, strengths, and limitations, and answer key interview questions.


🌳 What Is a Decision Tree?

A decision tree splits data into branches like a flowchart. Each internal node asks a question about a feature, each branch represents an outcome of the question, and each leaf node gives a final prediction. The aim is to create branches where each final group (leaf) contains similar target values (either class labels or numeric predictions).
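To make the flowchart idea concrete, here is a minimal sketch (using scikit-learn, which the post does not prescribe; the concept itself is library-agnostic) that fits a shallow tree and prints its question/branch/leaf structure:

```python
# Minimal sketch: fit a small decision tree and print its flowchart-like rules.
# scikit-learn is an assumption here; any tree implementation behaves similarly.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Each internal node asks a question about one feature; each leaf returns a prediction.
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))
```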


πŸ” How Do Decision Trees Split the Data?

At the core, a decision tree tries to make the data at each node as “pure” as possible:

  1. For each feature, the algorithm tries all possible split points.
  2. It calculates how good each split is:
    • Classification: Uses either Gini impurity or entropy to evaluate purity.
    • Regression: Uses variance reduction (often Mean Squared Error reduction).
  3. The split that results in the largest decrease in impurity (or variance) is chosen.
  4. This process continues recursively for each child node until stopping criteria are met.
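To see the greedy split search from steps 1–3 in action, here is a small self-contained sketch; the helper names (`gini`, `best_split`) and the toy data are illustrative, not taken from the post:

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(feature, labels):
    # Try every candidate threshold; keep the one with the largest impurity decrease.
    parent = gini(labels)
    best_gain, best_thr = 0.0, None
    for thr in np.unique(feature):
        left, right = labels[feature <= thr], labels[feature > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        w_left, w_right = len(left) / len(labels), len(right) / len(labels)
        gain = parent - (w_left * gini(left) + w_right * gini(right))
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain

# Toy data: one numeric feature, binary labels.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # (3.0, 0.5): a clean split with maximum impurity decrease
```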

❓ Gini vs. Entropy – What’s the Difference?

Metric         | Interpretation
Gini Impurity  | Measures misclassification probability
Entropy        | Measures information disorder (uncertainty)
  • Gini is slightly faster to compute and is often used by default.
  • Entropy is more theoretically grounded in information theory.

Both aim to create pure nodes, but Gini tends to favor larger partitions.
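As a quick numerical illustration of the two metrics (plain NumPy; the class proportions are made up for the example):

```python
import numpy as np

def gini(p):
    # Gini impurity: probability of misclassifying a randomly drawn sample.
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Shannon entropy in bits: information disorder (uncertainty).
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = np.array([0.7, 0.3])  # class proportions in a node
print(gini(p))            # 0.42
print(entropy(p))         # ~0.881 bits
```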


💡 What Does “Greedy” Mean in Decision Trees?

“Greedy” means that the tree algorithm picks the best split at each step, without looking ahead. It does not evaluate what might be optimal in future steps. This local optimization helps reduce complexity but can lead to suboptimal global trees.


🔒 How Do You Prevent Overfitting in Decision Trees?

  • Max Depth: Limit how deep the tree can grow.
  • Min Samples Split: Minimum number of samples required to split a node.
  • Min Samples Leaf: Minimum number of samples required in a leaf.
  • Pruning: Remove branches that do not improve generalization (post-pruning).
  • Cross-validation: Helps find the right parameters.

Without these techniques, a decision tree can perfectly memorize training data (high variance).
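Here is a hedged sketch of how these controls look in scikit-learn (assumed); the dataset and grid values are only illustrative, so tune them for your own problem:

```python
# Sketch: tree-growth limits, cost-complexity (post-)pruning via ccp_alpha,
# and cross-validation to pick the combination that generalizes best.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "max_depth": [3, 5, None],          # limit how deep the tree can grow
    "min_samples_split": [2, 10, 50],   # samples required to split a node
    "min_samples_leaf": [1, 5, 20],     # samples required in a leaf
    "ccp_alpha": [0.0, 0.001, 0.01],    # pruning strength (larger = simpler tree)
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```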


📊 Can Decision Trees Be Used for Regression?

Yes. Instead of Gini or entropy, regression trees use:

  • Variance reduction or
  • Mean Squared Error (MSE) decrease to choose splits.

Each leaf predicts the average of the target values in that region.
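A small regression sketch (scikit-learn assumed; the synthetic data is made up for illustration):

```python
# Regression tree: splits minimize squared error (variance) by default,
# and each leaf predicts the mean target value of its region.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.1, size=200)

reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(reg.predict([[2.5]]))  # the average of the training targets in that leaf
```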


🌲 Why Are Decision Trees Used in Random Forests and Gradient Boosting?

  • Random Forests: Combine many decision trees trained on bootstrapped data with random feature selection. This reduces overfitting and improves generalization.
  • Gradient Boosting: Sequentially builds trees where each new tree corrects the errors of the previous one.

In both, decision trees are chosen because they are:

  • Fast to train
  • Easy to interpret
  • Able to handle both categorical and numerical data naturally

Boosted trees are typically shallow (depth 3–5), while trees in random forests can be deeper.
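A short comparison sketch (scikit-learn assumed; hyperparameter values are illustrative, not recommendations):

```python
# Decision trees as building blocks of two popular ensembles.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Random forest: many (potentially deep) trees on bootstrapped samples,
# each split considering a random subset of features.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

# Gradient boosting: a sequence of shallow trees (depth 3 here),
# each fitted to correct the errors of the trees before it.
gb = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)

print("Random forest    :", cross_val_score(rf, X, y, cv=5).mean().round(3))
print("Gradient boosting:", cross_val_score(gb, X, y, cv=5).mean().round(3))
```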


✅ Summary

Decision Trees are excellent for understanding patterns and are powerful building blocks in ensemble models. With the right controls, they balance interpretability with performance.

In upcoming posts, we’ll cover how trees become forests, diving into Random Forests and Gradient Boosting next.
