
Decision Trees: How They Work

Understand Decision Trees - one of the most intuitive and interpretable ML algorithms.

Sarah Chen
December 19, 2025


Decision Trees are exactly what they sound like—a tree of decisions. They're intuitive, interpretable, and surprisingly powerful.

The Concept

Think of playing 20 Questions:

  • Is it alive? → Yes
  • Is it an animal? → Yes
  • Does it have 4 legs? → Yes
  • Is it bigger than a cat? → Yes
  • Is it a dog? → Yes!

That's a decision tree!

Visual Example

                    [Age > 30?]
                    /          \
                 Yes            No
                /                 \
        [Income > 50k?]      [Student?]
        /          \         /        \
      Yes          No       Yes        No
       |            |        |          |
    [BUY]      [MAYBE]    [BUY]    [NO BUY]

Each internal node = a question about a feature
Each leaf node = a prediction
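
To make that mapping concrete, here is a minimal sketch of the diagram above written as plain Python if/else logic (the age, income, and student features and their thresholds are just the hypothetical values from the picture):

def will_buy(age, income, is_student):
    """The hypothetical tree from the diagram, as nested if/else."""
    if age > 30:                   # root node: question about the age feature
        if income > 50_000:        # internal node
            return "BUY"           # leaf node: prediction
        return "MAYBE"
    if is_student:
        return "BUY"
    return "NO BUY"

print(will_buy(age=35, income=60_000, is_student=False))  # BUY
print(will_buy(age=22, income=10_000, is_student=True))   # BUY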

How Does It Build the Tree?

The Goal: Find the Best Splits

At each step, the algorithm asks: "Which question separates the data best?"

Measuring "Best" - Information Gain

Imagine you have 50 spam and 50 non-spam emails.

Bad split: Left has 45 spam + 40 non-spam, Right has 5 spam + 10 non-spam
(Still mixed up!)

Good split: Left has 48 spam + 2 non-spam, Right has 2 spam + 48 non-spam
(Much cleaner!)

This "cleanness" is measured using Gini Impurity or Entropy.

Gini Impurity

gini = 1 - (p_class1² + p_class2² + ...)

# Pure node (all same class): gini = 0
# Mixed node (50-50): gini = 0.5

Lower Gini = Better split
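
As a quick sanity check, here is a small sketch (standard library only) that scores the two example splits above; a candidate split is scored by the size-weighted average Gini of its two child nodes, and lower is better:

def gini(counts):
    """Gini impurity of a node given its class counts, e.g. [spam, non_spam]."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def split_gini(left, right):
    """Size-weighted average impurity of the two child nodes."""
    n = sum(left) + sum(right)
    return sum(left) / n * gini(left) + sum(right) / n * gini(right)

print(gini([50, 50]))                 # 0.5  (totally mixed)
print(gini([50, 0]))                  # 0.0  (pure)
print(split_gini([45, 40], [5, 10]))  # ~0.49 (bad split)
print(split_gini([48, 2], [2, 48]))   # ~0.08 (good split)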

Code Example

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load famous iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create and train tree
tree = DecisionTreeClassifier(max_depth=3)  # Limit depth to prevent overfitting
tree.fit(X_train, y_train)

# Evaluate
accuracy = tree.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2%}")

# See feature importance
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")

Output (exact numbers will vary with the random train/test split):

Accuracy: 95.56%
sepal length (cm): 0.000
sepal width (cm): 0.000
petal length (cm): 0.587
petal width (cm): 0.413

Petal features are most important for classifying iris types!

Visualizing the Tree

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 10))
plot_tree(tree, 
          feature_names=iris.feature_names,
          class_names=iris.target_names,
          filled=True)
plt.show()
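
If you only need the rules as text (no matplotlib), sklearn's export_text prints the same tree as indented rules:

from sklearn.tree import export_text

print(export_text(tree, feature_names=iris.feature_names))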

For Regression Too!

Decision Trees can predict numbers, not just categories:

from sklearn.tree import DecisionTreeRegressor

# Same API as the classifier; X_train/y_train here would be your
# regression data (e.g. house features and prices)
tree = DecisionTreeRegressor(max_depth=5)
tree.fit(X_train, y_train)
predictions = tree.predict(X_test)

Instead of taking a majority vote (classification), each leaf predicts the average of the training targets that fall into it.
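
As a toy illustration (the prices here are made up), a regression leaf's prediction is just the mean of the training targets that landed in it:

import numpy as np

# Hypothetical prices of the training houses that ended up in one leaf
leaf_targets = np.array([310_000, 295_000, 330_000, 305_000])

# The leaf predicts their mean for any new sample that reaches it
print(leaf_targets.mean())  # 310000.0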

The Overfitting Problem

Decision Trees LOVE to overfit. Without limits, they'll create a rule for every single training example.

No limits:
- Accuracy on training: 100%
- Accuracy on test: 65%
(Memorized, didn't learn!)
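
To see this yourself, here is a minimal sketch reusing the iris split from the classification example above; iris is small and clean, so the train/test gap there is mild, but on noisier data an unconstrained tree's gap looks much more like the illustration above:

for max_depth in [None, 3]:
    t = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    t.fit(X_train, y_train)
    print(f"max_depth={max_depth}: "
          f"train={t.score(X_train, y_train):.2%}, "
          f"test={t.score(X_test, y_test):.2%}")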

How to Prevent Overfitting

tree = DecisionTreeClassifier(
    max_depth=5,              # Limit tree depth
    min_samples_split=10,     # Need at least 10 samples to split
    min_samples_leaf=5,       # Each leaf needs at least 5 samples
    max_features='sqrt'       # Only consider sqrt(n) features per split
)

Pros and Cons

Pros ✅

  • Interpretable: You can explain every prediction
  • No scaling needed: Works with raw features
  • Handles mixed data: Numbers and categories
  • Finds non-linear patterns: Unlike linear models
  • Fast: Quick to train and predict

Cons ❌

  • Overfits easily: Needs careful tuning
  • Unstable: Small data changes = very different tree
  • Greedy: Might miss globally optimal splits
  • Biased: Prefers features with many values

Decision Trees vs Linear Models

Aspect                | Decision Tree | Linear Model
Decision boundary     | Rectangular   | Straight line
Interpretability      | Visual rules  | Coefficients
Feature scaling       | Not needed    | Usually needed
Handles non-linearity | Yes           | No
Stability             | Low           | High

Key Insight

Decision Trees are powerful alone but even more powerful together. Random Forest (many trees) and Gradient Boosting (trees that learn from mistakes) dominate machine learning competitions.

We'll cover those next!

#MachineLearning #DecisionTrees #Classification #Beginner