Decision Trees: How They Work
Understand Decision Trees - one of the most intuitive and interpretable ML algorithms.
Decision Trees are exactly what they sound like—a tree of decisions. They're intuitive, interpretable, and surprisingly powerful.
The Concept
Think of playing 20 Questions:

- Is it alive? → Yes
- Is it an animal? → Yes
- Does it have 4 legs? → Yes
- Is it bigger than a cat? → Yes
- Is it a dog? → Yes!
That's a decision tree!
Visual Example
```
             [Age > 30?]
             /         \
           Yes          No
           /              \
   [Income > 50k?]     [Student?]
      /       \          /     \
    Yes        No      Yes      No
     |          |       |        |
   [BUY]    [MAYBE]   [BUY]   [NO BUY]
```
- Each internal node = a question about a feature
- Each leaf node = a prediction
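You can think of the diagram above as nothing more than nested `if`/`else` statements. Here's a rough sketch of that idea (the feature names and thresholds are just the ones from the picture, not a trained model):

```python
# Hand-written version of the tree above: each `if` is an internal node,
# each return value is a leaf prediction.
def predict(age, income, is_student):
    if age > 30:
        if income > 50_000:
            return "BUY"
        return "MAYBE"
    else:
        if is_student:
            return "BUY"
        return "NO BUY"

print(predict(age=35, income=60_000, is_student=False))  # BUY
print(predict(age=22, income=20_000, is_student=True))   # BUY
```

Training a decision tree is just learning which questions to ask and in what order.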
How Does It Build the Tree?
### The Goal: Find the Best Splits
At each step, the algorithm asks: "Which question separates the data best?"
### Measuring "Best" - Information Gain
Imagine you have 50 spam and 50 non-spam emails.
**Bad split:** Left has 45 spam + 40 non-spam, Right has 5 spam + 10 non-spam (Still mixed up!)
**Good split:** Left has 48 spam + 2 non-spam, Right has 2 spam + 48 non-spam (Much cleaner!)
This "cleanness" is measured using **Gini Impurity** or **Entropy**.
### Gini Impurity
```python
# Gini impurity: 1 - (p_class1² + p_class2² + ...)
def gini(proportions):
    return 1 - sum(p ** 2 for p in proportions)

gini([1.0, 0.0])  # Pure node (all same class): 0.0
gini([0.5, 0.5])  # Mixed node (50-50): 0.5
```
Lower Gini = Better split
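To score a whole split, the algorithm averages the children's impurities, weighted by how many samples land in each child. Here's a quick sketch that re-checks the spam example above, reusing the `gini` function just defined (the helper name `split_gini` is just for illustration):

```python
# Weighted Gini of a split: each child's impurity, weighted by child size
def split_gini(left_counts, right_counts):
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    left = gini([c / n_left for c in left_counts])
    right = gini([c / n_right for c in right_counts])
    return (n_left / n) * left + (n_right / n) * right

# (spam, non-spam) counts in each child
print(split_gini((45, 40), (5, 10)))   # bad split  -> ~0.49
print(split_gini((48, 2), (2, 48)))    # good split -> ~0.08
```

The good split scores much lower, so the tree would pick that question. It repeats this search at every node until the leaves are pure (or a stopping rule kicks in).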
Code Example
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the famous iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create and train the tree
tree = DecisionTreeClassifier(max_depth=3)  # Limit depth to prevent overfitting
tree.fit(X_train, y_train)

# Evaluate
accuracy = tree.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2%}")

# See feature importance
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
```
Output:

```
Accuracy: 95.56%
sepal length (cm): 0.000
sepal width (cm): 0.000
petal length (cm): 0.587
petal width (cm): 0.413
```
Petal features are most important for classifying iris types!
Visualizing the Tree
```python
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 10))
plot_tree(tree, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()
```
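If you'd rather skip matplotlib, scikit-learn can also dump the same tree as plain-text rules:

```python
from sklearn.tree import export_text

# Print the learned splits as indented if/else-style rules
print(export_text(tree, feature_names=list(iris.feature_names)))
```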
For Regression Too!
Decision Trees can predict numbers, not just categories:
```python
from sklearn.tree import DecisionTreeRegressor

# Same API, e.g. predicting house prices
tree = DecisionTreeRegressor(max_depth=5)
tree.fit(X_train, y_train)
predictions = tree.predict(X_test)
```
Instead of voting (classification), leaf nodes average the training values.
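You can see the averaging directly with a toy example (made-up numbers, purely for illustration): a depth-1 tree splits the data once, and each leaf predicts the mean of the training targets that fall into it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Two obvious clusters of targets: ~2 on the left, ~11 on the right
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])

stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
print(stump.predict([[2], [11]]))  # ~[2.0, 11.0]: the mean of each leaf
```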
The Overfitting Problem
Decision Trees LOVE to overfit. Without limits, they'll create a rule for every single training example.
```
No limits:
- Accuracy on training: 100%
- Accuracy on test:      65%   (Memorized, didn't learn!)
```
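The exact numbers depend on the dataset, but you can reproduce the gap yourself. A minimal sketch using noisy synthetic data (the dataset and variable names here are just for this demo):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Label noise (flip_y) makes memorization useless on new data
X_noisy, y_noisy = make_classification(
    n_samples=500, n_features=20, flip_y=0.2, random_state=0
)
Xn_train, Xn_test, yn_train, yn_test = train_test_split(
    X_noisy, y_noisy, random_state=0
)

full_tree = DecisionTreeClassifier(random_state=0).fit(Xn_train, yn_train)
print("train:", full_tree.score(Xn_train, yn_train))  # typically 1.0 — memorized
print("test: ", full_tree.score(Xn_test, yn_test))    # noticeably lower
```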
### How to Prevent Overfitting
```python
tree = DecisionTreeClassifier(
    max_depth=5,            # Limit tree depth
    min_samples_split=10,   # Need at least 10 samples to split a node
    min_samples_leaf=5,     # Each leaf needs at least 5 samples
    max_features='sqrt'     # Only consider sqrt(n) features per split
)
```
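Rather than guessing these values, one common approach is to let cross-validation pick them. A quick sketch with `GridSearchCV` (the grid values below are just examples, not recommendations):

```python
from sklearn.model_selection import GridSearchCV

# Try a few pruning settings and keep the one with the best CV score
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 7, None], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```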
Pros and Cons
### Pros ✅

- **Interpretable**: You can explain every prediction
- **No scaling needed**: Works with raw features
- **Handles mixed data**: Numbers and categories
- **Finds non-linear patterns**: Unlike linear models
- **Fast**: Quick to train and predict
### Cons ❌

- **Overfits easily**: Needs careful tuning
- **Unstable**: Small data changes = very different tree
- **Greedy**: Might miss globally optimal splits
- **Biased**: Prefers features with many values
Decision Tree vs Linear Models
| Aspect | Decision Tree | Linear Model |
|--------|---------------|--------------|
| Decision boundary | Rectangular | Straight line |
| Interpretability | Visual rules | Coefficients |
| Feature scaling | Not needed | Usually needed |
| Handles non-linearity | Yes | No |
| Stability | Low | High |
Key Insight
Decision Trees are powerful alone but even more powerful together. **Random Forest** (many trees) and **Gradient Boosting** (trees that learn from mistakes) dominate machine learning competitions.
We'll cover those next!