
Introduction to Bias and Variance

Understand the bias-variance tradeoff - a fundamental concept in machine learning.

Sarah Chen
December 19, 2025

The bias-variance tradeoff is at the heart of machine learning. Understanding it helps you diagnose and fix model problems.

Simple Explanation

Imagine throwing darts at a target:

High Bias, Low Variance:   Low Bias, High Variance:
  ┌─────────┐                ┌─────────┐
  │         │                │ x     x │
  │   xxx   │                │    ◎    │
  │    ◎    │                │  x   x  │
  │         │                │         │
  └─────────┘                └─────────┘
  Consistent but wrong       Scattered around target

Low Bias, Low Variance:    High Bias, High Variance:
  ┌─────────┐                ┌─────────┐
  │         │                │ x       │
  │   x◎x   │                │       x │
  │    x    │                │    ◎    │
  │         │                │  x   x  │
  └─────────┘                └─────────┘
  What we want!              Worst case

What is Bias?

Bias = Error from overly simple assumptions.

A model with high bias:

  • Misses relevant patterns
  • Underfits the data
  • Is "too rigid"

Example: Fitting a straight line to curved data.

Data: curved pattern
Model: straight line

     ∙
   ∙   ∙
  ──────── ← Line misses the curve
 ∙       ∙
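
To see this in code (a minimal sketch with made-up data, not the article's diagram): fitting scikit-learn's LinearRegression to quadratic data shows the problem, because the fit is poor even on the training set.

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: a quadratic curve plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, size=50)

# A straight line cannot capture the curve, so it scores badly
# even on the data it was trained on: the signature of high bias
line = LinearRegression().fit(X, y)
print(f"Training R^2: {line.score(X, y):.2f}")  # near 0 for this symmetric curve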

What is Variance?

Variance = Error from being too sensitive to training data.

A model with high variance:

  • Learns noise as if it were signal
  • Overfits the data
  • Changes dramatically with different training data

Example: A very complex polynomial that passes through every training point but generalizes poorly.
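
A hedged sketch of that failure mode (again with illustrative data): a degree-15 polynomial nails the 20 training points, then falls apart on fresh samples of the same curve.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# A handful of noisy points from the same kind of curved pattern
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 20).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 1.0, size=20)

# Degree 15 gives the model enough flexibility to chase the noise
poly = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X, y)
print(f"Train R^2: {poly.score(X, y):.2f}")  # near 1.0: it memorizes the sample

# Fresh draws from the same curve expose the poor generalization
X_new = rng.uniform(-3, 3, 20).reshape(-1, 1)
y_new = X_new.ravel() ** 2 + rng.normal(0, 1.0, size=20)
print(f"Test R^2: {poly.score(X_new, y_new):.2f}")  # far lower, often negative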

The Tradeoff

Error
  │╲
  │ ╲    ╱╱  Total Error
  │  ╲__╱╱   
  │   ╲╱     Variance
  │    ╲____
  │         Bias
  └────────────────
    Simple → Complex

  • Simple models: High bias, low variance
  • Complex models: Low bias, high variance
  • Goal: Find the sweet spot where total error is minimized

Mathematical View

Total Error = Bias² + Variance + Irreducible Noise

  • Bias²: How far off predictions are on average
  • Variance: How much predictions vary with different training data
  • Irreducible Noise: Random error you can't eliminate
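
This decomposition can be estimated empirically when the true function is known. In the sketch below (all values illustrative), a straight line is fit to many independently drawn training sets from the same curve; averaging its predictions at one test point separates the systematic error (bias²) from the scatter (variance).

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_f = lambda x: x ** 2      # known ground truth, needed to measure bias
x0, n_trials = 2.5, 500       # one test point, many independent training sets

preds = []
for _ in range(n_trials):
    # Fresh training set each trial: same curve, new noise
    X = rng.uniform(-3, 3, 30).reshape(-1, 1)
    y = true_f(X.ravel()) + rng.normal(0, 1.0, size=30)
    preds.append(LinearRegression().fit(X, y).predict([[x0]])[0])

preds = np.array(preds)
print(f"Bias^2   = {(preds.mean() - true_f(x0)) ** 2:.3f}")  # off target on average
print(f"Variance = {preds.var():.3f}")                       # spread across training sets

For this rigid model on curved data, the bias² term dominates. In practice the true function is unknown, which is why diagnostics like the learning curves below are used instead.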

Examples by Model

Model                                  Bias     Variance
Linear Regression                      High     Low
Polynomial Regression (high degree)    Low      High
Decision Tree (no pruning)             Low      High
Decision Tree (limited depth)          Medium   Medium
k-NN (k=1)                             Low      High
k-NN (k=n)                             High     Low
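
The two k-NN rows are easy to verify. A rough sketch (dataset and numbers are illustrative): k=1 memorizes individual neighbors, while a very large k averages over most of the training set.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Any classification task shows the same pattern
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# k=1: decisions flip with every noisy neighbor (high variance)
# k=200: predictions barely depend on the query point (high bias)
for k in (1, 200):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k:>3}: mean CV accuracy = {scores.mean():.2f}")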

Diagnosing Your Model

High Bias (Underfitting)

  • Training error is high
  • Training and test error are similar (both bad)
  • Model is too simple

Fix: Increase complexity (see the sketch after this list)

  • Add features
  • Use a more complex model
  • Reduce regularization
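
A sketch of the first fix on the curved-data example from earlier (data still illustrative): adding an x² feature turns the underfit line into a good fit.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, size=50)

# degree=1 is the underfit baseline; degree=2 adds the missing feature
for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    print(f"degree={degree}: train R^2 = {model.score(X, y):.2f}")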

High Variance (Overfitting)

  • Training error is low
  • Test error is much higher than training
  • Model is too complex

Fix: Decrease complexity (see the sketch after this list)

  • Remove features
  • Use simpler model
  • Add regularization
  • Get more data
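
And a sketch of the regularization fix (illustrative data again): the same overly flexible polynomial, with and without an L2 penalty, compared by cross-validated score.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 30).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 1.0, size=30)

# Same degree-12 features; only the penalty on the weights differs
for name, reg in [("plain", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(degree=12), StandardScaler(), reg)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV R^2 = {scores.mean():.2f}")  # the penalty usually wins here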

Learning Curves

Plot training and validation error vs. training size:

High Bias Pattern

Error
  │ ────────── Validation (high)
  │ ────────── Training (high)
  └──────────────────
          Training Size
Both errors are high and converge

High Variance Pattern

Error
  │ ──────  Validation (high)
  │        
  │ ______ Training (low)
  └──────────────────
          Training Size
Big gap between training and validation

Code: Plotting Learning Curves

from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import numpy as np

# Any estimator and dataset work here; a depth-limited tree on the
# digits data is just one concrete choice
X, y = load_digits(return_X_y=True)
model = DecisionTreeClassifier(max_depth=5, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5
)

# Average the 5 cross-validation folds at each training size
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

# Note: learning_curve reports score (higher is better), so these
# curves are mirror images of the error plots above
plt.plot(train_sizes, train_mean, label='Training')
plt.plot(train_sizes, val_mean, label='Validation')
plt.xlabel('Training Size')
plt.ylabel('Score')
plt.legend()
plt.title('Learning Curve')
plt.show()

Controlling Bias and Variance

Model Complexity

More complex → Less bias, more variance
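
scikit-learn's validation_curve makes this knob visible: it scores a model across a range of complexity values. A sketch with an illustrative dataset, using tree depth as the complexity knob:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

depths = np.arange(1, 16)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5
)

# Shallow trees: both scores low (bias). Deep trees: train climbs
# toward 1.0 while validation stalls or drops (variance).
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:>2}: train={tr:.2f}  val={va:.2f}")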

Regularization

More regularization → More bias, less variance

Training Data

More data → Helps reduce variance (not bias!)

Feature Selection

Fewer features → More bias, less variance

Practical Tips

  1. Start simple: Begin with a simple model, add complexity as needed
  2. Use cross-validation: Don't rely on a single train-test split (see the sketch after this list)
  3. Plot learning curves: Visualize the bias-variance situation
  4. Regularize: When in doubt, add regularization
  5. Get more data: Often the best solution for high variance
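
Tip 2 in a couple of lines (the dataset and model are just placeholders):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Five scores instead of one; the spread tells you how much the
# result depends on which points landed in the test fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"Accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")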

Key Takeaway

You can't minimize both bias and variance simultaneously. The art of machine learning is finding the right balance for your specific problem.

  • High training error? → Reduce bias
  • High gap between train/test? → Reduce variance

Use learning curves to diagnose, then adjust accordingly!

#MachineLearning #Bias #Variance #Beginner