ML · 7 min read

Model Calibration: Reliable Probability Estimates

Learn how to calibrate your model so predicted probabilities reflect actual likelihoods.

Sarah Chen
December 19, 2025


When your model says "80% probability of rain," it should actually rain about 80% of the time. That's calibration: making probability estimates reflect real-world frequencies.

Why Calibration Matters

An uncalibrated model that says 90% confidence:
- Might be right 90% of the time ✓
- Might be right only 60% of the time ✗

A calibrated model that says 90% confidence:
- Is right about 90% of the time ✓

Important for:

  • Medical diagnosis
  • Risk assessment
  • Decision making with costs/benefits (see the sketch below)
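To make the cost/benefit point concrete, here is a minimal sketch of picking a decision threshold from misclassification costs. The cost values and probabilities are made up for illustration; the closed-form threshold only makes sense when the probabilities are calibrated.

import numpy as np

# Hypothetical costs (illustrative, not from a real application)
cost_fp = 1.0    # cost of acting when the event does not happen
cost_fn = 10.0   # cost of failing to act when the event does happen

# With calibrated probabilities, acting is worthwhile when
# p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn)
threshold = cost_fp / (cost_fp + cost_fn)

y_prob = np.array([0.05, 0.30, 0.65, 0.92])  # example calibrated predictions
decisions = y_prob > threshold
print(f"Threshold: {threshold:.2f}, act on: {decisions}")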

Checking Calibration: Reliability Diagram

from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Get predicted probabilities
y_prob = model.predict_proba(X_test)[:, 1]

# Create calibration curve
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)

# Plot
plt.figure(figsize=(8, 6))
plt.plot(prob_pred, prob_true, 's-', label='Model')
plt.plot([0, 1], [0, 1], '--', label='Perfectly calibrated')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.title('Calibration Curve')
plt.legend()
plt.show()

Interpreting the Plot

Perfectly calibrated: points lie on the diagonal line.

Under-confident: curve above the diagonal (says 60%, actual rate is 80%).

Over-confident: curve below the diagonal (says 80%, actual rate is 60%).
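To see what the reliability diagram is built from, here is a minimal sketch of the bin-wise computation behind calibration_curve, assuming equal-width bins (sklearn's default strategy='uniform') and toy data:

import numpy as np

# Toy data (hypothetical); in practice use y_test and y_prob from above
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.35, 0.55, 0.6, 0.65, 0.8, 0.85, 0.9])

n_bins = 5
bins = np.linspace(0.0, 1.0, n_bins + 1)
bin_ids = np.digitize(y_prob, bins[1:-1])  # which bin each prediction falls into

for b in range(n_bins):
    mask = bin_ids == b
    if mask.any():
        print(f"bin {b}: mean predicted={y_prob[mask].mean():.2f}, "
              f"fraction positive={y_true[mask].mean():.2f}")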

Brier Score

A single number summarizing probability quality: the mean squared error between predicted probabilities and the actual 0/1 outcomes (lower is better):

from sklearn.metrics import brier_score_loss

brier = brier_score_loss(y_test, y_prob)
print(f"Brier Score: {brier:.4f}")  # 0 = perfect, 0.25 = random

Which Models Need Calibration?

Model                 Typically Calibrated?
Logistic Regression   Usually good
Naive Bayes           Often extreme (needs calibration)
Random Forest         Usually under-confident
SVM                   Often needs calibration
Neural Networks       Varies
XGBoost               Usually good

Calibration Methods

1. Platt Scaling (Sigmoid)

Fits a sigmoid (logistic) function to the model's scores, mapping them onto calibrated probabilities:

from sklearn.calibration import CalibratedClassifierCV

# Wrap your model with calibration
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)

# Now probabilities are calibrated
y_prob_calibrated = calibrated.predict_proba(X_test)[:, 1]
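Under the hood, Platt scaling is essentially a one-feature logistic regression fit on held-out model scores. A rough sketch of the idea (not sklearn's exact implementation; the split sizes and variable names are illustrative):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hold out a calibration set so the scaler is not fit on the model's training data
X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

model.fit(X_fit, y_fit)
scores_cal = model.predict_proba(X_cal)[:, 1].reshape(-1, 1)

# One-feature logistic regression maps raw scores to calibrated probabilities
platt = LogisticRegression()
platt.fit(scores_cal, y_cal)

scores_test = model.predict_proba(X_test)[:, 1].reshape(-1, 1)
y_prob_platt = platt.predict_proba(scores_test)[:, 1]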

2. Isotonic Regression

Non-parametric and more flexible, fitting a monotone step function from scores to probabilities:

calibrated = CalibratedClassifierCV(model, method='isotonic', cv=5)
calibrated.fit(X_train, y_train)
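The same idea can be sketched directly with IsotonicRegression, reusing the hypothetical calibration split from the Platt sketch above:

from sklearn.isotonic import IsotonicRegression

# Fit a monotone, non-decreasing step function from raw scores to outcomes
iso = IsotonicRegression(out_of_bounds='clip')
iso.fit(scores_cal.ravel(), y_cal)

y_prob_iso = iso.predict(scores_test.ravel())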

Comparing Methods

# Original model
model.fit(X_train, y_train)
prob_original = model.predict_proba(X_test)[:, 1]

# Sigmoid calibration
cal_sigmoid = CalibratedClassifierCV(model, method='sigmoid', cv=5)
cal_sigmoid.fit(X_train, y_train)
prob_sigmoid = cal_sigmoid.predict_proba(X_test)[:, 1]

# Isotonic calibration  
cal_isotonic = CalibratedClassifierCV(model, method='isotonic', cv=5)
cal_isotonic.fit(X_train, y_train)
prob_isotonic = cal_isotonic.predict_proba(X_test)[:, 1]

# Compare
print(f"Original Brier: {brier_score_loss(y_test, prob_original):.4f}")
print(f"Sigmoid Brier: {brier_score_loss(y_test, prob_sigmoid):.4f}")
print(f"Isotonic Brier: {brier_score_loss(y_test, prob_isotonic):.4f}")

Choosing Calibration Method

Method     Best When
Sigmoid    Smaller datasets, S-shaped miscalibration
Isotonic   Lots of data, arbitrary miscalibration

Sigmoid is safer with less data (isotonic can overfit).

Full Pipeline Example

from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Split data (X, y are your feature matrix and binary labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train base model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Create calibrated version (cv=5 fits calibrated clones of rf internally)
rf_calibrated = CalibratedClassifierCV(rf, method='sigmoid', cv=5)
rf_calibrated.fit(X_train, y_train)

# Compare calibration curves
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for ax, (name, clf) in zip(axes, [('Uncalibrated', rf), ('Calibrated', rf_calibrated)]):
    prob = clf.predict_proba(X_test)[:, 1]
    prob_true, prob_pred = calibration_curve(y_test, prob, n_bins=10)

    ax.plot(prob_pred, prob_true, 's-')
    ax.plot([0, 1], [0, 1], '--')
    ax.set_xlabel('Mean predicted probability')
    ax.set_ylabel('Fraction of positives')
    ax.set_title(f'{name}\nBrier: {brier_score_loss(y_test, prob):.4f}')

plt.tight_layout()
plt.show()

Key Takeaway

Calibration makes probability predictions reliable. Always check calibration with reliability diagrams, especially when probabilities drive decisions. Random forests and Naive Bayes often need calibration, while logistic regression is usually well-calibrated. Use sigmoid calibration by default, isotonic if you have lots of data.

#Machine Learning #Calibration #Probability #Intermediate