
Model Calibration: Reliable Probability Estimates

Learn how to calibrate your model so predicted probabilities reflect actual likelihoods.

Sarah Chen
December 19, 2025


When your model says "80% probability of rain," it should actually rain 80% of the time. That's calibration - making probability estimates reliable.

Why Calibration Matters

```
Uncalibrated model says 90% confidence:
- Sometimes right 90% of the time ✓
- Sometimes right only 60% of the time ✗

Calibrated model says 90% confidence:
- Right 90% of the time ✓
```

Calibration is especially important for:
- Medical diagnosis
- Risk assessment
- Decision making with costs/benefits (see the sketch below)
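For the last point, here is a minimal sketch (with made-up, hypothetical costs) of how a calibrated probability plugs directly into a cost-based decision rule. If the probabilities are miscalibrated, this threshold no longer reflects the real trade-off:

```python
# Sketch: turning a calibrated probability into a cost-aware decision.
# Costs below are hypothetical, not from any real application.
cost_false_negative = 500   # missing a true positive (e.g., undetected disease)
cost_false_positive = 50    # acting when nothing was wrong (e.g., unnecessary test)

# Act whenever the expected cost of inaction exceeds the expected cost of action:
#   p * cost_false_negative > (1 - p) * cost_false_positive
threshold = cost_false_positive / (cost_false_positive + cost_false_negative)

p = 0.15  # calibrated probability of the positive class for one case
decision = "act" if p > threshold else "wait"
print(f"Threshold: {threshold:.3f}, decision: {decision}")  # Threshold: 0.091, decision: act
```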

Checking Calibration: Reliability Diagram

```python
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Get predicted probabilities
y_prob = model.predict_proba(X_test)[:, 1]

# Create calibration curve
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)

# Plot
plt.figure(figsize=(8, 6))
plt.plot(prob_pred, prob_true, 's-', label='Model')
plt.plot([0, 1], [0, 1], '--', label='Perfectly calibrated')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.title('Calibration Curve')
plt.legend()
plt.show()
```

Interpreting the Plot

```
Perfectly calibrated: points on the diagonal line

Under-confident:            Over-confident:
Above diagonal              Below diagonal
Says 60%, is 80%            Says 80%, is 60%
```
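To put numbers on the plot, you can inspect the per-bin gap between observed and predicted frequencies returned by `calibration_curve`. A small sketch, reusing `prob_true` and `prob_pred` from the snippet above:

```python
import numpy as np

# Per-bin gap between what actually happened and what the model predicted.
# Positive gap = under-confident in that bin, negative gap = over-confident.
gaps = prob_true - prob_pred
for pred, true, gap in zip(prob_pred, prob_true, gaps):
    label = "under-confident" if gap > 0 else "over-confident"
    print(f"predicted {pred:.2f} -> observed {true:.2f} ({label}, gap {gap:+.2f})")

# One summary number: the average absolute gap across bins
print(f"Mean absolute gap: {np.abs(gaps).mean():.3f}")
```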

Brier Score

The Brier score is the mean squared error between predicted probabilities and actual outcomes, so it rewards both calibration and sharpness (lower is better):

```python
from sklearn.metrics import brier_score_loss

brier = brier_score_loss(y_test, y_prob)
print(f"Brier Score: {brier:.4f}")  # 0 = perfect, 0.25 = random
```
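To see it concretely, here's a quick sanity-check sketch computing the score by hand, reusing `y_test` and `y_prob` from above (assuming `y_test` is encoded as 0/1):

```python
import numpy as np

# Brier score by hand: mean squared difference between each predicted
# probability and the actual 0/1 outcome.
manual_brier = np.mean((np.asarray(y_prob) - np.asarray(y_test)) ** 2)
print(f"Manual Brier Score: {manual_brier:.4f}")  # matches brier_score_loss
```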

Which Models Need Calibration?

| Model | Typically Calibrated? |
|-------|----------------------|
| Logistic Regression | Usually good |
| Naive Bayes | Often extreme (needs calibration) |
| Random Forest | Usually under-confident |
| SVM | Often needs calibration |
| Neural Networks | Varies |
| XGBoost | Usually good |

Calibration Methods

### 1. Platt Scaling (Sigmoid)

Fits a logistic regression to the model's scores, mapping them to calibrated probabilities:

```python
from sklearn.calibration import CalibratedClassifierCV

# Wrap your model with calibration
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)

# Now probabilities are calibrated
y_prob_calibrated = calibrated.predict_proba(X_test)[:, 1]
```
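To make the idea concrete, here is a rough sketch of the principle behind sigmoid (Platt) scaling: fit a one-feature logistic regression that maps the base model's scores to probabilities, using data the base model was not trained on. This is the intuition only, not the exact internals of `CalibratedClassifierCV` (which handles the cross-validation for you):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hold out a separate calibration set so the calibrator isn't fit
# on the same data the base model was trained on.
X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train, test_size=0.3)

model.fit(X_fit, y_fit)
scores_cal = model.predict_proba(X_cal)[:, 1].reshape(-1, 1)

# One-feature logistic regression: raw score -> calibrated probability
platt = LogisticRegression()
platt.fit(scores_cal, y_cal)

scores_test = model.predict_proba(X_test)[:, 1].reshape(-1, 1)
y_prob_platt = platt.predict_proba(scores_test)[:, 1]
```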

### 2. Isotonic Regression

Non-parametric, more flexible:

```python
calibrated = CalibratedClassifierCV(model, method='isotonic', cv=5)
calibrated.fit(X_train, y_train)
```
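Conceptually, isotonic calibration replaces the sigmoid with a monotonically increasing step function learned from the data. A minimal sketch with scikit-learn's `IsotonicRegression`, reusing `scores_cal`, `y_cal`, and `scores_test` from the Platt sketch above:

```python
from sklearn.isotonic import IsotonicRegression

# Monotonic, piecewise-constant mapping from raw score to calibrated probability
iso = IsotonicRegression(out_of_bounds='clip')
iso.fit(scores_cal.ravel(), y_cal)

y_prob_iso = iso.predict(scores_test.ravel())
```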

### Comparing Methods

```python
# Original model
model.fit(X_train, y_train)
prob_original = model.predict_proba(X_test)[:, 1]

# Sigmoid calibration
cal_sigmoid = CalibratedClassifierCV(model, method='sigmoid', cv=5)
cal_sigmoid.fit(X_train, y_train)
prob_sigmoid = cal_sigmoid.predict_proba(X_test)[:, 1]

# Isotonic calibration
cal_isotonic = CalibratedClassifierCV(model, method='isotonic', cv=5)
cal_isotonic.fit(X_train, y_train)
prob_isotonic = cal_isotonic.predict_proba(X_test)[:, 1]

# Compare
print(f"Original Brier: {brier_score_loss(y_test, prob_original):.4f}")
print(f"Sigmoid Brier:  {brier_score_loss(y_test, prob_sigmoid):.4f}")
print(f"Isotonic Brier: {brier_score_loss(y_test, prob_isotonic):.4f}")
```

Choosing Calibration Method

| Method | Best When |
|--------|-----------|
| Sigmoid | Small calibration sets, S-shaped miscalibration |
| Isotonic | Large calibration sets, arbitrary miscalibration |

Sigmoid is safer with less data (isotonic can overfit).

Full Pipeline Example

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Split data (X, y are your features and labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train base model
rf = RandomForestClassifier(n_estimators=100)

# Create calibrated version
rf_calibrated = CalibratedClassifierCV(rf, method='sigmoid', cv=5)
rf_calibrated.fit(X_train, y_train)

# Compare calibration curves
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for ax, (name, clf) in zip(axes, [('Uncalibrated', rf), ('Calibrated', rf_calibrated)]):
    if name == 'Uncalibrated':
        clf.fit(X_train, y_train)
    prob = clf.predict_proba(X_test)[:, 1]
    prob_true, prob_pred = calibration_curve(y_test, prob, n_bins=10)
    ax.plot(prob_pred, prob_true, 's-')
    ax.plot([0, 1], [0, 1], '--')
    ax.set_title(f'{name}\nBrier: {brier_score_loss(y_test, prob):.4f}')

plt.tight_layout()
plt.show()
```

Key Takeaway

Calibration makes probability predictions reliable. Always check calibration with reliability diagrams, especially when probabilities drive decisions. Random forests and Naive Bayes often need calibration, while logistic regression is usually well-calibrated. Use sigmoid calibration by default, isotonic if you have lots of data.

#Machine Learning#Calibration#Probability#Intermediate