# Model Calibration: Reliable Probability Estimates
Learn how to calibrate your model so predicted probabilities reflect actual likelihoods.
When your model says "80% probability of rain," it should actually rain 80% of the time. That's calibration - making probability estimates reliable.
## Why Calibration Matters
```
Uncalibrated model says 90% confidence:
- Sometimes right 90% of the time ✓
- Sometimes right only 60% of the time ✗

Calibrated model says 90% confidence:
- Right 90% of the time ✓
```
Important for:
- Medical diagnosis
- Risk assessment
- Decision making with costs/benefits
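To make the costs/benefits point concrete, here is a minimal sketch of an expected-cost decision rule; the cost numbers are hypothetical and purely illustrative:

```python
# Hypothetical costs, purely for illustration
COST_FALSE_NEGATIVE = 500   # cost of doing nothing when the case is real
COST_FALSE_POSITIVE = 50    # cost of acting when it was unnecessary

def should_act(p_positive: float) -> bool:
    """Act when the expected cost of waiting exceeds the expected cost of acting."""
    expected_cost_waiting = p_positive * COST_FALSE_NEGATIVE
    expected_cost_acting = (1 - p_positive) * COST_FALSE_POSITIVE
    return expected_cost_waiting > expected_cost_acting

print(should_act(0.15))  # True here, but only meaningful if 0.15 is a calibrated probability
```

This kind of arithmetic is only trustworthy when the predicted probability actually matches the real-world frequency.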
## Checking Calibration: Reliability Diagram
```python
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Get predicted probabilities for the positive class
y_prob = model.predict_proba(X_test)[:, 1]

# Create calibration curve
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=10)

# Plot against the diagonal of perfect calibration
plt.figure(figsize=(8, 6))
plt.plot(prob_pred, prob_true, 's-', label='Model')
plt.plot([0, 1], [0, 1], '--', label='Perfectly calibrated')
plt.xlabel('Mean predicted probability')
plt.ylabel('Fraction of positives')
plt.title('Calibration Curve')
plt.legend()
plt.show()
```
### Interpreting the Plot
```
Perfectly calibrated:  Points on the diagonal line

Under-confident:          Over-confident:
Above the diagonal        Below the diagonal
Says 60%, is 80%          Says 80%, is 60%
```
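The reliability diagram is just binning and counting. A rough sketch of what `calibration_curve` computes, assuming `y_prob` and `y_test` come from the snippet above:

```python
import numpy as np

# Rough sketch of what calibration_curve does: bin the predictions,
# then compare mean predicted probability with observed frequency per bin.
y_prob_arr = np.asarray(y_prob)
y_true_arr = np.asarray(y_test)

bin_edges = np.linspace(0.0, 1.0, 11)
bin_ids = np.digitize(y_prob_arr, bin_edges[1:-1])  # 10 bins, indices 0..9

for b in range(10):
    mask = bin_ids == b
    if not mask.any():
        continue
    mean_pred = y_prob_arr[mask].mean()   # what the model claims
    frac_pos = y_true_arr[mask].mean()    # what actually happened
    print(f"bin {b}: predicted {mean_pred:.2f}, observed {frac_pos:.2f}")
```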
## Brier Score
The Brier score measures calibration quality as the mean squared difference between predicted probabilities and actual 0/1 outcomes (lower is better):
```python
from sklearn.metrics import brier_score_loss

brier = brier_score_loss(y_test, y_prob)
print(f"Brier Score: {brier:.4f}")  # 0 = perfect, 0.25 = random
```
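Because it is just a mean squared difference, you can verify it by hand; a quick check, assuming `y_prob` and `y_test` from the earlier snippets:

```python
import numpy as np

# Brier score by hand: mean squared difference between probability and outcome
manual_brier = np.mean((np.asarray(y_prob) - np.asarray(y_test)) ** 2)
print(f"Manual Brier: {manual_brier:.4f}")  # should match brier_score_loss
```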
## Which Models Need Calibration?
| Model | Typically Calibrated? |
|-------|----------------------|
| Logistic Regression | Usually good |
| Naive Bayes | Often extreme (needs calibration) |
| Random Forest | Usually under-confident |
| SVM | Often needs calibration |
| Neural Networks | Varies |
| XGBoost | Usually good |
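You can reproduce the pattern in this table on synthetic data. A small, self-contained comparison of Naive Bayes and logistic regression (exact numbers depend on the dataset, so treat this as an illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import calibration_curve

# Synthetic data just to illustrate the typical shapes
X_syn, y_syn = make_classification(n_samples=5000, n_features=20,
                                   n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

for name, clf in [('Naive Bayes', GaussianNB()),
                  ('Logistic Regression', LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    prob_true, prob_pred = calibration_curve(y_te, p, n_bins=10)
    # Naive Bayes tends to push probabilities toward 0 and 1;
    # logistic regression usually stays closer to the diagonal.
    print(name)
    for observed, predicted in zip(prob_true, prob_pred):
        print(f"  predicted {predicted:.2f} -> observed {observed:.2f}")
```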
## Calibration Methods
### 1. Platt Scaling (Sigmoid)
Fits a logistic regression to the model's output scores, mapping them to calibrated probabilities:
```python
from sklearn.calibration import CalibratedClassifierCV

# Wrap your model with calibration
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)

# Now probabilities are calibrated
y_prob_calibrated = calibrated.predict_proba(X_test)[:, 1]
```
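Conceptually, Platt scaling is a one-feature logistic regression fit on held-out scores. `CalibratedClassifierCV` handles the cross-fitting for you; the sketch below only illustrates the idea, reusing `model`, `X_train`, `y_train`, and `X_test` from the snippets above (note it refits the model on a reduced training set):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hold out part of the training data for calibration
X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# Refit the base model on the reduced training set
model.fit(X_fit, y_fit)
scores_cal = model.predict_proba(X_cal)[:, 1].reshape(-1, 1)

# One-feature logistic regression: learns a sigmoid mapping score -> calibrated probability
platt = LogisticRegression()
platt.fit(scores_cal, y_cal)

scores_test = model.predict_proba(X_test)[:, 1].reshape(-1, 1)
y_prob_platt = platt.predict_proba(scores_test)[:, 1]
```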
### 2. Isotonic Regression
Non-parametric and more flexible, isotonic regression fits a monotone step function to the model's outputs:
```python
calibrated = CalibratedClassifierCV(model, method='isotonic', cv=5)
calibrated.fit(X_train, y_train)
```
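The isotonic version swaps the sigmoid for a monotone, piecewise-constant mapping. Again just a sketch of the idea, reusing the held-out scores from the Platt sketch above (`CalibratedClassifierCV` does the equivalent internally with cross-fitting):

```python
from sklearn.isotonic import IsotonicRegression

# Monotone step-function mapping from score to calibrated probability
iso = IsotonicRegression(out_of_bounds='clip')
iso.fit(scores_cal.ravel(), y_cal)
y_prob_iso = iso.predict(scores_test.ravel())
```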
### Comparing Methods
```python
# Original model
model.fit(X_train, y_train)
prob_original = model.predict_proba(X_test)[:, 1]

# Sigmoid calibration
cal_sigmoid = CalibratedClassifierCV(model, method='sigmoid', cv=5)
cal_sigmoid.fit(X_train, y_train)
prob_sigmoid = cal_sigmoid.predict_proba(X_test)[:, 1]

# Isotonic calibration
cal_isotonic = CalibratedClassifierCV(model, method='isotonic', cv=5)
cal_isotonic.fit(X_train, y_train)
prob_isotonic = cal_isotonic.predict_proba(X_test)[:, 1]

# Compare Brier scores (lower is better)
print(f"Original Brier: {brier_score_loss(y_test, prob_original):.4f}")
print(f"Sigmoid Brier:  {brier_score_loss(y_test, prob_sigmoid):.4f}")
print(f"Isotonic Brier: {brier_score_loss(y_test, prob_isotonic):.4f}")
```
## Choosing a Calibration Method
| Method | Best When |
|--------|-----------|
| Sigmoid | Limited data, roughly S-shaped miscalibration |
| Isotonic | Lots of data, arbitrary miscalibration |
Sigmoid is safer with less data (isotonic can overfit).
## Full Pipeline Example
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Base model (fit inside the loop below for the uncalibrated comparison)
rf = RandomForestClassifier(n_estimators=100)

# Create and fit the calibrated version
rf_calibrated = CalibratedClassifierCV(rf, method='sigmoid', cv=5)
rf_calibrated.fit(X_train, y_train)

# Compare calibration curves side by side
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for ax, (name, clf) in zip(axes, [('Uncalibrated', rf), ('Calibrated', rf_calibrated)]):
    if name == 'Uncalibrated':
        clf.fit(X_train, y_train)
    prob = clf.predict_proba(X_test)[:, 1]
    prob_true, prob_pred = calibration_curve(y_test, prob, n_bins=10)
    ax.plot(prob_pred, prob_true, 's-')
    ax.plot([0, 1], [0, 1], '--')
    ax.set_title(f'{name}\nBrier: {brier_score_loss(y_test, prob):.4f}')

plt.tight_layout()
plt.show()
```
## Key Takeaway
Calibration makes probability predictions reliable. Always check calibration with reliability diagrams, especially when probabilities drive decisions. Random forests and Naive Bayes often need calibration, while logistic regression is usually well-calibrated. Use sigmoid calibration by default, isotonic if you have lots of data.