
# Debugging Machine Learning Models

Learn systematic approaches to debugging when your ML model isn't performing as expected.

Sarah Chen
December 19, 2025

Your model isn't working. Before trying random things, follow a systematic debugging approach.

## The Debugging Checklist

1. Check your data
2. Start simple
3. Verify the pipeline
4. Analyze errors
5. Check for bugs in code

## Step 1: Check Your Data

Most ML problems are data problems.

### Look at Your Data

```python
# Basic stats
print(df.describe())
print(df.info())

# Target distribution
print(df['target'].value_counts(normalize=True))

# Missing values
print(df.isnull().sum())

# Actually look at examples
print(df.head(20))
```

### Check for Data Issues

```python
# Duplicates
print(f"Duplicates: {df.duplicated().sum()}")

# Outliers: compare the 1st and 99th percentiles
numeric_cols = df.select_dtypes(include='number').columns
for col in numeric_cols:
    q99 = df[col].quantile(0.99)
    q01 = df[col].quantile(0.01)
    print(f"{col}: 1%={q01:.2f}, 99%={q99:.2f}")

# Class imbalance
print(df['target'].value_counts())

# Leakage check: suspiciously high correlations with the target
correlations = df.corr(numeric_only=True)['target'].sort_values(ascending=False)
print(correlations.head(10))
```

## Step 2: Start Simple

If a complex model doesn't work, try something simpler first.

```python
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

# Baseline: how good is always predicting the majority class?
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_train, y_train)
print(f"Dummy baseline: {dummy.score(X_test, y_test):.3f}")

# Simple model
lr = LogisticRegression()
lr.fit(X_train, y_train)
print(f"Logistic Regression: {lr.score(X_test, y_test):.3f}")

# If LR doesn't beat the dummy, you have a data problem
```

## Step 3: Verify the Pipeline

### Can the Model Overfit?

A model should be able to memorize training data:

```python
# Train on a small subset
X_small = X_train[:100]
y_small = y_train[:100]

model.fit(X_small, y_small)
train_score = model.score(X_small, y_small)

print(f"Train score on small data: {train_score:.3f}")
# Should be close to 1.0! If not, the model can't learn at all.
```

### Check Preprocessing

```python
import numpy as np

# Verify shapes at each step
print(f"Original: {X.shape}")
print(f"After preprocessing: {X_processed.shape}")
print(f"After train/test split: {X_train.shape}, {X_test.shape}")

# Check for NaN after preprocessing
print(f"NaN in processed: {np.isnan(X_processed).sum()}")

# Check for inf
print(f"Inf in processed: {np.isinf(X_processed).sum()}")
```

### Verify Labels

```python
import numpy as np

# Are the labels what you expect?
print(f"Unique labels: {np.unique(y)}")
print(f"Label dtype: {y.dtype}")

# Check alignment
assert len(X) == len(y), "X and y length mismatch!"
```

## Step 4: Analyze Errors

### Confusion Matrix

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm).plot()

# What's it getting wrong?
```
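Alongside the confusion matrix, `classification_report` breaks performance down into per-class precision, recall, and F1, which often shows exactly which class is dragging the score down. A small sketch with toy labels (the values here are illustrative, not from the article):

```python
from sklearn.metrics import classification_report

y_test = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2]

# Human-readable per-class precision / recall / F1
print(classification_report(y_test, y_pred))

# Machine-readable form for programmatic checks
report = classification_report(y_test, y_pred, output_dict=True)
```

With `output_dict=True` you can assert on per-class metrics in tests instead of eyeballing the table.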

### Look at Misclassified Examples

```python
# Find errors
mask = y_test != y_pred
errors = X_test[mask]
error_true = y_test[mask]
error_pred = y_pred[mask]

# Look at them
for i in range(min(10, len(errors))):
    print(f"True: {error_true.iloc[i]}, Pred: {error_pred[i]}")
    print(errors.iloc[i])
    print("---")
```

### Error Analysis by Subgroup

```python
from sklearn.metrics import accuracy_score

# Performance by category
for category in X_test['category'].unique():
    mask = X_test['category'] == category
    score = accuracy_score(y_test[mask], y_pred[mask])
    print(f"{category}: {score:.3f}")
```

## Step 5: Common Bugs

### Bug: Using Test Data for Training

```python
# WRONG: the scaler sees the test data!
scaler.fit(X)
model.fit(X_train, y_train)

# RIGHT: fit the scaler on training data only
scaler.fit(X_train)
model.fit(X_train, y_train)
```
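One way to make this bug hard to reintroduce is to wrap preprocessing and model in a single sklearn `Pipeline`, so the scaler is refit on only the training portion of every cross-validation fold. A minimal sketch on synthetic data (the dataset and pipeline names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# The scaler is refit inside each fold, so the
# held-out fold never leaks into preprocessing.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression()),
])

scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f}")
```

Because the whole pipeline is fit and scored as one estimator, there is no separate `scaler.fit(X)` call to get wrong.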

### Bug: Shuffled Labels

```python
# Check that labels still match the data
# After any reshaping or shuffling, verify alignment
print(f"First 5 X rows:\n{X.head()}")
print(f"First 5 y values:\n{y.head()}")
```
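When you do need to shuffle, shuffle X and y together so rows and labels stay paired. A small sketch using `sklearn.utils.shuffle` on toy arrays:

```python
import numpy as np
from sklearn.utils import shuffle

X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 0])

# shuffle() applies the SAME permutation to both arrays,
# so row i of X_shuf still belongs with y_shuf[i]
X_shuf, y_shuf = shuffle(X, y, random_state=42)

# Verify: every row kept its original label
orig_labels = {tuple(row): label for row, label in zip(X, y)}
assert all(orig_labels[tuple(row)] == label
           for row, label in zip(X_shuf, y_shuf))
```

Shuffling X and y in two separate calls (or with different random states) is a classic way to produce the scrambled-labels bug this check catches.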

### Bug: Wrong Metric

```python
from sklearn.metrics import accuracy_score, f1_score

# For imbalanced data, accuracy is misleading
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"F1: {f1_score(y_test, y_pred):.3f}")  # Much more informative
```

## Debugging Workflow

```
Model not learning?
├── Check: Can it overfit small data?
│   └── No → Bug in code or data
├── Check: Does simple model work?
│   └── No → Data problem
└── Check: Is training loss decreasing?
    └── No → Learning rate / architecture issue

Model overfitting?
├── Try regularization
├── Try simpler model
├── Get more data
└── Check for data leakage

Model underfitting?
├── Try more complex model
├── Add features
├── Reduce regularization
└── Check preprocessing
```
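To decide between the overfitting and underfitting branches, compare train and test scores: a large gap points to overfitting, while two similarly low scores point to underfitting. A minimal sketch on synthetic data, using an unconstrained decision tree because it memorizes training data easily:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree will memorize the training set
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

gap = train_score - test_score
print(f"Train: {train_score:.3f}, Test: {test_score:.3f}, Gap: {gap:.3f}")
# Rule of thumb: large gap → overfitting; both scores low → underfitting
```

Swapping in a constrained model (e.g. `max_depth=3`) and watching the gap shrink is a quick way to confirm the overfitting diagnosis.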

## Key Takeaway

Debug systematically, not randomly. Start with data checks, verify you can overfit, use simple baselines, and analyze actual errors. Most problems are data problems or bugs in preprocessing. When stuck, simplify until something works, then add complexity back gradually.

Tags: Machine Learning, Debugging, Troubleshooting, Intermediate