
Handling Imbalanced Data

Deal with unequal class distributions.

Robert Anderson
December 18, 2025

Fix unbalanced datasets.

What is Imbalanced Data?

A dataset is imbalanced when one class has far more examples than the others.

**Example**: Fraud detection
- 99,000 normal transactions
- 1,000 fraud transactions

Why It's a Problem

A model can score well simply by always predicting the majority class:

Predict "Not Fraud" for everything → 99% accuracy, but it catches 0% of the actual fraud.
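To make the arithmetic concrete, here is a small sketch using only the standard library. It rebuilds the 99,000 / 1,000 split from the fraud example and scores a stand-in "model" that always answers "Not Fraud". The label encoding (1 = fraud, 0 = normal) is an assumption made for illustration.

```python
# Labels from the fraud example: 1 = fraud, 0 = normal (assumed encoding).
y_true = [0] * 99_000 + [1] * 1_000

# A stand-in "model" that predicts the majority class for every transaction.
y_pred = [0] * len(y_true)

# Accuracy: share of all predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall on the fraud class: frauds caught / all real frauds.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print(f"Accuracy: {accuracy:.2%}")    # Accuracy: 99.00%
print(f"Fraud recall: {recall:.2%}")  # Fraud recall: 0.00%
```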

Solution 1: Oversampling

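Add more minority-class examples. SMOTE does this by synthesizing new samples rather than duplicating existing ones:

```python
from imblearn.over_sampling import SMOTE

# X, y: feature matrix and labels, loaded elsewhere.
# Synthesize minority-class samples until the classes are the same size.
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X, y)

print(f"Original: {len(y)}")
print(f"After SMOTE: {len(y_resampled)}")
```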
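For intuition about what SMOTE generates, the sketch below shows only its core interpolation step: a synthetic sample is a random point on the line segment between a minority sample and another minority sample near it. This is a simplified illustration, not imblearn's implementation. The real algorithm picks the second point from the k nearest minority neighbors, and `synthetic_point` and the two example transactions are hypothetical.

```python
import random

def synthetic_point(sample, neighbor, rng):
    """Return a random point on the segment between two minority samples."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [s + lam * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two hypothetical fraud transactions described by (amount, hour of day).
fraud_a = [120.0, 2.0]
fraud_b = [180.0, 4.0]

# The new sample lies somewhere between the two originals on every feature.
new_fraud = synthetic_point(fraud_a, fraud_b, rng)
print(new_fraud)
```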

Solution 2: Undersampling

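Remove majority-class examples until the classes are closer in size:

```python
from imblearn.under_sampling import RandomUnderSampler

# Randomly drop majority-class rows until the classes are the same size.
rus = RandomUnderSampler()
X_resampled, y_resampled = rus.fit_resample(X, y)
```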
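The idea behind random undersampling fits in a few lines of plain Python. The following is a minimal sketch, assuming list-based data and a hypothetical helper named `random_undersample`. It is not how imblearn implements `RandomUnderSampler`, only the same idea: keep a random subset of each class, sized to match the rarest class.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 990 normal rows and 10 fraud rows with one dummy feature each.
X = [[float(i)] for i in range(1000)]
y = [0] * 990 + [1] * 10

X_small, y_small = random_undersample(X, y)
print(Counter(y_small))
```

Note the cost: 980 of the 990 normal rows are thrown away, which is why undersampling suits large datasets best.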

Solution 3: Class Weights

Tell the model to penalize mistakes on the minority class more heavily:

```python
from sklearn.ensemble import RandomForestClassifier

# 'balanced' weights each class inversely to its frequency in y.
model = RandomForestClassifier(class_weight='balanced')
model.fit(X, y)
```
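scikit-learn documents the `'balanced'` mode as `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that formula in plain Python for the fraud example, to show the sizes of the weights involved. The helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

y = [0] * 99_000 + [1] * 1_000  # 0 = normal, 1 = fraud (assumed encoding)
weights = balanced_class_weights(y)
print(weights)
```

With these counts the fraud class gets a weight of 50.0 and the normal class about 0.505, so a mistake on a fraud example counts 99 times as much as a mistake on a normal one.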

Solution 4: Different Metrics

Don't use accuracy! Use:

- **Precision**: Of predicted frauds, how many were real?
- **Recall**: Of real frauds, how many did we catch?
- **F1-Score**: The harmonic mean of precision and recall

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 on the held-out test set.
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
```
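The three metrics can also be computed by hand from confusion-matrix counts, which helps when reading the report. This is a minimal sketch. The helper `precision_recall_f1` and the counts passed to it are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 600 frauds caught, 200 false alarms, 400 frauds missed.
precision, recall, f1 = precision_recall_f1(tp=600, fp=200, fn=400)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.75 recall=0.60 f1=0.67
```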

Real Example - Credit Card Fraud

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Apply SMOTE to the training split only; the test set stays untouched.
smote = SMOTE()
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)

# Train with class weights (close to 1 per class once SMOTE has balanced the data).
model = RandomForestClassifier(class_weight='balanced')
model.fit(X_train_balanced, y_train_balanced)
```

When to Use What

- **Lots of data**: Undersample
- **Little data**: Oversample (SMOTE)
- **Can't resample**: Use class weights

Remember

- Never use accuracy alone
- Use F1-score or AUC
- Combine multiple techniques
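AUC is mentioned above without a definition. One way to read ROC AUC is as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function `roc_auc` and the scores are hypothetical, and the pairwise loop is only practical for small inputs.

```python
def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical fraud scores from a model; 1 = fraud, 0 = normal.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]
print(roc_auc(y_true, scores))  # 0.875
```

Because it depends only on how examples are ranked, this number does not reward a model for predicting the majority class everywhere.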
