Ensemble Stacking: Combining Multiple Models
Learn how stacking combines predictions from multiple models using a meta-learner.
Random Forest averages trees. Gradient Boosting chains them. Stacking goes further: train different model types, then train another model to combine their predictions.
Stacking Architecture
Level 0 (Base Models):
┌─────────────┬─────────────┬─────────────┐
│  Logistic   │   Random    │     SVM     │
│ Regression  │   Forest    │             │
└──────┬──────┴──────┬──────┴──────┬──────┘
       │             │             │
       v             v             v
    pred_1        pred_2        pred_3
       │             │             │
       └─────────────┼─────────────┘
                     │
                     v
Level 1 (Meta-Learner):
              ┌─────────────┐
              │   XGBoost   │
              └──────┬──────┘
                     │
                     v
             Final Prediction
Why Stacking Works
Different models capture different patterns:
- Linear models: linear relationships
- Trees: non-linear interactions
- SVMs: complex boundaries
The meta-learner learns when to trust each model.
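Concretely, the meta-learner's input is just a matrix with one column per base model and one row per sample. A minimal sketch of that matrix, with made-up probability values:

```python
# Sketch: the meta-learner sees one column of predictions per base model.
# 5 samples, 3 base models; the probability values here are illustrative only.
import numpy as np

pred_lr  = np.array([0.9, 0.2, 0.6, 0.1, 0.8])   # logistic regression P(y=1)
pred_rf  = np.array([0.7, 0.4, 0.9, 0.2, 0.6])   # random forest P(y=1)
pred_svm = np.array([0.8, 0.1, 0.4, 0.3, 0.9])   # SVM P(y=1)

# Stack the columns: shape (n_samples, n_models) = (5, 3)
meta_features = np.column_stack([pred_lr, pred_rf, pred_svm])
print(meta_features.shape)  # (5, 3)
```

The meta-learner is then trained on `meta_features` exactly as if they were ordinary input features.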
Implementation with Scikit-Learn
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Base models
base_models = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('svm', SVC(probability=True)),
    ('knn', KNeighborsClassifier())
]

# Meta-learner
meta_model = GradientBoostingClassifier(n_estimators=50)

# Stack them
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,                          # Cross-validation for base model predictions
    stack_method='predict_proba'   # Use probabilities as features
)

stacking_clf.fit(X_train, y_train)
print(f"Accuracy: {stacking_clf.score(X_test, y_test):.3f}")
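The snippet above assumes `X_train`/`y_train` already exist. Here is a self-contained run that also scores each base model against the stack; the synthetic dataset (`make_classification`) and the split are assumptions added for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('svm', SVC(probability=True, random_state=0)),
    ('knn', KNeighborsClassifier()),
]

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=5,
    stack_method='predict_proba',
)

# Score each base model on its own, then the stacked ensemble
for name, model in base_models + [('stack', stack)]:
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```

On real data the ranking varies; the point is that the stack's score can be compared directly against its parts.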
Manual Implementation (More Control)
from sklearn.model_selection import cross_val_predict
import numpy as np

class SimpleStacker:
    def __init__(self, base_models, meta_model):
        self.base_models = base_models
        self.meta_model = meta_model

    def fit(self, X, y):
        # Get cross-validated (out-of-fold) predictions from base models
        base_predictions = []
        for name, model in self.base_models:
            preds = cross_val_predict(model, X, y, cv=5, method='predict_proba')
            base_predictions.append(preds[:, 1])  # Probability of positive class

        # Create meta-features
        meta_features = np.column_stack(base_predictions)

        # Train base models on full data
        for name, model in self.base_models:
            model.fit(X, y)

        # Train meta-model
        self.meta_model.fit(meta_features, y)
        return self

    def predict(self, X):
        # Get predictions from base models
        base_predictions = []
        for name, model in self.base_models:
            preds = model.predict_proba(X)[:, 1]
            base_predictions.append(preds)

        meta_features = np.column_stack(base_predictions)
        return self.meta_model.predict(meta_features)
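The `cross_val_predict` call is what prevents leakage: each row's meta-feature comes from a model that never saw that row during training. A quick self-contained check of what it returns, on synthetic data (the dataset is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)

# Out-of-fold probabilities: each row is predicted by a model
# trained on the other folds, never on the row itself
oof = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                        cv=5, method='predict_proba')
print(oof.shape)  # (200, 2): one column per class

# Contrast: in-fold probabilities from a model trained on all rows,
# including the ones it predicts (these would leak into the meta-learner)
in_fold = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
print(float(np.mean((oof[:, 1] > 0.5) == y)))      # honest accuracy estimate
print(float(np.mean((in_fold[:, 1] > 0.5) == y)))  # optimistic accuracy
```

If you trained the meta-learner on `in_fold` instead of `oof`, it would learn to over-trust whichever base model overfits hardest.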
Stacking for Regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
base_models = [
    ('ridge', Ridge()),
    ('lasso', Lasso()),
    ('rf', RandomForestRegressor(n_estimators=100))
]

stacking_reg = StackingRegressor(
    estimators=base_models,
    final_estimator=GradientBoostingRegressor(n_estimators=50),
    cv=5
)
stacking_reg.fit(X_train, y_train)
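As with the classifier, the snippet assumes `X_train`/`y_train` exist. A self-contained run on synthetic regression data (`make_regression`, the split, and the R² print are assumptions added here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (stand-in for a real dataset)
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stacking_reg = StackingRegressor(
    estimators=[('ridge', Ridge()),
                ('lasso', Lasso()),
                ('rf', RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=GradientBoostingRegressor(n_estimators=50, random_state=0),
    cv=5,
)
stacking_reg.fit(X_train, y_train)
print(f"R^2: {stacking_reg.score(X_test, y_test):.3f}")  # score() returns R^2
```

Note that for regression the base models contribute raw predictions, so there is no `stack_method` choice to make.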
Tips for Better Stacking
1. Use Diverse Base Models
# Good: different model families
base_models = [
    ('linear', LogisticRegression()),     # Linear
    ('tree', RandomForestClassifier()),   # Tree-based
    ('svm', SVC(probability=True)),       # Kernel-based
    ('nn', MLPClassifier())               # Neural network (sklearn.neural_network)
]

# Less good: all tree-based
base_models = [
    ('rf1', RandomForestClassifier(n_estimators=100)),
    ('rf2', RandomForestClassifier(n_estimators=200)),
    ('gb', GradientBoostingClassifier())
]
2. Include Original Features (Optional)
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    passthrough=True  # Meta-learner also sees the original features
)
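You can see exactly what `passthrough` changes by inspecting the fitted stack's `transform` output, which is the matrix the final estimator actually receives. A self-contained sketch (synthetic data is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
base_models = [('lr', LogisticRegression(max_iter=1000)),
               ('rf', RandomForestClassifier(n_estimators=50, random_state=0))]

shapes = {}
for passthrough in (False, True):
    clf = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        passthrough=passthrough,
    ).fit(X, y)
    # transform() exposes the meta-feature matrix the final estimator sees
    shapes[passthrough] = clf.transform(X).shape
print(shapes)
```

With `passthrough=True` the meta-feature matrix gains one extra column per original feature; the meta-learner can then correct the base models using raw inputs, at the cost of a larger meta-problem to fit.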
3. Use Simple Meta-Learner
Complex meta-learners can overfit to the base predictions:
# Often works well
meta_model = LogisticRegression()

# Or a shallow tree ensemble
meta_model = GradientBoostingClassifier(n_estimators=50, max_depth=3)
When to Use Stacking
Good for:
- Kaggle competitions (squeezing every bit of accuracy)
- When different models excel at different subsets
- Final boost after individual models are tuned
Not ideal for:
- When interpretability is needed
- Real-time predictions (slow)
- When you need a quick solution
Key Takeaway
Stacking combines diverse models through a meta-learner that learns how much to trust each one. Use diverse base models (different model families, not just different hyperparameters), rely on cross-validated out-of-fold predictions to prevent data leakage, and keep the meta-learner simple. Stacking often gives a 1-3% improvement over the best single model: a small gain, but one that can be decisive in competitions!