Ensemble Stacking: Combining Multiple Models
Learn how stacking combines predictions from multiple models using a meta-learner.
Random Forest averages trees. Gradient Boosting chains them. Stacking goes further: train different model types, then train another model to combine their predictions.
Stacking Architecture
Level 0 (Base Models):
┌─────────────┬─────────────┬─────────────┐
│  Logistic   │   Random    │     SVM     │
│ Regression  │   Forest    │             │
└──────┬──────┴──────┬──────┴──────┬──────┘
       │             │             │
       v             v             v
    pred_1        pred_2        pred_3
       │             │             │
       └─────────────┼─────────────┘
                     │
                     v
Level 1 (Meta-Learner):
              ┌─────────────┐
              │   XGBoost   │
              └──────┬──────┘
                     │
                     v
             Final Prediction
Why Stacking Works
Different models capture different patterns:
- Linear models: linear relationships
- Trees: non-linear interactions
- SVMs: complex boundaries
The meta-learner learns when to trust each model.
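Concretely, the meta-learner's input is just a matrix with one column per base model and one row per sample. A minimal sketch of that matrix, with made-up probability values:

```python
# Sketch: the meta-learner sees one column of predictions per base model.
# 5 samples, 3 base models; the probability values here are illustrative only.
import numpy as np

pred_lr  = np.array([0.9, 0.2, 0.6, 0.1, 0.8])   # logistic regression P(y=1)
pred_rf  = np.array([0.7, 0.4, 0.9, 0.2, 0.6])   # random forest P(y=1)
pred_svm = np.array([0.8, 0.1, 0.4, 0.3, 0.9])   # SVM P(y=1)

# Stack the columns: shape (n_samples, n_models) = (5, 3)
meta_features = np.column_stack([pred_lr, pred_rf, pred_svm])
print(meta_features.shape)  # (5, 3)
```

The meta-learner is then trained on `meta_features` exactly as if they were ordinary input features.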
Implementation with Scikit-Learn
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Base models
base_models = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('svm', SVC(probability=True)),
    ('knn', KNeighborsClassifier())
]

# Meta-learner
meta_model = GradientBoostingClassifier(n_estimators=50)

# Stack them
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,                          # Cross-validation for base model predictions
    stack_method='predict_proba'   # Use probabilities as features
)

stacking_clf.fit(X_train, y_train)
print(f"Accuracy: {stacking_clf.score(X_test, y_test):.3f}")
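The snippet above assumes `X_train`/`y_train` already exist. Here is a self-contained run that also scores each base model against the stack; the synthetic dataset (`make_classification`) and the split are assumptions added for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('svm', SVC(probability=True, random_state=0)),
    ('knn', KNeighborsClassifier()),
]

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=5,
    stack_method='predict_proba',
)

# Score each base model on its own, then the stacked ensemble
for name, model in base_models + [('stack', stack)]:
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```

On real data the ranking varies; the point is that the stack's score can be compared directly against its parts.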
Manual Implementation (More Control)
from sklearn.model_selection import cross_val_predict
import numpy as np

class SimpleStacker:
    def __init__(self, base_models, meta_model):
        self.base_models = base_models
        self.meta_model = meta_model

    def fit(self, X, y):
        # Get cross-validated (out-of-fold) predictions from base models
        base_predictions = []
        for name, model in self.base_models:
            preds = cross_val_predict(model, X, y, cv=5, method='predict_proba')
            base_predictions.append(preds[:, 1])  # Probability of positive class

        # Create meta-features
        meta_features = np.column_stack(base_predictions)

        # Train base models on full data
        for name, model in self.base_models:
            model.fit(X, y)

        # Train meta-model
        self.meta_model.fit(meta_features, y)
        return self

    def predict(self, X):
        # Get predictions from base models
        base_predictions = []
        for name, model in self.base_models:
            preds = model.predict_proba(X)[:, 1]
            base_predictions.append(preds)

        meta_features = np.column_stack(base_predictions)
        return self.meta_model.predict(meta_features)
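The `cross_val_predict` call is what prevents leakage: each row's meta-feature comes from a model that never saw that row during training. A quick self-contained check of what it returns, on synthetic data (the dataset is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)

# Out-of-fold probabilities: each row is predicted by a model
# trained on the other folds, never on the row itself
oof = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                        cv=5, method='predict_proba')
print(oof.shape)  # (200, 2): one column per class

# Contrast: in-fold probabilities from a model trained on all rows,
# including the ones it predicts (these would leak into the meta-learner)
in_fold = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
print(float(np.mean((oof[:, 1] > 0.5) == y)))      # honest accuracy estimate
print(float(np.mean((in_fold[:, 1] > 0.5) == y)))  # optimistic accuracy
```

If you trained the meta-learner on `in_fold` instead of `oof`, it would learn to over-trust whichever base model overfits hardest.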
Stacking for Regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
base_models = [
    ('ridge', Ridge()),
    ('lasso', Lasso()),
    ('rf', RandomForestRegressor(n_estimators=100))
]

stacking_reg = StackingRegressor(
    estimators=base_models,
    final_estimator=GradientBoostingRegressor(n_estimators=50),
    cv=5
)
stacking_reg.fit(X_train, y_train)
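As with the classifier, the snippet assumes `X_train`/`y_train` exist. A self-contained run on synthetic regression data (`make_regression`, the split, and the R² print are assumptions added here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (stand-in for a real dataset)
X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stacking_reg = StackingRegressor(
    estimators=[('ridge', Ridge()),
                ('lasso', Lasso()),
                ('rf', RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=GradientBoostingRegressor(n_estimators=50, random_state=0),
    cv=5,
)
stacking_reg.fit(X_train, y_train)
print(f"R^2: {stacking_reg.score(X_test, y_test):.3f}")  # score() returns R^2
```

Note that for regression the base models contribute raw predictions, so there is no `stack_method` choice to make.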
Tips for Better Stacking
1. Use Diverse Base Models
# Good: different model families
base_models = [
    ('linear', LogisticRegression()),     # Linear
    ('tree', RandomForestClassifier()),   # Tree-based
    ('svm', SVC(probability=True)),       # Kernel-based
    ('nn', MLPClassifier())               # Neural network (sklearn.neural_network)
]

# Less good: all tree-based
base_models = [
    ('rf1', RandomForestClassifier(n_estimators=100)),
    ('rf2', RandomForestClassifier(n_estimators=200)),
    ('gb', GradientBoostingClassifier())
]
2. Include Original Features (Optional)
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    passthrough=True  # Meta-learner also sees the original features
)
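You can see exactly what `passthrough` changes by inspecting the fitted stack's `transform` output, which is the matrix the final estimator actually receives. A self-contained sketch (synthetic data is an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
base_models = [('lr', LogisticRegression(max_iter=1000)),
               ('rf', RandomForestClassifier(n_estimators=50, random_state=0))]

shapes = {}
for passthrough in (False, True):
    clf = StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        passthrough=passthrough,
    ).fit(X, y)
    # transform() exposes the meta-feature matrix the final estimator sees
    shapes[passthrough] = clf.transform(X).shape
print(shapes)
```

With `passthrough=True` the meta-feature matrix gains one extra column per original feature; the meta-learner can then correct the base models using raw inputs, at the cost of a larger meta-problem to fit.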
3. Use Simple Meta-Learner
Complex meta-learners can overfit to the base predictions:
# Often works well
meta_model = LogisticRegression()

# Or a shallow tree ensemble
meta_model = GradientBoostingClassifier(n_estimators=50, max_depth=3)
When to Use Stacking
Good for:
- Kaggle competitions (squeezing every bit of accuracy)
- When different models excel at different subsets
- Final boost after individual models are tuned
Not ideal for:
- When interpretability is needed
- Real-time predictions (slow)
- When you need a quick solution
Key Takeaway
Stacking combines diverse models through a meta-learner that learns how much to trust each one. Use diverse base models (different model families, not just different hyperparameters), rely on cross-validated out-of-fold predictions to prevent data leakage, and keep the meta-learner simple. Stacking often gives a 1-3% improvement over the best single model: a small gain, but one that can be decisive in competitions!