# Ensemble Stacking: Combining Multiple Models
Learn how stacking combines predictions from multiple models using a meta-learner.
Random Forest averages trees. Gradient Boosting chains them. Stacking goes further: train different model types, then train another model to combine their predictions.
## Stacking Architecture
```
Level 0 (Base Models):
┌─────────────┬─────────────┬─────────────┐
│  Logistic   │   Random    │     SVM     │
│ Regression  │   Forest    │             │
└──────┬──────┴──────┬──────┴──────┬──────┘
       │             │             │
       v             v             v
    pred_1        pred_2        pred_3
       │             │             │
       └─────────────┼─────────────┘
                     │
                     v
Level 1 (Meta-Learner):
              ┌─────────────┐
              │   XGBoost   │
              └──────┬──────┘
                     │
                     v
             Final Prediction
```
## Why Stacking Works
Different models capture different patterns:

- Linear models: linear relationships
- Trees: non-linear interactions
- SVMs: complex decision boundaries
The meta-learner learns when to trust each model.
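
One way to see this concretely is to give the stack a linear meta-learner and read off its coefficients, which show how heavily it leans on each base model. A minimal sketch, assuming a synthetic binary classification dataset (the data and model choices here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

stack = StackingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        ('svm', SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(),  # linear meta-learner -> readable weights
    stack_method='predict_proba',
    cv=5,
)
stack.fit(X, y)

# For a binary problem scikit-learn keeps one probability column per base model,
# so each coefficient reflects how much the meta-learner trusts that model.
for name, coef in zip(['lr', 'rf', 'svm'], stack.final_estimator_.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```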
## Implementation with Scikit-Learn
```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Base models
base_models = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('svm', SVC(probability=True)),
    ('knn', KNeighborsClassifier())
]

# Meta-learner
meta_model = GradientBoostingClassifier(n_estimators=50)

# Stack them
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,                          # Cross-validation for base model predictions
    stack_method='predict_proba'   # Use probabilities as features
)

stacking_clf.fit(X_train, y_train)
print(f"Accuracy: {stacking_clf.score(X_test, y_test):.3f}")
```
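
It is worth checking that the stack actually beats its parts. A quick comparison sketch, reusing the `base_models`, `stacking_clf`, and train/test variables from the snippet above:

```python
# Fit each base model on its own and compare test accuracy against the stack.
for name, model in base_models:
    model.fit(X_train, y_train)
    print(f"{name:>5}: {model.score(X_test, y_test):.3f}")

print(f"stack: {stacking_clf.score(X_test, y_test):.3f}")
```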
## Manual Implementation (More Control)
```python
from sklearn.model_selection import cross_val_predict
import numpy as np


class SimpleStacker:
    def __init__(self, base_models, meta_model):
        self.base_models = base_models
        self.meta_model = meta_model

    def fit(self, X, y):
        # Get cross-validated predictions from base models
        base_predictions = []
        for name, model in self.base_models:
            preds = cross_val_predict(model, X, y, cv=5, method='predict_proba')
            base_predictions.append(preds[:, 1])  # Probability of positive class

        # Create meta-features
        meta_features = np.column_stack(base_predictions)

        # Train base models on full data
        for name, model in self.base_models:
            model.fit(X, y)

        # Train meta-model
        self.meta_model.fit(meta_features, y)
        return self

    def predict(self, X):
        # Get predictions from base models
        base_predictions = []
        for name, model in self.base_models:
            preds = model.predict_proba(X)[:, 1]
            base_predictions.append(preds)

        meta_features = np.column_stack(base_predictions)
        return self.meta_model.predict(meta_features)
```
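
A usage sketch for `SimpleStacker` on synthetic data (the dataset and the particular base/meta models below are illustrative choices, not part of the class itself):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stacker = SimpleStacker(
    base_models=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    meta_model=LogisticRegression(),
)
stacker.fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, stacker.predict(X_test)):.3f}")
```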
## Stacking for Regression
```python
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

base_models = [
    ('ridge', Ridge()),
    ('lasso', Lasso()),
    ('rf', RandomForestRegressor(n_estimators=100))
]

stacking_reg = StackingRegressor(
    estimators=base_models,
    final_estimator=GradientBoostingRegressor(n_estimators=50),
    cv=5
)

stacking_reg.fit(X_train, y_train)
```
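
To sanity-check the regressor stack, one option is to compare it against each base regressor on a held-out split. This evaluation sketch assumes the train/test variables from above; the R² metric is an illustrative choice:

```python
from sklearn.metrics import r2_score

print(f"stack R^2: {r2_score(y_test, stacking_reg.predict(X_test)):.3f}")

# Fit each base regressor on its own for comparison.
for name, model in base_models:
    model.fit(X_train, y_train)
    print(f"{name:>5} R^2: {r2_score(y_test, model.predict(X_test)):.3f}")
```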
## Tips for Better Stacking
### 1. Use Diverse Base Models
```python
from sklearn.neural_network import MLPClassifier

# Good: different model types
base_models = [
    ('linear', LogisticRegression()),      # Linear
    ('tree', RandomForestClassifier()),    # Tree-based
    ('svm', SVC(probability=True)),        # Kernel-based
    ('nn', MLPClassifier())                # Neural network
]

# Less good: all trees
base_models = [
    ('rf1', RandomForestClassifier(n_estimators=100)),
    ('rf2', RandomForestClassifier(n_estimators=200)),
    ('gb', GradientBoostingClassifier())
]
```
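
One rough way to check diversity is to look at how correlated the base models' out-of-fold predictions are: models whose predictions correlate highly add little beyond one another. A sketch on an illustrative synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('svm', SVC(probability=True)),
]

# Out-of-fold positive-class probabilities, one column per model.
probs = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method='predict_proba')[:, 1]
    for _, model in models
])

# Pairwise correlation matrix: values near 1.0 mean the models are redundant.
print(np.round(np.corrcoef(probs.T), 2))
```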
### 2. Include Original Features (Optional)
```python
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    passthrough=True  # Include original features for meta-learner
)
```
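
To make the effect of `passthrough` concrete, a small sketch comparing how many input columns the meta-learner sees with and without it (the synthetic data and model choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
base = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
]

for passthrough in (False, True):
    clf = StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(),
        passthrough=passthrough,
    ).fit(X, y)
    # Without passthrough: one probability column per base model (binary case).
    # With passthrough: those columns plus the 20 original features.
    print(passthrough, clf.final_estimator_.n_features_in_)
```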
### 3. Use Simple Meta-Learner
Complex meta-learners can overfit to the base predictions:
```python
# Often works well
meta_model = LogisticRegression()

# Or a simple tree model
meta_model = GradientBoostingClassifier(n_estimators=50, max_depth=3)
```
## When to Use Stacking

**Good for:**

- Kaggle competitions (squeezing out every bit of accuracy)
- Problems where different models excel on different subsets of the data
- A final boost after the individual models have been tuned

**Not ideal for:**

- When interpretability is needed
- Real-time predictions (inference requires running every base model)
- When you need a quick solution
## Key Takeaway
Stacking combines diverse models through a meta-learner that learns how to weight their predictions. Use diverse base models (different model types, not just different hyperparameters), rely on cross-validated predictions to prevent data leakage, and keep the meta-learner simple. Stacking often gives a 1-3% improvement over the best single model, which is small but can be decisive in competitions!