Ensemble Learning
Combine multiple models for better predictions.
Teamwork makes AI better.
What is Ensemble Learning?
Combining multiple models to get better results.
**Idea**: Ask 10 doctors instead of 1!
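A quick way to see why this works: if each doctor is independently right 70% of the time, a majority vote of 10 is right about 85% of the time. A minimal simulation sketch (the 70% accuracy and 10 voters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_voters, n_cases, p_correct = 10, 100_000, 0.7  # illustrative numbers

# Each voter is independently correct with probability 0.7
votes = rng.random((n_voters, n_cases)) < p_correct
majority_correct = votes.sum(axis=0) > n_voters / 2

print(f"single voter:  {p_correct:.2f}")
print(f"majority vote: {majority_correct.mean():.2f}")  # ~0.85
```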
Types of Ensembles
**1. Bagging**: Train the same model type on different random subsets of the data
**2. Boosting**: Train models sequentially, each one correcting the previous one's mistakes
**3. Stacking**: Combine different model types
Bagging Example - Random Forest
```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

base_model = DecisionTreeClassifier()
bagging = BaggingClassifier(
    base_model,
    n_estimators=10,   # 10 trees
    max_samples=0.8    # 80% of data each
)

bagging.fit(X_train, y_train)
```
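Bagging decision trees like this is the core idea behind Random Forest, which additionally randomizes the features each tree sees. To check that bagging helps, compare it against a single tree with cross-validation; a sketch on a synthetic dataset, with make_classification standing in for real data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

single = DecisionTreeClassifier(random_state=42)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                           max_samples=0.8, random_state=42)

# 5-fold cross-validated accuracy for each model
print("single tree:", cross_val_score(single, X, y, cv=5).mean())
print("bagging:    ", cross_val_score(bagged, X, y, cv=5).mean())
```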
Boosting Example - AdaBoost
Boosting trains models one after another; each new model focuses on the examples the previous ones got wrong:
```python
from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(n_estimators=50)
model.fit(X_train, y_train)

# Each new model focuses on the previous models' mistakes
```
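You can watch boosting work round by round: staged_predict yields the ensemble's predictions after each boosting iteration. A sketch, assuming a held-out X_test/y_test:

```python
from sklearn.metrics import accuracy_score

# Accuracy of the ensemble after 10, 20, ... boosting rounds
for i, y_pred in enumerate(model.staged_predict(X_test), start=1):
    if i % 10 == 0:
        print(f"after {i:2d} estimators: {accuracy_score(y_test, y_pred):.3f}")
```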
Gradient Boosting
One of the most powerful boosting methods: each new tree is fit to the errors of the ensemble built so far:
```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3
)
model.fit(X_train, y_train)
```
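A lower learning_rate usually needs more trees, so a common pattern is a high upper bound plus early stopping: scikit-learn stops adding trees once an internal validation slice stops improving. A sketch with illustrative parameter values:

```python
# Early stopping: hold out 10% of the training data internally and stop
# once the validation score hasn't improved for 10 rounds
model = GradientBoostingClassifier(
    n_estimators=500,        # upper bound, not necessarily reached
    learning_rate=0.05,
    max_depth=3,
    validation_fraction=0.1,
    n_iter_no_change=10,
)
model.fit(X_train, y_train)
print("trees actually fitted:", model.n_estimators_)
```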
XGBoost - Industry Standard
An optimized, regularized implementation of gradient boosting, widely used in competitions:
```python
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5
)
model.fit(X_train, y_train)
```
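After fitting, the model behaves like any scikit-learn estimator, and the trained booster exposes per-feature importance scores. A short usage sketch, assuming a held-out X_test/y_test:

```python
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))

# Importance of each input feature to the trained booster
print(model.feature_importances_)
```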
Voting Classifier
Combine different models:
```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

voting = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC())
    ],
    voting='hard'  # Majority vote ('soft' averages predicted probabilities instead)
)

voting.fit(X_train, y_train)
```
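Stacking (the third ensemble type above) goes one step further than voting: instead of a fixed rule, a meta-model learns how to combine the base models' predictions. A minimal sketch using scikit-learn's StackingClassifier on the same training data:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

stacking = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier()),
        ('svc', SVC()),
    ],
    final_estimator=LogisticRegression()  # meta-model that weighs the base models
)
stacking.fit(X_train, y_train)
```

By default the meta-model is trained on cross-validated predictions of the base models, which helps avoid overfitting to the training labels.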
When to Use
- **Bagging**: Reduce overfitting (lowers variance)
- **Boosting**: Improve accuracy (lowers bias)
- **Stacking**: Maximum performance
Remember
- An ensemble is usually better than a single model
- XGBoost often wins machine learning competitions
- Trade-off: accuracy vs. training time