Gradient Boosting: Sequential Ensemble Learning
Understand how Gradient Boosting builds trees sequentially, each fixing the mistakes of the previous ones.
Random Forest builds trees in parallel. Gradient Boosting builds them sequentially, where each tree learns from the mistakes of all previous trees.
The Core Idea
- Train a simple model
- Calculate its errors (residuals)
- Train the next model to predict those errors
- Add it to the ensemble
- Repeat
Each new tree focuses on what the ensemble got wrong, as the sketch below illustrates.
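To make that loop concrete, here is a minimal from-scratch sketch for regression with squared error, where the residuals are exactly the errors each new tree is asked to fix. The function names (fit_gradient_boosting, boosted_predict) are illustrative, not part of any library:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_predict(X, trees, base_pred, learning_rate):
    """Base prediction plus the scaled contribution of every tree so far."""
    pred = np.full(len(X), base_pred)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

def fit_gradient_boosting(X, y, n_estimators=50, learning_rate=0.1, max_depth=3):
    base_pred = y.mean()                    # start from a constant model
    trees = []
    for _ in range(n_estimators):
        current = boosted_predict(X, trees, base_pred, learning_rate)
        residuals = y - current             # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)              # next tree predicts those errors
        trees.append(tree)                  # add it to the ensemble, repeat
    return trees, base_pred
```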
Random Forest vs Gradient Boosting
| Aspect | Random Forest | Gradient Boosting |
|---|---|---|
| Tree Building | Parallel | Sequential |
| Tree Type | Deep, independent | Shallow, dependent |
| Speed | Faster training | Slower training |
| Overfitting | More resistant | Can overfit easily |
| Tuning | Less sensitive | Needs careful tuning |
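To see the contrast in code, here is a quick sketch using scikit-learn's built-in implementations on a synthetic dataset (for illustration only, not a benchmark):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration
X_demo, y_demo = make_classification(n_samples=2000, n_features=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Random Forest: deep trees grown independently
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)

# Gradient Boosting: shallow trees grown sequentially, each correcting the last
gb = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                learning_rate=0.1, random_state=0).fit(Xtr, ytr)

print(f"Random Forest:     {rf.score(Xte, yte):.3f}")
print(f"Gradient Boosting: {gb.score(Xte, yte):.3f}")
```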
Implementation with XGBoost
XGBoost is the most popular gradient boosting library:
```python
import xgboost as xgb
from sklearn.model_selection import train_test_split

# X, y: your feature matrix and labels (assumed already loaded)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train
model = xgb.XGBClassifier(
    n_estimators=100,         # number of boosting rounds
    max_depth=3,              # shallow trees
    learning_rate=0.1,        # step size
    subsample=0.8,            # row sampling
    colsample_bytree=0.8      # column sampling
)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
```
Key Parameters
n_estimators: Number of boosting rounds
- More = potentially better, but slower
- Use early stopping to find the optimal number
learning_rate: How much each tree contributes
- Lower = more trees needed, but often better generalization (see the sketch after this list)
- Typical: 0.01 - 0.3
max_depth: Tree depth
- Shallow (3-6) for boosting
- Deeper trees can overfit
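A rough sketch of the learning_rate / n_estimators trade-off, reusing the X_train/X_test split from the XGBoost example above (exact numbers will depend on your data):

```python
# Fewer big steps vs. many small steps, at the same shallow depth
configs = {
    "lr=0.3, 100 trees": xgb.XGBClassifier(n_estimators=100, learning_rate=0.3, max_depth=3),
    "lr=0.03, 1000 trees": xgb.XGBClassifier(n_estimators=1000, learning_rate=0.03, max_depth=3),
}
for name, clf in configs.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy {clf.score(X_test, y_test):.3f}")
```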
Early Stopping
Don't guess the number of trees. Let the data tell you:
```python
# Carve a validation set out of the training data for early stopping
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2)

model = xgb.XGBClassifier(
    n_estimators=1000,            # generous upper bound on boosting rounds
    learning_rate=0.1,
    early_stopping_rounds=50      # stop if no improvement for 50 rounds
)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],    # monitored after every round
    verbose=False
)

print(f"Best iteration: {model.best_iteration}")
```
Popular Libraries
- XGBoost - Fast, widely used
- LightGBM - Even faster, good for large data
- CatBoost - Great with categorical features
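All three expose a similar scikit-learn-style interface, so switching between them is mostly a matter of renaming parameters. A hedged sketch, assuming lightgbm and catboost are installed and reusing the earlier training split:

```python
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Roughly equivalent configurations across libraries
lgbm = LGBMClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
cat = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=3, verbose=0)

lgbm.fit(X_train, y_train)
cat.fit(X_train, y_train)
```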
When to Use
Gradient Boosting shines when:
- You need maximum accuracy
- You have time to tune
- Tabular/structured data
- Kaggle competitions!
Consider alternatives when:
- Training speed is critical
- You have limited data (may overfit)
- Interpretability is key
Key Takeaway
Gradient Boosting often gives the best performance on tabular data. Start with XGBoost, use shallow trees (max_depth of 3-6), and always use early stopping. learning_rate and n_estimators work together: a lower learning rate needs more estimators.