Random Forests
Improve predictions using multiple decision trees.
Many trees = Better predictions.
What Is a Random Forest?
A collection of many decision trees working together!
Like asking 100 experts instead of 1.
How It Works
1. Create many decision trees
2. Each tree votes on the answer
3. Majority vote wins (see the sketch after the example below)
Example
Predicting whether a customer in Austin will buy a product:

- Tree 1: Yes
- Tree 2: No
- Tree 3: Yes
- Tree 4: Yes
- Tree 5: Yes

**Final answer**: Yes (4/5 voted yes)
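Here is a minimal from-scratch sketch of those three steps, using scikit-learn decision trees on a tiny made-up dataset (five trees for illustration; the actual votes depend on the random resamples):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.array([[25, 40], [30, 60], [35, 80], [40, 100]])  # [age, income_k]
y = np.array([0, 0, 1, 1])                               # 0 = no buy, 1 = buy

# Step 1: create many trees, each fit on a random resample of the data
# (sampling with replacement is what makes the trees disagree)
trees = []
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Steps 2-3: each tree votes, and the majority wins
votes = [int(tree.predict([[32, 70]])[0]) for tree in trees]
print("Votes:", votes, "-> majority:", max(set(votes), key=votes.count))
```

A real random forest also randomizes which features each split considers, but resampling alone already shows the ensemble idea.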
Python Code
```python
from sklearn.ensemble import RandomForestClassifier

# Customer data: [age, income_k]
X = [[25, 40], [30, 60], [35, 80], [40, 100]]
y = [0, 0, 1, 1]  # 0 = no buy, 1 = buy

# Train a forest of 100 trees
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Predict for a new customer
customer = [[32, 70]]
prediction = model.predict(customer)
print("Will buy!" if prediction[0] == 1 else "Won't buy")
```
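Continuing from the snippet above, you can peek at the forest's individual trees through its `estimators_` attribute and tally their votes. (Note: scikit-learn's forest actually averages the trees' predicted probabilities, which for fully grown trees usually coincides with the majority vote.)

```python
# Each fitted tree is stored in model.estimators_
votes = [int(tree.predict(customer)[0]) for tree in model.estimators_]
print(f"{sum(votes)} of {len(votes)} trees voted 'buy'")
```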
Why Better Than a Single Tree?
- Less overfitting
- More stable
- More accurate
- Handles noise better (see the comparison sketch below)
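A quick way to check these claims yourself: compare the cross-validated accuracy of a single tree against a forest on noisy synthetic data. This is a sketch, and exact numbers will vary, but the forest typically comes out ahead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y) to simulate a noisy problem
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)

tree_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_score = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
print(f"Single tree: {tree_score:.3f}")
print(f"Forest:      {forest_score:.3f}")
```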
Disadvantages
- Slower to train (see the parallelism tip below)
- Harder to interpret
- Uses more memory
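The training-speed cost can be softened in practice: because the trees are independent, scikit-learn can build them in parallel via the `n_jobs` parameter. A small configuration sketch:

```python
from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 uses all CPU cores to train the trees in parallel
model = RandomForestClassifier(n_estimators=100, n_jobs=-1)
```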
Feature Importance
Random Forest shows which features matter most!
```python
importances = model.feature_importances_
print(f"Age importance: {importances[0]}")
print(f"Income importance: {importances[1]}")
```
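To rank features instead of printing them one by one, pair the scores with names and sort. This continues from the fitted model above; the feature names are assumed from the earlier `[age, income_k]` comment.

```python
# Assumed feature names, matching the [age, income_k] columns above
names = ["age", "income_k"]
ranked = sorted(zip(names, model.feature_importances_), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```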
Remember
- Very powerful algorithm
- Great for most problems
- Trade-off: accuracy vs. speed