Learn how AutoML automates model selection, hyperparameter tuning, and feature engineering.

Introduction to AutoML

Tired of trying hundreds of model configurations? AutoML does it for you. It automatically searches for the best model and hyperparameters.

What AutoML Does

```
Manual ML: You choose model → You tune hyperparameters → You engineer features

AutoML:    You provide data → AutoML searches → Best model returned
```
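To make the "AutoML searches" step concrete, here is a minimal sketch of the core idea in plain scikit-learn: score a few candidate models with cross-validation and keep the best one. The candidate list, the `load_iris` dataset, and the helper name `pick_best_model` are assumptions chosen for illustration. Real AutoML libraries search much larger spaces with smarter strategies.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def pick_best_model(X, y, candidates, cv=5):
    """Return (name, mean CV accuracy, fitted model) for the best candidate."""
    results = {}
    for name, model in candidates.items():
        # Mean cross-validated accuracy for this candidate
        results[name] = cross_val_score(model, X, y, cv=cv).mean()
    best_name = max(results, key=results.get)
    best_model = candidates[best_name].fit(X, y)  # refit the winner on all data
    return best_name, results[best_name], best_model


# Illustrative dataset and candidates (assumptions, not part of any AutoML library)
X, y = load_iris(return_X_y=True)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(),
}
best_name, best_score, best_model = pick_best_model(X, y, candidates)
print(f"Best model: {best_name} (CV accuracy {best_score:.3f})")
```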

Popular AutoML Libraries

| Library | Strengths | Best For |
|---------|-----------|----------|
| Auto-sklearn | Solid, well-tested | Classification, regression |
| TPOT | Genetic algorithm | Finding novel pipelines |
| H2O AutoML | Fast, scalable | Large datasets |
| AutoGluon | Easy, powerful | Quick wins |

Auto-sklearn Example

```python
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Example data (an assumption for illustration); substitute your own X and y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create AutoML classifier
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # 5 minutes total
    per_run_time_limit=30,        # 30 seconds per model
    n_jobs=-1,
)

# Fit - it will try many models automatically
automl.fit(X_train, y_train)

# Evaluate
predictions = automl.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.3f}")

# See what it found
print(automl.leaderboard())
```

TPOT Example

TPOT uses genetic programming to evolve ML pipelines:

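```python
from tpot import TPOTClassifier

# Reuses X_train, X_test, y_train, y_test from the Auto-sklearn example.
# Parameters follow the classic TPOT API; newer releases may name some differently.
tpot = TPOTClassifier(
    generations=5,
    population_size=50,
    cv=5,
    random_state=42,
    verbosity=2,
    n_jobs=-1,
)

tpot.fit(X_train, y_train)
print(f"Score: {tpot.score(X_test, y_test):.3f}")

# Export the best pipeline as Python code
tpot.export('best_pipeline.py')
```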
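The "evolve" idea can be illustrated with a toy loop: keep a population of configurations, score each one, keep the fittest, and mutate them to produce the next generation. This is a simplified sketch and not TPOT's actual algorithm (TPOT evolves whole pipelines and also uses crossover). The dataset, the decision-tree search space, and the helper names `random_individual`, `fitness`, and `mutate` are all assumptions made for illustration.

```python
import random

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = random.Random(42)
X, y = load_iris(return_X_y=True)


def random_individual():
    # An "individual" is one hyperparameter configuration
    return {"max_depth": rng.randint(1, 10), "min_samples_leaf": rng.randint(1, 20)}


def fitness(individual):
    # Fitness = mean cross-validated accuracy of a tree with this configuration
    model = DecisionTreeClassifier(random_state=0, **individual)
    return cross_val_score(model, X, y, cv=3).mean()


def mutate(individual):
    # Copy the parent and nudge one randomly chosen hyperparameter
    child = dict(individual)
    key = rng.choice(list(child))
    child[key] = max(1, child[key] + rng.choice([-2, -1, 1, 2]))
    return child


population = [random_individual() for _ in range(8)]
history = []  # best fitness seen in each generation
for generation in range(4):
    ranked = sorted(population, key=fitness, reverse=True)
    history.append(fitness(ranked[0]))
    survivors = ranked[:4]  # selection: keep the top half unchanged
    children = [mutate(rng.choice(survivors)) for _ in range(4)]
    population = survivors + children  # next generation

best = max(population, key=fitness)
best_score = fitness(best)
print(f"Best config: {best}, CV accuracy: {best_score:.3f}")
```

Because the survivors carry over unchanged, the best score in this sketch can never get worse from one generation to the next.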

AutoGluon Example (Easiest)

```python
from autogluon.tabular import TabularPredictor

# Just give it a DataFrame with target column
predictor = TabularPredictor(label='target').fit(
    train_data,
    time_limit=300  # 5 minutes
)

# Predict
predictions = predictor.predict(test_data)

# See leaderboard
predictor.leaderboard()
```
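The example above assumes `train_data` and `test_data` already exist. As a sketch, one way to build them is to load a dataset as a pandas DataFrame that already contains a `target` column and split its rows. The breast cancer dataset and the 80/20 split here are assumptions for illustration; any DataFrame with a label column works the same way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# as_frame=True returns pandas objects; .frame holds the features plus a "target" column
df = load_breast_cancer(as_frame=True).frame

# Split rows, keeping features and label together in each DataFrame
train_data, test_data = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["target"]
)
print(train_data.shape, test_data.shape)
```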

What AutoML Actually Searches

1. **Algorithm selection:** Tries multiple model types
2. **Hyperparameter tuning:** Optimizes parameters
3. **Feature preprocessing:** Scaling, encoding, selection
4. **Ensembling:** Combines best models
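The four components above can be sketched by hand with scikit-learn. This is a deliberately tiny search space, assuming two model families, two scalers, and a handful of hyperparameter values. It shows the shape of the search, not how any particular AutoML library implements it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature preprocessing + hyperparameter tuning, one search per model family
searches = {
    "logreg": GridSearchCV(
        Pipeline([("scale", StandardScaler()),
                  ("model", LogisticRegression(max_iter=5000))]),
        {"scale": [StandardScaler(), MinMaxScaler()],
         "model__C": [0.1, 1.0, 10.0]},
        cv=3,
    ),
    "forest": GridSearchCV(
        Pipeline([("model", RandomForestClassifier(random_state=0))]),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
        cv=3,
    ),
}
for name, search in searches.items():
    search.fit(X_train, y_train)
    print(f"{name}: CV accuracy {search.best_score_:.3f} with {search.best_params_}")

# Algorithm selection: keep whichever family scored best in cross-validation
best_name = max(searches, key=lambda n: searches[n].best_score_)

# Ensembling: soft-vote over the tuned winner of each family
ensemble = VotingClassifier(
    [(name, s.best_estimator_) for name, s in searches.items()], voting="soft"
)
ensemble.fit(X_train, y_train)
ensemble_accuracy = ensemble.score(X_test, y_test)
print(f"Best single family: {best_name}")
print(f"Ensemble test accuracy: {ensemble_accuracy:.3f}")
```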

Time Budget Tips

```python
# Quick exploration (5 minutes)
time_left_for_this_task=300

# Serious attempt (1 hour)
time_left_for_this_task=3600

# Overnight run (8 hours)
time_left_for_this_task=28800
```

More time usually means better results, but with diminishing returns: each extra hour tends to add less than the one before it.
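A time budget can be pictured as a loop that keeps trying configurations until a deadline passes. The sketch below is a hand-rolled random search, not the scheduler any AutoML library uses. The function name `random_search_with_budget`, the search space, and the dataset are assumptions for illustration. It records each improvement so you can see how they thin out over time.

```python
import random
import time

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def random_search_with_budget(X, y, budget_seconds, seed=0):
    """Try random configurations until the time budget runs out."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_seconds
    best_score, best_params, trials = -1.0, None, 0
    improvements = []  # (trial number, score) each time the best score improves

    # The deadline is only checked between trials, so the last trial may overrun
    while time.monotonic() < deadline:
        params = {
            "n_estimators": rng.choice([10, 25, 50]),
            "max_depth": rng.choice([2, 4, 8, None]),
            "min_samples_leaf": rng.randint(1, 10),
        }
        model = RandomForestClassifier(random_state=0, **params)
        score = cross_val_score(model, X, y, cv=3).mean()
        trials += 1
        if score > best_score:
            best_score, best_params = score, params
            improvements.append((trials, score))
    return best_score, best_params, trials, improvements


X, y = load_iris(return_X_y=True)
best_score, best_params, trials, improvements = random_search_with_budget(
    X, y, budget_seconds=2.0
)
print(f"{trials} trials, best CV accuracy {best_score:.3f} with {best_params}")
print("Improvements (trial, score):", improvements)
```

Because a trial is never interrupted once it starts, a single slow configuration can exceed the budget, which is why Auto-sklearn has a separate `per_run_time_limit`.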

When to Use AutoML

**Good for:**
- Quick baselines
- When you don't know which algorithm to use
- Hyperparameter search
- Kaggle competitions

**Not ideal for:**
- When you need to understand the model
- Highly specialized problems
- When inference speed matters (AutoML often picks ensembles)
- Very large datasets (can be slow)

AutoML Limitations

1. **Black box:** May not explain why it chose certain models
2. **Compute intensive:** Tries many configurations
3. **May overfit:** Especially with small datasets
4. **No domain knowledge:** Doesn't understand your problem
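One way to check for the overfitting risk in point 3 is to hold back data that the search never sees, then compare the score the search reports with the score on that held-out data. The sketch below uses a scikit-learn grid search as a stand-in for an AutoML run. The 60-row training subset, the search space, and the dataset are assumptions for illustration. A held-out score well below the search score suggests the search has tuned itself to noise.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Simulate a small dataset: search on only 60 rows, hold out everything else
X_small, X_holdout, y_small, y_holdout = train_test_split(
    X, y, train_size=60, random_state=0, stratify=y
)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [1, 2, 3, 5, None], "min_samples_leaf": [1, 2, 5, 10]},
    cv=5,
)
search.fit(X_small, y_small)

cv_score = search.best_score_  # the score the search optimized
holdout_score = search.score(X_holdout, y_holdout)  # data the search never saw
print(f"Best CV accuracy during search: {cv_score:.3f}")
print(f"Accuracy on held-out data:      {holdout_score:.3f}")
print(f"Gap: {cv_score - holdout_score:+.3f}")
```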

Best Practice: Use AutoML as a Starting Point

```python
# 1. Run AutoML to find good candidates
automl.fit(X_train, y_train)
print(automl.leaderboard())

# 2. See what models/parameters it found
print(automl.show_models())

# 3. Take best ideas and build simpler, interpretable model
# If AutoML found XGBoost with certain params works well,
# train your own XGBoost with those params
```
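Step 3 might look like the sketch below. To avoid an extra dependency it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, and the values in `found_params` are hypothetical placeholders for whatever you read off your own AutoML results. The dataset is also an assumption for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Hypothetical values, standing in for what the AutoML run reported
found_params = {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 3}

# One plain model you control, instead of an AutoML ensemble
model = GradientBoostingClassifier(random_state=42, **found_params)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")

# A single tree-based model exposes feature importances for inspection
top = sorted(zip(model.feature_importances_, data.feature_names), reverse=True)[:5]
for importance, name in top:
    print(f"{name}: {importance:.3f}")
```

A single model like this is usually faster at inference time than an ensemble and easier to inspect, which addresses two of the drawbacks listed earlier.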

Key Takeaway

AutoML is a powerful tool for quickly finding good models. Use it for baselines, exploration, and when you're stuck. But don't treat it as a complete solution - understand what it found, and consider building simpler models based on its insights. It's a tool to augment your skills, not replace understanding!

