AutoML and Neural Architecture Search
AutoML automates the machine learning pipeline: in effect, AI that builds AI.
What is AutoML?
AutoML automatically searches for the best model and hyperparameters for a given dataset and task.
**Goal**: Make ML accessible to everyone
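The idea of searching over both models and hyperparameters can be sketched in plain Python. This is a minimal illustration, not how any particular AutoML library works: the toy dataset, the two "model families" (`fit_threshold`, `fit_knn`), and the `auto_search` helper are all made up for this example.

```python
import random

def make_data(n, rng):
    """Toy 1-D dataset: the label is 1 exactly when x > 0.6."""
    return [(x, int(x > 0.6)) for x in (rng.random() for _ in range(n))]

def fit_threshold(train, params):
    """Model family 1: a fixed cut-off (ignores the training data)."""
    cutoff = params["cutoff"]
    return lambda x: int(x > cutoff)

def fit_knn(train, params):
    """Model family 2: k-nearest-neighbours majority vote."""
    k = params["k"]
    def predict(x):
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        return int(2 * sum(label for _, label in nearest) > k)
    return predict

# Search space: each family pairs a fitting function with a hyperparameter sampler.
SEARCH_SPACE = {
    "threshold": (fit_threshold, lambda rng: {"cutoff": rng.random()}),
    "knn": (fit_knn, lambda rng: {"k": rng.choice([1, 3, 5, 7, 9])}),
}

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def auto_search(train, val, n_trials=20, seed=0):
    """Random search: sample a family and its hyperparameters, keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        fit, sample = SEARCH_SPACE[name]
        params = sample(rng)
        score = accuracy(fit(train, params), val)
        if best is None or score > best["score"]:
            best = {"model": name, "params": params, "score": score}
    return best

data_rng = random.Random(42)
train, val = make_data(200, data_rng), make_data(100, data_rng)
best = auto_search(train, val)
print(best)
```

Real AutoML tools replace the random sampling with smarter strategies (Bayesian optimization, evolution, ensembling), but the loop of sample, fit, score, and keep the best is the same basic shape.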
Why AutoML?
- Save time on experimentation
- Find models you wouldn't try manually
- Search hyperparameters more systematically than manual tuning
- Democratize ML
Auto-Sklearn
Automated sklearn pipeline:
```python
from autosklearn.classification import AutoSklearnClassifier

# Create AutoML model
automl = AutoSklearnClassifier(
    time_left_for_this_task=3600,  # 1 hour in total
    per_run_time_limit=300,        # 5 min per model
)

# Fit - tries many models automatically
automl.fit(X_train, y_train)

# Inspect the models in the final ensemble
print(automl.show_models())

# Predict
predictions = automl.predict(X_test)

# See what worked best
print(automl.leaderboard())
```
TPOT - Genetic Programming
Evolves ML pipelines:
```python
from tpot import TPOTClassifier

# Genetic algorithm to find best pipeline
tpot = TPOTClassifier(
    generations=5,
    population_size=50,
    verbosity=2,
    random_state=42,
)

tpot.fit(X_train, y_train)

# Get accuracy
print(tpot.score(X_test, y_test))

# Export best pipeline as Python code
tpot.export('best_pipeline.py')
```
Generated pipeline might look like:
```python
# Auto-generated by TPOT
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(n_estimators=100, max_depth=10))
])
```
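The evolutionary loop behind this kind of search can be sketched with the standard library alone. This is a simplified illustration and not TPOT's implementation: it uses a plain genetic algorithm over a fixed-length configuration, which is simpler than the tree-based genetic programming TPOT uses. The search space (`SPACE`) and the scoring function (`toy_fitness`) are made up, with the fitness standing in for a cross-validated score.

```python
import random

# Hypothetical search space: one list of options per pipeline "gene".
SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "n_estimators": [10, 50, 100, 200, 500],
    "max_depth": [2, 5, 10, 20],
}

def random_pipeline(rng):
    return {gene: rng.choice(options) for gene, options in SPACE.items()}

def toy_fitness(p):
    """Made-up stand-in for a cross-validated score.
    Peaks at standard scaling, 100 trees, depth 10."""
    score = 0.75 if p["scaler"] == "standard" else 0.70
    score -= abs(p["n_estimators"] - 100) / 5000
    score -= abs(p["max_depth"] - 10) / 100
    return score

def crossover(a, b, rng):
    # Each gene is inherited from one of the two parents at random.
    return {gene: rng.choice([a[gene], b[gene]]) for gene in SPACE}

def mutate(p, rng, rate=0.2):
    # Each gene is resampled from its options with probability `rate`.
    return {
        gene: rng.choice(SPACE[gene]) if rng.random() < rate else value
        for gene, value in p.items()
    }

def evolve(generations=5, population_size=50, seed=42):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        best_per_generation.append(toy_fitness(ranked[0]))
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - 1)
        ]
        population = [ranked[0]] + children  # elitism: keep the current best
    best = max(population, key=toy_fitness)
    best_per_generation.append(toy_fitness(best))
    return best, best_per_generation

best, history = evolve()
print(best)
print(history)
```

Because the best pipeline is always carried into the next generation, the best score in `history` never decreases. In a real run the fitness call is the expensive part, since each evaluation means training and cross-validating a pipeline.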
H2O AutoML
Enterprise-grade AutoML:
```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Load data ("test.csv" is an assumed file with the same columns as "train.csv")
train = h2o.import_file("train.csv")
test = h2o.import_file("test.csv")

# Specify target and features
y = "target"
X = train.columns
X.remove(y)

# Run AutoML
aml = H2OAutoML(max_models=20, max_runtime_secs=3600)
aml.train(x=X, y=y, training_frame=train)

# View leaderboard
lb = aml.leaderboard
print(lb.head())

# Best model
best_model = aml.leader
predictions = best_model.predict(test)
```
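How a model-count budget, a runtime budget, and a leaderboard fit together can be sketched as follows. This is a toy illustration rather than H2O's implementation: `run_automl`, the candidate list, and `toy_score` are hypothetical, and the time budget is only checked between models instead of interrupting a fit that is already running.

```python
import random
import time

def run_automl(candidates, evaluate, max_models=20, max_runtime_secs=3600):
    """Evaluate candidates until either budget is exhausted.
    Returns a leaderboard sorted from best to worst score."""
    start = time.monotonic()
    leaderboard = []
    for name, config in candidates:
        if len(leaderboard) >= max_models:
            break  # model-count budget reached
        if time.monotonic() - start >= max_runtime_secs:
            break  # time budget reached
        leaderboard.append((evaluate(config), name))
    leaderboard.sort(reverse=True)
    return leaderboard

# Hypothetical candidates and a made-up scoring function for illustration.
rng = random.Random(0)
candidates = [(f"model_{i}", {"depth": rng.randint(1, 10)}) for i in range(100)]

def toy_score(config):
    return 1.0 - abs(config["depth"] - 6) / 10

leaderboard = run_automl(candidates, toy_score, max_models=5, max_runtime_secs=60)
for score, name in leaderboard:
    print(f"{name}: {score:.2f}")

leader = leaderboard[0]  # plays the role of the best model
```

Whichever budget runs out first ends the search, so a generous time limit with a small `max_models` stops on the model count, as in this usage.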
Neural Architecture Search (NAS)
Find best neural network architecture:
```python
import keras_tuner as kt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def build_model(hp):
    model = Sequential()
    # Tune number of layers
    for i in range(hp.Int('num_layers', 1, 5)):
        model.add(Dense(
            units=hp.Int(f'units_{i}', 32, 512, step=32),
            activation=hp.Choice('activation', ['relu', 'tanh'])
        ))
    model.add(Dense(10, activation='softmax'))
    # Tune learning rate
    model.compile(
        optimizer=Adam(hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')),
        loss='categorical_crossentropy',  # expects one-hot encoded labels
        metrics=['accuracy']
    )
    return model

# Search for best architecture
tuner = kt.RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=50,
    directory='nas_results'
)

tuner.search(X_train, y_train, epochs=10, validation_data=(X_val, y_val))

# Get best model
best_model = tuner.get_best_models(num_models=1)[0]
```
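A quick count shows how large even this small search space is compared with the trial budget. The sketch below counts only the discrete choices in `build_model` (layer count, units per layer, activation) and ignores the continuous learning rate, so it understates the true size of the space.

```python
# Discrete choices taken from build_model above.
layer_counts = range(1, 6)                  # hp.Int('num_layers', 1, 5)
unit_options = len(range(32, 512 + 1, 32))  # 32, 64, ..., 512
activation_options = 2                      # 'relu' or 'tanh', shared by all layers

# For n layers there are unit_options ** n width combinations.
architectures = sum(unit_options ** n for n in layer_counts) * activation_options
max_trials = 50

print(unit_options)   # 16
print(architectures)  # 2236960
print(f"{max_trials / architectures:.6%} of architectures sampled")
```

With 50 random trials the search sees only a tiny fraction of roughly 2.2 million candidate architectures, which is why the result is better treated as a starting point than as the optimum.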
Optuna - Hyperparameter Optimization
```python
import optuna
from xgboost import XGBClassifier

def objective(trial):
    # Suggest hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 50, 500)
    max_depth = trial.suggest_int('max_depth', 2, 32)
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-1, log=True)
    # Train model
    model = XGBClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        learning_rate=learning_rate
    )
    model.fit(X_train, y_train)
    # Return metric to optimize
    return model.score(X_val, y_val)

# Optimize
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

# Best hyperparameters
print(f"Best params: {study.best_params}")
print(f"Best score: {study.best_value}")

# Visualize optimization (each call returns a Plotly figure)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
```
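The `log=True` flag on `learning_rate` means values are drawn on a logarithmic scale. A minimal sketch of what that implies is shown below. The helper `suggest_log_uniform` is hypothetical and uses pure random sampling, whereas Optuna's default sampler (TPE) adapts to earlier trials, so this illustrates only the scale, not Optuna's search strategy.

```python
import math
import random

def suggest_log_uniform(rng, low, high):
    """Sample so that log(value) is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = [suggest_log_uniform(rng, 1e-4, 1e-1) for _ in range(10_000)]

# On a log scale each decade (1e-4..1e-3, 1e-3..1e-2, 1e-2..1e-1)
# receives about a third of the draws.
share_log = sum(s < 1e-3 for s in samples) / len(samples)

# On a linear scale the lowest decade receives only about 0.9% of the draws.
linear = [rng.uniform(1e-4, 1e-1) for _ in range(10_000)]
share_linear = sum(s < 1e-3 for s in linear) / len(linear)

print(f"log scale: {share_log:.3f}, linear scale: {share_linear:.3f}")
```

For parameters such as learning rates, where 0.0001 versus 0.001 matters as much as 0.01 versus 0.1, the log scale gives small values a fair share of the trials.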
Google Cloud AutoML
```python
from google.cloud import automl

project_id = "your-project-id"  # placeholder: replace with your own project ID

# Create client
client = automl.AutoMlClient()

# Create dataset
dataset = client.create_dataset(
    parent=f"projects/{project_id}/locations/us-central1",
    dataset={
        "display_name": "my_dataset",
        "image_classification_dataset_metadata": {}
    }
)

# Import images
# Train model (automatically)
# Deploy model
# All handled by Google Cloud
```
Benefits
- Save development time
- Try many models automatically
- Good starting point
- Reproducible
Limitations
- Expensive (compute)
- Black box (less control)
- May not beat expert tuning
- Limited custom features
Best Practices
1. Start with AutoML for baseline
2. Use found architecture as starting point
3. Combine with domain knowledge
4. Set reasonable time budget
5. Always validate manually
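One reason to validate manually is that the score used to pick the winner is optimistically biased when many candidates are compared. The sketch below illustrates this under deliberately artificial assumptions: the labels are coin flips, so no model can truly beat 50% accuracy, and each "model" from the hypothetical `make_candidate` is an arbitrary rule unrelated to the labels.

```python
import random

rng = random.Random(0)

def make_split(n):
    # Labels are coin flips, so no model can truly beat 50% accuracy.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]

val, test = make_split(50), make_split(2000)

def make_candidate():
    # A "model" with an arbitrary decision rule unrelated to the labels.
    scale = rng.uniform(1, 100)
    return lambda x: int((x * scale) % 1 < 0.5)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Mimic an automated search: try many candidates, keep the best on validation.
candidates = [make_candidate() for _ in range(200)]
best = max(candidates, key=lambda m: accuracy(m, val))

val_score = accuracy(best, val)    # optimistic: chosen because it was highest
test_score = accuracy(best, test)  # honest: data the search never saw
print(f"validation: {val_score:.2f}, held-out test: {test_score:.2f}")
```

The winner looks clearly better than chance on the validation set purely through selection, while its score on untouched data falls back to about 50%. Keeping a final test set that the AutoML search never sees guards against this.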
Remember
- AutoML is a powerful starting point
- Not a replacement for ML knowledge
- Great for prototyping
- Can discover surprising solutions