# MLOps Basics

Best practices for ML in production.
## What is MLOps?

DevOps for Machine Learning.

**Goal**: Reliably deploy and maintain ML systems.
## Key Components

1. **Version Control**: Code, data, models
2. **CI/CD**: Automated testing and deployment
3. **Monitoring**: Track model performance
4. **Reproducibility**: Same code = same results (see the seed-pinning sketch below)
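Reproducibility usually starts with pinning randomness. A minimal sketch, assuming only `random` and NumPy are involved (frameworks like PyTorch or TensorFlow add their own seed calls); `set_seed` is an illustrative helper name, not a standard API:

```python
import random

import numpy as np

def set_seed(seed: int = 42):
    """Pin random seeds so repeated runs produce identical results."""
    random.seed(seed)
    np.random.seed(seed)

set_seed(42)  # call once, before any data splits or training
```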
## Version Control
```bash
# Git for code
git add model.py data_processing.py
git commit -m "Update model architecture"

# DVC for data and models
dvc add data/training_data.csv
dvc add models/model.pkl
git add data/training_data.csv.dvc models/model.pkl.dvc
git commit -m "Update data and model"
```
## Experiment Tracking
```python
import mlflow
import mlflow.sklearn

# Start experiment
mlflow.set_experiment("house_price_prediction")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)

    # Train model
    model.fit(X_train, y_train)

    # Log metrics
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(model, "model")
```
## Automated Training Pipeline
```python
# training_pipeline.py
from sklearn.model_selection import train_test_split

def training_pipeline():
    # 1. Load data
    data = load_data('s3://bucket/data.csv')

    # 2. Preprocess
    X, y = preprocess(data)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # 3. Train
    model = train_model(X_train, y_train)

    # 4. Evaluate
    metrics = evaluate_model(model, X_test, y_test)

    # 5. Save if better than the current production model's accuracy
    if metrics['accuracy'] > current_best:
        save_model(model, 'production_model.pkl')

    return metrics

# Run automatically
if __name__ == "__main__":
    results = training_pipeline()
    print(f"New model accuracy: {results['accuracy']}")
```
## CI/CD for ML
```yaml
# .github/workflows/ml_pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Train model
        run: python train.py
      - name: Validate model
        run: python validate_model.py
      - name: Deploy if tests pass
        run: python deploy.py
```
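The `validate_model.py` step is what actually gates deployment. A minimal sketch of such a gate; the `metrics.json` file and the 0.85 threshold are assumptions, not part of the workflow above:

```python
# validate_model.py -- fail the CI job if the new model underperforms.
import json
import sys

ACCURACY_THRESHOLD = 0.85  # assumed acceptance threshold

with open('metrics.json') as f:  # assumes train.py wrote its metrics here
    metrics = json.load(f)

if metrics['accuracy'] < ACCURACY_THRESHOLD:
    print(f"Accuracy {metrics['accuracy']:.3f} is below threshold; blocking deploy.")
    sys.exit(1)  # non-zero exit fails this step, so deploy.py never runs

print("Model validated.")
```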
## Monitoring
```python
import prometheus_client
from flask import Flask, request, jsonify

app = Flask(__name__)

# Metrics
predictions_counter = prometheus_client.Counter(
    'predictions_total',
    'Total predictions made'
)

prediction_time = prometheus_client.Histogram(
    'prediction_duration_seconds',
    'Time spent making prediction'
)

@app.route('/predict', methods=['POST'])
@prediction_time.time()
def predict():
    data = request.json

    # Make prediction
    prediction = model.predict([data['features']])

    # Track metrics
    predictions_counter.inc()

    return jsonify({'prediction': int(prediction[0])})
```
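These metrics are only useful if Prometheus can scrape them. A minimal sketch of exposing them from the same app (the `/metrics` route name is the usual convention, not a requirement):

```python
# Expose all registered metrics in Prometheus text format.
@app.route('/metrics')
def metrics():
    return prometheus_client.generate_latest(), 200, {
        'Content-Type': prometheus_client.CONTENT_TYPE_LATEST
    }
```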
## Data Drift Detection
```python
from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab

# Compare training (reference) vs production data
dashboard = Dashboard(tabs=[DataDriftTab()])
dashboard.calculate(reference_data, production_data)

# Alert if drift detected (programmatic metric access varies by Evidently version)
if dashboard.metrics['data_drift']['share_of_drifted_features'] > 0.3:
    send_alert("Data drift detected!")
```
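Newer Evidently releases (0.2+) replaced the Dashboard with a Report API. A sketch under that assumption; the exact keys in the result dict can differ between versions, so verify against your installed release:

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_data, current_data=production_data)

# Pull the drift share out of the result dict (key names assume Evidently 0.2+).
result = report.as_dict()
drift_share = result['metrics'][0]['result']['share_of_drifted_columns']
if drift_share > 0.3:
    send_alert("Data drift detected!")
```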
## Model Registry
```python
import mlflow

# Register model
model_uri = "runs:/abc123/model"
mlflow.register_model(model_uri, "house_price_model")

# Transition to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="house_price_model",
    version=3,
    stage="Production"
)

# Load production model
model = mlflow.pyfunc.load_model("models:/house_price_model/Production")
```
## Feature Store
```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get features for a single entity from the online store
features = store.get_online_features(
    features=[
        'user_features:age',
        'user_features:income',
        'transaction_features:avg_amount',
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()

# Use in prediction
prediction = model.predict([list(features.values())])
```
## Best Practices

1. **Version everything**: code, data, models
2. **Automate training**: CI/CD pipelines
3. **Monitor constantly**: accuracy, latency, drift
4. **Test thoroughly**: unit tests, integration tests (see the pytest sketch after this list)
5. **Document**: model cards, data sheets
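For item 4, a minimal pytest sketch. It assumes the hypothetical helpers from the pipeline above (`train_model`, `preprocess`, `load_data`) live in `training_pipeline.py` and return sklearn-style objects; the fixture path is also an assumption:

```python
# test_model.py -- illustrative unit tests for the training code.
import numpy as np

from training_pipeline import load_data, preprocess, train_model

def test_model_output_shape():
    # Assumes train_model returns an sklearn-style estimator.
    X = np.random.rand(20, 4)
    y = np.random.randint(0, 2, size=20)
    model = train_model(X, y)
    assert model.predict(X).shape == (20,)

def test_preprocess_produces_no_nans():
    data = load_data('tests/fixtures/sample.csv')  # hypothetical fixture path
    X, _ = preprocess(data)
    assert not np.isnan(X).any()
```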
## Tools Ecosystem

- **Experiment tracking**: MLflow, Weights & Biases
- **Pipelines**: Kubeflow, Airflow
- **Monitoring**: Prometheus, Grafana
- **Serving**: TensorFlow Serving, Seldon
## Remember

- MLOps = DevOps + ML
- Automate everything
- Monitor in production
- Version all artifacts