# MLOps Basics

Best practices for ML in production.
## What is MLOps?

DevOps for Machine Learning.

**Goal**: Reliably deploy and maintain ML systems.
## Key Components

1. **Version Control**: Code, data, models
2. **CI/CD**: Automated testing and deployment
3. **Monitoring**: Track model performance
4. **Reproducibility**: Same code = same results (see the seed-pinning sketch below)
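Reproducibility usually starts with pinning randomness. A minimal sketch, assuming only `random` and NumPy are involved (frameworks like PyTorch or TensorFlow add their own seed calls); `set_seed` is an illustrative helper name, not a standard API:

```python
import random

import numpy as np

def set_seed(seed: int = 42):
    """Pin random seeds so repeated runs produce identical results."""
    random.seed(seed)
    np.random.seed(seed)

set_seed(42)  # call once, before any data splits or training
```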
## Version Control
```bash
# Git for code
git add model.py data_processing.py
git commit -m "Update model architecture"

# DVC for data and models
dvc add data/training_data.csv
dvc add models/model.pkl
git add data/training_data.csv.dvc models/model.pkl.dvc
git commit -m "Update data and model"
```
## Experiment Tracking
```python
import mlflow
import mlflow.sklearn

# Start experiment
mlflow.set_experiment("house_price_prediction")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)

    # Train model
    model.fit(X_train, y_train)

    # Log metrics
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(model, "model")
```
## Automated Training Pipeline
```python
# training_pipeline.py
from sklearn.model_selection import train_test_split

def training_pipeline():
    # 1. Load data
    data = load_data('s3://bucket/data.csv')

    # 2. Preprocess
    X, y = preprocess(data)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # 3. Train
    model = train_model(X_train, y_train)

    # 4. Evaluate
    metrics = evaluate_model(model, X_test, y_test)

    # 5. Save if better than the current production model's accuracy
    if metrics['accuracy'] > current_best:
        save_model(model, 'production_model.pkl')

    return metrics

# Run automatically
if __name__ == "__main__":
    results = training_pipeline()
    print(f"New model accuracy: {results['accuracy']}")
```
## CI/CD for ML
```yaml
# .github/workflows/ml_pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Train model
        run: python train.py
      - name: Validate model
        run: python validate_model.py
      - name: Deploy if tests pass
        run: python deploy.py
```
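The `validate_model.py` step is what actually gates deployment. A minimal sketch of such a gate; the `metrics.json` file and the 0.85 threshold are assumptions, not part of the workflow above:

```python
# validate_model.py -- fail the CI job if the new model underperforms.
import json
import sys

ACCURACY_THRESHOLD = 0.85  # assumed acceptance threshold

with open('metrics.json') as f:  # assumes train.py wrote its metrics here
    metrics = json.load(f)

if metrics['accuracy'] < ACCURACY_THRESHOLD:
    print(f"Accuracy {metrics['accuracy']:.3f} is below threshold; blocking deploy.")
    sys.exit(1)  # non-zero exit fails this step, so deploy.py never runs

print("Model validated.")
```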
## Monitoring
```python
import prometheus_client
from flask import Flask, request, jsonify

app = Flask(__name__)

# Metrics
predictions_counter = prometheus_client.Counter(
    'predictions_total',
    'Total predictions made'
)

prediction_time = prometheus_client.Histogram(
    'prediction_duration_seconds',
    'Time spent making prediction'
)

@app.route('/predict', methods=['POST'])
@prediction_time.time()
def predict():
    data = request.json

    # Make prediction
    prediction = model.predict([data['features']])

    # Track metrics
    predictions_counter.inc()

    return jsonify({'prediction': int(prediction[0])})
```
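These metrics are only useful if Prometheus can scrape them. A minimal sketch of exposing them from the same app (the `/metrics` route name is the usual convention, not a requirement):

```python
# Expose all registered metrics in Prometheus text format.
@app.route('/metrics')
def metrics():
    return prometheus_client.generate_latest(), 200, {
        'Content-Type': prometheus_client.CONTENT_TYPE_LATEST
    }
```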
## Data Drift Detection
```python
from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab

# Compare training (reference) vs production data
dashboard = Dashboard(tabs=[DataDriftTab()])
dashboard.calculate(reference_data, production_data)

# Alert if drift detected (programmatic metric access varies by Evidently version)
if dashboard.metrics['data_drift']['share_of_drifted_features'] > 0.3:
    send_alert("Data drift detected!")
```
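Newer Evidently releases (0.2+) replaced the Dashboard with a Report API. A sketch under that assumption; the exact keys in the result dict can differ between versions, so verify against your installed release:

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_data, current_data=production_data)

# Pull the drift share out of the result dict (key names assume Evidently 0.2+).
result = report.as_dict()
drift_share = result['metrics'][0]['result']['share_of_drifted_columns']
if drift_share > 0.3:
    send_alert("Data drift detected!")
```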
## Model Registry
```python
import mlflow

# Register model
model_uri = "runs:/abc123/model"
mlflow.register_model(model_uri, "house_price_model")

# Transition to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="house_price_model",
    version=3,
    stage="Production"
)

# Load production model
model = mlflow.pyfunc.load_model("models:/house_price_model/Production")
```
## Feature Store
```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get features for a single entity from the online store
features = store.get_online_features(
    features=[
        'user_features:age',
        'user_features:income',
        'transaction_features:avg_amount',
    ],
    entity_rows=[{"user_id": 123}],
).to_dict()

# Use in prediction
prediction = model.predict([list(features.values())])
```
## Best Practices

1. **Version everything**: code, data, models
2. **Automate training**: CI/CD pipelines
3. **Monitor constantly**: accuracy, latency, drift
4. **Test thoroughly**: unit tests, integration tests (see the pytest sketch after this list)
5. **Document**: model cards, data sheets
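For item 4, a minimal pytest sketch. It assumes the hypothetical helpers from the pipeline above (`train_model`, `preprocess`, `load_data`) live in `training_pipeline.py` and return sklearn-style objects; the fixture path is also an assumption:

```python
# test_model.py -- illustrative unit tests for the training code.
import numpy as np

from training_pipeline import load_data, preprocess, train_model

def test_model_output_shape():
    # Assumes train_model returns an sklearn-style estimator.
    X = np.random.rand(20, 4)
    y = np.random.randint(0, 2, size=20)
    model = train_model(X, y)
    assert model.predict(X).shape == (20,)

def test_preprocess_produces_no_nans():
    data = load_data('tests/fixtures/sample.csv')  # hypothetical fixture path
    X, _ = preprocess(data)
    assert not np.isnan(X).any()
```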
## Tools Ecosystem

- **Experiment tracking**: MLflow, Weights & Biases
- **Pipelines**: Kubeflow, Airflow
- **Monitoring**: Prometheus, Grafana
- **Serving**: TensorFlow Serving, Seldon
## Remember

- MLOps = DevOps + ML
- Automate everything
- Monitor in production
- Version all artifacts