MLOps Basics
Best practices for ML in production.
Robert Anderson
December 18, 2025
What is MLOps?
MLOps applies DevOps practices to machine learning. The goal: reliably deploy and maintain ML systems in production.
Key Components
- Version Control: Code, data, models
- CI/CD: Automated testing and deployment
- Monitoring: Track model performance
- Reproducibility: Same code = same results
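Reproducibility starts with controlling randomness. A minimal sketch of the idea in pure Python (in a real project you would also seed NumPy and your ML framework, and pin dependency versions):

```python
import random

def reproducible_sample(seed):
    # Same seed -> same pseudo-random sequence -> same "training" run
    random.seed(seed)
    return [random.random() for _ in range(3)]

# Two runs with the same seed produce identical results
assert reproducible_sample(42) == reproducible_sample(42)
```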
Version Control
# Git for code
git add model.py data_processing.py
git commit -m "Update model architecture"
# DVC for data and models
dvc add data/training_data.csv
dvc add models/model.pkl
git add data/training_data.csv.dvc models/model.pkl.dvc
git commit -m "Update data and model"
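Under the hood, DVC versions large files by content hash: Git tracks only a small pointer file that records the digest of the real artifact. A simplified sketch of that idea (the pointer file name and format here are illustrative, not DVC's actual format):

```python
import hashlib
import json

def make_pointer(path):
    # Hash the artifact's content; this digest is what gets committed to Git
    with open(path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()
    pointer = {'path': path, 'md5': digest}
    # Write a tiny pointer file next to the artifact (Git tracks this, not the data)
    with open(path + '.dvc.json', 'w') as f:
        json.dump(pointer, f)
    return pointer
```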
Experiment Tracking
import mlflow

# Start experiment
mlflow.set_experiment("house_price_prediction")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)

    # Train model
    model.fit(X_train, y_train)

    # Log metrics
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)

    # Log the model itself
    mlflow.sklearn.log_model(model, "model")
Automated Training Pipeline
# training_pipeline.py
from sklearn.model_selection import train_test_split

def training_pipeline():
    # 1. Load data
    data = load_data('s3://bucket/data.csv')

    # 2. Preprocess
    X, y = preprocess(data)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # 3. Train
    model = train_model(X_train, y_train)

    # 4. Evaluate
    metrics = evaluate_model(model, X_test, y_test)

    # 5. Save only if it beats the current production model
    #    (current_best comes from wherever you track the deployed model's score)
    if metrics['accuracy'] > current_best:
        save_model(model, 'production_model.pkl')

    return metrics

# Run automatically (e.g. on a schedule or from CI)
if __name__ == "__main__":
    results = training_pipeline()
    print(f"New model accuracy: {results['accuracy']}")
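Step 5 above is the promotion gate, and it is worth making it stricter than a plain comparison: requiring a minimum improvement margin keeps noise in the test split from causing constant model churn. A hedged sketch (the function name and the 1% margin are illustrative choices, not part of the pipeline above):

```python
def should_promote(new_accuracy, current_accuracy, min_improvement=0.01):
    # Promote only if the new model beats production by a clear margin
    return new_accuracy >= current_accuracy + min_improvement

# A 0.2% gain is within noise -> keep the current model
assert not should_promote(0.852, 0.850)
# A 2% gain clears the margin -> promote
assert should_promote(0.870, 0.850)
```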
CI/CD for ML
name: ML Pipeline
on:
  push:
    branches: [main]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Train model
        run: python train.py
      - name: Validate model
        run: python validate_model.py
      - name: Deploy if validation passes
        run: python deploy.py
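The validate step is the safety net: if it exits nonzero, the job fails and the deploy step never runs. A hedged sketch of what a `validate_model.py` might contain (the metric names and thresholds are assumptions, not from the pipeline above):

```python
import sys

def validate(metrics, min_accuracy=0.80, max_latency_ms=100):
    # Collect every failed check so the CI log shows all problems at once
    failures = []
    if metrics['accuracy'] < min_accuracy:
        failures.append(f"accuracy {metrics['accuracy']} below {min_accuracy}")
    if metrics['latency_ms'] > max_latency_ms:
        failures.append(f"latency {metrics['latency_ms']}ms above {max_latency_ms}ms")
    return failures

if __name__ == "__main__":
    failures = validate({'accuracy': 0.85, 'latency_ms': 40})
    for f in failures:
        print(f"FAIL: {f}")
    sys.exit(1 if failures else 0)  # nonzero exit fails the GitHub Actions job
```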
Monitoring
import prometheus_client
from flask import Flask, request, jsonify

app = Flask(__name__)

# Metrics
predictions_counter = prometheus_client.Counter(
    'predictions_total',
    'Total predictions made'
)
prediction_time = prometheus_client.Histogram(
    'prediction_duration_seconds',
    'Time spent making a prediction'
)

@app.route('/predict', methods=['POST'])
@prediction_time.time()
def predict():
    data = request.json
    # Make prediction
    prediction = model.predict([data['features']])
    # Track metrics
    predictions_counter.inc()
    return jsonify({'prediction': int(prediction[0])})
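If pulling in Prometheus feels like overkill at first, the same two metrics can be tracked in-process. A minimal sketch of the idea (no export endpoint, no labels, so not a Prometheus replacement):

```python
import time

class Metrics:
    def __init__(self):
        self.predictions_total = 0
        self.durations = []  # seconds per prediction

    def record(self, fn, *args):
        # Time one prediction call and count it
        start = time.perf_counter()
        result = fn(*args)
        self.durations.append(time.perf_counter() - start)
        self.predictions_total += 1
        return result

metrics = Metrics()
result = metrics.record(lambda x: x * 2, 21)
```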
Data Drift Detection
from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab

# Compare training (reference) vs production data
dashboard = Dashboard(tabs=[DataDriftTab()])
dashboard.calculate(reference_data, production_data)

# Alert if drift is detected (how you read the metric out varies by evidently version)
if dashboard.metrics['data_drift']['share_of_drifted_features'] > 0.3:
    send_alert("Data drift detected!")
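The same idea can be implemented from scratch for a single numeric feature, for example with the Population Stability Index (PSI); a common rule of thumb flags PSI above 0.2 as significant drift. A sketch (the binning scheme and threshold are illustrative choices):

```python
import math

def psi(reference, production, bins=10):
    # Bin both samples by the reference distribution's range
    lo, hi = min(reference), max(reference)

    def shares(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[max(0, min(i, bins - 1))] += 1
        # Small epsilon avoids log(0) for empty bins
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    ref, prod = shares(reference), shares(production)
    # PSI sums per-bin divergence between the two distributions
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))
```

A feature whose production values have shifted away from the training range produces a large PSI, which can drive the same `send_alert` call as above.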
Model Registry
import mlflow

# Register a model from a run
model_uri = "runs:/abc123/model"
mlflow.register_model(model_uri, "house_price_model")

# Transition it to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="house_price_model",
    version=3,
    stage="Production"
)

# Load the production model
model = mlflow.pyfunc.load_model("models:/house_price_model/Production")
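Conceptually, a registry is just a versioned map from model name to artifacts plus a stage label. A toy in-memory sketch of the workflow above (this is an illustration of the concept, not MLflow's implementation):

```python
class ModelRegistry:
    def __init__(self):
        self._versions = {}  # name -> list of {'model': ..., 'stage': ...}

    def register(self, name, model):
        versions = self._versions.setdefault(name, [])
        versions.append({'model': model, 'stage': 'None'})
        return len(versions)  # version numbers start at 1

    def transition(self, name, version, stage):
        self._versions[name][version - 1]['stage'] = stage

    def load(self, name, stage='Production'):
        # Return the latest version currently in the requested stage
        for entry in reversed(self._versions[name]):
            if entry['stage'] == stage:
                return entry['model']
        raise LookupError(f"no {stage} version of {name}")
```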
Feature Store
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Fetch online features for a single entity
features = store.get_online_features(
    features=[
        'user_features:age',
        'user_features:income',
        'transaction_features:avg_amount'
    ],
    entity_rows=[{"user_id": 123}]
).to_dict()

# Use in prediction
prediction = model.predict([list(features.values())])
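The point of a feature store is that training and serving read the same feature definitions, which prevents training/serving skew. A toy online-store lookup in the spirit of the Feast call above (the data and schema are made up for illustration):

```python
online_store = {
    # entity key -> latest feature values, kept fresh by ingestion jobs
    123: {'user_features:age': 34,
          'user_features:income': 72000,
          'transaction_features:avg_amount': 58.4},
}

def get_online_features(feature_names, user_id):
    row = online_store[user_id]
    # Serving reads exactly the named features, in a stable order
    return [row[name] for name in feature_names]

features = get_online_features(
    ['user_features:age', 'user_features:income'], 123)
```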
Best Practices
- Version everything: code, data, models
- Automate training: CI/CD pipelines
- Monitor constantly: accuracy, latency, drift
- Test thoroughly: unit tests, integration tests
- Document: model cards, data sheets
Tools Ecosystem
- Experiment tracking: MLflow, Weights & Biases
- Pipelines: Kubeflow, Airflow
- Monitoring: Prometheus, Grafana
- Serving: TensorFlow Serving, Seldon
Remember
- MLOps = DevOps + ML
- Automate everything
- Monitor in production
- Version all artifacts
#AI#Intermediate#MLOps