ML · 10 min read
ML in Production: From Notebook to Deployment
Learn the essentials of deploying ML models to production, from saving models to serving predictions.
Sarah Chen
December 19, 2025
A model in a notebook is useless. Let's get it serving real predictions.
The Production Pipeline
Training Pipeline:
Data → Preprocess → Train → Evaluate → Save Model
Inference Pipeline:
Request → Load Model → Preprocess → Predict → Response
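Before worrying about serving, you need a trained model and its fitted preprocessor. Here's a minimal training sketch for context; the synthetic data and random-forest classifier are just stand-ins for whatever your project actually uses:
# Minimal training sketch: preprocess, train, evaluate (saving comes in Step 1)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)  # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(random_state=42).fit(scaler.transform(X_train), y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(scaler.transform(X_test))))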
Step 1: Save Your Model
import joblib
import pickle
# Method 1: joblib (recommended for sklearn)
joblib.dump(model, 'model.joblib')
model = joblib.load('model.joblib')
# Method 2: pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
# Save preprocessing too!
joblib.dump(scaler, 'scaler.joblib')
joblib.dump(encoder, 'encoder.joblib')
Step 2: Create Inference Pipeline
Keep preprocessing consistent:
class ModelPipeline:
    def __init__(self, model_path, scaler_path):
        self.model = joblib.load(model_path)
        self.scaler = joblib.load(scaler_path)

    def preprocess(self, data):
        # Same preprocessing as training
        return self.scaler.transform(data)

    def predict(self, data):
        processed = self.preprocess(data)
        return self.model.predict(processed)

    def predict_proba(self, data):
        processed = self.preprocess(data)
        return self.model.predict_proba(processed)
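A quick usage check; the feature values here are placeholders:
import numpy as np

pipeline = ModelPipeline('model.joblib', 'scaler.joblib')
sample = np.array([[5.1, 3.5, 1.4, 0.2]])  # hypothetical feature vector
print(pipeline.predict(sample)[0], pipeline.predict_proba(sample)[0])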
Step 3: Create API Endpoint
Using FastAPI (recommended):
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
app = FastAPI()
# Load model at startup
pipeline = ModelPipeline('model.joblib', 'scaler.joblib')
class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = pipeline.predict(features)[0]
    probability = pipeline.predict_proba(features)[0].max()
    return PredictionResponse(
        prediction=int(prediction),
        probability=float(probability)
    )

# Health check
@app.get("/health")
def health():
    return {"status": "healthy"}
Step 4: Containerize with Docker
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.joblib .
COPY scaler.joblib .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
# Build and run
docker build -t ml-model .
docker run -p 8000:8000 ml-model
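The Dockerfile copies a requirements.txt that isn't shown above; for this stack it would need at least the following (pin exact versions for reproducible builds):
fastapi
uvicorn[standard]
scikit-learn
joblib
numpy
pydantic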
Step 5: Monitor Your Model
import logging
from datetime import datetime
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@app.post("/predict")
def predict(request: PredictionRequest):
start_time = datetime.now()
features = np.array(request.features).reshape(1, -1)
prediction = pipeline.predict(features)[0]
probability = pipeline.predict_proba(features)[0].max()
# Log for monitoring
latency = (datetime.now() - start_time).total_seconds()
logger.info(f"Prediction: {prediction}, Prob: {probability:.3f}, Latency: {latency:.3f}s")
return PredictionResponse(
prediction=int(prediction),
probability=float(probability)
)
Common Production Issues
1. Training-Serving Skew
Problem: Preprocessing differs between training and serving.
Solution: Use the exact same preprocessing code:
# Save the entire pipeline (preprocessing + model as one artifact)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

full_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', RandomForestClassifier())
])
full_pipeline.fit(X_train, y_train)
joblib.dump(full_pipeline, 'full_pipeline.joblib')
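At serving time the full pipeline is a single artifact; scaling happens inside predict, so there is no separate scaler to keep in sync:
full_pipeline = joblib.load('full_pipeline.joblib')
prediction = full_pipeline.predict(raw_features)  # raw_features: unscaled input, shape (n_samples, n_features)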
2. Model Drift
Problem: Model performance degrades over time.
Solution: Monitor predictions and retrain:
# Track prediction distribution
def log_prediction_stats(predictions):
    logger.info(f"Mean: {np.mean(predictions):.3f}")
    logger.info(f"Std: {np.std(predictions):.3f}")
    logger.info(f"Class distribution: {np.bincount(predictions)}")
3. Latency Issues
Problem: Predictions are too slow for the request path.
Solution: Measure first, then optimize or switch to a lighter model:
# Measure latency
import time
def benchmark_model(model, X_test, n_runs=100):
    times = []
    for _ in range(n_runs):
        start = time.time()
        model.predict(X_test[:1])
        times.append(time.time() - start)
    print(f"Mean latency: {np.mean(times)*1000:.2f}ms")
    print(f"P99 latency: {np.percentile(times, 99)*1000:.2f}ms")
Production Checklist
- Model and preprocessing saved together
- API endpoint tested
- Input validation (see the sketch after this checklist)
- Error handling (see the sketch after this checklist)
- Health check endpoint
- Logging for monitoring
- Docker containerization
- Load testing done
- Rollback plan ready
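Two items on that list, input validation and error handling, aren't covered by the endpoint above. A minimal sketch (the expected feature count of 4 is a placeholder):
from fastapi import HTTPException

EXPECTED_FEATURES = 4  # placeholder: set to your model's actual feature count

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    if len(request.features) != EXPECTED_FEATURES:
        raise HTTPException(status_code=422,
                            detail=f"expected {EXPECTED_FEATURES} features, got {len(request.features)}")
    try:
        features = np.array(request.features).reshape(1, -1)
        prediction = pipeline.predict(features)[0]
        probability = pipeline.predict_proba(features)[0].max()
    except Exception:
        logger.exception("Prediction failed")
        raise HTTPException(status_code=500, detail="prediction failed")
    return PredictionResponse(prediction=int(prediction), probability=float(probability))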
Key Takeaway
Production ML is about consistency and reliability. Save models with their preprocessing, create clean APIs, containerize for portability, and monitor everything. The best model is worthless if it can't serve predictions reliably. Start simple, add complexity only when needed!
#Machine Learning #MLOps #Deployment #Production #Intermediate