
Model Deployment Basics

Deploy ML models to production.

Robert Anderson
December 18, 2025

Make models available to users.

Why Deploy?

A model in a notebook ≠ value.
A model in production = real impact!

Save Your Model

import joblib
from sklearn.ensemble import RandomForestClassifier

# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Save model
joblib.dump(model, 'model.pkl')

# Load model
loaded_model = joblib.load('model.pkl')
prediction = loaded_model.predict(X_new)
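
Versioning (best practice 1 below) is easiest to bolt on at save time. Here is a minimal sketch, assuming a `models/` directory and a `save_with_metadata` helper of my own naming; pass it the bytes of the file `joblib.dump` wrote:

```python
import hashlib
import json
import time
from pathlib import Path

def save_with_metadata(model_bytes: bytes, version: str, out_dir: str = "models") -> Path:
    """Write the serialized model plus a small JSON metadata file next to it."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    model_path = out / f"model_{version}.pkl"
    model_path.write_bytes(model_bytes)
    meta = {
        "version": version,
        "sha256": hashlib.sha256(model_bytes).hexdigest(),  # lets you verify the artifact later
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    (out / f"model_{version}.json").write_text(json.dumps(meta, indent=2))
    return model_path

# Dummy bytes for illustration; in practice read the .pkl that joblib produced
path = save_with_metadata(b"fake-model-bytes", version="1.0.0")
```

The hash makes it possible to confirm that the model running in production is exactly the artifact you trained.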

Flask API

Create a web API for your model:

from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load model at startup
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Get data from request
    data = request.json
    features = [[data['age'], data['income']]]
    
    # Make prediction
    prediction = model.predict(features)[0]
    
    return jsonify({'prediction': int(prediction)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
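
The handler above assumes the JSON body always carries numeric `age` and `income` fields; a malformed request would crash it with a `KeyError` or `TypeError`. A small validation helper (the name `validate_payload` and the error format are my own) keeps that logic testable outside Flask:

```python
def validate_payload(data):
    """Return a list of error messages; an empty list means the payload is valid."""
    errors = []
    if not isinstance(data, dict):
        return ["request body must be a JSON object"]
    for field, kind in (("age", int), ("income", (int, float))):
        if field not in data:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int, so reject it explicitly
        elif not isinstance(data[field], kind) or isinstance(data[field], bool):
            errors.append(f"{field} must be numeric")
    return errors
```

In the route you would call `errors = validate_payload(request.get_json(silent=True))` and return `jsonify({'errors': errors}), 400` when the list is non-empty.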

Test API

import requests

# Send request
response = requests.post('http://localhost:5000/predict', 
                        json={'age': 30, 'income': 50000})

result = response.json()
print(f"Prediction: {result['prediction']}")

FastAPI (Modern Alternative)

from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionInput(BaseModel):
    age: int
    income: float

@app.post('/predict')
def predict(input_data: PredictionInput):
    features = [[input_data.age, input_data.income]]
    prediction = model.predict(features)[0]
    return {'prediction': int(prediction)}

# Run with: uvicorn main:app --reload

Docker Container

# Dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY model.pkl .
COPY app.py .

EXPOSE 5000
CMD ["python", "app.py"]

# Build and run
docker build -t ml-model .
docker run -p 5000:5000 ml-model

Cloud Deployment

AWS SageMaker:

import sagemaker
from sagemaker.sklearn import SKLearnModel

model = SKLearnModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,  # an IAM execution role ARN
    entry_point='inference.py',
    framework_version='1.2-1'  # scikit-learn version the model was trained with
)

predictor = model.deploy(instance_type='ml.m5.large')
prediction = predictor.predict(data)

Google Cloud AI Platform:

gcloud ai-platform models create my_model
gcloud ai-platform versions create v1 \
    --model my_model \
    --origin gs://bucket/model \
    --runtime-version 2.3

Monitoring

Track model performance:

import logging

logging.basicConfig(level=logging.INFO)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    prediction = model.predict([data['features']])
    
    # Log for monitoring
    logging.info(f"Input: {data['features']}, Prediction: {prediction}")
    
    return jsonify({'prediction': int(prediction[0])})
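
Logging inputs is the first step; the next is noticing when they start to look unlike the data the model was trained on. A minimal in-process sketch (class name and thresholds are my own, not from any monitoring library) keeps a rolling window of one feature and flags values far from the recent mean:

```python
from collections import deque
from statistics import mean, stdev

class FeatureMonitor:
    """Rolling window over one input feature; flags values far from the recent mean."""
    def __init__(self, window=100, threshold=3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Record a value; return True if it is an outlier vs. the window so far."""
        is_outlier = False
        if len(self.values) >= 10:  # wait for a minimal sample before judging
            m, s = mean(self.values), stdev(self.values)
            if s > 0 and abs(value - m) > self.threshold * s:
                is_outlier = True
        self.values.append(value)
        return is_outlier

monitor = FeatureMonitor()
for age in [30, 32, 31, 29, 33, 30, 28, 31, 32, 30]:
    monitor.observe(age)
suspicious = monitor.observe(500)  # far outside the recent range
```

In the `/predict` handler you would call `monitor.observe(...)` per feature and log or alert on outliers; dedicated tools (Prometheus, Evidently, etc.) do this at scale.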

Best Practices

  1. Version your models
  2. Monitor predictions
  3. Handle errors gracefully
  4. Add authentication
  5. Use caching for frequent queries
  6. Set up CI/CD pipeline
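
For practice 5, Python's standard library already provides memoization. A sketch assuming hashable inputs, with a stand-in rule instead of a real model call:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_predict(age: int, income: float) -> int:
    """Memoize predictions for repeated inputs; arguments must be hashable."""
    # Stand-in for model.predict([[age, income]])[0]; replace with the real call
    return int(income > 40000)

cached_predict(30, 50000.0)  # computed
cached_predict(30, 50000.0)  # served from the cache
hits = cached_predict.cache_info().hits
```

Caching only pays off when the same inputs recur and the model is deterministic; invalidate the cache whenever you swap in a new model version.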

Remember

  • Start with simple Flask API
  • Use Docker for consistency
  • Monitor model performance
  • Plan for model updates

#AI #Intermediate #Deployment