AI · 7 min read
Model Deployment Basics
Deploy ML models to production.
Robert Anderson
December 18, 2025
Make models available to users.
Why Deploy?
Model in notebook ≠ Value
Model in production = Real impact!
Save Your Model
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Example training data (a stand-in for your real dataset)
X_train, y_train = make_classification(n_samples=200, n_features=2,
                                       n_informative=2, n_redundant=0,
                                       random_state=42)

# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Save model
joblib.dump(model, 'model.pkl')

# Load model and predict on new rows with the same feature layout
loaded_model = joblib.load('model.pkl')
X_new = [[0.4, -1.1]]
prediction = loaded_model.predict(X_new)
Flask API
Create a web API for your model:
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load model once at startup
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Get JSON data from the request body
    data = request.json
    features = [[data['age'], data['income']]]
    # Make prediction
    prediction = model.predict(features)[0]
    return jsonify({'prediction': int(prediction)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
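Flask's built-in server is fine for local testing, but it isn't meant for production traffic. The usual choice is a WSGI server such as gunicorn (assuming the file above is saved as app.py):
# Run with: gunicorn -w 2 -b 0.0.0.0:5000 app:app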
Test API
import requests
# Send request
response = requests.post('http://localhost:5000/predict',
json={'age': 30, 'income': 50000})
result = response.json()
print(f"Prediction: {result['prediction']}")
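Or test from the command line with curl, sending the same JSON payload:
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{"age": 30, "income": 50000}'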
FastAPI (Modern Alternative)
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionInput(BaseModel):
    age: int
    income: float

@app.post('/predict')
def predict(input_data: PredictionInput):
    features = [[input_data.age, input_data.income]]
    prediction = model.predict(features)[0]
    return {'prediction': int(prediction)}

# Run with: uvicorn main:app --reload
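uvicorn listens on port 8000 by default, so the earlier test script only needs a different URL. FastAPI also generates interactive API docs automatically at http://localhost:8000/docs.
import requests

response = requests.post('http://localhost:8000/predict',
                         json={'age': 30, 'income': 50000})
print(response.json())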
Docker Container
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY model.pkl .
COPY app.py .
CMD ["python", "app.py"]
# Build and run
docker build -t ml-model .
docker run -p 5000:5000 ml-model
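The Dockerfile copies a requirements.txt that isn't shown above; for the Flask version of the app it would be something like this (package names only, shown as a sketch; pin exact versions in real projects):
# requirements.txt
flask
scikit-learn
joblib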
Cloud Deployment
AWS SageMaker:
import sagemaker
from sagemaker.sklearn import SKLearnModel
model = SKLearnModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,  # an IAM role ARN with SageMaker permissions, defined elsewhere
    entry_point='inference.py',
    framework_version='1.2-1',  # must match the scikit-learn version you trained with
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
)
prediction = predictor.predict(data)
Google Cloud AI Platform:
gcloud ai-platform models create my_model
gcloud ai-platform versions create v1 \
--model my_model \
--origin gs://bucket/model \
--runtime-version 2.3
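Once the version is live, you can send it data from the command line; here instances.json is a placeholder file containing one JSON instance per line:
gcloud ai-platform predict \
  --model my_model \
  --version v1 \
  --json-instances instances.json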
Monitoring
Track model performance:
import logging
logging.basicConfig(level=logging.INFO)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    prediction = model.predict([data['features']])
    # Log inputs and outputs for monitoring
    logging.info(f"Input: {data['features']}, Prediction: {prediction}")
    return jsonify({'prediction': int(prediction[0])})
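Prediction latency is worth tracking too. A minimal sketch using only the standard library (the helper name is illustrative, not part of the API above):
import logging
import time

def predict_with_timing(model, features):
    """Run a prediction and log how long it took."""
    start = time.perf_counter()
    prediction = model.predict(features)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logging.info("Prediction took %.1f ms", elapsed_ms)
    return prediction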
Best Practices
- Version your models
- Monitor predictions
- Handle errors gracefully (see the sketch after this list)
- Add authentication
- Use caching for frequent queries
- Set up CI/CD pipeline
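For example, "handle errors gracefully" can be as simple as validating the payload and returning a proper status code instead of a stack trace. A Flask sketch, reusing the app, model, and logging setup from above (it would replace the earlier /predict handler):
@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(silent=True)
    # Reject missing or malformed payloads with a 400 instead of a 500
    if not data or 'age' not in data or 'income' not in data:
        return jsonify({'error': 'expected JSON with age and income'}), 400
    try:
        prediction = model.predict([[data['age'], data['income']]])[0]
    except Exception:
        logging.exception('prediction failed')
        return jsonify({'error': 'prediction failed'}), 500
    return jsonify({'prediction': int(prediction)})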
Remember
- Start with simple Flask API
- Use Docker for consistency
- Monitor model performance
- Plan for model updates
#AI #Intermediate #Deployment