7 min read

Reproducibility in Machine Learning

Learn how to make your ML experiments reproducible for yourself and others.

Sarah Chen
December 19, 2025


"It worked yesterday!" If you can't reproduce your results, you can't trust them. Reproducibility isn't optional - it's essential.

Why Reproducibility Matters

  • Debug issues (you need to recreate a problem to fix it)
  • Compare experiments fairly
  • Share results with colleagues
  • Deploy to production with confidence
  • Maintain scientific integrity

Level 1: Random Seeds

Set seeds everywhere:

import numpy as np
import random
import os

def set_all_seeds(seed=42):
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    
    # For TensorFlow
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass
    
    # For PyTorch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False  # autotuner picks kernels nondeterministically
    except ImportError:
        pass

# Call at start of every experiment
set_all_seeds(42)
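Note that cuDNN determinism alone doesn't cover every operation. For stricter guarantees (at some speed cost), newer PyTorch versions offer an opt-in switch; a minimal sketch, assuming PyTorch 1.8+ with a CUDA build:

import os
import torch

# Error out if an op has no deterministic implementation
torch.use_deterministic_algorithms(True)

# cuBLAS needs this set before CUDA initializes to make matmuls deterministic
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'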

Also set seeds in your models and data splits:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Always specify random_state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)

Level 2: Environment Management

requirements.txt with Versions

# Save exact versions
pip freeze > requirements.txt

# requirements.txt
numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0
xgboost==1.7.0
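To get back to that exact environment later, recreate it from the pinned file (a typical flow on a Unix shell; adjust for your platform):

# Recreate a fresh virtual environment and install the pinned versions
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt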

Better: Use Poetry or Conda

# environment.yml
name: ml-project
dependencies:
  - python=3.10
  - numpy=1.24.0
  - pandas=2.0.0
  - scikit-learn=1.3.0
  - pip:
    - xgboost==1.7.0
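And to reproduce the Conda environment on another machine:

# Recreate the environment from the file, then activate it
conda env create -f environment.yml
conda activate ml-project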

Level 3: Track Experiments

Simple: Log Everything

import json
from datetime import datetime

def log_experiment(params, metrics, notes="", **extra):
    experiment = {
        'timestamp': datetime.now().isoformat(),
        'params': params,
        'metrics': metrics,
        'notes': notes,
        **extra  # room for git_commit, data_hash, etc. (used in Level 5)
    }
    
    with open('experiments.jsonl', 'a') as f:
        f.write(json.dumps(experiment) + '\n')

# Usage
log_experiment(
    params={'n_estimators': 100, 'max_depth': 10},
    metrics={'accuracy': 0.85, 'f1': 0.82},
    notes='First baseline'
)
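The JSONL log pays off when you compare runs later. A minimal sketch using pandas (json_normalize flattens the nested params and metrics dicts into columns):

import json
import pandas as pd

# Load every logged run and flatten the nested dicts into columns
with open('experiments.jsonl') as f:
    records = [json.loads(line) for line in f]

runs = pd.json_normalize(records)
print(runs.sort_values('metrics.accuracy', ascending=False).head())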

Better: Use MLflow

import mlflow

mlflow.set_experiment("my_experiment")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)
    
    # Train model
    model.fit(X_train, y_train)
    
    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1", f1)
    
    # Log model
    mlflow.sklearn.log_model(model, "model")
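A payoff of logging the model: you can reload the exact artifact from any past run. A sketch (the run ID comes from the MLflow UI, or from run.info.run_id if you keep a handle on the run object):

import mlflow.sklearn

run_id = "YOUR_RUN_ID"  # placeholder: copy from the MLflow UI
loaded_model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
predictions = loaded_model.predict(X_test)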

Level 4: Version Data

Data changes can break reproducibility:

import hashlib
import pandas as pd

def hash_dataframe(df):
    return hashlib.md5(
        pd.util.hash_pandas_object(df).values
    ).hexdigest()

# Log data hash
data_hash = hash_dataframe(df)
print(f"Data hash: {data_hash}")

# Or use DVC for data versioning
# dvc add data/training_data.csv
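To catch silent data drift, check the hash at load time against the value you logged; a sketch using the hash_dataframe helper above (EXPECTED_HASH stands in for whatever you recorded originally):

EXPECTED_HASH = 'paste-the-logged-hash-here'  # recorded when the experiment ran

df = pd.read_csv('data/training_data.csv')
if hash_dataframe(df) != EXPECTED_HASH:
    raise ValueError("Training data changed since the experiment was logged")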

Level 5: Code Version Control

Always commit before running experiments:

import subprocess

def get_git_commit():
    try:
        commit = subprocess.check_output(
            ['git', 'rev-parse', 'HEAD']
        ).decode().strip()
        return commit
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"

# Log with experiment
log_experiment(
    params={...},
    metrics={...},
    git_commit=get_git_commit()
)
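A logged commit hash only tells the full story if the working tree was clean. A small guard, continuing the snippet above, using git status --porcelain (which prints nothing when there are no uncommitted changes):

def assert_clean_worktree():
    # --porcelain emits one line per modified or untracked file; empty means clean
    dirty = subprocess.check_output(
        ['git', 'status', '--porcelain']
    ).decode().strip()
    if dirty:
        raise RuntimeError("Uncommitted changes; commit before running experiments")

assert_clean_worktree()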

Reproducibility Checklist

□ Random seeds set (numpy, python, framework)
□ Specific package versions documented
□ Data versioned or hashed
□ Code committed to git
□ Experiment parameters logged
□ Results logged with timestamp
□ Hardware/environment noted (GPU, OS)
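For the last item, a few lines are enough to snapshot the environment alongside your results; a minimal sketch (the GPU fields fill in only when PyTorch with CUDA is installed):

import platform
import sys

env_info = {
    'python': sys.version.split()[0],
    'os': platform.platform(),
    'machine': platform.machine(),
}

# GPU details, when PyTorch with CUDA is available
try:
    import torch
    if torch.cuda.is_available():
        env_info['gpu'] = torch.cuda.get_device_name(0)
        env_info['cuda'] = torch.version.cuda
except ImportError:
    pass

print(env_info)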

Quick Template

import hashlib
import random
from datetime import datetime

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# 1. Set seeds
SEED = 42
np.random.seed(SEED)
random.seed(SEED)

# 2. Log experiment config
config = {
    'seed': SEED,
    'data_path': 'data/train.csv',
    'model': 'RandomForest',
    'params': {'n_estimators': 100, 'max_depth': 10},
    'timestamp': datetime.now().isoformat()
}
print(f"Config: {config}")

# 3. Load data (hash it for verification, as in Level 4)
df = pd.read_csv(config['data_path'])
data_hash = hashlib.md5(pd.util.hash_pandas_object(df).values).hexdigest()
print(f"Data shape: {df.shape}, hash: {data_hash}")

# 4. Train with fixed seed
model = RandomForestClassifier(
    **config['params'],
    random_state=SEED
)

# 5. Log results (accuracy and f1 come from your evaluation step)
results = {
    'config': config,
    'data_hash': data_hash,
    'metrics': {'accuracy': accuracy, 'f1': f1}
}

Key Takeaway

Reproducibility is a habit, not an afterthought. Set random seeds everywhere, version your environment and data, log all experiments, and commit code before runs. Future you (and your colleagues) will thank you. Start simple with seeds and requirements.txt, add experiment tracking as needed.

#MachineLearning #Reproducibility #BestPractices #MLOps #Intermediate