ML · 7 min read
Reproducibility in Machine Learning
Learn how to make your ML experiments reproducible for yourself and others.
Sarah Chen
December 19, 2025
"It worked yesterday!" If you can't reproduce your results, you can't trust them. Reproducibility isn't optional - it's essential.
Why Reproducibility Matters
- Debug issues (need to recreate the problem)
- Compare experiments fairly
- Share with colleagues
- Deploy to production with confidence
- Scientific integrity
Level 1: Random Seeds
Set seeds everywhere:
import numpy as np
import random
import os

def set_all_seeds(seed=42):
    np.random.seed(seed)
    random.seed(seed)
    # Note: PYTHONHASHSEED only takes effect if set before the Python
    # interpreter starts; setting it here documents intent but does not
    # change hash randomization for the current process
    os.environ['PYTHONHASHSEED'] = str(seed)

    # For TensorFlow
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass

    # For PyTorch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

# Call at the start of every experiment
set_all_seeds(42)
Also set seeds in data splitting and model construction:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Always specify random_state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
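It is worth verifying that seeding actually makes the pipeline deterministic before moving on. A minimal sanity check, using synthetic data from make_classification purely for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

def train_once(seed=42):
    # Same seed -> same split, same model, same predictions
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model.predict(X_test)

# Two runs with the same seed should match exactly
assert np.array_equal(train_once(), train_once())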
Level 2: Environment Management
requirements.txt with Versions
# Save exact versions
pip freeze > requirements.txt
# requirements.txt
numpy==1.24.0
pandas==2.0.0
scikit-learn==1.3.0
xgboost==1.7.0
Better: Use Poetry or Conda
# environment.yml
name: ml-project
dependencies:
  - python=3.10
  - numpy=1.24.0
  - pandas=2.0.0
  - scikit-learn=1.3.0
  - pip:
      - xgboost==1.7.0
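To recreate the environment on another machine, run conda env create -f environment.yml; conversely, conda env export > environment.yml captures your current environment back into the file.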
Level 3: Track Experiments
Simple: Log Everything
import json
from datetime import datetime

def log_experiment(params, metrics, notes=""):
    experiment = {
        'timestamp': datetime.now().isoformat(),
        'params': params,
        'metrics': metrics,
        'notes': notes
    }
    with open('experiments.jsonl', 'a') as f:
        f.write(json.dumps(experiment) + '\n')

# Usage
log_experiment(
    params={'n_estimators': 100, 'max_depth': 10},
    metrics={'accuracy': 0.85, 'f1': 0.82},
    notes='First baseline'
)
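One payoff of the JSONL format is that comparing runs later takes just a few lines of pandas. A small sketch, assuming the experiments.jsonl file written above:

import pandas as pd

# Each line in experiments.jsonl is one run; lines=True parses JSONL
runs = pd.read_json('experiments.jsonl', lines=True)

# Expand the nested params/metrics dicts into flat columns
flat = pd.concat(
    [runs[['timestamp', 'notes']],
     runs['params'].apply(pd.Series),
     runs['metrics'].apply(pd.Series)],
    axis=1,
)
print(flat.sort_values('accuracy', ascending=False))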
Better: Use MLflow
import mlflow

mlflow.set_experiment("my_experiment")

with mlflow.start_run():
    # Log parameters
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 10)

    # Train model
    model.fit(X_train, y_train)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_metric("f1", f1)

    # Log model
    mlflow.sklearn.log_model(model, "model")
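MLflow can also reload a logged model later for exact re-evaluation. The run ID below is a placeholder to copy from the MLflow UI or from mlflow.search_runs():

import mlflow

# Load the model logged in a previous run
# (replace <run_id> with a real run ID)
model = mlflow.sklearn.load_model("runs:/<run_id>/model")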
Level 4: Version Data
Data changes can break reproducibility:
import hashlib
import pandas as pd

def hash_dataframe(df):
    return hashlib.md5(
        pd.util.hash_pandas_object(df).values
    ).hexdigest()

# Log data hash
data_hash = hash_dataframe(df)
print(f"Data hash: {data_hash}")

# Or use DVC for data versioning
# dvc add data/training_data.csv
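The hash becomes more useful as an active check: store it with the experiment, then fail fast if the data on disk has silently changed. A minimal sketch reusing the hash_dataframe helper above:

import pandas as pd

def load_and_verify(path, expected_hash):
    # Fail fast if the data no longer matches what was logged
    df = pd.read_csv(path)
    actual = hash_dataframe(df)
    if actual != expected_hash:
        raise ValueError(
            f"Data hash mismatch: expected {expected_hash}, got {actual}"
        )
    return df

# df = load_and_verify('data/training_data.csv', data_hash)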
Level 5: Code Version Control
Always commit before running experiments:
import subprocess

def get_git_commit():
    try:
        commit = subprocess.check_output(
            ['git', 'rev-parse', 'HEAD']
        ).decode().strip()
        return commit
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"

# Log with the experiment (log_experiment above takes params,
# metrics, and notes, so record the commit in the notes field)
log_experiment(
    params={...},
    metrics={...},
    notes=f"git_commit={get_git_commit()}"
)
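A commit hash only pins your code if the working tree is clean. A small guard, using git status --porcelain, that you might call before training starts:

import subprocess

def assert_clean_worktree():
    # `git status --porcelain` prints one line per modified/untracked file
    status = subprocess.check_output(
        ['git', 'status', '--porcelain']
    ).decode()
    if status.strip():
        raise RuntimeError(
            "Uncommitted changes detected - commit before running:\n" + status
        )

assert_clean_worktree()  # call at the start of each run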
Reproducibility Checklist
□ Random seeds set (numpy, python, framework)
□ Specific package versions documented
□ Data versioned or hashed
□ Code committed to git
□ Experiment parameters logged
□ Results logged with timestamp
□ Hardware/environment noted (GPU, OS) - see the sketch below
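The last item is easy to automate. A small sketch that captures OS, Python, and (if PyTorch is installed) GPU details to log alongside each run:

import platform
import sys

def get_environment_info():
    info = {
        'os': platform.platform(),
        'python': sys.version,
        'machine': platform.machine(),
    }
    # GPU info, if PyTorch is available
    try:
        import torch
        info['cuda_available'] = torch.cuda.is_available()
        if torch.cuda.is_available():
            info['gpu'] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return info

print(get_environment_info())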
Quick Template
import numpy as np
import random
import pandas as pd
from datetime import datetime
from sklearn.ensemble import RandomForestClassifier

# 1. Set seeds
SEED = 42
np.random.seed(SEED)
random.seed(SEED)

# 2. Log experiment config
config = {
    'seed': SEED,
    'data_path': 'data/train.csv',
    'model': 'RandomForest',
    'params': {'n_estimators': 100, 'max_depth': 10},
    'timestamp': datetime.now().isoformat()
}
print(f"Config: {config}")

# 3. Load data (with hash for verification)
df = pd.read_csv(config['data_path'])
print(f"Data shape: {df.shape}")

# 4. Train with fixed seed
model = RandomForestClassifier(
    **config['params'],
    random_state=SEED
)

# 5. Log results (accuracy and f1 come from your evaluation step)
results = {
    'config': config,
    'metrics': {'accuracy': accuracy, 'f1': f1}
}
Key Takeaway
Reproducibility is a habit, not an afterthought. Set random seeds everywhere, version your environment and data, log all experiments, and commit code before runs. Future you (and your colleagues) will thank you. Start simple with seeds and requirements.txt, add experiment tracking as needed.
#MachineLearning #Reproducibility #BestPractices #MLOps #Intermediate