Time Series Basics for Machine Learning
Learn the fundamentals of working with time series data in machine learning.
Sarah Chen
December 19, 2025
Time series data has a twist: the order of the observations matters. You can't shuffle it and split it randomly the way you would with independent samples. Let's learn the fundamentals.
What Makes Time Series Special
- Temporal order matters - Can't shuffle data
- Autocorrelation - Values correlate with past values
- Trends - Long-term increase/decrease
- Seasonality - Repeating patterns (daily, weekly, yearly)
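To make autocorrelation concrete, here is a minimal sketch on a synthetic series. The data, the noise level, and the 7-day period are all made up for illustration. It uses pandas' `Series.autocorr` to compare a lag of one full cycle with a lag of roughly half a cycle.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a weekly sine pattern plus a little noise (illustrative only)
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=140, freq="D")
t = np.arange(len(idx))
series = pd.Series(np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.1, len(idx)), index=idx)

# Correlation of the series with a copy of itself shifted by `lag` steps
acf_lag7 = series.autocorr(lag=7)  # one full weekly cycle back
acf_lag3 = series.autocorr(lag=3)  # roughly half a cycle back

print(f"lag 7: {acf_lag7:.2f}, lag 3: {acf_lag3:.2f}")
```

With a weekly pattern like this one, the lag-7 correlation should be strongly positive and the lag-3 correlation negative, which is the kind of structure lag features can exploit.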
Loading and Exploring
import pandas as pd
import matplotlib.pyplot as plt
# Load with datetime index
df = pd.read_csv('data.csv', parse_dates=['date'], index_col='date')
# Quick visualization
df['value'].plot(figsize=(12, 4))
plt.title('Time Series Data')
plt.show()
# Check for trends and seasonality
from statsmodels.tsa.seasonal import seasonal_decompose
# period=12 assumes monthly data with a yearly cycle; set it to match your data
decomposition = seasonal_decompose(df['value'], period=12)
decomposition.plot()
plt.show()
Splitting Time Series Data
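Never use random splits! A random split puts observations from the future into the training set, so the model is tested on a period it has effectively already seen and the score looks better than it should.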
# Correct: Split by time
train_size = int(len(df) * 0.8)
train = df[:train_size]
test = df[train_size:]
# Visual check
plt.plot(train.index, train['value'], label='Train')
plt.plot(test.index, test['value'], label='Test')
plt.legend()
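The difference between the two kinds of split can be checked directly. The sketch below uses a hypothetical 100-row daily DataFrame (`demo` is not part of the original example) and compares scikit-learn's shuffled `train_test_split` with a chronological cut.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical daily data, 100 rows
demo = pd.DataFrame(
    {"value": np.arange(100.0)},
    index=pd.date_range("2024-01-01", periods=100, freq="D"),
)

# Random split: rows are shuffled, so training rows can come after test rows
rand_train, rand_test = train_test_split(demo, test_size=0.2, random_state=0)
random_leaks = rand_train.index.max() > rand_test.index.min()

# Chronological split: every training row precedes every test row
cut = int(len(demo) * 0.8)
time_train, time_test = demo.iloc[:cut], demo.iloc[cut:]
time_leaks = time_train.index.max() > time_test.index.min()

print(f"random split leaks: {random_leaks}, time split leaks: {time_leaks}")
```

Comparing the latest training timestamp with the earliest test timestamp is a cheap sanity check you can keep in a pipeline as an assertion.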
Creating Lag Features
Use past values as features for predicting future ones:
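def create_lag_features(df, column, lags):
    df_features = df.copy()
    for lag in lags:
        # shift(lag) with lag > 0 places the value from `lag` steps earlier on each row
        df_features[f'{column}_lag_{lag}'] = df[column].shift(lag)
    # The first max(lags) rows have incomplete history, so drop them
    return df_features.dropna()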
# Create lags 1-7
df_features = create_lag_features(df, 'value', range(1, 8))
# Features: value at t-1, t-2, ..., t-7
# Target: value at t
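To see what the shifted columns actually contain, here is a small sketch on a made-up five-row series (`toy` is a hypothetical name, not part of the example above).

```python
import pandas as pd

# Tiny made-up series so the shifted columns are easy to read
toy = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# shift(k) with k > 0 moves earlier values down to later rows
toy["value_lag_1"] = toy["value"].shift(1)
toy["value_lag_2"] = toy["value"].shift(2)

# The first two rows lack a complete history and are dropped
toy_lagged = toy.dropna()
print(toy_lagged)
```

Each remaining row pairs the current value with the one and two days before it. A negative shift such as `shift(-1)` would instead pull future values into the row, which is exactly the leak to avoid in features.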
Rolling Statistics
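# Moving average and standard deviation
# shift(1) first, so the window ends at t-1 and never includes the value being predicted
df['rolling_mean_7'] = df['value'].shift(1).rolling(window=7).mean()
df['rolling_std_7'] = df['value'].shift(1).rolling(window=7).std()
# Exponential moving average (more weight on recent values)
df['ema_7'] = df['value'].shift(1).ewm(span=7).mean()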
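When a rolling statistic is used as a model input for predicting the value at time t, the window should end at t-1. The sketch below, on a made-up six-value series, contrasts a window that includes the current row with one that only looks at the past.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Window ending at the current row: includes the value being predicted
mean_incl = s.rolling(window=3).mean()

# Window ending at the previous row: uses only values known before time t
mean_past = s.shift(1).rolling(window=3).mean()

print(pd.DataFrame({"value": s, "incl_current": mean_incl, "past_only": mean_past}))
```

The `past_only` column is the `incl_current` column moved down one row, at the cost of one extra missing value at the start.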
Time-Based Features
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
df['is_weekend'] = df.index.dayofweek.isin([5, 6]).astype(int)
df['quarter'] = df.index.quarter
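As a quick check of what these attributes return, the sketch below builds one made-up week of daily timestamps (`week` is a hypothetical name) and derives the same kind of columns from it.

```python
import pandas as pd

# One made-up week starting on a Monday (2024-01-01 was a Monday)
week = pd.DataFrame(index=pd.date_range("2024-01-01", periods=7, freq="D"))

week["day_of_week"] = week.index.dayofweek  # Monday=0 ... Sunday=6
week["is_weekend"] = (week.index.dayofweek >= 5).astype(int)
week["month"] = week.index.month
week["quarter"] = week.index.quarter

print(week)
```

Because these features come from the timestamp alone, they are known in advance for any future date and carry no leakage risk.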
Cross-Validation for Time Series
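Regular k-fold CV trains on folds that come after the validation fold, so it leaks future information. Use TimeSeriesSplit, which always validates on data that comes after the training data: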
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
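# X and y: the feature DataFrame and target Series (built in the next section)
for train_idx, test_idx in tscv.split(X):
    # Use .iloc: the splitter yields integer positions, not index labels
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    # Train and evaluate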
# Or let cross_val_score run the loop (model is any scikit-learn estimator)
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))
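It helps to look at the indices each fold receives. The sketch below runs `TimeSeriesSplit` on 12 placeholder samples (`X_demo` is made up for illustration) and records the train and test positions of every fold.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 placeholder samples, already in chronological order
X_demo = np.arange(12).reshape(-1, 1)

folds = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X_demo):
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print(f"train={train_idx.tolist()} test={test_idx.tolist()}")
```

The training window grows with each fold while the test window slides forward, so every fold is evaluated only on samples later than anything it trained on.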
Simple Forecasting with ML
from sklearn.ensemble import RandomForestRegressor
# Prepare features and target
X = df_features.drop('value', axis=1)
y = df_features['value']
# Time-based split
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
# Train model (random_state fixed so results are reproducible)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Evaluate
from sklearn.metrics import mean_absolute_error, mean_squared_error
predictions = model.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, predictions):.2f}")
print(f"RMSE: {mean_squared_error(y_test, predictions)**0.5:.2f}")
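If you don't have a `data.csv` at hand, the same workflow can be tried end to end on synthetic data. This is a rough sketch under stated assumptions: the series, its weekly pattern, and the noise level are invented, and the "predict yesterday's value" baseline is added here only to give the MAE some context.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic daily series with weekly seasonality (stand-in for data.csv)
rng = np.random.default_rng(42)
idx = pd.date_range("2023-01-01", periods=400, freq="D")
t = np.arange(len(idx))
values = 10 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, len(idx))
demo = pd.DataFrame({"value": values}, index=idx)

# Lag features 1..7, same idea as create_lag_features
for lag in range(1, 8):
    demo[f"value_lag_{lag}"] = demo["value"].shift(lag)
demo = demo.dropna()

X_demo = demo.drop(columns="value")
y_demo = demo["value"]

# Time-based split: first 80% for training, last 20% for testing
cut = int(len(demo) * 0.8)
X_tr, X_te = X_demo.iloc[:cut], X_demo.iloc[cut:]
y_tr, y_te = y_demo.iloc[:cut], y_demo.iloc[cut:]

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
mae_model = mean_absolute_error(y_te, rf.predict(X_te))

# Naive baseline: predict that today equals yesterday
mae_naive = mean_absolute_error(y_te, X_te["value_lag_1"])

print(f"model MAE: {mae_model:.2f}, naive MAE: {mae_naive:.2f}")
```

A model that can't beat the naive baseline on the test period is usually a sign that the features or the split need another look.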
Common Mistakes
- Random train/test split - Future leaks into training
- Using future data as features - Check your lags!
- Ignoring seasonality - Add time-based features
- Wrong evaluation - Use time-aware CV
When to Use ML vs Traditional Methods
ML approaches work well when:
- Multiple features available
- Complex non-linear patterns
- External variables matter
Traditional methods (ARIMA, etc.) are often better when:
- Pure time series (no extra features)
- Strong seasonality
- Need confidence intervals
Key Takeaway
Time series requires respecting temporal order in everything: splits, cross-validation, and feature creation. Create lag features and rolling statistics, add time-based features for seasonality, and always use TimeSeriesSplit for validation. Never let future information leak into your training data!