ML · 7 min read

Feature Scaling: Normalization vs Standardization

Learn when and how to scale your features for better ML model performance.

Sarah Chen
December 19, 2025


Different features have different scales: a house's square footage (1000-5000) and its number of bedrooms (1-5) shouldn't be compared directly. Scaling fixes this.

Why Scale Features?

Problem Without Scaling

Feature 1 (Income): 30,000 - 200,000
Feature 2 (Age): 18 - 80

Distance-based algorithms (KNN, SVM) will be dominated by income, and gradient-trained models (neural networks, linear models) will converge poorly, simply because income's numeric range is orders of magnitude wider than age's.
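To make this concrete, here is a small numeric sketch (values invented for illustration) showing a raw Euclidean distance being driven almost entirely by the income axis:

import numpy as np

a = np.array([30_000, 18])   # person A: income, age
b = np.array([31_000, 80])   # person B: income, age

# The 1,000-unit income gap swamps the 62-year age gap.
distance = np.linalg.norm(a - b)
print(distance)  # ~1001.9, i.e. almost exactly the income difference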

Algorithms That NEED Scaling

  • K-Nearest Neighbors (KNN)
  • Support Vector Machines (SVM)
  • Neural Networks
  • Linear/Logistic Regression (for faster convergence with gradient-based solvers)
  • PCA

Algorithms That DON'T Need Scaling

  • Decision Trees (scale-invariant; see the sketch after this list)
  • Random Forest
  • Gradient Boosting (XGBoost, LightGBM)
  • Naive Bayes
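Here is a quick illustration on synthetic data (my own sketch, not from scikit-learn's docs): the nearest neighbours found by a distance-based model change once both features get equal weight, while a decision tree's predictions are identical on raw and scaled inputs.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = np.column_stack([rng.uniform(30_000, 200_000, 100),  # income
                     rng.uniform(18, 80, 100)])          # age
y = (X[:, 1] > 40).astype(int)                           # label depends on age only
X_scaled = StandardScaler().fit_transform(X)

# Neighbour search: the 3 nearest neighbours of the first point usually change
# once age is no longer drowned out by income.
nn_raw = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X[:1], return_distance=False)
nn_scaled = NearestNeighbors(n_neighbors=3).fit(X_scaled).kneighbors(X_scaled[:1], return_distance=False)
print(nn_raw[0], nn_scaled[0])

# Decision tree: splits compare one feature to a threshold, so rescaling a
# feature just rescales the threshold; predictions come out identical.
pred_raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
pred_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)
print(np.array_equal(pred_raw, pred_scaled))  # expected: True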

Two Main Techniques

1. Normalization (Min-Max Scaling)

Scales to a fixed range, usually [0, 1]:

x_normalized = (x - min) / (max - min)

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)

# All values now between 0 and 1
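As a quick sanity check (my own sketch), the formula above and MinMaxScaler produce the same result:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [3.0], [5.0]])
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # the min-max formula
print(np.allclose(manual, MinMaxScaler().fit_transform(X)))     # True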

Use when:

  • You need bounded values
  • Data doesn't have outliers
  • Neural networks (especially image data)

2. Standardization (Z-Score Scaling)

Centers around 0 with unit variance:

x_standardized = (x - mean) / std

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Mean ≈ 0, Std ≈ 1
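The same sanity check works for standardization (a sketch; note that StandardScaler uses the population standard deviation, which is NumPy's default):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [3.0], [5.0]])
manual = (X - X.mean(axis=0)) / X.std(axis=0)                    # np.std defaults to ddof=0
print(np.allclose(manual, StandardScaler().fit_transform(X)))    # True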

Use when:

  • Data has mild outliers (standardization compresses them less than min-max; for heavy outliers see RobustScaler below)
  • Algorithm assumes normally distributed data
  • Default choice for most cases

Quick Comparison

Aspect           | Normalization              | Standardization
Range            | Fixed [0, 1]               | No fixed range
Outlier handling | Poor                       | Better
Mean             | Not centered               | Centered at 0
Use case         | Image pixels, bounded data | Most other cases

Code Example

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Sample data
X = np.array([[30000, 25], [80000, 45], [50000, 35], [120000, 50]])
#             income  age

# Standardization
std_scaler = StandardScaler()
X_std = std_scaler.fit_transform(X)
print("Standardized:")
print(X_std)
print(f"Mean: {X_std.mean(axis=0)}")  # Should be ~0
print(f"Std: {X_std.std(axis=0)}")    # Should be ~1

# Normalization  
minmax_scaler = MinMaxScaler()
X_norm = minmax_scaler.fit_transform(X)
print("\nNormalized:")
print(X_norm)
print(f"Min: {X_norm.min(axis=0)}")   # Should be 0
print(f"Max: {X_norm.max(axis=0)}")   # Should be 1

Critical: Fit on Train, Transform Both

from sklearn.model_selection import train_test_split

# WRONG - data leakage!
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)              # learns mean/std from ALL data, including future test rows
X_train, X_test = train_test_split(X_scaled, test_size=0.2)

# RIGHT
X_train, X_test = train_test_split(X, test_size=0.2)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn scaling parameters from train only
X_test_scaled = scaler.transform(X_test)        # apply the same transformation to test

The test set shouldn't influence the scaling parameters!
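One convenient way to keep this discipline automatic (a sketch using scikit-learn's Pipeline, not part of the original example) is to bundle the scaler with the model, so cross-validation refits the scaler inside each training fold:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X, y assumed to be your feature matrix and labels
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)  # the scaler is refit on each training fold only
print(scores.mean())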

Other Scaling Methods

Robust Scaler (for outliers)

from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()  # Uses median and IQR, robust to outliers
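For intuition, here is a small comparison (invented numbers) of how a single extreme income distorts min-max scaling but barely affects RobustScaler:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

incomes = np.array([[30_000], [45_000], [60_000], [75_000], [5_000_000]])  # last row is an outlier

print(MinMaxScaler().fit_transform(incomes).ravel())
# ~[0.000, 0.003, 0.006, 0.009, 1.0]: the ordinary incomes are squashed together
print(RobustScaler().fit_transform(incomes).ravel())
# ~[-1.0, -0.5, 0.0, 0.5, 164.7]: ordinary incomes stay well spread; only the outlier is extreme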

Max Abs Scaler (sparse data)

from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler()  # Scales by max absolute value, keeps sparsity
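And a tiny sketch of the sparsity point: MaxAbsScaler accepts a SciPy sparse matrix directly and leaves zeros untouched (centering, as StandardScaler does by default, would destroy the sparsity):

from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

X_sparse = csr_matrix([[0.0, 4.0], [2.0, 0.0], [0.0, -8.0]])
X_scaled = MaxAbsScaler().fit_transform(X_sparse)  # output stays sparse
print(X_scaled.toarray())
# [[ 0.   0.5]
#  [ 1.   0. ]
#  [ 0.  -1. ]]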

When NOT to Scale

  1. Tree-based models - They split on feature thresholds, so the scale of a feature doesn't change the result
  2. Categorical features - One-hot encoded features (0/1) shouldn't be scaled (see the sketch after this list)
  3. Already scaled data - Don't scale twice!
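For point 2, a common pattern (a sketch with hypothetical column names) is to scale only the numeric columns with a ColumnTransformer and leave the encoded categoricals alone:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.DataFrame({
    "income": [30_000, 80_000, 50_000],   # hypothetical columns for illustration
    "age":    [25, 45, 35],
    "city":   ["NYC", "LA", "NYC"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "age"]),  # scale the numeric features
    ("cat", OneHotEncoder(), ["city"]),            # encode, but don't scale, the categorical one
])
X_ready = preprocess.fit_transform(df)
print(X_ready.shape)  # (3, 4): 2 scaled numeric columns + 2 one-hot columns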

Summary

Situation             | Recommendation
Default choice        | StandardScaler
Bounded output needed | MinMaxScaler
Data has outliers     | RobustScaler
Tree-based models     | No scaling needed
Neural networks       | MinMaxScaler or StandardScaler

Remember: Always fit on training data only, then transform both train and test sets!

#Machine Learning  #Feature Scaling  #Preprocessing  #Beginner