ML · 7 min read
Feature Scaling: Normalization vs Standardization
Learn when and how to scale your features for better ML model performance.
Sarah Chen
December 19, 2025
Different features live on very different scales: a house's square footage (1,000-5,000) and its bedroom count (1-5) can't be compared directly. Feature scaling puts them on a common footing.
Why Scale Features?
Problem Without Scaling
Feature 1 (Income): 30,000 - 200,000
Feature 2 (Age): 18 - 80
Distance-based algorithms (KNN, SVM) will be dominated by income, and gradient-based models (neural networks, linear/logistic regression) converge more slowly when features sit on such different scales.
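To see the dominance numerically, here's a minimal sketch (with made-up numbers) comparing the Euclidean distance between two people before and after scaling:
import numpy as np
# Two hypothetical people: [income, age]
a = np.array([30_000, 25])
b = np.array([32_000, 60])
# Raw distance: the 2,000 income gap swamps the 35-year age gap
print(np.linalg.norm(a - b))  # ~2000.3
# After mapping both features to [0, 1] (illustrative values), age matters again
a_scaled = np.array([0.00, 0.11])
b_scaled = np.array([0.02, 0.68])
print(np.linalg.norm(a_scaled - b_scaled))  # ~0.57, driven mostly by age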
Algorithms That NEED Scaling
- K-Nearest Neighbors (KNN)
- Support Vector Machines (SVM)
- Neural Networks
- Linear/Logistic Regression (for convergence)
- PCA
Algorithms That DON'T Need Scaling
- Decision Trees
- Random Forest
- Gradient Boosting (XGBoost, LightGBM)
- Naive Bayes
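To see the difference between the two groups in practice, the sketch below (synthetic data, so the exact scores are illustrative) compares a KNN classifier and a decision tree with and without a StandardScaler; typically only the KNN score moves noticeably:
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Synthetic data with one feature blown up to an "income-like" scale
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X[:, 0] *= 10_000
for name, model in [("KNN", KNeighborsClassifier()),
                    ("Tree", DecisionTreeClassifier(random_state=0))]:
    raw = cross_val_score(model, X, y, cv=5).mean()
    scaled = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    print(f"{name}: raw={raw:.3f}  scaled={scaled:.3f}")  # KNN typically improves; the tree barely changes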
Two Main Techniques
1. Normalization (Min-Max Scaling)
Scales to a fixed range, usually [0, 1]:
x_normalized = (x - min) / (max - min)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
# All values now between 0 and 1 (each column is scaled independently)
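To see the formula itself at work, a few lines of NumPy (assuming X is a plain 2D numeric array) reproduce what MinMaxScaler does column by column:
import numpy as np
X = np.array([[30000.0, 25], [80000.0, 45], [120000.0, 50]])
# (x - min) / (max - min), computed per column
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_manual)  # income column becomes 0.0, 0.556, 1.0 -- identical to MinMaxScaler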
Use when:
- You need bounded values
- Data doesn't have outliers
- Neural networks (especially image data)
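For images the bounds are known in advance (0-255 for 8-bit pixels), so in practice the "scaler" is often just a division, which is equivalent to min-max scaling with fixed bounds; a minimal sketch with hypothetical data:
import numpy as np
# Hypothetical batch of 8-bit grayscale images, values in 0-255
images = np.random.randint(0, 256, size=(32, 28, 28)).astype(np.float32)
images /= 255.0  # now in [0, 1]
print(images.min(), images.max())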
2. Standardization (Z-Score Scaling)
Centers around 0 with unit variance:
x_standardized = (x - mean) / std
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)
# Mean ≈ 0, Std ≈ 1
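The formula is just as easy to reproduce by hand; note that StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default:
import numpy as np
X = np.array([[30000.0, 25], [80000.0, 45], [120000.0, 50]])
# (x - mean) / std, computed per column
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_manual.mean(axis=0))  # ~[0, 0]
print(X_manual.std(axis=0))   # [1, 1]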
Use when:
- Data has outliers (less distorted than min-max; for heavy outliers see RobustScaler below)
- Algorithm assumes normally distributed data
- Default choice for most cases
Quick Comparison
| Aspect | Normalization | Standardization |
|---|---|---|
| Range | [0, 1] fixed | No fixed range |
| Outlier handling | Poor | Better |
| Mean | Not centered | Centered at 0 |
| Use case | Image pixels, bounded data | Most other cases |
Code Example
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Sample data
X = np.array([[30000, 25], [80000, 45], [50000, 35], [120000, 50]])  # columns: income, age
# Standardization
std_scaler = StandardScaler()
X_std = std_scaler.fit_transform(X)
print("Standardized:")
print(X_std)
print(f"Mean: {X_std.mean(axis=0)}") # Should be ~0
print(f"Std: {X_std.std(axis=0)}") # Should be ~1
# Normalization
minmax_scaler = MinMaxScaler()
X_norm = minmax_scaler.fit_transform(X)
print("\nNormalized:")
print(X_norm)
print(f"Min: {X_norm.min(axis=0)}") # Should be 0
print(f"Max: {X_norm.max(axis=0)}") # Should be 1
Critical: Fit on Train, Transform Both
from sklearn.model_selection import train_test_split
# WRONG - data leakage! The scaler learns from ALL rows, including future test rows
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test = train_test_split(X_scaled, test_size=0.2)
# RIGHT - split first, then fit the scaler on the training rows only
X_train, X_test = train_test_split(X, test_size=0.2)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # apply the same transformation to test
The test set shouldn't influence the scaling parameters!
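A convenient way to get this right automatically, especially with cross-validation, is to put the scaler and the model into a single Pipeline: the scaler is then refit on each training fold only, so the held-out fold never leaks into the scaling parameters. A minimal sketch with synthetic data:
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = make_classification(n_samples=300, random_state=0)
# Scaling happens inside each CV fold -- no leakage from the validation rows
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())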
Other Scaling Methods
Robust Scaler (for outliers)
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler() # Uses median and IQR, robust to outliers
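As a quick sanity check of why this helps, the sketch below adds one extreme income to a small column: StandardScaler squashes the ordinary values together, while RobustScaler keeps their spread:
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler
X = np.array([[30_000.0], [50_000.0], [80_000.0], [120_000.0], [5_000_000.0]])  # one extreme outlier
print(StandardScaler().fit_transform(X).ravel())  # ordinary incomes all land near -0.5
print(RobustScaler().fit_transform(X).ravel())    # ordinary incomes keep their spread (~-0.7 to 0.6)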
Max Abs Scaler (sparse data)
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler() # Scales by max absolute value, keeps sparsity
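MaxAbsScaler also accepts SciPy sparse matrices directly (useful for bag-of-words features) and returns a sparse result, since dividing by the max absolute value doesn't turn zeros into non-zeros; a minimal sketch:
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler
X_sparse = csr_matrix([[0, 2, 0], [4, 0, 0], [0, 1, 3]])
X_scaled = MaxAbsScaler().fit_transform(X_sparse)
print(type(X_scaled))      # still sparse
print(X_scaled.toarray())  # each column divided by its max absolute value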
When NOT to Scale
- Tree-based models - they split on thresholds, so the scale of a feature doesn't change the splits
- Categorical features - one-hot encoded (0/1) columns shouldn't be scaled (see the sketch after this list)
- Already scaled data - Don't scale twice!
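When numeric and one-hot columns sit in the same matrix, a ColumnTransformer lets you scale only the numeric ones and pass the dummies through untouched; a minimal sketch, assuming the first two columns are numeric and the rest are 0/1 dummies:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
# Columns: income, age, then two one-hot city dummies
X = np.array([[30_000, 25, 1, 0],
              [80_000, 45, 0, 1],
              [120_000, 50, 1, 0]], dtype=float)
ct = ColumnTransformer(
    [("num", StandardScaler(), [0, 1])],  # scale only income and age
    remainder="passthrough",              # leave the 0/1 dummies as they are
)
print(ct.fit_transform(X))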
Summary
| Situation | Recommendation |
|---|---|
| Default choice | StandardScaler |
| Bounded output needed | MinMaxScaler |
| Data has outliers | RobustScaler |
| Tree-based models | No scaling needed |
| Neural networks | MinMaxScaler or StandardScaler |
Remember: Always fit on training data only, then transform both train and test sets!
#Machine Learning#Feature Scaling#Preprocessing#Beginner