Principal Component Analysis
Reduce dimensions while keeping important information.
Robert Anderson
December 18, 2025
Simplify complex data.
What is PCA?
PCA reduces the number of features in a dataset while keeping as much of the important information as possible.
Like summarizing a long book into key points!
Why Use PCA?
- Too many features slow down training
- Visualization (can't plot 100 dimensions!)
- Remove noise
- Reduce storage
Real Example
Customer data with 50 features → reduce to 5 key components
Maybe those 5 capture 95% of the variance in the original data! (See the sketch below.)
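A minimal sketch of that workflow (the data here is a random placeholder; real customer features are correlated, which is exactly what lets a few components capture most of the variance):

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical customer table: 200 customers x 50 features (synthetic placeholder)
rng = np.random.default_rng(0)
X_customers = rng.normal(size=(200, 50))

pca = PCA(n_components=5)
X_small = pca.fit_transform(X_customers)
print(X_small.shape)                        # (200, 5)
print(pca.explained_variance_ratio_.sum())  # fraction of variance the 5 components keep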
How It Works
- Find the directions (principal components) with the most variation
- Project the data onto those directions
- Keep the top N components (a from-scratch sketch of these steps follows)
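Here is a from-scratch sketch of those three steps in NumPy, just to show what happens under the hood (scikit-learn uses an SVD-based implementation, but the result is equivalent up to sign):

import numpy as np

X = np.array([
    [2.5, 2.4, 1.5, 3.2],
    [0.5, 0.7, 0.9, 1.1],
    [2.2, 2.9, 1.8, 3.5],
    [1.9, 2.2, 1.6, 2.8],
])

# Step 1: center the data, then find the directions of most variation
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort descending
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Steps 2-3: project onto the top N directions and keep them
N = 2
X_reduced = X_centered @ eigenvectors[:, :N]
print(X_reduced.shape)  # (4, 2)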
Python Code
from sklearn.decomposition import PCA
import numpy as np
# Toy data: 4 samples, 4 features
X = np.array([
[2.5, 2.4, 1.5, 3.2],
[0.5, 0.7, 0.9, 1.1],
[2.2, 2.9, 1.8, 3.5],
[1.9, 2.2, 1.6, 2.8]
])
# Reduce to 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(f"Original shape: {X.shape}")
print(f"Reduced shape: {X_reduced.shape}")
# Check variance explained
print(f"Variance explained: {pca.explained_variance_ratio_}")
Choosing Components
Instead of picking a fixed number, pass a fraction and scikit-learn keeps enough components to explain that share of the variance:
pca = PCA(n_components=0.95) # Keep 95% variance
X_reduced = pca.fit_transform(X)
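After fitting, you can check how many components scikit-learn actually kept and how the explained variance accumulates (continuing from the snippet above):

print(pca.n_components_)                         # number of components selected
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative variance explained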
Applications
- Image compression
- Noise reduction (see the sketch after this list)
- Feature extraction
- Data visualization
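For noise reduction, the trick is to project onto the top components and then map back with inverse_transform: the reconstruction keeps the dominant structure and drops the low-variance directions where noise tends to live. A sketch on hypothetical noisy data:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
signal = rng.normal(size=(100, 1)) @ rng.normal(size=(1, 20))  # clean rank-1 structure
X_noisy = signal + 0.1 * rng.normal(size=(100, 20))            # add noise

pca = PCA(n_components=1)
X_denoised = pca.inverse_transform(pca.fit_transform(X_noisy))
print(np.abs(X_noisy - signal).mean())     # error before denoising
print(np.abs(X_denoised - signal).mean())  # error after: should be smaller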
Advantages
- Fast, even on fairly large datasets
- Only one main knob to set (how many components to keep)
- Variance explained per component is easy to inspect
Disadvantages
- Loses some information
- Assumes linear relationships
- Components are mixtures of the original features, so they can be hard to interpret
Remember
- Use it before training models (see the pipeline sketch below)
- Great for visualization
- Try keeping 95% variance
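For the first tip, the usual pattern is to chain PCA with a model in a scikit-learn Pipeline, so the exact same transform is applied at training and prediction time. A minimal sketch (LogisticRegression is just an example model; X_train and y_train are assumed to exist):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

model = make_pipeline(
    StandardScaler(),        # scale so no single feature dominates
    PCA(n_components=0.95),  # keep 95% of the variance
    LogisticRegression(),    # hypothetical downstream model
)
# model.fit(X_train, y_train); model.predict(X_test)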
#AI #Intermediate #PCA