AI · 6 min read

Principal Component Analysis

Reduce dimensions while keeping important information.

Robert Anderson
December 18, 2025

Simplify complex data.

What is PCA?

PCA reduces the number of features in a dataset while keeping as much of the important information as possible.

Like summarizing a long book into key points!

Why Use PCA?

- Too many features slow down training
- Visualization (you can't plot 100 dimensions!)
- Remove noise
- Reduce storage

Real Example

Customer data with 50 features → Reduce to 5 key features

Maybe those 5 capture 95% of important information!
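Here's a minimal sketch of that idea using synthetic "customer" data (the real customer dataset is hypothetical, so we generate 50 correlated features from 5 hidden factors):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic customer data: 200 customers, 50 correlated features.
# All 50 columns are generated from 5 hidden factors plus a little noise.
factors = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = factors @ mixing + 0.1 * rng.normal(size=(200, 50))

pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 5)
print(pca.explained_variance_ratio_.sum())  # close to 1.0 for this data
```

Because the data really was generated from 5 factors, 5 components recover almost all of the variance; real data is rarely this clean.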

How It Works

1. Find the directions with the most variation (the principal components)
2. Project the data onto those directions
3. Keep the top N components
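The steps above can be sketched by hand with an eigendecomposition of the covariance matrix; up to sign flips, the result matches scikit-learn's PCA (shown here on random data for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))

# Step 1: center the data, then find directions of most variation
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # re-sort descending

# Steps 2-3: project onto the top N = 2 directions
top2 = eigvecs[:, order[:2]]
X_manual = Xc @ top2

# Compare with sklearn (each column may differ only by sign)
X_sklearn = PCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(X_manual), np.abs(X_sklearn)))  # True
```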

Python Code

```python
from sklearn.decomposition import PCA
import numpy as np

# Data with many features
X = np.array([
    [2.5, 2.4, 1.5, 3.2],
    [0.5, 0.7, 0.9, 1.1],
    [2.2, 2.9, 1.8, 3.5],
    [1.9, 2.2, 1.6, 2.8]
])

# Reduce to 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(f"Original shape: {X.shape}")
print(f"Reduced shape: {X_reduced.shape}")

# Check variance explained
print(f"Variance explained: {pca.explained_variance_ratio_}")
```

Choosing Components

Keep components that explain 95% of variance:

```python
pca = PCA(n_components=0.95)  # keep 95% of the variance
X_reduced = pca.fit_transform(X)
```
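You can also inspect the cumulative explained variance yourself and pick the component count manually; this sketch uses random data just to show the mechanics:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10)) @ rng.normal(size=(10, 10))

pca = PCA().fit(X)  # fit all components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# First component count whose cumulative variance reaches 95%
n_components = int(np.searchsorted(cumulative, 0.95)) + 1
print(n_components, cumulative[n_components - 1])
```

Plotting `cumulative` against the component count (a "scree"-style plot) is a common way to eyeball the cutoff.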

Applications

- Image compression
- Noise reduction
- Feature extraction
- Data visualization
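As a toy illustration of the compression idea, treat a grayscale image as a matrix of pixel rows and keep only a few components. The "image" below is synthetic (built to be low-rank so few components suffice); real images need more components for a good reconstruction:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# A synthetic 64x64 "image": 64 rows of 64 pixels, low-rank by construction
image = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))

pca = PCA(n_components=8)
compressed = pca.fit_transform(image)         # 64 x 8 scores
restored = pca.inverse_transform(compressed)  # back to 64 x 64

# Stored numbers: scores + component vectors (plus the 64-value mean)
print(compressed.size + pca.components_.size, "vs", image.size)
print(np.allclose(restored, image))  # near-lossless on this low-rank data
```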

Advantages

- Fast
- Only one main parameter to tune (the number of components)
- Explained variance gives a clear measure of how much information is kept

Disadvantages

- Loses some information
- Assumes linear relationships
- Components can be hard to interpret

Remember

- Use before training models
- Great for visualization
- Try keeping 95% of the variance
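Putting it together, a common pattern is to scale, then apply PCA, then train a model, all in one pipeline. This sketch uses synthetic classification data (the dataset and the choice of logistic regression are illustrative, not prescribed by the article):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale first: PCA is sensitive to feature scales
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Keeping PCA inside the pipeline ensures the components are fit on the training split only, avoiding leakage from the test set.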

#AI#Intermediate#PCA