# Support Vector Machines (SVM) Explained
Learn how SVMs find the optimal boundary between classes and when to use them.
SVM finds the best line (or hyperplane) that separates your classes with the maximum margin. It's elegant math that works surprisingly well.
## The Core Idea
Imagine plotting two classes of points. Many lines could separate them. SVM finds the one that:
- Correctly separates the classes
- Maximizes the distance to the nearest points of each class
The nearest points are called "support vectors" because they support (define) the decision boundary.
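To make this concrete, here's a minimal sketch with scikit-learn on made-up 2D points: after fitting, the model exposes exactly which training points ended up as support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Two small clusters of 2D points (made-up coordinates for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# Only the points nearest the boundary are retained as support vectors
print(clf.support_vectors_)  # the boundary-defining points
print(clf.n_support_)        # how many per class
```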
## Visual Intuition
```
Class A: ●    Class B: ○

 ●  ●        |        ○  ○
   ●  ●      |      ○  ○
 ●       ←margin→        ○
   ●  ●      |      ○  ○
             |
```

Support vectors touch the margin.
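For a linear kernel, the margin in the picture can be computed from the fitted weights: its width is 2/‖w‖, and the support vectors sit on its edges with decision values of roughly ±1. A sketch on the same made-up points (the large C value approximates a hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# Same made-up clusters as above
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # large C ~ hard margin

# Margin width for a linear SVM is 2 / ||w||
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))

# Support vectors lie on the margin edges: decision values close to +1/-1
print(clf.decision_function(clf.support_vectors_))
```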
## Implementation
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example data; substitute your own X and y
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVM needs scaled features, so bundle scaling into the pipeline
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0)),
])

svm_pipeline.fit(X_train, y_train)
accuracy = svm_pipeline.score(X_test, y_test)
```
## The Kernel Trick
What if the data isn't linearly separable? Kernels implicitly map the data into a higher-dimensional space where it becomes separable, without ever computing that mapping explicitly; this shortcut is the "kernel trick".
**Common Kernels:**
| Kernel | Use Case |
|---|---|
| linear | Linearly separable data |
| rbf (default) | Most problems, non-linear |
| poly | Polynomial relationships |
```python
# Linear kernel: fastest, good for high-dimensional data
SVC(kernel='linear')

# RBF kernel: the default, handles non-linear boundaries
SVC(kernel='rbf', gamma='scale')

# Polynomial kernel: for polynomial relationships
SVC(kernel='poly', degree=3)
```
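To see the kernel trick pay off, here's a small sketch on scikit-learn's make_circles dataset, where one class forms a ring around the other so no straight line can separate them; the RBF kernel should score far higher than the linear one (exact numbers depend on the noise level and seed):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: not linearly separable by construction
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

for kernel in ['linear', 'rbf']:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: {scores.mean():.2f}")
```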
## Key Parameters
**C (regularization):**
- High C: tries to classify every training point correctly (risk of overfitting)
- Low C: tolerates some misclassification for a smoother, wider-margin boundary

**gamma (for RBF kernel):**
- High gamma: each point's influence is short-range, so only nearby points shape the boundary (complex, wiggly boundary)
- Low gamma: each point's influence reaches far (simpler, smoother boundary)
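C and gamma interact, so they are usually tuned together with cross-validation. A minimal grid-search sketch (the grid values are illustrative starting points, not recommendations; X_train and y_train are the splits from the Implementation section):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([('scaler', StandardScaler()),
                 ('svm', SVC(kernel='rbf'))])

# Log-spaced grid; widen or narrow the ranges for your own data
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__gamma': [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```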
## When to Use SVM
**Works well for:**
- Binary classification
- Small to medium datasets
- High-dimensional data (e.g. text classification)
- Problems where a clear, maximum-margin boundary matters

**Not ideal for:**
- Large datasets: kernel SVM training scales poorly with sample count (see the linear sketch after this list)
- Multi-class problems: handled only through one-vs-one or one-vs-rest workarounds
- When you need probability estimates: SVC's probability=True option exists but is slow and approximate
- Noisy data with heavily overlapping classes
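If a large dataset is the blocker, scikit-learn's LinearSVC gives up the kernel trick in exchange for a solver that scales to far more samples; a minimal sketch:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# LinearSVC (liblinear solver) trains much faster than kernel SVC
# on large sample counts, but only fits a linear boundary
fast_svm = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', LinearSVC(C=1.0)),
])
fast_svm.fit(X_train, y_train)
print(fast_svm.score(X_test, y_test))
```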
## Important: Scale Your Features!
SVM is sensitive to feature scales. Always standardize:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse training statistics
```
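A quick sanity check worth running on your own data: fit the same model with and without scaling and compare test accuracy (the gap varies by dataset, but unscaled features routinely cost accuracy):

```python
from sklearn.svm import SVC

# Identical model, scaled vs. unscaled inputs
unscaled = SVC(kernel='rbf').fit(X_train, y_train)
scaled = SVC(kernel='rbf').fit(X_train_scaled, y_train)

print("unscaled:", unscaled.score(X_test, y_test))
print("scaled:  ", scaled.score(X_test_scaled, y_test))
```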
## Key Takeaway
SVM finds the boundary that separates classes with the maximum margin. Use the RBF kernel as your default, always scale your features, and tune C (and gamma) for the bias-variance tradeoff. It shines on smaller datasets with clear separation; for large tabular data, consider tree-based methods instead.