# Support Vector Machines (SVM) Explained
Learn how SVMs find the optimal boundary between classes and when to use them.
SVM finds the best line (or hyperplane) that separates your classes with the maximum margin. It's elegant math that works surprisingly well.
## The Core Idea
Imagine plotting two classes of points. Many lines could separate them. SVM finds the one that:

1. Correctly separates the classes
2. Maximizes the distance to the nearest points
The nearest points are called "support vectors" - they support (define) the decision boundary.
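A minimal sketch of this, using synthetic blobs from scikit-learn (the dataset and the `clf` name are illustrative, not from the text above): after fitting, `SVC` exposes the support vectors directly.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters (features already on similar scales)
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# Only a handful of training points define the boundary
print(clf.support_vectors_)   # coordinates of the support vectors
print(len(clf.support_))      # how many there are (typically very few)
```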
## Visual Intuition
```
Class A: ●              Class B: ○

  ●  ●                 ○  ○
 ●  ●        |          ○  ○
  ●          | ←margin→    ○
 ●  ●        |          ○  ○
             |

Support vectors touch the margin
```
## Implementation
```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# SVM needs scaled features!
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0))
])

svm_pipeline.fit(X_train, y_train)
accuracy = svm_pipeline.score(X_test, y_test)
```
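This snippet assumes `X_train` and friends already exist. For a self-contained run, here is the same pipeline on scikit-learn's built-in breast cancer dataset (the dataset choice is ours, purely for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0))
])
svm_pipeline.fit(X_train, y_train)
print(svm_pipeline.score(X_test, y_test))  # accuracy on held-out data
```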
## The Kernel Trick
What if the data isn't linearly separable? Kernels implicitly map it into a higher-dimensional space where it becomes separable, without ever computing the new coordinates explicitly; that shortcut is the "trick".
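You can see the trick at work on data no straight line can separate. A sketch using scikit-learn's `make_moons` (exact scores will vary with the noise level and seed):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

for kernel in ('linear', 'rbf'):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)
    print(kernel, round(scores.mean(), 3))  # rbf should score noticeably higher
```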
**Common Kernels:**
| Kernel | Use Case |
|--------|----------|
| linear | Linearly separable data |
| rbf (default) | Most problems, non-linear |
| poly | Polynomial relationships |
```python
# Linear kernel - faster for high-dimensional data
SVC(kernel='linear')

# RBF kernel - default, handles non-linear
SVC(kernel='rbf', gamma='scale')

# Polynomial kernel
SVC(kernel='poly', degree=3)
```
## Key Parameters
**C (Regularization):**

- High C: tries to classify every training point correctly (risk of overfitting)
- Low C: allows some misclassification (smoother boundary)
**gamma (for the RBF kernel):**

- High gamma: only nearby points affect the boundary (complex, wiggly boundary)
- Low gamma: faraway points still matter (simpler, smoother boundary)
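In practice, C and gamma are tuned together. A minimal sketch with `GridSearchCV` (the grid values are arbitrary log-spaced starting points, and `X_train`/`y_train` are assumed to exist as in the earlier example):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf'))
])

# The 'svm__' prefix routes each parameter to the SVC step of the pipeline
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__gamma': [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)  # assumes X_train, y_train from earlier
print(search.best_params_)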
## When to Use SVM
**Works well for:**

- Binary classification
- Small to medium datasets
- High-dimensional data (e.g., text classification)
- Problems where you need a clear margin
**Not ideal for:**

- Large datasets (training time grows quickly with sample count)
- Multi-class problems (handled only through one-vs-one or one-vs-rest reductions)
- When calibrated probability estimates are needed (`SVC(probability=True)` adds an expensive cross-validated calibration step)
- Noisy data with overlapping classes
## Important: Scale Your Features!
SVM is sensitive to feature scales. Always standardize:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
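Note that the `Pipeline` from the Implementation section does this split for you: the scaler is fit on the training data only and merely applied to the test data, so there is no information leakage from the test set.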
## Key Takeaway
SVM finds the optimal boundary with maximum margin. Use RBF kernel as default, always scale your features, and tune C for the bias-variance tradeoff. Great for smaller datasets with clear separation, but consider tree-based methods for large tabular data.