# Support Vector Machines (SVM) Explained
Learn how SVMs find the optimal boundary between classes and when to use them.
SVM finds the best line (or hyperplane) that separates your classes with the maximum margin. It's elegant math that works surprisingly well.
## The Core Idea
Imagine plotting two classes of points. Many lines could separate them. SVM finds the one that:
- Correctly separates the classes
- Maximizes the distance to the nearest points of each class
The nearest points are called "support vectors" because they support (define) the decision boundary.
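To make this concrete, here's a minimal sketch with scikit-learn on made-up 2D points: after fitting, the model exposes exactly which training points ended up as support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Two small clusters of 2D points (made-up coordinates for illustration)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# Only the points nearest the boundary are retained as support vectors
print(clf.support_vectors_)  # the boundary-defining points
print(clf.n_support_)        # how many per class
```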
## Visual Intuition
```
Class A: ●    Class B: ○

 ●  ●        |        ○  ○
   ●  ●      |      ○  ○
 ●       ←margin→        ○
   ●  ●      |      ○  ○
             |
```

Support vectors touch the margin.
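For a linear kernel, the margin in the picture can be computed from the fitted weights: its width is 2/‖w‖, and the support vectors sit on its edges with decision values of roughly ±1. A sketch on the same made-up points (the large C value approximates a hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# Same made-up clusters as above
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # large C ~ hard margin

# Margin width for a linear SVM is 2 / ||w||
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))

# Support vectors lie on the margin edges: decision values close to +1/-1
print(clf.decision_function(clf.support_vectors_))
```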
## Implementation
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example data; substitute your own X and y
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVM needs scaled features, so bundle scaling into the pipeline
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0)),
])

svm_pipeline.fit(X_train, y_train)
accuracy = svm_pipeline.score(X_test, y_test)
```
## The Kernel Trick
What if the data isn't linearly separable? Kernels implicitly map the data into a higher-dimensional space where it becomes separable, without ever computing that mapping explicitly; this shortcut is the "kernel trick".
**Common Kernels:**
| Kernel | Use Case |
|---|---|
| linear | Linearly separable data |
| rbf (default) | Most problems, non-linear |
| poly | Polynomial relationships |
```python
# Linear kernel: fastest, good for high-dimensional data
SVC(kernel='linear')

# RBF kernel: the default, handles non-linear boundaries
SVC(kernel='rbf', gamma='scale')

# Polynomial kernel: for polynomial relationships
SVC(kernel='poly', degree=3)
```
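To see the kernel trick pay off, here's a small sketch on scikit-learn's make_circles dataset, where one class forms a ring around the other so no straight line can separate them; the RBF kernel should score far higher than the linear one (exact numbers depend on the noise level and seed):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: not linearly separable by construction
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

for kernel in ['linear', 'rbf']:
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: {scores.mean():.2f}")
```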
## Key Parameters
**C (regularization):**
- High C: tries to classify every training point correctly (risk of overfitting)
- Low C: tolerates some misclassification for a smoother, wider-margin boundary

**gamma (for RBF kernel):**
- High gamma: each point's influence is short-range, so only nearby points shape the boundary (complex, wiggly boundary)
- Low gamma: each point's influence reaches far (simpler, smoother boundary)
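C and gamma interact, so they are usually tuned together with cross-validation. A minimal grid-search sketch (the grid values are illustrative starting points, not recommendations; X_train and y_train are the splits from the Implementation section):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([('scaler', StandardScaler()),
                 ('svm', SVC(kernel='rbf'))])

# Log-spaced grid; widen or narrow the ranges for your own data
param_grid = {
    'svm__C': [0.1, 1, 10, 100],
    'svm__gamma': [0.001, 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```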
## When to Use SVM
**Works well for:**
- Binary classification
- Small to medium datasets
- High-dimensional data (e.g. text classification)
- Problems where a clear, maximum-margin boundary matters

**Not ideal for:**
- Large datasets: kernel SVM training scales poorly with sample count (see the linear sketch after this list)
- Multi-class problems: handled only through one-vs-one or one-vs-rest workarounds
- When you need probability estimates: SVC's probability=True option exists but is slow and approximate
- Noisy data with heavily overlapping classes
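If a large dataset is the blocker, scikit-learn's LinearSVC gives up the kernel trick in exchange for a solver that scales to far more samples; a minimal sketch:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# LinearSVC (liblinear solver) trains much faster than kernel SVC
# on large sample counts, but only fits a linear boundary
fast_svm = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', LinearSVC(C=1.0)),
])
fast_svm.fit(X_train, y_train)
print(fast_svm.score(X_test, y_test))
```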
## Important: Scale Your Features!
SVM is sensitive to feature scales. Always standardize:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse training statistics
```
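A quick sanity check worth running on your own data: fit the same model with and without scaling and compare test accuracy (the gap varies by dataset, but unscaled features routinely cost accuracy):

```python
from sklearn.svm import SVC

# Identical model, scaled vs. unscaled inputs
unscaled = SVC(kernel='rbf').fit(X_train, y_train)
scaled = SVC(kernel='rbf').fit(X_train_scaled, y_train)

print("unscaled:", unscaled.score(X_test, y_test))
print("scaled:  ", scaled.score(X_test_scaled, y_test))
```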
## Key Takeaway
SVM finds the boundary that separates classes with the maximum margin. Use the RBF kernel as your default, always scale your features, and tune C (and gamma) for the bias-variance tradeoff. It shines on smaller datasets with clear separation; for large tabular data, consider tree-based methods instead.