K-Nearest Neighbors
Classify new points based on their nearest neighbors.
Learn from your neighbors.
What is KNN?
"Tell me who your friends are, I'll tell you who you are"
KNN classifies a new example by looking at its K most similar examples and taking a majority vote.
Simple Example
Classifying a new house in Denver:
Your house: 1500 sq ft, $400k
**3 Nearest Neighbors**:

- House A: 1450 sq ft, $390k → Affordable
- House B: 1520 sq ft, $410k → Affordable
- House C: 1480 sq ft, $395k → Affordable
**Result**: Your house is "Affordable" (3/3 agree)
How It Works
1. Calculate the distance from the new point to every training point
2. Find the K nearest neighbors
3. Take a majority vote among their labels
4. Assign the winning category (see the sketch below)
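To make those four steps concrete, here is a minimal from-scratch sketch; the helper name `knn_predict` and the toy data are illustrative, not part of any library:

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, query, k=3):
    # 1. Calculate the Euclidean distance from the query to every training point
    distances = [math.dist(query, point) for point in X_train]
    # 2. Find the indices of the K nearest neighbors
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # 3. Take a majority vote among their labels
    votes = Counter(y_train[i] for i in nearest)
    # 4. Assign the winning category
    return votes.most_common(1)[0][0]

X_train = [[1000, 300], [1500, 400], [2000, 600], [2500, 800]]
y_train = ['affordable', 'affordable', 'expensive', 'expensive']
print(knn_predict(X_train, y_train, [1800, 550], k=3))  # 'expensive'
```

In practice you would use scikit-learn (next section), which implements the same logic with fast neighbor-search structures.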
Python Code
```python
from sklearn.neighbors import KNeighborsClassifier

# Data: [size_sqft, price_k]
X = [[1000, 300], [1500, 400], [2000, 600], [2500, 800]]
y = ['affordable', 'affordable', 'expensive', 'expensive']

# K=3 means check the 3 nearest neighbors
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Predict the category of a new house
new_house = [[1800, 550]]
result = model.predict(new_house)
print(result)  # ['expensive']
```
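To see which training points drove a prediction, the fitted scikit-learn model exposes a `kneighbors` method that returns the distances and indices of the K nearest training points; a short follow-up, assuming the `model` and `new_house` defined above:

```python
# Distances and row indices of the 3 nearest training points to new_house
distances, indices = model.kneighbors(new_house)
print(indices)    # [[2 1 3]] -> two 'expensive' rows, one 'affordable'
print(distances)  # roughly [[206.2 335.4 743.3]]
```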
Choosing K
**K=1**: Too sensitive to noise; one mislabeled point can flip the prediction
**K=100**: Too general; the vote includes distant, unrelated points
**K=3 to 7**: Usually a good starting point
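Rather than guessing, K can be chosen by cross-validation. A sketch using scikit-learn; both the synthetic dataset and the candidate K values here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data so the example is self-contained
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Score each candidate K with 5-fold cross-validation, then pick the best
for k in [1, 3, 5, 7, 15, 51]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy {scores.mean():.3f}")
```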
Distance Metrics
**Euclidean**: Straight-line distance (most common)
**Manhattan**: Grid-like distance, the sum of absolute differences
**Cosine**: Based on the angle between vectors rather than magnitude
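These metrics are simple to compute directly; a short NumPy sketch with arbitrary example vectors:

```python
import numpy as np

a = np.array([1500.0, 400.0])
b = np.array([2000.0, 600.0])

euclidean = np.linalg.norm(a - b)   # straight-line distance
manhattan = np.abs(a - b).sum()     # grid-like distance
cosine = 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # 1 - cos(angle)

print(euclidean, manhattan, cosine)
```

In scikit-learn, `KNeighborsClassifier` also accepts a `metric` parameter (e.g. `metric='manhattan'`) if you want something other than the Euclidean default.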
Advantages
- Simple to understand and implement
- No training phase; KNN just stores the data (a "lazy learner")
- Works for both classification and regression (see the regression sketch below)
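For the regression case, the vote becomes an average of the neighbors' values. A minimal sketch with scikit-learn's `KNeighborsRegressor`; the toy sizes and prices are made up:

```python
from sklearn.neighbors import KNeighborsRegressor

# Predict price (in $k) from size, averaging the 2 nearest neighbors
X = [[1000], [1500], [2000], [2500]]
y = [300, 400, 600, 800]

model = KNeighborsRegressor(n_neighbors=2)
model.fit(X, y)
print(model.predict([[1800]]))  # [500.] -> mean of the two nearest prices
```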
Disadvantages
- Slow for large datasets: every prediction compares against all training points
- Sensitive to feature scaling (see the pipeline sketch below)
- Doesn't work well in high dimensions (the "curse of dimensionality")
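Scaling matters because a feature with a large numeric range (square feet in the thousands) can swamp one with a small range when computing distances. A common fix is to standardize inside a pipeline; a sketch reusing the housing toy data:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = [[1000, 300], [1500, 400], [2000, 600], [2500, 800]]
y = ['affordable', 'affordable', 'expensive', 'expensive']

# Standardize each feature to zero mean / unit variance before
# measuring distances, so no single feature dominates
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[1800, 550]]))  # ['expensive'] on this toy data
```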
Remember
- Scale your features!
- Start with K=5
- Fast to implement, slow to predict