Clustering with K-Means
Group similar data points automatically.
What is K-Means?
K-Means is an algorithm that splits data points into K groups (clusters), placing each point in the cluster whose center is nearest.
No labels needed! (Unsupervised learning)
Real Example
Group customers in San Francisco by shopping behavior:
- **Cluster 1**: Young, low spending
- **Cluster 2**: Middle-age, medium spending
- **Cluster 3**: Older, high spending
How It Works
1. Choose K (number of groups)
2. Place K random center points
3. Assign each point to nearest center
4. Move centers to group average
5. Repeat until stable
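
The five steps above can be sketched directly in NumPy. This is a minimal illustration rather than how any particular library implements K-Means: the function name `kmeans` and its parameters `n_iters` and `seed` are hypothetical, and it assumes the starting centers are drawn at random from the data points themselves.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-Means sketch; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 2: pick K distinct data points as the starting centers (assumption)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each center to the mean of its assigned points
        # (a center with no points stays where it is)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 5: stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Step 1: choose K (here 3) for a small [age, annual_purchases] data set
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])
centers, labels = kmeans(X, k=3)
print(labels)
print(centers)
```

Because the starting centers are random, a different `seed` can produce a different grouping, which is why several runs are usually compared.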
Python Code
```python
from sklearn.cluster import KMeans
import numpy as np

# Customer data: [age, annual_purchases]
X = np.array([
    [25, 10], [22, 8],
    [50, 50], [48, 55],
    [70, 80], [72, 85]
])

# Find 3 clusters (random_state makes the run reproducible)
model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)

# Get cluster assignments
labels = model.labels_
print(labels)  # e.g. [0 0 1 1 2 2]; the cluster numbering is arbitrary

# Predict cluster for new customer
new_customer = [[30, 15]]
cluster = model.predict(new_customer)
print(f"Customer belongs to cluster {cluster[0]}")
```
Choosing K
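Use the "Elbow Method":
- Try different K values
- Plot the total squared distance from each point to its cluster center against K
- Choose the "elbow" point, where adding more clusters stops reducing that distance by much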
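
One way to try this with scikit-learn is to fit a model for each candidate K and read its `inertia_` attribute, which is the sum of squared distances from each point to its assigned center. The sketch below assumes the same six-customer array as the Python example and an arbitrary range of K from 1 to 5. It prints the values instead of plotting them, so any plotting library can be used to draw the curve.

```python
import numpy as np
from sklearn.cluster import KMeans

# Customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

# Fit one model per candidate K and record its total squared distance
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(f"K={k}: inertia={model.inertia_:.1f}")
```

The elbow is the K after which the printed values flatten out. On real data the bend is often less sharp than on a toy set like this one.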
Applications
- Customer segmentation
- Image compression
- Document clustering
- Anomaly detection
Advantages
- Simple and fast
- Scales to large data
- Easy to implement
Disadvantages
- Must choose K manually
- Sensitive to initial centers
- Assumes spherical clusters
Remember
- Start with 3-5 clusters
- Scale features first
- Try multiple runs
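
The last two tips can be combined in one short scikit-learn sketch. It assumes `StandardScaler` as the scaling step (other scalers would also work) and uses `n_init=10` so that K-Means is run from ten different starting positions and the best result is kept. The data is the same illustrative six-customer array as in the Python example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

pipeline = make_pipeline(
    StandardScaler(),  # give each feature zero mean and unit variance
    KMeans(n_clusters=3, n_init=10, random_state=0),  # keep the best of 10 runs
)
labels = pipeline.fit_predict(X)
print(labels)
```

Scaling matters because K-Means relies on distances: without it, a feature measured in large units would dominate the grouping.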