Group similar things together.

What is K-Means?

K-Means automatically splits data points into K groups (clusters), putting each point in the group whose center is closest to it.

No labels needed! (Unsupervised learning)
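For readers who want the math: K-Means looks for centers that minimize the total squared distance between each point and the center of its own cluster. The notation here is chosen for this sketch. $x_i$ is a data point, $C_k$ is the set of points in cluster $k$, and $\mu_k$ is that cluster's center (the mean of its points).

$$
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$

Reassigning points to their nearest center never increases $J$, and neither does moving each center to its cluster mean. This is why the repeated loop eventually stabilizes, although it may stop at a local rather than global minimum.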

Real Example

Group customers in San Francisco by shopping behavior:

- **Cluster 1**: Young, low spending
- **Cluster 2**: Middle-aged, medium spending
- **Cluster 3**: Older, high spending

How It Works

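1. Choose K (the number of groups).
2. Place K starting centers (for example, K randomly chosen data points).
3. Assign each point to its nearest center.
4. Move each center to the average (mean) of the points assigned to it.
5. Repeat steps 3 and 4 until the assignments stop changing.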
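To make the loop concrete, here is a minimal from-scratch sketch of these steps in NumPy (Lloyd's algorithm). The function names `kmeans` and `nearest_center` and the parameters `max_iters` and `seed` are hypothetical, chosen for illustration. Starting from randomly picked data points is one common choice, assumed here; libraries such as scikit-learn default to a smarter start (k-means++).

```python
import numpy as np


def nearest_center(X, centers):
    """Step 3 (hypothetical helper): index of the closest center for each row of X."""
    # dists has shape (n_points, k): Euclidean distance to every center
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, max_iters=100, seed=0):
    """Minimal K-Means sketch (hypothetical, not a library API); returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct, randomly chosen data points as the starting centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        labels = nearest_center(X, centers)  # step 3
        # Step 4: move each center to the mean of its points
        # (a cluster that ends up empty keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # step 5: centers stopped moving
            break
        centers = new_centers
    return nearest_center(X, centers), centers


X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])
labels, centers = kmeans(X, k=3)
print(labels)
print(centers)
```

With only one random start, this sketch can settle on a poor grouping. For example, two starting centers may land inside the same pair of customers. Running it several times and keeping the result with the smallest total distance works around this.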

Python Code

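```python
from sklearn.cluster import KMeans
import numpy as np

# Customer data: [age, annual_purchases]
X = np.array([
    [25, 10], [22, 8],   # younger, fewer purchases
    [50, 50], [48, 55],  # middle-aged
    [70, 80], [72, 85],  # older, more purchases
])

# Find 3 clusters (n_init=10 tries 10 starting layouts and keeps the best;
# random_state makes the result repeatable)
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(X)

# Get cluster assignments: similar customers share a label,
# e.g. [0 0 1 1 2 2], but the label numbers themselves are arbitrary
labels = model.labels_
print(labels)

# Predict cluster for new customer
new_customer = [[30, 15]]
cluster = model.predict(new_customer)
print(f"Customer belongs to cluster {cluster[0]}")
```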

Choosing K

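Use the "Elbow Method":
- Run K-Means for a range of K values.
- Plot the total within-cluster squared distance (called inertia) against K.
- Choose the "elbow": the K after which adding more clusters barely reduces that distance.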
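A minimal sketch of the elbow method with scikit-learn follows. It reuses the toy customer data from the Python Code section. The `inertia_` attribute of a fitted `KMeans` model is the total within-cluster squared distance. The range K = 1 to 5 is an assumption, limited by having only six data points.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ = sum of squared distances from each point to its cluster center
    inertias.append(model.inertia_)
    print(f"K={k}: inertia={model.inertia_:.1f}")
```

To find the elbow, plot `inertias` against K (for example with matplotlib) and look for the point where the curve flattens.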

Applications

- Customer segmentation
- Image compression
- Document clustering
- Anomaly detection
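As one hedged illustration of the image-compression use, colour quantization clusters an image's pixel colours and replaces each pixel with its cluster center. The image can then be stored as K colours plus one small palette index per pixel. The random 32x32 "image" and the choice of 8 colours are stand-in assumptions. A real image would be loaded as a (height, width, 3) array.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for a real RGB image (assumption): 32x32 pixels, values 0-255
image = rng.integers(0, 256, size=(32, 32, 3)).astype(float)

pixels = image.reshape(-1, 3)  # one row per pixel: [R, G, B]
k = 8                          # number of colours to keep (assumption)
model = KMeans(n_clusters=k, n_init=4, random_state=0).fit(pixels)

palette = model.cluster_centers_  # k representative colours
indices = model.labels_           # palette index for every pixel
compressed = palette[indices].reshape(image.shape)

print(f"{len(np.unique(pixels, axis=0))} colours -> {len(palette)} colours")
```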

Advantages

- Simple and fast
- Scales to large data
- Easy to implement

Disadvantages

- Must choose K manually
- Sensitive to the initial centers (different starts can give different clusters)
- Assumes roughly spherical, similarly sized clusters

Remember

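- Start with 3-5 clusters
- Scale features first (K-Means uses distances, so a feature with a larger range dominates)
- Try multiple runs from different starting centers and keep the best one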
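The last two tips can be combined in scikit-learn, sketched here under a few assumptions. `StandardScaler` rescales each feature inside a pipeline, so new customers get the same scaling. `n_init=10` runs K-Means from 10 starting layouts and keeps the one with the lowest inertia. The data is the same toy customer set used earlier.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

# StandardScaler gives each feature mean 0 and unit variance, so neither
# age nor annual_purchases dominates the distance because of its units.
# n_init=10 keeps the best of 10 runs from different starting centers.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = model.fit_predict(X)
print(labels)

# A new customer is scaled the same way before being assigned a cluster
new_label = model.predict([[30, 15]])
print(new_label)
```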

Clustering with K-Means