Clustering with K-Means
Group similar data points automatically.
Robert Anderson
December 18, 2025
Group similar things together.
What is K-Means?
K-Means automatically splits data points into K groups (clusters), placing each point in the group whose center is closest to it.
No labels needed! (Unsupervised learning)
Real Example
Group customers in San Francisco by shopping behavior:
Cluster 1: Young, low spending
Cluster 2: Middle-age, medium spending
Cluster 3: Older, high spending
How It Works
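- Choose K (the number of groups)
- Place K center points at random (often on randomly chosen data points)
- Assign each point to its nearest center
- Move each center to the average (mean) of the points assigned to it
- Repeat the assign and move steps until the assignments stop changing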
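The steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements it: the function name `kmeans`, the `n_iters` cap, the `seed` parameter, and the choice to start from randomly picked data points are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Hypothetical from-scratch K-Means, for illustration only."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Place K centers: here, K distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the group is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])
labels, centers = kmeans(X, k=3)
print(labels)
print(centers)
```

Depending on the random start, a single run like this can settle on a poor grouping, which is why library implementations usually try several starts and keep the best one.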
Python Code
from sklearn.cluster import KMeans
import numpy as np
# Customer data: [age, annual_purchases]
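X = np.array([
    [25, 10], [22, 8], [50, 50],
    [48, 55], [70, 80], [72, 85]
])
# Find 3 clusters (n_init runs several random starts; random_state makes it repeatable)
model = KMeans(n_clusters=3, n_init=10, random_state=42)
model.fit(X)
# Get cluster assignments
labels = model.labels_
print(labels)  # e.g. [0 0 1 1 2 2]; the cluster numbers themselves are arbitrary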
# Predict cluster for new customer
new_customer = [[30, 15]]
cluster = model.predict(new_customer)
print(f"Customer belongs to cluster {cluster[0]}")
Choosing K
Use "Elbow Method":
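- Try different K values (for example 1 to 10)
- Plot the total squared distance from each point to its cluster center against K
- Choose the "elbow" point, where adding more clusters stops reducing that distance by much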
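As a rough sketch of the Elbow Method, the loop below fits scikit-learn's `KMeans` for several K values and records `inertia_`, which is the sum of squared distances from each point to its nearest center. The range of K values and the `random_state` are arbitrary choices for this example, and the values are printed rather than plotted to keep it short.

```python
import numpy as np
from sklearn.cluster import KMeans

# Customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the total squared distance of points to their cluster centers
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"K={k}: total squared distance = {value:.1f}")
```

Look for the K after which the printed values flatten out. Passing the same numbers to a plotting library makes the bend easier to spot.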
Applications
- Customer segmentation
- Image compression
- Document clustering
- Anomaly detection
Advantages
- Simple and fast
- Scales to large data
- Easy to implement
Disadvantages
- Must choose K manually
- Sensitive to initial centers
- Assumes roughly spherical, similarly sized clusters
Remember
- Start with 3-5 clusters
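- Scale features first, because K-Means relies on distances and features with large values would otherwise dominate
- Try multiple runs with different starting centers and keep the best result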
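One way to apply the last two tips together is a scikit-learn pipeline that standardizes the features and then runs K-Means with several random starts. The data below is hypothetical (an `annual_income_usd` column is made up to show a feature with a much larger range than age), and `n_init=10` is just one reasonable setting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: [age, annual_income_usd]; income would dominate raw distances
X = np.array([
    [25, 30000], [22, 32000], [50, 31000],
    [48, 90000], [70, 88000], [72, 91000],
], dtype=float)

pipeline = make_pipeline(
    StandardScaler(),                                 # rescale each feature to mean 0, std 1
    KMeans(n_clusters=2, n_init=10, random_state=0),  # 10 random starts, best one kept
)
labels = pipeline.fit_predict(X)
print(labels)
```

Because the scaler sits inside the pipeline, calling `pipeline.predict` on new customers applies the same scaling before assigning a cluster.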