
Clustering with K-Means

Group similar data points automatically.

Robert Anderson
December 18, 2025

Group similar things together.

What is K-Means?

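K-Means automatically splits data points into K groups (clusters), putting each point in the group whose center is closest.

No labels needed! K-Means is an unsupervised learning algorithm.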
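Formally, K-Means searches for the grouping that minimizes the total squared distance between each point and the center of its cluster (scikit-learn reports this quantity as `inertia_`). With clusters C_1 to C_K and their centers μ_k:

```latex
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2,
\qquad \mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
```

Each center μ_k is simply the mean of the points in its cluster, which is why step 4 of the algorithm moves centers to the group average.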

Real Example

Group customers in San Francisco by age and spending:

Cluster 1: Young, low spending
Cluster 2: Middle-age, medium spending
Cluster 3: Older, high spending

How It Works

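  1. Choose K (the number of clusters)
  2. Place K initial centers, for example K randomly chosen data points
  3. Assign each point to its nearest center
  4. Move each center to the average (mean) of the points assigned to it
  5. Repeat steps 3 and 4 until the assignments stop changing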
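To make the five steps concrete, here is a minimal from-scratch sketch in NumPy. The function name `kmeans` and its parameters are hypothetical (this is not a library API), and it initializes by picking K random data points as centers, which is only one common choice. A single random start can settle on a poor grouping, which is why libraries usually run several starts.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Hypothetical minimal K-Means (Lloyd's algorithm) following the steps above."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Step 3: distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the mean of its assigned points
        # (keep the old center if a cluster ends up empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 5: stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Same toy customer data as below: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])
labels, centers = kmeans(X, k=3)
print(labels)
print(centers)
```

Depending on the seed, this single run may or may not recover the three obvious pairs; trying a few seeds and keeping the result with the smallest total squared distance is the simplest fix.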

Python Code

from sklearn.cluster import KMeans
import numpy as np

# Customer data: [age, annual_purchases]
X = np.array([
    [25, 10], [22, 8], [50, 50], 
    [48, 55], [70, 80], [72, 85]
])

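# Find 3 clusters; n_init=10 tries 10 different initializations and keeps the best,
# random_state=42 makes the result reproducible
model = KMeans(n_clusters=3, n_init=10, random_state=42)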
model.fit(X)

# Get cluster assignments (one label per customer)
labels = model.labels_
print(labels)  # each pair shares a label, e.g. [1 1 0 0 2 2]; the numbering is arbitrary

# Predict cluster for new customer
new_customer = [[30, 15]]
cluster = model.predict(new_customer)
print(f"Customer belongs to cluster {cluster[0]}")

Choosing K

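Use the elbow method:

  • Try a range of K values (for example 1 to 10)
  • Plot the total within-cluster squared distance (inertia) against K
  • Choose the "elbow": the K after which adding clusters barely reduces inertia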
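A minimal sketch of the elbow method with scikit-learn's `KMeans`, reusing the toy customer data. It prints inertia for each K instead of plotting it, so it has no plotting dependency; the range 1 to 5 is an assumption chosen because the dataset has only six points.

```python
from sklearn.cluster import KMeans
import numpy as np

# Toy customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

inertias = {}
for k in range(1, 6):
    # inertia_ = sum of squared distances from each point to its closest center
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"K={k}: inertia={inertia:.1f}")
```

For this data the inertia should drop steeply up to K=3 and only slightly after that, so K=3 is the elbow. On real data the bend is often less clear-cut.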

Applications

  • Customer segmentation
  • Image compression
  • Document clustering
  • Anomaly detection
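As one illustration of image compression, K-Means can do color quantization: cluster the pixel colors, then replace every pixel with its cluster's center, so the image needs only a small palette plus one index per pixel. This sketch uses a randomly generated 32x32 RGB array as a hypothetical stand-in for a real image, and the palette size of 8 is an arbitrary choice.

```python
from sklearn.cluster import KMeans
import numpy as np

# Hypothetical stand-in for a real image: 32x32 pixels, 3 color channels
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)

# Treat every pixel as a point in 3-D color space
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)

# The 8 cluster centers form the palette; each pixel is replaced by its center
palette = kmeans.cluster_centers_
compressed = palette[kmeans.labels_].reshape(image.shape)

print("colors after quantization:", len(np.unique(compressed.reshape(-1, 3), axis=0)))
```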

Advantages

  • Simple and fast
  • Scales to large data (each iteration is linear in the number of points)
  • Easy to implement

Disadvantages

  • Must choose K manually
  • Sensitive to initial centers: different runs can produce different clusters
  • Assumes compact, roughly spherical clusters of similar size
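A quick way to observe sensitivity to initial centers is to run scikit-learn's `KMeans` with a single purely random start (`init="random"`, `n_init=1`) under several seeds and compare the resulting inertia. The seeds 0 to 4 are arbitrary. Whether a bad start actually shows up depends on the seeds and data, so treat this as a sketch for experimentation.

```python
from sklearn.cluster import KMeans
import numpy as np

# Toy customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

inertias = []
for seed in range(5):
    # One random initialization per run, so nothing rescues a poor start
    model = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    inertias.append(model.inertia_)
    print(f"seed={seed}: inertia={model.inertia_:.1f}, labels={model.labels_}")
```

Runs that report a much higher inertia have converged to a worse local optimum. scikit-learn's default `init="k-means++"` spreads the initial centers out, which reduces (but does not eliminate) this problem.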

Remember

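  • Start by trying 3 to 5 clusters, then check with the elbow method
  • Scale features first, since K-Means relies on distances and large-range features dominate
  • Try multiple runs with different initial centers and keep the one with the lowest inertia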
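Here is a sketch that applies the scaling and multiple-runs tips to the customer example. `StandardScaler` puts age and annual purchases on comparable scales before clustering, and `n_init=10` keeps the best of 10 runs. Wrapping both in a pipeline is an illustrative choice, not the only way to do it; its benefit is that new customers are scaled the same way before prediction.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import numpy as np

# Toy customer data: [age, annual_purchases]
X = np.array([[25, 10], [22, 8], [50, 50], [48, 55], [70, 80], [72, 85]])

# Standardize each feature (zero mean, unit variance), then cluster
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=42),
)
pipeline.fit(X)

labels = pipeline.predict(X)
new_cluster = pipeline.predict([[30, 15]])  # scaled with the same statistics
print(labels)
print(f"New customer belongs to cluster {new_cluster[0]}")
```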
#AI #Intermediate #Clustering