Hierarchical Clustering: Building a Cluster Tree
Learn how hierarchical clustering builds a tree of clusters and how to read dendrograms.
Unlike K-Means, hierarchical clustering doesn't need K upfront. It builds a tree of clusters you can cut at any level.
Two Approaches
Agglomerative (Bottom-up): Start with each point as its own cluster, merge closest pairs
Divisive (Top-down): Start with one cluster, split recursively
Agglomerative is the more common approach and is what scikit-learn provides, so it is the focus here.
How Agglomerative Works
1. Start with each point as its own cluster
2. Find the two closest clusters
3. Merge them
4. Repeat from step 2 until only one cluster remains
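To make the loop concrete, here is a minimal from-scratch sketch using single linkage on a tiny made-up array. The data and the single_link helper are purely illustrative; in practice you would use scikit-learn or scipy as shown next.
import numpy as np
# Toy 1-D data, purely illustrative
points = np.array([[1.0], [1.5], [5.0], [5.2], [9.0]])
# Start with every point in its own cluster (store member indices)
clusters = [[i] for i in range(len(points))]
def single_link(a, b):
    # Single linkage: smallest pairwise distance between the two clusters
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)
# Repeatedly merge the two closest clusters until only one remains
while len(clusters) > 1:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]),
    )
    clusters[i].extend(clusters[j])  # merge cluster j into cluster i
    del clusters[j]                  # j > i, so index i stays valid
    print(clusters)                  # watch the hierarchy build bottom-up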
Implementation
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
# X is your feature matrix of shape (n_samples, n_features)
# Scale the features so no single feature dominates the distance calculations
X_scaled = StandardScaler().fit_transform(X)
# Cluster into 3 groups using Ward linkage
hc = AgglomerativeClustering(n_clusters=3, linkage='ward')
clusters = hc.fit_predict(X_scaled)
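As a quick sanity check on the result, you can count how many points landed in each cluster; this assumes the clusters array produced above.
import numpy as np
# One count per label 0..n_clusters-1
print(np.bincount(clusters))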
The Dendrogram
A dendrogram shows the merging process: the height at which two branches join is the distance between those clusters when they were merged.
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
# Create linkage matrix
Z = linkage(X_scaled, method='ward')
# Plot dendrogram
plt.figure(figsize=(10, 7))
dendrogram(Z)
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.title('Hierarchical Clustering Dendrogram')
plt.show()
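With many samples the full dendrogram becomes unreadable. scipy's dendrogram can truncate the plot so only the final merges are drawn; a sketch reusing the Z computed above:
# Show only the last 12 merged clusters; leaf counts show how many
# original samples are collapsed into each truncated leaf
plt.figure(figsize=(10, 7))
dendrogram(Z, truncate_mode='lastp', p=12, show_leaf_counts=True)
plt.xlabel('Cluster (number of collapsed samples)')
plt.ylabel('Distance')
plt.show()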
Reading a Dendrogram
          _______________
         |               |
      ___|___         ___|___
     |       |       |       |
    _|_     _|_     _|_     _|_
   |   |   |   |   |   |   |   |
   1   2   3   4   5   6   7   8
Cut near the top → 2 clusters
Cut lower down → more clusters
- Each vertical line is a cluster; a horizontal bar joins the two clusters being merged
- The height of a merge = the distance between the two clusters at that point
- Cut the tree with a horizontal line to get flat clusters; the number of vertical lines it crosses is the number of clusters
Linkage Methods
How to measure distance between clusters:
| Method | Description | Best For |
|---|---|---|
| ward | Minimizes within-cluster variance | Compact, similarly sized clusters |
| complete | Maximum distance between any two points in the clusters | Well-separated clusters |
| average | Mean pairwise distance between the clusters | General purpose |
| single | Minimum distance between any two points in the clusters | Elongated, chain-like shapes |
# Ward - the most common choice; in scikit-learn it requires Euclidean distances
AgglomerativeClustering(n_clusters=3, linkage='ward')
# Complete linkage
AgglomerativeClustering(n_clusters=3, linkage='complete')
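If you're unsure which linkage suits your data, a quick comparison loop can help. A sketch, assuming X_scaled from the Implementation section and using silhouette score as a rough yardstick:
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
for method in ['ward', 'complete', 'average', 'single']:
    hc = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels = hc.fit_predict(X_scaled)
    print(f"{method}: silhouette={silhouette_score(X_scaled, labels):.3f}")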
Choosing Number of Clusters
From Dendrogram
Cut where there's a big jump in height:
from scipy.cluster.hierarchy import fcluster
# Cut at specific distance
clusters = fcluster(Z, t=10, criterion='distance')
# Or specify number of clusters
clusters = fcluster(Z, t=3, criterion='maxclust')
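You can also look for that jump numerically: in the scipy linkage matrix, the third column holds the merge distances in increasing order, so a large gap between consecutive merges suggests a natural place to cut. A rough sketch, reusing Z from above:
import numpy as np
heights = Z[:, 2]               # merge distances, sorted ascending
gaps = np.diff(heights)         # jump between consecutive merges
k = int(np.argmax(gaps))        # largest jump sits between merge k and merge k+1
n_clusters = len(heights) - k   # clusters left if we cut inside that gap
print(f"Largest jump after merge {k}: suggests {n_clusters} clusters")
Treat this as a heuristic, not a rule; the silhouette check below is a good cross-check.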
Using Metrics
from sklearn.metrics import silhouette_score
for n in range(2, 10):
    hc = AgglomerativeClustering(n_clusters=n)  # default linkage is 'ward'
    labels = hc.fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    print(f"n={n}: silhouette={score:.3f}")
K-Means vs Hierarchical
| Aspect | K-Means | Hierarchical |
|---|---|---|
| Need K upfront | Yes | No |
| Speed | Fast | Slower |
| Scalability | Good | Poor for large data |
| Output | Flat clusters | Tree structure |
| Deterministic | No | Yes |
When to Use Hierarchical
Good for:
- Small to medium datasets
- When you want to explore cluster structure
- When hierarchy makes sense (taxonomy, evolution)
- When K is unknown
Not ideal for:
- Large datasets (O(n²) memory)
- When you need to assign new points later (AgglomerativeClustering has no predict() for unseen samples; see the workaround sketch below)
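If you do need to place new points into an existing hierarchical clustering, one common workaround is to train a simple classifier on the cluster labels, for example nearest centroid. A minimal sketch, assuming the X_scaled and clusters variables from the Implementation section and a hypothetical array X_new_scaled of new, already-scaled points:
from sklearn.neighbors import NearestCentroid
# Fit a nearest-centroid classifier on the hierarchical cluster labels
nc = NearestCentroid().fit(X_scaled, clusters)
# Assign new (already scaled) points to the closest existing cluster
new_labels = nc.predict(X_new_scaled)  # X_new_scaled is hypothetical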
Key Takeaway
Hierarchical clustering gives you a complete picture of cluster structure through the dendrogram. Use it when you want to explore different numbers of clusters or when a hierarchy is meaningful. For large datasets, stick with K-Means.