# Hierarchical Clustering: Building a Cluster Tree
Learn how hierarchical clustering builds a tree of clusters and how to read dendrograms.
Unlike K-Means, hierarchical clustering doesn't need K upfront. It builds a tree of clusters you can cut at any level.
## Two Approaches
- **Agglomerative (bottom-up):** Start with each point as its own cluster, then merge the closest pairs.
- **Divisive (top-down):** Start with one cluster containing everything, then split recursively.
Agglomerative is the most common approach and is what the examples below use.
## How Agglomerative Works
1. Start with each point as its own cluster.
2. Find the two closest clusters.
3. Merge them.
4. Repeat until one cluster remains.
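To make those steps concrete, here is a deliberately unoptimized sketch of the loop using single linkage on a few toy points. The data and the helper function are made up for illustration; real libraries use much faster algorithms.

```python
import numpy as np

# Toy 1-D points (illustrative only)
points = np.array([[1.0], [1.2], [5.0], [5.1], [9.0]])

# Step 1: each point starts as its own cluster (stored as lists of indices)
clusters = [[i] for i in range(len(points))]

def cluster_distance(a, b):
    # Single linkage: smallest distance between any pair of members
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)

while len(clusters) > 1:
    # Step 2: find the two closest clusters
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
    )
    # Step 3: merge them
    print(f"Merging {clusters[i]} and {clusters[j]}")
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    # Step 4: the loop repeats until one cluster remains
```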
## Implementation
```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Scale data
X_scaled = StandardScaler().fit_transform(X)

# Cluster
hc = AgglomerativeClustering(n_clusters=3, linkage='ward')
clusters = hc.fit_predict(X_scaled)
```
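Because the underlying tree can be cut at any level, scikit-learn also lets you cut by distance instead of fixing the number of clusters. The threshold value below is an arbitrary illustration; in practice you would read a sensible cut height off your dendrogram.

```python
from sklearn.cluster import AgglomerativeClustering

# Cut the tree at a chosen distance instead of specifying n_clusters.
# distance_threshold requires n_clusters=None; 5.0 is an arbitrary example.
hc = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0, linkage='ward')
clusters = hc.fit_predict(X_scaled)
print(hc.n_clusters_)  # number of clusters found at that cut
```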
## The Dendrogram
A dendrogram shows the merging process. The height of each link indicates the distance at which the two clusters were merged.
```python
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Create linkage matrix
Z = linkage(X_scaled, method='ward')

# Plot dendrogram
plt.figure(figsize=(10, 7))
dendrogram(Z)
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.title('Hierarchical Clustering Dendrogram')
plt.show()
```
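With more than a few dozen samples, a full dendrogram becomes unreadable. SciPy can truncate the plot to show only the last merges; the choice of `p=20` below is just an example.

```python
from scipy.cluster.hierarchy import dendrogram
import matplotlib.pyplot as plt

# Show only the last 20 merges; leaves represent contracted clusters
plt.figure(figsize=(10, 7))
dendrogram(Z, truncate_mode='lastp', p=20, show_contracted=True)
plt.xlabel('Cluster size (or sample index)')
plt.ylabel('Distance')
plt.title('Truncated Dendrogram')
plt.show()
```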
## Reading a Dendrogram
```
        _______________
       |               |
    ___|___         ___|___
   |       |       |       |
  _|_     _|_     _|_     _|_
 |   |   |   |   |   |   |   |
 1   2   3   4   5   6   7   8

Cut here  → 2 clusters
Cut lower → more clusters
```
- Vertical lines = clusters being merged
- Height = distance at which they merged
- Cut horizontally to get clusters
## Linkage Methods
How to measure distance between clusters:
| Method   | Description                       | Best For                      |
|----------|-----------------------------------|-------------------------------|
| ward     | Minimize within-cluster variance  | Compact, equal-sized clusters |
| complete | Max distance between points       | Well-separated clusters       |
| average  | Mean distance between points      | General purpose               |
| single   | Min distance between points       | Can find elongated shapes     |
```python
# Ward - most common
AgglomerativeClustering(linkage='ward')

# Complete linkage
AgglomerativeClustering(linkage='complete')
```
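If you're unsure which linkage suits your data, one quick (if imperfect) check is to compare silhouette scores across methods. This sketch assumes the scaled data `X_scaled` from earlier and an illustrative choice of 3 clusters.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Compare linkage methods on the same data
for method in ['ward', 'complete', 'average', 'single']:
    labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(X_scaled)
    print(f"{method:>8}: silhouette={silhouette_score(X_scaled, labels):.3f}")
```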
## Choosing Number of Clusters
### From Dendrogram
Cut where there's a big jump in height:
```python
from scipy.cluster.hierarchy import fcluster

# Cut at a specific distance
clusters = fcluster(Z, t=10, criterion='distance')

# Or specify the number of clusters
clusters = fcluster(Z, t=3, criterion='maxclust')
```
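You can also locate that "big jump" programmatically by looking at the gaps between successive merge distances in the linkage matrix. This is only a rough heuristic, sketched here assuming `Z` is the Ward linkage matrix computed above.

```python
import numpy as np

# Merge distances are in the third column of the linkage matrix;
# for Ward linkage they increase with each successive merge
merge_distances = Z[:, 2]
gaps = np.diff(merge_distances)

# The largest gap suggests a natural cut height
idx = np.argmax(gaps)
cut_height = (merge_distances[idx] + merge_distances[idx + 1]) / 2

# Cutting there leaves this many clusters
n_clusters = len(merge_distances) - idx
print(f"Suggested cut at {cut_height:.2f} -> {n_clusters} clusters")
```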
### Using Metrics
```python
from sklearn.metrics import silhouette_score

for n in range(2, 10):
    hc = AgglomerativeClustering(n_clusters=n)
    labels = hc.fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    print(f"n={n}: silhouette={score:.3f}")
```
## K-Means vs Hierarchical
| Aspect         | K-Means       | Hierarchical        |
|----------------|---------------|---------------------|
| Need K upfront | Yes           | No                  |
| Speed          | Fast          | Slower              |
| Scalability    | Good          | Poor for large data |
| Output         | Flat clusters | Tree structure      |
| Deterministic  | No            | Yes                 |
## When to Use Hierarchical
**Good for:**

- Small to medium datasets
- When you want to explore cluster structure
- When a hierarchy makes sense (taxonomy, evolution)
- When K is unknown

**Not ideal for:**

- Large datasets (O(n²) memory)
- When you need to assign new points later
## Key Takeaway
Hierarchical clustering gives you a complete picture of cluster structure through the dendrogram. Use it when you want to explore different numbers of clusters or when a hierarchy is meaningful. For large datasets, stick with K-Means.