ML · 7 min read

Hierarchical Clustering: Building a Cluster Tree

Learn how hierarchical clustering builds a tree of clusters and how to read dendrograms.

Sarah Chen
December 19, 2025


Unlike K-Means, hierarchical clustering doesn't need K upfront. It builds a tree of clusters you can cut at any level.

Two Approaches

  • Agglomerative (Bottom-up): start with each point as its own cluster and repeatedly merge the two closest clusters
  • Divisive (Top-down): start with all points in one cluster and recursively split it

Agglomerative is by far the more common approach, and it's the one we'll use here.

How Agglomerative Works

  1. Each point is a cluster
  2. Find two closest clusters
  3. Merge them
  4. Repeat until one cluster remains
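
A from-scratch sketch of that loop, using single linkage and plain NumPy (the toy data and the naive_agglomerative name are just for illustration; scikit-learn's implementation in the next section is what you'd use in practice):

import numpy as np

def naive_agglomerative(X, n_clusters):
    # 1. start with every point as its own cluster
    clusters = [[i] for i in range(len(X))]
    # pairwise point-to-point distances, computed once
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # distance between two clusters = closest pair of points (single linkage)
    def cluster_dist(a, b):
        return min(dists[i, j] for i in a for j in b)

    while len(clusters) > n_clusters:
        # 2. find the two closest clusters
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: cluster_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        # 3. merge them, and 4. repeat until enough clusters remain
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# toy data: two well-separated groups of three points
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
print(naive_agglomerative(X_toy, n_clusters=2))   # [[0, 1, 2], [3, 4, 5]]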

Implementation

from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Scale the data (X is your (n_samples, n_features) feature matrix)
X_scaled = StandardScaler().fit_transform(X)

# Agglomerative clustering with Ward linkage into 3 clusters
hc = AgglomerativeClustering(n_clusters=3, linkage='ward')
clusters = hc.fit_predict(X_scaled)
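
As a quick sanity check on the result (a minimal sketch, assuming the clusters array from the snippet above), you can count how many points landed in each cluster:

import numpy as np

# number of points assigned to each cluster label (0, 1, 2)
print(np.bincount(clusters))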

The Dendrogram

A dendrogram visualizes the merging process: the height at which two branches join is the distance between the clusters at the moment they were merged.

from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Create linkage matrix
Z = linkage(X_scaled, method='ward')

# Plot dendrogram
plt.figure(figsize=(10, 7))
dendrogram(Z)
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.title('Hierarchical Clustering Dendrogram')
plt.show()
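
For more than a few dozen samples the full dendrogram becomes unreadable. SciPy's dendrogram can truncate the plot to show only the last merges; a sketch, reusing the Z computed above:

from scipy.cluster.hierarchy import dendrogram
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 7))
# show only the last 20 merges; each truncated leaf is labeled with the number of points it contains
dendrogram(Z, truncate_mode='lastp', p=20, show_leaf_counts=True)
plt.xlabel('Cluster (number of points)')
plt.ylabel('Distance')
plt.show()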

Reading a Dendrogram

         _______________
        |               |
     ___|___         ___|___
    |       |       |       |
   _|_     _|_     _|_     _|_
  |   |   |   |   |   |   |   |
  1   2   3   4   5   6   7   8
  
  Cut near the top → 2 clusters
  Cut lower down → more clusters

  • Each vertical line is a cluster; a horizontal bar joins the two clusters being merged
  • The height of that bar = the distance at which the merge happened
  • Cut the tree with a horizontal line to get a flat set of clusters

Linkage Methods

How to measure distance between clusters:

Method     Description                   Best For
ward       Minimize variance             Compact, equal-sized
complete   Max distance between points   Well-separated
average    Mean distance                 General purpose
single     Min distance                  Can find elongated shapes

# Ward - most common
AgglomerativeClustering(linkage='ward')

# Complete linkage
AgglomerativeClustering(linkage='complete')
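
If you're not sure which linkage suits your data, one option (a sketch, reusing X_scaled and the silhouette score that also appears below) is to compare them on the same data:

from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# compare linkage methods at a fixed number of clusters
for method in ['ward', 'complete', 'average', 'single']:
    hc = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels = hc.fit_predict(X_scaled)
    print(f"{method}: silhouette={silhouette_score(X_scaled, labels):.3f}")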

Choosing the Number of Clusters

From Dendrogram

Cut where there's a big jump in height:

from scipy.cluster.hierarchy import fcluster

# Cut at specific distance
clusters = fcluster(Z, t=10, criterion='distance')

# Or specify number of clusters
clusters = fcluster(Z, t=3, criterion='maxclust')
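
One way to locate that big jump programmatically (a heuristic sketch, reusing the Z linkage matrix built above):

import numpy as np
from scipy.cluster.hierarchy import fcluster

# the third column of the linkage matrix holds the distance at each merge
merge_dists = Z[:, 2]

# locate the largest gap between consecutive merge distances
idx = np.argmax(np.diff(merge_dists))

# cut halfway across that gap: everything merged below it stays together
cut = (merge_dists[idx] + merge_dists[idx + 1]) / 2
clusters = fcluster(Z, t=cut, criterion='distance')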

Using Metrics

from sklearn.metrics import silhouette_score

for n in range(2, 10):
    hc = AgglomerativeClustering(n_clusters=n)
    labels = hc.fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    print(f"n={n}: silhouette={score:.3f}")

K-Means vs Hierarchical

Aspect           K-Means         Hierarchical
Need K upfront   Yes             No
Speed            Fast            Slower
Scalability      Good            Poor for large data
Output           Flat clusters   Tree structure
Deterministic    No              Yes

When to Use Hierarchical

Good for:

  • Small to medium datasets
  • When you want to explore cluster structure
  • When hierarchy makes sense (taxonomy, evolution)
  • When K is unknown

Not ideal for:

  • Large datasets (O(n²) memory)
  • When you need to assign new points later (there is no predict step for unseen data)

Key Takeaway

Hierarchical clustering gives you a complete picture of cluster structure through the dendrogram. Use it when you want to explore different numbers of clusters or when a hierarchy is meaningful. For large datasets, stick with K-Means.

#Machine Learning · #Hierarchical Clustering · #Dendrogram · #Intermediate