Hierarchical Clustering: Building a Cluster Tree
Learn how hierarchical clustering builds a tree of clusters and how to read dendrograms.
Unlike K-Means, hierarchical clustering doesn't need K upfront. It builds a tree of clusters you can cut at any level.
Two Approaches
Agglomerative (Bottom-up): Start with each point as its own cluster, merge closest pairs
Divisive (Top-down): Start with one cluster, split recursively
Agglomerative is the more common approach and is what scikit-learn provides, so it is the focus here.
How Agglomerative Works
1. Start with each point as its own cluster
2. Find the two closest clusters
3. Merge them
4. Repeat from step 2 until only one cluster remains
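To make the loop concrete, here is a minimal from-scratch sketch using single linkage on a tiny made-up array. The data and the single_link helper are purely illustrative; in practice you would use scikit-learn or scipy as shown next.
import numpy as np
# Toy 1-D data, purely illustrative
points = np.array([[1.0], [1.5], [5.0], [5.2], [9.0]])
# Start with every point in its own cluster (store member indices)
clusters = [[i] for i in range(len(points))]
def single_link(a, b):
    # Single linkage: smallest pairwise distance between the two clusters
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)
# Repeatedly merge the two closest clusters until only one remains
while len(clusters) > 1:
    i, j = min(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]),
    )
    clusters[i].extend(clusters[j])  # merge cluster j into cluster i
    del clusters[j]                  # j > i, so index i stays valid
    print(clusters)                  # watch the hierarchy build bottom-up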
Implementation
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
# X is your feature matrix of shape (n_samples, n_features)
# Scale the features so no single feature dominates the distance calculations
X_scaled = StandardScaler().fit_transform(X)
# Cluster into 3 groups using Ward linkage
hc = AgglomerativeClustering(n_clusters=3, linkage='ward')
clusters = hc.fit_predict(X_scaled)
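As a quick sanity check on the result, you can count how many points landed in each cluster; this assumes the clusters array produced above.
import numpy as np
# One count per label 0..n_clusters-1
print(np.bincount(clusters))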
The Dendrogram
A dendrogram shows the merging process: the height at which two branches join is the distance between those clusters when they were merged.
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
# Create linkage matrix
Z = linkage(X_scaled, method='ward')
# Plot dendrogram
plt.figure(figsize=(10, 7))
dendrogram(Z)
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.title('Hierarchical Clustering Dendrogram')
plt.show()
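With many samples the full dendrogram becomes unreadable. scipy's dendrogram can truncate the plot so only the final merges are drawn; a sketch reusing the Z computed above:
# Show only the last 12 merged clusters; leaf counts show how many
# original samples are collapsed into each truncated leaf
plt.figure(figsize=(10, 7))
dendrogram(Z, truncate_mode='lastp', p=12, show_leaf_counts=True)
plt.xlabel('Cluster (number of collapsed samples)')
plt.ylabel('Distance')
plt.show()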
Reading a Dendrogram
          _______________
         |               |
      ___|___         ___|___
     |       |       |       |
    _|_     _|_     _|_     _|_
   |   |   |   |   |   |   |   |
   1   2   3   4   5   6   7   8
Cut near the top → 2 clusters
Cut lower down → more clusters
- Each vertical line is a cluster; a horizontal bar joins the two clusters being merged
- The height of a merge = the distance between the two clusters at that point
- Cut the tree with a horizontal line to get flat clusters; the number of vertical lines it crosses is the number of clusters
Linkage Methods
How to measure distance between clusters:
| Method | Description | Best For |
|---|---|---|
| ward | Minimizes within-cluster variance | Compact, similarly sized clusters |
| complete | Maximum distance between any two points in the clusters | Well-separated clusters |
| average | Mean pairwise distance between the clusters | General purpose |
| single | Minimum distance between any two points in the clusters | Elongated, chain-like shapes |
# Ward - the most common choice; in scikit-learn it requires Euclidean distances
AgglomerativeClustering(n_clusters=3, linkage='ward')
# Complete linkage
AgglomerativeClustering(n_clusters=3, linkage='complete')
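If you're unsure which linkage suits your data, a quick comparison loop can help. A sketch, assuming X_scaled from the Implementation section and using silhouette score as a rough yardstick:
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
for method in ['ward', 'complete', 'average', 'single']:
    hc = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels = hc.fit_predict(X_scaled)
    print(f"{method}: silhouette={silhouette_score(X_scaled, labels):.3f}")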
Choosing Number of Clusters
From Dendrogram
Cut where there's a big jump in height:
from scipy.cluster.hierarchy import fcluster
# Cut at specific distance
clusters = fcluster(Z, t=10, criterion='distance')
# Or specify number of clusters
clusters = fcluster(Z, t=3, criterion='maxclust')
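You can also look for that jump numerically: in the scipy linkage matrix, the third column holds the merge distances in increasing order, so a large gap between consecutive merges suggests a natural place to cut. A rough sketch, reusing Z from above:
import numpy as np
heights = Z[:, 2]               # merge distances, sorted ascending
gaps = np.diff(heights)         # jump between consecutive merges
k = int(np.argmax(gaps))        # largest jump sits between merge k and merge k+1
n_clusters = len(heights) - k   # clusters left if we cut inside that gap
print(f"Largest jump after merge {k}: suggests {n_clusters} clusters")
Treat this as a heuristic, not a rule; the silhouette check below is a good cross-check.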
Using Metrics
from sklearn.metrics import silhouette_score
for n in range(2, 10):
    hc = AgglomerativeClustering(n_clusters=n)  # default linkage is 'ward'
    labels = hc.fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    print(f"n={n}: silhouette={score:.3f}")
K-Means vs Hierarchical
| Aspect | K-Means | Hierarchical |
|---|---|---|
| Need K upfront | Yes | No |
| Speed | Fast | Slower |
| Scalability | Good | Poor for large data |
| Output | Flat clusters | Tree structure |
| Deterministic | No | Yes |
When to Use Hierarchical
Good for:
- Small to medium datasets
- When you want to explore cluster structure
- When hierarchy makes sense (taxonomy, evolution)
- When K is unknown
Not ideal for:
- Large datasets (O(n²) memory)
- When you need to assign new points later (AgglomerativeClustering has no predict() for unseen samples; see the workaround sketch below)
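If you do need to place new points into an existing hierarchical clustering, one common workaround is to train a simple classifier on the cluster labels, for example nearest centroid. A minimal sketch, assuming the X_scaled and clusters variables from the Implementation section and a hypothetical array X_new_scaled of new, already-scaled points:
from sklearn.neighbors import NearestCentroid
# Fit a nearest-centroid classifier on the hierarchical cluster labels
nc = NearestCentroid().fit(X_scaled, clusters)
# Assign new (already scaled) points to the closest existing cluster
new_labels = nc.predict(X_new_scaled)  # X_new_scaled is hypothetical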
Key Takeaway
Hierarchical clustering gives you a complete picture of cluster structure through the dendrogram. Use it when you want to explore different numbers of clusters or when a hierarchy is meaningful. For large datasets, stick with K-Means.