A bottom-up hierarchical clustering method that progressively merges similar data points.
Agglomerative clustering is a hierarchical unsupervised learning technique that builds clusters from the ground up: each data point starts as its own singleton cluster, and at every step the two most similar clusters are merged, until a single all-encompassing cluster remains. The result is a tree-like structure called a dendrogram, which visually encodes the sequence and similarity levels at which merges occurred. Practitioners can "cut" the dendrogram at any level to obtain a desired number of clusters, making the method flexible: the number of clusters need not be specified in advance.
The algorithm's behavior is governed by two key choices: a distance metric and a linkage criterion. Common distance metrics include Euclidean, Manhattan, and cosine distance. Linkage criteria determine how the distance between two candidate clusters is computed: single linkage uses the minimum pairwise distance, complete linkage uses the maximum, average linkage averages all pairwise distances, and Ward's linkage selects the merge that least increases the total within-cluster variance. Ward's method tends to produce compact, evenly sized clusters and is widely used in practice. Naive implementations run in O(n³) time, though optimized approaches using priority queues can reduce this to O(n² log n), which still limits scalability to datasets of moderate size.
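The linkage criteria are straightforward to state as code. The sketch below (helper names are my own) computes single, complete, and average linkage between two small clusters, plus Ward's merge cost expressed as the increase in summed squared deviations from the centroid:

```python
import math

def single_linkage(A, B):
    # minimum pairwise distance between the two clusters
    return min(math.dist(a, b) for a in A for b in B)

def complete_linkage(A, B):
    # maximum pairwise distance
    return max(math.dist(a, b) for a in A for b in B)

def average_linkage(A, B):
    # mean of all pairwise distances
    dists = [math.dist(a, b) for a in A for b in B]
    return sum(dists) / len(dists)

def _sse(C):
    # sum of squared distances of 2-D points to their centroid
    cx = sum(p[0] for p in C) / len(C)
    cy = sum(p[1] for p in C) / len(C)
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in C)

def ward_cost(A, B):
    # increase in total within-cluster variance caused by merging A and B
    return _sse(A + B) - _sse(A) - _sse(B)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (4.0, 0.0)]
# single: 2.0 (closest points), complete: 4.0 (farthest points), average: 3.0
```

The choice among these is not cosmetic: on the same data, single and complete linkage can disagree about which pair of clusters to merge next, which is why the resulting dendrograms differ.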
Agglomerative clustering is valued for its interpretability and its ability to reveal multi-scale structure in data without strong distributional assumptions. It is widely applied in bioinformatics for gene expression analysis, in natural language processing for document grouping, and in customer segmentation tasks where understanding the hierarchy of groupings is as important as the groupings themselves. Unlike k-means, it does not require a predefined cluster count and handles non-spherical cluster shapes more gracefully. It is, however, sensitive to noise and outliers, particularly under single linkage, which can produce the "chaining" effect in which clusters elongate into loose strands rather than forming compact groups. Its deterministic output for fixed inputs also makes it reproducible, a practical advantage over stochastic alternatives such as k-means with random initialization.
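The chaining effect can be seen directly in the linkage distances themselves. In this self-contained sketch (helper names are illustrative), the distance from a long chain of collinear points to the next point stays small under single linkage, so the chain keeps absorbing neighbors, while complete linkage reflects the chain's full extent and resists further elongation:

```python
import math

def single_linkage(A, B):
    # minimum pairwise distance: sees only the nearest end of the chain
    return min(math.dist(a, b) for a in A for b in B)

def complete_linkage(A, B):
    # maximum pairwise distance: reflects the chain's full extent
    return max(math.dist(a, b) for a in A for b in B)

chain = [(float(i), 0.0) for i in range(6)]  # points strung out in a line
nxt = [(6.0, 0.0)]                           # the next point along the line

print(single_linkage(chain, nxt))    # 1.0, regardless of chain length
print(complete_linkage(chain, nxt))  # 6.0, grows with the chain
```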