Agglomerative Clustering

A bottom-up hierarchical clustering method that progressively merges similar data points.

Year: 1963
Generality: 694

Agglomerative clustering is a hierarchical unsupervised learning technique that builds clusters from the ground up, starting with each data point as its own singleton cluster and iteratively merging the most similar pairs until a single all-encompassing cluster remains. The result is a tree-like structure called a dendrogram, which visually encodes the sequence and similarity levels at which merges occurred. Practitioners can "cut" the dendrogram at any level to obtain a desired number of clusters, making the method flexible without requiring the number of clusters to be specified in advance.
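This cut-to-clusters workflow is easy to sketch in code. The example below is a minimal illustration, assuming SciPy's scipy.cluster.hierarchy module; the synthetic data and the choice of three clusters are hypothetical, not drawn from the source. linkage builds the full merge tree, and fcluster performs the "cut" to a chosen number of flat clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: three loose groups (synthetic, for illustration only)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(10, 2)),
    rng.normal([3, 3], 0.3, size=(10, 2)),
    rng.normal([0, 4], 0.3, size=(10, 2)),
])

# Build the full merge tree (Ward linkage over Euclidean distances)
Z = linkage(X, method="ward")

# "Cut" the dendrogram to recover three flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)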

The algorithm's behavior is governed by two key choices: a distance metric and a linkage criterion. Common distance metrics include Euclidean, Manhattan, and cosine distance. Linkage criteria determine how the distance between two candidate clusters is computed: single linkage uses the minimum pairwise distance, complete linkage uses the maximum, average linkage averages all pairwise distances, and Ward's linkage minimizes the total within-cluster variance after merging. Ward's method tends to produce compact, evenly sized clusters and is widely used in practice. Naive implementations run in O(n³) time, though optimized approaches using priority queues can reduce this to O(n² log n), which still limits scalability to datasets of moderate size.
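Both choices are exposed as parameters in common implementations. The following sketch, assuming scikit-learn's AgglomerativeClustering and synthetic blob data, runs the same dataset under each linkage criterion so the differing merge behavior can be inspected; the dataset size and cluster count are illustrative.

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Synthetic blobs; each linkage defines inter-cluster distance
# differently, so the resulting merges (and labels) can diverge.
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

for link in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=3, linkage=link)
    labels = model.fit_predict(X)
    print(link, labels[:10])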

Agglomerative clustering is valued for its interpretability and its ability to reveal multi-scale structure in data without strong distributional assumptions. It is widely applied in bioinformatics for gene expression analysis, in natural language processing for document grouping, and in customer segmentation tasks where understanding the hierarchy of groupings is as important as the groupings themselves. Unlike k-means, it does not require a predefined cluster count and handles non-spherical cluster shapes more gracefully, as the sketch below illustrates. It is, however, sensitive to noise and outliers, particularly under single linkage, whose "chaining" effect lets clusters grow into elongated strands rather than compact groups. Its deterministic output for fixed inputs also makes it reproducible, a practical advantage over stochastic alternatives such as randomly initialized k-means.
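As a rough illustration of that contrast, assuming scikit-learn and its make_moons generator (a standard non-spherical benchmark, not mentioned in the source): single linkage can follow each connected arc, while k-means tends to split the moons across a straight boundary. Exact results depend on the noise level, so this is a sketch rather than a guarantee.

from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering, KMeans

# Two interleaved half-moons: connected but non-spherical shapes
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Single linkage merges nearest neighbors, so it can trace each arc
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)

# k-means minimizes distance to centroids, biasing it toward round clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print("agglomerative:", agg[:10])
print("k-means:      ", km[:10])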

Related

Clustering

An unsupervised learning technique that groups similar data points together automatically.

Generality: 838
Centroid-Based Clustering

Unsupervised learning method grouping data points by proximity to cluster centroids.

Generality: 694
Unsupervised Learning

Machine learning that discovers hidden patterns in data without labeled examples.

Generality: 850
Mixture Model

A probabilistic model representing data as drawn from multiple component distributions.

Generality: 796
TDA (Topological Data Analysis)

Applies algebraic topology to extract robust, shape-based features from high-dimensional data.

Generality: 520
Ensemble Methods

Combining multiple trained models to produce predictions stronger than any single model.

Generality: 771