
Similarity Computation

Quantifying how alike two data objects are to support learning algorithms.

Year: 1990
Generality: 709

Similarity computation is the process of measuring how closely related two or more data objects are, typically by applying a mathematical metric that produces a scalar score. Common measures include Euclidean distance, which captures geometric proximity in continuous feature spaces; cosine similarity, which measures the angle between vector representations and is widely used in text and embedding models; and the Jaccard index, which compares set overlap for discrete or binary data. The choice of metric is not arbitrary — it encodes assumptions about the structure of the data and directly shapes what a model considers "close" or "far."
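
As a rough sketch, the three metrics mentioned above can be computed in a few lines of NumPy (the vectors and sets here are toy values, for illustration only):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Geometric (L2) distance between two points in a continuous feature space."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_index(a: set, b: set) -> float:
    """Set overlap for discrete or binary data: |intersection| / |union|."""
    return len(a & b) / len(a | b)

doc_a = np.array([0.9, 0.1, 0.4])
doc_b = np.array([0.8, 0.2, 0.5])
print(euclidean_distance(doc_a, doc_b))            # small value: geometrically close
print(cosine_similarity(doc_a, doc_b))             # near 1.0: similar direction
print(jaccard_index({"ai", "ml"}, {"ml", "nlp"}))  # 1/3: one shared element of three
```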

In machine learning, similarity computation sits at the heart of a broad range of algorithms. K-Nearest Neighbors (KNN) classifies a query point by aggregating the labels of its closest neighbors, making the similarity metric the primary driver of accuracy. Clustering algorithms such as k-means and DBSCAN partition data by grouping points that are mutually similar. Recommendation systems identify items or users that resemble a target profile, and retrieval-augmented generation pipelines rank candidate documents by their embedding similarity to a query. In each case, the quality of the similarity measure determines the quality of the downstream result.
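
To make the dependence on the metric concrete, here is an illustrative KNN classifier that votes among the k most cosine-similar points (the function name and toy data are ours, not taken from any particular library):

```python
import numpy as np
from collections import Counter

def knn_classify(query, points, labels, k=3):
    """Label a query by majority vote among its k most cosine-similar points."""
    sims = points @ query / (np.linalg.norm(points, axis=1) * np.linalg.norm(query))
    top_k = np.argsort(sims)[-k:]   # indices of the k nearest neighbors
    return Counter(labels[i] for i in top_k).most_common(1)[0][0]

points = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
labels = ["sports", "sports", "politics", "politics"]
print(knn_classify(np.array([0.95, 0.15]), points, labels))  # -> "sports"
```

Swapping in Euclidean distance or Jaccard similarity changes which neighbors win the vote, which is exactly why the metric is the primary driver of accuracy.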

The rise of deep learning introduced learned similarity functions, where neural networks are trained to embed inputs into a latent space such that semantically related objects land near each other. Siamese networks and contrastive learning frameworks like SimCLR and CLIP exemplify this approach, optimizing an embedding space so that similarity in that space aligns with human-meaningful notions of likeness. This shift from hand-crafted metrics to learned representations dramatically improved performance on tasks like face verification, image retrieval, and cross-modal search.
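
The objective behind such frameworks can be sketched as a contrastive (InfoNCE-style) loss; the simplified NumPy version below is ours for illustration, whereas real implementations like SimCLR pair it with learned encoders and GPU-scale batches:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Each anchor should be most similar to its own positive (matching pair)
    and dissimilar to every other sample in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # scaled pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on matching pairs

rng = np.random.default_rng(0)
anchors = rng.standard_normal((8, 32))
positives = anchors + 0.1 * rng.standard_normal((8, 32))  # near-duplicate "views"
print(info_nce_loss(anchors, positives))  # small loss: matching pairs are closest
```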

As datasets grow in scale and dimensionality, computing exact similarity between all pairs of objects becomes computationally prohibitive. Approximate nearest neighbor (ANN) methods — including locality-sensitive hashing and hierarchical navigable small world (HNSW) graphs — make large-scale similarity search tractable, enabling real-time retrieval over billions of vectors. Efficient similarity computation is therefore not just a theoretical concern but a practical infrastructure challenge central to modern AI systems.
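
As a minimal sketch of one of these techniques, random-hyperplane locality-sensitive hashing reduces each vector to a short bit signature so that similar vectors tend to collide (all names and parameters below are illustrative):

```python
import numpy as np

def lsh_signature(vectors, planes):
    """Hash each vector to a bit string: one bit per random hyperplane,
    set by which side of the plane the vector falls on."""
    return (vectors @ planes.T > 0).astype(np.uint8)

rng = np.random.default_rng(0)
dim, n_bits = 64, 16
planes = rng.standard_normal((n_bits, dim))  # one random hyperplane per signature bit
vectors = rng.standard_normal((1000, dim))
signatures = lsh_signature(vectors, planes)

query = vectors[0] + 0.05 * rng.standard_normal(dim)  # slightly perturbed copy of item 0
q_sig = lsh_signature(query[None, :], planes)[0]
matches = (signatures == q_sig).sum(axis=1)  # bits in common with the query
print(matches.argmax())  # almost certainly 0, found without exact all-pairs comparison
```

Comparing 16-bit signatures instead of 64-dimensional vectors is what makes this kind of search tractable at billion-vector scale.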

Related

Similarity Search

Finding the most similar items to a query within a large dataset.

Generality: 794
Similarity Learning

Training models to measure meaningful similarity between data points for comparison tasks.

Generality: 694
Cosine Similarity

A measure of angular similarity between two vectors, regardless of their magnitude.

Generality: 796
Dot Product Similarity

Quantifies vector similarity by summing the products of corresponding elements.

Generality: 694
Similarity Masking

Suppressing redundant or overly similar features to sharpen model focus on distinct information.

Generality: 293
Embedding Space

A learned vector space where similar data points cluster geometrically close together.

Generality: 794