
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


Similarity Search

Finding the most similar items to a query within a large dataset.

Year: 1990 · Generality: 794

Similarity search is the computational task of retrieving data points from a dataset that are most similar to a given query item, measured according to some distance or similarity metric. Rather than looking for exact matches, the goal is to find items that are "close" in some meaningful sense — whether that means images with similar visual content, documents with related topics, or product embeddings with comparable feature representations. The choice of metric is central to the approach: Euclidean distance works well for dense numeric data, cosine similarity is preferred for high-dimensional text or embedding vectors, and Manhattan distance suits certain structured data types.
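The three metrics mentioned above can be sketched in a few lines. This is a minimal illustration using NumPy; the function names are ours, not from any particular library:

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance; suits dense numeric feature vectors.
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Angular similarity; ignores magnitude, common for text embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return float(np.sum(np.abs(a - b)))

a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 1.0, 2.0])
euclidean(a, b)          # sqrt(2) ≈ 1.414
cosine_similarity(a, b)  # 4 / 5 = 0.8
manhattan(a, b)          # 2.0
```

Note that cosine is a similarity (higher means closer) while the other two are distances (lower means closer); search code must sort accordingly.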

The core challenge of similarity search is scale. Naively comparing a query against every item in a dataset — a brute-force linear scan — becomes computationally prohibitive as datasets grow into the millions or billions of items. To address this, researchers developed specialized data structures and algorithms that organize data to prune the search space. Classic approaches include KD-trees and ball trees, which partition space hierarchically to accelerate exact nearest-neighbor lookups. For very high-dimensional data, however, these exact methods suffer from the curse of dimensionality and lose their efficiency advantage.
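The brute-force baseline that the tree-based methods try to beat can be written as a single vectorized scan. A sketch with synthetic data (sizes and seed are illustrative):

```python
import numpy as np

def brute_force_knn(query, data, k=3):
    # Compare the query against every item: O(n * d) work per query,
    # which is exactly what becomes prohibitive at large n.
    dists = np.linalg.norm(data - query, axis=1)
    idx = np.argsort(dists)[:k]   # indices of the k smallest distances
    return idx, dists[idx]

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 64))   # 10k points in 64 dimensions
query = rng.normal(size=64)
idx, dists = brute_force_knn(query, data, k=3)
```

KD-trees and ball trees avoid most of these comparisons by pruning whole subtrees whose bounding regions are provably farther than the current best candidates, but in high dimensions almost no subtree can be pruned, which is why the exact methods degrade.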

Approximate nearest neighbor (ANN) algorithms emerged as the practical solution for large-scale, high-dimensional similarity search. Techniques such as locality-sensitive hashing (LSH), hierarchical navigable small world graphs (HNSW), and product quantization allow systems to find results that are very likely to be among the true nearest neighbors, at a fraction of the computational cost of exact search. These methods trade a small, controllable amount of accuracy for dramatic gains in speed and memory efficiency, making real-time similarity search feasible at web scale.
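The simplest of these techniques to sketch is random-hyperplane LSH for cosine similarity: each random hyperplane contributes one bit to a hash, so similar vectors (small angle between them) tend to land in the same bucket and only bucket-mates are compared. A minimal, assumption-laden sketch (class and parameter names are ours):

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity (illustrative sketch)."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is the normal of one random hyperplane through the origin.
        self.planes = rng.normal(size=(n_bits, dim))
        self.buckets = defaultdict(list)

    def _hash(self, v):
        # Bit i records which side of hyperplane i the vector falls on.
        return tuple((self.planes @ v > 0).astype(int))

    def index(self, vectors):
        for i, v in enumerate(vectors):
            self.buckets[self._hash(v)].append(i)

    def query(self, v):
        # Only candidates sharing the query's bucket are returned;
        # true neighbors in other buckets are missed -- the accuracy/speed trade.
        return self.buckets.get(self._hash(v), [])

rng = np.random.default_rng(1)
vecs = rng.normal(size=(100, 8))
lsh = HyperplaneLSH(dim=8, n_bits=8)
lsh.index(vecs)
candidates = lsh.query(vecs[0])   # candidate ids to re-rank exactly
```

Production systems layer refinements on this idea (multiple hash tables, multi-probe queries) or use graph-based indexes like HNSW instead, but the candidate-then-rerank pattern is the same.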

Similarity search has become a foundational primitive in modern machine learning infrastructure. With the rise of dense vector embeddings produced by neural networks — for text, images, audio, and more — vector databases built around ANN search have become critical components in retrieval-augmented generation (RAG) systems, semantic search engines, recommendation pipelines, and multimodal AI applications. The ability to efficiently search embedding spaces is now central to deploying large-scale AI systems in production.

Related

  • Similarity Computation: Quantifying how alike two data objects are to support learning algorithms. Generality: 709
  • Similarity Learning: Training models to measure meaningful similarity between data points for comparison tasks. Generality: 694
  • Cosine Similarity: A measure of angular similarity between two vectors, regardless of their magnitude. Generality: 796
  • Vector Database: A database optimized for storing and searching high-dimensional vector embeddings. Generality: 620
  • PQ (Product Quantization): Compresses high-dimensional vectors into compact codes for fast approximate similarity search. Generality: 521
  • Dot Product Similarity: Quantifies vector similarity by summing the products of corresponding elements. Generality: 694