Algorithms that find vectors close to a query in high-dimensional space, trading exactness for speed.
Approximate Nearest Neighbor, or ANN, is a family of algorithms that find vectors close to a given query in a high-dimensional vector space, accepting small errors in the result in exchange for orders-of-magnitude speedup over exhaustive search. The technique underlies nearly all production-scale vector search and embedding-based retrieval systems, where the cost of comparing a query against every stored vector would be prohibitive. The acronym ANN is sometimes confused with artificial neural networks, which is unrelated.
ANN algorithms work by precomputing a data structure over the vector collection that can be queried in sub-linear time. The most common families are graph-based (HNSW — Hierarchical Navigable Small World — builds a multi-layer proximity graph and walks it), tree-based (KD-trees, Annoy's random projection trees), and quantization-based (PQ — Product Quantization — compresses vectors into compact codes that are cheap to compare). Some libraries combine approaches: IVF-PQ indexes vectors with inverted-file bucketing plus product quantization, achieving high recall at very large scale. The accuracy, speed, and memory trade-off is controlled by hyperparameters — graph degree, number of probes, codebook size — and is reported as recall at K against an exact kNN baseline.
ANN indexes can answer queries in milliseconds on collections of billions of vectors, but they require significant memory — often more than the raw vector data — and offer no exact guarantees: a small fraction of true nearest neighbors are typically missed in exchange for the speedup. Index construction is also more expensive than a flat scan, and the indexes are difficult to update incrementally — most production systems rebuild or re-merge periodically rather than inserting one vector at a time. The dominant cost is RAM; recent work has focused on disk-based and SSD-based ANN to break that ceiling.
Open questions include whether ANN will remain dominant as embedding dimensions grow into the thousands and as model architectures push toward long-context or sparse representations. Newer approaches like ScaNN, DiskANN, and various learned-index techniques are pushing the recall-versus-cost frontier, and there is no consensus on a winning family. For very small corpora, exhaustive kNN remains competitive because the constant factors of ANN indexes dominate; for very large corpora, the open question is how to keep indexes fresh without full rebuilds. The relationship between ANN and modern retrieval-augmented generation is also evolving as the bottleneck shifts from vector lookup to context construction.