
PQ (Product Quantization)

Compresses high-dimensional vectors into compact codes for fast approximate similarity search.

Year: 2011 · Generality: 521

Product quantization (PQ) is a vector compression technique designed to make similarity search tractable over massive, high-dimensional datasets. The core idea is to decompose a high-dimensional vector into a fixed number of lower-dimensional sub-vectors, then independently quantize each sub-vector using its own learned codebook — a small set of representative centroids trained via k-means clustering. The original vector is approximated by concatenating the nearest centroid indices from each sub-codebook, producing a compact binary code that can be stored and compared far more efficiently than the raw floating-point representation.
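A minimal sketch of the training and encoding steps, assuming NumPy and scikit-learn for the k-means part; the function names (train_pq, pq_encode) and the parameters m and k are illustrative, not any library's API:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, m=8, k=256):
    """Learn one k-means codebook per sub-space.
    X: (n, d) float array; d must be divisible by m."""
    n, d = X.shape
    ds = d // m                                    # dimensionality of each sub-vector
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]            # j-th sub-vectors of all points
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)      # (k, ds) centroids for this sub-space
    return codebooks

def pq_encode(X, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    m = len(codebooks)
    ds = codebooks[0].shape[1]
    codes = np.empty((X.shape[0], m), dtype=np.uint8)   # k <= 256 fits in one byte
    for j, C in enumerate(codebooks):
        sub = X[:, j * ds:(j + 1) * ds]
        # brute-force squared distances from every sub-vector to every centroid
        d2 = ((sub[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes
```

With m = 8 and k = 256, each vector collapses to 8 centroid indices, i.e. an 8-byte code.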

At query time, distances between a query vector and database entries can be computed using precomputed lookup tables — one per sub-space — that store distances from the query's sub-vectors to each centroid. This asymmetric distance computation (ADC) allows approximate nearest neighbor (ANN) distances to be estimated with just a handful of table lookups and additions, replacing expensive floating-point dot products. The result is a system that can scan millions of compressed vectors in milliseconds while consuming only a fraction of the memory that raw vectors would require.
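Continuing the sketch above, a straightforward (unoptimized) version of ADC builds one lookup table per sub-space and then sums table entries per database code; adc_distances is a hypothetical helper, not a library call:

```python
import numpy as np

def adc_distances(query, codes, codebooks):
    """Approximate squared L2 distances from one query to all encoded vectors."""
    m = len(codebooks)
    k, ds = codebooks[0].shape
    # One lookup table per sub-space: distance from the query's sub-vector
    # to every centroid in that sub-space's codebook.
    tables = np.empty((m, k), dtype=np.float32)
    for j, C in enumerate(codebooks):
        q_sub = query[j * ds:(j + 1) * ds]
        tables[j] = ((C - q_sub) ** 2).sum(axis=1)
    # Each database vector's distance is just m table lookups plus additions.
    dists = np.zeros(codes.shape[0], dtype=np.float32)
    for j in range(m):
        dists += tables[j, codes[:, j]]
    return dists

# usage: nearest = adc_distances(q, codes, codebooks).argsort()[:10]
```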

PQ became foundational in large-scale information retrieval after Hervé Jégou, Matthijs Douze, and Cordelia Schmid formalized and popularized it in their 2011 paper. It underpins widely used libraries such as FAISS (Facebook AI Similarity Search), which combines PQ with inverted index structures to scale ANN search to billions of vectors. The technique is central to applications including image retrieval, recommendation systems, and dense vector search in modern retrieval-augmented generation (RAG) pipelines.
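For reference, a typical way this is exercised in FAISS is the IndexIVFPQ class, which couples a coarse inverted-file quantizer with PQ codes; the parameter values below are placeholders rather than recommendations, and the data is random stand-in material:

```python
import numpy as np
import faiss

d, nlist, m, nbits = 128, 1024, 8, 8      # vector dim, coarse cells, sub-spaces, bits per sub-space
xb = np.random.random((100_000, d)).astype('float32')   # stand-in database vectors

quantizer = faiss.IndexFlatL2(d)          # coarse quantizer for the inverted file
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                           # learns coarse centroids and PQ codebooks
index.add(xb)                             # stores only the compressed codes
index.nprobe = 16                         # inverted lists scanned per query
D, I = index.search(xb[:5], 10)           # approximate top-10 neighbors for 5 queries
```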

While PQ introduces approximation error — the reconstructed vector is never identical to the original — the tradeoff between accuracy and efficiency is highly controllable. Practitioners tune the number of sub-spaces and codebook size to balance recall against memory and latency budgets. Extensions such as Optimized Product Quantization (OPQ) and Residual Quantization (RQ) have since improved accuracy by rotating the vector space before quantization or applying quantization hierarchically, but the original PQ framework remains the dominant paradigm for compressed vector search at scale.
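To make the memory side of that tradeoff concrete, a common configuration (assumed here for illustration) compresses a 128-dimensional float32 vector into an 8-byte code:

```python
# Back-of-the-envelope comparison: raw float32 storage vs. PQ code size,
# assuming d = 128, m = 8 sub-spaces, 8 bits (256 centroids) per sub-space.
d, m, bits = 128, 8, 8
raw_bytes = d * 4                 # 512 bytes per raw vector
code_bytes = m * bits // 8        # 8 bytes per PQ code: a 64x reduction
print(raw_bytes, code_bytes)      # 512 8
```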

Related

Quantization
Reducing numerical precision of model weights and activations to shrink size and accelerate inference.
Generality: 794

TurboQuant
A high-speed quantization framework for compressing neural networks with minimal accuracy loss.
Generality: 94

BQL (Binary Quantization Learning)
A technique that compresses neural networks by reducing weights and activations to binary values.
Generality: 174

Similarity Search
Finding the most similar items to a query within a large dataset.
Generality: 794

LAQ (Locally-Adaptive Quantization)
Quantization method that adjusts precision locally based on data characteristics for better efficiency.
Generality: 101

Vector Database
A database optimized for storing and searching high-dimensional vector embeddings.
Generality: 620