
PQ (Product Quantization)

Compresses high-dimensional vectors into compact codes for fast approximate similarity search.

Year: 2011 · Generality: 521

Product quantization (PQ) is a vector compression technique designed to make similarity search tractable over massive, high-dimensional datasets. The core idea is to decompose a high-dimensional vector into a fixed number of lower-dimensional sub-vectors, then independently quantize each sub-vector using its own learned codebook — a small set of representative centroids trained via k-means clustering. The original vector is approximated by concatenating the nearest centroid indices from each sub-codebook, producing a compact binary code that can be stored and compared far more efficiently than the raw floating-point representation.
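A minimal sketch of the training and encoding steps, assuming NumPy and scikit-learn for the k-means part; the function names (train_pq, pq_encode) and the parameters m and k are illustrative, not any library's API:

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, m=8, k=256):
    """Learn one k-means codebook per sub-space.
    X: (n, d) float array; d must be divisible by m."""
    n, d = X.shape
    ds = d // m                                    # dimensionality of each sub-vector
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]            # j-th sub-vectors of all points
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)      # (k, ds) centroids for this sub-space
    return codebooks

def pq_encode(X, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    m = len(codebooks)
    ds = codebooks[0].shape[1]
    codes = np.empty((X.shape[0], m), dtype=np.uint8)   # k <= 256 fits in one byte
    for j, C in enumerate(codebooks):
        sub = X[:, j * ds:(j + 1) * ds]
        # brute-force squared distances from every sub-vector to every centroid
        d2 = ((sub[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        codes[:, j] = d2.argmin(axis=1)
    return codes
```

With m = 8 and k = 256, each vector collapses to 8 centroid indices, i.e. an 8-byte code.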

At query time, distances between a query vector and database entries can be computed using precomputed lookup tables — one per sub-space — that store distances from the query's sub-vectors to each centroid. This asymmetric distance computation (ADC) allows approximate nearest neighbor (ANN) distances to be estimated with just a handful of table lookups and additions, replacing expensive floating-point dot products. The result is a system that can scan millions of compressed vectors in milliseconds while consuming only a fraction of the memory that raw vectors would require.
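Continuing the sketch above, a straightforward (unoptimized) version of ADC builds one lookup table per sub-space and then sums table entries per database code; adc_distances is a hypothetical helper, not a library call:

```python
import numpy as np

def adc_distances(query, codes, codebooks):
    """Approximate squared L2 distances from one query to all encoded vectors."""
    m = len(codebooks)
    k, ds = codebooks[0].shape
    # One lookup table per sub-space: distance from the query's sub-vector
    # to every centroid in that sub-space's codebook.
    tables = np.empty((m, k), dtype=np.float32)
    for j, C in enumerate(codebooks):
        q_sub = query[j * ds:(j + 1) * ds]
        tables[j] = ((C - q_sub) ** 2).sum(axis=1)
    # Each database vector's distance is just m table lookups plus additions.
    dists = np.zeros(codes.shape[0], dtype=np.float32)
    for j in range(m):
        dists += tables[j, codes[:, j]]
    return dists

# usage: nearest = adc_distances(q, codes, codebooks).argsort()[:10]
```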

PQ became foundational in large-scale information retrieval after Hervé Jégou, Matthijs Douze, and Cordelia Schmid formalized and popularized it in their 2011 paper. It underpins widely used libraries such as FAISS (Facebook AI Similarity Search), which combines PQ with inverted index structures to scale ANN search to billions of vectors. The technique is central to applications including image retrieval, recommendation systems, and dense vector search in modern retrieval-augmented generation (RAG) pipelines.
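For reference, a typical way this is exercised in FAISS is the IndexIVFPQ class, which couples a coarse inverted-file quantizer with PQ codes; the parameter values below are placeholders rather than recommendations, and the data is random stand-in material:

```python
import numpy as np
import faiss

d, nlist, m, nbits = 128, 1024, 8, 8      # vector dim, coarse cells, sub-spaces, bits per sub-space
xb = np.random.random((100_000, d)).astype('float32')   # stand-in database vectors

quantizer = faiss.IndexFlatL2(d)          # coarse quantizer for the inverted file
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                           # learns coarse centroids and PQ codebooks
index.add(xb)                             # stores only the compressed codes
index.nprobe = 16                         # inverted lists scanned per query
D, I = index.search(xb[:5], 10)           # approximate top-10 neighbors for 5 queries
```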

While PQ introduces approximation error — the reconstructed vector is never identical to the original — the tradeoff between accuracy and efficiency is highly controllable. Practitioners tune the number of sub-spaces and codebook size to balance recall against memory and latency budgets. Extensions such as Optimized Product Quantization (OPQ) and Residual Quantization (RQ) have since improved accuracy by rotating the vector space before quantization or applying quantization hierarchically, but the original PQ framework remains the dominant paradigm for compressed vector search at scale.
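To make the memory side of that tradeoff concrete, a common configuration (assumed here for illustration) compresses a 128-dimensional float32 vector into an 8-byte code:

```python
# Back-of-the-envelope comparison: raw float32 storage vs. PQ code size,
# assuming d = 128, m = 8 sub-spaces, 8 bits (256 centroids) per sub-space.
d, m, bits = 128, 8, 8
raw_bytes = d * 4                 # 512 bytes per raw vector
code_bytes = m * bits // 8        # 8 bytes per PQ code: a 64x reduction
print(raw_bytes, code_bytes)      # 512 8
```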

Related

Quantization
Reducing numerical precision of model weights and activations to shrink size and accelerate inference.
Generality: 794

TurboQuant
A high-speed quantization framework for compressing neural networks with minimal accuracy loss.
Generality: 94

BQL (Binary Quantization Learning)
A technique that compresses neural networks by reducing weights and activations to binary values.
Generality: 174

Similarity Search
Finding the most similar items to a query within a large dataset.
Generality: 794

LAQ (Locally-Adaptive Quantization)
Quantization method that adjusts precision locally based on data characteristics for better efficiency.
Generality: 101

Vector Database
A database optimized for storing and searching high-dimensional vector embeddings.
Generality: 620