A data structure enabling fast key-value storage and retrieval via hash functions.
A hash table, also known as a hash map, is a foundational data structure that stores key-value pairs and enables near-constant-time lookups, insertions, and deletions. At its core, a hash function transforms an input key — which can be a string, integer, or other object — into an integer index that points to a specific slot in an underlying array. When the hash function distributes keys uniformly, each operation achieves an average time complexity of O(1), making hash tables among the most efficient structures for associative data storage.
Collisions — cases where two distinct keys map to the same index — are an inevitable challenge in hash table design. Two primary strategies address this: chaining, where each array slot holds a linked list of all entries that hash to that index, and open addressing, where a collision triggers a search for the next available slot according to a probing sequence. The choice of collision resolution strategy, combined with the quality of the hash function and the table's load factor (ratio of stored entries to total slots), determines real-world performance. A well-tuned hash table degrades gracefully under load, while a poorly designed one can collapse to O(n) performance.
In machine learning and AI systems, hash tables appear throughout the data pipeline and model infrastructure. They underlie vocabulary lookups in natural language processing, feature indexing in recommendation systems, and caching mechanisms in inference engines. The "hashing trick" — a technique that uses hash functions to map high-dimensional sparse features into a fixed-size vector — directly applies hash table principles to reduce memory overhead in large-scale models. Frameworks like TensorFlow and PyTorch rely on hash-based structures internally for gradient bookkeeping and tensor indexing.
Beyond infrastructure, hash tables inform algorithmic techniques such as locality-sensitive hashing (LSH), which enables approximate nearest-neighbor search in high-dimensional embedding spaces — a critical operation in retrieval-augmented generation and semantic search. Their blend of simplicity and performance has made hash tables indispensable at every layer of modern AI systems, from raw data ingestion to serving predictions at scale.