Selecting the k highest-scoring items from a model's output for ranking or generation.
Top-k is a selection strategy used throughout machine learning and information retrieval to restrict output to the k highest-scoring candidates from a larger set. Rather than returning all possible results or predictions, a system ranks items by some score — probability, relevance, similarity — and retains only the top k entries. This constraint serves both practical and quality-oriented goals: it reduces computational overhead in downstream processing, focuses evaluation metrics on the most meaningful predictions, and prevents low-confidence outputs from polluting results in classification, search, and recommendation tasks.
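In code, top-k selection reduces to a partial sort over the candidate scores. A minimal sketch using Python's `heapq.nlargest`; the document names and relevance scores are hypothetical:

```python
import heapq

def top_k(scores, k):
    """Return the k highest-scoring (item, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])

# Hypothetical relevance scores for candidate documents.
scores = {"doc_a": 0.91, "doc_b": 0.12, "doc_c": 0.78, "doc_d": 0.55}
print(top_k(scores, 2))  # → [('doc_a', 0.91), ('doc_c', 0.78)]
```

`heapq.nlargest` runs in O(n log k), which matters when n is large (say, a full vocabulary or a candidate pool of millions) and k is small.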
In generative language modeling, top-k sampling has become a standard decoding technique. At each token generation step, the model computes a probability distribution over its entire vocabulary, then zeroes out all tokens except the k most probable and renormalizes the remaining probabilities before sampling. This prevents the model from selecting highly improbable tokens that would derail coherent text generation, while still preserving enough diversity to avoid the repetitive outputs that come from purely greedy decoding. The value of k is a tunable hyperparameter: small values produce more focused, deterministic text, while larger values introduce more variety and creative range.
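The filter-and-renormalize step can be sketched in plain Python; the toy logits and vocabulary size below are illustrative, not taken from any particular model:

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token index with top-k filtering: keep the k highest
    logits, renormalize their probabilities, and sample among them."""
    # Indices of the k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (shift by the max for stability);
    # rng.choices normalizes the weights, which renormalizes the distribution.
    m = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - m) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

# Toy 5-token vocabulary; with k=2 only indices 3 and 1 can ever be drawn.
logits = [0.1, 2.0, -1.0, 3.5, 0.5]
print(top_k_sample(logits, k=2))
```

With k=1 this degenerates to greedy decoding; as k approaches the vocabulary size it becomes unrestricted sampling from the full distribution.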
Beyond text generation, top-k appears in recommendation systems, where a model scores thousands of candidate items and surfaces only the k most relevant to a user; in approximate nearest-neighbor search, where retrieval systems return the k closest vectors to a query embedding; and in evaluation protocols, where metrics like Precision@k and Recall@k measure how well a ranked list performs within its top k positions. These metrics are especially important in settings where users only examine the first few results, making the quality of the top-k slice more consequential than overall ranking accuracy.
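Precision@k and Recall@k can be computed directly from a ranked list and a set of relevant items; the document IDs below are hypothetical:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

ranked = ["d1", "d2", "d3", "d4", "d5"]   # system's ranking, best first
relevant = {"d1", "d4", "d7"}             # ground-truth relevant items
print(precision_at_k(ranked, relevant, 4))  # → 0.5  (d1 and d4 in top 4)
print(recall_at_k(ranked, relevant, 4))     # → 0.666… (2 of 3 found)
```

Note that d7 never appears in the ranking at all, which caps recall below 1.0 regardless of k here; this is why the two metrics are usually reported together.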
Top-k is closely related to top-p (nucleus) sampling, which selects the smallest set of tokens whose cumulative probability exceeds a threshold p, offering a more adaptive alternative. Together, these techniques form the core toolkit for controlling the trade-off between output quality and diversity in modern generative models. The concept's simplicity and broad applicability have made it one of the most pervasive selection primitives across the entire ML stack.
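For contrast with the fixed-size cutoff above, nucleus sampling can be sketched in the same plain-Python style: sort tokens by probability and keep adding them until the cumulative mass exceeds p. The toy logits are illustrative:

```python
import math
import random

def nucleus_sample(logits, p, rng=random):
    """Top-p (nucleus) sampling: keep the smallest set of tokens whose
    cumulative probability exceeds p, then sample from that set."""
    # Full softmax (shift by the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # (probability, index) pairs, highest probability first.
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    nucleus, cum = [], 0.0
    for prob, i in ranked:
        nucleus.append((prob, i))
        cum += prob
        if cum >= p:          # stop once the nucleus covers mass p
            break
    weights, indices = zip(*nucleus)
    return rng.choices(indices, weights=weights, k=1)[0]

# Toy logits: one dominant token plus a few unlikely ones. With p=0.9
# the nucleus here contains only indices 0 and 1.
logits = [3.0, 1.0, 0.2, -2.0]
print(nucleus_sample(logits, p=0.9))
```

The key difference from top-k is that the size of the candidate set varies with the shape of the distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus, whereas top-k always keeps exactly k tokens.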