A sequence of three consecutive tokens used in language modeling and NLP.
A trigram is an n-gram with n = 3: a sequence of exactly three consecutive tokens (words, characters, or other linguistic units) drawn from text. As a foundational technique in natural language processing, trigram models capture local context by representing the probability of a token given the two tokens that immediately precede it. This conditional probability framework lets a model estimate how likely a particular word is to follow a given two-word context, making trigrams a natural extension of unigrams (single tokens) and bigrams (two-token sequences).
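To make the sliding-window idea concrete, here is a minimal sketch of extracting unigrams, bigrams, and trigrams from a token list; the `extract_ngrams` helper and the example sentence are illustrative, not part of any particular library.

```python
from typing import List, Tuple

def extract_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Slide a window of size n over the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(extract_ngrams(tokens, 1))  # unigrams: ('the',), ('cat',), ...
print(extract_ngrams(tokens, 2))  # bigrams:  ('the', 'cat'), ('cat', 'sat'), ...
print(extract_ngrams(tokens, 3))  # trigrams: ('the', 'cat', 'sat'), ...
```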
In practice, trigram language models are built by counting how often each three-token sequence appears in a large corpus, then normalizing those counts into conditional probabilities. To handle sequences that never appeared in training data — a problem known as data sparsity — techniques such as Laplace smoothing, Kneser-Ney smoothing, and interpolation with lower-order n-gram models are commonly applied. These smoothing strategies redistribute probability mass from observed sequences to unseen ones, making the model more robust when deployed on real-world text.
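The counting-and-smoothing recipe can be sketched in a few lines. The example below is a toy illustration, not a production implementation: the class name `TrigramLM`, the sentence-boundary padding tokens, and the two-sentence corpus are assumptions, and only add-one (Laplace) smoothing is shown rather than Kneser-Ney or interpolation.

```python
from collections import Counter
from typing import List

class TrigramLM:
    """Minimal trigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]):
        self.trigram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()
        for sent in sentences:
            # Pad so the first real word also has a two-token history.
            tokens = ["<s>", "<s>"] + sent + ["</s>"]
            self.vocab.update(tokens)
            for i in range(len(tokens) - 2):
                self.context_counts[tuple(tokens[i:i + 2])] += 1
                self.trigram_counts[tuple(tokens[i:i + 3])] += 1

    def prob(self, w1: str, w2: str, w3: str) -> float:
        """P(w3 | w1, w2) with add-one smoothing over the vocabulary."""
        numerator = self.trigram_counts[(w1, w2, w3)] + 1
        denominator = self.context_counts[(w1, w2)] + len(self.vocab)
        return numerator / denominator

corpus = [["the", "cat", "sat"], ["the", "cat", "slept"]]
lm = TrigramLM(corpus)
print(lm.prob("the", "cat", "sat"))    # seen trigram: higher probability
print(lm.prob("the", "cat", "</s>"))   # unseen trigram: nonzero thanks to smoothing
```

Without the add-one term, the second query would return zero probability, which is exactly the data-sparsity problem that smoothing is meant to address.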
Trigrams have been widely applied across core NLP tasks including language modeling, machine translation, speech recognition, spell checking, and optical character recognition. Their ability to encode short-range syntactic and semantic dependencies — such as common verb phrases or prepositional patterns — gives them a meaningful advantage over unigrams and bigrams for many tasks. At the same time, trigrams are limited by their fixed window of three tokens, meaning they cannot capture long-range dependencies that span many words in a sentence.
Although neural language models such as recurrent networks and transformers have largely superseded trigram models for state-of-the-art performance, trigrams remain relevant as lightweight baselines, as features in hybrid systems, and in resource-constrained environments where neural approaches are impractical. Their simplicity, interpretability, and computational efficiency ensure they continue to appear in production systems and serve as a pedagogical entry point for understanding probabilistic language modeling.