Bigram Language Model

A statistical model that predicts each word based solely on the preceding word.

Year: 1980
Generality: 574

A bigram language model is a statistical approach to natural language processing that estimates the probability of a word given only the single word that immediately precedes it. Formally, it applies the Markov assumption to language: the probability of a sequence of words is approximated as the product of conditional probabilities, where each word depends only on its immediate predecessor. This reduces the intractable problem of modeling full sentence context to a manageable set of pairwise word co-occurrence statistics that can be estimated directly from a text corpus.
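Concretely, the factorization described above can be written as follows, taking w₀ to be a sentence-start symbol:

```latex
P(w_1, w_2, \ldots, w_n) \;\approx\; \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```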

Training a bigram model involves counting how often each word pair appears in a corpus and normalizing those counts into conditional probabilities. For example, if the word "machine" is followed by "learning" 800 times out of 1,000 occurrences of "machine," the model assigns P(learning | machine) = 0.8. In practice, smoothing techniques such as Laplace or Kneser-Ney smoothing are applied to handle word pairs that never appeared in the training data, preventing zero-probability assignments that would break downstream calculations.
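As an illustration, a minimal bigram trainer with add-k (Laplace) smoothing might look like the sketch below; the function name, the sentence boundary tokens, and the toy corpus are all invented for the example rather than drawn from any particular library.

```python
from collections import defaultdict

def train_bigram_model(corpus, k=1.0):
    """Estimate smoothed bigram probabilities from tokenized sentences."""
    bigram_counts = defaultdict(lambda: defaultdict(int))
    context_counts = defaultdict(int)
    vocab = set()

    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # mark sentence boundaries
        vocab.update(tokens)
        for prev, curr in zip(tokens, tokens[1:]):
            bigram_counts[prev][curr] += 1      # count the word pair
            context_counts[prev] += 1           # count the preceding word

    vocab_size = len(vocab)

    def prob(curr, prev):
        # Add-k (Laplace) smoothing: unseen pairs get a small nonzero probability
        return (bigram_counts[prev][curr] + k) / (context_counts[prev] + k * vocab_size)

    return prob

# Toy usage: estimate P(learning | machine) from a tiny corpus
corpus = [
    ["machine", "learning", "is", "fun"],
    ["machine", "learning", "needs", "data"],
]
p = train_bigram_model(corpus)
print(round(p("learning", "machine"), 3))
```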

Bigram models became a foundational tool in NLP tasks including automatic speech recognition, spelling correction, predictive text, and early machine translation systems. Their appeal lies in computational simplicity: the model requires only a vocabulary-sized matrix of transition probabilities and can score or generate text extremely efficiently. However, the strict two-word window means the model is blind to longer-range dependencies — it cannot capture that "the bank" means something different in "the bank of a river" versus "the bank approved the loan."
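To make the efficiency point concrete, the following sketch scores and samples text from a small table of transition counts; the counts and helper names are purely illustrative, and the probabilities are unsmoothed maximum-likelihood estimates.

```python
import math
import random

# Toy transition counts (dict of dicts), standing in for a trained model
bigram_counts = {
    "<s>":      {"the": 3, "a": 1},
    "the":      {"bank": 2, "river": 1, "loan": 1},
    "bank":     {"approved": 1, "of": 1, "</s>": 1},
    "approved": {"the": 1},
    "of":       {"a": 1},
    "a":        {"river": 1},
    "river":    {"</s>": 2},
    "loan":     {"</s>": 1},
}

def prob(curr, prev):
    """Unsmoothed maximum-likelihood estimate of P(curr | prev)."""
    total = sum(bigram_counts.get(prev, {}).values())
    return bigram_counts.get(prev, {}).get(curr, 0) / total if total else 0.0

def score(tokens):
    """Log-probability of a sentence: one table lookup per word."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(prob(c, p)) for p, c in zip(padded, padded[1:]))

def generate(max_len=10):
    """Sample one word at a time from the transition distribution."""
    prev, out = "<s>", []
    for _ in range(max_len):
        nxt = bigram_counts.get(prev)
        if not nxt:
            break
        word = random.choices(list(nxt), weights=list(nxt.values()))[0]
        if word == "</s>":
            break
        out.append(word)
        prev = word
    return " ".join(out)

print(score(["the", "bank", "approved", "the", "loan"]))
print(generate())
```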

Despite being largely superseded by neural language models such as RNNs, LSTMs, and transformers, bigram models remain relevant as baselines, as components in larger systems, and as pedagogical tools for understanding probabilistic language modeling. They also appear as a special case within n-gram model families, and the intuitions behind them — counting co-occurrences, applying smoothing, evaluating with perplexity — carry forward directly into modern NLP practice.

Related

Trigrams

A sequence of three consecutive tokens used in language modeling and NLP.

Generality: 420
N-gram

A contiguous sequence of N items drawn from text or speech.

Generality: 700
Next Word Prediction

A training objective where models learn to predict the next token in a sequence.

Generality: 794
DLMs (Deep Language Models)

Deep neural networks trained to understand, generate, and translate human language.

Generality: 796
LLM (Large Language Model)

Massive neural networks trained on text to understand and generate human language.

Generality: 905
Unigram Entropy

A measure of word-level unpredictability in text, assuming each word occurs independently.

Generality: 450