Envisioning is an emerging technology research institute and advisory.




Context Window

The span of text a model can see and process at one time.

Year: 2017 · Generality: 731

A context window defines the maximum amount of text — measured in tokens — that a language model can process in a single forward pass. For early neural language models and word embedding methods like Word2Vec, the context window was a fixed, symmetric neighborhood of words surrounding a target word, typically spanning a few tokens in each direction. The model would use these neighboring words to learn or infer the meaning of the central term. Larger windows capture broader topical relationships, while smaller windows tend to capture tighter syntactic dependencies.
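The fixed symmetric window described above can be sketched in a few lines. This is a minimal illustration, not Word2Vec itself: the `radius` parameter and the `context_pairs` helper are hypothetical names for the neighborhood size and pair-extraction step.

```python
# A minimal sketch of a Word2Vec-style symmetric context window.
# Each target word is paired with the words up to `radius` positions
# away on either side; these pairs are what the embedding model trains on.
def context_pairs(tokens, radius=2):
    """Return (target, context) pairs from a fixed symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - radius)
        hi = min(len(tokens), i + radius + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the model learns meaning from nearby words".split()
for target, ctx in context_pairs(sentence, radius=2)[:4]:
    print(target, "->", ctx)
```

Widening `radius` pulls in more topically related words per target, which is the trade-off the paragraph above describes.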

With the rise of transformer-based architectures, the concept expanded significantly. Rather than a fixed local neighborhood, the context window now refers to the total sequence length a model can attend to at once — encompassing the entire prompt, conversation history, retrieved documents, or any other input fed to the model. Transformers use self-attention mechanisms to relate every token in this window to every other token, allowing the model to draw on long-range dependencies rather than just immediate neighbors. This makes the size of the context window a critical architectural parameter: too small, and the model loses track of earlier information; too large, and computational costs grow quadratically with sequence length.
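The all-pairs interaction and its quadratic cost can be seen in a toy scaled dot-product self-attention, sketched here in plain Python with assumed 2-dimensional embeddings (real models use learned query/key/value projections, omitted for brevity).

```python
import math

# A minimal sketch of self-attention over a context window.
# Every token's output is a weighted mix of every token in the window,
# so the score matrix has n*n entries: hence the quadratic cost.
def self_attention(x):
    d = len(x[0])
    # One score per (query, key) pair.
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
               for k in x] for q in x]
    out = []
    for row in scores:
        # Softmax over the row to get attention weights.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted average of all token vectors in the window.
        out.append([sum(w * v[j] for w, v in zip(weights, x))
                    for j in range(d)])
    return out

print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

Because `scores` holds one entry per token pair, doubling the window length quadruples both memory and compute for this step.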

Context window size has become a major axis of competition among large language models. Early GPT models supported windows of around 2,000 tokens, while more recent systems support hundreds of thousands of tokens or more, enabling tasks like analyzing entire codebases, legal documents, or books in a single pass. Techniques such as rotary positional embeddings, sliding window attention, and sparse attention patterns have been developed specifically to extend effective context length without prohibitive memory costs.
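Of the techniques named above, sliding window attention is the simplest to sketch: each token attends only to a fixed-size band of recent tokens rather than the full history. The mask below is a hedged illustration (the `window` parameter and causal band shape are assumptions of this sketch, not any specific model's configuration).

```python
# A minimal sketch of a causal sliding-window attention mask.
# Entry [i][j] is 1 if token i may attend to token j, i.e. if j lies in
# the band of the `window` most recent positions up to and including i.
def sliding_window_mask(n, window):
    return [[1 if 0 <= i - j < window else 0 for j in range(n)]
            for i in range(n)]

for row in sliding_window_mask(5, 3):
    print(row)
```

Each row has at most `window` nonzero entries, so attention cost grows linearly in sequence length instead of quadratically, at the price of losing direct access to tokens outside the band.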

The practical implications are substantial. A longer context window allows models to maintain coherence across extended conversations, follow complex multi-step instructions, and perform in-context learning from many examples simultaneously. It also reduces the need for external retrieval systems in some applications. As a result, context window capacity has become one of the most closely watched specifications when evaluating modern language models for real-world deployment.

Related

Infinite Context Window

A model architecture that can attend to all preceding tokens without fixed length limits.

Generality: 398
Long-Context Modeling

Architectures and techniques enabling AI models to process and reason over very long sequences.

Generality: 694
Context Anxiety

The degraded performance of language models as inputs approach their maximum context length.

Generality: 94
Context Compaction

Compressing or summarizing context to fit within a model's limited context window.

Generality: 339
Contextual Embedding

Word representations that dynamically shift meaning based on surrounding context.

Generality: 752
Lost-in-the-Middle

LLMs systematically underuse information positioned in the middle of long contexts.

Generality: 104