The span of text a model can see and process at one time.
A context window is the maximum amount of text, measured in tokens, that a language model can process in a single forward pass. For early neural language models and word embedding methods like Word2Vec, the context window was a fixed, symmetric neighborhood of words surrounding a target word, typically spanning a few words in each direction. The model would use these neighbors to learn or infer the meaning of the central term. Larger windows capture broader topical relationships, while smaller windows tend to capture tighter syntactic dependencies.
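To make the classical notion concrete, here is a minimal Python sketch of how a fixed symmetric window turns a sentence into (target, context) training pairs, in the spirit of Word2Vec's skip-gram setup; the function name and the window size of 2 are illustrative, not drawn from any particular implementation.

```python
def context_pairs(tokens, window=2):
    """Yield (target, context) pairs using a fixed, symmetric
    window of `window` tokens on each side of the target."""
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target itself
                yield target, tokens[j]

sentence = "the quick brown fox jumps over the lazy dog".split()
print(list(context_pairs(sentence, window=2))[:5])
```

Widening `window` here pairs each word with more distant, more topical neighbors, which is exactly the trade-off described above.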
With the rise of transformer-based architectures, the concept expanded significantly. Rather than a fixed local neighborhood, the context window now refers to the total sequence length a model can attend to at once: the entire prompt, conversation history, retrieved documents, or any other input fed to the model. Transformers use self-attention to relate every token in this window to every other token, allowing the model to draw on long-range dependencies rather than just immediate neighbors. This makes context window size a critical architectural parameter: too small, and the model loses track of earlier information; too large, and the compute and memory of standard self-attention, which scale quadratically with sequence length, become prohibitive.
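The toy NumPy sketch below shows where the quadratic cost comes from. The Q, K, and V projections are left as the identity for brevity, which is a simplification rather than how real transformers are parameterized; the point is that the attention score matrix is seq_len by seq_len, so doubling the window quadruples it.

```python
import numpy as np

def self_attention(x):
    """Toy single-head self-attention over x of shape (seq_len, d).
    Q, K, V projections are left as the identity for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # (seq_len, seq_len) score matrix
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ x                             # (seq_len, d)

x = np.random.randn(8, 4)
print(self_attention(x).shape)  # (8, 4); the score matrix was 8 x 8
```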
Context window size has become a major axis of competition among large language models. Early GPT models supported windows of 512 to 2,048 tokens (GPT-1 through GPT-3), while more recent systems support hundreds of thousands of tokens or more, enabling tasks like analyzing entire codebases, legal documents, or books in a single pass. Techniques such as rotary position embeddings (RoPE), sliding window attention, and sparse attention patterns have been developed specifically to extend effective context length without prohibitive memory costs.
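As one illustration of these techniques, the sketch below builds a causal sliding-window attention mask. It is a generic sketch of the idea, not the mask used by any specific model, and the window size is arbitrary.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Causal sliding-window mask: position i may attend to positions j
    with i - window < j <= i. True marks an allowed pair."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(seq_len=6, window=3).astype(int))
# Each row allows at most `window` positions, so masked attention
# cost grows linearly with seq_len instead of quadratically.
```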
The practical implications are substantial. A longer context window allows models to maintain coherence across extended conversations, follow complex multi-step instructions, and perform in-context learning from many examples simultaneously. It also reduces the need for external retrieval systems in some applications. As a result, context window capacity has become one of the most closely watched specifications when evaluating modern language models for real-world deployment.
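In practice, checking whether an input fits a given window is a token-counting exercise. The sketch below uses the tiktoken tokenizer; the 8,192-token limit and the reserved output budget are assumed placeholder values, since actual limits vary by model.

```python
import tiktoken  # pip install tiktoken

# Placeholder limit; real context windows vary by model, so check
# the model card or API docs for the deployment you target.
CONTEXT_WINDOW = 8_192
enc = tiktoken.get_encoding("cl100k_base")

def fits_in_window(prompt: str, reserved_for_output: int = 512) -> bool:
    """True if the prompt plus a reserved output budget fits in the
    assumed context window."""
    return len(enc.encode(prompt)) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_window("Summarize the following contract: ..."))
```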