The degraded performance of language models as inputs approach their maximum context length.
Context anxiety refers to the phenomenon where large language models (LLMs) exhibit declining reliability, coherence, or accuracy as the amount of input text approaches or fills their maximum context window. Rather than maintaining uniform performance across all positions within a context, models tend to struggle with retrieval, reasoning, and instruction-following when the context is densely packed or extremely long. The term borrows from human psychology to describe a kind of "overload" state in which the model's effective capacity to process and integrate information degrades under the pressure of a full or near-full context.
The underlying mechanics relate to how attention in transformer-based models distributes focus across tokens. Research has shown that models often exhibit a "lost in the middle" effect, where information placed in the center of a long context is recalled less reliably than information near the beginning or end. Because attention weights are normalized to sum to one, each token's attention must be spread across an ever-larger set of tokens as the context grows, diluting the signal-to-noise ratio and making it harder for the model to surface the most relevant information; the quadratic cost of standard self-attention also makes very long contexts increasingly expensive to process. Positional encodings, which help models represent token order, can become less reliable at extreme lengths if the model was not trained on sufficiently long sequences.
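The dilution effect is visible in the softmax normalization itself. The following toy sketch (not a real attention implementation) shows how the weight a query assigns to a single relevant token shrinks as more distractor tokens compete for the same probability mass:

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to one."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relevant_weight(context_len, relevant_score=4.0, noise_score=0.0):
    # One "relevant" token competes with context_len - 1 distractors.
    scores = [relevant_score] + [noise_score] * (context_len - 1)
    return softmax(scores)[0]

for n in (10, 1_000, 100_000):
    print(f"context={n:>7}: weight on relevant token = {relevant_weight(n):.6f}")
```

Even though the relevant token keeps the same raw score, its normalized weight falls roughly as 1/n once the distractors dominate the denominator, which is one intuition for why retrieval degrades in very long contexts.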
Context anxiety has practical implications for applications like retrieval-augmented generation (RAG), long-document summarization, multi-turn dialogue, and agentic workflows that accumulate large amounts of tool output or conversation history. Developers must account for the fact that simply extending a model's context window does not guarantee proportional gains in usable performance; the effective context — the portion the model can reliably reason over — may be considerably shorter than the technical maximum.
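One way to estimate a model's effective context is a position-sweep ("needle in a haystack") probe: plant a known fact at varying depths in filler text and check whether the model can recall it. The sketch below is a minimal harness; `ask_model` is a hypothetical stand-in that would be replaced with a real LLM call, and here it simply simulates a model that only recalls needles from the first half of its input.

```python
NEEDLE = "The secret code is 7421."

def ask_model(prompt: str) -> str:
    # Toy stand-in for an LLM call: "recalls" the needle only when it
    # appears in the first half of the prompt, mimicking a model whose
    # effective context is shorter than its technical maximum.
    pos = prompt.find(NEEDLE)
    return "7421" if 0 <= pos < len(prompt) // 2 else "unknown"

def probe(haystack_chars: int = 10_000, steps: int = 5) -> dict:
    """Map needle depth (0.0 = start, 1.0 = end) to recall success."""
    filler = "Lorem ipsum dolor sit amet. "
    haystack = filler * (haystack_chars // len(filler))
    results = {}
    for i in range(steps):
        frac = i / (steps - 1)
        insert_at = int(len(haystack) * frac)
        prompt = haystack[:insert_at] + NEEDLE + haystack[insert_at:]
        prompt += "\nWhat is the secret code?"
        results[round(frac, 2)] = ask_model(prompt) == "7421"
    return results
```

Sweeping depth like this, against a real model and with token counts near the window limit, is how "lost in the middle" and end-of-window degradation are typically measured in practice.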
Mitigations include careful chunking and ordering of retrieved content, prompt compression, hierarchical summarization, and training with dedicated long-context objectives, as used in models such as Claude or Gemini. The concept has become increasingly important as context windows have expanded from thousands to millions of tokens, making the quality of attention across that full span a central engineering and research challenge.
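Hierarchical summarization, for instance, can be sketched as a recursive map-reduce: summarize chunks independently so no single call sees the whole document, then summarize the summaries. In the sketch below, `summarize` is a toy extractive stand-in (it keeps only the first sentence); in practice each call would go to an LLM.

```python
def summarize(text: str) -> str:
    # Toy extractive "summary": keep the first sentence only.
    # In a real pipeline this would be an LLM summarization call.
    return text.split(". ")[0].rstrip(".") + "."

def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def hierarchical_summary(document: str, chunk_size: int = 500,
                         max_depth: int = 5) -> str:
    # Level 1: summarize each chunk independently (the "map" step).
    partials = [summarize(c) for c in chunk(document, chunk_size)]
    # Level 2+: if the combined summaries still exceed one chunk,
    # recurse on them (the "reduce" step); max_depth bounds recursion.
    combined = " ".join(partials)
    if len(combined) > chunk_size and max_depth > 1:
        return hierarchical_summary(combined, chunk_size, max_depth - 1)
    return summarize(combined)
```

The design choice here is that every summarization call operates on text well under the chunk size, so the model is never pushed toward the full-context regime where the degradation described above sets in.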