LLMs systematically underuse information positioned in the middle of long contexts.
Lost-in-the-Middle is a documented failure mode in large language models: performance degrades when relevant information sits in the middle of a long input sequence rather than near the beginning or end. Empirical studies have shown that models consistently perform better on retrieval and reasoning tasks when the key information appears at the edges of the context window, suggesting that attention does not distribute focus uniformly across long inputs. The phenomenon was formally characterized in a 2023 paper by Liu et al., which systematically tested models across varying context lengths and key-information positions and found a pronounced U-shaped performance curve: accuracy is highest when the answer-bearing passage sits at the very start or very end of the context and lowest when it sits in the middle.
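To make the experimental setup concrete, here is a minimal sketch of a position-sweep evaluation in the spirit of Liu et al. It is illustrative only: `query_model` is a hypothetical stand-in for whatever model API is under test, and the filler text and key fact are invented for the example.

```python
FILLER = "The sky was clear and the town was quiet that day. "
KEY_FACT = "The access code for the vault is 7319. "
QUESTION = "What is the access code for the vault?"

def build_context(total_sentences: int, key_position: float) -> str:
    """Embed KEY_FACT at a relative position (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * total_sentences
    sentences.insert(int(key_position * total_sentences), KEY_FACT)
    return "".join(sentences)

def position_sweep(query_model, total_sentences: int = 200) -> dict[float, bool]:
    """Check whether the model recovers the key fact at each position."""
    results = {}
    for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
        prompt = build_context(total_sentences, pos) + "\n\n" + QUESTION
        results[pos] = "7319" in query_model(prompt)  # crude exact-match scoring
    return results
```

In a sweep like this, the U-shaped curve appears as high accuracy at positions 0.0 and 1.0 with a dip around 0.5.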
The root cause lies in how transformer attention operates over long sequences. While transformers theoretically attend to all positions, in practice the attention weights learned during training tend to favor recency and primacy — patterns reinforced by the structure of training data and the way positional encodings interact with attention scores. As context length grows, the signal from middle tokens becomes increasingly diluted relative to tokens at the boundaries, making it harder for the model to surface and integrate that information during generation.
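The dilution half of this argument can be shown with simple arithmetic. In the toy calculation below (not a real model), one middle token holds a fixed logit advantage over a uniform background; because softmax normalizes over the whole sequence, the attention weight that token receives falls steadily as context length grows. The specific numbers are illustrative assumptions.

```python
import numpy as np

def relevant_token_weight(context_len: int, logit_advantage: float = 4.0) -> float:
    """Softmax weight on one mid-context token scoring above a flat background."""
    scores = np.zeros(context_len)               # background tokens: logit 0
    scores[context_len // 2] = logit_advantage   # one relevant token mid-context
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return float(weights[context_len // 2])

for n in (100, 1_000, 10_000, 100_000):
    print(f"context length {n:>7}: attention weight {relevant_token_weight(n):.4f}")
# weight drops from ~0.36 at 100 tokens to ~0.0005 at 100,000 tokens
```

This toy model ignores the learned primacy and recency biases, which tilt the remaining attention mass toward the boundaries on top of the dilution shown here.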
This limitation has significant practical consequences for applications that depend on processing long documents, such as multi-document question answering, legal and scientific document analysis, retrieval-augmented generation, and long-form summarization. When a system retrieves multiple passages and concatenates them as context, the ordering of those passages can dramatically affect answer quality — a subtle but critical engineering consideration that is easy to overlook.
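One simple way to act on that consideration is to reorder already-ranked passages so the most relevant ones sit at the context boundaries, a "sandwich" layout. The sketch below is a standalone version of this idea (libraries such as LangChain ship a similar transform under the name LongContextReorder); the passage strings are placeholders.

```python
def reorder_for_boundaries(passages: list[str]) -> list[str]:
    """Given passages sorted best-first, push the weakest toward the middle."""
    front, back = [], []
    for i, passage in enumerate(passages):
        # Alternate sides: best passage opens the context, second-best closes
        # it, and the least relevant passages end up in the middle.
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]

ranked = ["P1 (best)", "P2", "P3", "P4", "P5 (worst)"]
print(reorder_for_boundaries(ranked))
# ['P1 (best)', 'P3', 'P5 (worst)', 'P4', 'P2']
```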
Addressing Lost-in-the-Middle has become an active area of research. Proposed mitigations include reranking retrieved passages to place the most relevant content at context boundaries (as in the sketch above), training models with explicit objectives that reward attending to middle positions, and architectural modifications such as sliding-window attention or memory-augmented transformers. As context windows continue to expand, reaching hundreds of thousands of tokens in some models, ensuring uniform and reliable utilization of the full context remains an open and important challenge.
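As one concrete illustration of the architectural direction, the following sketch builds a causal sliding-window attention mask: each query position may attend only to a fixed number of recent key positions, decoupling attention concentration from total context length. The window size here is an arbitrary choice for display; real systems pair such masks with other mechanisms for long-range information flow.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: True where query i may attend to key j (causal, windowed)."""
    i = np.arange(seq_len)[:, None]  # query positions (rows)
    j = np.arange(seq_len)[None, :]  # key positions (columns)
    return (j <= i) & (j > i - window)

print(sliding_window_mask(8, window=3).astype(int))
```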