Envisioning is an emerging technology research institute and advisory.

Context Compaction

Compressing or summarizing context to fit within a model's limited context window.

Year: 2023
Generality: 339

Context compaction is a technique used in large language model (LLM) systems to reduce the size of an accumulated conversation or document context so that it fits within the model's fixed context window while preserving as much relevant information as possible. Transformer-based models can attend to only a finite number of tokens at once, from a few thousand to a million or more depending on the model, so long-running interactions, agentic workflows, and large document processing tasks risk exceeding this limit. Context compaction addresses the constraint by condensing prior content before it would otherwise be truncated or lost.

The most common approach uses the model itself to summarize earlier portions of the conversation or task history, then replaces those raw tokens with the summary. Summarization can be triggered automatically when the context approaches a threshold, and it becomes recursive when earlier summaries are themselves folded into later ones. It can also be applied in structured layers: for example, summarizing individual tool call results before aggregating them into a higher-level task summary. Other strategies include selective retention, where only the most semantically relevant or recently accessed information is kept verbatim, and retrieval-augmented approaches that offload older context to an external memory store and retrieve it on demand.
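A minimal sketch of threshold-triggered summarization, under stated assumptions: `count_tokens` is a crude word counter standing in for a real tokenizer, and `toy_summarize` is a hypothetical placeholder for the model call that would write the summary. None of these names come from a specific library.

```python
from typing import Callable, List

def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def compact(messages: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace older messages with a single summary once the budget is exceeded."""
    total = sum(count_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # under the threshold, or nothing old enough to fold
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # The newest messages stay verbatim; everything older becomes one summary.
    return ["[summary] " + summarize(older)] + recent

def toy_summarize(chunk: List[str]) -> str:
    # Hypothetical placeholder for a model call: keeps the first three words
    # of each message so the example runs without any external service.
    return " / ".join(" ".join(m.split()[:3]) for m in chunk)

history = [
    "user asked for a report on quarterly sales figures",
    "assistant fetched the sales table from the warehouse",
    "assistant noticed two missing rows in March",
    "user asked to ignore March for now",
]
compacted = compact(history, budget=25, keep_recent=2, summarize=toy_summarize)
```

Calling `compact` again on a later, longer history would fold the existing `[summary]` line into a new summary, which is the recursive case. In a real system the budget would be set some margin below the model's actual window to leave room for the next response.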

Context compaction is particularly critical in agentic and multi-step reasoning systems, where an AI agent may execute dozens or hundreds of tool calls, accumulate observations, and maintain a running plan over an extended session. Without compaction, these systems either fail when the context limit is hit or must discard valuable history arbitrarily. Effective compaction strategies allow agents to operate over much longer horizons, maintain coherent task state, and avoid repeating work already completed. The quality of compaction directly affects downstream performance: overly aggressive compression can cause the model to lose critical details, while insufficient compression leaves the context too large to fit the window.
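One way to picture compaction in an agent loop is a state object that keeps the plan verbatim, keeps short notes on completed steps, and holds only the newest raw tool output. The sketch below is illustrative: `AgentState`, `note_for`, and `compact_state` are hypothetical names, and truncation stands in for a model-written summary of each tool result.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AgentState:
    plan: str                                                  # kept verbatim
    completed: List[str] = field(default_factory=list)         # short notes on finished steps
    raw: List[Tuple[str, str]] = field(default_factory=list)   # (tool call, raw output)

def note_for(call: str, output: str, max_chars: int) -> str:
    """Shrink one tool result to a short note (truncation as a placeholder)."""
    if len(output) > max_chars:
        output = output[:max_chars].rstrip() + "..."
    return f"{call}: {output}"

def compact_state(state: AgentState, keep_raw: int = 1, max_chars: int = 40) -> AgentState:
    """Fold all but the newest `keep_raw` raw results into completed-step notes."""
    split = max(len(state.raw) - keep_raw, 0)
    notes = [note_for(call, out, max_chars) for call, out in state.raw[:split]]
    return AgentState(state.plan, state.completed + notes, state.raw[split:])

state = AgentState(
    plan="find and fix the failing test",
    raw=[
        ("list_files", "src/app.py src/util.py tests/test_app.py tests/test_util.py README.md"),
        ("run_tests", "1 failed: test_parse_date"),
        ("read_file", "def parse_date(s): return s.split('-')"),
    ],
)
smaller = compact_state(state)
```

Keeping a note per completed step is what lets the agent avoid repeating work: the record that `run_tests` already ran survives even though its full output may not. How short each note can be before critical details are lost is the trade-off described above.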

As context windows have grown larger, the need for compaction has shifted rather than disappeared. In standard self-attention, compute grows quadratically with sequence length, so efficient context management is an economic and latency concern even when hard limits are not reached. Context compaction thus remains an active area of research and engineering, intersecting with topics like memory-augmented neural networks, retrieval-augmented generation, and efficient attention mechanisms.
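The quadratic relationship can be made concrete with a rough calculation. This sketch counts only the query-key score entries of full self-attention and ignores causal masking, feed-forward layers, caching, and optimized kernels; the token counts are illustrative, not measurements.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key score entries in full self-attention over n tokens (n squared)."""
    return n_tokens * n_tokens

full = attention_pairs(100_000)      # uncompacted context
compacted = attention_pairs(25_000)  # same session compacted to a quarter of the length
ratio = full / compacted             # 16.0
```

Under these assumptions, cutting the context to a quarter of its length cuts the attention score computation by a factor of sixteen, which is why compaction can pay off well before the window is full.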

Related

Context Anxiety

The degraded performance of language models as inputs approach their maximum context length.

Generality: 94

Context Window

The span of text a model can see and process at one time.

Generality: 731

Long-Context Modeling

Architectures and techniques enabling AI models to process and reason over very long sequences.

Generality: 694

Infinite Context Window

A model architecture that can attend to all preceding tokens without fixed length limits.

Generality: 398

Context Rot

Gradual degradation of an AI system's context, producing stale or contradictory outputs over time.

Generality: 107

Model Compression

Techniques that shrink machine learning models while preserving predictive accuracy.

Generality: 795