
Envisioning is an emerging technology research institute and advisory.


Attention Budget

Each token has a fixed capacity to influence others, so every relationship's share of that capacity shrinks as context grows.


An attention budget is the finite amount of influence each token in a context window can distribute across all other tokens. When a model processes a token, it allocates some portion of its representational capacity to attend to related tokens — prior context, instructions, referenced files, or conversation history. That allocation is fixed per token, independent of how long the overall context grows.

The mechanism stems from how transformer attention works: every token attends to every other token in the context window, but each token's attention weights are softmax-normalized to sum to one, so the total influence it can distribute is bounded. Heavy attention drawn by one relationship — say, a long thread of tool results — leaves proportionally less weight for other relationships, such as a schema reference pasted at the top of the prompt.
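A minimal NumPy sketch makes the fixed budget concrete. It is a toy single-head, single-query setup (the function name and random vectors are illustrative, not any library's API): however many keys compete for attention, the softmax weights always sum to one, so the average share per key can only shrink as the context lengthens.

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query token.

    Softmax normalizes the scores so they sum to 1: the query's total
    attention budget is fixed no matter how many keys compete for it.
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # one similarity score per key
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
query = rng.normal(size=64)
for n_keys in (8, 128, 2048):
    w = attention_weights(query, rng.normal(size=(n_keys, 64)))
    # The budget never grows: w.sum() is always 1.0, so the mean share
    # per key is exactly 1/n_keys and shrinks as context lengthens.
    print(n_keys, w.sum(), w.mean())
```

The same normalization holds per attention head and per layer in a real transformer; this sketch just isolates the zero-sum property that makes the budget framing apt.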

This creates a practical tradeoff as sessions grow. Early in a conversation, the model's attention is sharply focused on the most recent instructions. Later, older tokens compete for that same fixed budget, and the effective signal-to-noise ratio drops. This is why long agent sessions often exhibit degraded performance on tasks that require recalling early context.
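The dilution effect can be simulated directly. In this hypothetical setup, one early "instruction" key is constructed to be well aligned with the query, then unrelated filler tokens are appended; the instruction's share of the query's attention falls as the filler accumulates, even though its own relevance never changes.

```python
import numpy as np

d = 64
rng = np.random.default_rng(1)
query = rng.normal(size=d)                      # stands in for the current token
instruction = query + 0.1 * rng.normal(size=d)  # hypothetical early, highly relevant token

def instruction_share(n_filler):
    """Attention share the query gives the instruction token when
    n_filler unrelated tokens are appended to the context."""
    keys = np.vstack([instruction, rng.normal(size=(n_filler, d))])
    scores = keys @ query / np.sqrt(d)
    w = np.exp(scores - scores.max())  # stable softmax
    w /= w.sum()
    return w[0]

for n in (16, 256, 4096):
    print(n, instruction_share(n))  # share falls as filler accumulates
```

This is only a caricature of the degradation in long sessions (real models have many heads, layers, and positional effects), but it shows the mechanism: a fixed budget split across more competitors leaves less for any one of them.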

Open questions include whether sparse attention mechanisms can selectively budget attention toward higher-value relationships, and whether session-level memory systems can offload low-value tokens to preserve per-token budget efficiency. The field has no established standard for measuring or communicating attention budget constraints to users.