Envisioning is an emerging technology research institute and advisory.

Inference-Time Scaling

Increasing compute at inference via parallel trajectories or recursion depth rather than model size.

Year: 2025 · Generality: 700

Inference-time scaling refers to improving model performance by allocating additional compute during inference, rather than by training larger models or training on more data. In the context of reasoning systems, this typically means increasing the number of computational steps (recursion depth) or the number of parallel reasoning paths (trajectory sampling) at test time. The goal is to extract more capable reasoning from a fixed model by using it more intensively.

For autoregressive language models, inference-time scaling has been explored via chain-of-thought prompting, self-consistency voting, and test-time compute allocation strategies. For recursive reasoning models (RRMs), inference-time scaling operates differently: depth scaling increases the number of times the shared transition function is applied, while trajectory scaling increases the number of independently sampled latent paths explored in parallel. Both trade extra inference cost for reasoning quality, but trajectory scaling provides solution diversity that depth scaling alone cannot.
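
As an illustration of self-consistency voting, the following is a minimal sketch rather than any particular system's implementation. The function `sample_answer` is a hypothetical stand-in for sampling one chain-of-thought from a model and extracting its final answer; the answer strings and their probabilities are invented for the example.

```python
import random
from collections import Counter


def sample_answer(rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled reasoning chain's final answer.

    Assumption: the correct answer "42" is the single most likely outcome,
    but any individual sample is wrong half of the time.
    """
    return rng.choices(["42", "41", "24"], weights=[0.5, 0.3, 0.2])[0]


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def self_consistency(num_samples: int, seed: int = 0) -> str:
    """Spend more inference compute by sampling several answers and voting."""
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(num_samples)]
    return majority_vote(answers)


if __name__ == "__main__":
    for n in (1, 5, 25, 125):
        print(n, self_consistency(n))
```

The model is fixed throughout; only `num_samples` changes, which is the sense in which the extra compute is spent at inference time.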

GRAM demonstrates that inference-time scaling for recursive reasoning has two independent axes that can be combined: recursion depth (more steps per trajectory) and trajectory count (more parallel trajectories). Increasing either axis improves performance, but with different tradeoffs. Depth scaling is more memory-efficient but hits diminishing returns once the model begins looping or has converged. Trajectory scaling provides orthogonal gains, since more trajectories mean more chances to find alternative solutions, but memory grows linearly with trajectory count.
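
The two axes can be illustrated with a toy sketch. This is not GRAM's architecture: the scalar latent state, the `transition` update (a gradient step on an invented two-basin objective), and the best-of-K selection by `score` are all assumptions chosen only to show how depth and trajectory count behave differently.

```python
import random


def score(z: float) -> float:
    """Toy objective with two basins (near z = -1 and z = +1); lower is better."""
    return (z * z - 1.0) ** 2 + 0.3 * z


def transition(z: float, lr: float = 0.05) -> float:
    """Shared update applied at every step of every trajectory (toy stand-in)."""
    grad = 4.0 * z * (z * z - 1.0) + 0.3  # derivative of score()
    return z - lr * grad


def run(depth: int, trajectories: int, seed: int = 0) -> float:
    """Return the best score found with the given depth and trajectory count.

    Trajectories are looped over here for simplicity; a real system would
    typically batch them and run them in parallel.
    """
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(trajectories):
        z = rng.uniform(-1.5, 1.5)  # independently sampled starting state
        for _ in range(depth):      # depth axis: repeated shared transition
            z = transition(z)
        best = min(best, score(z))  # trajectory axis: keep the best path
    return best


if __name__ == "__main__":
    for depth in (5, 50, 200):
        for k in (1, 4, 16):
            print(f"depth={depth:3d} trajectories={k:2d} best={run(depth, k):.4f}")
```

In this toy, extra depth stops helping once each trajectory has settled into its basin, while extra trajectories raise the chance that at least one start lands in the better basin.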

The key insight enabling inference-time scaling for RRMs is that the transition function is shared across all recursion steps and all trajectories. This means adding compute does not require adding parameters. However, inference-time scaling is not free: deeper recursion increases latency in proportion to the number of steps, and parallel trajectories increase memory usage in proportion to the number of trajectories. Whether inference-time compute can substitute for model scale remains an open empirical question with implications for deploying capable reasoning systems on resource-constrained hardware.
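
The cost relationships described above can be written down as a back-of-the-envelope model. The sketch below is an assumption-laden illustration: `InferenceCost` and `estimate_cost` are hypothetical names, trajectories are assumed to run fully in parallel, and each trajectory is assumed to keep only its current latent state in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceCost:
    sequential_steps: int         # proxy for latency
    latent_states_in_memory: int  # proxy for memory
    transition_calls: int         # proxy for total compute
    parameter_count: int          # unchanged by either axis


def estimate_cost(depth: int, trajectories: int, parameter_count: int) -> InferenceCost:
    """Rough cost model for a shared transition function (assumptions above)."""
    return InferenceCost(
        sequential_steps=depth,                # latency grows with depth
        latent_states_in_memory=trajectories,  # memory grows with trajectory count
        transition_calls=depth * trajectories, # compute grows with both
        parameter_count=parameter_count,       # weights are shared, so constant
    )


if __name__ == "__main__":
    print(estimate_cost(depth=8, trajectories=4, parameter_count=1_000_000))
```

Doubling `depth` doubles the latency proxy and doubling `trajectories` doubles the memory proxy, while `parameter_count` stays fixed in both cases.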