A probabilistic framework turning recursive latent reasoning into multi-trajectory computation.
GRAM is a framework that recasts recursive reasoning as a stochastic latent trajectory, replacing the deterministic state updates of prior Recursive Reasoning Models (RRMs) with probabilistic sampling at each recursion step. Instead of converging to a single solution, GRAM maintains a distribution over reasoning paths, so multiple hypotheses can be explored simultaneously. It models the conditional p(y|x) by marginalizing over latent trajectories, and it models p(x) to support unconditional generation. It is trained with amortized variational inference, and inference-time compute scales along two axes: recursion depth and the number of trajectories sampled in parallel.
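One way to write out the marginalization and the variational training objective is sketched below. The notation is an assumption made for illustration and is not taken from the source: z_{1:T} denotes the latent states over T recursion steps, z_0 is a fixed initial state, p_\theta is the generative model with a Markov transition, q_\phi is the amortized posterior, and the output is assumed to be decoded from the final state z_T.

```latex
% Conditional likelihood as a marginal over latent trajectories
p_\theta(y \mid x) = \int p_\theta(y \mid z_T, x) \prod_{t=1}^{T} p_\theta(z_t \mid z_{t-1}, x) \, dz_{1:T}

% Standard evidence lower bound optimized by amortized variational inference
\log p_\theta(y \mid x) \ge
  \mathbb{E}_{q_\phi(z_{1:T} \mid x, y)} \big[ \log p_\theta(y \mid z_T, x) \big]
  - \mathrm{KL}\big( q_\phi(z_{1:T} \mid x, y) \,\|\, p_\theta(z_{1:T} \mid x) \big)
```

The second line is the generic evidence lower bound for a latent-variable model; the exact factorization and objective GRAM uses may differ.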
The core mechanism introduces stochasticity into the latent state transition: at each step the model samples from a transition distribution conditioned on the input and the current state, rather than computing a single deterministic update. This produces a distribution over reasoning trajectories, each of which can represent a distinct solution strategy. Multiple trajectories can be sampled in parallel at inference time, effectively trading compute for coverage of the solution space. A latent process reward model (LPRM) provides step-wise guidance to distinguish productive from unproductive reasoning paths.
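The sampling loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not GRAM's actual architecture: the transition is assumed to be a Gaussian centered on a tanh of a linear map, and `step_score` is a hypothetical stand-in for a learned LPRM. All function and parameter names are invented for this example.

```python
import numpy as np


def sample_trajectories(x, w_state, w_input, num_trajectories, num_steps,
                        noise_scale, rng):
    """Roll out several stochastic latent recursions in parallel.

    Assumed transition (illustrative only):
        z_t ~ Normal(tanh(W_state z_{t-1} + W_input x), noise_scale^2 I)
    Returns an array of shape (num_trajectories, num_steps + 1, dim).
    """
    dim = w_state.shape[0]
    z = np.zeros((num_trajectories, dim))  # shared initial state z_0
    states = [z]
    for _ in range(num_steps):
        mean = np.tanh(z @ w_state.T + x @ w_input.T)
        # Sampling here is what replaces the single deterministic update.
        z = mean + noise_scale * rng.standard_normal(mean.shape)
        states.append(z)
    return np.stack(states, axis=1)


def pick_best(trajectories, step_score):
    """Rank trajectories by the sum of their step-wise scores.

    `step_score` is a hypothetical stand-in for a latent process reward
    model: it maps states of shape (K, T + 1, dim) to scores (K, T + 1).
    """
    scores = step_score(trajectories).sum(axis=1)
    return int(np.argmax(scores)), scores


# Toy usage with random weights; every value below is arbitrary.
rng = np.random.default_rng(0)
dim, input_dim = 4, 3
w_state = 0.5 * rng.standard_normal((dim, dim))
w_input = 0.5 * rng.standard_normal((dim, input_dim))
x = rng.standard_normal(input_dim)

trajs = sample_trajectories(x, w_state, w_input, num_trajectories=8,
                            num_steps=5, noise_scale=0.1, rng=rng)
target = np.zeros(dim)  # arbitrary reference point for the toy score
best, scores = pick_best(
    trajs, lambda z: -np.linalg.norm(z - target, axis=-1))
```

Setting `noise_scale=0.0` collapses every trajectory onto the same deterministic recursion, which is one way to see the deterministic update as a special case of the stochastic one.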
GRAM improves over deterministic RRMs (HRM, TRM) on structured reasoning and constraint satisfaction tasks where multiple valid solutions exist. It also demonstrates unconditional generation, producing valid puzzle instances without any input, a capability that prior RRMs lack. The approach trades some per-trajectory efficiency for broader solution coverage: each individual trajectory is stochastic and may need more steps to converge, but parallel sampling finds alternative solutions that deterministic models miss entirely.
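The coverage side of this trade-off can be made concrete with a simple worked calculation. It rests on a simplifying assumption that is not from the source: each sampled trajectory reaches a valid solution independently with the same probability p. The symbols p and K are introduced here only for illustration.

```latex
% Probability that at least one of K independent trajectories succeeds
P(\text{at least one success}) = 1 - (1 - p)^K

% Example with p = 0.2 and K = 8
1 - (0.8)^8 = 1 - 0.16777216 \approx 0.83
```

Under that assumption, a per-trajectory success rate of 20% becomes roughly 83% coverage with eight parallel samples. Real trajectories share the same input and model, so their outcomes are likely correlated and the actual gain may be smaller.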
Whether stochastic recursion consistently outperforms carefully designed deterministic recursion on tasks with a single solution remains unclear. Unconditional generation of valid structured problem instances from scratch is underexplored and may require architectural innovations beyond what GRAM currently demonstrates. Scaling via parallel trajectories introduces memory and compute costs that grow linearly with the trajectory count, which may limit practical deployment on memory-constrained devices.