Reasoning approach exploring multiple parallel latent trajectories to cover diverse solutions.
Multi-trajectory reasoning is a computational strategy for problems with multiple valid solutions, where a single reasoning system explores several latent trajectories in parallel rather than committing to one path. Each trajectory represents an independent hypothesis about how to reach a valid solution, and trajectories can differ in their intermediate states, strategies, and final answers. The approach is particularly relevant for constraint satisfaction problems where multiple configurations satisfy all constraints.
The mechanism maintains multiple independent reasoning states simultaneously and samples transitions stochastically for each. At each recursion step, every trajectory updates its own latent state conditioned on the input and its current state. Trajectories may converge to different solutions (as in multi-solution tasks), or diverge to explore different strategies before potentially reconverging on the same one. Parallel inference naturally exposes solution diversity: if 3 of 10 trajectories reach distinct valid solutions, the system finds 3 solutions rather than 1.
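The update loop described above can be sketched on a toy constraint satisfaction problem. This is a minimal illustration under stated assumptions, not the method of any particular model: the "latent state" is a tuple of queen positions on an N-queens board, the stochastic transition is a randomized min-conflicts move, and the names `conflicts`, `step`, and `run_trajectories` are hypothetical.

```python
import random

def conflicts(state):
    """Count attacking queen pairs; state[c] is the row of the queen in column c."""
    n = len(state)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if state[a] == state[b] or abs(state[a] - state[b]) == b - a
    )

def step(state, rng):
    """One stochastic transition: move a random queen to a least-conflict row."""
    n = len(state)
    col = rng.randrange(n)
    scored = []
    for row in range(n):
        candidate = state[:col] + (row,) + state[col + 1:]
        scored.append((conflicts(candidate), row))
    best = min(score for score, _ in scored)
    # Ties are broken at random, so trajectories from similar states can diverge.
    row = rng.choice([r for score, r in scored if score == best])
    return state[:col] + (row,) + state[col + 1:]

def run_trajectories(n=6, num_trajectories=10, depth=200, seed=0):
    """Advance every trajectory independently for `depth` recursion steps."""
    rng = random.Random(seed)
    states = [
        tuple(rng.randrange(n) for _ in range(n)) for _ in range(num_trajectories)
    ]
    for _ in range(depth):
        # Each trajectory updates only its own state; solved states are kept as-is.
        states = [s if conflicts(s) == 0 else step(s, rng) for s in states]
    return states

states = run_trajectories()
# Deduplicate: several trajectories may land on the same valid configuration.
solutions = {s for s in states if conflicts(s) == 0}
print(f"{len(solutions)} distinct solutions from {len(states)} trajectories")
```

The set comprehension at the end matters for counting: the number of distinct solutions can be smaller than the number of trajectories that succeeded.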
Multi-trajectory reasoning depends on stochastic recursion. Deterministic RRMs collapse all trajectories to the same attractor state, so running the model 100 times with different random seeds produces identical solutions. Stochastic RRMs like GRAM produce genuinely different trajectories on each run, and sampling many trajectories in parallel trades compute for coverage. This is especially valuable for tasks where the solution space has multiple global optima or where finding any valid solution requires exploring different constraint propagation orders.
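The contrast between deterministic and stochastic recursion can be illustrated with a toy landscape that has two global optima. The sketch below is an assumption-laden stand-in, not an RRM or GRAM: the state is a single number, the update is a gradient step on the double-well energy (x^2 - 1)^2, and the stochastic variant adds annealed Gaussian noise. The function names and all constants are hypothetical.

```python
import random

def grad(x):
    """Gradient of the double-well energy (x^2 - 1)^2, minimized at x = -1 and x = +1."""
    return 4.0 * x * (x * x - 1.0)

def recurse(x0, depth, noise, seed):
    """Iterate the update `depth` times; `noise` = 0.0 makes it fully deterministic."""
    rng = random.Random(seed)
    x = x0
    for t in range(depth):
        scale = noise * (1.0 - t / depth)  # anneal the noise toward zero
        x = x - 0.05 * grad(x) + scale * rng.gauss(0.0, 1.0)
    return x

def attractor(x):
    """Label the basin the final state ended in."""
    return 1 if x > 0 else -1

seeds = range(100)
# With zero noise the seed never influences the update, so every run ends in
# the same basin no matter how many seeds are tried.
deterministic = {attractor(recurse(0.05, 400, 0.0, s)) for s in seeds}
# With noise, different seeds can settle in different basins.
stochastic = {attractor(recurse(0.05, 400, 0.3, s)) for s in seeds}
print("deterministic basins:", deterministic)
print("stochastic basins:", stochastic)
```

The deterministic set always contains a single basin because the random generator is created but never affects the state; only the stochastic variant can cover both optima.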
The primary tradeoff is computational: each additional trajectory requires memory proportional to the latent state size and compute proportional to the recursion depth. On memory-constrained hardware, trajectory count may be limited. A subtler issue is trajectory quality: stochastic sampling may produce some trajectories that take unproductive paths and fail to converge, requiring more samples to achieve a target solution rate. How to allocate compute between deeper recursion and more trajectories remains a task-dependent design decision with no established optimal policy.
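The cost and sample-count reasoning above can be made concrete with a small calculator. It is a rough sketch that assumes each trajectory succeeds independently with the same probability `p_success`, so that the chance of at least one success among K trajectories is 1 - (1 - p)^K. The helper names and the 4-bytes-per-value default are illustrative assumptions, and the sketch does not address how success probability changes with recursion depth.

```python
import math

def trajectories_needed(p_success, target):
    """Smallest K with 1 - (1 - p_success)**K >= target, assuming independence."""
    if not 0.0 < p_success < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_success and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))

def cost(num_trajectories, depth, latent_size, bytes_per_value=4):
    """Return (memory in bytes, total state updates) for a parallel run."""
    memory_bytes = num_trajectories * latent_size * bytes_per_value
    update_steps = num_trajectories * depth
    return memory_bytes, update_steps

# If roughly 3 in 10 trajectories succeed, how many are needed for a 95%
# chance that at least one reaches a valid solution?
k = trajectories_needed(0.3, 0.95)
memory_bytes, update_steps = cost(k, depth=16, latent_size=512)
print(k, memory_bytes, update_steps)
```

Memory grows with the trajectory count and latent size, while total compute grows with the trajectory count times the depth, which is why the two cannot be tuned independently under a fixed budget.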