Amortized Variational Inference

Amortized variational inference is a training technique that uses a learned inference network to approximate the posterior distribution over latent variables, avoiding the need for expensive per-example optimization. In the context of recursive reasoning models like GRAM, it enables training by providing a tractable approximation to the true posterior over reasoning trajectories. The inference network takes observations and produces parameters of an approximate posterior distribution, which is then used to compute evidence lower bound (ELBO) gradients for training the generative model.

The term amortized refers to the key efficiency gain: instead of running an iterative optimization for each training example to find the optimal latent variables, the inference network learns to predict good latent variables from inputs directly. Once trained, inference is a single forward pass — amortizing the cost of inference across all examples. For GRAM, this means the latent reasoning trajectory is produced by a learned inference network rather than optimized from scratch at test time.

In GRAM specifically, amortized variational inference enables training by providing a differentiable lower bound objective that replaces per-trajectory optimization with a forward pass through the inference network. The variational posterior approximates the distribution over reasoning trajectories given inputs and outputs, while the generative model defines the prior over latent trajectories and the likelihood of outputs given trajectories. The ELBO objective balances reconstruction quality with proximity of the posterior to the prior, encouraging diverse and valid reasoning paths.

The main limitation is the amortization gap: the inference network is constrained to produce distributions from a parametric family, which may not perfectly match the true posterior for any given example. This approximation error can cause underestimation of uncertainty and mode collapse in some distributions. More sophisticated inference networks can reduce this gap but increase training complexity. Whether the amortization gap meaningfully degrades performance on reasoning tasks with sparse latent structure is not yet well understood.

Amortized Variational Inference

Research this in Signals

Amortized Variational Inference

Research this in Signals