An upper bound on surprise (negative log model evidence) that turns intractable posterior inference into a tractable optimization problem.
Variational free energy is a mathematical quantity that serves as a tractable upper bound on the negative log-evidence (or "surprise") of a probabilistic model. Rather than computing the true posterior distribution directly—which is often computationally intractable for complex models—variational inference minimizes the free energy with respect to an approximate distribution drawn from a simpler, parameterized family. Because the free energy exceeds the surprise by exactly the Kullback-Leibler divergence from the approximate posterior to the true posterior, minimizing one is equivalent to minimizing the other, making variational free energy the central objective in a wide class of Bayesian learning algorithms.
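The bound can be verified directly on a toy model small enough that the exact evidence is computable. The sketch below (a minimal illustration; the two-state latent variable and all probability values are invented for the example, not taken from the text) evaluates the free energy F(q) = E_q[log q(z) − log p(x, z)] and checks that it upper-bounds the surprise −log p(x), with equality exactly when q is the true posterior:

```python
import numpy as np

# Toy discrete model: one binary latent z, one fixed observation x.
# The joint values below are illustrative assumptions for the demo.
p_joint = np.array([0.3, 0.1])      # p(x, z=0), p(x, z=1)
evidence = p_joint.sum()            # p(x), tractable here by summing over z
true_post = p_joint / evidence      # exact posterior p(z | x)

def free_energy(q):
    """Variational free energy F(q) = E_q[log q(z) - log p(x, z)]."""
    return np.sum(q * (np.log(q) - np.log(p_joint)))

# F(q) >= -log p(x) for ANY approximate distribution q ...
q_approx = np.array([0.5, 0.5])
assert free_energy(q_approx) >= -np.log(evidence)

# ... and the gap is KL(q || p(z|x)), which vanishes at the true posterior.
np.testing.assert_allclose(free_energy(true_post), -np.log(evidence))
```

Minimizing F over q therefore drives the approximation toward the true posterior without ever requiring p(x) itself.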
The mechanics of variational free energy decompose naturally into two competing terms: an accuracy term that rewards the model for explaining observed data well, and a complexity term (related to the KL divergence from the prior) that penalizes overly complex posterior beliefs. This tradeoff mirrors the bias-variance tradeoff in classical statistics and provides a principled way to balance model fit against generalization. In practice, optimizing this objective with respect to neural network parameters yields algorithms like the variational autoencoder (VAE), where the encoder learns an approximate posterior and the decoder reconstructs data from sampled latent variables.
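The accuracy/complexity split described above follows algebraically from the definition of the free energy. The sketch below (an illustrative decomposition on a hypothetical two-state model; the prior, likelihood, and q values are made up for the example) computes the two terms separately and confirms they sum to the direct definition F(q) = E_q[log q(z) − log p(x, z)]:

```python
import numpy as np

# F(q) = -E_q[log p(x|z)]  +  KL(q(z) || p(z))
#        \-- accuracy --/     \-- complexity --/
# Illustrative discrete model (all numbers are hypothetical):
prior      = np.array([0.6, 0.4])    # p(z)
likelihood = np.array([0.5, 0.25])   # p(x | z) for the observed x
q          = np.array([0.7, 0.3])    # approximate posterior q(z)

accuracy   = np.sum(q * np.log(likelihood))   # E_q[log p(x|z)], rewards fit
complexity = np.sum(q * np.log(q / prior))    # KL(q || prior), penalizes drift
F = complexity - accuracy

# The direct definition F = E_q[log q(z) - log p(x, z)] agrees exactly,
# since p(x, z) = p(z) p(x|z):
joint = prior * likelihood
F_direct = np.sum(q * (np.log(q) - np.log(joint)))
np.testing.assert_allclose(F, F_direct)
```

In a VAE the same two terms appear as the reconstruction loss and the KL penalty on the encoder's latent distribution.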
Variational free energy gained particular prominence in machine learning following the introduction of scalable stochastic variational inference methods and the reparameterization trick in the early 2010s, which made it feasible to optimize variational objectives end-to-end using gradient descent on large datasets. These advances transformed variational inference from a niche Bayesian technique into a cornerstone of modern deep generative modeling, enabling applications in image synthesis, representation learning, and semi-supervised classification.
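The reparameterization trick mentioned above can be sketched in a few lines: rewriting a Gaussian sample as a deterministic function of its parameters plus independent noise lets Monte Carlo gradients flow through the sampling step. The toy objective below (E[z²], chosen because its gradient with respect to the mean is known in closed form; the parameter values are arbitrary) is an assumption made for illustration, not a VAE loss:

```python
import numpy as np

# Reparameterization: z ~ N(mu, sigma^2) rewritten as
#   z = mu + sigma * eps,  eps ~ N(0, 1),
# so z is a differentiable function of (mu, sigma).
rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
eps = rng.standard_normal(200_000)
z = mu + sigma * eps

# Toy objective E[z^2] = mu^2 + sigma^2; its exact gradient w.r.t. mu
# is 2*mu. The pathwise (reparameterized) estimator averages
# d(z^2)/dmu = 2*z over samples:
grad_mu_mc = np.mean(2 * z)
assert abs(grad_mu_mc - 2 * mu) < 0.02   # matches the analytic gradient
```

The same pattern, applied to the variational objective instead of E[z²], is what allows a VAE's encoder parameters to be trained by ordinary stochastic gradient descent.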
Beyond deep learning, variational free energy has been adopted as a unifying principle in computational neuroscience and cognitive science, where Karl Friston's "free energy principle" proposes that biological brains minimize variational free energy as a general account of perception, action, and learning. Whether in artificial or biological systems, the framework's power lies in recasting intractable probabilistic inference as a tractable optimization problem, connecting information theory, statistical physics, and machine learning under a single mathematical umbrella.