Energy-based models

A modeling framework that assigns a scalar energy to each configuration of variables, defining preferences and inference via energy minimization and unnormalized probabilities via exp(−E), rather than through an explicit normalized likelihood.

Energy-based models (EBMs) formalize learning and inference by specifying an energy function E(x, y; θ) (or E(x; θ) in unsupervised settings) that scores how compatible a configuration (data x and possibly a label or latent y) is under parameters θ; low energy corresponds to high preference. Probabilities are defined only implicitly, as p(x) ∝ exp(−E(x; θ)), which exposes the partition function Z(θ) = ∫ exp(−E(x; θ)) dx. Z(θ) is typically intractable, and that intractability drives much of the methodological development: contrastive objectives, score matching, noise-contrastive estimation, and MCMC or Langevin-based sampling.

In machine learning (ML) and artificial intelligence (AI) applications, EBMs are attractive because they can represent highly multimodal, structured, and constraint-rich distributions, serve as discriminators or priors, and compose naturally (energies add) in modular systems.

Practical success depends on approximations for training and sampling (e.g., persistent contrastive divergence, stochastic gradient Langevin dynamics (SGLD), score-based estimators), architectural choices that keep gradients stable, and contrastive or margin-based losses that sidestep explicit normalization. These trade-offs explain both the long-standing theoretical interest in EBMs and the recent empirical revivals in deep generative modeling and structured prediction.

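To make the training recipe above concrete, the sketch below (assuming PyTorch; not code from any of the cited works) shows a toy deep EBM on synthetic 2-D data: a small neural network defines E(x; θ), short-run Langevin dynamics draws approximate model samples, and a contrastive-divergence-style loss lowers the energy of data while raising the energy of model samples, so the intractable Z(θ) never has to be evaluated.

```python
# Minimal EBM training sketch (illustrative only, assuming PyTorch).
# E(x; theta) is a small MLP; negatives come from short-run Langevin dynamics;
# the loss is the contrastive-divergence-style surrogate E(data) - E(negatives).
import math
import torch
import torch.nn as nn


class EnergyNet(nn.Module):
    """Scalar energy E(x; theta); lower energy = more compatible configuration."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # one energy value per sample


def langevin_sample(energy, x, steps=30, step_size=1e-2, noise_scale=1e-2):
    """Short-run Langevin dynamics: x <- x - (step_size/2) * grad_x E(x) + noise.
    (Step size and noise scale are decoupled here, a common practical shortcut.)"""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        grad, = torch.autograd.grad(energy(x).sum(), x)
        x = x - 0.5 * step_size * grad + noise_scale * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()


def contrastive_loss(energy, x_data, x_model):
    """Push energy down on data and up on model ('negative') samples;
    this surrogate approximates the maximum-likelihood gradient without Z(theta)."""
    return energy(x_data).mean() - energy(x_model).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    energy = EnergyNet()
    opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
    for step in range(200):
        # Toy data: a noisy ring of radius 2 (hypothetical stand-in for real data).
        angles = 2 * math.pi * torch.rand(256)
        x_data = torch.stack([2 * torch.cos(angles), 2 * torch.sin(angles)], dim=1)
        x_data = x_data + 0.1 * torch.randn_like(x_data)

        # Negative samples: start from noise, refine with Langevin dynamics.
        x_model = langevin_sample(energy, torch.randn(256, 2))

        loss = contrastive_loss(energy, x_data, x_model)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In practice, deep EBM training typically adds stabilizers not shown here, such as a replay buffer of past negatives (persistent contrastive divergence), gradient clipping inside the Langevin loop, and small regularization penalties on the energy values.
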
The first conceptual incarnations appeared in the 1980s: Hopfield networks (1982) and the Boltzmann machine formalized shortly after (Ackley, Hinton & Sejnowski, 1985). The explicit “energy-based model” framing was consolidated in the mid-2000s, notably in LeCun et al.’s tutorial and related work, with a major modern resurgence around 2018–2022 driven by advances in score matching, diffusion/score-based generative models, and scalable MCMC samplers.

Key contributors include John Hopfield (associative memory and energy formulations); David Ackley, Geoffrey Hinton, and Terry Sejnowski (Boltzmann machines); Geoffrey Hinton (contrastive divergence); Yann LeCun and collaborators (the formal EBM tutorial and promotion of the EBM framework); Aapo Hyvärinen (score matching); Michael Gutmann (noise-contrastive estimation); Jascha Sohl-Dickstein and contemporaries (diffusion approaches); and more recent contributors such as Yang Song and Stefano Ermon (score-based generative modeling), along with groups at FAIR, DeepMind, and other major research labs that advanced deep EBM training and applications.
