A stochastic recurrent network that learns probability distributions over binary variables.
A Boltzmann Machine is a type of stochastic recurrent neural network composed of symmetrically connected binary units that can be either "on" or "off." The network is defined by an energy function over its unit states, and the probability of any configuration is determined by a Boltzmann distribution — hence the name. Learning proceeds by adjusting connection weights so that low-energy states correspond to patterns present in the training data, effectively teaching the network to model the underlying data distribution. This makes Boltzmann Machines a foundational example of energy-based generative models in machine learning.
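To make the definition concrete, here is a minimal sketch, assuming NumPy, of the energy function and the resulting Boltzmann distribution for a small network. The names `energy` and `boltzmann_probabilities`, the symmetric weight matrix `W`, and the bias vector `b` are illustrative rather than drawn from any particular library, and exhaustively enumerating all 2^n states is only feasible at toy scale:

```python
import numpy as np

def energy(s, W, b):
    """Energy of binary state s: E(s) = -1/2 s^T W s - b^T s.
    W is symmetric with a zero diagonal; the 1/2 avoids double-counting pairs."""
    return -0.5 * s @ W @ s - b @ s

def boltzmann_probabilities(W, b):
    """Exact Boltzmann distribution P(s) = exp(-E(s)) / Z over all 2^n states.
    Illustrative only: Z is intractable for networks of realistic size."""
    n = len(b)
    states = np.array([[(i >> k) & 1 for k in range(n)] for i in range(2 ** n)],
                      dtype=float)
    energies = np.array([energy(s, W, b) for s in states])
    unnormalized = np.exp(-energies)
    return states, unnormalized / unnormalized.sum()
```

The normalizer Z is what forces real networks to rely on sampling: lowering a state's energy raises its probability, but Z sums over exponentially many configurations.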
Training a Boltzmann Machine requires estimating two sets of statistics: the average pairwise unit correlations when the network is clamped to training data (the positive phase) and the same correlations when the network runs freely (the negative phase). Each weight update is proportional to the difference between the two. In practice, computing the negative phase requires running a Markov chain Monte Carlo process, typically Gibbs sampling, until the network reaches thermal equilibrium, a procedure that is computationally expensive and scales poorly to large networks. This limitation motivated the Restricted Boltzmann Machine (RBM), which allows connections only between a visible layer and a hidden layer. Because the hidden units are then conditionally independent given the visible units (and vice versa), each conditional distribution can be sampled exactly in a single pass, enabling the much cheaper Contrastive Divergence training algorithm.
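As a sketch of why the RBM's bipartite structure helps, the following code implements one step of Contrastive Divergence (CD-1), again assuming NumPy. Here `W` is a visible-to-hidden weight matrix and `b_v`, `b_h` are bias vectors (all names are illustrative); the key approximation is that the negative phase is truncated to a single Gibbs step instead of being run to equilibrium:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1, rng=None):
    """One CD-1 update for an RBM, given a single binary training vector v0."""
    if rng is None:
        rng = np.random.default_rng()

    # Positive phase: hidden activation probabilities with visibles clamped to data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One Gibbs step (the truncated negative phase): reconstruct the visibles,
    # then recompute the hidden probabilities from the reconstruction.
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)

    # Update: positive-phase statistics minus negative-phase statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_v += lr * (v0 - v1)
    b_h += lr * (ph0 - ph1)
    return W, b_v, b_h
```

Because the hidden units are conditionally independent given the visibles, each `sigmoid(...)` line computes an entire layer's conditional distribution in one matrix product, which is exactly the tractability the bipartite restriction buys.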
Boltzmann Machines matter because they established a principled probabilistic framework for unsupervised representation learning. RBMs derived from this architecture became building blocks for Deep Belief Networks (DBNs), which in the mid-2000s demonstrated that deep generative models could be trained effectively — a breakthrough that helped reignite interest in deep learning. Beyond historical significance, the energy-based modeling perspective pioneered by Boltzmann Machines continues to influence modern architectures, including diffusion models and modern Hopfield networks.
Applications of Boltzmann Machines and their descendants include dimensionality reduction, feature learning, collaborative filtering, and generative modeling of images and text. While largely superseded in practice by variational autoencoders and generative adversarial networks, the conceptual contributions of Boltzmann Machines — particularly the idea of learning a joint probability distribution through energy minimization — remain deeply embedded in contemporary machine learning theory.