Generative neural networks that learn probability distributions over input data using two layers.
A Restricted Boltzmann Machine (RBM) is a two-layer generative neural network consisting of a visible layer, which represents observed data, and a hidden layer, which captures latent features. Unlike general Boltzmann machines, RBMs restrict connections so that no units within the same layer are linked to each other — only cross-layer connections exist. This bipartite structure forms an undirected graphical model where the network learns to represent the joint probability distribution of the input data, enabling both feature extraction and data generation.
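A minimal sketch of this bipartite structure, assuming binary units and illustrative layer sizes (all names and values here are assumptions for demonstration, not taken from a particular implementation). Because no within-layer connections exist, the hidden units are conditionally independent given the visible units, which makes the conditional distribution cheap to compute:

```python
# Illustrative sketch of an RBM's bipartite structure (sizes and names are assumptions).
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 3                        # example layer sizes
W = rng.normal(0, 0.01, (n_visible, n_hidden))    # only cross-layer weights exist
b_visible = np.zeros(n_visible)                   # visible-unit biases
b_hidden = np.zeros(n_hidden)                     # hidden-unit biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    """Energy of a joint configuration (v, h); lower energy means higher probability."""
    return -(v @ b_visible + h @ b_hidden + v @ W @ h)

def p_hidden_given_visible(v):
    """Hidden units are conditionally independent given the visible layer,
    because there are no hidden-to-hidden connections."""
    return sigmoid(b_hidden + v @ W)

v = rng.integers(0, 2, n_visible).astype(float)   # an example binary visible vector
print(p_hidden_given_visible(v))
```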
Training an RBM involves adjusting weights so the model assigns high probability to observed training examples. The primary algorithm used is contrastive divergence (CD), an approximation of maximum likelihood learning that alternates between positive and negative phases. In the positive phase, the hidden units are inferred from real data; in the negative phase, the network reconstructs the visible layer from the hidden activations and then re-infers the hidden layer. The difference between these two phases drives the weight updates, allowing the model to gradually learn the underlying data distribution without requiring expensive exact inference.
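The sketch below shows one CD-1 weight update for a binary RBM, continuing the variables defined in the previous sketch; the learning rate and the use of probabilities rather than samples in the gradient are illustrative choices, not prescribed by the original text:

```python
# One CD-1 update step (continues W, b_visible, b_hidden, sigmoid, rng, v from above).
def sample(p):
    """Draw binary states from unit activation probabilities."""
    return (rng.random(p.shape) < p).astype(float)

def cd1_update(v0, lr=0.1):
    # Positive phase: infer hidden units from real data.
    p_h0 = p_hidden_given_visible(v0)
    h0 = sample(p_h0)

    # Negative phase: reconstruct the visible layer, then re-infer the hidden layer.
    p_v1 = sigmoid(b_visible + h0 @ W.T)
    v1 = sample(p_v1)
    p_h1 = p_hidden_given_visible(v1)

    # The difference between the two phases drives the weight and bias updates.
    grad_W = np.outer(v0, p_h0) - np.outer(v1, p_h1)
    return lr * grad_W, lr * (v0 - v1), lr * (p_h0 - p_h1)

dW, db_v, db_h = cd1_update(v)
W += dW
b_visible += db_v
b_hidden += db_h
```

In practice this step would be repeated over many training examples (often in mini-batches), with more Gibbs steps (CD-k) or persistent chains used when a closer approximation to the model distribution is needed.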
RBMs gained significant traction in the mid-2000s when Geoffrey Hinton and colleagues demonstrated that stacking multiple RBMs could initialize deep belief networks (DBNs) more effectively than random weight initialization. This greedy layer-wise pretraining strategy helped overcome the vanishing gradient problem that had stymied deep network training, briefly making RBMs central to the deep learning renaissance. Beyond pretraining, RBMs have been applied to dimensionality reduction, collaborative filtering (notably in Netflix Prize competition entries), topic modeling, and classification tasks.
While RBMs have been largely supplanted by variational autoencoders and other modern generative models for most practical applications, they remain conceptually important. They represent an early and influential example of unsupervised representation learning and probabilistic generative modeling in neural networks, and studying them provides foundational intuition for energy-based models, latent variable models, and the broader landscape of deep generative modeling.