A generative neural network built from stacked Restricted Boltzmann Machines trained layer by layer.
A Deep Belief Network (DBN) is a generative probabilistic model composed of multiple layers of latent variables, typically implemented as a stack of Restricted Boltzmann Machines (RBMs). Each RBM layer learns to represent the output of the layer below it, capturing increasingly abstract features of the input data. The top two layers form an associative memory with undirected connections, while the lower layers use directed, top-down connections to decode representations back into observable data. This architecture allows DBNs to model the joint distribution between observed data and the many layers of hidden features that explain it.
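The sketch below is a minimal NumPy illustration of this layered structure; it is not taken from any particular implementation, and the class and method names (RBM, DBN, hidden_probs, visible_probs) and the 0.01 weight scale are illustrative choices. It captures the stacking of RBMs and the bottom-up and top-down passes, though not the distinction between the undirected top pair of layers and the directed lower layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One Restricted Boltzmann Machine with binary visible and hidden units."""
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # visible-to-hidden weights
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        # P(h = 1 | v): bottom-up "recognition" pass
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        # P(v = 1 | h): top-down generative pass
        return sigmoid(h @ self.W.T + self.b)

class DBN:
    """A stack of RBMs; each RBM models the hidden activities of the layer below it."""
    def __init__(self, layer_sizes):
        self.rbms = [RBM(lo, hi) for lo, hi in zip(layer_sizes[:-1], layer_sizes[1:])]

    def upward_pass(self, v):
        # Propagate data through successive layers, yielding increasingly abstract features.
        activations = [v]
        for rbm in self.rbms:
            activations.append(rbm.hidden_probs(activations[-1]))
        return activations
```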
Training a DBN proceeds in two phases. First, the RBM layers are trained greedily, one at a time from the bottom up: each new RBM is fit with contrastive divergence, an efficient approximation to maximum likelihood learning, using as its training data the hidden activations produced by the layers already learned. Once all layers are pre-trained in this unsupervised fashion, the entire network can be fine-tuned with backpropagation on labeled data if a supervised task is desired. This pre-training strategy was a breakthrough because it provided a principled way to initialize deep networks, circumventing the vanishing gradient problem that had made training deep architectures notoriously difficult throughout the 1990s and early 2000s.
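Continuing the sketch above, the following is a hedged illustration of CD-1 (contrastive divergence with a single Gibbs step) and of the greedy layer-wise loop; the names cd1_update and pretrain_dbn, the learning rate, the epoch count, and the example layer sizes are assumptions made for this example rather than details from the original work.

```python
import numpy as np

rng = np.random.default_rng(1)

def cd1_update(rbm, v0, lr=0.05):
    """One CD-1 update on a mini-batch v0 of shape (batch, n_visible)."""
    # Positive phase: hidden statistics driven by the data.
    ph0 = rbm.hidden_probs(v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample binary hidden states
    # Negative phase: one Gibbs step down and back up gives a "reconstruction".
    pv1 = rbm.visible_probs(h0)
    ph1 = rbm.hidden_probs(pv1)
    # Approximate log-likelihood gradient: data statistics minus reconstruction statistics.
    batch = v0.shape[0]
    rbm.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
    rbm.b += lr * (v0 - pv1).mean(axis=0)
    rbm.c += lr * (ph0 - ph1).mean(axis=0)

def pretrain_dbn(dbn, data, epochs=10):
    """Greedy layer-wise pre-training: each RBM is fit to the activations of the layer below."""
    layer_input = data
    for rbm in dbn.rbms:
        for _ in range(epochs):
            cd1_update(rbm, layer_input)
        # Freeze this RBM and feed its hidden probabilities up as training data for the next one.
        layer_input = rbm.hidden_probs(layer_input)

# Example: a small DBN pre-trained on a random stand-in for binary image data.
dbn = DBN([784, 500, 250])
fake_data = (rng.random((64, 784)) < 0.5).astype(float)
pretrain_dbn(dbn, fake_data)
```

After pre-training, the learned weights could initialize a feed-forward network that is fine-tuned with backpropagation, as described above; that supervised stage is omitted from the sketch.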
DBNs matter historically because the 2006 papers from Geoffrey Hinton's group, the deep belief net paper with Simon Osindero and Yee-Whye Teh and the companion Science paper with Ruslan Salakhutdinov, demonstrated that deep networks could be trained effectively, reigniting serious interest in deep learning at a time when shallow models dominated the field. The greedy layer-wise pre-training strategy they introduced influenced a generation of deep learning research and helped establish that depth itself was a valuable architectural property worth pursuing.
While DBNs have been largely superseded in practice by convolutional networks, recurrent architectures, and transformer models — all of which benefit from improved optimizers, activation functions, and massive labeled datasets — they remain conceptually important. They demonstrated the power of unsupervised pre-training, contributed foundational ideas about generative modeling, and helped catalyze the modern deep learning era. Their influence is still visible in contemporary generative models and representation learning research.