A probabilistic model where each state depends only on the immediately preceding state.
A Markov chain is a mathematical framework for modeling sequences of events where the probability of transitioning to any future state depends solely on the current state, not on the history of how that state was reached. This property, known as the Markov property or "memorylessness," makes these models both analytically tractable and computationally efficient. Markov chains can be discrete or continuous in time, and their behavior is fully characterized by a transition matrix (or kernel) that encodes the probabilities of moving between states.
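For a discrete-time, finite-state chain, the transition matrix view is easy to make concrete. The sketch below uses a hypothetical two-state "weather" chain (the states and probabilities are illustrative, not from the text above); each simulated step draws the next state from the row of the matrix indexed by the current state, which is exactly the memorylessness property:

```python
import numpy as np

# Hypothetical two-state chain: 0 = "sunny", 1 = "rainy".
# Row i holds the probabilities of moving out of state i, so each row sums to 1.
P = np.array([
    [0.9, 0.1],   # sunny -> sunny 0.9, sunny -> rainy 0.1
    [0.5, 0.5],   # rainy -> sunny 0.5, rainy -> rainy 0.5
])

def simulate(P, start, steps, rng):
    """Draw a trajectory; each step depends only on the current state."""
    state = start
    path = [state]
    for _ in range(steps):
        # The next state is sampled from row `state` of P -- no history is consulted.
        state = rng.choice(len(P), p=P[state])
        path.append(state)
    return path

rng = np.random.default_rng(0)
path = simulate(P, start=0, steps=10, rng=rng)
```

Because only the current row of `P` is ever consulted, the full joint distribution over a trajectory factorizes into a product of one-step transition probabilities.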
In machine learning, Markov chains appear across a surprisingly wide range of methods. Hidden Markov Models (HMMs) extend the framework to settings where the underlying state is unobserved, enabling applications in speech recognition, genomic sequence analysis, and part-of-speech tagging. Markov chain Monte Carlo (MCMC) methods — including Metropolis-Hastings and Gibbs sampling — use carefully constructed chains to draw samples from complex, high-dimensional probability distributions that would otherwise be intractable. Reinforcement learning also relies on the Markov assumption, modeling environments as Markov Decision Processes (MDPs) where an agent's optimal policy depends only on the current state.
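As a minimal illustration of the MCMC idea, here is a random-walk Metropolis sampler targeting a standard normal distribution (the target, step size, and sample count are illustrative choices, not from the text). The chain's states are sample values, and the accept/reject rule is constructed so that the target is the chain's stationary distribution:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' = x + step * N(0, 1),
    accept with probability min(1, target(x') / target(x))."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.normal()
        # The proposal is symmetric, so the Hastings correction cancels
        # and the acceptance ratio is just the ratio of target densities.
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

# Target: standard normal, known only up to a normalizing constant.
log_std_normal = lambda x: -0.5 * x * x
samples = metropolis_hastings(log_std_normal, x0=0.0, n_samples=20000)
```

Note that the sampler only ever evaluates the target up to a constant, which is why MCMC applies to distributions whose normalizing constants are intractable.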
The theoretical appeal of Markov chains lies in their long-run behavior. Under mild conditions (for finite chains, irreducibility and aperiodicity), a chain converges to a unique stationary distribution regardless of its starting state — a property that underpins the correctness of MCMC sampling and informs the design of stable stochastic systems. Concepts like mixing time, ergodicity, and detailed balance are central to understanding how quickly and reliably a chain explores its state space, which has direct practical implications for the efficiency of sampling algorithms.
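For a finite chain, the stationary distribution π solves π P = π, i.e. it is a left eigenvector of the transition matrix with eigenvalue 1. The sketch below, using a hypothetical three-state matrix, recovers π from the eigendecomposition and then checks convergence by repeatedly applying P to an arbitrary starting distribution:

```python
import numpy as np

# Hypothetical irreducible, aperiodic three-state chain (rows sum to 1).
P = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])

# The stationary distribution is the left eigenvector of P with eigenvalue 1,
# i.e. an eigenvector of P.T; normalize it to sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# Convergence: starting from a point mass on state 0, repeated
# transitions drive the distribution toward pi.
mu = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    mu = mu @ P
```

After enough steps `mu` agrees with `pi` to numerical precision; how fast it gets there is precisely what the mixing time of the chain quantifies.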
More recently, Markov chain thinking has influenced the design of generative models. Diffusion models — a state-of-the-art approach for image and audio synthesis — are explicitly formulated as learned reversals of a Markov noising process. This connection demonstrates that the Markov chain, far from being a classical relic, remains a living conceptual tool at the frontier of modern deep learning research.