An iterative algorithm that estimates model parameters when latent variables are present.
Expectation-Maximization (EM) is an iterative optimization algorithm for finding maximum likelihood estimates of parameters in probabilistic models that involve unobserved, or latent, variables. Because directly maximizing the likelihood function is often intractable when hidden variables are present, EM sidesteps the difficulty by alternating between two steps: inferring the distribution of the latent variables given the current parameter estimates, and then updating the parameters to better explain the data under that inferred distribution. The cycle repeats until the estimates converge to a stable solution.
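In symbols, writing X for the observed data, Z for the latent variables, and θ for the parameters (a standard convention assumed here, not fixed by this article), the quantity EM targets is the observed-data log-likelihood:

```latex
\log p(X \mid \theta) \;=\; \log \sum_{Z} p(X, Z \mid \theta)
```

The sum (or integral) over Z sits inside the logarithm, which is precisely what makes direct maximization hard for most latent-variable models.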
The algorithm proceeds in two distinct phases per iteration. In the E-step (Expectation), the algorithm computes the expected complete-data log-likelihood with respect to the conditional distribution of the latent variables given the observed data and the current parameters, effectively filling in soft assignments or probabilities for the hidden structure. In the M-step (Maximization), the parameters are updated to maximize this expected log-likelihood, treating the soft assignments as fixed. A key theoretical guarantee is that the observed-data likelihood is non-decreasing across iterations: the algorithm never makes the fit worse, though it may converge to a local rather than a global maximum.
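The two steps can be stated compactly in the same assumed notation, with θ^(t) denoting the parameter estimate at iteration t:

```latex
\text{E-step:}\quad Q(\theta \mid \theta^{(t)}) \;=\; \mathbb{E}_{Z \sim p(Z \mid X,\, \theta^{(t)})}\!\big[\log p(X, Z \mid \theta)\big]

\text{M-step:}\quad \theta^{(t+1)} \;=\; \operatorname*{arg\,max}_{\theta}\; Q(\theta \mid \theta^{(t)})

\text{Guarantee:}\quad \log p(X \mid \theta^{(t+1)}) \;\ge\; \log p(X \mid \theta^{(t)})
```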
EM finds wide application across machine learning and statistics. Its most prominent use is fitting Gaussian Mixture Models (GMMs), where the latent variable indicates which Gaussian component generated each data point. The E-step assigns probabilistic cluster memberships, and the M-step re-estimates each component's mean, covariance, and mixing weight. Beyond GMMs, EM underpins training for Hidden Markov Models (the Baum-Welch algorithm), probabilistic latent semantic analysis, and many Bayesian network inference procedures. It also connects deeply to variational inference, where the E-step is approximated when exact inference is intractable.
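As a concrete illustration, here is a minimal from-scratch sketch of EM for a one-dimensional, two-component GMM. All names (such as em_gmm_1d) and defaults are illustrative, not a definitive implementation; in practice a library routine like scikit-learn's GaussianMixture would be used instead.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100, tol=1e-6, seed=0):
    """Fit a 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialization: k distinct data points as means, pooled variance, uniform weights.
    means = rng.choice(x, size=k, replace=False)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    total_ll, prev_ll = -np.inf, -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i, current parameters),
        # computed in log space for numerical stability.
        log_pdf = (-0.5 * np.log(2 * np.pi * variances)
                   - (x[:, None] - means) ** 2 / (2 * variances))
        log_joint = np.log(weights) + log_pdf              # shape (n, k)
        point_ll = np.logaddexp.reduce(log_joint, axis=1)  # log p(x_i | params)
        r = np.exp(log_joint - point_ll[:, None])          # soft assignments
        # M-step: re-estimate each component from its soft assignments.
        nk = r.sum(axis=0)                                 # effective counts per component
        weights = nk / n
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk
        # The total log-likelihood is guaranteed never to decrease.
        total_ll = point_ll.sum()
        if total_ll - prev_ll < tol:
            break
        prev_ll = total_ll
    return weights, means, variances, total_ll

# Usage: recover a mixture of N(-2, 1) and N(3, 0.25) from samples.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 700)])
weights, means, variances, ll = em_gmm_1d(x, k=2)
print(weights, means, variances)  # roughly (0.3, 0.7), (-2, 3), (1, 0.25)
```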
EM's appeal lies in its conceptual simplicity, its per-iteration guarantee that the likelihood never decreases, and its broad applicability to models with missing or hidden data. Its main limitations are sensitivity to initialization, potential convergence to poor local optima, and slow convergence in high-dimensional settings. Despite these drawbacks, EM remains one of the most widely used algorithms in statistical machine learning, serving as both a practical workhorse and a theoretical foundation for more advanced inference methods.
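A common mitigation for initialization sensitivity is to run EM from several random starting points and keep the solution with the highest final log-likelihood. A minimal sketch, reusing the illustrative em_gmm_1d function above:

```python
# Multiple restarts: run EM from several seeds, keep the best log-likelihood.
fits = [em_gmm_1d(x, k=2, seed=s) for s in range(10)]
weights, means, variances, ll = max(fits, key=lambda fit: fit[-1])
```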