A probabilistic model representing data as drawn from multiple component distributions.
A mixture model is a probabilistic framework that assumes observed data originates from a combination of several underlying probability distributions, each corresponding to a distinct subpopulation or latent cluster. Rather than fitting a single distribution to all the data, the model posits that each data point was generated by first selecting one of K component distributions according to a set of mixing weights, and then drawing a sample from that chosen distribution. This structure makes mixture models powerful tools for capturing multimodal, heterogeneous, or otherwise complex data that no single distribution can adequately describe.
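This two-stage generative process is easy to sketch in code. The snippet below is a minimal illustration (not taken from any particular source) that samples from a hypothetical two-component, one-dimensional Gaussian mixture with numpy: each point first picks a component according to the mixing weights, then draws from that component's distribution. The weights, means, and standard deviations are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-component mixture: mixing weights, component means, std devs
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

def sample_mixture(n):
    # Step 1: choose a component index for each point according to the mixing weights
    components = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw each point from its chosen Gaussian component
    return rng.normal(means[components], stds[components])

x = sample_mixture(1000)  # multimodal sample that no single Gaussian fits well
```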
The most widely used variant is the Gaussian Mixture Model (GMM), which represents each component as a multivariate Gaussian characterized by its own mean vector and covariance matrix. Parameter estimation is typically performed using the Expectation-Maximization (EM) algorithm, an iterative procedure that alternates between computing the probability that each data point belongs to each component (the E-step) and updating the component parameters and mixing weights to maximize the expected log-likelihood (the M-step). This process converges to a local maximum of the likelihood, making initialization strategies such as k-means seeding practically important. Beyond Gaussians, mixture models can incorporate other component families: Bernoulli mixtures for binary data, multinomial mixtures for text and other count data, or an unbounded number of components in Bayesian nonparametric settings such as the Dirichlet Process Mixture Model.
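As a rough sketch of the EM iteration just described (not an excerpt from any library), the following numpy/scipy code alternates the E-step and M-step for a one-dimensional GMM. The function name, the random-restart initialization, and the fixed iteration count are illustrative choices made here for brevity.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iter=100, seed=0):
    """Fit a k-component 1-D Gaussian mixture to data x by plain EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialization: means from random data points, shared variance, uniform weights
    means = rng.choice(x, size=k, replace=False)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point (posterior membership)
        dens = np.stack([w * norm.pdf(x, m, np.sqrt(v))
                         for w, m, v in zip(weights, means, variances)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances from responsibilities
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances
```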
Mixture models occupy a central role in unsupervised learning, density estimation, and generative modeling. In clustering, they provide a soft, probabilistic alternative to hard-assignment methods like k-means, assigning each observation a posterior probability of membership in each cluster. In density estimation, they offer a flexible, interpretable approximation to arbitrary continuous distributions. They also underpin more complex architectures: mixture-of-experts layers in deep learning, for instance, route inputs through specialized sub-networks using a gating mechanism that mirrors the mixture model formulation. Their combination of mathematical tractability, interpretability, and expressive power has made them a foundational tool across statistics, machine learning, and data analysis.
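In practice, libraries expose these uses directly. The example below is one possible workflow using scikit-learn's GaussianMixture, which fits a GMM by EM (with k-means initialization by default) and returns both soft cluster memberships and log-density estimates; the synthetic data and parameter values are placeholders for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder data: (n_samples, n_features) observations drawn from two clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(300, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

soft_assignments = gmm.predict_proba(X)   # posterior membership probabilities (soft clustering)
log_density = gmm.score_samples(X)        # log of the fitted mixture density at each point
```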