Probabilistic models representing data as a weighted mixture of Gaussian distributions.
A Gaussian Mixture Model (GMM) is a probabilistic framework that assumes the observed data are generated from a combination of several Gaussian distributions, each representing a distinct subpopulation or cluster. Rather than assigning each data point to a single hard cluster, GMMs treat cluster membership as a latent variable with probabilistic weights, allowing points to belong to multiple clusters with varying degrees of confidence. The model is fully described by three parameters per component: a mean vector, a covariance matrix, and a mixing coefficient that controls how much that Gaussian contributes to the overall distribution (the mixing coefficients are non-negative and sum to one).
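In standard notation (with K components, mixing coefficients \pi_k, means \mu_k, and covariances \Sigma_k; the symbols here are chosen for illustration rather than taken from any particular source), the resulting density is:

```latex
% Mixture density over K Gaussian components.
% \pi_k: mixing coefficient, \mu_k: mean vector, \Sigma_k: covariance matrix.
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```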
Parameter estimation in GMMs is most commonly performed with the Expectation-Maximization (EM) algorithm. The procedure alternates between two steps: the E-step computes the posterior probability (responsibility) that each data point was generated by each Gaussian component given the current parameters, and the M-step updates the means, covariances, and mixing weights to maximize the expected log-likelihood computed in the E-step. Each iteration is guaranteed not to decrease the likelihood, so the algorithm converges to a local maximum (or occasionally a saddle point) rather than the global optimum; initialization therefore matters significantly, and multiple restarts are often used in practice to avoid poor local optima.
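As a concrete illustration of the two steps, here is a minimal NumPy/SciPy sketch of EM for a full-covariance GMM. The function name fit_gmm_em, the random initialization, and the fixed iteration count are choices made for this example, not part of the algorithm's definition.

```python
# Minimal EM sketch for a GMM with full covariance matrices.
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm_em(X, n_components, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: random data points as means (k-means init is more common in practice).
    means = X[rng.choice(n, n_components, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each point.
        dens = np.column_stack([
            w * multivariate_normal.pdf(X, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs)
        ])                                     # shape (n, K)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and covariances from responsibilities.
        nk = resp.sum(axis=0)                  # effective number of points per component
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(d)

    return weights, means, covs
```

The small 1e-6 * I term added to each covariance is a guard against singular matrices when a component collapses onto few points; practical implementations add similar regularization and stop on a likelihood-based convergence check rather than a fixed iteration count.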
GMMs are valued for their flexibility compared to simpler clustering methods like k-means. Because each component can have its own full covariance matrix, GMMs can capture clusters of varying sizes, orientations, and elliptical shapes rather than assuming spherical clusters. This makes them well-suited for density estimation tasks where the underlying data distribution is complex and multimodal. Model selection — choosing the number of components — is typically guided by information criteria such as BIC or AIC, or by cross-validation.
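One common way to apply this in practice is to fit models over a range of component counts and keep the one with the lowest BIC. The sketch below uses scikit-learn's GaussianMixture; the synthetic data and candidate range are placeholders for illustration.

```python
# Selecting the number of components by BIC (lower is better).
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # placeholder data

bic_scores = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=5, random_state=0).fit(X)
    bic_scores[k] = gmm.bic(X)

best_k = min(bic_scores, key=bic_scores.get)          # component count with lowest BIC
print(best_k, bic_scores[best_k])
```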
Beyond clustering and density estimation, GMMs appear throughout machine learning in applications including anomaly detection, generative modeling, speaker recognition, and as prior distributions in Bayesian models. They also serve as a foundational building block in more complex architectures, such as the mixture of experts framework and variational autoencoders with mixture priors. Their combination of interpretability, mathematical tractability, and expressive power has kept GMMs a staple tool in both research and applied machine learning.
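For instance, anomaly detection with a GMM often reduces to thresholding the per-sample log-density of a fitted model; the sketch below does this with scikit-learn's score_samples, where the 1st-percentile threshold is an arbitrary choice made for illustration.

```python
# Flagging low-density points as anomalies with a fitted GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))            # placeholder "normal" data
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)

log_density = gmm.score_samples(X_train)        # per-sample log-likelihood under the model
threshold = np.percentile(log_density, 1)       # flag the lowest-density 1% (assumed cutoff)

X_new = rng.normal(size=(10, 2))
is_anomaly = gmm.score_samples(X_new) < threshold
```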