The natural logarithm of a likelihood function, used to simplify optimization and parameter estimation in probabilistic models.
Log likelihood is the natural logarithm of the likelihood function, a measure of how probable the observed data is under a given set of model parameters. In statistical modeling and machine learning, the likelihood is typically a product of probabilities across many data points, and a product of many values below one quickly underflows floating-point precision on large datasets. Taking the logarithm converts the product into a sum, dramatically improving numerical stability and making the function far easier to work with analytically and computationally. Because the logarithm is monotonically increasing, maximizing the log likelihood is mathematically equivalent to maximizing the likelihood itself, so both yield the same optimal parameter values.
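To make the numerical point concrete, here is a minimal Python sketch using a hypothetical Bernoulli coin-flip model with simulated data and an arbitrary candidate parameter; the raw product of probabilities underflows double precision to zero, while the equivalent sum of logs stays finite.

```python
import numpy as np

# A minimal sketch: a hypothetical Bernoulli coin-flip model with
# simulated data and an arbitrary candidate parameter p.
rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=5000)   # 5,000 simulated coin flips
p = 0.7                                  # candidate parameter value

# Per-observation probability of each outcome under Bernoulli(p).
probs = np.where(data == 1, p, 1 - p)

# Multiplying 5,000 probabilities underflows double precision to 0.0 ...
likelihood = np.prod(probs)

# ... while the equivalent sum of logs stays finite and well-behaved.
log_likelihood = np.sum(np.log(probs))

print(likelihood)       # 0.0 (underflow)
print(log_likelihood)   # a finite negative number, roughly -3050 here
```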
Log likelihood sits at the heart of maximum likelihood estimation (MLE), the dominant framework for fitting probabilistic models to data. In practice, many optimization algorithms minimize a loss function rather than maximize an objective, so practitioners often work with the negative log likelihood (NLL) as a loss. Gradient-based methods like stochastic gradient descent can then efficiently minimize NLL by computing its derivatives with respect to model parameters. This connection makes log likelihood directly relevant to training a wide range of models, from logistic regression and Gaussian mixture models to hidden Markov models and deep neural networks with probabilistic output layers such as softmax classifiers.
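As an illustration of this workflow, the sketch below fits a logistic regression by plain gradient descent on the NLL. The simulated data, learning rate, and iteration count are illustrative assumptions, not recommended settings.

```python
import numpy as np

# Sketch: maximum likelihood estimation for logistic regression by
# gradient descent on the negative log likelihood (NLL).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -2.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

def nll(w):
    """Negative log likelihood of a Bernoulli model with sigmoid link."""
    z = X @ w
    # Per-sample NLL is log(1 + exp(z)) - y*z; logaddexp keeps it stable.
    return np.sum(np.logaddexp(0.0, z) - y * z)

def nll_grad(w):
    """Gradient of the NLL with respect to the weights."""
    p = 1 / (1 + np.exp(-(X @ w)))  # predicted probabilities
    return X.T @ (p - y)

w = np.zeros(2)
lr = 0.01
for _ in range(500):
    w -= lr * nll_grad(w)  # one gradient descent step on the NLL

print("estimated weights:", w)  # near [1.5, -2.0], up to sampling noise
print("final NLL:", nll(w))
```

In practice the same estimates would come from an off-the-shelf optimizer or a library routine; the point here is only that the loss being minimized is exactly the negative log likelihood.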
Beyond parameter estimation, log likelihood serves as a principled tool for model comparison and evaluation. Criteria such as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are built on the maximized log likelihood, adding a complexity penalty to guard against overfitting. In deep learning, minimizing the cross-entropy loss that is ubiquitous in classification is mathematically equivalent to minimizing the negative log likelihood under a categorical distribution, illustrating how foundational this concept is across modern machine learning. Its combination of mathematical elegance, computational tractability, and theoretical grounding makes log likelihood one of the most pervasive ideas in both classical statistics and contemporary AI.
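The toy sketch below illustrates both points: AIC and BIC computed from the maximized log likelihood of a Gaussian fit, and the cross-entropy of a small categorical model coinciding with its mean negative log likelihood. All data and probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- AIC / BIC from a Gaussian fit (k = 2 parameters: mean, variance) ---
x = rng.normal(loc=5.0, scale=2.0, size=500)
mu, sigma = x.mean(), x.std()           # MLE estimates for a Gaussian
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                 - (x - mu) ** 2 / (2 * sigma**2))
k, n = 2, len(x)
aic = 2 * k - 2 * log_lik               # AIC = 2k - 2 ln L
bic = k * np.log(n) - 2 * log_lik       # BIC = k ln n - 2 ln L
print(f"log L = {log_lik:.1f}, AIC = {aic:.1f}, BIC = {bic:.1f}")

# --- Cross-entropy as the mean NLL of a categorical model ---
labels = np.array([0, 2, 1])            # observed class indices
probs = np.array([[0.7, 0.2, 0.1],      # made-up softmax outputs
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
picked = probs[np.arange(len(labels)), labels]  # prob of the true class
cross_entropy = -np.mean(np.log(picked))
mean_nll = -np.sum(np.log(picked)) / len(labels)
print(f"cross-entropy = {cross_entropy:.4f}, mean NLL = {mean_nll:.4f}")
```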