A loss function measuring average squared differences between predicted and actual values.
Mean Squared Error (MSE) is one of the most widely used loss functions and evaluation metrics in machine learning, quantifying how well a model's predictions align with observed data. It is computed by taking the difference between each predicted value and its corresponding ground truth, squaring those differences, and averaging them across all samples. The squaring operation serves two purposes: it ensures all error terms are non-negative, and it disproportionately penalizes large deviations, making MSE especially sensitive to outliers and significant prediction mistakes.
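The definition above (for n samples, MSE = (1/n) · Σ (ŷᵢ − yᵢ)²) translates directly into code. A minimal sketch using NumPy, with illustrative array values:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences between predictions and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# Squared errors: 0.25, 0.0, 2.25 -> mean = 2.5 / 3 ≈ 0.833
print(mse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))
```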
In practice, MSE plays a central role in training regression models. During optimization, algorithms such as gradient descent minimize MSE by iteratively adjusting model parameters in the direction that reduces the average squared error. Because MSE is differentiable everywhere, it integrates cleanly into backpropagation pipelines, making it a natural fit for neural networks tackling regression tasks. Its mathematical properties — convexity for linear models and smooth gradients — make convergence well-behaved in many settings.
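The optimization loop described above can be sketched for a one-dimensional linear model y ≈ w·x + b, where the MSE gradients have closed forms: ∂MSE/∂w = (2/n) Σ (w·xᵢ + b − yᵢ)·xᵢ and ∂MSE/∂b = (2/n) Σ (w·xᵢ + b − yᵢ). The data, learning rate, and iteration count here are illustrative choices, not prescriptions:

```python
import numpy as np

# Synthetic data from a known linear relationship with small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.05, size=200)

w, b = 0.0, 0.0   # initial parameters
lr = 0.1          # learning rate (assumed, not tuned)
for _ in range(500):
    residual = w * x + b - y          # prediction error per sample
    w -= lr * 2 * np.mean(residual * x)  # gradient step on w
    b -= lr * 2 * np.mean(residual)      # gradient step on b

print(w, b)  # should approach the true values (3.0, 1.0)
```

Because MSE is convex in (w, b) for a linear model, this loop converges to the global minimum for any sufficiently small learning rate.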
The choice of MSE carries meaningful implications for model behavior. Because errors are squared, a single large outlier can dominate the loss signal and pull parameter updates disproportionately toward correcting that one example. This sensitivity is a double-edged sword: it encourages precision in high-stakes predictions but can destabilize training when data contains noise or label errors. Alternatives like Mean Absolute Error (MAE) or Huber loss are often preferred when robustness to outliers is a priority, while MSE remains the default when large errors are genuinely more costly than small ones.
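The outlier sensitivity contrast can be made concrete with a small numerical comparison. In this illustrative example, corrupting a single prediction inflates MSE by roughly three orders of magnitude while MAE grows far more modestly:

```python
import numpy as np

y_true   = np.array([1.0, 2.0, 3.0, 4.0])
clean    = np.array([1.1, 1.9, 3.2, 3.8])   # small errors everywhere
outliers = np.array([1.1, 1.9, 3.2, 14.0])  # one large miss

def mse(t, p):
    return np.mean((p - t) ** 2)

def mae(t, p):
    return np.mean(np.abs(p - t))

# MSE: 0.025 -> 25.015 (the single outlier dominates the loss)
print(mse(y_true, clean), mse(y_true, outliers))
# MAE: 0.15 -> 2.6 (growth is linear in the outlier's magnitude)
print(mae(y_true, clean), mae(y_true, outliers))
```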
Beyond training, MSE is a standard benchmark metric for comparing model performance on held-out test sets, often reported alongside its square root — Root Mean Squared Error (RMSE) — which restores the error to the original units of the target variable and is more interpretable in applied contexts. From linear regression to deep learning, MSE remains a foundational tool that connects statistical estimation theory to modern machine learning practice.
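The MSE-to-RMSE relationship is a one-line transformation. A short sketch with illustrative values (if the targets were, say, measured in meters, the RMSE of about 2.38 would also be in meters, unlike the MSE of about 5.67 square meters):

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

mse_val = np.mean((y_pred - y_true) ** 2)  # (4 + 4 + 9) / 3 ≈ 5.667
rmse_val = np.sqrt(mse_val)                # ≈ 2.38, in the target's units

print(mse_val, rmse_val)
```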