A data preprocessing technique that removes correlations and normalizes feature scales.
Whitening is a data preprocessing transformation that converts input features into a representation with zero mean, unit variance, and no linear correlations between dimensions. The goal is to produce data whose covariance matrix is the identity matrix — a condition described as "white" by analogy to white noise, which has a flat power spectrum. This standardized form removes redundant statistical structure from the input, placing all features on equal footing before they enter a learning algorithm.
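Written as a formula (a brief sketch; here Σ denotes the covariance of the input x, μ its mean, and W the whitening matrix, symbols introduced only for this illustration):

```latex
% Whitening condition: the transformed data z has identity covariance.
z = W\,(x - \mu), \qquad
\operatorname{Cov}(z) = W\,\Sigma\,W^{\top} = I .
% Any W satisfying W \Sigma W^T = I whitens the data;
% PCA and ZCA whitening are two particular choices of W.
```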
The most common implementations rely on eigendecomposition of the data's covariance matrix. In PCA whitening, the data is first projected onto the principal components and then scaled so each component has unit variance, simultaneously decorrelating and normalizing the features. ZCA (Zero-phase Component Analysis) whitening applies an additional rotation back into the original feature basis, keeping the whitened data as close as possible, in a least-squares sense, to the original data; this can be advantageous when the spatial or semantic structure of inputs matters, as with image pixels. Both approaches require estimating the covariance matrix from training data, which can be expensive or unstable in very high dimensions, motivating approximations and regularization strategies.
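A minimal NumPy sketch of both variants, assuming the rows of X are samples; the function name and the eps regularizer (a small ridge added to the eigenvalues, one common way to stabilize near-zero variance directions) are illustrative choices, not a standard API:

```python
import numpy as np

def whiten(X, kind="pca", eps=1e-5):
    """Whiten rows of X (samples x features) via eigendecomposition of the covariance.

    kind="pca": project onto principal components, then scale to unit variance.
    kind="zca": additionally rotate back into the original feature basis.
    eps: small ridge added to the eigenvalues to regularize near-zero directions.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                    # zero-mean the data
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # cov = V diag(eigvals) V^T
    scale = 1.0 / np.sqrt(eigvals + eps)             # inverse square root of variances
    W_pca = np.diag(scale) @ eigvecs.T               # decorrelate, then normalize
    W = eigvecs @ W_pca if kind == "zca" else W_pca  # ZCA: rotate back to input basis
    return Xc @ W.T, W, mean

# Usage: the whitened data should have (approximately) identity covariance.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                           [0.0, 1.0, 0.3],
                                           [0.0, 0.0, 0.2]])
Z, W, mu = whiten(X, kind="zca")
print(np.round(np.cov(Z, rowvar=False), 2))          # ~ identity matrix
```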
Whitening matters in machine learning because poorly scaled or correlated inputs can dramatically slow optimization. When features differ in magnitude or are strongly correlated, the loss landscape becomes elongated and ill-conditioned, causing gradient descent to oscillate or converge slowly. Whitening reshapes this landscape toward a more spherical geometry, allowing larger learning rates and faster convergence. It was particularly influential in early neural network research and remains relevant in unsupervised feature learning, generative models, and reinforcement learning settings where input normalization is not otherwise handled.
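One way to see the conditioning effect is to compare the condition number of the feature covariance before and after whitening; a small self-contained sketch with illustrative synthetic data (for least squares, the Hessian is proportional to this covariance, so its condition number directly measures how elongated the loss landscape is):

```python
import numpy as np

# Two features with very different scales and strong correlation.
rng = np.random.default_rng(1)
t = rng.normal(size=2000)
X = np.column_stack([100.0 * t + rng.normal(size=2000),
                     t + 0.1 * rng.normal(size=2000)])

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-8)) @ eigvecs.T   # ZCA whitening matrix
Z = (X - X.mean(axis=0)) @ W.T

# The raw covariance is severely ill-conditioned; after whitening it is ~identity,
# so the condition number drops to roughly 1 (a nearly spherical landscape).
print(f"condition number before: {np.linalg.cond(cov):.0f}")
print(f"condition number after:  {np.linalg.cond(np.cov(Z, rowvar=False)):.2f}")
```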
Modern deep learning has partially displaced explicit whitening with techniques like Batch Normalization, which performs an adaptive, layer-wise normalization during training. Nevertheless, whitening retains practical importance as an offline preprocessing step for shallow models, kernel methods, and scenarios where controlling the statistical properties of inputs is essential for reproducibility and stability.