A normalization technique that stabilizes neural network training by standardizing each layer's inputs.
Layer Normalization (LN) is a technique used in deep learning to stabilize and accelerate the training of neural networks. Unlike Batch Normalization, which normalizes across the batch dimension, LN normalizes across the feature dimension within a single training example — computing the mean and variance over all neurons in a given layer independently for each input. This makes it entirely agnostic to batch size, a significant practical advantage in settings where large batches are infeasible or undesirable.
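The axis distinction can be made concrete with a small NumPy sketch (the array shapes and names here are illustrative, not from any particular library): Batch Normalization averages over the batch axis to get one statistic per feature, while LN averages over the feature axis to get one statistic per example, so each example can be normalized in isolation.

```python
import numpy as np

# Toy activations: 4 examples, each with 8 features in a given layer.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))

# Batch Normalization statistics: one mean per feature,
# computed across the batch dimension (axis 0).
bn_mean = x.mean(axis=0)   # shape (8,) -- depends on the whole batch

# Layer Normalization statistics: one mean/variance per example,
# computed across the feature dimension (axis 1).
ln_mean = x.mean(axis=1)   # shape (4,) -- independent of batch size
ln_var = x.var(axis=1)

# Normalizing each example with its own statistics needs no other examples.
x_ln = (x - ln_mean[:, None]) / np.sqrt(ln_var[:, None] + 1e-5)
```

After this step each row of `x_ln` has approximately zero mean and unit variance, and the result would be the same whether the batch held 4 examples or 1.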
The mechanics are straightforward: for each input to a layer, LN computes the mean and standard deviation of the activations, subtracts the mean, divides by the standard deviation, and then applies learned gain and bias parameters to restore the layer's expressive range. This process keeps activations in a stable numerical range throughout training, reducing sensitivity to weight initialization and helping to prevent vanishing or exploding gradients. Because normalization happens per-sample rather than per-batch, the statistics are identical at training and inference time, eliminating the need for the running averages that Batch Normalization must maintain during training.
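The full operation — normalize, then rescale with learned parameters — fits in a few lines. The sketch below is a minimal NumPy implementation; the function and parameter names (`layer_norm`, `gamma`, `beta`) are conventional but ours, not a specific framework's API.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Minimal per-sample layer normalization.

    x:     (batch, features) activations
    gamma: (features,) learned gain
    beta:  (features,) learned bias
    eps:   small constant for numerical stability
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per sample
    return gamma * x_hat + beta              # learned rescale and shift

# No running averages: the same code runs unchanged at training and inference.
x = np.random.default_rng(1).normal(size=(2, 6))
gamma = np.ones(6)   # initialized to identity transform
beta = np.zeros(6)
y = layer_norm(x, gamma, beta)
```

With `gamma` at one and `beta` at zero the output is simply the standardized activations; during training these parameters are learned alongside the weights, letting the network undo the normalization where that helps.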
LN is especially well-suited to sequence models. In Recurrent Neural Networks (RNNs), batch statistics are difficult to compute reliably across variable-length sequences, making Batch Normalization awkward to apply. LN sidesteps this entirely. Its importance grew further with the rise of the Transformer architecture, where it became a standard component — applied either before or after attention and feed-forward sublayers — and is now ubiquitous in large language models and other attention-based systems.
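The before-or-after placement mentioned above corresponds to the "pre-LN" and "post-LN" Transformer variants. A schematic sketch, assuming a generic residual sublayer (the stand-in feed-forward function here is purely illustrative):

```python
import numpy as np

def ln(x, eps=1e-5):
    # Parameter-free layer norm over the last axis (gain/bias omitted for brevity).
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def post_ln_block(x, sublayer):
    # Post-LN (original Transformer): normalize after the residual addition.
    return ln(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # Pre-LN (common in modern LLMs): normalize the sublayer's input,
    # leaving the residual path itself untouched.
    return x + sublayer(ln(x))

# A stand-in for an attention or feed-forward sublayer; illustrative only.
W = np.random.default_rng(2).normal(size=(8, 8)) * 0.1
ff = lambda h: np.maximum(h @ W, 0.0)

x = np.random.default_rng(3).normal(size=(4, 8))
y_post = post_ln_block(x, ff)
y_pre = pre_ln_block(x, ff)
```

The two orderings behave differently in practice: post-LN normalizes the residual stream itself, while pre-LN keeps an unnormalized residual path from input to output, which tends to make very deep stacks easier to train.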
The broader significance of Layer Normalization lies in its generality and simplicity. It imposes minimal assumptions about data distribution or batch composition, making it applicable across a wide range of architectures and tasks. As models have scaled to billions of parameters and training runs have grown more expensive, techniques that improve stability without adding complexity have become increasingly valuable. LN has proven to be one of the most reliable such tools in the modern deep learning toolkit.