
LN (Layer Normalization)

A normalization technique that stabilizes neural network training by standardizing each layer's inputs.

Year: 2016 · Generality: 731

Layer Normalization (LN) is a technique used in deep learning to stabilize and accelerate the training of neural networks. Unlike Batch Normalization, which normalizes across the batch dimension, LN normalizes across the feature dimension within a single training example — computing the mean and variance over all neurons in a given layer independently for each input. This makes it entirely agnostic to batch size, a significant practical advantage in settings where large batches are infeasible or undesirable.
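To make the axis distinction concrete, here is a minimal NumPy sketch (not from the source; the array shape is an arbitrary example) contrasting where the two techniques compute their statistics:

```python
import numpy as np

x = np.random.randn(8, 512)  # hypothetical batch: 8 examples, 512 features each

# Layer Normalization: statistics per example, computed over the feature axis.
ln_mean = x.mean(axis=-1, keepdims=True)   # shape (8, 1), one mean per example
ln_var = x.var(axis=-1, keepdims=True)
x_ln = (x - ln_mean) / np.sqrt(ln_var + 1e-5)   # identical for any batch size

# Batch Normalization: statistics per feature, computed over the batch axis.
bn_mean = x.mean(axis=0, keepdims=True)    # shape (1, 512), one mean per feature
bn_var = x.var(axis=0, keepdims=True)
x_bn = (x - bn_mean) / np.sqrt(bn_var + 1e-5)   # depends on the other examples
```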

The mechanics are straightforward: for each input to a layer, LN computes the mean and standard deviation of the activations, then rescales them using learned parameters (gain and bias). This process keeps activations in a stable numerical range throughout training, reducing sensitivity to weight initialization and helping to prevent vanishing or exploding gradients. Because normalization happens per-sample rather than per-batch, the statistics are consistent between training and inference, eliminating the need for running averages maintained during training.
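A minimal sketch of that computation, assuming a NumPy implementation with per-feature gain and bias vectors (the parameter names and shapes here are illustrative, not a specific library's API):

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    """Normalize each example over its features, then apply learned gain and bias."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per example
    return gain * x_hat + bias                # learned per-feature rescaling and shift

d = 512
x = np.random.randn(4, d)
out = layer_norm(x, gain=np.ones(d), bias=np.zeros(d))
# The same computation is used at training and inference time;
# no running averages need to be maintained.
```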

LN is especially well-suited to sequence models. In Recurrent Neural Networks (RNNs), batch statistics are difficult to compute reliably across variable-length sequences, making Batch Normalization awkward to apply. LN sidesteps this entirely. Its importance grew further with the rise of the Transformer architecture, where it became a standard component — applied either before or after attention and feed-forward sublayers — and is now ubiquitous in large language models and other attention-based systems.
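As a rough illustration of the two placements mentioned above, the sketch below uses placeholder sublayers (`attention` and `feed_forward` are stand-ins, not a real Transformer implementation) to show pre-LN versus post-LN residual blocks:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Placeholder sublayers; in a real Transformer these are multi-head
# attention and a position-wise feed-forward network.
attention = lambda x: x
feed_forward = lambda x: x

def post_ln_block(x):
    # Post-LN (original Transformer): normalize after each residual addition.
    x = layer_norm(x + attention(x))
    return layer_norm(x + feed_forward(x))

def pre_ln_block(x):
    # Pre-LN (common in modern large language models): normalize the input
    # to each sublayer before the residual addition.
    x = x + attention(layer_norm(x))
    return x + feed_forward(layer_norm(x))
```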

The broader significance of Layer Normalization lies in its generality and simplicity. It imposes minimal assumptions about data distribution or batch composition, making it applicable across a wide range of architectures and tasks. As models have scaled to billions of parameters and training runs have grown more expensive, techniques that improve stability without adding complexity have become increasingly valuable. LN has proven to be one of the most reliable such tools in the modern deep learning toolkit.

Related

Layer Normalization

Normalizes activations across features within a layer to stabilize neural network training.

Generality: 731
Batch Normalization

A technique that normalizes layer inputs to accelerate and stabilize neural network training.

Generality: 794
LNN (Liquid Neural Network)

A recurrent neural network that continuously adapts its internal state to process time-varying data.

Generality: 339
Variance Scaling

A weight initialization strategy that preserves consistent activation variance across neural network layers.

Generality: 620
Normalizing Flows

Generative models that learn complex distributions via composed invertible transformations with exact likelihoods.

Generality: 694
nGPT (Normalized Transformer)

A transformer variant that normalizes representations on a hypersphere for faster, more stable training.

Generality: 101