
GLU (Gated Linear Unit)

A gating mechanism that selectively controls information flow through neural network layers.

Year: 2017 · Generality: 651

A Gated Linear Unit (GLU) is a neural network building block that uses a learned gating mechanism to regulate which information passes through a layer. In the common formulation, a linear projection first doubles the feature dimension, and the result is split into two equal halves: one half serves as the linearly transformed content, while the other is passed through a sigmoid activation function to produce gate values between 0 and 1. The two halves are then combined via element-wise multiplication, so the sigmoid-activated half acts as a soft gate that amplifies or suppresses individual features of the content half. This selective filtering allows the network to dynamically emphasize relevant signals and suppress noise, without the computational overhead of more complex recurrent architectures.
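
In symbols, Dauphin et al.'s formulation is GLU(x) = (xW + b) ⊗ σ(xV + c), where ⊗ denotes element-wise multiplication. Below is a minimal PyTorch sketch of the split variant described above; the class name and dimensions are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class GLULayer(nn.Module):
    """Minimal gated linear unit: project to 2x width, split, gate."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        # One linear layer produces both the content and the gate halves.
        self.proj = nn.Linear(d_in, 2 * d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the projected features into two equal halves.
        content, gate = self.proj(x).chunk(2, dim=-1)
        # Sigmoid squashes the gate half into (0, 1); element-wise
        # multiplication then softly filters the content half.
        return content * torch.sigmoid(gate)

# Usage: equivalent to a width-doubling Linear followed by torch.nn.GLU.
layer = GLULayer(d_in=16, d_out=32)
out = layer(torch.randn(4, 16))  # shape: (4, 32)
```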

GLUs were introduced in the 2017 paper "Language Modeling with Gated Convolutional Networks" by Yann Dauphin and colleagues, where they demonstrated that convolutional networks equipped with gating could match or outperform recurrent models on large-scale language modeling benchmarks. A key practical advantage was that, unlike LSTMs or GRUs, GLU-based convolutional networks could be parallelized efficiently during training, making them significantly faster. The gating mechanism also helps mitigate the vanishing gradient problem in deep networks, since gradients can flow more directly through the linear pathway during backpropagation.
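
The gradient argument can be made concrete. In the paper's simplified notation, where the same activation X feeds both the content and the gate branches, the gradient of a gated linear unit decomposes as:

```latex
% Gradient of a gated linear unit, in the simplified notation of
% Dauphin et al., where the same activation X feeds both branches:
\nabla\big[X \otimes \sigma(X)\big]
  = \nabla X \otimes \sigma(X) + X \otimes \sigma'(X)\,\nabla X
```

The first term passes the upstream gradient through scaled only by the gate value σ(X), with no derivative of a squashing nonlinearity applied to it; LSTM-style tanh(X) ⊗ σ(X) gating, by contrast, downscales the gradient through both tanh′ and σ′.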

Since their introduction, GLUs and their variants have become widely adopted across modern deep learning architectures. The SwiGLU variant — which replaces the sigmoid gate with the Swish activation function — has been incorporated into large language models such as LLaMA and PaLM, where it consistently improves performance over standard feed-forward layers. These variants follow the same structural logic but tune the nonlinearity of the gate to better suit the optimization landscape of very deep transformers.
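
Swish (also called SiLU) is defined as x · σ(x). As a sketch of how this looks in practice, a LLaMA-style feed-forward block computes W_down(SiLU(W_gate · x) ⊗ W_up · x); the class and parameter names below are illustrative, assuming PyTorch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with a SwiGLU gate, as used in LLaMA-style models."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gated branch
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # content branch
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swish/SiLU replaces the sigmoid of a plain GLU on the gate branch.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

ffn = SwiGLUFeedForward(d_model=64, d_hidden=256)
y = ffn(torch.randn(2, 10, 64))  # shape preserved: (2, 10, 64)
```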

GLUs matter because they offer a principled, parameter-efficient way to introduce conditional computation into neural networks. Rather than processing all features uniformly, a GLU layer learns to route information based on context, improving both representational capacity and training stability. Their compatibility with modern hardware and their demonstrated gains in large-scale language modeling have made them a standard component in state-of-the-art model designs.

Related

Gating Mechanism

A learned control system that selectively regulates information flow through a neural network.

Generality: 781
ReLU (Rectified Linear Unit)

An activation function that outputs its input if positive, otherwise zero.

Generality: 816
LSTM (Long Short-Term Memory)

A recurrent neural network architecture that learns long-range dependencies in sequential data.

Generality: 838
Semantic Logic Gates

Neural components that perform logical operations directly over distributed semantic representations.

Generality: 293
RNN (Recurrent Neural Network)

Neural networks with feedback connections that process sequential data using internal memory.

Generality: 838
LNN (Liquid Neural Network)

A recurrent neural network that continuously adapts its internal state to process time-varying data.

Generality: 339