Envisioning is an emerging technology research institute and advisory.


xLSTM (Extended Long Short-Term Memory)

A modernized LSTM architecture with exponential gating and parallelizable memory structures.

Year: 2024 · Generality: 420

xLSTM, or Extended Long Short-Term Memory, is a deep learning architecture introduced in 2024 that revisits and substantially upgrades the classical LSTM design to compete with Transformer-based models at scale. Where traditional LSTMs rely on sigmoid-based gating and sequential memory updates that bottleneck parallelization, xLSTM introduces exponential gating with numerical stabilization and two new cell variants — sLSTM and mLSTM — each addressing different computational trade-offs. The sLSTM cell enhances memory mixing through scalar updates, while the mLSTM cell replaces the scalar memory with a fully parallelizable matrix memory structure, enabling efficient training on modern hardware accelerators.
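The stabilized exponential gating described above can be sketched for a scalar (sLSTM-style) cell. This is an illustrative simplification, not the paper's full implementation: the function name, argument layout, and use of plain floats are assumptions for clarity. The key idea it shows is the log-space stabilizer that keeps the exponential gates from overflowing while a normalizer state keeps the output bounded.

```python
import math

def slstm_step(c, n, m, z, i_pre, f_pre, o):
    """One sLSTM-style scalar memory update with stabilized exponential gating.

    c: cell state, n: normalizer state, m: stabilizer (running log-space max),
    z: candidate input, i_pre / f_pre: pre-activation input/forget gates,
    o: output gate activation in (0, 1). Names are illustrative.
    """
    # Stabilizer: subtracting the running max keeps exp() arguments bounded,
    # so exponential gates never overflow (a log-sum-exp-style trick).
    m_new = max(f_pre + m, i_pre)
    i_gate = math.exp(i_pre - m_new)       # stabilized exponential input gate
    f_gate = math.exp(f_pre + m - m_new)   # stabilized exponential forget gate
    c_new = f_gate * c + i_gate * z        # cell state update
    n_new = f_gate * n + i_gate            # normalizer accumulates gate mass
    h = o * (c_new / n_new)                # normalized hidden output
    return c_new, n_new, m_new, h
```

Because the input gate is exponential rather than sigmoid-bounded, a single strong input can dominate the normalizer and effectively overwrite stale memory, which is the "aggressive revision" property the architecture targets.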

The architectural innovations in xLSTM are motivated by the practical limitations that prevented classical LSTMs from scaling to the billions of parameters now common in large language models. Exponential gating allows the model to revise stored information more aggressively than sigmoid gates permit, improving its ability to correct past memory states — a known weakness of standard LSTMs. The matrix memory in mLSTM dramatically expands the model's storage capacity per layer without sacrificing the recurrent inductive bias that makes sequence modeling efficient on long-range dependencies.
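The matrix memory of mLSTM can likewise be sketched as a key-value store updated by outer products: each step writes a value vector along a key direction and reads it back with a query, with a lower-bounded normalizer for stability. This is a minimal single-step sketch assuming scalar gates and hypothetical function/argument names; the real cell derives keys, values, queries, and gates from learned projections.

```python
import numpy as np

def mlstm_step(C, n, k, v, q, i_gate, f_gate, o_gate):
    """One mLSTM-style matrix memory update (illustrative sketch).

    C: (d, d) matrix memory, n: (d,) normalizer vector,
    k, v, q: key, value, query vectors of dimension d,
    i_gate / f_gate: scalar gates, o_gate: (d,) output gate.
    """
    d = k.shape[0]
    k = k / np.sqrt(d)                            # scale key, as in attention
    C_new = f_gate * C + i_gate * np.outer(v, k)  # store value along key direction
    n_new = f_gate * n + i_gate * k               # normalizer accumulates keys
    denom = max(abs(n_new @ q), 1.0)              # normalization, bounded below
    h = o_gate * (C_new @ q) / denom              # retrieve memory by query
    return C_new, n_new, h
```

Storing a d×d matrix per head is what expands capacity relative to the classical scalar cell, and because the update needs no dependence on the previous hidden output, the per-step recurrences can be computed in parallel across the sequence during training.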

xLSTM matters because it challenges the prevailing assumption that Transformers are the only viable architecture for large-scale sequence modeling. Transformers carry quadratic attention complexity with respect to sequence length, which becomes costly for very long contexts. xLSTM's recurrent structure offers linear scaling in sequence length, making it attractive for applications where memory efficiency and throughput are critical. Early benchmarks suggest xLSTM models are competitive with similarly sized Transformers and state-space models like Mamba on language modeling tasks.
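The scaling contrast can be made concrete with a back-of-envelope cost model (constants and exact per-layer details ignored; this is an assumption-laden illustration, not a benchmark): self-attention does work for every token pair, while a recurrent matrix memory does a fixed amount of work per token.

```python
def attention_cost(seq_len, d):
    # Self-attention: every token attends to every other -> O(T^2 * d)
    return seq_len * seq_len * d

def recurrent_cost(seq_len, d):
    # Matrix-memory recurrence: constant work per token -> O(T * d^2)
    return seq_len * d * d
```

Doubling the context length quadruples the attention estimate but only doubles the recurrent one, which is why linear-scaling architectures become attractive as contexts grow very long.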

The development of xLSTM represents a broader trend of revisiting classical recurrent architectures with modern training techniques, hardware-aware design, and scaled experiments. Its emergence signals that the architectural landscape for sequence modeling remains open, and that recurrence — long considered superseded — may still offer meaningful advantages in specific regimes of scale and application.

Related

LSTM (Long Short-Term Memory)

A recurrent neural network architecture that learns long-range dependencies in sequential data.

Generality: 838
L2M (Large Memory Model)

A decoder-only Transformer with addressable auxiliary memory enabling reasoning far beyond its attention window.

Generality: 189
Memory Extender

Systems and techniques that expand how much information an AI model can retain and access.

Generality: 520
RNN (Recurrent Neural Network)

Neural networks with feedback connections that process sequential data using internal memory.

Generality: 838
LNN (Liquid Neural Network)

A recurrent neural network that continuously adapts its internal state to process time-varying data.

Generality: 339
Neural Long-Term Memory Module

An explicit memory subsystem enabling neural networks to store and retrieve information persistently.

Generality: 441