
EMT (Extended Mind Transformer)

A transformer architecture that augments self-attention with external memory retrieval for longer context.

Year: 2023 · Generality: 107

The Extended Mind Transformer (EMT) is a neural architecture that enhances the standard transformer by integrating an external memory system, allowing the model to retrieve and attend to information stored outside its fixed context window. Inspired by the philosophical concept of the "extended mind" — the idea that cognition can incorporate external tools and environments — EMT treats external memory as a functional extension of the model's reasoning process rather than a separate lookup table. This design philosophy distinguishes EMT from simpler retrieval-augmented generation (RAG) approaches by tightly coupling retrieval with the attention mechanism itself.

At a technical level, EMT augments the self-attention process with a k-nearest neighbor (kNN) retrieval step. During inference, the model queries an external memory store using learned representations, retrieves the top-k most relevant entries, and incorporates them directly into the attention computation alongside the standard in-context tokens. This means the model can effectively "attend" to a vastly larger pool of information than its context window would otherwise permit, without the quadratic computational cost of extending self-attention over extremely long sequences. The retrieval is differentiable and integrated into the model's decision-making, making it more adaptive than post-hoc retrieval pipelines.
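A minimal sketch of that idea follows. It shows a single attention step in which the top-k entries retrieved from an external memory are concatenated with the ordinary in-context keys and values before the softmax. The names (`extended_attention`, `memory_keys`, `top_k`) and the dot-product similarity are illustrative assumptions for this sketch, not the published EMT implementation, which works over per-head cached key/value representations.

```python
# Illustrative sketch of an extended-mind attention step (single head, single
# query position). All names and the similarity measure are assumptions made
# for this example, not taken from a specific EMT codebase.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def extended_attention(query, ctx_keys, ctx_values, memory_keys, memory_values, top_k=4):
    """Attend jointly over in-context tokens and the top-k retrieved memory entries."""
    d = query.shape[-1]

    # kNN retrieval: score every external memory entry against the query and
    # keep only the k most similar ones (dot-product similarity here).
    mem_scores = memory_keys @ query
    idx = np.argsort(mem_scores)[-top_k:]
    retrieved_keys = memory_keys[idx]
    retrieved_values = memory_values[idx]

    # Concatenate retrieved entries with the ordinary context so the softmax
    # normalizes over both: memory and context are attended to together.
    keys = np.concatenate([ctx_keys, retrieved_keys], axis=0)
    values = np.concatenate([ctx_values, retrieved_values], axis=0)

    weights = softmax(keys @ query / np.sqrt(d))
    return weights @ values

# Toy usage: 8 in-context tokens, 1,000 external memory entries, dimension 16.
rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=d)
out = extended_attention(
    q,
    ctx_keys=rng.normal(size=(8, d)),
    ctx_values=rng.normal(size=(8, d)),
    memory_keys=rng.normal(size=(1000, d)),
    memory_values=rng.normal(size=(1000, d)),
)
print(out.shape)  # (16,)
```

The point of the concatenation is that the retrieved entries compete with in-context tokens inside the same softmax, which is what distinguishes this coupling from a pipeline that merely prepends retrieved text to the prompt.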

EMT addresses a fundamental limitation of transformer models: the trade-off between context length and computational feasibility. Tasks requiring long-range dependencies — such as multi-document reasoning, extended dialogue, or complex code generation — often exceed practical context limits. By offloading some of this burden to an external memory that can be queried efficiently, EMT enables models to maintain relevant information across inputs far longer than their native context windows support. This makes it particularly valuable for enterprise and research applications where comprehensive context is critical.
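Continuing the sketch above, the offloading can be pictured as a split: only the most recent tokens stay inside the quadratic self-attention window, while older tokens become external memory reached via retrieval. This reuses the illustrative `extended_attention` from the previous block; the window size and split are again assumptions for the example, not parameters of the published architecture.

```python
# Sketch of the context/memory split: recent tokens stay in-context, older
# tokens are offloaded to external memory. Depends on `extended_attention`
# and numpy as defined in the previous sketch.
def attend_over_long_input(query, all_keys, all_values, window=8, top_k=4):
    ctx_keys, mem_keys = all_keys[-window:], all_keys[:-window]
    ctx_values, mem_values = all_values[-window:], all_values[:-window]
    return extended_attention(query, ctx_keys, ctx_values, mem_keys, mem_values, top_k)

# A 10,000-token input far exceeds the 8-token "window" here, yet each
# attention step only scores 8 context keys plus the top-4 retrieved memories.
rng = np.random.default_rng(1)
d = 16
keys = rng.normal(size=(10_000, d))
values = rng.normal(size=(10_000, d))
print(attend_over_long_input(rng.normal(size=d), keys, values).shape)  # (16,)
```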

The architecture gained meaningful attention in the machine learning community around 2023, with work from researchers at Normal Computing formalizing the approach and demonstrating its advantages over both vanilla transformers and standard RAG systems. EMT represents a broader trend in the field toward hybrid architectures that combine parametric knowledge stored in model weights with non-parametric, dynamically retrievable external information.

Related

Memory Extender
Systems and techniques that expand how much information an AI model can retain and access.
Generality: 520

L2M (Large Memory Model)
A decoder-only Transformer with addressable auxiliary memory enabling reasoning far beyond its attention window.
Generality: 189

Neural Long-Term Memory Module
An explicit memory subsystem enabling neural networks to store and retrieve information persistently.
Generality: 441

Memory Systems
Architectures that enable AI models to store, retrieve, and reason over information.
Generality: 753

MoT (Mixture of Transformers)
An architecture combining multiple specialized transformers to capture richer, more diverse representations.
Generality: 337

Transformer
A neural network architecture using self-attention to process sequential data in parallel.
Generality: 900