Encoder-Decoder Transformer

A transformer architecture that encodes input sequences and decodes them into outputs.

Year: 2017 · Generality: 722

The encoder-decoder transformer is a neural network architecture built entirely on attention mechanisms, introduced in the landmark 2017 paper "Attention Is All You Need." Unlike earlier sequence-to-sequence models that relied on recurrent neural networks, this architecture processes input tokens in parallel rather than sequentially. The encoder stack reads the full input sequence and produces a rich set of contextual representations, while the decoder stack generates the output sequence one token at a time, attending to both its own previously generated tokens and the encoder's representations through a mechanism called cross-attention.
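
To make this encode-once, decode-step-by-step flow concrete, here is a minimal sketch using PyTorch's nn.Transformer. The weights are untrained and the vocabulary size, start-token id, and output length are arbitrary assumptions, but the control flow mirrors the division of labor described above: the encoder runs once over the full input, while the decoder loop extends the output one token at a time, attending to the encoder's memory at every step.

```python
import torch
import torch.nn as nn

# Toy vocabulary and dimensions; with untrained weights the output is
# meaningless, but the encode/decode split is the real control flow.
vocab, d_model = 1000, 512
embed = nn.Embedding(vocab, d_model)
model = nn.Transformer(d_model=d_model, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)
to_logits = nn.Linear(d_model, vocab)

src_ids = torch.randint(vocab, (12, 1))        # (src_len, batch)
memory = model.encoder(embed(src_ids))         # encode the whole input once

bos_id = 1                                     # assumed start-of-sequence id
out_ids = torch.tensor([[bos_id]])
for _ in range(20):                            # decode one token at a time
    tgt_mask = nn.Transformer.generate_square_subsequent_mask(out_ids.size(0))
    dec = model.decoder(embed(out_ids), memory,
                        tgt_mask=tgt_mask)     # cross-attends to the memory
    next_id = to_logits(dec[-1]).argmax(-1, keepdim=True)  # greedy pick
    out_ids = torch.cat([out_ids, next_id], dim=0)
```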

The encoder consists of multiple identical layers, each containing a multi-head self-attention sublayer and a feed-forward sublayer. Self-attention allows every position in the input to directly attend to every other position, capturing long-range dependencies that RNNs struggled to maintain. The decoder mirrors this structure but adds a cross-attention sublayer that queries the encoder's output, enabling the model to selectively focus on relevant parts of the input when producing each output token. Positional encodings are added to token embeddings to inject sequence order information, since the architecture itself has no inherent notion of position.
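
Both sublayer types reduce to the same core operation, scaled dot-product attention; only the source of the queries, keys, and values differs. The following NumPy sketch omits the learned projections and the multi-head split, and its dimensions are illustrative, but it shows how one function serves as both encoder self-attention and decoder cross-attention, alongside the sinusoidal positional encoding from the original paper.

```python
import numpy as np

def attention(queries, keys_values):
    """Scaled dot-product attention without the learned projections.
    Self-attention: both arguments are the same sequence.
    Cross-attention: queries come from the decoder, keys_values from
    the encoder output."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # all-pairs similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ keys_values                    # blend of attended values

def positional_encoding(seq_len, d):
    """Sinusoidal encodings from the 2017 paper: sine on even dimensions,
    cosine on odd ones, over geometrically spaced frequencies."""
    pos = np.arange(seq_len)[:, None]
    dim = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / d)
    return np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))

enc = np.random.randn(8, 16) + positional_encoding(8, 16)  # encoder states
dec = np.random.randn(3, 16) + positional_encoding(3, 16)  # decoder states
print(attention(enc, enc).shape)  # (8, 16): self-attention in the encoder
print(attention(dec, enc).shape)  # (3, 16): cross-attention in the decoder
```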

This design excels at tasks requiring a mapping between two sequences of potentially different lengths and structures, making it the dominant architecture for machine translation, abstractive summarization, and speech recognition. The parallelizability of attention over sequence positions dramatically accelerated training on modern hardware compared to recurrent alternatives, enabling models to scale to far larger datasets and parameter counts.
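
A short sketch of why training parallelizes, again with PyTorch's nn.Transformer and arbitrary dimensions: under teacher forcing, the decoder receives the entire shifted target sequence in a single forward pass, with a causal mask standing in for the token-by-token loop required at inference time.

```python
import torch
import torch.nn as nn

# One forward pass scores every target position at once; the causal mask
# keeps each position from attending to later ones, preserving the
# autoregressive factorization during training.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)
src = torch.randn(10, 32, 64)   # (src_len, batch, d_model)
tgt = torch.randn(15, 32, 64)   # full target sequence, shifted right
causal = nn.Transformer.generate_square_subsequent_mask(15)
out = model(src, tgt, tgt_mask=causal)  # all 15 positions in one pass
print(out.shape)                        # torch.Size([15, 32, 64])
```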

While later architectures like BERT adopted encoder-only designs and GPT adopted decoder-only designs for different task families, the full encoder-decoder structure remains essential wherever explicit conditioning on a complete input sequence is required. Models such as T5, BART, and mT5 demonstrate its continued relevance, applying the encoder-decoder framework to a broad range of language understanding and generation tasks through large-scale pretraining.
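
As a usage illustration, the Hugging Face transformers library exposes T5 directly in this encoder-decoder form; this sketch assumes the t5-small checkpoint and the library's standard tokenizer and generate calls, which are not part of the text above. The task prefix conditions the encoder, and generate() drives the autoregressive decoder.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# T5 frames every task as text-to-text: the prefixed input goes through
# the encoder, and generate() runs the decoder with cross-attention.
tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tok("translate English to German: The house is small.",
             return_tensors="pt")
out = model.generate(inputs.input_ids, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```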

Related

Transformer

A neural network architecture using self-attention to process sequential data in parallel.

Generality: 900
Encoder-Decoder Models

Deep learning architectures that compress input into a representation and generate output.

Generality: 792
Transformer Block

A core neural network module combining self-attention and feedforward layers for sequence modeling.

Generality: 820
Seq2Seq (Sequence-to-Sequence)

A neural architecture that maps variable-length input sequences to variable-length output sequences.

Generality: 794
Self-Attention

A mechanism that lets neural networks weigh relationships between all parts of an input simultaneously.

Generality: 794
Text-to-Text Model

An AI model that transforms natural language input into natural language output.

Generality: 720