Positional Encoding

A method for injecting token order information into sequence models lacking recurrence.

Year: 2017 · Generality: 731

Positional encoding is a technique used in transformer-based neural networks to supply the model with information about where each token appears within an input sequence. Unlike recurrent architectures such as LSTMs, transformers process all tokens in parallel, meaning the model has no built-in sense of order. Without some form of positional signal, a transformer would treat "the cat sat" identically to "sat cat the" — the same tokens, just rearranged. Positional encodings solve this by augmenting each token's embedding with a vector that encodes its position, allowing the model to distinguish tokens based on where they appear.
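To make this order-blindness concrete, the NumPy sketch below (illustrative, not part of this entry) shows that plain self-attention is permutation-equivariant: shuffling the input tokens merely shuffles the output rows, so without a positional signal the model genuinely cannot distinguish "the cat sat" from "sat cat the".

```python
# Minimal sketch: self-attention without positional information is
# permutation-equivariant. Names, shapes, and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                  # embedding dimension (toy size)
tokens = rng.normal(size=(3, d))       # embeddings for "the", "cat", "sat"

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

perm = [2, 1, 0]                       # reorder to "sat cat the"
out = self_attention(tokens)
out_permuted = self_attention(tokens[perm])

# The permuted input yields the same outputs, only reordered.
assert np.allclose(out[perm], out_permuted)
```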

The most common approach, introduced alongside the original transformer architecture in 2017, uses sinusoidal functions of varying frequencies to generate fixed positional vectors. Each dimension of the positional encoding corresponds to a sine or cosine wave at a different frequency, producing a unique fingerprint for each position that the model can learn to interpret. An alternative approach treats positional encodings as learnable parameters, optimized during training just like the weights of any other layer. In both strategies, the positional vectors are added directly to the input embeddings before they enter the attention mechanism, preserving dimensional compatibility while enriching the representation with sequential context.
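A minimal sketch of the sinusoidal scheme is shown below. Function names and shapes are illustrative, but the formula follows the 2017 paper: for dimension pair i at position pos, the encoding uses sin(pos / 10000^(2i/d)) and cos(pos / 10000^(2i/d)).

```python
# Sketch of sinusoidal positional encoding: even dimensions use sine, odd
# dimensions cosine, with wavelengths forming a geometric progression.
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)    # one frequency per dim pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                            # even indices: sine
    pe[:, 1::2] = np.cos(angles)                            # odd indices: cosine
    return pe

# The encoding is simply added to the token embeddings before attention,
# so both share the same dimensionality.
seq_len, d_model = 16, 64
token_embeddings = np.random.default_rng(1).normal(size=(seq_len, d_model))
inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```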

More recent work has explored relative positional encodings, which encode the distance between tokens rather than their absolute positions, and rotary positional embeddings (RoPE), which integrate positional information directly into the attention computation. These advances have proven especially valuable for handling long sequences and for improving generalization to sequence lengths not seen during training. Models like GPT, BERT, and their successors each make distinct choices about positional encoding strategy, and these choices meaningfully affect downstream performance.
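For intuition about the rotary approach, the hedged sketch below rotates consecutive dimension pairs of the queries and keys by an angle proportional to the token's position, so attention dot products depend on relative offsets rather than absolute positions. Function and variable names are illustrative, not drawn from any particular library.

```python
# Sketch of rotary positional embeddings (RoPE): positional information is
# injected by rotating query/key dimension pairs rather than adding a vector.
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate consecutive dimension pairs of x (seq_len, d) by position-dependent angles."""
    seq_len, d = x.shape
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    freqs = base ** (-np.arange(0, d, 2) / d)                # (d/2,) frequencies
    angles = positions * freqs                               # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                          # split into rotation pairs
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated

# Queries and keys are rotated before the attention dot product; values are unchanged.
q = apply_rope(np.random.default_rng(2).normal(size=(16, 64)))
k = apply_rope(np.random.default_rng(3).normal(size=(16, 64)))
scores = q @ k.T / np.sqrt(64)
```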

Positional encoding matters because sequence order is semantically critical in language, music, time-series data, and many other domains. Getting this signal right is foundational to a transformer's ability to model syntax, causality, and temporal structure. As transformer architectures continue to expand into vision, biology, and multimodal tasks, positional encoding strategies are evolving alongside them, remaining an active area of research.

Related

Contextual Embedding

Word representations that dynamically shift meaning based on surrounding context.

Generality: 752
Transformer

A neural network architecture using self-attention to process sequential data in parallel.

Generality: 900
Encoder-Decoder Transformer

A transformer architecture that encodes input sequences and decodes them into outputs.

Generality: 722
Sequence Model

A model that learns patterns and dependencies within ordered data sequences.

Generality: 840
Sequential Models

AI models that process ordered data by capturing dependencies across time or position.

Generality: 795
Sequence Masking

Technique that selectively hides input tokens to control what a model attends to.

Generality: 628