Envisioning is an emerging technology research institute and advisory.



JEPA (Joint Embedding Predictive Architecture)

A self-supervised architecture that predicts representations in embedding space rather than pixel space.

Year: 2022 · Generality: 0.34

Joint Embedding Predictive Architecture (JEPA) is a self-supervised learning framework in which a model learns by predicting abstract representations of data rather than reconstructing raw inputs. Proposed by Yann LeCun as a cornerstone of his vision for human-level AI, JEPA encodes two related views or segments of an input — such as different patches of an image or different time steps in a sequence — into a shared embedding space. A predictor network then learns to map one encoded representation to another, with the key insight being that prediction happens entirely in latent space, not in pixel or token space.

This design choice is deliberate and consequential. Generative models that reconstruct raw inputs must account for every irrelevant detail — the exact texture of a surface, the precise color of a background — which can distract from learning semantically meaningful structure. By predicting in embedding space, JEPA sidesteps this problem: the model is free to discard low-level noise and focus on higher-level patterns that are actually predictive. A stop-gradient or exponential moving average target encoder (similar to techniques used in BYOL and DINO) is typically used to stabilize training and prevent representational collapse.
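The moving parts described above can be sketched in a few lines. This is a minimal NumPy toy, not Meta's implementation: the "encoders" are single tanh layers, the optimizer step is faked with random noise, and all names (`ctx_enc`, `tgt_enc`, `predictor`, `ema_update`) are illustrative. The point is the shape of the computation: the loss is computed between predicted and target *embeddings*, never raw inputs, and the target encoder is updated only by an exponential moving average of the context encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(params, x):
    # single tanh layer standing in for a real encoder/predictor network
    W, b = params
    return np.tanh(x @ W + b)

def init_params(d_in, d_out):
    return [rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out)]

d_in, d_lat = 8, 4
ctx_enc = init_params(d_in, d_lat)       # online context encoder (trained)
tgt_enc = [p.copy() for p in ctx_enc]    # target encoder (EMA copy, no gradients)
predictor = init_params(d_lat, d_lat)    # maps context embedding -> target embedding

def jepa_loss(x_context, x_target):
    z_ctx = layer(ctx_enc, x_context)    # embed the visible view
    z_pred = layer(predictor, z_ctx)     # predict the target's embedding
    z_tgt = layer(tgt_enc, x_target)     # target embedding (treated as stop-gradient)
    return np.mean((z_pred - z_tgt) ** 2)  # loss lives entirely in latent space

def ema_update(tau=0.99):
    # target encoder trails the context encoder; this is the collapse-prevention trick
    for p_t, p_c in zip(tgt_enc, ctx_enc):
        p_t *= tau
        p_t += (1 - tau) * p_c

x_target = rng.normal(size=(16, d_in))
x_context = x_target + 0.1 * rng.normal(size=x_target.shape)  # a corrupted/partial view

loss = jepa_loss(x_context, x_target)
# stand-in for an optimizer step on the context encoder (real training backprops the loss)
ctx_enc[0] += 0.05 * rng.normal(size=ctx_enc[0].shape)
ema_update()
```

In a real I-JEPA setup the two "views" are masked and visible patches of the same image, the encoders are Vision Transformers, and the loss is backpropagated only through the context encoder and predictor; the structure of the loop, however, is the same as above.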

Image-JEPA (I-JEPA), introduced by Meta AI in 2023, demonstrated that this approach could learn strong visual representations without relying on hand-crafted data augmentations, outperforming many contrastive and generative baselines on downstream tasks. Video-JEPA (V-JEPA) extended the framework to temporal prediction across video frames. More broadly, JEPA represents a philosophical departure from the dominant paradigm of large generative models: rather than learning to produce outputs, it learns to predict the world's structure in a compact, abstract form — a property LeCun argues is essential for building efficient, generalizable world models that can support planning and reasoning.

Related

Joint Embedding Architecture
A neural network design that maps multiple data modalities into a shared representational space.
Generality: 0.65

JEST (Joint Example Selection for Multimodal Contrastive Learning)
A multimodal learning method that improves representation quality by strategically selecting training pairs.
Generality: 0.09

Predictive Processing
A framework modeling the brain as a hierarchy that minimizes prediction errors about sensory input.
Generality: 0.69

Spatial Autoencoder
An autoencoder variant that learns compact representations by preserving spatial structure in data.
Generality: 0.39

SAE (Structural Adaptive Embeddings)
Embeddings that dynamically adjust to reflect the structural properties of complex data.
Generality: 0.29

Variational Autoencoder (VAE)
A generative model that learns a structured latent space via probabilistic encoding and decoding.
Generality: 0.72