
GPT (Generative Pre-Trained Transformer)

A transformer-based language model pre-trained to generate coherent, human-like text.

Year: 2018
Generality: 865

GPT (Generative Pre-Trained Transformer) is a class of large language models developed by OpenAI that uses the transformer architecture to generate fluent, contextually coherent text. Unlike earlier sequence models that processed text bidirectionally or relied on recurrence, GPT employs a unidirectional (left-to-right) autoregressive approach: given a sequence of tokens, the model predicts the next token by attending only to preceding context. This is achieved through stacked layers of masked self-attention and feed-forward networks, allowing the model to capture long-range dependencies across thousands of tokens.
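
The masking is simple to make concrete. Below is a minimal single-head sketch in NumPy, with toy dimensions, random weights, and an invented `causal_self_attention` helper; it illustrates the mechanism rather than reproducing OpenAI's implementation.

```python
# Minimal sketch of masked (causal) self-attention, the mechanism that keeps
# GPT unidirectional: position i may attend only to positions 0..i.
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len)
    # Causal mask: entries above the diagonal (future positions) get -inf,
    # so the softmax assigns them exactly zero attention weight.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # (seq_len, d_head)

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 8, 16, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (8, 4): each position summarizes only its own past
```

The upper-triangular mask is what makes generation autoregressive: because position i never sees positions i+1 onward, the same network can be trained on whole sequences in parallel yet sampled from one token at a time.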

The defining feature of GPT is its two-stage training paradigm. In the pre-training phase, the model is trained on massive corpora of internet text using a simple next-token prediction objective, absorbing broad knowledge about language, facts, and reasoning patterns. In the fine-tuning phase, the pre-trained model is adapted to specific downstream tasks — such as summarization, translation, or question answering — with relatively little labeled data. This transfer learning approach proved dramatically more efficient than training task-specific models from scratch, and helped establish pre-training as the dominant paradigm in NLP.
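
The pre-training objective itself is compact: shift the sequence by one and score the model's logits against the true next tokens. The PyTorch sketch below uses a deliberately tiny stand-in model (an embedding plus a linear head; names such as `next_token_loss` are invented here), not OpenAI's implementation.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for an autoregressive LM: an embedding plus a linear head.
# A real GPT inserts stacks of masked self-attention blocks in between.
vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

def model(tokens):                       # (batch, seq) -> (batch, seq, vocab)
    return head(embed(tokens))

def next_token_loss(tokens):
    """Next-token prediction: logits at position i are scored against token i+1."""
    logits = model(tokens[:, :-1])       # predict from every prefix
    targets = tokens[:, 1:]              # targets are the sequence shifted by one
    return F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

batch = torch.randint(0, vocab_size, (4, 16))  # stand-in for a corpus batch
loss = next_token_loss(batch)
loss.backward()                          # gradients flow into embed and head
print(float(loss))                       # ~log(100) ≈ 4.6 at initialization
```

Fine-tuning reuses the same machinery: the pre-trained weights are loaded, and the identical (or a lightly modified) objective is run on a small task-specific corpus.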

The GPT series scaled rapidly across successive versions. GPT-1 (2018) demonstrated that unsupervised pre-training could yield strong task performance. GPT-2 (2019) scaled to 1.5 billion parameters and generated surprisingly coherent long-form text, sparking early debates about misuse risks. GPT-3 (2020), at 175 billion parameters, introduced few-shot and zero-shot prompting as practical techniques, enabling the model to perform novel tasks from natural language instructions alone — without any gradient updates. GPT-4 (2023) further extended capabilities into multimodal reasoning.
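
Few-shot prompting requires no weight updates at all; the task specification lives entirely in the input text. A hypothetical prompt in the style of the GPT-3 paper's translation demonstrations:

```python
# Illustrative few-shot prompt: the pattern is demonstrated in-context and the
# model is expected to continue it. The examples are invented for illustration.
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Passed to the model as ordinary input; a capable model completes the line
# with "fromage", having inferred the task from the two examples alone.
```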

GPT's impact on AI has been profound. It shifted the field toward foundation models: large, general-purpose systems that are adapted rather than retrained for each application. It also catalyzed instruction-tuned variants and models refined with reinforcement learning from human feedback (RLHF), most notably ChatGPT, which brought conversational AI into mainstream use. The architecture and training philosophy pioneered by GPT now underpin a wide ecosystem of competing and derivative models across industry and academia.

Related

bGPT (Byte-Level Transformer)

A GPT variant that processes raw bytes instead of tokenized text or subwords.

Generality: 101
nGPT (Normalized Transformer)

A transformer variant that normalizes representations on a hypersphere for faster, more stable training.

Generality: 101
BERT (Bidirectional Encoder Representations from Transformers)

A transformer-based model that understands language by reading text in both directions simultaneously.

Generality: 834
Transformer

A neural network architecture using self-attention to process sequential data in parallel.

Generality: 900
NTP (Next Token Prediction)

A training objective where language models learn to predict the next token in a sequence.

Generality: 795
Generative AI

AI systems that produce original content by learning patterns from training data.

Generality: 871