A transformer-based language model pre-trained to generate coherent, human-like text.
GPT (Generative Pre-trained Transformer) is a class of large language models developed by OpenAI that uses the transformer architecture to generate fluent, contextually coherent text. Unlike earlier sequence models that processed text bidirectionally or relied on recurrence, GPT employs a unidirectional (left-to-right) autoregressive approach: given a sequence of tokens, the model predicts the next token by attending only to preceding context. This is achieved through stacked layers of masked self-attention and feed-forward networks, allowing the model to capture long-range dependencies across thousands of tokens.
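The causal masking described above can be sketched in a few lines. This is a minimal single-head illustration in NumPy, not GPT's actual implementation (which uses multiple heads, learned biases, and layer normalization); the weight matrices here are random stand-ins for learned parameters:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """One head of masked (causal) self-attention over a sequence x of shape (T, d)."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                 # (T, T) pairwise attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                        # block attention to future positions
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))                       # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = causal_self_attention(x, Wq, Wk, Wv)
```

Because the upper triangle of the score matrix is set to negative infinity before the softmax, every attention weight from a token to a later position is exactly zero, which is what makes left-to-right autoregressive prediction possible.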
The defining feature of GPT is its two-stage training paradigm. In the pre-training phase, the model is trained on massive corpora of internet text using a simple next-token prediction objective, absorbing broad knowledge about language, facts, and reasoning patterns. In the fine-tuning phase, the pre-trained model is adapted to specific downstream tasks — such as summarization, translation, or question answering — with relatively little labeled data. This transfer learning approach proved dramatically more efficient than training task-specific models from scratch, and helped establish pre-training as the dominant paradigm in NLP.
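The next-token prediction objective amounts to shifting the token sequence by one position and minimizing the negative log-likelihood of each target token. The sketch below illustrates this with a count-based bigram model standing in for the transformer; the toy corpus and word-level tokenization are illustrative assumptions (real GPT models use subword tokenization over web-scale corpora):

```python
import numpy as np

# Toy corpus and word-level vocabulary (hypothetical stand-ins).
text = "the cat sat on the mat the cat ran"
tokens = text.split()
vocab = sorted(set(tokens))
ids = np.array([vocab.index(t) for t in tokens])

# Next-token prediction: inputs are tokens 0..T-2, targets are tokens 1..T-1.
inputs, targets = ids[:-1], ids[1:]

# A count-based bigram model plays the role of the neural network here.
V = len(vocab)
counts = np.ones((V, V))                # add-one smoothing so every bigram has p > 0
for a, b in zip(inputs, targets):
    counts[a, b] += 1
probs = counts / counts.sum(axis=1, keepdims=True)   # p(next token | current token)

# Pre-training loss: average negative log-likelihood of the true next token.
nll = -np.mean(np.log(probs[inputs, targets]))
```

A transformer is trained the same way, except that `probs` comes from the network's softmax output conditioned on the full preceding context rather than on a single previous token.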
The GPT series scaled rapidly across successive versions. GPT-1 (2018) demonstrated that unsupervised pre-training could yield strong task performance. GPT-2 (2019) scaled to 1.5 billion parameters and generated surprisingly coherent long-form text, sparking early debates about misuse risks. GPT-3 (2020), at 175 billion parameters, introduced few-shot and zero-shot prompting as practical techniques, enabling the model to perform novel tasks from natural language instructions alone — without any gradient updates. GPT-4 (2023) further extended capabilities into multimodal reasoning.
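Few-shot prompting as described above requires no gradient updates: the task demonstrations are simply placed in the model's input context. A minimal sketch of assembling such a prompt (the translation pairs and formatting are illustrative choices, not a prescribed API):

```python
# Hypothetical few-shot translation examples placed in the prompt context.
examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
query = "plush giraffe"

prompt = "Translate English to French.\n\n"
for en, fr in examples:
    prompt += f"English: {en}\nFrench: {fr}\n\n"
prompt += f"English: {query}\nFrench:"
```

The model then continues the text after the final "French:", inferring the task purely from the pattern of demonstrations in context.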
GPT's impact on AI has been profound. It shifted the field toward foundation models — large, general-purpose systems adapted rather than retrained for each application. It also catalyzed the development of instruction-tuned and reinforcement-learning-from-human-feedback (RLHF) variants, most notably ChatGPT, which brought conversational AI to mainstream use. The architecture and training philosophy pioneered by GPT now underpin a wide ecosystem of competing and derivative models across industry and academia.