Massive neural networks trained on text to understand and generate human language.
A Large Language Model (LLM) is a deep learning system trained on enormous corpora of text data to perform a wide range of natural language tasks. Built on the transformer architecture, LLMs learn statistical relationships between words, phrases, and concepts by predicting the next token in sequence across training corpora containing hundreds of billions or trillions of tokens. The result is a model that encodes rich representations of language, world knowledge, and reasoning patterns within its parameters, which typically number in the billions for modern systems.
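The next-token training objective can be sketched numerically. The toy function below (a minimal illustration, not any particular library's API) converts a model's raw scores into probabilities with a softmax and returns the cross-entropy loss that training drives toward zero:

```python
import math

def next_token_loss(logits, target_index):
    """Cross-entropy loss for predicting a single next token.

    logits: raw scores the model assigns to each vocabulary token.
    target_index: index of the token that actually came next.
    """
    # Softmax: convert raw scores into a probability distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Loss is the negative log-probability of the correct token;
    # training adjusts parameters to make this small across the corpus.
    return -math.log(probs[target_index])

# Toy vocabulary of 4 tokens; the model strongly favors token 2.
loss = next_token_loss([1.0, 0.5, 3.0, -1.0], target_index=2)
```

Real systems compute this loss in parallel over every position of every sequence in a batch, but the quantity being minimized is the same.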
LLMs work by processing input text as a sequence of tokens and using self-attention mechanisms to weigh relationships between all tokens simultaneously. During training, the model adjusts its parameters to minimize prediction error across massive datasets drawn from books, websites, code repositories, and other sources. After pretraining, models are often fine-tuned or aligned using techniques like reinforcement learning from human feedback (RLHF) to make outputs more accurate, helpful, and safe. Inference involves sampling from the model's probability distribution over possible next tokens, producing fluent, contextually appropriate text.
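The sampling step at inference time can be sketched as follows. This is an illustrative temperature-sampling routine under simplified assumptions (a plain categorical draw, no top-k or nucleus filtering); the parameter names are our own:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Sample a token index from the model's next-token distribution.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more diverse output).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: walk the cumulative distribution.
    rng = random.Random(seed)
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point rounding
```

Generation repeats this step: sample a token, append it to the context, and feed the extended sequence back into the model until a stop condition is met.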
The practical capabilities of LLMs are remarkably broad: they can answer questions, summarize documents, translate languages, write and debug code, draft creative content, and engage in multi-turn dialogue. Performance scales predictably with model size, dataset size, and compute — a relationship formalized in scaling laws research — which has driven a sustained push toward ever-larger models. GPT-2 (2019), GPT-3 (2020), and subsequent systems like PaLM, LLaMA, and GPT-4 demonstrated successive leaps in capability that surprised even their creators.
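The scaling relationship mentioned above can be made concrete with a Chinchilla-style loss formula, L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The constants below are the approximate fits reported by Hoffmann et al. (2022); treat them as illustrative rather than exact:

```python
def scaling_law_loss(n_params, n_tokens,
                     E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss L(N, D) = E + A/N^alpha + B/D^beta.

    E is the irreducible loss floor; the other terms shrink as the
    model (N) and dataset (D) grow. Constants are approximate fits
    from the Chinchilla paper, used here for illustration only.
    """
    return E + A / n_params ** alpha + B / n_tokens ** beta

# Scaling both parameters and data by 10x lowers the predicted loss:
small = scaling_law_loss(1e9, 20e9)     # 1B params, 20B tokens
large = scaling_law_loss(10e9, 200e9)   # 10B params, 200B tokens
```

The smooth, predictable decline of this curve is what let labs forecast the gains from each successive model generation.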
LLMs have become foundational infrastructure for modern AI applications, powering products used by hundreds of millions of people. They also raise important questions around factual accuracy, bias amplification, intellectual property, and misuse. Understanding their capabilities and limitations — including tendencies to hallucinate plausible-sounding but false information — is essential for deploying them responsibly. As the dominant paradigm in NLP and an increasingly central component of multimodal AI systems, LLMs represent one of the most consequential developments in the history of artificial intelligence.