A modeling approach that predicts each sequence element from its preceding values.
Autoregressive prediction is a fundamental technique in machine learning and statistical modeling in which each value in a sequence is predicted as a function of its previous values. The term "autoregressive" reflects the self-referential nature of the approach: the model regresses a variable on its own past observations rather than on independent external inputs. This structure makes autoregressive models naturally suited to any domain where order and temporal context matter, including language, audio, time series, and video.
In practice, autoregressive models generate sequences one step at a time. At each step, the model takes previously generated or observed values as input and produces a probability distribution over possible next values, from which a prediction or sample is drawn. This output then becomes part of the conditioning context for the following step. Classic statistical formulations such as AR, ARMA, and ARIMA models capture this idea with linear equations and explicit lag terms. Modern deep learning approaches replace these linear assumptions with neural networks (recurrent architectures such as LSTMs and, more recently, Transformer-based models) that can learn complex, long-range dependencies across sequences.
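To make the classical formulation concrete, here is a minimal sketch that simulates and fits a toy AR(2) model with ordinary least squares, using only NumPy. The coefficient names (c, phi1, phi2) and the simulated series are illustrative assumptions, not output from any statistics library.

```python
import numpy as np

# Minimal sketch of an AR(2) model: x_t = c + phi1 * x_{t-1} + phi2 * x_{t-2} + noise.
# Coefficients and data here are illustrative assumptions for demonstration only.

rng = np.random.default_rng(0)

# Simulate a toy series from known coefficients so the fit can be checked.
true_c, true_phi1, true_phi2 = 0.1, 0.6, 0.3
x = np.zeros(500)
for t in range(2, len(x)):
    x[t] = true_c + true_phi1 * x[t - 1] + true_phi2 * x[t - 2] + 0.1 * rng.standard_normal()

# Fit by ordinary least squares: regress x_t on a constant and its own two lags.
X = np.column_stack([np.ones(len(x) - 2), x[1:-1], x[:-2]])  # rows: [1, x_{t-1}, x_{t-2}]
y = x[2:]
c, phi1, phi2 = np.linalg.lstsq(X, y, rcond=None)[0]

# One-step-ahead prediction from the two most recent observations.
prediction = c + phi1 * x[-1] + phi2 * x[-2]
print(f"fitted: c={c:.3f}, phi1={phi1:.3f}, phi2={phi2:.3f}; next-step prediction={prediction:.3f}")
```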
Autoregressive prediction became central to large-scale language modeling with the rise of models like GPT, which are trained to predict the next token in a sequence across massive text corpora. This simple objective, applied at scale, yields models capable of coherent text generation, code synthesis, translation, and reasoning. Similar autoregressive frameworks underpin WaveNet for audio synthesis and PixelCNN for image generation, demonstrating the approach's versatility across modalities.
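As a rough illustration of next-token generation, the sketch below substitutes a random bigram logit table for a trained network. This is an assumed toy model that conditions only on the most recent token, whereas GPT-style models attend to the entire context, but the sample-then-feed-back loop has the same structure.

```python
import numpy as np

# Sketch of the generate-and-feed-back loop. The "model" is a random bigram
# logit table, a hypothetical stand-in: it conditions only on the last token,
# whereas a real language model would attend to the full context.

rng = np.random.default_rng(0)
vocab_size = 8
logit_table = rng.standard_normal((vocab_size, vocab_size))  # assumed toy parameters

def next_token_probs(context):
    """Softmax over possible next tokens, conditioned (here) on the last token only."""
    logits = logit_table[context[-1]]
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

tokens = [0]  # conventional start token
for _ in range(10):
    probs = next_token_probs(tokens)
    tokens.append(int(rng.choice(vocab_size, p=probs)))  # sample, then condition on it

print(tokens)
```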
The appeal of autoregressive models lies in their tractability: because each prediction conditions only on observed history, the joint probability of a sequence can be decomposed into a product of conditional probabilities, making training via maximum likelihood straightforward. The key limitation is sequential generation: although all of a sequence's conditionals can be computed in parallel during training (teacher forcing), each inference step depends on the output of the last, making parallelization during generation difficult. Despite this, autoregressive prediction remains one of the most powerful and widely deployed paradigms in modern generative AI.
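The decomposition itself takes only a few lines. Reusing the same hypothetical bigram stand-in as above, the sketch below computes a sequence's joint log-likelihood as the sum of its per-step conditional log-probabilities, which is the quantity maximum-likelihood training maximizes.

```python
import numpy as np

# Sketch of the chain-rule factorization used in maximum-likelihood training:
#   log p(x_1, ..., x_T) = sum over t of log p(x_t | x_1, ..., x_{t-1}).
# The bigram logit table is the same hypothetical stand-in as above.

rng = np.random.default_rng(0)
vocab_size = 8
logit_table = rng.standard_normal((vocab_size, vocab_size))

def log_prob_next(prev, nxt):
    """Numerically stable log-softmax: log p(x_t = nxt | x_{t-1} = prev)."""
    logits = logit_table[prev]
    m = logits.max()
    return logits[nxt] - (m + np.log(np.exp(logits - m).sum()))

sequence = [0, 3, 1, 4]
# The joint log-likelihood is a sum of per-step conditional log-probabilities;
# maximum-likelihood training minimizes its negative, averaged over a corpus.
log_likelihood = sum(log_prob_next(p, n) for p, n in zip(sequence[:-1], sequence[1:]))
print(f"log p(sequence) = {log_likelihood:.3f}")
```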