A model that predicts each next output using its own previous outputs as inputs.
An autoregressive sequence generator is a predictive model that produces outputs sequentially, feeding each generated element back as input for the next prediction. Rather than processing all outputs simultaneously, the model conditions each new prediction on the history of what it has already produced (or, during training, on the true prior values, a scheme known as teacher forcing). This self-referential structure makes autoregressive models naturally suited to domains where order and context matter: time-series forecasting, language modeling, audio synthesis, and image generation.
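As a minimal sketch of that feedback loop, the toy Python snippet below stands in for a trained model; the predict_next rule and the seed values are purely illustrative placeholders, not a real forecasting method:

```python
import numpy as np

def predict_next(history: np.ndarray) -> float:
    # Hypothetical stand-in for any trained model: here, a toy rule that
    # predicts a damped continuation of the most recent value.
    return 0.9 * history[-1]

def generate(seed: np.ndarray, steps: int) -> np.ndarray:
    """Autoregressive rollout: each prediction is appended to the history
    and becomes part of the input for the next prediction."""
    history = list(seed)
    for _ in range(steps):
        history.append(predict_next(np.asarray(history)))
    return np.asarray(history)

print(generate(np.array([1.0, 1.1, 1.3]), steps=5))
```

Swapping in a different predict_next (a recurrent network, a transformer, a fitted AR model) changes the quality of the predictions but not the structure of the loop.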
The mechanics vary by architecture, but the core idea is consistent. At each step, the model estimates a probability distribution over possible next values given all previous ones, then samples or selects from that distribution. In classical statistical models like AR(p) or ARIMA, this relationship is expressed as a weighted linear combination of the last p observations. In modern deep learning, the same principle is implemented through recurrent neural networks, transformers, or masked convolutional networks — all of which can capture far richer, nonlinear dependencies across long sequences.
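For the classical case, here is a short sketch of an AR(p) point forecast, assuming the coefficients (phi) and intercept (c) have already been estimated; the numbers shown are illustrative, not fitted values:

```python
def ar_forecast(history, phi, c=0.0, steps=1):
    """Multi-step AR(p) forecast: each prediction is a weighted linear
    combination of the last p values plus an intercept, and is fed back
    into the history for the next step. The noise term is omitted,
    giving a point forecast rather than a sampled trajectory."""
    history = list(history)
    p = len(phi)
    preds = []
    for _ in range(steps):
        window = history[-p:]  # last p observations
        # phi[0] weights the most recent value, phi[1] the one before it, etc.
        x_next = c + sum(w * x for w, x in zip(phi, reversed(window)))
        preds.append(x_next)
        history.append(x_next)  # autoregressive feedback
    return preds

# Illustrative AR(2) with assumed coefficients phi_1 = 0.6, phi_2 = 0.3
print(ar_forecast([1.2, 1.5, 1.7, 2.0], phi=[0.6, 0.3], steps=3))
```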
Autoregressive models became central to deep learning after researchers demonstrated that neural language models could generate coherent, fluent text by predicting one token at a time. The transformer-based GPT family exemplifies this approach at scale: each token is predicted from all preceding tokens using causally masked self-attention, enabling the model to draw on long-range context efficiently. Similar autoregressive designs power WaveNet for audio and PixelCNN for images, showing the paradigm's versatility across modalities.
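As an illustration of token-by-token prediction, the sketch below runs a greedy decoding loop, assuming the Hugging Face transformers library and the publicly available gpt2 checkpoint; model.generate() performs the same loop internally, but writing it out makes the autoregression explicit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The weather today is", return_tensors="pt").input_ids

# Greedy decoding: at each step the model scores every vocabulary item given
# the current context, the highest-scoring token is appended, and the grown
# sequence becomes the context for the next step.
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits              # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```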
The primary trade-off of autoregressive generation is speed: because each output depends on all of those before it, generation is inherently sequential and cannot be trivially parallelized at inference time. This has motivated research into speculative decoding, distillation into non-autoregressive models, and other acceleration strategies. Despite this limitation, autoregressive generators remain among the most powerful and widely deployed generative architectures in modern AI, largely because their training objective, predicting the next element, is simple, scalable, and produces models with strong generalization.