Forecasting the next item(s) in a sequence by learning patterns from prior observations.
Sequence prediction is the task of inferring one or more future elements in an ordered series based on patterns learned from preceding elements. The input can be discrete tokens — such as words, characters, or genomic bases — or continuous values like sensor readings and stock prices. What distinguishes sequence prediction from standard regression or classification is the explicit dependence on temporal or positional order: the model must account not just for individual inputs but for how they relate across time.
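The idea can be made concrete with the simplest possible sequence model: a bigram (first-order Markov) predictor that counts which token follows each token in the training data and predicts the most frequent successor. This is an illustrative sketch, not any particular library's API; the function names are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training sequence."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    return followers

def predict_next(followers, token):
    """Predict the most frequent successor seen in training, or None if unseen."""
    if token not in followers:
        return None
    return followers[token].most_common(1)[0][0]

tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

Even this toy model exhibits the defining property of sequence prediction: the output depends on positional context ("what came before"), not on any single input in isolation. Modern models extend the conditioning window from one preceding token to, in principle, the entire history.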
Modern sequence prediction relies heavily on architectures designed to capture these temporal dependencies. Recurrent Neural Networks (RNNs) introduced a hidden state that persists across time steps, allowing information to flow from past inputs to future predictions. Long Short-Term Memory (LSTM) networks, introduced by Hochreiter and Schmidhuber in 1997, extended this by adding gating mechanisms that selectively retain or discard information, addressing the vanishing gradient problem that plagued earlier RNNs. More recently, Transformer-based models have largely supplanted recurrent architectures for many sequence tasks: self-attention models dependencies across arbitrary distances in a sequence without processing it one step at a time, which also makes training far more parallelizable.
Sequence prediction underpins a wide range of practical applications. In natural language processing, it powers next-word prediction, machine translation, and autoregressive text generation. In time-series analysis, it enables demand forecasting, anomaly detection, and financial modeling. In biology, it supports protein structure prediction and genomic sequence analysis. The breadth of these applications reflects how fundamental ordered structure is to real-world data.
The field gained significant momentum in the early 2010s as deep learning matured and large datasets became available. The introduction of sequence-to-sequence (seq2seq) models around 2014 was particularly influential, enabling variable-length input and output sequences and opening the door to neural machine translation and speech synthesis at scale. Today, large language models trained on next-token prediction objectives represent the state of the art, demonstrating that sequence prediction, when scaled sufficiently, can serve as a general-purpose learning framework.