A model that learns patterns and dependencies within ordered data sequences.
A sequence model is a class of machine learning models designed to process, analyze, and generate data where order matters — such as text, audio, time series, or genomic data. Unlike models that treat inputs as independent and interchangeable, sequence models explicitly account for the temporal or positional relationships between elements, recognizing that the meaning or value of any single element often depends on what came before or after it. This makes them essential for tasks like language modeling, speech recognition, machine translation, and protein structure prediction.
The core challenge sequence models address is learning dependencies across varying distances within a sequence. Early approaches used Recurrent Neural Networks (RNNs), which process inputs one step at a time while maintaining a hidden state that carries information forward. This allows the model to accumulate context, but vanilla RNNs struggle with long-range dependencies due to the vanishing gradient problem — gradients shrink exponentially as they propagate back through many time steps. Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs) addressed this with gating mechanisms that selectively retain or discard information, enabling the model to learn dependencies spanning hundreds of steps.
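To make the contrast concrete, here is a minimal NumPy sketch of a vanilla RNN update next to an LSTM-style gated update. The shapes, variable names, and omission of bias terms in the LSTM are illustrative assumptions, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8                      # arbitrary input and hidden sizes

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """Vanilla RNN: the new hidden state mixes the input with the previous state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """LSTM: gates decide what to forget, what to write, and what to expose."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(z @ params["Wf"])     # forget gate: keep or discard old cell content
    i = sigmoid(z @ params["Wi"])     # input gate: how much new content to write
    o = sigmoid(z @ params["Wo"])     # output gate: how much of the cell to expose
    g = np.tanh(z @ params["Wg"])     # candidate cell content
    c = f * c_prev + i * g            # cell state carries long-range information
    h = o * np.tanh(c)                # hidden state is a gated view of the cell
    return h, c

# Run a short sequence through both updates.
W_x = rng.normal(size=(d_in, d_h)) * 0.1
W_h = rng.normal(size=(d_h, d_h)) * 0.1
b = np.zeros(d_h)
lstm_params = {k: rng.normal(size=(d_in + d_h, d_h)) * 0.1
               for k in ("Wf", "Wi", "Wo", "Wg")}

h_rnn = np.zeros(d_h)
h_lstm, c_lstm = np.zeros(d_h), np.zeros(d_h)
for t in range(5):
    x_t = rng.normal(size=d_in)
    h_rnn = rnn_step(x_t, h_rnn, W_x, W_h, b)
    h_lstm, c_lstm = lstm_step(x_t, h_lstm, c_lstm, lstm_params)
```

The key difference is the additive cell update `c = f * c_prev + i * g`: because information can pass through largely unchanged when the forget gate stays near one, gradients decay far more slowly than through the repeated tanh-and-matrix-multiply path of the vanilla RNN.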
The introduction of the Transformer architecture in 2017 marked a turning point for sequence modeling. Rather than processing tokens sequentially, Transformers apply self-attention mechanisms that directly compute relationships between all pairs of positions in a sequence simultaneously. This parallelism dramatically accelerates training, and because any two positions are connected in a single attention step, the model can capture long-range dependencies without squeezing everything through a step-by-step recurrent state. Transformers became the foundation for large language models like BERT and GPT, which have set state-of-the-art benchmarks across virtually every sequence-based NLP task.
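The heart of that mechanism is scaled dot-product attention. Below is a minimal single-head sketch in NumPy, with no masking or multi-head splitting; the dimensions and weight names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model = 6, 16

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Every position attends to every other position in one matrix product."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarities, scaled
    weights = softmax(scores, axis=-1)        # each row sums to 1 over all positions
    return weights @ V                        # context-mixed representation per position

X = rng.normal(size=(seq_len, d_model))       # token embeddings for one sequence
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # (6, 16): one output vector per position
```

Because the `scores` matrix relates all positions at once, the whole sequence can be processed in parallel on modern hardware, which is what makes training on very long corpora practical.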
Sequence models matter because so much real-world data is inherently sequential. Language, music, financial markets, sensor readings, and biological signals all unfold over time or position, and capturing that structure is critical to making accurate predictions or generating coherent outputs. As architectures have evolved from RNNs to attention-based models, the range and quality of tasks sequence models can handle have expanded dramatically, making them one of the most impactful paradigms in modern machine learning.