A recurrent neural network architecture that learns long-range dependencies in sequential data.
Long Short-Term Memory (LSTM) networks are a specialized type of recurrent neural network (RNN) designed to overcome the fundamental limitation that plagued earlier sequence models: the inability to retain relevant information across long time spans. Standard RNNs suffer from the vanishing gradient problem, where gradients shrink exponentially during backpropagation through time, making it nearly impossible to learn connections between events separated by many steps. LSTMs, introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997, address this by replacing the simple recurrent unit with a more sophisticated memory cell capable of preserving state across hundreds or even thousands of timesteps.
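The exponential shrinkage can be seen in a toy scalar model (an illustration, not from the original paper): backpropagating through T steps of a vanilla RNN multiplies T Jacobians together, so if each factor has magnitude below 1 the gradient decays geometrically with sequence length.

```python
# Toy scalar model of the vanishing gradient problem in a vanilla RNN.
# The gradient through T timesteps contains a product of T recurrent
# Jacobians; with a single scalar recurrent weight w, that product is w**T.
w = 0.9                              # assumed recurrent weight < 1
grads = [w**T for T in (10, 100, 1000)]
print(grads)                         # ~0.35, ~2.7e-5, ~1.7e-46
```

With a weight of 0.9, the gradient signal after 1000 steps is on the order of 1e-46, far below floating-point noise, which is why a standard RNN cannot credit an event 1000 steps back. The LSTM's additive cell-state update sidesteps this product of shrinking factors.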
The architecture's power comes from three learned gating mechanisms that control information flow. The forget gate decides what stored information to discard from the cell state. The input gate determines which new information to write into memory. The output gate controls what portion of the cell state is exposed as the unit's activation. Together, these gates allow the network to selectively remember, update, and expose information, giving it fine-grained control over what it retains across a sequence. This design creates a gradient highway through the cell state that resists vanishing, enabling stable learning over long contexts.
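The gate computations above can be sketched as a single LSTM timestep in NumPy. This is a minimal illustration under assumed conventions (stacked parameter matrices `W`, `U` and bias `b` holding the forget, input, candidate, and output transforms in that order; these names are ours, not from the text):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep. W (4H x X), U (4H x H), and b (4H,) stack the
    parameters for the forget, input, candidate, and output transforms."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four pre-activations at once
    f = sigmoid(z[0:H])                 # forget gate: what stored state to discard
    i = sigmoid(z[H:2*H])               # input gate: which new information to write
    g = np.tanh(z[2*H:3*H])             # candidate values to be written
    o = sigmoid(z[3*H:4*H])             # output gate: what portion of state to expose
    c = f * c_prev + i * g              # additive cell-state update (the gradient highway)
    h = o * np.tanh(c)                  # exposed activation
    return h, c

# Toy usage with random parameters (hypothetical sizes).
rng = np.random.default_rng(0)
X, H = 3, 4                             # input size, hidden size
W = rng.standard_normal((4 * H, X)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                      # run a short input sequence
    h, c = lstm_step(rng.standard_normal(X), h, c, W, U, b)
```

Note that the cell state `c` is updated additively (`f * c_prev + i * g`) rather than being squashed through a nonlinearity at every step; this is what lets gradients flow along the cell state without vanishing.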
LSTMs became a dominant architecture throughout the 2000s and 2010s, achieving state-of-the-art results in speech recognition, machine translation, language modeling, handwriting recognition, and time series forecasting. Variants such as the peephole LSTM and the Gated Recurrent Unit (GRU) refined the original design, while bidirectional LSTMs extended the approach to process sequences in both directions simultaneously. Their ability to model temporal dependencies made them the go-to tool for any task where order and context matter.
Although Transformer-based architectures have largely supplanted LSTMs in natural language processing since the late 2010s, LSTMs remain widely used in domains where sequence length is moderate, computational resources are constrained, or streaming inference is required. They represent a foundational milestone in deep learning, demonstrating that neural networks could be engineered to maintain and exploit long-range memory in a principled, trainable way.