Neural networks with feedback connections that process sequential data using internal memory.
A Recurrent Neural Network (RNN) is a class of neural network architectures designed to handle sequential and time-series data by maintaining an internal hidden state that persists across time steps. Unlike feedforward networks, which process each input independently, RNNs carry their hidden state forward from one time step to the next, effectively giving the network a form of short-term memory. This makes them naturally suited for tasks where context and ordering matter, such as language modeling, speech recognition, machine translation, and time-series forecasting.
At each time step, an RNN takes the current input and the previous hidden state, combines them through learned weight matrices and a nonlinearity, and produces both an output and an updated hidden state. This recurrent connection creates a directed cycle through time, allowing information, in principle, to persist across arbitrarily long sequences. In practice, however, standard RNNs struggle to retain information over long distances due to the vanishing gradient problem: as gradients are backpropagated through many time steps, they shrink exponentially, making it difficult to learn long-range dependencies. The complementary exploding gradient problem, where gradients grow uncontrollably, can also destabilize training.
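To make the update concrete, here is a minimal sketch of one recurrent step in NumPy, followed by a schematic view of why gradients shrink. Everything here is illustrative: the tanh nonlinearity is the common default, and the weight names (W_xh, W_hh, b_h) and toy dimensions are assumptions rather than any particular library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16

# Learned parameters (random placeholders here; names are illustrative).
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unrolling over a sequence reuses the same weights at every step;
# h carries context forward and acts as the short-term memory.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # a toy 5-step sequence
    h = rnn_step(x_t, h)

# Vanishing-gradient intuition (schematic): backpropagation multiplies the
# gradient by a Jacobian roughly W_hh.T scaled by tanh'(.) once per time
# step, so its magnitude can shrink geometrically with sequence length.
g = np.ones(hidden_dim)
for _ in range(50):
    g = W_hh.T @ g * (1.0 - h ** 2)  # one backward step through the recurrence
print(np.linalg.norm(g))  # typically vanishingly small after many steps
```

The backward loop is a caricature of backpropagation through time, but it captures the core issue: the same recurrent matrix is applied at every step, so its effect on the gradient compounds exponentially in the sequence length.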
These limitations motivated the development of gated architectures. Long Short-Term Memory (LSTM) networks, introduced by Hochreiter and Schmidhuber in 1997, added explicit memory cells and learnable gates to control what information is stored, forgotten, or passed forward. Gated Recurrent Units (GRUs), proposed by Cho et al. in 2014, offered a simpler variant with comparable performance. These extensions dramatically improved RNNs' ability to model long-range dependencies and made gated RNNs the dominant sequence modeling tools throughout the 2010s.
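As a sketch of how the gates fit together, here is one LSTM step in the same NumPy style, following the standard formulation (forget, input, and output gates plus a candidate update); variable names and dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: three sigmoid gates control what the memory cell
    forgets, stores, and exposes."""
    z = W @ x_t + U @ h_prev + b      # all four pre-activations, stacked
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                    # forget gate: what to erase from the cell
    i = sigmoid(i)                    # input gate: how much new content to write
    o = sigmoid(o)                    # output gate: what to expose as the hidden state
    g = np.tanh(g)                    # candidate content to write
    c_t = f * c_prev + i * g          # additive memory update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Illustrative dimensions; in practice W, U, and b are learned.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The key design choice is the additive cell update c_t = f * c_prev + i * g: because the cell state is not squashed through the recurrent nonlinearity at every step, gradients have a more direct path backward through time, which is what lets gated architectures retain information over longer spans than a vanilla RNN.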
While Transformer-based architectures have largely supplanted RNNs in natural language processing due to their parallelizability and superior handling of long contexts, RNNs remain relevant in settings with strict latency constraints, streaming data, or limited compute — such as embedded systems and real-time signal processing. Their sequential inductive bias also makes them a natural fit for certain scientific and engineering domains where data is inherently ordered and causal.