Internal neural network representations that richly capture complex patterns and long-range dependencies.
In neural networks, hidden states are the intermediate activations that carry learned information through a model as it processes input data. When these states are described as "expressive," it means they have sufficient capacity to encode rich, nuanced features and to capture complex dependencies, including long-range relationships that span many steps in a sequence. Expressiveness is not a binary property but a spectrum: a hidden state that can only represent simple, shallow features sits at the low end, while one that encodes subtle contextual relationships and hierarchical structure is highly expressive.
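As a concrete illustration, the minimal sketch below (assuming PyTorch, with purely illustrative sizes) extracts the hidden states of a plain RNN: one activation vector per time step, plus a final state summarizing the whole sequence.

```python
# Minimal sketch (assumes PyTorch): hidden states are the intermediate
# activations a model produces while reading a sequence.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)

x = torch.randn(1, 10, 8)      # one sequence of 10 steps, 8 features each
outputs, h_n = rnn(x)

print(outputs.shape)  # torch.Size([1, 10, 32]): one hidden state per step
print(h_n.shape)      # torch.Size([1, 1, 32]):  final hidden state
```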
The practical importance of expressive hidden states became especially clear in sequence modeling tasks such as language modeling, machine translation, and speech recognition. Standard recurrent neural networks (RNNs) struggled with the vanishing gradient problem, which caused their hidden states to lose information about distant past inputs — effectively limiting their expressiveness over long sequences. Architectures like Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs) were specifically designed to address this, using learned gating mechanisms to selectively retain, update, or forget information in the hidden state across many time steps.
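The gating idea can be seen in a hand-written GRU step. The sketch below is illustrative rather than a faithful reproduction of any library's parameterization (PyTorch's nn.GRU, for instance, factors its weights differently); the names GRUCellSketch, update, reset, and candidate are invented for this example.

```python
# Sketch of a single GRU step (assumes PyTorch; sizes are illustrative).
# The gates decide how much of the old hidden state to keep versus
# overwrite, which is what lets information survive many time steps.
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.update = nn.Linear(input_size + hidden_size, hidden_size)
        self.reset = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.update(xh))   # update gate: keep vs. rewrite
        r = torch.sigmoid(self.reset(xh))    # reset gate: how much past to consult
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=-1)))
        return (1 - z) * h + z * h_tilde     # interpolate old state and candidate
```

The key design choice is the final interpolation: when the update gate z is near zero, the old hidden state passes through almost unchanged, so information can be retained across long spans without repeated squashing.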
The Transformer architecture further advanced this concept by replacing recurrent hidden states with attention-based representations that can directly relate any two positions in a sequence, enabling highly expressive encodings without the sequential bottleneck of RNNs. In modern large language models, the hidden states at each layer encode increasingly abstract and semantically rich representations — a property that has been studied extensively through probing classifiers and representation analysis. The expressiveness of these states is now understood to be a key driver of downstream task performance.
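For reference, here is a minimal sketch of scaled dot-product attention, the core operation that lets any position read directly from any other (PyTorch assumed; shapes are illustrative, and real Transformers add multiple heads, projections, and masking on top of this):

```python
# Scaled dot-product attention: every position attends to every other,
# so no information has to survive a long chain of recurrent updates.
import math
import torch

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, seq_len, d). scores relates every pair of positions.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)   # per-position attention distribution
    return weights @ v                        # each output mixes all positions
```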
Expressive hidden states matter because they determine what a model can and cannot learn to represent. A model with insufficiently expressive hidden states will fail to capture the structure needed for accurate predictions, regardless of how much data or compute is applied. Researchers continue to study how to measure, improve, and control hidden state expressiveness, as it underpins generalization, transfer learning, and the interpretability of neural network behavior.
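One common measurement technique mentioned above is the probing classifier: fit a simple linear model on frozen hidden states and check how much task-relevant information is linearly decodable from them. The sketch below assumes NumPy and scikit-learn; hidden_states and labels are random stand-ins for real extracted activations and annotations, so the reported accuracy here is only a placeholder.

```python
# Hedged sketch of a probing classifier over frozen hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 256))  # stand-in for extracted activations
labels = rng.integers(0, 2, size=1000)        # stand-in for a linguistic property

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")  # higher => more decodable
```

With real activations, higher probe accuracy at a given layer is typically read as evidence that the property is encoded accessibly in that layer's hidden states, though probing measures decodability rather than what the model actually uses.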