Features or encodings that a model discovers automatically from data rather than being hand-engineered.
Learned representation refers to features or data encodings that a model discovers automatically through training, rather than being designed by humans. Rather than relying on hand-crafted feature engineering — where practitioners explicitly define what dimensions matter — modern AI systems build their own feature hierarchies from raw data. The model adjusts internal parameters to minimize prediction error, and the resulting representation is a product of that optimization rather than human design.
The mechanism unfolds through stacked neural network layers. Lower layers typically learn low-level features such as edges, textures, and local patterns. Each successive layer transforms the previous representation into something more abstract, eventually encoding high-level semantic concepts without explicit supervision for what those concepts should be. This hierarchy emerges from the mathematics of gradient descent applied to a richly parameterized network trained on abundant data. The representation at any given layer is a learned function of the raw input, compressed and structured in ways that serve the network's task.
The tradeoff is power versus opacity. Learned representations scale extraordinarily well — adding data and compute consistently improves performance, and the same learned features can transfer across many tasks. Yet the resulting encodings are difficult to interpret: it is rarely clear which dimensions correspond to what concepts, or how the geometry of the representation space enables generalization. They also require substantially more data and compute than hand-crafted alternatives, and small changes in architecture or data can produce qualitatively different representations with the same downstream performance.
Open questions persist about why these hierarchies take the specific forms they do, how to control or regularize learned representations without hurting accuracy, and whether fundamentally different learning objectives would produce more interpretable or more sample-efficient encodings. There is also active work on aligning learned representations with human-interpretable concepts — probing, circuit analysis, and concept-based architectures all attempt to bridge the gap between what the network learns and what humans can understand.