A discrete computational stage in a neural network; stacked layers transform input representations progressively.
A model layer is a fundamental building block of neural networks — a discrete computational unit that receives input, applies a set of mathematical operations, and passes transformed output to the next stage. Each layer typically consists of learnable parameters (weights and biases) combined with a fixed operation such as a linear transformation, convolution, or attention mechanism, followed by a nonlinear activation function. Stacking multiple layers allows a network to compose simple transformations into increasingly sophisticated functions, enabling it to model complex, high-dimensional relationships in data.
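As a minimal sketch of this structure, the framework-free NumPy snippet below implements a single fully connected layer as y = relu(Wx + b); the class name `DenseLayer`, the initialization scheme, and the choice of ReLU are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

class DenseLayer:
    """A single fully connected layer: y = relu(W @ x + b)."""

    def __init__(self, in_features: int, out_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Learnable parameters: weight matrix W and bias vector b.
        self.W = rng.normal(0.0, in_features ** -0.5, size=(out_features, in_features))
        self.b = np.zeros(out_features)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Fixed operation (an affine transform) followed by a nonlinearity (ReLU).
        z = self.W @ x + self.b
        return np.maximum(z, 0.0)

layer = DenseLayer(in_features=4, out_features=3)
print(layer(np.array([1.0, -2.0, 0.5, 3.0])))  # a transformed 3-dimensional representation
```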
Different layer types are designed for different structural properties of data. Convolutional layers exploit spatial locality and translation invariance, making them well-suited for images and audio. Recurrent layers maintain hidden state across sequential inputs, capturing temporal dependencies in text or time-series data. Fully connected (dense) layers apply global transformations across all input dimensions and are commonly used for final classification or regression stages. More recent architectures introduce transformer layers built around self-attention, which model pairwise relationships across entire input sequences without relying on fixed spatial or temporal structure.
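To make the correspondence concrete, the sketch below instantiates one layer of each family using PyTorch modules; the framework choice and all tensor shapes here are illustrative assumptions, not part of the definitions above.

```python
import torch
import torch.nn as nn

x_img = torch.randn(1, 3, 32, 32)   # one RGB image: (batch, channels, height, width)
x_seq = torch.randn(1, 10, 64)      # one length-10 sequence of 64-dim features

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
rnn = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
dense = nn.Linear(in_features=128, out_features=10)
attn = nn.TransformerEncoderLayer(d_model=64, nhead=8, batch_first=True)

print(conv(x_img).shape)            # (1, 16, 32, 32): local filters slide over space
out, (h, c) = rnn(x_seq)            # hidden state h is carried across time steps
print(dense(h[-1]).shape)           # (1, 10): global transform, e.g. for classification
print(attn(x_seq).shape)            # (1, 10, 64): pairwise self-attention over the sequence
```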
The depth of a network — the number of stacked layers — is one of the most consequential architectural decisions in deep learning. Shallow networks are theoretically capable of approximating arbitrary continuous functions (the universal approximation theorem guarantees this even for a single sufficiently wide hidden layer), but deep networks can achieve the same expressiveness far more efficiently in terms of parameter count. Empirically, depth enables hierarchical feature learning: early layers in a vision model detect edges and textures, middle layers compose these into parts, and later layers represent semantic concepts like faces or objects. Understanding how information flows and transforms across layers remains an active area of interpretability research, as the internal representations learned by deep networks are often opaque despite their practical effectiveness.
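A brief sketch of depth as composition, again assuming PyTorch: each stacked layer consumes the previous layer's output, so the whole model is a chain of simple transformations. The widths 784 → 256 → 128 → 10 are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Depth as composition: each layer's output is the next layer's input.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # early layers: low-level features
    nn.Linear(256, 128), nn.ReLU(),   # middle layers: compositions of those features
    nn.Linear(128, 10),               # final layer: task-specific outputs (e.g. class scores)
)

x = torch.randn(32, 784)              # a batch of 32 flattened 28x28 inputs
logits = model(x)
print(logits.shape)                                 # torch.Size([32, 10])
print(sum(p.numel() for p in model.parameters()))   # total learnable parameter count
```

The final line counts parameters, which is the quantity the expressiveness-versus-depth trade-off above is measured in.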