The process of passing input data through a neural network to produce an output.
Forward propagation is the core computational process by which a neural network transforms input data into a prediction or output. Starting at the input layer, data flows sequentially through each layer of the network toward the output layer. At every layer, each neuron computes a weighted sum of its inputs, adds a bias term, and applies a nonlinear activation function — such as ReLU or sigmoid — to produce its output. This output is then passed forward as input to the next layer, continuing until the final layer produces the network's prediction.
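To make this concrete, here is a minimal sketch of a single neuron's forward computation in plain Python. The specific weights, bias, and ReLU activation are illustrative assumptions, not values from any particular model:

```python
def relu(z):
    # ReLU activation: returns z if positive, else 0
    return max(0.0, z)

def neuron_forward(inputs, weights, bias):
    # Weighted sum of inputs, plus bias, then the nonlinearity
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)

# Example: a neuron with three inputs
output = neuron_forward(inputs=[0.5, -1.0, 2.0],
                        weights=[0.1, 0.4, -0.2],
                        bias=0.05)
print(output)  # 0.0, since the pre-activation (-0.70) is negative
```

The neuron's output then becomes one of the inputs to every neuron in the next layer, which is what lets the per-neuron rule above compose into a full network.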
The mechanics of forward propagation are straightforward but powerful. In matrix form, each layer's transformation can be written as multiplying the incoming activations by a weight matrix, adding a bias vector, and applying an elementwise nonlinearity. This compact representation makes forward propagation highly efficient to compute, especially on modern hardware such as GPUs. For a network with many layers, this sequential chain of transformations lets the model learn increasingly abstract representations of the input data, with early layers capturing low-level features and deeper layers encoding higher-level structure.
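In symbols, layer l computes a_l = f(W_l · a_(l-1) + b_l), where a_(l-1) is the previous layer's activation vector, W_l and b_l are that layer's weight matrix and bias, and f is the nonlinearity. A minimal NumPy sketch of this chained matrix form might look like the following; the layer sizes, random parameters, and choice of ReLU plus identity activations are illustrative assumptions:

```python
import numpy as np

def forward(x, layers):
    """Propagate input x through a list of (W, b, activation) layers."""
    a = x
    for W, b, activation in layers:
        z = W @ a + b      # affine transform: weight matrix times activations, plus bias
        a = activation(z)  # elementwise nonlinearity
    return a

relu = lambda z: np.maximum(z, 0.0)
identity = lambda z: z

rng = np.random.default_rng(0)
# A toy 4 -> 8 -> 3 network with random parameters
layers = [
    (rng.normal(size=(8, 4)), np.zeros(8), relu),
    (rng.normal(size=(3, 8)), np.zeros(3), identity),
]
prediction = forward(rng.normal(size=4), layers)
print(prediction.shape)  # (3,)
```

Because each layer reduces to one matrix multiply and one vector add, the whole pass maps naturally onto batched, parallel hardware, which is why the loop above stays fast even for very deep networks.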
Forward propagation is inseparable from training. During learning, the output it produces is compared against the true target using a loss function, and the resulting error signal is then propagated backward through the network via backpropagation to update the weights. Without forward propagation, there would be no prediction to evaluate and no gradient to compute. At inference time — when the model is deployed — forward propagation is the only computation required, making it the critical path for real-world performance.
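A hedged sketch of how the forward pass feeds training: mean squared error is just one illustrative choice of loss, and `forward` refers to the sketch above rather than any library API:

```python
import numpy as np

def mse_loss(prediction, target):
    # Mean squared error between the network's output and the true target
    return np.mean((prediction - target) ** 2)

# Forward propagation produces the prediction...
#   prediction = forward(x, layers)
# ...which the loss function compares against the target, yielding the
# scalar error that backpropagation will then differentiate.
prediction = np.array([0.2, 0.7, 0.1])
target = np.array([0.0, 1.0, 0.0])
print(mse_loss(prediction, target))  # ~0.0467
```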
The concept became central to machine learning with the resurgence of multilayer perceptrons in the mid-1980s, particularly following the influential 1986 work by Rumelhart, Hinton, and Williams that popularized backpropagation as a training algorithm. Today, forward propagation underpins virtually every deep learning architecture, from convolutional networks for image recognition to transformers for language modeling, and remains one of the most fundamental operations in the field.