A neural network architecture where information flows strictly from input to output.
A feedforward neural network is a foundational architecture in machine learning in which data moves in a single direction — from the input layer, through one or more hidden layers, and finally to the output layer — with no cycles, loops, or feedback connections. This unidirectional flow distinguishes feedforward networks from recurrent architectures, where outputs can cycle back as inputs. The simplest feedforward network is the single-layer perceptron, while deeper variants with multiple hidden layers are commonly called multilayer perceptrons (MLPs) or, more broadly, deep neural networks.
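As a concrete illustration, here is a minimal sketch of this layered, one-directional structure in NumPy. The layer sizes (4 inputs, one hidden layer of 8 units, 3 outputs) and the ReLU activation are illustrative assumptions, not part of the definition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 4 inputs -> 8 hidden units -> 3 outputs.
layer_sizes = [4, 8, 3]
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Propagate an input strictly forward, layer by layer; no feedback connections."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)               # hidden layers: affine transform + nonlinearity
    return h @ weights[-1] + biases[-1]   # output layer: affine transform only

x = rng.standard_normal(4)
print(forward(x, weights, biases))        # a length-3 output vector
```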
During a forward pass, each neuron computes a weighted sum of its inputs plus a bias term, applies a nonlinear activation function such as ReLU or sigmoid, and passes the result to the next layer. This composition of linear transformations and nonlinearities allows a feedforward network with enough hidden units to approximate any continuous function on a compact domain to arbitrary accuracy, a property formalized by the Universal Approximation Theorem. Training is typically performed via backpropagation, an algorithm that computes the gradient of a loss function with respect to each weight by applying the chain rule in reverse through the network, enabling gradient descent optimization.
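The sketch below shows one way the forward pass, the backpropagation of gradients via the chain rule, and a gradient descent update might be written out by hand for a single-hidden-layer network. The sigmoid activation, mean squared error loss, batch size, targets, learning rate, and layer sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden layer: x -> h = sigmoid(x W1 + b1) -> y_hat = h W2 + b2
W1, b1 = rng.standard_normal((4, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)) * 0.1, np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.standard_normal((16, 4))      # a small batch of 16 examples (illustrative)
y = rng.standard_normal((16, 1))      # illustrative regression targets
lr = 0.1

for step in range(100):
    # Forward pass: weighted sums followed by nonlinearities.
    z1 = x @ W1 + b1
    h = sigmoid(z1)
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)  # mean squared error

    # Backward pass: apply the chain rule in reverse through the network.
    d_yhat = 2.0 * (y_hat - y) / len(x)   # dL/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T                   # propagate the gradient back to the hidden layer
    d_z1 = d_h * h * (1.0 - h)            # sigmoid'(z1) = sigmoid(z1) * (1 - sigmoid(z1))
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```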
Feedforward networks matter because they serve as the conceptual backbone for nearly all modern deep learning architectures. Convolutional neural networks, transformers, and other specialized models all incorporate feedforward components — particularly in their fully connected layers and position-wise feed-forward sublayers. Understanding the feedforward mechanism is therefore essential for grasping how information is transformed and abstracted at each stage of a deep model.
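For instance, the position-wise feed-forward sublayer used in transformers is itself a small two-layer feedforward network applied independently at each sequence position. The sketch below assumes illustrative dimensions (d_model = 16, d_ff = 64) rather than any particular model's configuration.

```python
import numpy as np

def position_wise_ffn(x, W1, b1, W2, b2):
    """Apply the same two-layer feedforward network independently at every position.

    x is assumed to have shape (sequence_length, d_model); W1/W2 expand to and
    contract from an inner dimension d_ff, as in transformer-style sublayers.
    """
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(2)
d_model, d_ff, seq_len = 16, 64, 10   # illustrative sizes
x = rng.standard_normal((seq_len, d_model))
W1, b1 = rng.standard_normal((d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)) * 0.1, np.zeros(d_model)
print(position_wise_ffn(x, W1, b1, W2, b2).shape)  # (10, 16)
```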
In practice, feedforward networks are applied across a wide range of tasks including image classification, tabular data modeling, function approximation, and as components within larger systems. Their relative simplicity makes them an ideal starting point for understanding neural computation, while their scalability — enabled by modern hardware and large datasets — means they remain competitive on many real-world benchmarks. The architecture's enduring relevance reflects how much expressive power can emerge from stacking simple, differentiable transformations.