Envisioning is an emerging technology research institute and advisory.


Point-wise Feedforward Network

A transformer sublayer applying identical linear transformations independently to each sequence position.

Year: 2017 · Generality: 660

A point-wise feedforward network (FFN) is a neural network sublayer that applies the same two-layer fully connected transformation independently to every position in a sequence. Unlike attention mechanisms, which mix information across positions, the FFN treats each position in isolation — hence "point-wise" — making it embarrassingly parallel and computationally efficient. The standard formulation applies a linear projection that expands the input dimensionality by a factor (typically four), passes the result through a non-linear activation function such as ReLU or GELU, then projects back down to the original model dimension.
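The formulation above can be sketched directly in numpy. This is a minimal illustration, not a production implementation: the dimensions and random weights are toy values chosen for clarity, with ReLU as the activation.

```python
import numpy as np

def pointwise_ffn(x, W1, b1, W2, b2):
    """Apply the same two-layer MLP independently to every position.

    x: (seq_len, d_model); W1: (d_model, d_ff); W2: (d_ff, d_model).
    """
    h = np.maximum(0.0, x @ W1 + b1)  # expand to d_ff, then ReLU
    return h @ W2 + b2                # project back down to d_model

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5     # toy sizes; the 2017 paper used 512 and 2048
x = rng.normal(size=(seq_len, d_model))
W1 = rng.normal(size=(d_model, d_ff)); b1 = np.zeros(d_ff)
W2 = rng.normal(size=(d_ff, d_model)); b2 = np.zeros(d_model)

y = pointwise_ffn(x, W1, b1, W2, b2)

# "Point-wise" in action: each output row depends only on the matching
# input row, so running a single position alone gives the same result.
y0_alone = pointwise_ffn(x[:1], W1, b1, W2, b2)
assert np.allclose(y[:1], y0_alone)
```

Because every position is processed independently, the whole sequence can be computed as one batched matrix multiply, which is what makes the sublayer embarrassingly parallel.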

In the transformer architecture introduced by Vaswani et al. in 2017, each encoder and decoder block contains one point-wise FFN sublayer alongside the multi-head self-attention sublayer. This expand-then-contract structure, sometimes described as an inverted bottleneck, lets the network learn rich intermediate representations at each position before compressing them back into the residual stream. Because the same weight matrices are shared across all positions within a layer, the FFN functions somewhat like a learned, position-agnostic feature extractor applied uniformly across the sequence.

The FFN sublayer plays a critical role in the overall expressiveness of transformer models. Research has shown that these layers store a surprising amount of factual and relational knowledge, effectively acting as key-value memories where the first linear layer retrieves patterns and the second writes updated representations. This insight has motivated architectural variants such as mixture-of-experts (MoE) layers, which replace the dense FFN with a sparse collection of expert networks, allowing models to scale parameters without proportionally increasing compute.
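The MoE idea can be illustrated with a small sketch: a router scores each position against a set of expert FFNs, and only the top-scoring expert runs for that position. All names and sizes here are illustrative assumptions; real MoE layers add load-balancing losses and often route to more than one expert.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ff, n_experts, seq_len = 8, 16, 4, 6

# One dense FFN (W1, W2) per expert.
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.normal(size=(d_model, n_experts))  # learned router weights (toy)

x = rng.normal(size=(seq_len, d_model))
choice = np.argmax(x @ router, axis=-1)  # top-1 expert index per position

y = np.empty_like(x)
for i, e in enumerate(choice):
    W1, W2 = experts[e]
    # Only one expert's weights are used per position, so compute per token
    # stays constant while total parameters grow with n_experts.
    y[i] = np.maximum(0.0, x[i] @ W1) @ W2
```

The sparsity is the point: total parameter count scales with the number of experts, but each token still pays the cost of a single dense FFN.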

Point-wise feedforward networks matter because they account for a large fraction of a transformer's total parameters and computational cost, making them a primary target for efficiency research. Techniques like low-rank approximation, pruning, and quantization are frequently applied to FFN weights. Their design also influences how models are scaled: the ratio of FFN hidden dimension to model dimension is a key hyperparameter that practitioners tune when balancing capacity against memory and throughput constraints.
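One of the efficiency techniques mentioned above, low-rank approximation, is easy to demonstrate: a truncated SVD factors an FFN weight matrix into two thin matrices, trading a little reconstruction error for far fewer parameters. The matrix and rank below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_ff = 64, 256
W1 = rng.normal(size=(d_model, d_ff))  # stand-in for a trained FFN weight

# Truncated SVD gives the best rank-r approximation in Frobenius norm.
U, s, Vt = np.linalg.svd(W1, full_matrices=False)
r = 16                                 # illustrative rank (assumption)
A = U[:, :r] * s[:r]                   # (d_model, r)
B = Vt[:r]                             # (r, d_ff)

# Storage drops from d_model*d_ff to r*(d_model + d_ff) parameters,
# and x @ W1 is replaced by the cheaper (x @ A) @ B.
full_params = d_model * d_ff
low_rank_params = r * (d_model + d_ff)
rel_err = np.linalg.norm(W1 - A @ B) / np.linalg.norm(W1)
```

Here 16,384 parameters shrink to 5,120, at the cost of whatever spectral mass lies in the discarded singular values; pruning and quantization make analogous trade-offs along different axes.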

Related

Feedforward Neural Network

A neural network architecture where information flows strictly from input to output.

Generality: 838
Transformer Block

A core neural network module combining self-attention and feedforward layers for sequence modeling.

Generality: 820
Transformer

A neural network architecture using self-attention to process sequential data in parallel.

Generality: 900
Attention Network

A neural network that dynamically weights input elements to capture relevant context.

Generality: 796
FCN (Fully Convolutional Network)

A neural network architecture that produces pixel-wise predictions for image segmentation.

Generality: 694
Forward Propagation

The process of passing input data through a neural network to produce output.

Generality: 838