A neural network architecture that produces pixel-wise predictions for image segmentation.
A Fully Convolutional Network (FCN) is a neural network architecture with no fully connected layers: every layer (convolution, pooling, upsampling) operates locally over a spatial grid, and the layers that would traditionally be fully connected are recast as convolutions. As a result, the network accepts input images of arbitrary spatial dimensions and produces dense, pixel-level output maps. This design made FCNs the foundational approach for semantic segmentation, where the goal is to assign a class label to every pixel in an image rather than to produce a single classification score for the image as a whole.
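The equivalence between a fully connected layer and a convolution can be checked numerically. The following is a minimal NumPy sketch, not code from any particular FCN implementation: the layer sizes, the helper `conv_valid`, and the random weights are illustrative assumptions. It reshapes the weights of a fully connected layer into convolution kernels, confirms that both give the same scores on an input of the size the layer was built for, and then applies the same kernels to a larger input to obtain a spatial grid of scores.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, NUM_CLASSES = 3, 4, 5  # illustrative sizes (assumed, not from a real model)

# A fully connected layer that expects a flattened C x K x K feature map.
W_fc = rng.standard_normal((NUM_CLASSES, C * K * K))
b = rng.standard_normal(NUM_CLASSES)

# The same weights viewed as NUM_CLASSES convolution kernels of shape C x K x K.
W_conv = W_fc.reshape(NUM_CLASSES, C, K, K)


def conv_valid(x, w, bias):
    """Stride-1, unpadded convolution (cross-correlation, as in deep learning).

    x: (C, H, W) feature map; w: (O, C, k, k) kernels.
    Returns an array of shape (O, H - k + 1, W - k + 1).
    """
    out_ch, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((out_ch, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            # Contract over the channel, row, and column axes at once.
            out[:, i, j] = np.tensordot(w, patch, axes=3) + bias
    return out


# 1) On an input of exactly the expected size, both layers agree.
x_small = rng.standard_normal((C, K, K))
fc_scores = W_fc @ x_small.reshape(-1) + b
conv_scores = conv_valid(x_small, W_conv, b)
print(np.allclose(fc_scores, conv_scores[:, 0, 0]))

# 2) On a larger input, the convolution yields one score vector per location.
x_large = rng.standard_normal((C, 6, 7))
score_map = conv_valid(x_large, W_conv, b)
print(score_map.shape)
```

With these assumed sizes, the 6×7 input produces a score map of shape (5, 3, 4): one 5-class score vector for each of the 3×4 positions at which the original fully connected layer's receptive field fits. Each of those vectors is what the fully connected layer would have produced on the corresponding crop.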
The key architectural insight behind FCNs is that a fully connected layer can be rewritten as a convolution whose kernel spans the layer's entire input feature map; it reduces to a 1×1 convolution when that input is already 1×1 spatially. Applied to a larger image, the converted layer slides across the feature map and yields a grid of class scores, preserving spatial information that would otherwise be collapsed into a single fixed-length feature vector. To recover the fine-grained spatial resolution lost during pooling and strided convolutions, FCNs upsample the coarse score map, most notably with transposed convolutions, and add skip connections that fuse coarse, high-level semantic features with fine, low-level spatial details from earlier layers. This combination allows the network to produce segmentation maps that are both semantically meaningful and spatially precise.
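The upsample-and-fuse step can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the function names `bilinear_kernel`, `transposed_conv2d`, and `fuse_skip`, the 2× factor, and the random score maps are hypothetical choices. The kernel is fixed to bilinear interpolation here, which is a common initialization for a transposed convolution; a real network would typically learn it.

```python
import numpy as np


def bilinear_kernel(factor):
    """2-D bilinear interpolation kernel for upsampling by an integer factor."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    taps = 1 - np.abs(np.arange(size) - center) / factor
    return np.outer(taps, taps)


def transposed_conv2d(x, kernel, stride):
    """Single-channel transposed convolution.

    Each input value stamps a scaled copy of the kernel onto the output,
    with stamps placed `stride` pixels apart; overlapping stamps are summed.
    """
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    # Trim the border so the result is exactly `stride` times the input size.
    crop = (k - stride) // 2
    return out[crop:out.shape[0] - crop, crop:out.shape[1] - crop]


def fuse_skip(coarse_scores, fine_scores, factor=2):
    """Upsample coarse per-class scores and add the finer-resolution scores."""
    kernel = bilinear_kernel(factor)
    upsampled = np.stack(
        [transposed_conv2d(c, kernel, factor) for c in coarse_scores]
    )
    return upsampled + fine_scores


rng = np.random.default_rng(0)
coarse = rng.standard_normal((3, 4, 4))  # 3 classes on a coarse 4 x 4 grid (assumed)
fine = rng.standard_normal((3, 8, 8))    # scores from an earlier, finer layer (assumed)

fused = fuse_skip(coarse, fine)
labels = fused.argmax(axis=0)            # per-pixel class decision
print(fused.shape, labels.shape)
```

Summing the two score maps is one simple way to realize the fusion. Repeating the step with progressively earlier layers recovers progressively finer detail, at the cost of more parameters and computation.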
FCNs were introduced by Long, Shelhamer, and Darrell in their landmark 2015 paper, "Fully Convolutional Networks for Semantic Segmentation," which demonstrated that end-to-end training on pixel-wise prediction tasks was both feasible and highly effective. The work established a new paradigm for dense prediction tasks and achieved state-of-the-art results on standard benchmarks such as PASCAL VOC. It showed that networks pretrained for image classification on large datasets like ImageNet could be fine-tuned for segmentation after converting their fully connected layers into convolutional equivalents.
The impact of FCNs on computer vision has been substantial and lasting. They serve as the backbone or direct inspiration for a wide range of subsequent architectures, including U-Net (widely used in medical imaging), SegNet, and DeepLab. Applications span autonomous driving, satellite and aerial image analysis, medical image diagnosis, and robotics — any domain where understanding the spatial layout of a scene at pixel resolution is critical. FCNs effectively transformed semantic segmentation from a patchwork of heuristic approaches into a coherent, learnable, end-to-end framework.