Reusing the same weights across spatial positions to detect patterns regardless of location.
Local weight sharing is a foundational design principle in convolutional neural networks (CNNs) in which a single set of learned weights — forming a convolutional filter or kernel — is applied repeatedly across different spatial positions of an input. Rather than assigning independent parameters to every connection in the network, the same filter slides across the input, computing dot products at each location. This dramatically reduces the total number of trainable parameters compared to fully connected architectures, making models far more practical to train on high-dimensional inputs like images.
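A minimal NumPy sketch makes the mechanism concrete (the helper name conv2d_valid, the 28×28 input, and the 3×3 kernel are illustrative assumptions, not details from this article): the same nine weights are reused at every spatial position, so the parameter count is fixed by the kernel size, not by the input size.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel across every spatial position (no padding)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The SAME weights score every location: a dot product between
            # the kernel and the local window of the image at (i, j).
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.rand(28, 28)
kernel = np.random.rand(3, 3)              # 9 shared, trainable parameters
feature_map = conv2d_valid(image, kernel)  # shape (26, 26)

# A fully connected layer producing the same-sized output would need one
# weight per input-output pair instead of 9 shared weights:
print(kernel.size)                    # 9
print(image.size * feature_map.size)  # 784 * 676 = 529,984
```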
The mechanism works by exploiting a key assumption about natural data: that meaningful patterns — edges, textures, shapes — can appear anywhere in an input, and the detector for such a pattern should not need to be re-learned for every possible position. A filter trained to detect a horizontal edge in the top-left corner of an image is equally useful when that edge appears in the center or bottom-right. This property, known as translation equivariance, emerges directly from weight sharing and is one of the primary reasons CNNs generalize so effectively to visual tasks.
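The equivariance can be checked directly. In the sketch below (assuming NumPy and SciPy; the Sobel-style kernel and the wrap-around boundary are illustrative choices), filtering a shifted image gives the same result as shifting the filtered image; with the zero padding typical of CNNs, the equality holds everywhere except near the borders.

```python
import numpy as np
from scipy.ndimage import correlate

# A Sobel-style horizontal-edge detector: one shared 3x3 filter.
kernel = np.array([[-1., -2., -1.],
                   [ 0.,  0.,  0.],
                   [ 1.,  2.,  1.]])

image = np.random.rand(16, 16)

# 'wrap' boundary makes the check exact under circular shifts.
response = correlate(image, kernel, mode='wrap')

shift = (3, 5)  # move the image down 3 rows and right 5 columns
shifted_image = np.roll(image, shift, axis=(0, 1))

# Filtering the shifted image equals shifting the filtered image.
assert np.allclose(correlate(shifted_image, kernel, mode='wrap'),
                   np.roll(response, shift, axis=(0, 1)))
```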
Local weight sharing became central to practical deep learning through Yann LeCun's development of the LeNet architecture in the late 1980s and early 1990s, which applied the principle to handwritten digit recognition with striking success. The approach built on earlier ideas from Kunihiko Fukushima's neocognitron, but LeCun's integration of backpropagation with shared convolutional weights made the technique trainable end-to-end and scalable, a combination that proved decisive for the field.
Beyond image recognition, local weight sharing has influenced architectures across domains — from 1D convolutions in audio and text processing to 3D convolutions in video analysis. The principle also informs modern attention-based models, where locality constraints are sometimes imposed to improve efficiency. Understanding local weight sharing is essential for grasping why CNNs are so parameter-efficient and why the inductive biases built into an architecture can matter as much as raw model capacity.
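In one dimension the principle is identical: a single small kernel slides along the time axis, so a filter learned at one time step applies at all of them. Here is a minimal sketch (the function name conv1d_valid and the smoothing kernel are hypothetical, not drawn from any particular library):

```python
import numpy as np

def conv1d_valid(sequence, kernel):
    """Apply one shared kernel at every time step of a 1D signal."""
    n, k = len(sequence), len(kernel)
    return np.array([np.dot(sequence[t:t + k], kernel)
                     for t in range(n - k + 1)])

signal = np.sin(np.linspace(0, 4 * np.pi, 100))  # toy audio-like waveform
kernel = np.array([0.25, 0.5, 0.25])             # 3 weights shared across time
smoothed = conv1d_valid(signal, kernel)
print(signal.size, smoothed.size, kernel.size)   # 100 98 3
```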