An attention head that identifies and copies repeated token patterns from earlier context.
An induction head is a specific type of attention head found in transformer models that performs a precise in-context pattern-matching operation: given the token at the current position, it looks back through the sequence for previous occurrences of that token and attends to the token that followed the earlier occurrence. In the canonical case, for a sequence containing [A][B] ... [A], the head attends from the second [A] to [B] and raises the probability that [B] comes next, letting the model predict that a pattern seen earlier in the sequence will repeat. The mechanism typically requires two attention heads working in concert: a "previous token head" that shifts information about each token to its successor, and the induction head itself, which uses this shifted information to match and copy across long distances.
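The following is a minimal sketch, in plain Python with no trained model, of the idealized rule an induction head implements. The function name and the toy sequence are illustrative only; a real head realizes this behavior through learned query-key matching over the previous token head's output, not an explicit lookup.

```python
def idealized_induction_prediction(tokens):
    """For each position, return the token an ideal induction head would predict next."""
    predictions = []
    for i, tok in enumerate(tokens):
        # The "previous token head" makes each position j carry information about
        # its predecessor tokens[j - 1]. The induction head matches the current
        # token against those predecessors and copies the successor it finds.
        candidates = [tokens[j] for j in range(1, i) if tokens[j - 1] == tok]
        predictions.append(candidates[-1] if candidates else None)  # most recent match
    return predictions

sequence = ["The", "cat", "sat", ".", "The", "cat"]
print(idealized_induction_prediction(sequence))
# [None, None, None, None, 'cat', 'sat']
# At the second "The" the head predicts "cat"; at the second "cat" it predicts "sat",
# i.e. it predicts that the earlier pattern "The cat sat" will repeat.
```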
Induction heads were formally characterized and named by researchers at Anthropic in 2022, though the underlying behavior existed in trained transformers before it was identified and studied. The discovery emerged from mechanistic interpretability research — an effort to reverse-engineer what specific components of neural networks actually compute. Induction heads turned out to be remarkably consistent across model sizes and architectures, forming reliably in transformers during training and appearing to be a fundamental computational primitive rather than an artifact of any particular design choice.
The significance of induction heads extends well beyond simple pattern copying. They are considered a key mechanism underlying in-context learning — the striking ability of large language models to adapt to new tasks from just a few examples provided in the prompt, without any weight updates. When a model sees a few input-output demonstrations, induction heads help it recognize the demonstrated pattern and apply it to new inputs. This connection between a concrete, interpretable circuit and a high-level capability like few-shot learning makes induction heads one of the most compelling findings in mechanistic interpretability research.
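One common way researchers connect this circuit to behavior is a repeated-sequence diagnostic: feed the model a random token sequence repeated twice and check how much attention, at positions in the second copy, lands on the token right after the matching position in the first copy. The sketch below is a rough illustration in NumPy under that assumption; the attention matrix is synthetic, and in a real analysis it would be taken from a trained transformer's attention weights. The function name is made up for this example.

```python
import numpy as np

def induction_score(attn, seq_len):
    """Score one head's attention pattern on a sequence of length 2 * seq_len
    made of a random sequence repeated twice. Near 1.0 suggests induction behavior."""
    # Position i in the second copy mirrors position i - seq_len in the first copy;
    # an induction head should attend to the token *after* it, at i - seq_len + 1.
    scores = [attn[i, i - seq_len + 1] for i in range(seq_len, 2 * seq_len)]
    return float(np.mean(scores))

# Demo with an idealized pattern that puts all attention mass on the induction target.
seq_len = 8
attn = np.zeros((2 * seq_len, 2 * seq_len))
for i in range(seq_len, 2 * seq_len):
    attn[i, i - seq_len + 1] = 1.0
print(induction_score(attn, seq_len))  # 1.0 for a perfect induction head
```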
Studying induction heads has broader implications for understanding how and why large language models generalize. Their consistent emergence across architectures suggests that certain computational structures are strongly favored by gradient descent on language modeling objectives. This makes them a valuable case study for researchers trying to understand neural network behavior, predict model capabilities, and ultimately build more transparent and reliable AI systems.