Delayed generalization in neural networks, in which models suddenly learn the underlying structure of a task after first overfitting.
Grokking is a phenomenon observed in machine learning in which a neural network first memorizes its training data, achieving near-perfect training accuracy while failing to generalize, and then, after extended training well past this memorization phase, undergoes a sudden transition to genuine generalization on held-out data. The term borrows the verb "grok," coined by Robert Heinlein in his 1961 novel Stranger in a Strange Land to mean deep, intuitive understanding; in modern ML research it refers specifically to this delayed, phase-transition-like leap from rote memorization to structural comprehension.
The phenomenon was formally documented and named in a 2022 paper by Power et al. at OpenAI, who observed it while training small transformers on modular arithmetic tasks. Models would plateau at chance-level validation accuracy for thousands of steps after reaching 100% training accuracy, then abruptly generalize, sometimes only after training for ten to one hundred times as many steps as it took to memorize the training set. This sharp transition resembles a phase change rather than gradual improvement, making it both striking and theoretically significant.
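A minimal sketch of the kind of setup in which this behavior is typically reproduced is shown below, using a small MLP in PyTorch rather than the paper's small transformer; the modulus, training fraction, architecture, and optimizer settings are illustrative assumptions, not the configuration from Power et al. Runs of this kind, with a small training fraction and strong weight decay, have been reported to show training accuracy saturating long before validation accuracy moves.

```python
# Illustrative grokking-style setup: learn (a + b) mod p from a fraction of
# all pairs, and keep training far past the point of perfect train accuracy.
import torch
import torch.nn as nn

p = 97                # prime modulus (assumed value, for illustration)
frac_train = 0.3      # fraction of all (a, b) pairs used for training

# Enumerate every pair (a, b) with label (a + b) mod p, then split train/val.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

def encode(idx):
    # One-hot encode the two operands and concatenate them as the input.
    a = nn.functional.one_hot(pairs[idx, 0], p).float()
    b = nn.functional.one_hot(pairs[idx, 1], p).float()
    return torch.cat([a, b], dim=1)

x_train, y_train = encode(train_idx), labels[train_idx]
x_val, y_val = encode(val_idx), labels[val_idx]

# A small MLP stands in for the original paper's small transformer.
model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))

# Full-batch AdamW with nontrivial weight decay; this regularization pressure
# is widely reported to matter for the delayed generalization to appear.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(100_000):   # run far past the memorization point
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x_train).argmax(dim=1) == y_train).float().mean()
            val_acc = (model(x_val).argmax(dim=1) == y_val).float().mean()
        print(f"step {step:6d}  train_acc {train_acc.item():.3f}  "
              f"val_acc {val_acc.item():.3f}")
```

In a grokking run, the logged training accuracy reaches 1.0 early while validation accuracy stays near chance (about 1/p), and only much later jumps sharply toward 1.0.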
Understanding grokking matters because it challenges conventional assumptions about early stopping and overfitting. Standard practice treats a large gap between training and validation performance as a signal to halt training, yet grokking demonstrates that continued optimization can eventually force a model to discover more compact, generalizable representations rather than persist with memorized lookup tables. Researchers have linked the phenomenon to weight norm dynamics and to regularization, both explicit and implicit: as training continues, weight decay and related pressures steadily favor lower-norm, simpler solutions that happen to generalize.
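To make the early-stopping and regularization points concrete, the sketch below continues the PyTorch example above (reusing its model and loss_fn): an explicitly weight-penalized objective, the total weight norm that grokking studies often track, and a conventional patience-based stopping rule that would halt such a run while validation accuracy is still at chance. The penalty strength and patience value are assumptions chosen for illustration.

```python
import torch

lam = 1e-2  # assumed L2 penalty strength, not a value from the original paper

def regularized_loss(model, x, y):
    # Task loss plus an explicit L2 penalty on the weights; AdamW's decoupled
    # weight decay exerts a related (though not identical) pressure toward
    # low-norm solutions.
    task_loss = loss_fn(model(x), y)
    l2 = sum((w ** 2).sum() for w in model.parameters())
    return task_loss + lam * l2

def weight_norm(model):
    # Total parameter norm, the quantity whose gradual decline is often
    # tracked alongside accuracy in studies of grokking.
    return torch.sqrt(sum((w ** 2).sum() for w in model.parameters()))

def should_stop(val_acc_history, patience=10):
    # Conventional patience-based early stopping: halt once validation
    # accuracy has not improved for `patience` evaluations. Applied to a
    # grokking run, this rule fires long before the delayed transition.
    if len(val_acc_history) <= patience:
        return False
    best_before = max(val_acc_history[:-patience])
    return max(val_acc_history[-patience:]) <= best_before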
Grokking has broader implications for interpretability and our understanding of how neural networks represent knowledge internally. It suggests that generalization and memorization are not simply competing outcomes determined at initialization, but dynamic states a model can transition between. This has spurred investigation into what internal representations look like before and after the grokking transition, with some work showing that interpretable, structured algorithms emerge in the weights only after generalization occurs, offering mechanistic interpretability research a rare window into how a learned computation is implemented.