A deep learning system predicting 3D protein structures from amino acid sequences with near-experimental accuracy.
AlphaFold is a family of deep learning models developed by DeepMind that predict three-dimensional protein structures directly from amino acid sequences. By integrating evolutionary information derived from multiple sequence alignments (MSAs) with learned geometric reasoning, AlphaFold produces atomic coordinates, often competitive in accuracy with experimental methods such as X-ray crystallography or cryo-electron microscopy, together with a per-residue confidence score, pLDDT (predicted local distance difference test), on a 0 to 100 scale. Its landmark version, AlphaFold2, was introduced at the CASP14 protein structure prediction competition in 2020, where it achieved accuracy so far beyond prior methods that many researchers described it as effectively solving a 50-year-old grand challenge in biology.
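AlphaFold stores pLDDT in the B-factor field of its PDB-format output, and files downloaded from the AlphaFold Protein Structure Database follow the same convention. The sketch below assumes a well-formed, fixed-column PDB file and collects one pLDDT value per residue from the C-alpha atoms, then maps each score to the confidence bands shown by the database viewer (very high above 90, confident 70 to 90, low 50 to 70, very low below 50). The function names and the `atom_line` helper used to build a tiny test model are hypothetical illustrations, not part of any AlphaFold tooling.

```python
from typing import Dict, Tuple

# Key: (chain ID, residue number, insertion code)
ResidueKey = Tuple[str, int, str]


def read_plddt(pdb_text: str) -> Dict[ResidueKey, float]:
    """Collect one pLDDT value per residue from a PDB-format model.

    Assumes the AlphaFold convention of writing pLDDT into the B-factor
    field (columns 61-66 of each fixed-width ATOM record) and reads it
    from the C-alpha atom of each residue.
    """
    plddt: Dict[ResidueKey, float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], int(line[22:26]), line[26].strip())
        plddt[key] = float(line[60:66])
    return plddt


def confidence_band(score: float) -> str:
    """Map a pLDDT score (0-100) to the bands used by the AlphaFold DB viewer."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def atom_line(serial: int, name: str, res_name: str, chain: str,
              res_num: int, b_factor: float) -> str:
    """Hypothetical helper: build a minimal fixed-width ATOM record for testing."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.00:6.2f}{b_factor:6.2f}")


model = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 95.2),
    atom_line(2, "CA", "MET", "A", 1, 95.2),
    atom_line(3, "CA", "GLY", "A", 2, 64.8),
    atom_line(4, "CA", "SER", "A", 3, 31.0),
])
scores = read_plddt(model)
for (chain, res_num, _), score in scores.items():
    print(chain, res_num, score, confidence_band(score))
# A 1 95.2 very high
# A 2 64.8 low
# A 3 31.0 very low
```

Real files can also contain HETATM records, multiple models, or alternate locations, so a dedicated parser such as Biopython's `Bio.PDB` module is a more robust choice for production use.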
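Architecturally, AlphaFold2 centers on the Evoformer, a stack of novel neural network blocks that jointly process two representations: an MSA representation encoding evolutionary co-variation across related protein sequences, and a pair representation encoding the relationship between every pair of residues. In each block the two representations exchange information: the pair representation adds a bias to attention over the MSA, and an outer-product operation feeds MSA information back into the pair representation, which is further refined by triangle-based attention and multiplicative updates. This lets the model learn implicit physical and evolutionary constraints without hand-crafted energy functions. A downstream structure module uses invariant point attention (IPA) to update a rigid frame (a rotation and translation) for each residue and predicts side-chain torsion angles, producing atomic coordinates in a rotation- and translation-equivariant manner. The entire pipeline is trained end-to-end on structures from the Protein Data Bank (PDB) and uses "recycling," in which the network's outputs are fed back as inputs for several additional passes, to refine its predictions iteratively.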
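In AlphaFold2 the MSA and pair representations communicate in two directions: the pair representation adds a bias to attention along each MSA row, and an "outer product mean" passes MSA information into the pair representation. The NumPy sketch below illustrates only those two operations under heavy simplifying assumptions: random weights, a single attention head, no layer normalization, gating, dropout, column attention, transition layers, or triangle updates, and one weight set reused across blocks. All function names, array sizes, and the zero-initialized pair representation are illustrative choices, not taken from the AlphaFold codebase.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_attention_with_pair_bias(msa, pair, w_q, w_k, w_v, w_bias):
    """Pair -> MSA: single-head attention along each MSA row, biased by the pair rep.

    msa: (S, L, c_m) for S sequences and L residues; pair: (L, L, c_z).
    Simplified: no layer norm, gating, multiple heads, or output projection.
    """
    q, k, v = msa @ w_q, msa @ w_k, msa @ w_v
    logits = np.einsum("sic,sjc->sij", q, k) / np.sqrt(q.shape[-1])
    bias = (pair @ w_bias)[..., 0]                 # (L, L): one scalar per residue pair
    attn = softmax(logits + bias[None], axis=-1)   # same bias shared by every MSA row
    return msa + np.einsum("sij,sjc->sic", attn, v)  # residual update


def outer_product_mean(msa, w_a, w_b, w_out):
    """MSA -> pair: average the outer product of residue features over sequences."""
    a, b = msa @ w_a, msa @ w_b                               # (S, L, c) each
    outer = np.einsum("sic,sjd->ijcd", a, b) / msa.shape[0]   # mean over S
    n_res = msa.shape[1]
    return outer.reshape(n_res, n_res, -1) @ w_out            # (L, L, c_z)


def evoformer_like_block(msa, pair, p):
    """One heavily simplified block: row attention, then the outer-product update."""
    msa = row_attention_with_pair_bias(msa, pair, p["q"], p["k"], p["v"], p["bias"])
    pair = pair + outer_product_mean(msa, p["a"], p["b"], p["out"])
    return msa, pair


S, L, c_m, c_z, c = 4, 6, 8, 5, 3


def w(*shape):
    return rng.normal(scale=0.1, size=shape)


params = {"q": w(c_m, c), "k": w(c_m, c), "v": w(c_m, c_m), "bias": w(c_z, 1),
          "a": w(c_m, c), "b": w(c_m, c), "out": w(c * c, c_z)}
msa = rng.normal(size=(S, L, c_m))
pair = np.zeros((L, L, c_z))  # zero-initialized here for simplicity
for _ in range(3):  # a small stack; real blocks would each have their own weights
    msa, pair = evoformer_like_block(msa, pair, params)
print(msa.shape, pair.shape)  # (4, 6, 8) (6, 6, 5)
```

The attention bias is what lets pairwise information steer which residues attend to each other, while the outer product mean is a non-attention path that turns co-varying MSA columns into pair features.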
AlphaFold's significance extends well beyond structural biology. It demonstrated that a sufficiently expressive neural architecture, trained on the right combination of evolutionary and structural data, can internalize complex physical regularities that previously required decades of expert-crafted heuristics to approximate. This has accelerated drug discovery, enzyme engineering, and the interpretation of disease-associated genetic variants at a previously impossible scale. DeepMind, in partnership with EMBL's European Bioinformatics Institute (EMBL-EBI), subsequently released predicted structures for over 200 million proteins through the AlphaFold Protein Structure Database, making high-quality structural models freely accessible to the global research community.
Despite its transformative impact, AlphaFold retains important limitations. It performs less reliably on multi-chain protein complexes and intrinsically disordered regions. It also does not model ligands or post-translational modifications, and it predicts a single static structure, so it can miss functional states of proteins that depend on ligand binding, modifications, or conformational dynamics. These gaps have motivated successor systems, including AlphaFold3 and competing models, that extend the framework to broader classes of biomolecular interactions; that so much follow-on work builds on its design underscores how thoroughly AlphaFold reshaped computational structural biology.
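In practice, regions that AlphaFold cannot place confidently, which often include intrinsically disordered segments, tend to receive low pLDDT scores, so users commonly flag them before interpreting a model. The minimal sketch below assumes a plain list of per-residue pLDDT values and reports runs of consecutive residues below a threshold. The function name, the default threshold of 50, and the minimum run length of 5 are hypothetical choices for illustration. A low score is a signal to treat a region with caution, not a definitive call that the region is disordered.

```python
from typing import List, Tuple


def low_confidence_segments(plddt: List[float], threshold: float = 50.0,
                            min_length: int = 5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges where pLDDT stays
    below `threshold` for at least `min_length` consecutive residues."""
    segments: List[Tuple[int, int]] = []
    start = None  # 1-based index where the current low-confidence run began
    for i, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:  # run covers residues start..i-1
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the final residue.
    if start is not None and len(plddt) - start + 1 >= min_length:
        segments.append((start, len(plddt)))
    return segments


scores = [92, 91, 45, 40, 38, 42, 47, 88, 90, 30, 35]
print(low_confidence_segments(scores))                # [(3, 7)]
print(low_confidence_segments(scores, min_length=2))  # [(3, 7), (10, 11)]
```

The minimum run length keeps isolated low-scoring residues, such as a single flexible loop position, from being reported as segments.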