A training approach that learns from decentralized data without ever centralizing it.
Federated learning is a machine learning paradigm in which a shared model is trained across many decentralized participants — such as mobile devices, hospitals, or edge servers — each holding private local data that never leaves its source. Instead of pooling raw data into a central repository, each participant trains a local model on its own data and sends only the resulting model updates (gradients or weights) to a central coordinator. The coordinator aggregates these updates, typically using an algorithm like Federated Averaging (FedAvg), to improve the global model, which is then redistributed to participants for the next round. This cycle repeats until the model converges.
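To make the round structure concrete, the following is a minimal single-process simulation of FedAvg in Python with NumPy. The linear model, client dataset sizes, learning rate, and round count are illustrative assumptions, not part of any production protocol; real deployments add participant sampling, secure transport, and failure handling.

```python
# Minimal FedAvg sketch: each client trains locally, only model weights
# are shared, and the coordinator aggregates by dataset-size weighting.
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's private data."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """Aggregate local models, weighting each by its dataset size."""
    total = sum(client_sizes)
    return sum(n / total * w for w, n in zip(client_weights, client_sizes))

# Three simulated clients whose raw data never leaves this local scope.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 120, 30):  # deliberately unequal dataset sizes
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    clients.append((X, y))

w_global = np.zeros(2)
for _ in range(20):  # communication rounds
    local_models = [local_update(w_global, X, y) for X, y in clients]
    w_global = fedavg(local_models, [len(y) for _, y in clients])

print(w_global)  # converges toward true_w = [2.0, -1.0]
```

Note that the coordinator in this sketch sees only the trained weight vectors, never `X` or `y`; that separation is the entire point of the protocol.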
The privacy benefits of federated learning are substantial. Because raw data stays on-device, the approach is well-suited to domains governed by strict data protection regulations such as HIPAA in healthcare or GDPR in Europe. It also reduces the bandwidth costs of transmitting large datasets and enables learning from data that is inherently distributed — such as keyboard input on smartphones or sensor readings from industrial equipment. Google's use of federated learning to improve Gboard next-word prediction without accessing users' typed text became an early and influential demonstration of the approach at scale.
Despite its advantages, federated learning introduces significant technical challenges. Data across participants is often non-IID (not independently and identically distributed), meaning local datasets can differ dramatically in size and statistical properties, which can destabilize training and bias the global model. Communication efficiency is another concern, since many rounds of update exchange are required. Nor is privacy absolute: model updates can still leak information about the underlying training data through techniques such as gradient inversion attacks, which motivates combining federated learning with differential privacy or secure aggregation protocols.
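As one illustration of how such leakage is mitigated, the sketch below applies a differential-privacy-style treatment on the client side: the update is clipped to a bounded L2 norm and perturbed with Gaussian noise before transmission. The clip norm and noise multiplier are placeholder values chosen for illustration; a real deployment would calibrate them to a target privacy budget.

```python
# Hedged sketch of client-side update privatization: clip, then add noise.
# The constants are illustrative, not taken from any specific system.
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=0.5, rng=None):
    """Clip an update to a fixed L2 norm, then add calibrated Gaussian noise.

    Bounding the norm limits any single client's influence; the noise masks
    what remains, at some cost in model accuracy.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw_update = np.array([0.8, -2.3, 1.1])  # what local training produced
print(privatize_update(raw_update))      # what the coordinator receives
```

Secure aggregation takes a complementary approach: rather than noising each update, clients cryptographically mask their contributions so the coordinator can recover only the sum, never any individual update.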
Federated learning has become a foundational concept in privacy-preserving AI and is increasingly relevant as organizations seek to collaborate on model development without surrendering data sovereignty. Its principles extend beyond supervised learning into federated fine-tuning of large language models, federated reinforcement learning, and cross-silo settings where the participants are institutions rather than individual devices.