Reusing a model trained on one task to accelerate learning on another.
Transfer learning is a machine learning paradigm in which knowledge acquired while solving one problem is deliberately applied to a different but related problem. Rather than training a model from scratch, a practitioner begins with a model already trained on a large dataset, known as a pre-trained model (or, when very large and broadly applicable, a foundation model), and adapts it to a new target task. This approach is especially valuable when the target task has limited labeled data, since the pre-trained model has already learned general-purpose representations, such as edges and textures in images or syntactic patterns in text, that transfer usefully across domains.
In practice, transfer learning typically takes one of two forms: feature extraction or fine-tuning. In feature extraction, the pre-trained model's weights are frozen and its internal representations are used as fixed inputs to a new, smaller model trained on the target task. In fine-tuning, the pre-trained weights serve as an initialization point, and some or all layers are updated through continued training on the target dataset. Fine-tuning tends to yield better performance when sufficient target data is available, while feature extraction is preferred when data is scarce or computational resources are limited. The choice of which layers to freeze or update often depends on how similar the source and target domains are.
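The difference between the two approaches is easiest to see in code. The following is a minimal sketch, assuming PyTorch and torchvision with a ResNet-18 backbone pre-trained on ImageNet; the ten-class target task and the specific layers chosen for freezing are illustrative assumptions, not a prescribed recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # assumed size of the target task's label set

# --- Option 1: feature extraction ---
# Load ImageNet-pre-trained weights and freeze every backbone parameter,
# so only the newly added classification head is trained.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in backbone.parameters():
    param.requires_grad = False
# Replace the final layer with one sized for the target task.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)
head_params = [p for p in backbone.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(head_params, lr=1e-3)

# --- Option 2: fine-tuning ---
# Start again from the pre-trained weights, but keep them trainable so
# continued training on the target dataset can update them.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
# Optionally freeze the earliest layers (generic edge/texture features)
# and update only the later, more task-specific ones.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

In practice, fine-tuning is usually run with a lower learning rate (as in the sketch) so that continued training does not wash out the pre-trained features, and the boundary between frozen and trainable layers is chosen based on how closely the target domain resembles the source.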
Transfer learning became central to modern deep learning after researchers demonstrated that convolutional neural networks trained on ImageNet could be repurposed for a wide range of vision tasks with minimal additional training. The paradigm later transformed natural language processing with the introduction of large pre-trained language models such as BERT and GPT, which could be fine-tuned on downstream tasks like sentiment analysis, question answering, and named entity recognition with remarkable efficiency.
The practical impact of transfer learning is difficult to overstate. It dramatically lowers the data and compute requirements for building high-performing models, democratizing access to state-of-the-art AI capabilities. Organizations without the resources to train billion-parameter models from scratch can still achieve competitive results by fine-tuning publicly available pre-trained models. This has accelerated progress across virtually every applied domain of machine learning, from medical imaging to robotics to code generation.