Adapting a pre-trained model to a specific task by continuing training on new data.
Fine-tuning is a transfer learning technique in which a model that has already been trained on a large, general-purpose dataset is further trained on a smaller, task-specific dataset. Rather than initializing weights randomly and learning from scratch, fine-tuning begins from a rich set of learned representations — capturing edges, textures, syntactic patterns, or semantic relationships depending on the domain — and refines them for a narrower objective. This dramatically reduces the amount of labeled data and compute required to achieve strong performance on specialized tasks.
In practice, fine-tuning typically involves unfreezing some or all of the pre-trained model's layers and running additional gradient-based optimization on the new dataset. A reduced learning rate is commonly used to make small, careful adjustments that preserve the valuable knowledge encoded in the original weights, rather than overwriting it. In some settings, only the final layers or a task-specific head are updated while earlier layers remain frozen, a lighter variant sometimes called feature extraction; full fine-tuning, by contrast, updates the entire network.
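The mechanics above can be sketched with a deliberately tiny toy model. This is a minimal illustration in plain Python, not a real training API: the two-layer "model" is just a pair of scalar weights, where `w_backbone` stands in for pre-trained layers and `w_head` for a task-specific head. The `freeze_backbone` flag switches between feature extraction (only the head is updated) and full fine-tuning (all weights receive gradient updates). All names here are hypothetical.

```python
# Toy sketch of fine-tuning vs. feature extraction on a two-weight
# linear "model" (pure Python; illustrative names, not a real framework).

def fine_tune(data, w_backbone, w_head, lr=0.01, freeze_backbone=True, steps=100):
    """Gradient descent on squared error; optionally freeze the backbone.

    The model is pred = w_head * (w_backbone * x): the backbone maps the
    input to a "feature", the head maps the feature to the output.
    """
    for _ in range(steps):
        for x, y in data:
            feature = w_backbone * x
            pred = w_head * feature
            err = pred - y
            # Gradients of the loss 0.5 * err**2 w.r.t. each weight
            grad_head = err * feature
            grad_backbone = err * w_head * x
            w_head -= lr * grad_head
            if not freeze_backbone:
                # Full fine-tuning: the pre-trained weight is updated too,
                # typically with a small lr to avoid overwriting it.
                w_backbone -= lr * grad_backbone
    return w_backbone, w_head

# "Pre-trained" backbone weight 2.0; the new task wants output = 6 * x,
# so with a frozen backbone the head should converge toward 3.0.
data = [(1.0, 6.0), (2.0, 12.0)]
wb, wh = fine_tune(data, w_backbone=2.0, w_head=1.0, freeze_backbone=True)
```

With `freeze_backbone=True` the backbone weight is untouched and only the head adapts, mirroring the feature-extraction setting described above; flipping the flag to `False` lets gradients flow into the "pre-trained" weight as well, which is the toy analogue of full fine-tuning.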
Fine-tuning became central to modern NLP with the introduction of large pre-trained language models such as BERT and GPT, where a single model trained on massive text corpora could be fine-tuned to excel at question answering, sentiment analysis, named entity recognition, and dozens of other downstream tasks with minimal additional data. The same paradigm proved equally powerful in computer vision, where models pre-trained on ImageNet were fine-tuned for medical imaging, satellite analysis, and other specialized domains.
The practical significance of fine-tuning is difficult to overstate. It democratizes access to high-performing AI by allowing organizations with limited data and compute budgets to leverage the investments made in training foundation models. It also raises important considerations around catastrophic forgetting, domain shift, and the risk of inheriting biases present in the original pre-training data — challenges that continue to drive active research into more robust and efficient adaptation methods.