A pre-trained model used as a starting point for task-specific adaptation.
A base model is a large neural network trained on broad, diverse data that serves as a reusable foundation for downstream tasks. Rather than training from random initialization, practitioners begin with a base model that has already internalized rich representations of language, images, or other modalities. This pre-trained knowledge dramatically reduces the data and compute required to build capable, specialized systems.
The mechanics of using a base model typically involve one of two approaches: fine-tuning, where the model's weights are further updated on task-specific data, or prompting and in-context learning, where the model is queried directly without any weight modification. In both cases, the base model's learned representations act as a powerful prior. Architectures like transformers have proven especially effective as base models because their attention mechanisms capture long-range dependencies and generalize well across domains.
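The advantage of starting from pretrained weights rather than random initialization can be sketched with a deliberately tiny model. The following is an illustrative toy, not any real library's API: a one-parameter linear model is "pretrained" on one task, then fine-tuned for a few steps on a related task, and compared against training from scratch under the same budget. All data, learning rates, and step counts are assumptions chosen for the sketch.

```python
# Toy sketch of the base-model idea: weights pretrained on a related task
# converge faster on a new task than weights started from scratch.
# Everything here (tasks, learning rate, step counts) is illustrative.

def train(w, data, lr=0.01, steps=20):
    """Fit a one-parameter linear model y = w * x by gradient descent
    on squared error, one sample at a time."""
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# "Pretraining" task: y = 2x, with ample data and many steps.
pretrain_data = [(float(x), 2.0 * x) for x in range(1, 6)]
base_w = train(0.0, pretrain_data, steps=100)

# Downstream task: y = 2.2x, but only two examples and three steps --
# the data- and compute-constrained regime where base models shine.
task_data = [(1.0, 2.2), (2.0, 4.4)]
finetuned = train(base_w, task_data, steps=3)  # start from pretrained weight
scratch = train(0.0, task_data, steps=3)       # start from random/zero init

def error(w):
    return abs(w - 2.2)  # distance from the downstream task's true weight

# The fine-tuned model starts near the solution; the scratch model does not.
print(error(finetuned) < error(scratch))
```

Under the same fine-tuning budget, the model initialized from `base_w` ends far closer to the downstream target than the one initialized at zero, mirroring (in miniature) why adaptation from a base model needs so much less task-specific data and compute.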
Base models became central to modern AI practice with the release of large-scale pretrained systems such as BERT and GPT-2 around 2018–2019, which demonstrated that a single model trained on internet-scale text could be adapted to dozens of benchmarks with minimal additional effort. The paradigm shifted research and industry workflows alike: instead of training task-specific models from scratch, teams now routinely start from a shared base. This has democratized access to high-performance AI, since organizations without massive compute budgets can still build competitive systems by fine-tuning publicly released base models.
The importance of base models extends beyond convenience. They encode transferable knowledge that often generalizes to tasks and distributions not seen during pretraining, the core premise of transfer learning. As base models have grown in scale, from hundreds of millions to hundreds of billions of parameters, their emergent capabilities have expanded, enabling few-shot and zero-shot performance that was previously unattainable. Choosing, evaluating, and responsibly deploying base models has consequently become one of the most consequential decisions in applied machine learning.