A data point paired with a known output used to train supervised learning models.
A labeled example is a data sample consisting of an input paired with a corresponding target value or class annotation. In supervised machine learning, these pairs form the training signal that allows a model to learn the relationship between inputs and desired outputs. For instance, an image paired with the label "cat" or a patient record paired with a diagnosis both constitute labeled examples. The label represents ground truth — typically assigned by a human annotator, a domain expert, or derived from a reliable automated process.
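In code, a labeled example is simply an (input, label) pair. The sketch below shows this structure with made-up feature dictionaries and class labels; the features and values are illustrative assumptions, not from any real dataset.

```python
# Labeled examples as (input, label) pairs. The features and labels
# here are invented for illustration only.
labeled_examples = [
    ({"whiskers": True, "weight_kg": 4.2}, "cat"),
    ({"whiskers": False, "weight_kg": 30.0}, "dog"),
]

for features, label in labeled_examples:
    print(features, "->", label)
```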
During training, a supervised model processes labeled examples iteratively, comparing its predictions against the provided labels and adjusting its internal parameters to reduce the discrepancy. This feedback loop, often formalized through a loss function and optimization algorithm like gradient descent, allows the model to generalize from the specific examples it has seen to new, unseen inputs. The quality, quantity, and diversity of labeled examples therefore directly determine how well a trained model performs in practice.
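The feedback loop described above can be sketched with a toy one-parameter model trained by stochastic gradient descent on a squared loss. The data, learning rate, and epoch count are illustrative assumptions; the point is only the loop structure: predict, compare against the label, update.

```python
# Minimal supervised training loop: predict, compare to the label,
# and adjust the parameter to reduce the squared error.
# Labeled examples follow the ground-truth rule y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # single model parameter
lr = 0.05  # learning rate (assumed)

for epoch in range(200):
    for x, y in data:
        pred = w * x
        error = pred - y        # discrepancy with the provided label
        grad = 2 * error * x    # gradient of (pred - y)^2 w.r.t. w
        w -= lr * grad          # parameter update

print(round(w, 3))  # converges toward 2.0
```

After training, the learned parameter approximates the rule that generated the labels, which is the sense in which the model "generalizes" from its labeled examples.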
Creating labeled datasets is one of the most resource-intensive aspects of applied machine learning. Large-scale annotation efforts — such as ImageNet, which contains over 14 million labeled images — require significant human labor and careful quality control. Errors or biases in labeling propagate directly into model behavior, making annotation accuracy a critical concern. Techniques like inter-annotator agreement scoring and active learning have emerged to improve the efficiency and reliability of the labeling process.
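One common form of the inter-annotator agreement scoring mentioned above is Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The two annotators' label lists below are made-up illustrative data.

```python
# Cohen's kappa: inter-annotator agreement corrected for chance.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability both independently pick the same class.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

a = ["cat", "cat", "dog", "dog", "cat", "dog"]
b = ["cat", "cat", "dog", "cat", "cat", "dog"]
print(round(cohens_kappa(a, b), 3))  # 0.667
```

A kappa of 1.0 means perfect agreement, 0.0 means agreement no better than chance; values in between are often read against conventional thresholds when auditing annotation quality.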
The centrality of labeled examples has also motivated research into methods that reduce dependence on them. Semi-supervised learning leverages small amounts of labeled data alongside large pools of unlabeled data, while self-supervised learning generates pseudo-labels from the data structure itself. Few-shot and zero-shot learning aim to generalize from very few labeled examples. Despite these advances, high-quality labeled data remains the most reliable foundation for building accurate, robust supervised models across domains ranging from computer vision and natural language processing to healthcare and finance.
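The pseudo-labeling idea behind semi-supervised learning can be sketched as: train on the small labeled set, then assign labels to unlabeled points where the model is confident and add those to the training pool. The threshold "model", margin, and one-dimensional data below are illustrative assumptions, not any standard algorithm's exact recipe.

```python
# Pseudo-labeling sketch: a tiny labeled set plus a pool of unlabeled points.
labeled = [(0.1, "neg"), (0.2, "neg"), (0.8, "pos"), (0.9, "pos")]
unlabeled = [0.05, 0.15, 0.85, 0.95, 0.5]

# Toy "model": a decision threshold at the midpoint between class means.
def fit_threshold(examples):
    pos = [x for x, y in examples if y == "pos"]
    neg = [x for x, y in examples if y == "neg"]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

threshold = fit_threshold(labeled)

# Pseudo-label only points far from the boundary (confident predictions);
# the ambiguous point near the threshold is left unlabeled.
margin = 0.2
pseudo = [
    (x, "pos" if x > threshold else "neg")
    for x in unlabeled
    if abs(x - threshold) > margin
]
print(len(pseudo), "pseudo-labeled examples added")  # 4
```

In practice the model would then be retrained on the union of human-labeled and pseudo-labeled examples, with the confidence margin guarding against amplifying the model's own mistakes.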