A measure of how hard individual training examples are for a model to learn.
Sample difficulty refers to the varying degrees of challenge that individual data points pose to a machine learning model during training. Not all examples in a dataset are equally easy to learn from: some samples are cleanly representative of their class or target value, while others sit near decision boundaries, contain noise, carry label ambiguity, or belong to underrepresented regions of the feature space. Quantifying and responding to these differences is central to building models that generalize well.
Several factors drive sample difficulty. Label noise, where a data point is annotated incorrectly, makes a sample artificially hard: the model is penalized even when its prediction matches the true underlying label. Class imbalance can make minority-class examples harder to learn because the model sees fewer of them. Intrinsic ambiguity, such as an image that genuinely resembles two different objects, represents irreducible difficulty. Researchers measure sample difficulty through proxies such as training loss trajectories, prediction confidence, or the consistency of model predictions across multiple training runs; a minimal version of the loss-trajectory proxy is sketched below.
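The following sketch illustrates the loss-trajectory proxy, assuming a PyTorch setup; the toy dataset, model, and hyperparameters are illustrative placeholders, not a prescribed recipe. It records each sample's loss at every epoch and ranks samples by mean loss, so persistently high-loss samples surface as candidates for being hard, noisy, or mislabeled:

```python
# Difficulty proxy sketch: track per-example loss across epochs and rank
# samples by mean loss. Dataset, model, and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy data: 200 two-dimensional points, two linearly separable classes.
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).long()
dataset = TensorDataset(X, y, torch.arange(200))  # carry sample indices along
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss(reduction="none")  # keep per-sample losses

n_epochs = 10
loss_history = torch.zeros(n_epochs, len(dataset))

for epoch in range(n_epochs):
    for xb, yb, idx in loader:
        optimizer.zero_grad()
        losses = criterion(model(xb), yb)  # one loss value per sample
        losses.mean().backward()
        optimizer.step()
        loss_history[epoch, idx] = losses.detach()  # store by sample index

# Higher mean loss over training = harder sample under this proxy.
difficulty = loss_history.mean(dim=0)
hardest = difficulty.argsort(descending=True)[:10]
print("Ten hardest sample indices:", hardest.tolist())
```

In practice, such a ranking is often combined with cross-run consistency checks: samples that remain high-loss across several independent training runs are more likely to be genuinely hard or mislabeled than victims of one unlucky initialization.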
Sample difficulty is operationalized in several influential training strategies. Curriculum learning, formalized around 2009, proposes presenting easier samples first and gradually introducing harder ones, mimicking how humans learn structured knowledge. Self-paced learning extends this idea by letting the model itself determine which samples to focus on based on current loss. Conversely, hard example mining, used prominently in object detection, deliberately oversamples difficult examples to force the model to improve on its weakest points. Focal loss, introduced for dense object detection, down-weights the loss of well-classified examples so that hard, misclassified ones dominate the gradient.
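Focal loss makes this reweighting concrete. In the standard formulation of Lin et al., FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class; the (1 - p_t)^gamma factor shrinks the loss of easy examples. A minimal binary-classification sketch, with illustrative default values for gamma and alpha:

```python
# Minimal focal loss sketch for binary classification, following
# FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
# gamma and alpha values below are illustrative defaults, not tuned settings.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss. logits: raw scores; targets: 0/1 float labels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples,
    # so hard, misclassified examples dominate the gradient.
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.5, -0.3, 0.1, -3.0])
targets = torch.tensor([1.0, 1.0, 0.0, 0.0])
print(focal_loss(logits, targets))
```

Setting gamma to 0 recovers ordinary (alpha-weighted) cross-entropy; larger gamma pushes training effort further toward hard examples.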
Understanding sample difficulty has practical consequences beyond training efficiency. It informs data cleaning pipelines, helps identify mislabeled examples, and guides active learning strategies that query human annotators for the most informative, typically most uncertain, samples. As datasets grow larger and more heterogeneous, automated difficulty estimation has become an important tool for diagnosing model weaknesses and improving robustness across diverse real-world conditions.
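As one example of difficulty-aware querying, here is a minimal sketch of uncertainty sampling, assuming a classifier that outputs logits over a pool of unlabeled samples; the model and pool below are placeholders:

```python
# Uncertainty sampling sketch: score unlabeled samples by predictive entropy
# and query the most uncertain ones for human annotation.
import torch

def select_queries(model, unlabeled_pool, k=10):
    """Return indices of the k highest-entropy samples in the pool."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_pool), dim=1)
    # Predictive entropy: high when probability mass is spread across classes.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.argsort(descending=True)[:k]

# Example: a random linear "model" scoring a pool of 100 unlabeled samples.
model = torch.nn.Linear(2, 3)
pool = torch.randn(100, 2)
print("Query these indices first:", select_queries(model, pool, k=5).tolist())
```

Entropy is only one possible uncertainty score; margin-based and ensemble-disagreement scores are common alternatives that plug into the same query-selection loop.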