A discrete category label assigned to data points in supervised classification problems.
In machine learning, a class is a discrete category or label that represents a possible output value in a classification task. When training a supervised model, each data point in the training set is associated with a class label, and the model learns to map input features to these labels. A task can be binary, with two classes such as "spam" and "not spam", or multiclass, as in image recognition systems that must distinguish among hundreds or thousands of object types. The number and nature of classes shape the problem structure, the choice of loss function, and how model performance is evaluated.
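As a minimal sketch of how class labels are typically prepared for training, the snippet below maps string labels to integer indices and one-hot vectors. The helper names (`build_class_index`, `one_hot`) and the toy labels are hypothetical and chosen only for illustration; real projects usually rely on a library encoder instead.

```python
def build_class_index(labels):
    """Map each distinct class label to an integer index.

    Labels are sorted so the mapping is deterministic across runs.
    """
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def one_hot(label, class_index):
    """Return a one-hot vector for `label` under the given class index."""
    vec = [0] * len(class_index)
    vec[class_index[label]] = 1
    return vec


# Hypothetical toy labels, not taken from any real dataset.
labels = ["spam", "not spam", "spam", "not spam", "not spam"]

class_index = build_class_index(labels)      # {"not spam": 0, "spam": 1}
encoded = [class_index[y] for y in labels]   # integer targets for training
spam_vector = one_hot("spam", class_index)   # [0, 1]
```

Integer indices are the usual target format for cross-entropy style losses, while one-hot vectors make the number of classes explicit in the shape of the target.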
Classification algorithms, including logistic regression, decision trees, support vector machines, and deep neural networks, all operate by learning decision boundaries that separate one class from another in feature space. In binary classification, a single boundary divides two classes; in multiclass settings, strategies such as one-vs-rest or softmax output layers extend this to handle many categories simultaneously. Class imbalance, where some categories have far fewer examples than others, is a common practical challenge: it can bias a model toward majority classes, and it is usually addressed with techniques such as oversampling the minority class, undersampling the majority class, or weighting the loss by class.
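The two ideas in this paragraph, a softmax output over several classes and loss weighting for imbalanced classes, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names and the toy data are hypothetical, and the weighting formula `n_samples / (n_classes * class_count)` is one common "balanced" heuristic among several.

```python
import math
from collections import Counter


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights than frequent ones.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, true_index, weight):
    """Cross-entropy for one example, scaled by its class weight."""
    return -weight * math.log(probs[true_index])


# Hypothetical imbalanced toy labels: 8 legitimate, 2 fraudulent.
labels = ["legit"] * 8 + ["fraud"] * 2
weights = inverse_frequency_weights(labels)  # legit: 0.625, fraud: 2.5

# Raw scores for three classes; the highest score gets the highest probability.
probs3 = softmax([2.0, 1.0, 0.1])

# Same predicted probabilities, but a mistake on the rare class costs more.
probs2 = softmax([2.0, 0.5])  # index 0 = legit, index 1 = fraud
loss_if_fraud = weighted_cross_entropy(probs2, 1, weights["fraud"])
loss_unweighted = weighted_cross_entropy(probs2, 1, 1.0)
```

With these weights, an error on a "fraud" example contributes 2.5 times as much to the loss as it would unweighted, which pushes the model away from simply predicting the majority class.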
The concept of a class is foundational to a wide range of applied ML systems. Medical diagnosis models classify patient data into disease categories; fraud detection systems label transactions as legitimate or fraudulent; natural language models assign sentiment, intent, or topic labels to text. How classes are defined, including their granularity, mutual exclusivity, and coverage, directly affects what a model can learn and how useful its predictions are in practice. Poorly defined or overlapping classes introduce label noise that degrades model accuracy, making thoughtful class design as important as algorithmic choice.