A measurable property of data used as input for machine learning models.
In machine learning, an attribute is a measurable property or characteristic of an entity that serves as input to a model. Attributes define the dimensions of a dataset — each row represents an instance (such as a patient, image, or transaction), and each column represents an attribute (such as age, pixel intensity, or purchase amount). They may be numerical (continuous or discrete), categorical (nominal or ordinal), or binary, and together they form the feature space within which a model learns patterns and makes predictions.
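As a minimal sketch of the row-and-column picture above, the following turns a few made-up patient records into numeric vectors. The records, the attribute names (`age`, `blood_type`, `smoker`), and the helper `encode` are all hypothetical and chosen only for illustration. One-hot encoding is just one common way to represent a nominal attribute, not the only one.

```python
# Hypothetical patient records: each dict is one instance (row),
# and each key is one attribute (column).
patients = [
    {"age": 54, "blood_type": "A", "smoker": True},   # numerical, nominal, binary
    {"age": 37, "blood_type": "O", "smoker": False},
    {"age": 61, "blood_type": "B", "smoker": False},
]


def encode(rows, categorical):
    """Turn rows into numeric vectors, one-hot encoding the categorical attributes."""
    # Sorted list of the values observed for each categorical attribute.
    levels = {a: sorted({r[a] for r in rows}) for a in categorical}

    # Column names of the resulting feature space.
    names = []
    for a in rows[0]:
        if a in levels:
            names.extend(f"{a}={v}" for v in levels[a])
        else:
            names.append(a)

    # One numeric vector per instance.
    vectors = []
    for r in rows:
        vec = []
        for a in r:
            if a in levels:
                vec.extend(1.0 if r[a] == v else 0.0 for v in levels[a])
            else:
                vec.append(float(r[a]))  # bool becomes 0.0 or 1.0
        vectors.append(vec)
    return names, vectors


names, X = encode(patients, categorical=["blood_type"])
print(names)
print(X[0])
```

Each vector in `X` is a point in the feature space. The single nominal attribute expands into three dimensions here, so the number of dimensions can exceed the number of raw attributes.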
The quality and composition of attributes constrain what a model can learn. Irrelevant or redundant attributes introduce noise and can degrade performance, while informative, well-scaled attributes tend to speed up convergence (particularly for gradient-based training) and improve generalization. This is why attribute selection (identifying which properties are most predictive) and attribute engineering (transforming raw data into more useful representations), more commonly called feature selection and feature engineering, are central activities in any machine learning pipeline. Techniques such as mutual information, correlation analysis, and recursive feature elimination help practitioners identify which attributes carry the most signal.
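To make the idea of "signal" concrete, here is a small sketch that ranks attributes by their mutual information with the label. It assumes discrete attributes and uses a toy dataset invented for this example. The names `informative` and `noise` are hypothetical, and a real pipeline would typically rely on a library estimator rather than this hand-rolled version.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px = Counter(xs)             # marginal counts of the attribute
    py = Counter(ys)             # marginal counts of the label
    pxy = Counter(zip(xs, ys))   # joint counts
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy data (an assumption for illustration): one attribute that tracks the
# label exactly, and one that is statistically independent of it.
label = [1, 1, 0, 0]
attributes = {
    "informative": ["a", "a", "b", "b"],
    "noise": ["x", "y", "x", "y"],
}

scores = {name: mutual_information(col, label) for name, col in attributes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

On this data the attribute that mirrors the label scores one bit and the independent one scores zero, so a selection step that keeps only the top-ranked attributes would drop `noise`.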
Attributes are closely related to, and often used interchangeably with, the term feature, though subtle distinctions exist across communities. In classical databases and symbolic AI, "attribute" is the preferred term, while in statistical learning and deep learning, "feature" is more common. In decision tree learning specifically, the algorithm repeatedly selects the attribute that best splits the data at each node, as judged by a criterion such as information gain or Gini impurity, making attribute choice the core mechanism driving model structure.
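The split-selection step in decision tree learning can be sketched as follows. This version assumes categorical attributes and scores each candidate by information gain, in the style of ID3. Other criteria, such as Gini impurity, are equally common, and the text above does not commit to one. The dataset and the attribute names `outlook` and `windy` are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on `attribute`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain at this node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: `outlook` separates the classes, `windy` does not.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["play", "play", "stay", "stay"]

root = best_attribute(rows, labels, ["outlook", "windy"])
print(root)
```

A full tree learner would apply `best_attribute` recursively to each group produced by the chosen split, stopping when a node is pure or no attributes remain.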
Understanding attributes matters beyond model accuracy. In high-stakes domains like healthcare, finance, and criminal justice, certain attributes — such as race or gender — raise fairness and legal concerns when used as model inputs. Practitioners must therefore consider not just predictive power but also the ethical implications of which attributes are included, excluded, or proxied by other variables. Attribute-level analysis thus sits at the intersection of technical performance and responsible AI design.