Aggregated group judgments often outperform individual experts in prediction tasks.
Wisdom of the crowd is the principle that aggregating diverse, independent judgments from a large group of people frequently produces more accurate predictions or decisions than any single expert could achieve alone. The effect emerges when individual errors are random and uncorrelated — they cancel out in aggregate, leaving a signal that reflects genuine collective knowledge. For this to work, the crowd must be sufficiently diverse, its members must form opinions independently, and there must be a mechanism to aggregate their inputs effectively. When these conditions hold, the collective estimate tends to converge on the true answer with remarkable reliability.
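The error-cancellation argument can be checked with a small simulation. The sketch below (all values hypothetical) draws independent Gaussian-noise guesses around a true value and averages them; as the crowd grows, the average converges on the truth even though each member is individually noisy.

```python
import random

random.seed(0)

TRUE_VALUE = 100.0  # hypothetical quantity the crowd is asked to estimate


def crowd_estimate(n_members: int, noise: float = 30.0) -> float:
    """Average n independent, unbiased noisy guesses around the true value."""
    guesses = [TRUE_VALUE + random.gauss(0, noise) for _ in range(n_members)]
    return sum(guesses) / len(guesses)


for n in (1, 10, 100, 10_000):
    est = crowd_estimate(n)
    print(f"crowd of {n:>6}: estimate = {est:7.2f}, error = {abs(est - TRUE_VALUE):.2f}")
```

Because the individual errors are independent with zero mean, the standard error of the average shrinks like noise/sqrt(n); the same logic is why correlated or biased errors (a crowd that has all read the same misleading headline) break the effect.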
In machine learning and AI, this principle underpins several important techniques. Crowdsourcing platforms like Amazon Mechanical Turk use crowd wisdom to generate labeled training data at scale, relying on majority voting or weighted aggregation to produce high-quality ground truth from imperfect individual annotations. Ensemble methods in machine learning — such as random forests and boosting — are a computational analog: they combine many weak learners whose errors are partially independent, yielding a stronger collective predictor. Collaborative filtering in recommendation systems similarly aggregates the preferences of many users to surface items that a given user is likely to enjoy.
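Majority voting over annotations can be sketched in a few lines. The example below is illustrative, not any platform's actual API: the item IDs, labels, and three-annotator setup are invented, and ties are broken by first-seen label order.

```python
from collections import Counter


def majority_vote(annotations: dict[str, list[str]]) -> dict[str, str]:
    """Pick the most frequent label for each item (ties go to the first-seen label)."""
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in annotations.items()
    }


# Hypothetical raw annotations: three workers label each image.
raw = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
    "img_003": ["cat", "bird", "cat"],
}
print(majority_vote(raw))  # {'img_001': 'cat', 'img_002': 'dog', 'img_003': 'cat'}
```

With three honest annotators per item, a single mistaken label is outvoted; this is the simplest form of the aggregation step that turns imperfect individual annotations into usable ground truth.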
The concept gained renewed attention in AI contexts following James Surowiecki's 2004 book and the subsequent explosion of crowdsourcing as a practical data-collection strategy around 2006–2008. Researchers began formally studying how to elicit, weight, and aggregate crowd judgments to maximize accuracy — leading to techniques like prediction markets, Bayesian truth serum, and RLHF (reinforcement learning from human feedback), where crowd-sourced human preferences guide large language model alignment.
Wisdom of the crowd matters for AI because it offers a scalable path to knowledge that is difficult to encode algorithmically. Human perception, common sense, and subjective judgment remain hard to replicate in models, but crowds can supply these qualities cheaply and at scale. The key challenge is managing crowd quality: adversarial participants, correlated errors, and demographic homogeneity can all undermine the effect, making robust aggregation strategies an active area of research in human-in-the-loop AI systems.
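One simple robust-aggregation strategy is to weight each annotator by their historical agreement with the consensus, so low-quality or adversarial participants contribute less. The sketch below is a minimal illustration of that idea, not a production method (real systems use more principled estimators, e.g. Dawid-Skene-style models); the worker IDs and votes are hypothetical.

```python
from collections import Counter, defaultdict


def agreement_weights(history: list[list[tuple[str, str]]]) -> dict[str, float]:
    """Weight each worker by how often they agreed with the unweighted majority."""
    agreed: Counter = Counter()
    total: Counter = Counter()
    for votes in history:
        majority = Counter(label for _, label in votes).most_common(1)[0][0]
        for worker, label in votes:
            total[worker] += 1
            if label == majority:
                agreed[worker] += 1
    return {w: agreed[w] / total[w] for w in total}


def weighted_vote(votes: list[tuple[str, str]], weights: dict[str, float]) -> str:
    """Sum each worker's weight onto their chosen label; return the heaviest label."""
    totals: defaultdict = defaultdict(float)
    for worker, label in votes:
        totals[label] += weights.get(worker, 1.0)
    return max(totals, key=totals.get)


# Hypothetical history: w3 always disagrees with the consensus.
history = [
    [("w1", "cat"), ("w2", "cat"), ("w3", "dog")],
    [("w1", "dog"), ("w2", "dog"), ("w3", "cat")],
]
weights = agreement_weights(history)
print(weighted_vote([("w1", "cat"), ("w3", "dog")], weights))  # cat
```

Here w3's weight drops to zero after two rounds of disagreement, so a fresh vote between w1 and w3 resolves in w1's favor even though the raw count is tied. The same weighting does nothing against correlated errors, which is why diversity and independence remain preconditions rather than optimizations.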