A benchmark for whether a machine's conversation is indistinguishable from a human's.
The Turing Test is a benchmark for machine intelligence proposed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence." Rather than asking whether machines can truly "think" — a question Turing considered philosophically intractable — he reframed the problem as an imitation game: if a human evaluator, communicating via text with both a machine and another human, cannot reliably identify which is which, the machine is said to have passed the test. By grounding the question of intelligence in observable linguistic behavior rather than internal mental states, Turing offered a pragmatic and empirically testable criterion at a time when AI had no formal definition.
The mechanics of the test are deliberately simple. A judge exchanges natural language messages with two hidden participants — one human, one machine — and must determine which is which. The machine's goal is to produce responses convincing enough to fool the judge. This framing places natural language understanding, contextual reasoning, and the ability to simulate human-like responses at the center of what it means for a machine to be intelligent. Crucially, the test says nothing about how the machine achieves this — only whether it succeeds.
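The protocol described above can be sketched as a small simulation. This is an illustrative toy, not anything from Turing's paper: the participant and judge functions (`human_respond`, `machine_respond`, `judge`) are hypothetical stand-ins, and the machine is hidden behind a randomly assigned channel label, mirroring how the judge sees only anonymized text.

```python
import random

def imitation_game(judge, human_respond, machine_respond, rounds=3):
    """Run one session of a simplified imitation game.

    `human_respond` and `machine_respond` map a question string to an
    answer string. `judge` receives the full transcript as a list of
    (question, answer_a, answer_b) tuples and returns 'A' or 'B' as its
    guess for which channel hides the machine.
    """
    # Randomly hide the machine behind channel A or B, as the judge
    # must not know the assignment in advance.
    machine_is_a = random.random() < 0.5
    respond_a = machine_respond if machine_is_a else human_respond
    respond_b = human_respond if machine_is_a else machine_respond

    transcript = []
    for i in range(rounds):
        question = f"question {i}"  # placeholder prompts
        transcript.append((question, respond_a(question), respond_b(question)))

    guess = judge(transcript)
    actual = 'A' if machine_is_a else 'B'
    return guess == actual  # True if the judge identified the machine
```

Under this framing, a machine "passes" when, over many sessions, the judge's accuracy stays near the 50% chance level: the judge can do no better than guessing which channel is which.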
Despite its historical importance, the Turing Test has attracted substantial criticism. Philosophers like John Searle argued, via the Chinese Room thought experiment, that passing the test demonstrates symbol manipulation rather than genuine understanding. Others note that the test is gameable — a sufficiently clever but narrow system might fool judges without possessing general intelligence. Modern AI systems, including large language models, can pass informal versions of the test in many contexts, yet researchers broadly agree this does not resolve deeper questions about machine cognition or consciousness.
The Turing Test remains a touchstone in AI discourse not because it is a rigorous scientific metric, but because it crystallized the central question of the field: what does it mean for a machine to behave intelligently? It shifted AI research toward behavioral and functional criteria, influencing decades of work in natural language processing, dialogue systems, and cognitive modeling. Its legacy is as much philosophical as technical, continuing to frame debates about the nature of mind and the ambitions of artificial intelligence.