A prompting technique that guides language models through explicit intermediate reasoning steps.
Chain of Thought (CoT) prompting is a technique for eliciting complex reasoning from large language models by encouraging them to produce explicit intermediate steps before arriving at a final answer. Rather than asking a model to jump directly from a question to a conclusion, CoT prompting guides the model to decompose a problem into a sequence of logical sub-steps, either through few-shot examples that demonstrate step-by-step reasoning or through a zero-shot instruction such as "Let's think step by step." This mirrors the scratchpad-style reasoning humans use when working through difficult problems.
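The two styles differ only in the prompt text itself. As a minimal sketch, assuming a plain text-completion interface rather than any particular API, the prompts might be built like this (the few-shot exemplar follows the arithmetic example popularized by Wei et al.):

```python
def zero_shot_cot(question: str) -> str:
    # Zero-shot CoT: append a reasoning trigger to the bare question.
    return f"Q: {question}\nA: Let's think step by step."

def few_shot_cot(question: str) -> str:
    # Few-shot CoT: prepend a worked exemplar whose answer spells out
    # the intermediate steps, so the model imitates that structure.
    exemplar = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis "
        "balls. Each can has 3 tennis balls. How many tennis balls "
        "does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 tennis balls each "
        "is 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
    )
    return exemplar + f"Q: {question}\nA:"

print(few_shot_cot("A baker sells 3 trays of 12 rolls. How many rolls is that?"))
```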
The mechanism works because large language models are trained to predict plausible continuations of text. When prompted with examples that show reasoning chains, the model imitates them, producing similar intermediate text, and that generated reasoning in turn conditions the model's subsequent token predictions toward more accurate conclusions. The approach is particularly effective on tasks requiring arithmetic, symbolic manipulation, commonsense inference, and multi-hop question answering: domains where direct answer prediction frequently fails but where a correct reasoning trace reliably leads to a correct answer.
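This conditioning can be made explicit with the two-stage scheme used in zero-shot CoT work (Kojima et al., 2022), in which the model's own generated reasoning is appended to the prompt before the answer is requested. A minimal sketch, where `generate` is a hypothetical text-in, text-out model call:

```python
def answer_with_cot(generate, question: str) -> str:
    # Stage 1: elicit the intermediate reasoning.
    prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = generate(prompt)
    # Stage 2: feed the model's own reasoning back as context, so the
    # final answer tokens are predicted conditioned on that trace.
    return generate(f"{prompt} {reasoning}\nTherefore, the answer is")

# Toy stand-in model so the sketch runs end to end.
def toy_model(prompt: str) -> str:
    if "Therefore" in prompt:
        return "12."
    return "3 cars with 4 wheels each is 3 x 4 = 12 wheels."

print(answer_with_cot(toy_model, "How many wheels do 3 cars have?"))
```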
CoT prompting became a prominent research focus following the 2022 paper by Wei et al. at Google Brain, which demonstrated that the technique yields substantial gains only in sufficiently large models (roughly 100B+ parameters), suggesting it is an emergent ability of scale. Subsequent work showed that even smaller models could benefit when fine-tuned on reasoning-chain data, and that self-consistency, which samples multiple reasoning paths and takes a majority vote over their final answers, further improves accuracy. Variants such as Tree of Thoughts and Program of Thoughts extended the paradigm by exploring branching reasoning structures or offloading computation to code interpreters.
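A minimal sketch of the self-consistency vote, again assuming a hypothetical `generate` callable that samples with nonzero temperature (so the reasoning paths differ) and the "The answer is X" convention from the exemplar above:

```python
from collections import Counter

def self_consistent_answer(generate, question: str, n: int = 10) -> str:
    # Sample n independent reasoning chains for the same question.
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = []
    for _ in range(n):
        chain = generate(prompt)
        # Assumes each chain ends "... The answer is X."; a real
        # pipeline may need a sturdier parser.
        if "answer is" in chain:
            answers.append(chain.rsplit("answer is", 1)[1].strip(" ."))
    # Marginalize over reasoning paths by majority vote on the answers.
    return Counter(answers).most_common(1)[0][0] if answers else ""
```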
The significance of CoT extends beyond benchmark performance. By making a model's reasoning process legible, it offers a degree of interpretability that direct-answer prompting lacks, allowing practitioners to identify where a model's logic goes wrong. This transparency is valuable for debugging, for building user trust, and for constructing more reliable AI pipelines in high-stakes domains such as medicine, law, and scientific research.