Minimalist intermediate reasoning that uses fewer tokens than chain-of-thought
Chain of Draft is a lightweight reasoning technique that generates concise intermediate reasoning steps—sketchier and more compressed than traditional chain-of-thought prompting—to guide model outputs while minimizing token consumption. Where chain-of-thought writes out full reasoning explicitly ("step 1: I observe..., step 2: I calculate..., step 3: I conclude..."), chain of draft produces abbreviated reasoning traces: sparse notation, key insights only, implicit jumps. The model still benefits from having worked through intermediate states, without the verbosity that inflates token usage.
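The contrast between the two trace styles can be sketched concretely. Both traces below are hand-written illustrations, not model output, and the whitespace split is only a crude proxy for a real tokenizer's count:

```python
# Illustrative contrast between a chain-of-thought trace and a
# chain-of-draft trace for the same arithmetic word problem.
# Both traces are hand-written examples; token counts use a
# whitespace split as a rough proxy for tokenizer output.

cot_trace = (
    "Step 1: The store starts with 23 apples. "
    "Step 2: It sells 9 apples, so 23 - 9 = 14 remain. "
    "Step 3: A delivery adds 6 apples, so 14 + 6 = 20. "
    "Step 4: Therefore the store ends with 20 apples."
)

# Same reasoning, compressed to sparse notation with an answer marker.
draft_trace = "23 - 9 = 14; 14 + 6 = 20; #### 20"

def token_count(text: str) -> int:
    """Crude token estimate: count whitespace-separated pieces."""
    return len(text.split())

print(token_count(cot_trace), token_count(draft_trace))
```

The draft trace preserves every intermediate quantity the verbose trace computes, at a fraction of the length—the property chain of draft relies on.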
The motivation is efficiency. Chain-of-thought improved reasoning quality but at the cost of more tokens per response—both in generation and in thinking/verification overhead. Chain of draft asks: can we get most of the reasoning benefits with far fewer tokens? Early results suggest yes. By training or prompting models to produce minimal intermediate sketches rather than verbose logical chains, accuracy on complex tasks remains competitive while throughput and cost improve measurably. A model might draft an outline or key constraint before solving a math problem rather than explaining every algebraic step.
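The prompting variant amounts to a system instruction that caps the length of each reasoning step. A hedged sketch, assuming a generic chat-completion message format—the exact word limit and the `####` answer marker are illustrative choices, not a fixed specification:

```python
# A sketch of eliciting draft-style reasoning via the system prompt.
# The five-word cap and '####' separator are illustrative assumptions;
# any chat-completion API that accepts a role/content message list
# could consume the structure built here.

DRAFT_SYSTEM_PROMPT = (
    "Think step by step, but keep only a minimal draft for each "
    "thinking step, at most five words per step. "
    "Return the final answer after '####'."
)

def build_draft_messages(question: str) -> list[dict]:
    """Package a question with the draft-eliciting system prompt."""
    return [
        {"role": "system", "content": DRAFT_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

msgs = build_draft_messages(
    "A train travels 120 km in 2 hours. What is its average speed?"
)
print(msgs[0]["role"], msgs[1]["role"])
```

The same message list works for either style: swapping the system prompt for a standard "explain each step in full" instruction recovers ordinary chain-of-thought, which makes the two easy to A/B test on the same questions.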
This fits into a broader trend of optimizing inference efficiency as reasoning becomes standard. As models handle harder problems and users demand faster responses, techniques like chain of draft, tree-of-thought pruning, and thinking token budgets all compete to deliver reasoning gains without proportional cost increases. Chain of draft is particularly practical for applications where reasoning is a means to an end—code generation, planning, retrieval-augmented tasks—rather than the primary deliverable. The tradeoff is interpretability: minimal sketches are harder for humans to follow, but they remain useful as scaffolding for the model's own reasoning.