A search algorithm that uses randomized simulations to navigate large decision spaces.
Monte Carlo Tree Search (MCTS) is a probabilistic search algorithm for finding strong decisions in problems with enormous branching factors, where exhaustive search is computationally infeasible. Rather than evaluating every possible future state, MCTS concentrates computational effort on the most promising regions of a decision tree by combining strategic exploration with random sampling. It has become a cornerstone technique in game-playing AI and in sequential decision-making under uncertainty.
The algorithm operates through four repeating phases. In selection, the tree is traversed from the root using a policy that balances exploration of less-visited nodes against exploitation of high-performing ones, most commonly the UCT (Upper Confidence Bounds applied to Trees) formula. In expansion, one or more new child nodes are added to the tree. In simulation (also called rollout), a random or heuristic-guided playout proceeds from the new node to a terminal state. Finally, in backpropagation, the result of that simulation is propagated back up the tree, updating the visit count and value estimate at each ancestor node. Repeated over thousands of iterations, this process builds an increasingly accurate picture of which actions are most valuable.
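UCT scores each child i as Q_i / n_i + c * sqrt(ln N / n_i), where Q_i is the child's total simulation reward, n_i its visit count, N the parent's visit count, and c an exploration constant (sqrt(2) is a common default). The sketch below puts the four phases together in Python. It is a minimal illustration, not a reference implementation: the state interface (legal_actions(), step(), is_terminal(), reward()) is hypothetical rather than taken from any particular library, and rewards are assumed to be given from the root player's perspective.

```python
import math
import random

# Hypothetical state interface (not from any particular library):
#   state.legal_actions() -> list of actions
#   state.step(action)    -> successor state
#   state.is_terminal()   -> bool
#   state.reward()        -> terminal payoff from the root player's view
# For two-player games, the reward would be negated at alternating
# depths during backpropagation; that detail is omitted here.

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action                 # move that led to this node
        self.children = []
        self.untried = list(state.legal_actions())
        self.visits = 0
        self.value = 0.0                     # sum of simulation rewards

    def uct_child(self, c=math.sqrt(2)):
        # Selection policy: maximize mean reward plus exploration bonus.
        return max(
            self.children,
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )

def mcts_iteration(root):
    node = root
    # 1. Selection: descend while the node is fully expanded.
    while not node.untried and node.children:
        node = node.uct_child()
    # 2. Expansion: add one child for an untried action.
    if node.untried:
        action = node.untried.pop()
        child = Node(node.state.step(action), parent=node, action=action)
        node.children.append(child)
        node = child
    # 3. Simulation (rollout): play randomly to a terminal state.
    state = node.state
    while not state.is_terminal():
        state = state.step(random.choice(state.legal_actions()))
    reward = state.reward()
    # 4. Backpropagation: update statistics on the path to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def best_action(root):
    # Recommend the most-visited child, a common robust choice.
    return max(root.children, key=lambda ch: ch.visits).action

def mcts(root_state, iterations=10_000):
    root = Node(root_state)
    for _ in range(iterations):
        mcts_iteration(root)
    return best_action(root)
```

Recommending the most-visited root child rather than the highest-valued one is a common design choice: visit counts are less noisy than value estimates, since UCT only accumulates visits on actions that keep proving themselves.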
MCTS rose to prominence in machine learning through its role in game-playing systems. Its 2006 application to the board game Go — a domain where classical minimax search with hand-crafted evaluation functions had long struggled — marked a turning point. The algorithm's impact was dramatically amplified in 2016 when DeepMind's AlphaGo combined MCTS with deep neural networks to defeat a world champion Go player, demonstrating that learned value and policy functions could replace random rollouts with far more accurate guidance.
Beyond games, MCTS has found application in reinforcement learning, combinatorial optimization, protein structure prediction, and planning for autonomous systems. Its strength lies in its light modeling requirement: it needs only a forward simulator or generative model of the environment, not explicit transition probabilities or a hand-crafted evaluation function. It is also an anytime algorithm, able to return a best-guess answer at any point and improve with additional computation, as the sketch below illustrates. These qualities make it broadly applicable wherever sequential decisions must be made under uncertainty in a large action space.
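To make the anytime property concrete, here is a minimal sketch reusing the Node, mcts_iteration, and best_action pieces from the earlier example. The wall-clock budget and the function name are illustrative assumptions, not part of any standard API.

```python
import time

def mcts_anytime(root_state, time_budget_s=1.0):
    # Run iterations until the deadline; best_action(root) is a valid
    # answer whenever we stop and tends to improve with more iterations.
    root = Node(root_state)
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        mcts_iteration(root)
    return best_action(root)  # assumes at least one iteration completed
```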