A scaling parameter that controls the randomness of a model's output distribution.
Temperature is a hyperparameter used in machine learning models, particularly generative and probabilistic ones, to control how sharply or broadly a probability distribution is shaped over possible outputs. It works by scaling the logits (raw, unnormalized scores) produced by a model before they are passed through a softmax function. Dividing the logits by a temperature value T reshapes the resulting distribution: when T is less than 1, the distribution becomes more peaked, concentrating probability mass on the highest-scoring outputs and making the model behave more deterministically; in the limit as T approaches 0, sampling reduces to always picking the single highest-scoring output. When T is greater than 1, the distribution flattens, spreading probability more evenly across outputs and introducing greater randomness or diversity into the model's choices.
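As a concrete sketch, the temperature-scaled softmax can be written in a few lines of NumPy; the function name and example logits below are purely illustrative, not taken from any particular library:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()        # subtract the max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.5))  # peaked: mass concentrates on the top logit
print(softmax_with_temperature(logits, 1.0))  # the unmodified softmax distribution
print(softmax_with_temperature(logits, 2.0))  # flattened: mass spreads more evenly
```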
The mechanism is borrowed from statistical thermodynamics, where temperature governs the distribution of energy states in a physical system. In machine learning, the analogy holds: low temperature drives the system toward its lowest-energy (highest-probability) state, while high temperature allows it to explore a wider range of states. This principle appears in simulated annealing, Boltzmann machines, and reinforcement learning, where temperature controls the exploration-exploitation trade-off in policy sampling.
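In the reinforcement-learning case, for example, a softmax (Boltzmann) policy turns action-value estimates into sampling probabilities, with temperature setting how greedy the policy is. The sketch below uses hypothetical Q-values and is one common way such a policy is written, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def boltzmann_action(q_values, temperature):
    """Sample an action index from a softmax (Boltzmann) policy over Q-values."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()          # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(probs), p=probs)

q = [1.2, 1.0, 0.3]              # hypothetical action-value estimates
# Low temperature exploits the best-looking action almost every time;
# high temperature explores all actions nearly uniformly.
print(boltzmann_action(q, temperature=0.1))
print(boltzmann_action(q, temperature=5.0))
```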
Temperature became especially prominent with the rise of large language models and neural text generation systems. When sampling from a language model, a temperature of 1.0 samples directly from the model's learned distribution, while values below 1.0 produce more predictable, conservative text and values above 1.0 yield more creative and varied, but potentially less coherent, outputs. This makes temperature a practical tool for tuning model behavior at inference time without any retraining.
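A toy next-token sampler makes the effect visible; the vocabulary and logits here are made up for illustration, and drawing many samples shows how the empirical distribution shifts with T:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_token(logits, temperature):
    """Draw one token id from temperature-scaled logits."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

vocab = ["the", "a", "quantum", "banana"]   # toy vocabulary
logits = [3.0, 2.5, 0.5, 0.1]               # hypothetical next-token scores

for T in (0.2, 1.0, 2.0):
    draws = [vocab[sample_token(logits, T)] for _ in range(1000)]
    counts = {word: draws.count(word) for word in vocab}
    print(f"T={T}: {counts}")
# T=0.2 almost always picks "the"; T=2.0 gives the rare tokens a real chance.
```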
Beyond text generation, temperature scaling is also used as a post-hoc calibration technique to improve the reliability of a classifier's confidence estimates. By learning an optimal temperature on a held-out validation set, practitioners can align predicted probabilities more closely with true empirical frequencies, making models better suited for downstream decision-making. Its simplicity, interpretability, and broad applicability make temperature one of the most widely used controls in deployed AI systems.
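As a minimal sketch of how that fit could work, the snippet below picks the temperature that minimizes the average negative log-likelihood of a toy held-out set using a coarse grid search; all arrays are stand-ins for real model outputs, and practical implementations commonly use a gradient-based optimizer instead:

```python
import numpy as np

def avg_nll(temperature, logits, labels):
    """Average negative log-likelihood of the true labels under T-scaled softmax."""
    scaled = logits / temperature
    scaled -= scaled.max(axis=1, keepdims=True)          # stabilize exponentials
    log_probs = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy held-out validation logits and labels (stand-ins for real model outputs).
# The third example is a near-miss, so the best temperature sits above
# the overconfident uncalibrated setting.
val_logits = np.array([[4.0, 0.2, -1.0],
                       [0.5, 3.5,  0.0],
                       [3.0, 2.8, -0.5]])
val_labels = np.array([0, 1, 1])

candidates = np.linspace(0.05, 5.0, 100)
best_T = min(candidates, key=lambda t: avg_nll(t, val_logits, val_labels))
print(f"fitted temperature: {best_T:.2f}")
```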