
Envisioning is an emerging technology research institute and advisory.


2011 — 2026


Logits

Raw, unnormalized scores output by a neural network before probability conversion.

Year: 2012 · Generality: 700

Logits are the raw, unnormalized numerical outputs produced by the final linear layer of a neural network before any activation function is applied. In a classification setting, the network outputs one logit per class, and these values represent the model's unconstrained "scores" for each possible label. Because they are unbounded — potentially ranging from large negative to large positive values — logits are not directly interpretable as probabilities, but they encode the relative confidence the model assigns to each class.
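As a minimal sketch of this idea, the snippet below builds a hypothetical 3-class classification head as a single linear layer; the weights, biases, and feature values are made up for illustration. The key point is that the output is computed without any activation function, so the resulting scores are unbounded:

```python
import numpy as np

# Hypothetical 3-class head: the final linear layer maps a 4-dimensional
# feature vector to one unbounded logit per class.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weights of the final linear layer
b = rng.normal(size=3)        # biases

features = np.array([0.5, -1.2, 0.3, 2.0])  # penultimate-layer activations
logits = W @ features + b                   # raw scores; no activation applied

print(logits)  # one unconstrained score per class
```

Because no squashing function is applied, individual logits can be any real number; only their relative ordering and spacing carry meaning.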

To convert logits into probabilities, a softmax function is typically applied, which exponentiates each logit and normalizes the results so they sum to one. This transformation preserves the relative ordering of scores while producing a valid probability distribution. In binary classification, a sigmoid function serves the same role for a single logit. Crucially, many modern training frameworks compute loss functions — such as cross-entropy — directly from logits rather than from softmax outputs, because working in log-space improves numerical stability and avoids floating-point underflow that can occur when probabilities become very small.
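The transformations above can be sketched in a few lines of numpy. The function names are illustrative, not from any particular framework; note how both the softmax and the cross-entropy subtract the maximum logit first, which is the standard trick for numerical stability:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max logit cancels in the normalization but
    # prevents overflow when exponentiating large logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_from_logits(logits, target):
    # Work in log-space: log p_t = z_t - logsumexp(z), avoiding the
    # underflow that hits exp() for very negative logits.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

logits = np.array([2.0, 1.0, -3.0])
probs = softmax(logits)
print(probs, probs.sum())                    # valid distribution, sums to 1
print(cross_entropy_from_logits(logits, 0))  # loss computed directly from logits
```

This is why framework losses that take logits directly are preferred over applying softmax first and then taking a log: the fused log-space computation never materializes tiny probabilities.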

The term itself originates from logistic regression, where "logit" referred to the log-odds of a probability: log(p / (1 − p)). This connection is more than etymological — logistic regression can be viewed as a single-layer neural network, and its output before the sigmoid is precisely a logit in both the classical and modern senses. As deep learning scaled up through the 2010s, the term migrated naturally into the neural network vocabulary to describe the pre-activation outputs of any classification head.
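The classical definition and its inverse can be checked directly; the logit maps a probability to a real-valued score, and the sigmoid maps it back:

```python
import math

def logit(p):
    # Classical definition: the log-odds of a probability.
    return math.log(p / (1 - p))

def sigmoid(z):
    # Inverse of the logit: maps a real score back into (0, 1).
    return 1 / (1 + math.exp(-z))

p = 0.8
z = logit(p)        # log(0.8 / 0.2) = log(4) ≈ 1.386
print(sigmoid(z))   # recovers 0.8
```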

Logits matter beyond training mechanics. In knowledge distillation, a student model is trained to match the logit distribution of a larger teacher model, capturing richer information than hard class labels alone. In temperature scaling and calibration, logits are divided by a temperature parameter to sharpen or soften the resulting probability distribution. In language models, the logit vector over the entire vocabulary is the direct output at each token position, making logits central to decoding strategies like beam search, top-k sampling, and nucleus sampling.
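Two of these uses can be sketched together. The snippet below applies temperature scaling to a made-up logit vector (standing in for a language model's per-token output) and then performs top-k sampling; the vocabulary and values are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits over a tiny 5-token vocabulary at one decoding step.
logits = np.array([4.0, 3.5, 1.0, 0.2, -2.0])

# Temperature scaling: dividing logits by T > 1 softens the distribution,
# T < 1 sharpens it; the argmax is unchanged in either case.
for T in (0.5, 1.0, 2.0):
    print(T, softmax(logits / T).round(3))

def top_k_sample(logits, k, rng):
    # Keep only the k highest logits, renormalize them, then sample.
    top = np.argsort(logits)[-k:]        # indices of the k largest logits
    probs = softmax(logits[top])
    return top[rng.choice(k, p=probs)]

rng = np.random.default_rng(0)
print(top_k_sample(logits, k=2, rng=rng))  # returns index 0 or 1
```

Greedy decoding, beam search, and nucleus sampling all start from the same place: a logit vector that is reshaped or truncated before being turned into a sampling distribution.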

Related

Log Odds

The logarithm of the odds ratio, linking probabilities to linear model outputs.

Generality: 694
Logistic Regression

A classification algorithm that models the probability of a binary outcome.

Generality: 838
Softmax Function

Converts a vector of real numbers into a normalized probability distribution over classes.

Generality: 796
Log Likelihood

The logarithm of a likelihood function, simplifying probabilistic model optimization and parameter estimation.

Generality: 838
Cross-Entropy Loss

A loss function measuring divergence between predicted probability distributions and true labels.

Generality: 838
Node

A basic computational unit in neural networks or graphs that processes information.

Generality: 795