Envisioning is an emerging technology research institute and advisory.


Price Per Token

The unit cost charged for each token processed by a language model API.

Year: 2020
Generality: 293

Price per token is the billing unit used by commercial large language model (LLM) providers to charge for API access. Because LLMs process text as sequences of tokens — subword units produced by a tokenizer, typically representing three to four characters on average in English — the total cost of any API call is determined by multiplying the token count by the provider's rate. Most providers distinguish between input tokens (the prompt) and output tokens (the generated response), often charging more for generation since it is computationally heavier.
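The arithmetic above can be sketched directly. A minimal cost calculator, assuming separate input and output rates quoted in dollars per million tokens; the rates used in the example are illustrative assumptions, not any provider's actual pricing:

```python
def api_call_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Dollar cost of one API call.

    Rates follow the common convention of dollars per million tokens,
    with output (generated) tokens priced higher than input (prompt) tokens.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
cost = api_call_cost(input_tokens=2_000, output_tokens=500,
                     input_rate=3.0, output_rate=15.0)
print(f"${cost:.4f}")  # (2000*3 + 500*15) / 1e6 = $0.0135
```

Note that even with output tokens priced five times higher, a long prompt can still dominate the bill, which is why prompt length is the first lever most teams reach for.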

The mechanics of token-based pricing are tightly coupled to how transformers operate. Every token in a sequence requires attention computations across the full context window, meaning longer prompts and responses consume proportionally more GPU memory and compute time. Providers translate this resource consumption into per-token rates, typically quoted in dollars per million tokens. Because tokenization is language- and model-dependent — a Chinese character may map to multiple tokens while a common English word maps to one — the effective cost per word or per sentence varies considerably across languages and use cases.

This pricing model became commercially significant with the launch of the OpenAI API in 2020 and accelerated rapidly after the release of GPT-3.5 and GPT-4 in 2022–2023, when enterprise adoption drove serious cost optimization efforts. Practitioners building production applications must account for token costs when designing prompts, choosing context window sizes, and selecting between model tiers. Techniques such as prompt compression, caching repeated context, and batching requests have emerged specifically to reduce token expenditure.
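The savings from caching repeated context can be estimated with a back-of-the-envelope model. The sketch below assumes a provider that charges a discounted rate for cached prompt-prefix tokens after the first call; the specific rates and the 10x discount are illustrative assumptions:

```python
def cost_with_prompt_cache(prefix_tokens: int, suffix_tokens: int, calls: int,
                           input_rate: float, cached_rate: float) -> tuple[float, float]:
    """Compare input-token cost with and without prefix caching.

    prefix_tokens: shared context (e.g. system prompt + documents) reused every call
    suffix_tokens: the part that changes per call
    Rates are dollars per million tokens.
    """
    no_cache = calls * (prefix_tokens + suffix_tokens) * input_rate / 1e6
    cached = (prefix_tokens * input_rate                    # first call writes the cache
              + (calls - 1) * prefix_tokens * cached_rate   # later calls read it cheaply
              + calls * suffix_tokens * input_rate) / 1e6   # per-call suffix, full price
    return no_cache, cached

# Hypothetical: 10k-token shared prefix, 200-token queries, 100 calls,
# $3/1M input rate, $0.30/1M cached rate.
no_cache, cached = cost_with_prompt_cache(10_000, 200, 100, 3.0, 0.3)
print(f"no cache: ${no_cache:.3f}, cached: ${cached:.3f}")  # $3.060 vs $0.387
```

The larger and more frequently reused the prefix, the closer the savings approach the full cache discount.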

Understanding price per token matters beyond simple budgeting. It shapes architectural decisions — whether to use retrieval-augmented generation instead of stuffing documents into a long context, or whether to fine-tune a smaller model rather than rely on few-shot prompting with a large one. As model capabilities have improved and competition among providers has intensified, per-token costs have fallen dramatically, broadening the range of economically viable AI applications and making token efficiency a core concern in LLM engineering.
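The retrieval-versus-long-context trade-off mentioned above reduces to simple token arithmetic. A sketch under assumed numbers (corpus size, chunk size, and rate are all illustrative):

```python
def full_context_cost(corpus_tokens: int, query_tokens: int, rate: float) -> float:
    """Cost of stuffing the whole corpus into the prompt (rate per 1M tokens)."""
    return (corpus_tokens + query_tokens) * rate / 1e6

def rag_cost(chunk_tokens: int, top_k: int, query_tokens: int, rate: float) -> float:
    """Cost of prompting with only the top-k retrieved chunks."""
    return (chunk_tokens * top_k + query_tokens) * rate / 1e6

# Hypothetical: 200k-token corpus vs. retrieving 5 chunks of 500 tokens,
# 100-token query, $3 per 1M input tokens.
print(full_context_cost(200_000, 100, 3.0))  # 0.6003 per query
print(rag_cost(500, 5, 100, 3.0))            # 0.0078 per query
```

At any realistic query volume the per-call gap compounds quickly, which is why retrieval-augmented generation is often justified on cost grounds alone, before answer quality even enters the comparison.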

Related

Token

The basic unit of text that language models read, process, and generate.

Generality: 720
Token Processing

Segmenting text into discrete units that serve as inputs for NLP models.

Generality: 720
Tokenmaxxing

Maximizing useful information density within a prompt's token budget for better LLM outputs.

Generality: 94
Thinking Tokens

Hidden tokens consumed during inference for internal step-by-step reasoning that is invisible to users.

Generality: 605
Thought Token

Special tokens that give language models explicit space to reason before answering.

Generality: 450
Next Token Prediction

A training objective where models learn to predict the next token in a sequence.

Generality: 794