The unit cost charged for each token processed by a language model API.
Price per token is the billing unit used by commercial large language model (LLM) providers to charge for API access. Because LLMs process text as sequences of tokens — subword units produced by a tokenizer, typically representing three to four characters of English on average — the total cost of any API call is determined by multiplying the token count by the provider's rate. Most providers distinguish between input tokens (the prompt) and output tokens (the generated response), and usually charge more for output, since each generated token requires its own forward pass through the model during autoregressive decoding.
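As an illustration, the cost of a single call reduces to a weighted sum of the two token counts. The sketch below uses hypothetical rates of $3 and $15 per million tokens; actual prices vary by provider and model.

```python
# Minimal sketch of token-based billing. The rates here are
# hypothetical placeholders, not any provider's actual prices.

def api_call_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return the dollar cost of one API call.

    Rates are quoted in dollars per million tokens, the convention
    most providers use.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a 1,500-token prompt producing a 400-token response,
# billed at $3/M input tokens and $15/M output tokens.
cost = api_call_cost(1_500, 400, input_rate=3.00, output_rate=15.00)
print(f"${cost:.4f}")  # $0.0105
```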
The mechanics of token-based pricing are tightly coupled to how transformers operate. Each token attends to all of the tokens that precede it in the context, so longer prompts and responses consume more GPU memory and compute time. Providers translate this resource consumption into per-token rates, typically quoted in dollars per million tokens. Because tokenization is language- and model-dependent — a Chinese character may map to multiple tokens while a common English word maps to one — the effective cost per word or per sentence varies considerably across languages and use cases.
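This language dependence can be observed directly with an open-source tokenizer. The sketch below uses the tiktoken library's cl100k_base encoding as one example; other models' tokenizers will split the same text differently.

```python
# Illustrates how token counts, and therefore cost, vary by language.
# Uses tiktoken's cl100k_base encoding; counts differ per tokenizer.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Chinese": "敏捷的棕色狐狸跳过了懒狗。",
}

for language, text in samples.items():
    tokens = enc.encode(text)
    # Characters per token is a rough gauge of billing efficiency.
    print(f"{language}: {len(text)} chars -> {len(tokens)} tokens "
          f"({len(text) / len(tokens):.1f} chars/token)")
```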
This pricing model became commercially significant with the launch of the OpenAI API in 2020 and accelerated rapidly after the release of GPT-3.5 and GPT-4 in 2022–2023, when enterprise adoption drove serious cost optimization efforts. Practitioners building production applications must account for token costs when designing prompts, choosing context window sizes, and selecting between model tiers. Techniques such as prompt compression, caching repeated context, and batching requests have emerged specifically to reduce token expenditure.
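As a sketch of one of these techniques, the following client-side cache returns a stored response for a repeated prompt rather than paying for the same tokens twice. The call_model function is a hypothetical placeholder for a real API client, and provider-side prompt caching, where offered, works and is priced differently.

```python
# A minimal client-side cache for repeated prompts: identical requests
# are served from memory instead of incurring token charges again.

import hashlib

_cache: dict[str, str] = {}

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real, per-token-billed API call.
    return f"<response to {len(prompt)}-char prompt>"

def cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only this path is billed
    return _cache[key]

system_context = "You are a support assistant for ExampleCo...\n"
cached_call(system_context + "How do I reset my password?")
cached_call(system_context + "How do I reset my password?")  # cache hit: no charge
```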
Understanding price per token matters beyond simple budgeting. It shapes architectural decisions — whether to use retrieval-augmented generation instead of stuffing documents into a long context, or whether to fine-tune a smaller model rather than rely on few-shot prompting with a large one. As model capabilities have improved and competition among providers has intensified, per-token costs have fallen dramatically, broadening the range of economically viable AI applications and making token efficiency a core concern in LLM engineering.
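A back-of-the-envelope comparison shows how such trade-offs play out. All token counts and rates below are hypothetical, chosen only to illustrate how per-token pricing can favor a fine-tuned smaller model over few-shot prompting with a larger one.

```python
# Hypothetical monthly cost comparison of two model tiers serving
# the same workload. Every number here is an illustrative assumption.

requests_per_month = 100_000

# Few-shot prompting with a large model: long prompt (examples included).
large = {"in_tokens": 2_000, "out_tokens": 300, "in_rate": 10.0, "out_rate": 30.0}
# Fine-tuned smaller model: short prompt, cheaper per-token rates.
small = {"in_tokens": 200, "out_tokens": 300, "in_rate": 1.0, "out_rate": 2.0}

def monthly_cost(m: dict) -> float:
    per_call = (m["in_tokens"] * m["in_rate"]
                + m["out_tokens"] * m["out_rate"]) / 1_000_000
    return per_call * requests_per_month

print(f"large model: ${monthly_cost(large):,.0f}/month")  # $2,900
print(f"small model: ${monthly_cost(small):,.0f}/month")  # $80
```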