Token Costs by Model

How token costs vary between different AI models

How Model Pricing Works

Each AI model has its own pricing for input tokens (what you send) and output tokens (what the AI generates). Prices are set by the model providers and displayed per million tokens.
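As a rough illustration of how per-million pricing turns into the cost of a single request, here is a minimal sketch. The function name and the prices used are hypothetical placeholders, not rates for any particular model.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one request, given prices quoted per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical example: 2,000 tokens sent, 1,000 tokens generated,
# with placeholder prices of 3.00 (input) and 15.00 (output) per million.
cost = request_cost(2_000, 1_000, 3.00, 15.00)
print(f"{cost:.4f}")
```

Input and output are priced separately, so the same total token count can cost more or less depending on how it splits between what you send and what the AI generates.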

Model Pricing Tiers

Every signed-in user sees the same picker — nothing below is gated by account type. The "tiers" here describe cost bands, not who can access them.

The CoffeeScribe Model (recommended default)

The curated default we maintain, pinned at the top of the picker and pre-selected
Tuned for the best balance of quality and cost
The underlying model evolves over time; the brand label stays the same — see What is the CoffeeScribe Model?

Budget Alternatives

Lowest cost per token
Good for drafts, exploration, and high-volume creation
Available to everyone in the picker (under Recommended → cheaper alternatives, or in the advanced disclosure)
Examples: Google Gemini Flash variants, Llama 4 Maverick, Grok 3 Mini

Standard Models

Moderate cost per token
Balanced quality and affordability
Available to everyone in the picker's Recommended set
Good for most use cases where you need a specific model rather than the default

Premium Models

Highest cost per token
Best writing quality, nuance, and accuracy
Available to everyone in the picker's Recommended set
Best for professional or polished content
Examples: Claude Sonnet 4, GPT-4o

Estimating Costs

In the Workspace

The Workspace shows estimated token costs before you generate:

Number of tokens the operation will use
Your remaining token balance

General Estimates

Scribe Length	Budget Model	Standard Model	Premium Model
Short (5 chapters)	Small slice of allowance	A bit more	A bit more again
Medium (10 chapters)	Modest slice	Larger slice	Larger still
Long (20 chapters)	Substantial slice	Bigger slice	Largest slice

Premium models can cost several times more than budget models for the same scribe. Actual usage varies based on content complexity and section length. The Workspace shows a per-action estimate before you confirm so you always know what you're spending.

Saving Tokens

Choose the right model for the task — Use budget models for drafts, premium for final content
Edit manually when possible — Small text changes don't need AI regeneration
Be specific with prompts — Clear instructions produce better results on the first try, reducing rewrites
Choose the right generation mode — Lightning is cheapest (no cross-section context). Lightning Medium and Pour Over add context progressively. Pour Over Slow Brew sends the full scribe as context for each section.
Understand the cost gap — The gap between Lightning and Slow Brew grows with more chapters and pricier models. A 25-chapter scribe with Claude Sonnet could cost 3-4x more in Pour Over Slow Brew vs Lightning.
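To see why the gap widens with chapter count, the sketch below compares a mode with no cross-section context against one that resends earlier sections each time. It is a simplified model under stated assumptions: the prices, prompt size, and section size are placeholders, and "full context" is approximated as every previously written section, which may differ from how each mode actually assembles context.

```python
# Placeholder prices per million tokens (hypothetical, not real rates).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00


def input_tokens_no_context(chapters, prompt_tokens):
    """Each section is generated from its prompt alone."""
    return chapters * prompt_tokens


def input_tokens_full_context(chapters, prompt_tokens, section_tokens):
    """Each section also receives every previously written section (assumption)."""
    total = 0
    for written_so_far in range(chapters):
        total += prompt_tokens + written_so_far * section_tokens
    return total


def total_cost(input_tokens, output_tokens):
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


def cost_ratio(chapters, prompt_tokens=500, section_tokens=1_000):
    """How many times more the full-context run costs than the no-context run."""
    output_tokens = chapters * section_tokens
    lean = total_cost(input_tokens_no_context(chapters, prompt_tokens), output_tokens)
    full = total_cost(
        input_tokens_full_context(chapters, prompt_tokens, section_tokens),
        output_tokens,
    )
    return full / lean


for chapters in (5, 10, 25):
    print(chapters, round(cost_ratio(chapters), 2))
```

With these placeholder numbers the ratio climbs as chapters are added, because the resent context grows with every section while the output stays the same size. Real figures depend on the model's actual prices and your section lengths.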

Input vs Output Tokens

Input tokens are typically cheaper than output tokens
A detailed prompt (more input tokens) often produces better results, saving on output tokens from rewrites
The scribe's existing content is sent as context (input), so longer scribes use more input tokens per section
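The trade-off between a longer prompt and a rewrite can be sketched with simple arithmetic. The prices and token counts below are hypothetical placeholders chosen only to show the shape of the comparison.

```python
# Placeholder prices per million tokens (hypothetical, not real rates).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00


def attempts_cost(prompt_tokens, output_tokens, attempts):
    """Total cost of generating the same section `attempts` times."""
    per_attempt = (prompt_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return attempts * per_attempt


# A short prompt that needs one rewrite versus a detailed prompt that works first time.
short_prompt_twice = attempts_cost(prompt_tokens=100, output_tokens=1_000, attempts=2)
detailed_prompt_once = attempts_cost(prompt_tokens=400, output_tokens=1_000, attempts=1)

print(detailed_prompt_once < short_prompt_twice)  # True
```

Because output tokens are typically the pricier side, a few hundred extra input tokens usually cost far less than regenerating a whole section.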