Token Costs by Model
How token costs vary between different AI models
How Model Pricing Works
Each AI model has its own pricing for input tokens (what you send) and output tokens (what the AI generates). Prices are set by the model providers and displayed per million tokens.
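As a rough illustration of per-million pricing, the sketch below computes the cost of one generation from separate input and output rates. The `ModelPrice` class, the `estimate_cost` function, and the prices are hypothetical placeholders, not real provider rates or CoffeeScribe's internal calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical prices per million tokens (placeholder values only)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request, charging input and output separately."""
    input_cost = (input_tokens / 1_000_000) * price.input_per_million
    output_cost = (output_tokens / 1_000_000) * price.output_per_million
    return input_cost + output_cost


# Assumed example: output tokens priced at four times the input rate.
example_price = ModelPrice(input_per_million=1.00, output_per_million=4.00)
cost = estimate_cost(example_price, input_tokens=20_000, output_tokens=5_000)
print(f"Estimated cost: {cost:.4f}")
```

Because prices are quoted per million tokens, a single section usually costs a small fraction of the listed rate.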
Model Pricing Tiers
The CoffeeScribe Model (Free + Pro)
- The curated default we maintain
- Locked for Free and Pro — no picker shown
- Tuned for the best balance of quality and cost
- The underlying model evolves over time; the brand label stays the same — see What is the CoffeeScribe Model?
Budget Alternatives (Creator picker)
- Lowest cost per token
- Good for drafts, exploration, and high-volume creation
- Available in Creator's picker (Recommended → cheaper alternatives, or in the advanced disclosure)
- Examples: Google Gemini Flash variants, Llama 4 Maverick, Grok 3 Mini
Standard Models (Creator picker)
- Moderate cost per token
- Balanced quality and affordability
- Available in Creator's Recommended set
- Good for most use cases that need something specific
Premium Models (Creator picker)
- Highest cost per token
- Best writing quality, nuance, and accuracy
- Available in Creator's Recommended set
- Best for professional or polished content
- Examples: Claude Sonnet 4, GPT-4o
Estimating Costs
In the Workspace
The Workspace shows estimated token costs before you generate:
- Number of tokens the operation will use
- Your remaining token balance
General Estimates
| Scribe Length | Budget Model | Standard Model | Premium Model |
|---|---|---|---|
| Short (5 chapters) | Small slice of allowance | A bit more | A bit more again |
| Medium (10 chapters) | Modest slice | Larger slice | Larger still |
| Long (20 chapters) | Substantial slice | Bigger slice | Largest slice |
Premium models can cost several times more than budget models for the same scribe. Actual usage varies based on content complexity and section length. The Workspace shows a per-action estimate before you confirm so you always know what you're spending.
Saving Tokens
- Choose the right model for the task — Use budget models for drafts, premium for final content
- Edit manually when possible — Small text changes don't need AI regeneration
- Be specific with prompts — Clear instructions produce better results on the first try, reducing rewrites
- Choose the right generation mode — Lightning is cheapest (no cross-section context). Lightning Medium and Pour Over add context progressively. Pour Over Slow Brew sends the full scribe as context for each section.
- Understand the cost gap — The gap between Lightning and Slow Brew grows with more chapters and pricier models. A 25-chapter scribe with Claude Sonnet could cost 3-4x more in Pour Over Slow Brew vs Lightning.
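To see why the gap widens, here is a simplified model of the two extremes described above: Lightning with no cross-section context, and Pour Over Slow Brew with the full scribe sent for every section. Everything in it is an assumption for illustration: the `mode_costs` function, the token counts, the prices, and the choice to approximate "the full scribe" as every chapter at its final length. It is not meant to reproduce the figures quoted above.

```python
def mode_costs(
    chapters: int,
    input_price: float = 1.0,    # hypothetical price per million input tokens
    output_price: float = 4.0,   # hypothetical price per million output tokens
    prompt_tokens: int = 500,    # assumed instruction size per chapter
    chapter_tokens: int = 1_000, # assumed generated length per chapter
) -> tuple[float, float]:
    """Return (lightning_cost, slow_brew_cost) for a whole scribe."""
    output_tokens = chapters * chapter_tokens

    # Lightning: each chapter is sent with its prompt only.
    lightning_input = chapters * prompt_tokens

    # Slow Brew: each chapter is sent with its prompt plus the full scribe,
    # approximated here as all chapters at their final length.
    slow_brew_input = chapters * (prompt_tokens + chapters * chapter_tokens)

    def cost(input_tokens: int) -> float:
        return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

    return cost(lightning_input), cost(slow_brew_input)


for n in (5, 10, 25):
    lightning, slow_brew = mode_costs(n)
    print(f"{n} chapters: Slow Brew is {slow_brew / lightning:.1f}x Lightning")
```

In this model the context sent per chapter grows with the chapter count, so total input grows roughly with the square of the scribe length, and a pricier model multiplies the difference further.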
Input vs Output Tokens
- Input tokens are typically cheaper than output tokens
- A detailed prompt (more input tokens) often produces better results on the first try, which avoids spending output tokens on rewrites
- The scribe's existing content is sent as context (input), so longer scribes use more input tokens per section
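The trade-off between a longer prompt and a rewrite can be sketched with simple arithmetic. The `attempt_cost` function, the token counts, and the prices below are hypothetical, chosen only to show how cheaper input tokens can offset an avoided regeneration.

```python
def attempt_cost(
    prompt_tokens: int,
    output_tokens: int,
    attempts: int,
    input_price: float = 1.0,   # hypothetical price per million input tokens
    output_price: float = 4.0,  # hypothetical price per million output tokens
) -> float:
    """Total cost when the same prompt is run `attempts` times."""
    per_attempt = prompt_tokens * input_price + output_tokens * output_price
    return attempts * per_attempt / 1_000_000


# Assumed scenario: a terse prompt that needs one rewrite,
# versus a detailed prompt that works on the first try.
short_prompt_two_tries = attempt_cost(prompt_tokens=100, output_tokens=1_000, attempts=2)
detailed_prompt_one_try = attempt_cost(prompt_tokens=600, output_tokens=1_000, attempts=1)

print(f"Short prompt, two tries:  {short_prompt_two_tries:.6f}")
print(f"Detailed prompt, one try: {detailed_prompt_one_try:.6f}")
```

Under these assumptions the extra prompt tokens cost less than the second round of output, though the balance depends on the actual prices and on how often a rewrite is really needed.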
See Also
- Understanding Tokens — What tokens are and how they work
- Choosing an AI Model — How to pick the right model