Token Costs by Model

How token costs vary between different AI models

How Model Pricing Works

Each AI model has its own pricing for input tokens (what you send) and output tokens (what the AI generates). Prices are set by the model providers and displayed per million tokens.
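
As a rough illustration of per-million pricing, here is a minimal sketch of how the cost of one request could be worked out. The function name and the prices are hypothetical, chosen only to make the arithmetic easy to follow; they are not actual CoffeeScribe or provider rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the cost of one request, given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices, for illustration only.
cost = estimate_cost(
    input_tokens=2_000,
    output_tokens=1_000,
    input_price_per_million=3.00,
    output_price_per_million=15.00,
)
print(f"{cost:.4f}")
```

With these made-up numbers, the 1,000 output tokens account for more of the total than the 2,000 input tokens, because the output price is higher.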

Model Pricing Tiers

The CoffeeScribe Model (Free + Pro)

  • The curated default we maintain
  • Locked for Free and Pro — no picker shown
  • Tuned for the best balance of quality and cost
  • The underlying model evolves over time; the brand label stays the same — see What is the CoffeeScribe Model?

Budget Alternatives (Creator picker)

  • Lowest cost per token
  • Good for drafts, exploration, and high-volume creation
  • Available in Creator's picker (Recommended → cheaper alternatives, or in the advanced disclosure)
  • Examples: Google Gemini Flash variants, Llama 4 Maverick, Grok 3 Mini

Standard Models (Creator picker)

  • Moderate cost per token
  • Balanced quality and affordability
  • Available in Creator's Recommended set
  • Good for most use cases that need something specific

Premium Models (Creator picker)

  • Highest cost per token
  • Best writing quality, nuance, and accuracy
  • Available in Creator's Recommended set
  • Best for professional or polished content
  • Examples: Claude Sonnet 4, GPT-4o

Estimating Costs

In the Workspace

The Workspace shows estimated token costs before you generate:

  • Number of tokens the operation will use
  • Your remaining token balance

General Estimates

Scribe Length        | Budget Model             | Standard Model | Premium Model
Short (5 chapters)   | Small slice of allowance | A bit more     | A bit more again
Medium (10 chapters) | Modest slice             | Larger slice   | Larger still
Long (20 chapters)   | Substantial slice        | Bigger slice   | Largest slice

Premium models can cost several times more than budget models for the same scribe. Actual usage varies based on content complexity and section length. The Workspace shows a per-action estimate before you confirm so you always know what you're spending.

Saving Tokens

  • Choose the right model for the task — Use budget models for drafts, premium for final content
  • Edit manually when possible — Small text changes don't need AI regeneration
  • Be specific with prompts — Clear instructions produce better results on the first try, reducing rewrites
  • Choose the right generation mode — Lightning is cheapest (no cross-section context). Lightning Medium and Pour Over add context progressively. Pour Over Slow Brew sends the full scribe as context for each section.
  • Understand the cost gap — The gap between Lightning and Slow Brew grows with more chapters and pricier models. A 25-chapter scribe with Claude Sonnet could cost 3-4x more in Pour Over Slow Brew vs Lightning.
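
To see why the gap widens with chapter count, here is a simplified sketch of how input tokens might add up across a whole scribe. It assumes that a full-context mode sends every previously written section along with each new prompt, and that a no-context mode sends only the prompt. The function name and token sizes are hypothetical, and the real generation modes may build context differently.

```python
def total_input_tokens(chapters: int, prompt_tokens: int,
                       chapter_tokens: int, full_context: bool) -> int:
    """Sum the input tokens sent over all sections of a scribe.

    full_context=False: each section sends only its prompt (no cross-section context).
    full_context=True: each section also sends all sections written so far (assumption).
    """
    total = 0
    for section_index in range(chapters):
        total += prompt_tokens
        if full_context:
            # Context is everything generated before this section.
            total += section_index * chapter_tokens
    return total


# Made-up sizes: 500-token prompts, 1,000-token chapters.
for chapters in (5, 25):
    no_context = total_input_tokens(chapters, 500, 1_000, full_context=False)
    with_context = total_input_tokens(chapters, 500, 1_000, full_context=True)
    print(chapters, no_context, with_context)
```

In this model the no-context total grows in step with the number of chapters, while the full-context total grows much faster, because every new section resends everything before it.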

Input vs Output Tokens

  • Input tokens are typically cheaper than output tokens
  • A detailed prompt (more input tokens) often produces better results, saving on output tokens from rewrites
  • The scribe's existing content is sent as context (input), so longer scribes use more input tokens per section
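
The trade-off between a longer prompt and a rewrite can be sketched with some quick arithmetic. The prices and token counts below are hypothetical and only illustrate the idea that a few hundred extra input tokens can cost less than regenerating a whole section.

```python
INPUT_PRICE_PER_MILLION = 1.00   # hypothetical
OUTPUT_PRICE_PER_MILLION = 5.00  # hypothetical


def attempt_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Cost of one generation attempt at the hypothetical prices above."""
    return (prompt_tokens * INPUT_PRICE_PER_MILLION
            + output_tokens * OUTPUT_PRICE_PER_MILLION) / 1_000_000


# Short prompt that needs one rewrite: two attempts in total.
vague_total = 2 * attempt_cost(prompt_tokens=100, output_tokens=2_000)

# Detailed prompt accepted on the first try: one attempt.
detailed_total = attempt_cost(prompt_tokens=600, output_tokens=2_000)

print(f"vague: {vague_total:.4f}, detailed: {detailed_total:.4f}")
```

Under these assumptions the detailed prompt comes out cheaper overall, since the rewrite it avoids would have paid for a full section of output tokens a second time.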
