Audioscribe
Generate audiobook narrations of your scribes with AI voices or your own voice
Overview
Coffeescribe can generate full audiobook narrations of your scribes using AI text-to-speech. Choose from built-in voice presets or upload your own voice clip. Each chapter is generated individually, so you can listen as chapters complete.
There are two audio features:
- Read Aloud (free) — instant browser-based text-to-speech in Reading Mode
- Audioscribe (Pro/Creator) — high-quality AI-generated audio with persistent storage
Read Aloud (Free)
Available to all users in Reading Mode:
- Open any scribe and enter Reading Mode
- Click the Read Aloud button in the header
- Use the bottom bar to play/pause, adjust speed (0.5x-2x), and choose a system voice
Read Aloud uses your browser's built-in speech synthesis. Quality varies by operating system — macOS voices tend to sound the most natural.
Audioscribe (Pro/Creator)
Getting Started
- Open any scribe and click the Audioscribe tab in the secondary navigation bar
- Click Generate on a chapter, or Generate All / Generate Remaining for multiple chapters
- Choose a voice provider and voice
- Confirm the credit cost and start generation
Generation takes approximately 30 seconds per page of content. When using Generate All, chapters are generated in parallel for faster completion. You can leave the page and come back — progress is saved automatically.
If some chapters already have audio, the button changes to Generate Remaining and only generates the missing chapters — completed audio is never overwritten.
If you change your mind during generation, click the Stop button to cancel remaining chapters. Completed chapters keep their audio — only pending chapters are cancelled.
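The Generate All / Generate Remaining behaviour described above amounts to filtering out chapters that already have audio. The following Python sketch illustrates that idea only. The `Chapter` class, its `audio_url` field, and both function names are hypothetical and are not taken from Coffeescribe's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chapter:
    # Hypothetical model: audio_url is None until audio has been generated.
    chapter_id: str
    audio_url: Optional[str] = None


def chapters_to_generate(chapters: List[Chapter]) -> List[Chapter]:
    """Return only the chapters without audio, so completed audio is never overwritten."""
    return [c for c in chapters if c.audio_url is None]


def button_label(chapters: List[Chapter]) -> Optional[str]:
    """Pick the bulk-generation label based on how many chapters are missing audio."""
    missing = chapters_to_generate(chapters)
    if not missing:
        # Nothing left to generate; this guide does not say what the UI shows here.
        return None
    if len(missing) == len(chapters):
        return "Generate All"
    return "Generate Remaining"
```

Stopping a run fits the same picture: chapters that finished keep their `audio_url`, and the cancelled ones reappear in the next Generate Remaining pass.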
Voice Providers
Coffeescribe offers multiple AI voice engines. When more than one provider is available, tabs appear at the top of the Generate Audio dialog with a description of each:
- OpenAI — High-quality voices with fast generation. Great all-round choice. Six built-in voices (Alloy, Echo, Fable, Nova, Onyx, Shimmer). Text is split into chunks and processed in parallel for speed.
- Chatterbox — Open-source and private. The only provider that supports voice cloning from your own recordings. Seven preset voices plus custom voice upload. Generation is slower but your audio data stays private.
Each provider has its own credit rate (shown in the cost estimate before you confirm).
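To illustrate what "split into chunks and processed in parallel" can look like, here is a minimal Python sketch. It is not Coffeescribe's implementation. The function names, the `max_chars` default, the sentence-based splitting rule, and the `synthesize` callback (a stand-in for whatever call turns text into audio) are all assumptions made for the example.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as one oversized chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_all(chunks: List[str], synthesize: Callable[[str], T],
                   max_workers: int = 4) -> List[T]:
    """Run the (hypothetical) synthesize callback on every chunk in parallel.

    Executor.map returns results in input order, so the audio pieces can be
    concatenated in reading order afterwards.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, chunks))
```

Splitting on sentence boundaries rather than at a fixed character offset avoids cutting a word or phrase in half, which would otherwise be audible where two audio pieces are joined.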
Voice Presets (Chatterbox)
When using Chatterbox, you can choose from built-in voice presets:
- Random — varies with each generation
- British Male — classic British narrator
- Warm Female — calm, warm American narrator
- Deep Male — deep, authoritative American narrator
- Bright Female — energetic, clear American narrator
- Gentle Male — soft American storytelling voice
- Neutral — balanced, professional narrator
Click the play button next to any preset to preview how it sounds before generating.
Custom Voice Upload (Chatterbox only)
Pro and Creator users can upload their own voice clip for AI voice cloning. This feature is only available with the Chatterbox provider:
- In the Generate Audio dialog, select the Chatterbox tab
- Expand "Record your voice"
- Read the provided script aloud (this helps the AI capture your voice accurately)
- Record using your device's built-in tools:
- Mac: QuickTime Player (File > New Audio Recording)
- Windows: Voice Recorder app
- Linux: Audacity or the `arecord` command
- Upload the recording (WAV, MP3, or M4A accepted, max 25MB)
- Your voice appears as a selectable option labelled "My Voice"
Tips for the best results:
- Record in a quiet room with minimal background noise
- Speak naturally at your normal pace
- Length: 10–180 seconds. 60–120 seconds works best — longer isn't always better and clips over 180 seconds are rejected
- The provided reading script covers a good range of sounds for voice capture
Note: Voice cloning quality depends on the AI model and may not perfectly match your voice. The AI captures your general tone and cadence rather than producing an exact replica.
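If you want to check a recording before uploading it, the limits listed above (WAV, MP3, or M4A; max 25MB; 10–180 seconds) can be expressed as a small Python check. This is a sketch under assumptions: the function name is hypothetical, "25MB" is interpreted here as 25 × 1024 × 1024 bytes, and the file type is judged by extension only.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" means 25 MiB
MIN_SECONDS = 10
MAX_SECONDS = 180


def validate_voice_clip(filename: str, size_bytes: int,
                        duration_seconds: float) -> List[str]:
    """Return a list of problems; an empty list means the clip is within the stated limits."""
    problems: List[str] = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be WAV, MP3, or M4A")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 25MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("length must be between 10 and 180 seconds")
    return problems
```

For example, `validate_voice_clip("me.m4a", 3_000_000, 90)` returns an empty list, while a 200-second clip is flagged for length.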
Listening to Your Audio
Once chapters are generated:
- Use the built-in audio player with play/pause, seek, and speed controls
- Skip forward/back 15 seconds with the skip buttons
- Chapters auto-advance — when one finishes, the next starts automatically
- Download individual chapters as audio files
- Click any completed chapter card to jump to it in the player
Resume where you left off. Long chapters can run 30+ minutes. When you return to a chapter you started, the player automatically seeks to your last position, across devices and sessions. Your progress is private (visible only to you) and follows your account, not your browser.
Sharing an audiobook
When your scribe is public, the Share button on the audiobook page copies a direct link. Anyone you send the link to can open it and play the audiobook without signing in — full playback controls (play/pause, seek, speed, chapter switching) all work. Any other action (Download, Generate, Delete, etc.) shows a friendly sign-in prompt.
The shared link is stable — it keeps working even if you toggle individual chapters private later. Existing listeners keep their cached URL, so share intentionally.
Listening on mobile
Playback works great on phones. Generating an audiobook on mobile is more fragile — if you lock your phone or switch tabs during a long generation, the browser may throttle the background work and the UI can appear stuck. You'll see a warning banner reminding you that generation works best on a laptop or desktop. Playback-only users don't see this banner.
If a generation looks stalled on mobile, pull down to refresh the page. The refresh picks up any chapters that completed while the tab was in the background.
Public Audiobooks
If your scribe is public, other Pro/Creator users can generate their own audiobook version of it. Public audiobooks show attribution ("Audio by [creator name]") so listeners know who created each version.
As the scribe owner, you control visibility of your generated audio with the globe/lock toggle on each chapter. Public audio is locked for 24 hours after creation to prevent misuse.
Credits and Costs
Audioscribe generation uses credits from your account balance:
- Cost is based on text length and the voice provider you choose
- OpenAI: approximately 3,000 credits per 100 characters
- Chatterbox: approximately 2,500 credits per 100 characters
- The exact credit cost is shown before you confirm generation
- Credits are deducted after successful generation
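As a rough worked example of the approximate rates above, the following Python sketch estimates a cost from text length. The function name and the rounding-up behaviour are assumptions. The figure shown in the confirmation dialog is the authoritative one.

```python
import math

# Approximate rates from this guide, in credits per 100 characters.
CREDITS_PER_100_CHARS = {"openai": 3000, "chatterbox": 2500}


def estimate_credits(text: str, provider: str) -> int:
    """Estimate the credit cost of narrating `text` with the given provider.

    Rounds up to a whole credit (an assumption; the real rounding is not documented here).
    """
    rate = CREDITS_PER_100_CHARS[provider]
    return math.ceil(len(text) * rate / 100)
```

Under these rates, a 250-character passage comes to about 7,500 credits with OpenAI and about 6,250 credits with Chatterbox.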
Monthly Limits
- Pro: 1 audiobook per month
- Creator: 3 audiobooks per month
The limit resets at the start of each billing month.
Troubleshooting
Generation takes a long time
Generation takes roughly 30 seconds per page of content, so a 10-chapter scribe may take 5-10 minutes in total. You can navigate away and come back; progress updates in real time.
Audio quality issues
The AI voice may occasionally produce artifacts like brief pauses or slight pronunciation irregularities. Regenerating a chapter with the same or different voice often improves the result.
Custom voice doesn't sound like me
AI voice cloning from a short clip produces an approximation of your voice, not a perfect copy. For best results, use a 60–120 second recording made in a quiet environment. Cloning quality is expected to improve as the underlying models improve.
When edits in the Workspace affect existing audio
The audiobook is generated per chapter — each chapter becomes one TTS file that's saved against its chapter ID. If you edit a chapter's text after generating its audio, the audio doesn't auto-update. Here's what happens for each kind of edit:
| What you change in the Workspace | What happens to existing audio | What to do |
|---|---|---|
| Reorder sections within a chapter (drag the grip handle in the sidebar) | Audio still plays — it stays attached to that chapter — but the spoken order now matches the OLD section order, not the edited one. | Regenerate that chapter if you want the audio to match. |
| Combine adjacent sections (⋯ → "Combine with section above/below") | Audio still plays. The two sections were probably read with a small natural pause between paragraphs in the original audio — that pause stays. Content is unchanged. | Optional regenerate for perfect rhythm match; not required. |
| Delete a section | Audio still plays — and it still includes the now-deleted section's narration. | Regenerate the chapter so the deleted text isn't read aloud. |
| Reorder chapters (drag a chapter handle) | Each chapter's audio is unchanged. The audiobook player simply plays chapters in the new order — which is what you wanted when you reordered. | ✅ Nothing to do. |
| Edit section content in the editor | Audio still plays the OLD text. | Regenerate the chapter when you're happy with the new text. |
Rule of thumb: any edit within a chapter (text, section order, section count) leaves that chapter's audio stale until you regenerate. Edits between chapters (reorder) are free.
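The rule of thumb can be made concrete with a fingerprint of a chapter's sections: record it when audio is generated, then compare it after editing. The Python sketch below only illustrates the rule. Coffeescribe is not stated to detect stale audio this way, and `chapter_fingerprint` and `audio_is_stale` are hypothetical names.

```python
import hashlib
from typing import List


def chapter_fingerprint(sections: List[str]) -> str:
    """Hash a chapter's sections so that text, section order, and section count all matter."""
    digest = hashlib.sha256()
    for section in sections:
        data = section.encode("utf-8")
        # Length prefix keeps section boundaries distinct, so combining
        # two sections changes the fingerprint even if the text is the same.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def audio_is_stale(fingerprint_at_generation: str, current_sections: List[str]) -> bool:
    """True if the chapter has changed since its audio was generated."""
    return chapter_fingerprint(current_sections) != fingerprint_at_generation
```

Because the fingerprint is computed per chapter, reordering whole chapters leaves every fingerprint unchanged, which matches the "edits between chapters are free" half of the rule.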