Audioscribe

Generate audiobook narrations of your scribes with AI voices or your own voice

Overview

Coffeescribe can generate full audiobook narrations of your scribes using AI text-to-speech. Choose from built-in voice presets or upload your own voice clip. Each chapter is generated individually, so you can listen as chapters complete.

There are two audio features:

Read Aloud (free) — instant browser-based text-to-speech in Reading Mode
Audioscribe (token-billed) — high-quality AI-generated audio with persistent storage, available to every signed-in user

Read Aloud (Free)

Available to all users in Reading Mode:

Open any scribe and enter Reading Mode
Click the Read Aloud button in the header
Use the bottom bar to play/pause, adjust speed (0.5x-2x), and choose a system voice

Read Aloud uses your browser's built-in speech synthesis. Quality varies by operating system — macOS voices tend to sound the most natural.

Audioscribe (every signed-in user, token-billed)

Getting Started

Open any scribe and click the Audioscribe tab in the secondary navigation bar
Click Generate on a chapter, or Generate All / Generate Remaining for multiple chapters
Choose a voice provider and voice
Confirm the credit cost and start generation

Generation takes approximately 30 seconds per page of content. When using Generate All, chapters are generated in parallel for faster completion. You can leave the page and come back — progress is saved automatically.

If some chapters already have audio, the button changes to Generate Remaining and only generates the missing chapters — completed audio is never overwritten.
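
For readers who think in code, the Generate Remaining behaviour can be sketched as a simple filter. This is only an illustration: the Chapter record and its field names are hypothetical, not Coffeescribe's actual data model, and what the button shows once every chapter has audio is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    """Hypothetical chapter record; field names are illustrative only."""
    chapter_id: str
    has_audio: bool


def chapters_to_generate(chapters: list[Chapter]) -> list[Chapter]:
    """Return only the chapters that still lack audio, preserving their order."""
    return [c for c in chapters if not c.has_audio]


def button_label(chapters: list[Chapter]) -> str:
    """Pick the bulk-action label: 'Generate Remaining' once any chapter has audio."""
    if any(c.has_audio for c in chapters):
        return "Generate Remaining"
    return "Generate All"
```

With three chapters where only the first has audio, chapters_to_generate returns the second and third, and the first chapter's audio is left untouched.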

If you change your mind during generation, click the Stop button to cancel remaining chapters. Completed chapters keep their audio — only pending chapters are cancelled.

Voice Providers

Coffeescribe offers multiple AI voice engines. When more than one provider is available, tabs appear at the top of the Generate Audio dialog with a description of each:

OpenAI — High-quality voices with fast generation. Great all-round choice. Six built-in voices (Alloy, Echo, Fable, Nova, Onyx, Shimmer). Text is split into chunks and processed in parallel for speed.
Chatterbox — Open-source and private. The only provider that supports voice cloning from your own recordings. Seven preset voices plus custom voice upload. Generation is slower but your audio data stays private.

Each provider has its own credit rate (shown in the cost estimate before you confirm).

Voice Presets (Chatterbox)

When using Chatterbox, you can choose from built-in voice presets:

Random — varies with each generation
British Male — classic British narrator
Warm Female — calm, warm American narrator
Deep Male — deep, authoritative American narrator
Bright Female — energetic, clear American narrator
Gentle Male — soft American storytelling voice
Neutral — balanced, professional narrator

Click the play button next to any preset to preview how it sounds before generating.

Custom Voice Upload (Chatterbox only)

Any signed-in user can upload their own voice clip for AI voice cloning. This feature is only available with the Chatterbox provider:

In the Generate Audio dialog, select the Chatterbox tab
Expand "Record your voice"
Read the provided script aloud (this helps the AI capture your voice accurately)
Record using your device's built-in tools:
- Mac: QuickTime Player (File > New Audio Recording)
- Windows: Voice Recorder app
- Linux: Audacity or the arecord command
Upload the recording (WAV, MP3, or M4A accepted, max 25MB)
Your voice appears as a selectable option labelled "My Voice"

Tips for the best results:

Record in a quiet room with minimal background noise
Speak naturally at your normal pace
Keep the clip between 10 and 180 seconds. 60–120 seconds works best; longer isn't always better, and clips over 180 seconds are rejected
The provided reading script covers a good range of sounds for voice capture
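
If you want to check a recording before uploading it, the limits above (WAV, MP3, or M4A; max 25MB; 10 to 180 seconds) can be sketched as a small pre-flight check. The function name and messages are hypothetical, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption; the upload dialog remains the authority on what is accepted.

```python
from pathlib import Path

ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MAX_SIZE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" interpreted as 25 MiB
MIN_SECONDS = 10
MAX_SECONDS = 180


def check_voice_clip(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the clip passes these checks."""
    problems = []
    if Path(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
        problems.append("unsupported format (use WAV, MP3, or M4A)")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 25 MB")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("length must be between 10 and 180 seconds")
    return problems
```

A 90-second, 3 MB M4A file returns an empty list, while a 200-second clip is flagged for length.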

Note: Voice cloning quality depends on the AI model and may not perfectly match your voice. The AI captures your general tone and cadence rather than producing an exact replica.

Listening to Your Audio

Once chapters are generated:

Use the built-in audio player with play/pause, seek, and speed controls
Skip forward/back 15 seconds with the skip buttons
Chapters auto-advance — when one finishes, the next starts automatically
Download individual chapters as audio files
Click any completed chapter card to jump to it in the player

Resume where you left off. Long chapters can run 30+ minutes. When you return to a chapter you started, the player automatically seeks to your last position, across devices and sessions. Your progress is private (visible only to you) and follows your account, not your browser.

Sharing an audiobook

When your scribe is public, the Share button on the audiobook page copies a direct link. Anyone you send the link to can open it and play the audiobook without signing in — full playback controls (play/pause, seek, speed, chapter switching) all work. Any other action (Download, Generate, Delete, etc.) shows a friendly sign-in prompt.

The shared link is stable: it keeps working even if you later make individual chapters private. Listeners who already opened the link keep their cached URL, so share intentionally.

Listening on mobile

Playback works great on phones. Generating an audiobook on mobile is more fragile — if you lock your phone or switch tabs during a long generation, the browser may throttle the background work and the UI can appear stuck. You'll see a warning banner reminding you that generation works best on a laptop or desktop. Playback-only users don't see this banner.

If a generation looks stalled on mobile, pull down to refresh the page. The refresh picks up any chapters that completed while the tab was in the background.

Public Audiobooks

If your scribe is public, other signed-in users can generate their own audiobook version of it. Public audiobooks show attribution ("Audio by [creator name]") so listeners know who created each version.

As the scribe owner, you control visibility of your generated audio with the globe/lock toggle on each chapter. Public audio is locked for 24 hours after creation to prevent misuse.

Credits and Costs

Audioscribe generation uses credits from your account balance:

Cost is based on text length and the voice provider you choose
OpenAI: approximately 3,000 credits per 100 characters
Chatterbox: approximately 2,500 credits per 100 characters
The exact credit cost is shown before you confirm generation
Credits are deducted after successful generation
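
As a rough back-of-the-envelope check, the approximate rates above can be turned into a small estimator. This is a sketch only: the function name is hypothetical, rounding up is an assumption, and the figure shown in the confirmation dialog is always the exact cost.

```python
# Approximate rates quoted in this guide, in credits per 100 characters.
CREDITS_PER_100_CHARS = {"openai": 3000, "chatterbox": 2500}


def estimate_credits(text: str, provider: str) -> int:
    """Rough estimate: (characters / 100) * provider rate, rounded up.

    Rounding up is an assumption; the in-app estimate is authoritative.
    """
    rate = CREDITS_PER_100_CHARS[provider]
    # Integer ceiling division avoids floating-point rounding surprises.
    return (len(text) * rate + 99) // 100
```

For a 1,200-character chapter this gives about 36,000 credits with OpenAI and about 30,000 with Chatterbox.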

No Monthly Limit

There's no monthly cap on audiobook generation. Your token balance is the only limit — generate as many chapters and audiobooks as your balance covers, any time.

Troubleshooting

Generation takes a long time

Each chapter takes roughly 30 seconds per page. A 10-chapter scribe may take 5-10 minutes total. You can navigate away and come back — progress updates in real time.
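
The timing above can be sketched as simple arithmetic. The function below is hypothetical and assumes chapters are processed one after another at the rough rate of 30 seconds per page; because Generate All runs chapters in parallel, real runs may finish sooner.

```python
SECONDS_PER_PAGE = 30  # rough figure from this guide, not a guarantee


def estimate_minutes(pages_per_chapter: list[float]) -> float:
    """Sequential estimate in minutes: 30 seconds per page, summed over all chapters."""
    return sum(pages_per_chapter) * SECONDS_PER_PAGE / 60
```

Under the assumption that each of 10 chapters is one to two pages long, the estimate lands between 5 and 10 minutes, which matches the range quoted above.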

Audio quality issues

The AI voice may occasionally produce artifacts like brief pauses or slight pronunciation irregularities. Regenerating a chapter with the same or a different voice often improves the result.

Custom voice doesn't sound like me

AI voice cloning from a short clip produces an approximation of your voice, not a perfect copy. For best results, use a 60–120 second recording made in a quiet environment (clips over 180 seconds are rejected). The technology is improving rapidly.

When edits in the Workspace affect existing audio

The audiobook is generated per chapter — each chapter becomes one TTS file that's saved against its chapter ID. If you edit a chapter's text after generating its audio, the audio doesn't auto-update. Here's what happens for each kind of edit:

What you change in the Workspace	What happens to existing audio	What to do
Reorder sections within a chapter (drag the grip handle in the sidebar)	Audio still plays — it stays attached to that chapter — but the spoken order now matches the OLD section order, not the edited one.	Regenerate that chapter if you want the audio to match.
Combine adjacent sections (⋯ → "Combine with section above/below")	Audio still plays. The two sections were probably read with a small natural pause between paragraphs in the original audio — that pause stays. Content is unchanged.	Optional regenerate for perfect rhythm match; not required.
Delete a section	Audio still plays — and it still includes the now-deleted section's narration.	Regenerate the chapter so the deleted text isn't read aloud.
Reorder chapters (drag a chapter handle)	Each chapter's audio is unchanged. The audiobook player simply plays chapters in the new order — which is what you wanted when you reordered.	✅ Nothing to do.
Edit section content in the editor	Audio still plays the OLD text.	Regenerate the chapter when you're happy with the new text.

Rule of thumb: any edit within a chapter (text, section order, section count) leaves that chapter's audio stale until you regenerate. Reordering whole chapters never requires regeneration.
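
The table above can also be read as a lookup from edit type to recommended action. The sketch below is illustrative only: the Edit names and the chapters_needing_regeneration helper are hypothetical and not part of Coffeescribe. It treats combining sections as optional, as the table does, since the spoken content is unchanged.

```python
from enum import Enum


class Edit(Enum):
    REORDER_SECTIONS = "reorder sections within a chapter"
    COMBINE_SECTIONS = "combine adjacent sections"
    DELETE_SECTION = "delete a section"
    REORDER_CHAPTERS = "reorder chapters"
    EDIT_CONTENT = "edit section content"


# Recommended action per edit type, taken from the table above.
ACTION = {
    Edit.REORDER_SECTIONS: "regenerate",
    Edit.COMBINE_SECTIONS: "optional",
    Edit.DELETE_SECTION: "regenerate",
    Edit.REORDER_CHAPTERS: "nothing",
    Edit.EDIT_CONTENT: "regenerate",
}


def chapters_needing_regeneration(edits):
    """Given (chapter_id, Edit) pairs, return the chapter IDs whose audio
    should be regenerated, in the order they were first affected."""
    stale = []
    for chapter_id, edit in edits:
        if ACTION[edit] == "regenerate" and chapter_id not in stale:
            stale.append(chapter_id)
    return stale
```

For example, editing text in one chapter and reordering sections in another marks both for regeneration, while reordering chapters or combining sections marks none.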