Importing your own books (Scribe Conversion)
Convert a PDF, EPUB, Word, text, or Markdown document into a native scribe — verbatim, AI-rewritten, or condensed.
What is Scribe Conversion?
Scribe Conversion lets you upload a document you own and turn it into a native scribe in your library. You can keep it word-for-word, have AI rewrite the whole book in a new style, or condense it into a Mini-Scribe.
Imported scribes live alongside your created scribes in your library. They show an Imports chip in the library filters, and by default they are private to you. You decide if and when to publish.
Supported file types
- PDF (text or scanned — scanned PDFs run through OCR)
- EPUB (DRM-protected EPUBs are detected and rejected)
- DOCX (Word documents)
- TXT (plain text)
- MD (Markdown)
The maximum file size is 100 MB. The dropzone also rejects type-spoofed files (e.g. a PDF renamed to .epub) by checking the actual file signature.
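As a rough illustration of what a signature check involves, here is a minimal Python sketch. It is not CoffeeScribe's actual implementation: the function names and the return codes (`too_large`, `unsupported`, `type_mismatch`, `ok`) are hypothetical. It relies only on well-known format facts: PDFs start with `%PDF-`, EPUB and DOCX are ZIP containers, an EPUB carries a `mimetype` entry, and a DOCX carries `word/document.xml`.

```python
import io
import zipfile

MAX_BYTES = 100 * 1024 * 1024  # the 100 MB limit stated above


def sniff_type(data: bytes) -> str:
    """Guess the real file type from its content, ignoring the file name."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):  # ZIP container (EPUB and DOCX are both ZIPs)
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
                if "mimetype" in names and zf.read("mimetype").strip() == b"application/epub+zip":
                    return "epub"
                if "word/document.xml" in names:
                    return "docx"
        except zipfile.BadZipFile:
            pass
        return "unknown"
    try:
        data.decode("utf-8")
        return "text"  # TXT and MD have no signature; valid UTF-8 is the best available check
    except UnicodeDecodeError:
        return "unknown"


def validate_upload(filename: str, data: bytes) -> str:
    """Return a hypothetical status code for an uploaded file."""
    if len(data) > MAX_BYTES:
        return "too_large"
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    expected = {"pdf": "pdf", "epub": "epub", "docx": "docx",
                "txt": "text", "md": "text"}.get(ext)
    if expected is None:
        return "unsupported"
    if sniff_type(data) != expected:
        return "type_mismatch"  # e.g. a PDF renamed to .epub
    return "ok"
```

With this sketch, `validate_upload("book.epub", pdf_bytes)` returns `type_mismatch` because the content starts with the PDF header rather than a ZIP header.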
The three conversion modes
When the parser is finished, you pick one of three modes on the control panel at /create/import:
Word-for-Word
A faithful, structure-only conversion. We detect chapter and section breaks but never rewrite the prose — your original wording is preserved. Cheapest option. Best when you already love the text and just want it inside CoffeeScribe so you can read, edit, and annotate it like any other scribe.
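To make "structure-only" concrete, the following Python sketch splits text at lines that look like chapter headings and leaves every other line untouched. The heading pattern and the `split_chapters` helper are assumptions for illustration; the real detector is not documented here and is likely more sophisticated.

```python
import re

# Hypothetical heading heuristic: Markdown "#"/"##" headings, or lines that
# start with "Chapter"/"Part" followed by an Arabic or Roman numeral.
HEADING = re.compile(
    r"^(?:#{1,2}\s+\S.*|(?:chapter|part)\s+(?:\d+|[ivxlcdm]+)\b.*)$",
    re.IGNORECASE,
)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Return (title, body) pairs; body lines are copied, never rewritten."""
    chapters = []
    title, body = "Front matter", []
    for line in text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            # Close the previous chapter (skip an empty, untitled front matter).
            if body or title != "Front matter":
                chapters.append((title, "\n".join(body).strip()))
            title, body = stripped.lstrip("#").strip(), []
        else:
            body.append(line)
    chapters.append((title, "\n".join(body).strip()))
    return chapters
```

A heuristic like this can misfire (a sentence beginning "Part I was..." would be read as a heading), which is one reason the structure is reviewed before conversion and can be edited afterwards.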
AI Rewrite
The whole book is rewritten section-by-section in your chosen style. Pick a preset chip (Modernise, Translate, Tighten, Make playful, etc.) or write your own custom prompt. You can optionally pin a chapter count. Uses your selected AI model and costs tokens proportional to length.
Mini-Scribe
A condensed version of the source — same ideas, shorter form. Great for long technical books or dense academic texts. The cost preview shows roughly how many tokens this will take before you commit.
For very long books, the system shows a Mini-Scribe override label that nudges you toward Mini-Scribe so the cost stays reasonable. You can still pick whichever mode you want.
Walkthrough
- From /create, click Upload your own.
- Drag a file into the dropzone (or click to pick one). The dropzone shows the warning that images and figures are not preserved in v1 — only text comes through.
- Wait for parsing and (if needed) OCR to finish. You can leave the tab; you can also resume later from /create/import?resume=<uploadId>.
- Pick a mode. The cost preview updates as you choose.
- Click Continue. You land on the Structure Review page at /create/import/[uploadId]/review where the AI has proposed chapter breaks.
- Pick a structure approach (AI-proposed / Use as-is) and confirm. Manual split is grayed out in v1 — see below.
- Watch the live progress UI. Sections render as they're generated; the page also polls and the import is resume-safe (close the tab — the row completes server-side either way).
- When done, you land on the new scribe's hub page. Chapter titles on imports are shown without the "Chapter N:" prefix.
What's preserved and what isn't
| Element | Preserved? |
|---|---|
| Body text | Yes |
| Chapter and section headings | Yes (auto-detected or AI-proposed) |
| Images, figures, diagrams | Not in v1 — flagged in the dropzone and on the review screen |
| Tables | Best-effort as plain text |
| Footnotes / endnotes | Best-effort inline |
| EPUB cover art | Not in v1 |
| Original PDF page numbers | Not preserved |
If you need images or figures, treat the import as a starting point — the Workspace lets you re-add them manually.
OCR for scanned PDFs
If your PDF is a scan (image-only, no embedded text), the parser falls back to Mistral OCR. OCR runs at upload time and the cost is billed against your token allowance — typically $0.30–$1.40 per book depending on page count. The cost preview shows the OCR charge before you commit so there's no surprise.
Note: scanned PDFs cost roughly double a text PDF of the same length because the conversion pipeline re-OCRs during the convert step (a v1 inefficiency we plan to fix).
Manual split — grayed out in v1
The Structure Review screen has three tabs: AI-proposed, Manual split, and Use as-is. Manual split is grayed out with "Coming in V2 — drag-to-highlight section markers on PDF preview." For now, you have two options:
- Pick AI-proposed and edit the chapter list in the Workspace after conversion (rename, add, delete chapters and sections).
- Pick Use as-is if the source already has clean chapter breaks the parser detected.
You can fine-tune any imported scribe's structure in the Workspace just like any other scribe.
Plagiarism, takedowns, and publishing
Imported scribes default to private. They aren't visible to other users until you publish them via the Publish to library flow, which requires a separate ToS-consent confirmation.
Be honest about what you're uploading. CoffeeScribe's stance is direct:
If you did not write it, and AI did not generate it, then you uploaded it. It could be plagiarism. Even if AI then rewrote it, it could still be plagiarism. We have a full right to remove that content from public view — and we have a responsibility to do so.
AI rewriting does not launder copyright. An AI Rewrite or Mini-Scribe derivative of a copyrighted source is still derivative of that source — you remain responsible for the rights upstream. The Word-for-Word mode is obviously derivative; AI Rewrite and Mini-Scribe are too.
If a rights-holder reports a published imported scribe, we remove it from public view and notify you. Repeat infringement may end with account suspension or termination. See Terms of Service section 3.5 (Imported Scribes) for the full legal text and the take-down process.
Please only publish content you have the rights to, and aim for content that's powerful and worth reading.
"Convert to Scribe" from the Workspace
If you're working on a scribe that's grown too large for the Workspace to handle smoothly (the section count exceeds the editor's safe limit), you'll see a too-big banner with a Convert to Scribe button. This pre-fills the import flow with the current scribe as the source — useful when you want to re-shape a sprawling work into a cleaner Mini-Scribe or AI Rewrite.
Cost and tier availability
Scribe Conversion is available to every tier — Free, Pro, and Creator — and is not tier-gated. Your token allowance is the only limit. Each mode shows an estimated token cost before you commit:
- Word-for-Word — cheapest, just structure detection.
- AI Rewrite — proportional to word count × model cost.
- Mini-Scribe — proportional to word count, generally cheaper than AI Rewrite.
- OCR (scanned PDFs only) — billed at upload time, $0.30–$1.40 per book.
If you don't have enough tokens, the convert button is disabled with a Top up link.
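The relationships above can be sketched in a few lines of Python. The per-word rates below are placeholders, not real CoffeeScribe prices, and `preview_cost` is a hypothetical helper. The sketch encodes only what this section states: Word-for-Word is cheapest, Mini-Scribe is generally cheaper than AI Rewrite, AI Rewrite scales with the model's cost, and the convert button is disabled when the balance is too low. OCR is left out because it is billed separately at upload time.

```python
import math
from dataclasses import dataclass

# Placeholder rates in tokens per 1,000 source words. NOT real pricing;
# they only reproduce the ordering described in the list above.
HYPOTHETICAL_RATE_PER_1000_WORDS = {
    "word_for_word": 50,
    "mini_scribe": 1500,
    "ai_rewrite": 4000,
}


@dataclass
class CostPreview:
    tokens: int
    convert_enabled: bool  # False: disable the convert button, show "Top up"


def preview_cost(mode: str, word_count: int, balance: int,
                 model_multiplier: float = 1.0) -> CostPreview:
    """Estimate the token cost of a conversion and whether it is affordable."""
    rate = HYPOTHETICAL_RATE_PER_1000_WORDS[mode]
    # Only AI Rewrite is described as scaling with the chosen model's cost.
    factor = model_multiplier if mode == "ai_rewrite" else 1.0
    tokens = math.ceil(word_count * rate * factor / 1000)
    return CostPreview(tokens=tokens, convert_enabled=balance >= tokens)
```

For example, `preview_cost("ai_rewrite", 80_000, balance=0)` yields a preview with `convert_enabled` set to `False`, which corresponds to the disabled button with the Top up link.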
Library and viewer behaviour
- The library has a new Imports filter chip. Imports also use a newest-first default sort so a fresh import is easy to find.
- On the scribe hub and Read view, the "Chapter N:" prefix is dropped for imports — most published books already include the chapter number in the title (e.g. "Chapter 1: The Beginning"), so duplicating it looked ugly.
- The Workspace Write Complete Scribe button is hidden for imports because the text already exists — you don't need to generate it again.
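The title rule can be pictured as a small display helper. This is a sketch under assumptions: `display_title` and its parameters are hypothetical names, not part of the product's code.

```python
def display_title(position: int, title: str, is_import: bool) -> str:
    """Build the heading shown on the hub and Read view for one chapter."""
    if is_import:
        # Imported titles usually already carry their own numbering.
        return title
    return f"Chapter {position}: {title}"
```

Without the import check, a source chapter already titled "Chapter 1: The Beginning" would render as "Chapter 1: Chapter 1: The Beginning".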
What's coming next
- Manual split with drag-to-highlight on PDF preview (V2).
- Image preservation for EPUB and DOCX sources.
- Cached parse on execute so scanned PDFs aren't OCR'd twice.
- Per-chapter Mini-Scribe so you can choose which chapters to compress.
Troubleshooting
- "File rejected — type mismatch" — the file's actual signature doesn't match its extension. Re-export the file from its source application.
- "DRM-protected EPUB" — the EPUB has digital rights management and cannot be parsed. Use the publisher's de-DRM tool if you legally own a copy and your jurisdiction allows it.
- "File too large" — the upload limit is 100 MB. Split the file or compress it.
- OCR took longer than expected — scanned PDFs over ~300 pages can take 5–10 minutes. The page polls every 5 seconds and is resume-safe.
- Stuck on "Generating…" — the live progress UI shows a stale-recovery panel after 60 seconds with no activity. Click Refresh to re-check the import's status; if it's still stuck, click Mark failed to release the import, then try again.
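The polling and stale-recovery behaviour described above can be sketched as a simple loop. This is an illustration, not the product's client code: `watch_import`, the `fetch_status` callback, and the state names are hypothetical. Only the 5-second poll interval and the 60-second inactivity threshold come from this page.

```python
import time
from typing import Callable, Tuple

POLL_SECONDS = 5    # poll interval mentioned above
STALE_SECONDS = 60  # inactivity threshold before the stale-recovery panel


def watch_import(
    fetch_status: Callable[[], Tuple[str, int]],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the import finishes, fails, or shows no progress for 60 s.

    fetch_status is a hypothetical callback returning (state, sections_done).
    """
    last_progress = None
    last_change = now()
    while True:
        state, sections_done = fetch_status()
        if state in ("done", "failed"):
            return state
        if sections_done != last_progress:
            # New sections arrived: reset the inactivity timer.
            last_progress, last_change = sections_done, now()
        elif now() - last_change >= STALE_SECONDS:
            return "stale"  # the UI would show the Refresh / Mark failed panel here
        sleep(POLL_SECONDS)
```

Passing `now` and `sleep` as parameters keeps the loop testable with a fake clock. Because the import completes server-side, a `stale` result here means only that the client saw no progress.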