When we tell authors that CourseBud generates narrated slides from their book, the first question is usually the right one: does it actually sound like me?
The honest answer is: almost, mostly, and the gap is narrower than you'd expect but wider than a breathless demo video would suggest. Here's how the pipeline actually works, what's solved, and what we're still improving.
Step 1: writing the script
Before any audio exists, we write the narration text. We take the relevant section of your book — the content the AI decided belongs in this lesson — and ask a large language model to rewrite it as 1-2 minutes of spoken prose per slide. The key word is rewrite, not read. Written prose and spoken prose are structurally different. Long sentences work on the page; shorter rhythmic sentences work in the ear. Citations interrupt flow when read aloud; your point lands better without them.
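As a rough sketch, the instruction handed to the LLM for this step might look like the following. The function name and prompt wording here are illustrative, not our production prompt:

```python
# Hypothetical sketch of the script-rewriting step: assemble the
# instruction an LLM receives to turn written prose into spoken
# narration for one slide.

def build_narration_prompt(book_excerpt: str, minutes_per_slide: str = "1-2") -> str:
    """Return the rewrite instruction for one slide's narration."""
    return (
        "Rewrite the following book excerpt as spoken narration for a "
        f"course slide, roughly {minutes_per_slide} minutes when read aloud.\n"
        "- Preserve the author's voice and key points.\n"
        "- Prefer shorter, rhythmic sentences suited to the ear.\n"
        "- Drop inline citations; they interrupt spoken flow.\n\n"
        f"Excerpt:\n{book_excerpt}"
    )

prompt = build_narration_prompt("Long sentences work on the page.")
```

The important design point is in the instructions, not the model: rewrite, don't read.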
This is the step where "sounds like you" is mostly won or lost. The LLM sees your actual words and tries to preserve voice while smoothing cadence for audio. It gets this right roughly 80% of the time. The remaining 20% is why we make every narration script editable, sentence by sentence. Authors who spend twenty minutes per lesson tweaking narration end up with dramatically better courses. We'd rather be transparent about that than imply the AI does everything.
Step 2: synthesizing the voice
Once the script is approved, we generate audio using ElevenLabs Flash v2.5. These are professional-grade synthetic voices — warm, natural cadence, real breaths, appropriate emphasis on key words. In blind comparisons against a real narrator, most listeners wouldn't flag it as AI.
It isn't your voice, though. You pick from a set of professionally directed voices. Some authors love this — they don't want to record four hours of narration and their reader doesn't mind. Other authors really do want their own voice, and for them v1 isn't the right fit yet.
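Under the hood, a synthesis call is a single HTTP request. Here's a sketch of how one slide's request could be assembled; the endpoint shape follows ElevenLabs' public text-to-speech API, and the voice ID and key below are placeholders:

```python
# Sketch of a text-to-speech request for ElevenLabs Flash v2.5.
# Voice ID and API key are placeholders, not real credentials.

def build_tts_request(script_text: str, voice_id: str, api_key: str):
    """Return (url, headers, payload) for one slide's narration."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {"text": script_text, "model_id": "eleven_flash_v2_5"}
    return url, headers, payload

url, headers, payload = build_tts_request(
    "Welcome to lesson one.", "VOICE_ID_PLACEHOLDER", "API_KEY_PLACEHOLDER"
)
```

Posting that payload (with any HTTP client) returns the audio bytes for the slide.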
What's coming
Voice cloning — recording a short sample of your actual voice and having the course narrate in it — is on our roadmap. The technology exists; we're waiting until we can ship it with enough quality control that it doesn't sound uncanny. Overclaiming here would be easy and dishonest. We'd rather wait.
Why lazy generation matters
One practical note on the pipeline: we don't generate all the audio upfront. The narration is created the first time a student plays a slide, then cached forever. This keeps your course's initial build fast and keeps costs honest — we're only paying for audio that's actually listened to. If you edit a slide's narration text, the cache invalidates and regenerates on the next play. You never hit a "rebuilding audio, please wait" wall.
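The mechanism is simpler than it sounds: key the cache on a hash of the narration text, and editing the text automatically produces a new key. A minimal sketch, with `synth` standing in for the actual TTS call:

```python
# Minimal sketch of lazy, content-keyed audio caching. Names are
# illustrative; synth() stands in for the real TTS call.

import hashlib

_cache: dict[str, bytes] = {}

def audio_for_slide(narration_text: str, synth) -> bytes:
    """Generate audio on first play, then serve from cache.

    Editing the narration changes the hash, so the stale entry is
    simply never hit again -- that is the cache "invalidation"."""
    key = hashlib.sha256(narration_text.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = synth(narration_text)  # only pay for audio that's played
    return _cache[key]
```

Because the key is derived from content rather than a slide ID, there is no explicit invalidation step to get wrong: unedited slides keep their cached audio forever, and edited ones regenerate on the next play.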
The part we want you to own
The real craft here isn't the synthetic voice. It's the script. Before any audio plays, you control every word being spoken. Read the slide narration aloud yourself before approving — if a phrase sounds off in your mouth, it'll sound off in the student's ear. Fix it, re-save, done.
We built CourseBud to remove the friction that kills 95% of book-to-course projects — not to replace the author's judgment. The audio pipeline is a tool, a very capable one, but your voice and your review are what make a student feel they're learning from you, not from a robot.
Curious how it sounds on your actual book? Sign up, upload a chapter, and listen.