Separate spoken word from background music, tuned for dialogue.

Vocal Remover for Podcast Editing

Clean up dialogue by removing background music beds, or pull a clean voice track out of a recording with music playing underneath.

Why speech separation isn’t just "vocal removal for talking"

Sung vocals and speech have very different acoustic signatures: pitch contour, sustain, and harmonic structure all differ. A model tuned on singing will underperform on spoken word, sometimes leaving a faint "underwater" music-bed residue behind the dialogue. This mode is trained specifically on speech-plus-music mixtures.

What it won’t do

This isn’t a noise-reduction, de-essing, or de-reverb tool. For room echo, mic hiss, or plosives, pair it with your existing podcast editing software after separation. It solves one specific problem: pulling speech and music apart when they’re mixed together.

Does it work on two overlapping speakers at once? No. It separates speech from music reliably, but separating two overlapping human voices from each other is a different, harder problem that this mode doesn’t attempt.

What if I need both the dialogue and the music bed, just rebalanced? Export both stems separately and remix them at whatever balance you want in your existing editor.

Ready to get started?

Start free — no credit card required.
