Audio Stem Separation in Python: Demucs, Spleeter & API Compared (2026 Guide)

You've probably heard the term "stem separation" thrown around in music production, ML research, and audio tooling circles. But what exactly is it, and how do you do it programmatically in Python?

This guide covers both questions. We'll look at the two main open-source libraries (Demucs and Spleeter), a cloud REST API for environments without a GPU, and practical code examples for the most common use cases: karaoke generation, drum isolation for practice, and building ML training datasets.

What is audio stem separation?

Short answer: Stem separation (also called music source separation) uses a trained neural network to decompose a mixed audio file into isolated tracks — typically vocals, drums, bass, and "other" (guitars, keys, etc.). The model never hears the isolated parts during inference; it learned the signal statistics of each instrument class from thousands of labeled recordings during training.

For a deeper dive into the signal processing and machine learning behind it, see how AI stem separation works under the hood.

The practical result: given song.mp3, you get back vocals.wav, drums.wav, bass.wav, and other.wav, each a separate WAV file (Demucs writes 44.1 kHz stereo by default).

Common developer use cases:

Use case	Stems needed	Why
Karaoke track generator	`no_vocals`	Remove lead vocals, keep everything else
Vocal isolation / acapella	`vocals`	Remixes, pitch analysis, training data
Drum practice tool	`drums`	Isolated drum part to study and play along with
ML dataset generation	All 4	Train instrument classifiers, pitch detectors
DJ stem mixing	`drums`, `bass`	Live stem mixing in Rekordbox / Serato
Podcast cleanup	`vocals`	Isolate speech, remove background music

Demucs vs Spleeter — which Python library should you use?

Short answer: Use Demucs (htdemucs model) for anything you're building today. Spleeter is legacy: it hasn't had a major release since 2022 and is pinned to older TensorFlow versions that tend to clash with modern environments. Demucs scores roughly 3 dB higher SDR on every stem in the table below. It came out of Meta AI Research, and although its repository is now in maintenance mode rather than active development, it remains the stronger open-source choice.

The only reason to reach for Spleeter in 2026 is if you're maintaining an older pipeline that already uses it, or if you need its 5-stem model (vocals/drums/bass/piano/other).

Metric	Demucs `htdemucs`	Spleeter `4stems`
SDR vocals	~9.4 dB	~6.5 dB
SDR drums	~8.9 dB	~6.0 dB
SDR bass	~8.2 dB	~5.5 dB
GPU required	No (CPU ok, 5–10× slower)	No
Dependency	PyTorch	TensorFlow (older, pinned versions)
Active development	Maintenance only (originated at Meta AI)	No (last release 2022)
Stem modes	2-stem, 4-stem, 6-stem	2-stem, 4-stem, 5-stem
License	MIT	MIT

For a full quality comparison with listening tests, see the Spleeter vs Demucs comparison.
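
The SDR figures in the table are signal-to-distortion ratios in decibels: the energy of the reference stem divided by the energy of the error between reference and estimate, on a log scale. As a rough sketch of that basic definition (published benchmarks typically rely on evaluation toolkits whose windowing and aggregation this simplified version omits, and `sdr_db` is a name chosen here for illustration):

```python
import math


def sdr_db(reference: list, estimate: list) -> float:
    """Basic signal-to-distortion ratio in dB for two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference: any error dominates
    return 10 * math.log10(signal_energy / error_energy)
```

An estimate that is 10% off in amplitude everywhere scores about 20 dB. On this scale, the roughly 3 dB gap in the table corresponds to about half the error energy.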

Method 1 — Demucs (recommended)

Install

pip install demucs

Demucs downloads model weights on first run (~80 MB for htdemucs). PyTorch uses your GPU automatically if CUDA is available; otherwise it runs on CPU.
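
If you'd rather choose the device explicitly than rely on the default, the Demucs CLI accepts a -d/--device flag. A minimal sketch, assuming PyTorch may or may not be importable in the calling process; `pick_device` and `demucs_command` are illustrative helper names, not part of Demucs:

```python
from typing import Optional


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # normally present because Demucs depends on it
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def demucs_command(
    input_path: str,
    model: str = "htdemucs",
    device: Optional[str] = None,
) -> list:
    """Build a Demucs CLI call that pins the inference device."""
    return [
        "demucs", "-n", model,
        "-d", device or pick_device(),
        str(input_path),
    ]
```

Passing device="cpu" is a simple way to keep a shared GPU free, or to reproduce CPU-only timings on a GPU machine.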

4-stem separation (vocals / drums / bass / other)

import subprocess
from pathlib import Path


def separate_stems(
    input_path: str | Path,
    output_dir: str | Path = "separated",
    model: str = "htdemucs",
) -> dict[str, Path]:
    """Separate audio into 4 stems using Demucs.

    Returns a dict mapping stem name → output file path.
    """
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run(
        ["demucs", "-n", model, "--out", str(output_dir), str(input_path)],
        check=True,
    )

    stem_dir = output_dir / model / input_path.stem
    return {
        stem: stem_dir / f"{stem}.wav"
        for stem in ("vocals", "drums", "bass", "other")
    }


if __name__ == "__main__":
    stems = separate_stems("my_song.mp3")
    for name, path in stems.items():
        size_kb = path.stat().st_size // 1024
        print(f"  {name}: {path} ({size_kb} KB)")
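
The function above assumes the `demucs` executable is on PATH, which is not always the case inside virtual environments or after `pip install --user`. One possible fallback, sketched here with a hypothetical helper name, is to run the package as a module with the current interpreter:

```python
import shutil
import sys


def demucs_base_command() -> list:
    """Return the argv prefix for invoking Demucs.

    Prefers the `demucs` console script and otherwise falls back to
    `python -m demucs` using the interpreter that runs this script.
    """
    exe = shutil.which("demucs")
    if exe:
        return [exe]
    return [sys.executable, "-m", "demucs"]
```

The call inside separate_stems could then be built as `demucs_base_command() + ["-n", model, "--out", str(output_dir), str(input_path)]`.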

2-stem separation (vocals / no_vocals)

When you only need vocals vs instrumental (e.g. a karaoke track), pass --two-stems=vocals. Demucs still runs the full model and then sums the non-vocal sources into a single no_vocals.wav, so what you save is the manual mixing step and some disk writes rather than inference time:

def separate_two_stems(
    input_path: str | Path,
    output_dir: str | Path = "separated",
    model: str = "htdemucs",
) -> tuple[Path, Path]:
    """Returns (vocals_path, no_vocals_path)."""
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run(
        [
            "demucs", "-n", model,
            "--two-stems=vocals",
            "--out", str(output_dir),
            str(input_path),
        ],
        check=True,
    )

    stem_dir = output_dir / model / input_path.stem
    return stem_dir / "vocals.wav", stem_dir / "no_vocals.wav"

Model selection guide

Track type	Recommended model	Why
Pop/rock post-1990	`htdemucs_ft`	Fine-tuned per stem; slightly better SDR at roughly 4× the runtime
Jazz, classical	`htdemucs` (base)	Better generalisation on complex polyphony
Electronic / heavily processed	`mdx_extra`	Trained with extra data; compare against `htdemucs` on your own material
Speed priority	`htdemucs`	Roughly 4× faster than the `_ft` variant

Switch model by changing the -n flag: demucs -n htdemucs_ft ...
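
Changing the model can also change which files come back: the 4-source models write vocals, drums, bass and other, while the experimental htdemucs_6s model adds guitar and piano. A small sketch of a lookup that keeps the expected-path logic from separate_stems in step with the chosen model (the helper name, and the fallback to four stems for any model not listed, are assumptions made here):

```python
from pathlib import Path

FOUR_STEMS = ("vocals", "drums", "bass", "other")
SIX_STEMS = FOUR_STEMS + ("guitar", "piano")

# Only the 6-source model needs a special case in this sketch.
STEMS_BY_MODEL = {"htdemucs_6s": SIX_STEMS}


def expected_stem_paths(output_dir, model: str, track: str) -> dict:
    """Map stem name to the WAV path Demucs is expected to write."""
    stem_dir = Path(output_dir) / model / Path(track).stem
    names = STEMS_BY_MODEL.get(model, FOUR_STEMS)
    return {name: stem_dir / f"{name}.wav" for name in names}
```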

Batch processing

def batch_separate(input_dir: str, output_dir: str = "separated") -> None:
    input_dir = Path(input_dir)
    audio_files = list(input_dir.glob("*.mp3")) + list(input_dir.glob("*.wav"))
    print(f"Processing {len(audio_files)} files…")
    for audio_file in sorted(audio_files):
        print(f"  → {audio_file.name}")
        try:
            stems = separate_stems(audio_file, output_dir)
            print(f"    ✓ {len(stems)} stems saved")
        except subprocess.CalledProcessError as e:
            print(f"    ✗ failed: {e}")
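
The loop above starts a new Demucs process, and reloads the model, for every file. The CLI also accepts several input paths in one call, so a batch can share a single model load. A sketch of that variant, with `build_batch_command` as an illustrative name and the extension list as an assumption about what your folder contains:

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def build_batch_command(input_dir, output_dir="separated", model="htdemucs") -> list:
    """Build one Demucs call covering every audio file in input_dir."""
    files = sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    if not files:
        raise FileNotFoundError(f"no audio files found in {input_dir}")
    return ["demucs", "-n", model, "--out", str(output_dir), *map(str, files)]
```

Run it with `subprocess.run(build_batch_command("songs"), check=True)`. The trade-off is that a failure partway through is harder to attribute to a specific file than in the per-file loop.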

Method 2 — Spleeter (legacy, still useful for 5-stem)

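Spleeter is lower quality than Demucs for most use cases, but its 5-stem model (vocals / drums / bass / piano / other) is still handy if you specifically need isolated piano. The closest Demucs option is the experimental 6-stem htdemucs_6s model, whose piano stem the Demucs README itself describes as not working well.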

Install

pip install spleeter

⚠️ Spleeter pins older TensorFlow and Python versions and can conflict with modern Python environments. Use a dedicated virtual environment or Docker image. It is not actively maintained.

4-stem separation

from spleeter.separator import Separator
from pathlib import Path


def spleeter_separate(
    input_path: str | Path,
    output_dir: str | Path = "separated_spleeter",
    stems: int = 4,
) -> Path:
    """Separate audio using Spleeter. Returns the output directory for this track."""
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()

    separator = Separator(f"spleeter:{stems}stems")
    separator.separate_to_file(str(input_path), str(output_dir))

    return output_dir / input_path.stem


if __name__ == "__main__":
    result_dir = spleeter_separate("my_song.mp3", stems=4)
    for stem_file in sorted(result_dir.glob("*.wav")):
        print(f"  {stem_file.name}")
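
Spleeter names its output files after the stems of the chosen configuration, and the 2-stem model writes accompaniment.wav rather than Demucs's no_vocals.wav. A small sketch for checking that a run produced everything expected (the helper name is hypothetical; the stem names follow Spleeter's pretrained 2, 4 and 5-stem configurations):

```python
from pathlib import Path

SPLEETER_STEMS = {
    2: ("vocals", "accompaniment"),
    4: ("vocals", "drums", "bass", "other"),
    5: ("vocals", "drums", "bass", "piano", "other"),
}


def missing_spleeter_stems(track_dir, stems: int = 4) -> list:
    """Return the expected stem names that have no WAV file in track_dir."""
    if stems not in SPLEETER_STEMS:
        raise ValueError(f"unsupported stem count: {stems}")
    track_dir = Path(track_dir)
    return [
        name for name in SPLEETER_STEMS[stems]
        if not (track_dir / f"{name}.wav").is_file()
    ]
```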

When to still use Spleeter

You need isolated piano stems (use stems=5)
You're maintaining an existing ML pipeline built on Spleeter and the migration cost isn't justified
You're benchmarking against older research that used Spleeter as the baseline

For everything else: use Demucs.

Method 3 — StemSplit REST API (no GPU, production-ready)

If you're deploying to a serverless function, a CPU-only cloud server, or building a web app where you can't run local inference, the StemSplit stem splitter API gives you cloud GPU separation through a simple REST call. No model weights, no CUDA dependencies, no cold-start latency.

Install

pip install requests

Full implementation (upload → poll → download all stems)

import time
import requests
from pathlib import Path


STEMSPLIT_API_BASE = "https://stemsplit.io/api"


def separate_via_api(
    input_path: str | Path,
    api_key: str,
    output_dir: str | Path = "separated_api",
    stems: list[str] | None = None,
    poll_interval: int = 5,
    timeout: int = 300,
) -> dict[str, Path]:
    """Separate audio into stems using the StemSplit REST API.

    Args:
        input_path: Local audio file (mp3, wav, flac, m4a — max 10 min).
        api_key: StemSplit API key.
        output_dir: Directory to save downloaded stems.
        stems: Which stems to request. Defaults to all 4 (vocals/drums/bass/other).
        poll_interval: Seconds between status polls.
        timeout: Max seconds to wait.

    Returns:
        Dict mapping stem name → downloaded file path.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    stems = stems or ["vocals", "drums", "bass", "other"]
    headers = {"Authorization": f"Bearer {api_key}"}

    # 1. Upload and start the job
    with input_path.open("rb") as f:
        resp = requests.post(
            f"{STEMSPLIT_API_BASE}/separate",
            headers=headers,
            files={"file": (input_path.name, f, "audio/mpeg")},
            data={"stems": ",".join(stems)},
            timeout=60,
        )
    resp.raise_for_status()
    job_id = resp.json()["jobId"]
    print(f"Job started: {job_id}")

    # 2. Poll until complete
    elapsed = 0
    while elapsed < timeout:
        time.sleep(poll_interval)
        elapsed += poll_interval
        poll = requests.get(
            f"{STEMSPLIT_API_BASE}/jobs/{job_id}", headers=headers, timeout=30
        )
        poll.raise_for_status()  # fail loudly on HTTP errors before parsing JSON
        status = poll.json()
        print(f"  {status['status']} ({elapsed}s)")
        if status["status"] == "completed":
            stem_urls = status["stems"]
            break
        if status["status"] == "failed":
            raise RuntimeError(f"Job failed: {status.get('error')}")
    else:
        raise TimeoutError(f"Job {job_id} timed out after {timeout}s")

    # 3. Download each stem
    results: dict[str, Path] = {}
    for stem_name, url in stem_urls.items():
        out_path = output_dir / f"{input_path.stem}_{stem_name}.wav"
        dl = requests.get(url, timeout=120, stream=True)
        dl.raise_for_status()
        with out_path.open("wb") as f:
            for chunk in dl.iter_content(chunk_size=8192):
                f.write(chunk)
        results[stem_name] = out_path
        print(f"  ✓ {stem_name}: {out_path}")

    return results


if __name__ == "__main__":
    import os
    stems = separate_via_api(
        "my_song.mp3",
        api_key=os.environ["STEMSPLIT_API_KEY"],
        output_dir="output",
    )
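
The polling loop above counts only the time spent sleeping, so slow status requests can push the real wait past timeout. A sketch of a reusable poller built on time.monotonic with a capped exponential backoff; `poll_until` and its parameters are illustrative, and `fetch_status` stands in for the GET request to the jobs endpoint:

```python
import time
from typing import Callable


def poll_until(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    initial_interval: float = 2.0,
    max_interval: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout
    interval = initial_interval
    while True:
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        if status.get("status") == "failed":
            raise RuntimeError(f"Job failed: {status.get('error')}")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"Job did not finish within {timeout}s")
        # Never sleep past the deadline; double the wait up to max_interval.
        sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval)
```

The injectable `sleep` argument exists only so the function can be exercised in tests without real delays.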

Practical use cases

1 — Karaoke track (remove lead vocals)

_, instrumental = separate_two_stems("song.mp3")
print(f"Karaoke track: {instrumental}")
# → separated/htdemucs/song/no_vocals.wav

Or request only the non-vocal stems via the API and mix them into an instrumental afterwards:

import os

stems = separate_via_api(
    "song.mp3",
    api_key=os.environ["STEMSPLIT_API_KEY"],
    stems=["drums", "bass", "other"],
)
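
Three separate stem files are not yet an instrumental; they still need to be summed sample by sample. A standard-library sketch of that mixdown, assuming every stem is a 16-bit PCM WAV with the same sample rate, channel count and length (the source does not state what format the API returns, so treat that as an assumption; `mix_wav_stems` is a name invented for this example):

```python
import array
import sys
import wave
from pathlib import Path


def mix_wav_stems(stem_paths: list, out_path) -> Path:
    """Sum 16-bit PCM WAV stems and write the clipped result to out_path."""
    mixed = None
    fmt = None
    for path in stem_paths:
        with wave.open(str(path), "rb") as wf:
            if wf.getsampwidth() != 2:
                raise ValueError(f"{path}: expected 16-bit PCM")
            this_fmt = (wf.getnchannels(), wf.getframerate())
            samples = array.array("h")
            samples.frombytes(wf.readframes(wf.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        if mixed is None:
            mixed, fmt = list(samples), this_fmt
            continue
        if this_fmt != fmt or len(samples) != len(mixed):
            raise ValueError(f"{path}: format or length differs from first stem")
        mixed = [a + b for a, b in zip(mixed, samples)]
    if mixed is None:
        raise ValueError("no stems given")

    # Clamp to the int16 range so loud passages clip instead of wrapping around.
    out = array.array("h", (max(-32768, min(32767, s)) for s in mixed))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(out_path), "wb") as wf:
        wf.setnchannels(fmt[0])
        wf.setsampwidth(2)
        wf.setframerate(fmt[1])
        wf.writeframes(out.tobytes())
    return Path(out_path)
```

Pure-Python summing is slow on full-length songs; it is used here only to avoid extra dependencies, and an array library would be the practical choice for batches.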

For the full karaoke generation workflow including synced lyrics, see how to extract acapella from any song in Python.

2 — Isolated drum track for practice

stems = separate_stems("song.mp3")
drum_track = stems["drums"]
print(f"Drum track: {drum_track}")
# Play this alongside the original to practice along
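
For playing along, the complement is often more useful: the song with the drums removed. Demucs's --two-stems flag accepts any stem name, so --two-stems=drums writes drums.wav plus a no_drums.wav backing track. A sketch mirroring separate_two_stems above (both function names are made up for this example):

```python
import subprocess
from pathlib import Path


def drumless_command(input_path, output_dir="separated", model="htdemucs") -> list:
    """Build the Demucs call that splits a track into drums / no_drums."""
    return [
        "demucs", "-n", model,
        "--two-stems=drums",
        "--out", str(Path(output_dir).resolve()),
        str(Path(input_path).resolve()),
    ]


def separate_drumless(input_path, output_dir="separated", model="htdemucs") -> Path:
    """Run Demucs and return the path of the drum-free backing track."""
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(drumless_command(input_path, output_dir, model), check=True)
    return output_dir / model / input_path.stem / "no_drums.wav"
```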

3 — ML training dataset generation

from pathlib import Path
import json


def build_stem_dataset(
    input_dir: str,
    output_dir: str = "dataset",
    model: str = "htdemucs",
) -> None:
    """Separate a folder of songs into stems for ML training."""
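    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    # Create the dataset folder up front so the manifest can be written
    # even when no tracks are found.
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest = []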

    for audio_file in sorted(input_dir.glob("*.mp3")):
        stems = separate_stems(audio_file, output_dir / "stems", model)
        manifest.append({
            "source": str(audio_file),
            "stems": {k: str(v) for k, v in stems.items()},
        })
        print(f"  ✓ {audio_file.name}")

    manifest_path = output_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    print(f"\nManifest saved: {manifest_path} ({len(manifest)} tracks)")
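
Before training on the manifest it is worth checking that every listed stem file exists and is non-empty, since a failed or interrupted separation would otherwise surface much later as a data-loading error. A sketch that reads the manifest.json format written above; `find_incomplete_tracks` is an illustrative name:

```python
import json
from pathlib import Path


def find_incomplete_tracks(manifest_path) -> dict:
    """Map each source file to the stems that are missing or empty on disk."""
    entries = json.loads(Path(manifest_path).read_text())
    problems = {}
    for entry in entries:
        bad = [
            name
            for name, path in entry["stems"].items()
            if not Path(path).is_file() or Path(path).stat().st_size == 0
        ]
        if bad:
            problems[entry["source"]] = bad
    return problems
```

An empty result means every track in the manifest has all of its stems on disk.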

Conclusion

For new Python projects in 2026, the hierarchy is clear:

Local Demucs (htdemucs_ft) — best quality, free, GPU optional
StemSplit API — production cloud option, no GPU or model management
Spleeter — only if you need 5-stem (piano) or are maintaining legacy code

The choice between local and API depends on your infrastructure: if you're running a batch job on a machine you control, go local. If you're building a web service or running on cloud functions, the REST API is far simpler to operate.

Related articles

How to Extract Acapella from Any Song in Python — vocals-only extraction in depth, including yt-dlp pipeline
How to Isolate Vocals from Any Song: 5 Methods Compared — non-Python methods including browser tools
