Audio Stem Separation in Python: Demucs, Spleeter & API Compared (2026 Guide)

Four stems from any audio file — working code for local GPU inference, CPU-only, and a one-request cloud API

Published
8 min read

You've probably heard the term "stem separation" thrown around in music production, ML research, and audio tooling circles. But what exactly is it, and how do you do it programmatically in Python?

This guide covers the two main open-source libraries (Demucs and Spleeter), a cloud REST API for environments without a GPU, and practical code examples for the most common use cases: karaoke generation, drum isolation for practice, and building ML training datasets.


What is audio stem separation?

Short answer: Stem separation (also called music source separation) uses a trained neural network to decompose a mixed audio file into isolated tracks — typically vocals, drums, bass, and "other" (guitars, keys, etc.). The model never hears the isolated parts during inference; it learned the signal statistics of each instrument class from thousands of labeled recordings during training.

For a deeper dive into the signal processing and machine learning behind it, see how AI stem separation works under the hood.

The practical result: given song.mp3, you get back vocals.wav, drums.wav, bass.wav, and other.wav, each as a separate WAV file (Demucs writes 44.1 kHz stereo by default).

Common developer use cases:

| Use case | Stems needed | Why |
| --- | --- | --- |
| Karaoke track generator | no_vocals | Remove lead vocals, keep everything else |
| Vocal isolation / acapella | vocals | Remixes, pitch analysis, training data |
| Drum practice tool | drums | Isolated drum part to study or play along with |
| ML dataset generation | All 4 | Train instrument classifiers, pitch detectors |
| DJ stem mixing | drums, bass | Live stem mixing in Rekordbox / Serato |
| Podcast cleanup | vocals | Isolate speech, remove background music |

Demucs vs Spleeter — which Python library should you use?

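Short answer: Use Demucs (htdemucs model) for anything you're building today. Spleeter is legacy: it hasn't had a major release since 2022, and although its 2.x releases run on TensorFlow 2.x, they pin specific TensorFlow and Python versions that are awkward to install alongside modern packages. Demucs scores roughly 3 dB higher SDR on vocals, drums, and bass in the comparison below. Demucs was developed at Meta AI Research; the original repository is now archived and its author handles important bug fixes in a fork, but it remains the stronger open-source option.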

The only reason to reach for Spleeter in 2026 is if you're maintaining an older pipeline that already uses it, or if you need its 5-stem model (vocals/drums/bass/piano/other).

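| | Demucs htdemucs | Spleeter 4stems |
| --- | --- | --- |
| SDR vocals | ~9.4 dB | ~6.5 dB |
| SDR drums | ~8.9 dB | ~6.0 dB |
| SDR bass | ~8.2 dB | ~5.5 dB |
| GPU required | No (CPU ok, 5–10× slower) | No |
| Dependency | PyTorch | TensorFlow (2.x in current releases, version-pinned) |
| Active development | Bug fixes only (author's fork; original Meta repo archived) | No (last release 2022) |
| Stem modes | 2-stem, 4-stem, 6-stem | 2-stem, 4-stem, 5-stem |
| License | MIT | MIT |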
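
To make the SDR rows concrete: signal-to-distortion ratio compares the energy of the true stem with the energy of the estimation error, in decibels. The sketch below implements only that basic definition in pure Python. Published MUSDB18 scores are computed with the museval toolkit over windowed segments, so treat this as an illustration of the quantity rather than the benchmark procedure; the function name `sdr_db` is ours.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Basic SDR in dB: 10 * log10(sum(s^2) / sum((s - s_hat)^2))."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))

    if error_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, any error dominates
    return 10 * math.log10(signal_energy / error_energy)
```

An estimate that is off by 10% in amplitude everywhere has 1% of the signal's energy as error, which works out to 20 dB. On this scale, a gap of about 3 dB corresponds to roughly half the residual error energy.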

For a full quality comparison with listening tests, see the Spleeter vs Demucs comparison.


Method 1 — Demucs (recommended)

Install

pip install demucs

Demucs downloads model weights on first run (~80 MB for htdemucs). PyTorch uses your GPU automatically if CUDA is available; otherwise it runs on CPU.

4-stem separation (vocals / drums / bass / other)

import subprocess
from pathlib import Path


def separate_stems(
    input_path: str | Path,
    output_dir: str | Path = "separated",
    model: str = "htdemucs",
) -> dict[str, Path]:
    """Separate audio into 4 stems using Demucs.

    Returns a dict mapping stem name → output file path.
    """
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run(
        ["demucs", "-n", model, "--out", str(output_dir), str(input_path)],
        check=True,
    )

    stem_dir = output_dir / model / input_path.stem
    return {
        stem: stem_dir / f"{stem}.wav"
        for stem in ("vocals", "drums", "bass", "other")
    }


if __name__ == "__main__":
    stems = separate_stems("my_song.mp3")
    for name, path in stems.items():
        size_kb = path.stat().st_size // 1024
        print(f"  {name}: {path} ({size_kb} KB)")

2-stem separation (vocals / no_vocals)

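When you only need vocals vs instrumental (e.g. a karaoke track), pass --two-stems=vocals. Demucs still runs the full model, then sums the drum, bass, and other outputs into a single no_vocals.wav, so inference time is about the same, but you get two files instead of four and no manual mixing step: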

def separate_two_stems(
    input_path: str | Path,
    output_dir: str | Path = "separated",
    model: str = "htdemucs",
) -> tuple[Path, Path]:
    """Returns (vocals_path, no_vocals_path)."""
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run(
        [
            "demucs", "-n", model,
            "--two-stems=vocals",
            "--out", str(output_dir),
            str(input_path),
        ],
        check=True,
    )

    stem_dir = output_dir / model / input_path.stem
    return stem_dir / "vocals.wav", stem_dir / "no_vocals.wav"

Model selection guide

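| Track type | Recommended model | Why |
| --- | --- | --- |
| Pop/rock post-1990 | htdemucs_ft | Fine-tuned, +0.5 dB SDR on vocals, at roughly 4× the processing time |
| Jazz, classical | htdemucs (base) | Better generalisation on complex polyphony |
| Electronic / heavily processed | mdx_extra | Trained with extra data beyond MUSDB18, which can help on electronic material |
| Speed priority | htdemucs | Roughly 4× faster than the _ft variant |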

Switch model by changing the -n flag: demucs -n htdemucs_ft ...

Batch processing

def batch_separate(input_dir: str, output_dir: str = "separated") -> None:
    input_dir = Path(input_dir)
    audio_files = list(input_dir.glob("*.mp3")) + list(input_dir.glob("*.wav"))
    print(f"Processing {len(audio_files)} files…")
    for audio_file in sorted(audio_files):
        print(f"  → {audio_file.name}")
        try:
            stems = separate_stems(audio_file, output_dir)
            print(f"    ✓ {len(stems)} stems saved")
        except subprocess.CalledProcessError as e:
            print(f"    ✗ failed: {e}")

Method 2 — Spleeter (legacy, still useful for 5-stem)

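Spleeter produces lower-quality stems than Demucs for most use cases, but its 5-stem model (vocals / drums / bass / piano / other) is still useful if you specifically need isolated piano. Demucs' experimental 6-stem model (htdemucs_6s) also outputs a piano stem, but the Demucs README notes that its piano separation is weak.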

Install

pip install spleeter

⚠️ Spleeter pins specific TensorFlow and Python versions and can conflict with packages in a modern Python environment. Use a dedicated virtual environment or Docker image. It is not actively maintained.

4-stem separation

from spleeter.separator import Separator
from pathlib import Path


def spleeter_separate(
    input_path: str | Path,
    output_dir: str | Path = "separated_spleeter",
    stems: int = 4,
) -> Path:
    """Separate audio using Spleeter. Returns the output directory for this track."""
    input_path = Path(input_path).resolve()
    output_dir = Path(output_dir).resolve()

    separator = Separator(f"spleeter:{stems}stems")
    separator.separate_to_file(str(input_path), str(output_dir))

    return output_dir / input_path.stem


if __name__ == "__main__":
    result_dir = spleeter_separate("my_song.mp3", stems=4)
    for stem_file in sorted(result_dir.glob("*.wav")):
        print(f"  {stem_file.name}")

When to still use Spleeter

  • You need isolated piano stems (use stems=5)
  • You're maintaining an existing ML pipeline built on Spleeter and the migration cost isn't justified
  • You're benchmarking against older research that used Spleeter as the baseline

For everything else: use Demucs.


Method 3 — StemSplit REST API (no GPU, production-ready)

If you're deploying to a serverless function, a CPU-only cloud server, or building a web app where you can't run local inference, the StemSplit stem splitter API gives you cloud GPU separation through a simple REST call. No model weights, no CUDA dependencies, no cold-start latency.

Install

pip install requests

Full implementation (upload → poll → download all stems)

import mimetypes
import time
import requests
from pathlib import Path


STEMSPLIT_API_BASE = "https://stemsplit.io/api"


def separate_via_api(
    input_path: str | Path,
    api_key: str,
    output_dir: str | Path = "separated_api",
    stems: list[str] | None = None,
    poll_interval: int = 5,
    timeout: int = 300,
) -> dict[str, Path]:
    """Separate audio into stems using the StemSplit REST API.

    Args:
        input_path: Local audio file (mp3, wav, flac, m4a; max 10 min).
        api_key: StemSplit API key.
        output_dir: Directory to save downloaded stems.
        stems: Which stems to request. Defaults to all 4 (vocals/drums/bass/other).
        poll_interval: Seconds between status polls.
        timeout: Max seconds to wait.

    Returns:
        Dict mapping stem name → downloaded file path.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    stems = stems or ["vocals", "drums", "bass", "other"]
    headers = {"Authorization": f"Bearer {api_key}"}
    # Match the upload's content type to the actual file instead of assuming MP3
    content_type = mimetypes.guess_type(input_path.name)[0] or "application/octet-stream"

    # 1. Upload and start the job
    with input_path.open("rb") as f:
        resp = requests.post(
            f"{STEMSPLIT_API_BASE}/separate",
            headers=headers,
            files={"file": (input_path.name, f, content_type)},
            data={"stems": ",".join(stems)},
            timeout=60,
        )
    resp.raise_for_status()
    job_id = resp.json()["jobId"]
    print(f"Job started: {job_id}")

    # 2. Poll until complete
    elapsed = 0
    while elapsed < timeout:
        time.sleep(poll_interval)
        elapsed += poll_interval
        poll = requests.get(
            f"{STEMSPLIT_API_BASE}/jobs/{job_id}", headers=headers, timeout=30
        )
        poll.raise_for_status()  # surface HTTP errors instead of a KeyError below
        status = poll.json()
        print(f"  {status['status']} ({elapsed}s)")
        if status["status"] == "completed":
            stem_urls = status["stems"]
            break
        if status["status"] == "failed":
            raise RuntimeError(f"Job failed: {status.get('error')}")
    else:
        raise TimeoutError(f"Job {job_id} timed out after {timeout}s")

    # 3. Download each stem
    results: dict[str, Path] = {}
    for stem_name, url in stem_urls.items():
        out_path = output_dir / f"{input_path.stem}_{stem_name}.wav"
        dl = requests.get(url, timeout=120, stream=True)
        dl.raise_for_status()
        with out_path.open("wb") as f:
            for chunk in dl.iter_content(chunk_size=8192):
                f.write(chunk)
        results[stem_name] = out_path
        print(f"  ✓ {stem_name}: {out_path}")

    return results


if __name__ == "__main__":
    import os
    stems = separate_via_api(
        "my_song.mp3",
        api_key=os.environ["STEMSPLIT_API_KEY"],
        output_dir="output",
    )
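
The fixed five-second poll above is fine for a script. For a service handling many jobs, a growing interval cuts down on status requests. Below is a minimal sketch of a polling helper with exponential backoff. It is independent of any particular API: `fetch_status` is a hypothetical callable you supply (for example, a lambda wrapping the status request), and the "completed" / "failed" values simply mirror the response fields used in the code above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    initial_interval: float = 2.0,
    max_interval: float = 30.0,
    backoff: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() with growing pauses until the job completes or fails."""
    deadline = clock() + timeout
    interval = initial_interval
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"Job failed: {status.get('error')}")
        # Give up if the next wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"Job did not finish within {timeout}s")
        sleep(interval)
        interval = min(interval * backoff, max_interval)
```

The `sleep` and `clock` parameters exist only so the helper can be unit-tested without real waiting.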

Practical use cases

1 — Karaoke track (remove lead vocals)

_, instrumental = separate_two_stems("song.mp3")
print(f"Karaoke track: {instrumental}")
# → separated/htdemucs/song/no_vocals.wav

Or request only the non-vocal stems via the API and sum them into an instrumental yourself:

import os

stems = separate_via_api(
    "song.mp3",
    api_key=os.environ["STEMSPLIT_API_KEY"],
    stems=["drums", "bass", "other"],
)

For the full karaoke generation workflow including synced lyrics, see how to extract acapella from any song in Python.

2 — Isolated drum track for practice

stems = separate_stems("song.mp3")
drum_track = stems["drums"]
print(f"Drum track: {drum_track}")
# Play this alongside the original to practice along

3 — ML training dataset generation

from pathlib import Path
import json


def build_stem_dataset(
    input_dir: str,
    output_dir: str = "dataset",
    model: str = "htdemucs",
) -> None:
    """Separate a folder of songs into stems for ML training."""
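    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    # Create the dataset root up front so the manifest can be written
    # even when no tracks are found
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest = []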

    for audio_file in sorted(input_dir.glob("*.mp3")):
        stems = separate_stems(audio_file, output_dir / "stems", model)
        manifest.append({
            "source": str(audio_file),
            "stems": {k: str(v) for k, v in stems.items()},
        })
        print(f"  ✓ {audio_file.name}")

    manifest_path = output_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    print(f"\nManifest saved: {manifest_path} ({len(manifest)} tracks)")

Conclusion

For new Python projects in 2026, the hierarchy is clear:

  1. Local Demucs (htdemucs_ft) — best quality, free, GPU optional
  2. StemSplit API — production cloud option, no GPU or model management
  3. Spleeter — only if you need 5-stem (piano) or are maintaining legacy code

The choice between local and API depends on your infrastructure: if you're running a batch job on a machine you control, go local. If you're building a web service or running on cloud functions, the REST API is far simpler to operate.