🎙️ Audio Data Quality Toolkit for TTS/ASR Training Pipelines

Detect clipping, silence, noisy samples, duplicate clips, transcript mismatch, speaker imbalance, and synthetic-data artifacts in speech datasets.
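As a rough illustration of what the clipping and silence checks measure, here is a minimal numpy sketch. It is not the toolkit's implementation. The function names `clipping_ratio` and `silence_ratio`, the 0.999 full-scale threshold, the 25 ms frames, and the -40 dBFS silence floor are all hypothetical choices. The input is assumed to be mono float audio scaled to [-1, 1].

```python
import numpy as np


def clipping_ratio(audio: np.ndarray, threshold: float = 0.999) -> float:
    """Fraction of samples at or above `threshold` in magnitude (full scale = 1.0).

    Hypothetical helper: the threshold is an illustrative default, not the toolkit's.
    """
    if audio.size == 0:
        return 0.0
    return float(np.mean(np.abs(audio) >= threshold))


def silence_ratio(audio: np.ndarray, sr: int, frame_ms: float = 25.0,
                  silence_db: float = -40.0) -> float:
    """Fraction of non-overlapping frames whose RMS level is below `silence_db` dBFS.

    Hypothetical helper: frame size and silence floor are illustrative defaults.
    """
    frame_len = max(1, int(sr * frame_ms / 1000))
    n_frames = len(audio) // frame_len
    if n_frames == 0:
        return 1.0  # too short to contain a single full frame; treat as silent
    # Drop the trailing partial frame and reshape into (n_frames, frame_len).
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Floor the RMS so digital silence maps to a very low dB value instead of -inf.
    level_db = 20 * np.log10(np.maximum(rms, 1e-10))
    return float(np.mean(level_db < silence_db))
```

On a 1 s, 16 kHz tone followed by 1 s of digital silence, `silence_ratio` returns 0.5, because exactly half of the 25 ms frames are silent.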

Designed for TTS, ASR, voice-cloning, and synthetic speech evaluation workflows.

Lint your audio datasets before training. The toolkit provides training-readiness checks for TTS, ASR, and voice-cloning pipelines, with duplicate detection, speaker-balance analysis, and ASR-based transcript alignment on the roadmap. No GPU required: all checks run on CPU with numpy, scipy, and librosa.
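For noisy-sample detection, one common CPU-only heuristic is a frame-energy SNR estimate: treat the quietest frames as the noise floor and the loudest frames as speech. The sketch below assumes mono float audio and uses numpy only. `estimate_snr_db` and its 10th/90th-percentile defaults are hypothetical and are not taken from the toolkit.

```python
import numpy as np


def estimate_snr_db(audio: np.ndarray, sr: int, frame_ms: float = 25.0,
                    noise_pct: float = 10.0, signal_pct: float = 90.0) -> float:
    """Rough SNR in dB from the spread of per-frame power (hypothetical heuristic)."""
    frame_len = max(1, int(sr * frame_ms / 1000))
    n_frames = len(audio) // frame_len
    if n_frames < 2:
        raise ValueError("clip too short for a frame-based SNR estimate")
    # Split into non-overlapping frames and compute mean power per frame.
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)
    # Quiet frames approximate the noise floor; loud frames approximate speech plus noise.
    noise_power = max(float(np.percentile(power, noise_pct)), 1e-20)
    signal_power = max(float(np.percentile(power, signal_pct)), 1e-20)
    return float(10.0 * np.log10(signal_power / noise_power))
```

This estimate assumes the clip contains pauses. On continuous speech with no quiet frames, the low percentile lands on speech and the SNR is underestimated.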

Unlike perceptual scoring tools such as NISQA, PESQ, or UTMOS, which answer "how good does this sound?", this toolkit answers "is this file ready for training?" by catching the data-engineering issues that silently degrade model quality.

Upload one audio clip and inspect training-readiness quality signals.
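A per-clip check against an expected sample rate can be approximated with the standard-library `wave` module plus numpy. This sketch is an assumption about how such a check might look, not the app's code. `load_pcm16_wav` and `check_sample_rate` are hypothetical names, only 16-bit PCM WAV input is handled, and multichannel audio is downmixed to mono by averaging.

```python
import wave

import numpy as np


def load_pcm16_wav(path: str) -> tuple[np.ndarray, int]:
    """Read a 16-bit PCM WAV file; return (mono float32 samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        sr = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV PCM data is little-endian signed 16-bit; scale to floats in [-1, 1).
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Frames are interleaved, so reshape to (frames, channels) and average.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sr


def check_sample_rate(path: str, expected_sr: int) -> dict:
    """Hypothetical report comparing a clip's sample rate with the expected value."""
    samples, sr = load_pcm16_wav(path)
    return {
        "sample_rate": sr,
        "expected_sample_rate": expected_sr,
        "sample_rate_ok": sr == expected_sr,
        "duration_s": len(samples) / sr,
    }
```

A `False` in `sample_rate_ok` flags a clip that does not match the rate the rest of the dataset is expected to use.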

Expected sample rate