SonicMaster

Towards Controllable All-in-One Music Restoration and Mastering

Jan Melechovsky*, Ambuj Mehrish, Dorien Herremans

The Audio, Music, and AI Lab at Singapore University of Technology and Design (SUTD)

Abstract. Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image, especially when created in non-professional settings without specialized equipment or expertise. These problems are typically corrected using separate specialized tools and manual adjustments. In this paper, we introduce SonicMaster, the first unified generative model for music restoration and mastering that addresses a broad spectrum of audio artifacts with text-based control. SonicMaster is conditioned on natural language instructions to apply targeted enhancements, or can operate in an automatic mode for general restoration. To train this model, we construct JamendoDegrad, a large dataset of paired degraded and high-quality tracks by simulating common degradation types with nineteen degradation functions belonging to five degradation groups: equalization, dynamics, reverb, amplitude, and stereo. Our approach leverages a flow-matching generative training paradigm to learn an audio transformation that maps degraded inputs to their cleaned, mastered versions guided by text prompts. Objective evaluations on multiple audio quality metrics demonstrate that SonicMaster significantly improves sound quality across all artifact categories. Furthermore, subjective listening tests confirm that listeners prefer SonicMaster’s enhanced outputs over both the original degraded audio and those produced by baseline methods, highlighting the effectiveness of our unified approach.
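The flow-matching paradigm mentioned in the abstract can be sketched in its standard conditional form. This is a generic illustration, not necessarily the paper's exact formulation: sample a noise/target pair, interpolate linearly in time, and regress a velocity field conditioned on the degraded audio and the text prompt.

```latex
% Standard conditional flow matching (illustrative sketch):
% x_0 ~ N(0, I) is noise, x_1 is the clean/mastered target,
% and c encodes the degraded input and the text instruction.
x_t = (1 - t)\, x_0 + t\, x_1, \qquad t \sim \mathcal{U}[0, 1]

\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_1}
\left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|^2
```

At inference, integrating the learned velocity field from noise, conditioned on the degraded track and the prompt, yields the restored audio.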

Key Contributions

🎵 Unified Restoration

An all-in-one model that simultaneously handles reverberation, clipping, EQ, dynamics, and stereo imbalances.

📝 Text-Based Control

Use natural-language instructions (e.g., “reduce reverb”) for fine-grained audio enhancement.

🚀 High-Quality Output

Objective metrics (FAD, SSIM, etc.) and listening tests show significant quality gains.

💾 SonicMaster Dataset

We release a large-scale dataset of 25k (208 hrs) paired clean and degraded music segments with natural-language prompts for training and evaluation.

SonicMaster Dataset

Because no instruction-prompted music restoration datasets exist, we generated the SonicMaster music restoration and mastering dataset with natural-language text prompts. We sourced 580k recordings from Jamendo [1] under Creative Commons licenses using the official Jamendo API. To ensure equal representation of genres, we defined 10 genre groups, each consisting of multiple related genre tags; e.g., the Hip-Hop group contains the tags "rap", "hiphop", "trap", "alternativehiphop", and "gangstarap".
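To give a flavor of how paired data can be produced, here is a minimal numpy sketch of one illustrative degradation per group (equalization, dynamics, reverb, amplitude, stereo). The function names and parameter values are our own illustrations, not the paper's nineteen actual degradation functions.

```python
import numpy as np

def degrade_eq_lowpass(x, kernel=64):
    # Equalization group: crude low-pass via moving average (dulls treble).
    k = np.ones(kernel) / kernel
    return np.convolve(x, k, mode="same")

def degrade_dynamics_compress(x, threshold=0.3, ratio=4.0):
    # Dynamics group: hard-knee compression of samples above a threshold.
    mag = np.abs(x)
    over = mag > threshold
    out = x.astype(float).copy()
    out[over] = np.sign(x[over]) * (threshold + (mag[over] - threshold) / ratio)
    return out

def degrade_amplitude_clip(x, limit=0.5):
    # Amplitude group: hard clipping.
    return np.clip(x, -limit, limit)

def degrade_reverb(x, sr=44100, decay=0.3, delay_s=0.05):
    # Reverb group: a single feedback echo as a stand-in for a room response.
    d = int(delay_s * sr)
    out = x.astype(float).copy()
    for i in range(d, len(out)):
        out[i] += decay * out[i - d]
    return out / np.max(np.abs(out))

def degrade_stereo_narrow(left, right, width=0.2):
    # Stereo group: shrink the side signal to narrow the stereo image.
    mid, side = (left + right) / 2, (left - right) / 2
    return mid + width * side, mid - width * side
```

Applying a random subset of such functions to a clean segment, and pairing the result with a prompt describing the inverse operation (e.g., clipping paired with "make the audio smoother and less distorted"), yields degraded/clean/prompt triples for training.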

SonicMaster dataset creation pipeline

Music Enhancement Samples

Example prompts with the corresponding inputs to enhance, enhanced outputs from SonicMaster, and ground-truth references. For optimal listening, please use headphones.

Text prompts (each accompanied on the project page by audio players for the music to enhance, the SonicMaster output, and the ground-truth reference):
Increase the clarity of this song by emphasizing treble frequencies.
Can you make this sound louder, please?
Improve the balance in this song.
Correct the unnatural frequency emphasis. Reduce the roominess or echo.
Increase the clarity of this song by emphasizing treble frequencies.
Clean this off any echoes!
Make the sound less squashed and more open.
Make this song sound more boomy by amplifying the low end bass frequencies.
Make the audio smoother and less distorted.
Disentangle the left and right channels to give this song a stereo feeling.
Raise the level of the vocals, please.
Please, dereverb this audio.
Disentangle the left and right channels to give this song a stereo feeling.

References

Citation

@article{melechovsky2025sonicmaster,
  title   = {SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering},
  author  = {Melechovsky, Jan and Mehrish, Ambuj and Herremans, Dorien},
  year    = {2025},
  eprint  = {2508.03448},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url     = {https://arxiv.org/abs/2508.03448}
}