Introduction

Our work is about separating the harmonic from the percussive components of a music mixture. That is, given a single-channel (i.e. monaural) music mixture (i.e. a song), our method separates the percussive and the harmonic sounds that exist in the mixture. For example, in a band setup with guitar, bass, drums, percussion (e.g. congas), and vocals, our method separates the drums and the percussion from all the rest. Hence the name: harmonic/percussive source separation (HPSS).

Don't care about the details and just want the demo or the results? Click here to go to the demonstration, or here to go to the results!

Proposed method

Our proposed method for HPSS is based on two of our previous works. One is the MaD TwinNet and the other is the Phase Unwrapping (PU) phase recovery algorithm.

For convenience, below you can find a brief introduction to the above-mentioned methods. Note that we offer code for both of our methods, and pre-trained weights (where applicable), in order to help reproducibility.

So, feel free to use our methods, visit, star, and clone our GitHub repositories, and enjoy separating sources!

Below you can see an illustration of our proposed method for HPSS.

MaD TwinNet

MaD TwinNet is based on the Masker-Denoiser architecture, augmented with the Twin Network. Thus, "MaD" comes from "Masker-Denoiser" and "TwinNet" from the Twin Network. The role of MaD TwinNet in this work is to perform the separation of the percussive and the harmonic components. For a general presentation of the MaD TwinNet, you can check the corresponding paper and demo.

The Masker is the first component of MaD TwinNet and accepts as an input the magnitude spectrogram of the mixture. Then, the Masker predicts and applies a time-frequency mask to its input and outputs a first estimate of the magnitude spectrogram of the percussive components. This estimate of the percussive components is then given as an input to the Denoiser.
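A minimal sketch of the "apply a time-frequency mask" step described above. In MaD TwinNet the mask is predicted by a neural network; here the mask values, the toy spectrogram, and the helper name `apply_mask` are made-up assumptions, used only to show the element-wise operation.

```python
def apply_mask(magnitude, mask):
    """Multiply a magnitude spectrogram by a mask, bin by bin.

    Both arguments are lists of time frames, each frame a list of
    frequency-bin values, and they must have the same shape.
    """
    if len(magnitude) != len(mask):
        raise ValueError("magnitude and mask need the same number of frames")
    masked = []
    for mag_frame, mask_frame in zip(magnitude, mask):
        if len(mag_frame) != len(mask_frame):
            raise ValueError("magnitude and mask need the same number of bins")
        masked.append([m * g for m, g in zip(mag_frame, mask_frame)])
    return masked


# Toy example: 2 time frames x 3 frequency bins (hypothetical values).
mixture_mag = [[1.0, 2.0, 4.0], [0.5, 3.0, 2.0]]
percussive_mask = [[0.5, 0.0, 0.25], [1.0, 0.5, 0.0]]  # stands in for the predicted mask

percussive_first_estimate = apply_mask(mixture_mag, percussive_mask)
print(percussive_first_estimate)
```

A mask value close to 1 keeps the bin for the percussive estimate, and a value close to 0 removes it.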

The Denoiser predicts and applies a time-frequency denoising filter to the estimated percussive components. This filter aims at removing interferences, artifacts, and (in general) any other noise introduced by the separation process from the Masker.

After the application of the denoising filter by the Denoiser, the now cleaned estimate of the magnitude spectrogram of the percussive components can be used to estimate the harmonic components. This results in separated percussive and harmonic components.
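The two steps above can be sketched as follows. The text does not state how the harmonic components are obtained from the cleaned percussive estimate, so the magnitude subtraction (clipped at zero) used here is an assumption for illustration, as are the function name `denoise_and_split` and all numeric values.

```python
def denoise_and_split(mixture_mag, percussive_first, denoise_filter):
    """Apply a denoising filter bin by bin, then take the residual as harmonic.

    All inputs are lists of frames (lists of frequency-bin values) with the
    same shape. The residual rule is an assumption, not the published method.
    """
    percussive = [
        [p * d for p, d in zip(p_frame, d_frame)]
        for p_frame, d_frame in zip(percussive_first, denoise_filter)
    ]
    harmonic = [
        [max(x - p, 0.0) for x, p in zip(x_frame, p_frame)]
        for x_frame, p_frame in zip(mixture_mag, percussive)
    ]
    return percussive, harmonic


# Toy values, 2 frames x 2 bins (hypothetical).
mixture = [[1.0, 2.0], [4.0, 3.0]]
first_estimate = [[0.5, 2.0], [1.0, 3.0]]   # output of the masking step
filter_values = [[1.0, 0.5], [0.5, 1.0]]    # stands in for the predicted filter

percussive_clean, harmonic_est = denoise_and_split(mixture, first_estimate, filter_values)
print(percussive_clean, harmonic_est)
```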

The estimated harmonic components are given as an input to the PU algorithm, which enhances them further by applying improved phase recovery techniques.

Illustration of the MaD TwinNet for HPSS

Phase recovery

The most common approach when separating music signals by employing magnitude spectrograms is to use the phase of the mixture. This approach is equivalent to assuming that each time-frequency bin of the short-time Fourier transform (STFT) contains information from only one source. In a realistic scenario, such as the harmonic/percussive case, this assumption no longer holds, since the sources strongly overlap in time and frequency.
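To make the limitation concrete, here is a small sketch for a single STFT bin in which both sources are active. Even with the true magnitudes, pairing them with the mixture phase gives estimates that do not add up to the mixture. The bin values and the helper name `with_mixture_phase` are made-up assumptions for illustration.

```python
import cmath
import math


def with_mixture_phase(source_magnitude, mixture_bin):
    """Pair an estimated magnitude with the phase of the mixture STFT bin."""
    return cmath.rect(source_magnitude, cmath.phase(mixture_bin))


# One bin where both sources are active with different phases (toy values).
true_harmonic = cmath.rect(1.0, 0.0)
true_percussive = cmath.rect(1.0, math.pi / 2)
mixture_bin = true_harmonic + true_percussive

# Use the true magnitudes, but the phase of the mixture.
harmonic_est = with_mixture_phase(abs(true_harmonic), mixture_bin)
percussive_est = with_mixture_phase(abs(true_percussive), mixture_bin)

# How far the sum of the estimates is from the mixture.
mixing_error = abs(mixture_bin - (harmonic_est + percussive_est))
# How far the estimated harmonic phase is from the true one.
harmonic_phase_error = abs(cmath.phase(harmonic_est) - cmath.phase(true_harmonic))
print(mixing_error, harmonic_phase_error)
```

If only one source were active in the bin, both errors would be zero, which is the single-source assumption mentioned above.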

The PU algorithm consists in predicting the phase of the harmonic source by using a sinusoidal model. Then, starting from this initial estimate, an iterative procedure is applied to minimize the mixing error and yield the final source estimates. We applied the PU algorithm to the predictions of the harmonic components, in order to reduce the interference from the percussive sources.

The iterative process is illustrated in the image below, and more details can be found on the corresponding website.
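A minimal sketch of phase prediction with a sinusoidal model, assuming a single stationary sinusoid of known frequency: between two consecutive STFT frames, its phase advances by 2π · hop · frequency / sample rate. The function names and parameter values below are hypothetical, and the full PU algorithm involves more than this (for example, the frequencies have to be estimated per frame).

```python
import math


def predict_phases(initial_phase, freq_hz, hop, sample_rate, n_frames):
    """Predict the phase of a stationary sinusoid at each frame.

    The phase grows linearly over frames: each hop of `hop` samples adds
    2 * pi * hop * freq_hz / sample_rate radians.
    """
    step = 2.0 * math.pi * hop * freq_hz / sample_rate
    phases = [initial_phase]
    for _ in range(n_frames - 1):
        phases.append(phases[-1] + step)
    return phases


def wrap(phase):
    """Map a phase to its principal value in [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


# Hypothetical settings: a 440 Hz partial, 44.1 kHz audio, hop of 256 samples.
predicted = predict_phases(0.0, 440.0, 256, 44100, 4)
wrapped = [wrap(p) for p in predicted]
print(wrapped)
```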

The iterative scheme of PU-Iter when there are 2 complex numbers to be estimated.
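The idea of such an iterative scheme can be sketched for one time-frequency bin with two sources. This is an illustration under assumptions and not the published implementation: each iteration spreads the mixing error over the two estimates and then restores the known magnitudes. The power-based weights, the function name `distribute_mixing_error`, and all numeric values are assumptions.

```python
import cmath
import math


def distribute_mixing_error(mixture_bin, magnitudes, initial_phases, n_iter=50):
    """Refine source phases in one bin while keeping the magnitudes fixed."""
    total_power = sum(v * v for v in magnitudes)
    if total_power == 0.0:
        return [0j for _ in magnitudes]
    # Assumed weighting: each source takes a share of the error that is
    # proportional to its power, so the shares sum to one.
    weights = [v * v / total_power for v in magnitudes]
    estimates = [cmath.rect(v, p) for v, p in zip(magnitudes, initial_phases)]
    for _ in range(n_iter):
        error = mixture_bin - sum(estimates)
        moved = [y + w * error for y, w in zip(estimates, weights)]
        # Keep the new phase, restore the known magnitude.
        estimates = [cmath.rect(v, cmath.phase(u)) for v, u in zip(magnitudes, moved)]
    return estimates


# Toy bin: true phases are 0 and pi/2, initial guesses are slightly off.
mixture_bin = cmath.rect(1.0, 0.0) + cmath.rect(1.0, math.pi / 2)
magnitudes = [1.0, 1.0]
initial_phases = [0.2, 1.3]

initial = [cmath.rect(v, p) for v, p in zip(magnitudes, initial_phases)]
error_before = abs(mixture_bin - sum(initial))
refined = distribute_mixing_error(mixture_bin, magnitudes, initial_phases)
error_after = abs(mixture_bin - sum(refined))
print(error_before, error_after)
```

Note that the initial phases matter: if both estimates start with the phase of the mixture, the error points in the same direction as both estimates and the phases never change, which is one reason to start from a model-based phase prediction.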

Demonstration

Below you can actually listen to the performance of our method! We have a set of songs and, for each one, we offer for listening the original mixture (i.e. the song), the original harmonic and percussive content, and the harmonic and percussive content as separated by each method in the comparison (KAM, MaD TwinNet with the mixture phase, and MaD TwinNet with PU).

We have resulting audio from two different settings. These settings correspond to different sets of hyper-parameters for MaD TwinNet.

In Setting 1 we used the hyper-parameters as they are defined in the corresponding paper of MaD TwinNet.
In Setting 2 we altered them in order to match the settings where the PU algorithm performs better.

You can find more information on the exact hyper-parameters in our paper!

It must be mentioned that we did not apply any kind of extra post-processing to the files. You will hear the actual, unprocessed output of our method.

Original mixture

Original harmonic content

Original percussive content

Song information

Artist	Title	Genre
Signe Jakobsen	What Have You Done To Me	Rock Singer-Songwriter

Predicted content - Setting 1

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Predicted content - Setting 2

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Original mixture

Original harmonic content

Original percussive content

Song information

Artist	Title	Genre
Fergessen	Back From The Start	Melodic Indie Rock

Predicted content - Setting 1

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Predicted content - Setting 2

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Original mixture

Original harmonic content

Original percussive content

Song information

Artist	Title	Genre
Sambasevam Shanmugam	Kaathaadi	Bollywood

Predicted content - Setting 1

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Predicted content - Setting 2

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Original mixture

Original harmonic content

Original percussive content

Song information

Artist	Title	Genre
James Elder & Mark M Thompson	The English Actor	Indie Pop

Predicted content - Setting 1

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Predicted content - Setting 2

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Original mixture

Original harmonic content

Original percussive content

Song information

Artist	Title	Genre
Leaf	Come around	Atmospheric Indie Pop

Predicted content - Setting 1

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Predicted content - Setting 2

Harmonic content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Percussive content

KAM

MaDTwinNet & mix phase

MaDTwinNet & PU

Data and objective results

In other words: what data did our method learn from, what data was it tested on, and how well did it perform from an objective perspective?

To benchmark our complete method (i.e. MaD TwinNet plus the PU algorithm), we compared the obtained results against a typical method (kernel additive model, KAM) and against MaD TwinNet using the phase of the mixture.

You can see the information about the data in the "Dataset" section and information about the obtained objective results in the "Objective results" section.

Dataset

In order to train our method, we used the development subset of the Demixing Secrets Dataset (DSD), which consists of 50 mixtures with their corresponding sources, plus music stems from MedleyDB.

For testing our method, we used the testing subset of the DSD, consisting of 50 mixtures and their corresponding sources.

Objective results

We objectively evaluated our method using the signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifacts ratio (SAR). The results can be seen in the table below.
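For intuition about what such a ratio measures, here is a simplified sketch of a distortion ratio: the energy of the reference signal over the energy of the estimation error, in decibels. This is a simplification and an assumption on our part; evaluation toolkits for SDR, SIR, and SAR first decompose the error into interference and artifact terms, which this sketch does not do. The function name `simple_sdr` and the signals are hypothetical.

```python
import math


def simple_sdr(reference, estimate):
    """Energy ratio (in dB) between a reference and its estimation error."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate need the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference signal is silent")
    if error_energy == 0.0:
        return math.inf  # perfect estimate
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy signals: the estimate is the reference scaled by 0.9.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9, 0.0, -0.9, 0.0]
sdr_db = simple_sdr(reference, estimate)
print(round(sdr_db, 2))
```

Higher values mean that the estimate is closer to the reference.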

The objective evaluation results of our method for Setting 1 and Setting 2, and for the percussive components, the harmonic components, and on average.

Setting	Method	Percussive SDR	Percussive SIR	Percussive SAR	Harmonic SDR	Harmonic SIR	Harmonic SAR	Average SDR	Average SIR	Average SAR
Setting 1	KAM	01.42	00.44	03.76	06.60	06.71	17.66	04.01	03.57	10.71
Setting 1	MaDTwinNet & mix phase	03.35	04.65	06.10	08.62	14.22	10.75	05.99	09.44	08.43
Setting 1	MaDTwinNet & PU	03.35	04.66	06.08	08.58	14.45	10.59	05.97	09.55	08.34
Setting 2	KAM	00.98	05.03	-1.17	06.35	06.58	18.51	03.66	05.80	08.67
Setting 2	MaDTwinNet & mix phase	03.60	04.73	06.07	08.70	12.84	11.78	06.15	08.79	08.92
Setting 2	MaDTwinNet & PU	03.59	04.76	06.00	08.69	13.11	11.57	06.14	08.94	08.78
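As a reading aid for the "Average" columns, the short check below recomputes the mean of the percussive and harmonic values for every row of the table. The averages in the table appear consistent with this mean up to rounding of the displayed digits; that interpretation is an assumption, since the text does not state how the averages were computed. The values are copied from the table, and the names are hypothetical.

```python
# (percussive, harmonic, average) triplets of (SDR, SIR, SAR), from the table.
rows = {
    ("Setting 1", "KAM"): ((1.42, 0.44, 3.76), (6.60, 6.71, 17.66), (4.01, 3.57, 10.71)),
    ("Setting 1", "MaDTwinNet & mix phase"): ((3.35, 4.65, 6.10), (8.62, 14.22, 10.75), (5.99, 9.44, 8.43)),
    ("Setting 1", "MaDTwinNet & PU"): ((3.35, 4.66, 6.08), (8.58, 14.45, 10.59), (5.97, 9.55, 8.34)),
    ("Setting 2", "KAM"): ((0.98, 5.03, -1.17), (6.35, 6.58, 18.51), (3.66, 5.80, 8.67)),
    ("Setting 2", "MaDTwinNet & mix phase"): ((3.60, 4.73, 6.07), (8.70, 12.84, 11.78), (6.15, 8.79, 8.92)),
    ("Setting 2", "MaDTwinNet & PU"): ((3.59, 4.76, 6.00), (8.69, 13.11, 11.57), (6.14, 8.94, 8.78)),
}


def mean_of_components(percussive, harmonic):
    """Mean of the percussive and harmonic value, metric by metric."""
    return tuple((p + h) / 2.0 for p, h in zip(percussive, harmonic))


# Largest gap between the recomputed mean and the reported average.
largest_gap = max(
    abs(mean - reported)
    for percussive, harmonic, average in rows.values()
    for mean, reported in zip(mean_of_components(percussive, harmonic), average)
)
print(largest_gap)
```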

Acknowledgements

We would like to kindly acknowledge all those who supported and helped us in this work.

P. Magron is supported by the Academy of Finland, project no. 290190.
S.-I. Mimilakis is supported by the European Union's H2020 Framework Programme (H2020-MSCA-ITN-2014) under grant agreement no. 642685 MacSeNet.
This research was partly funded by the European Research Council under the European Union’s H2020 Framework Programme through ERC Grant Agreement 637422 EVERYSOUND.
K. Drossos, P. Magron, and T. Virtanen wish to acknowledge the CSC-IT Center for Science, Finland, for computational resources.
Part of the computations leading to these results was performed on a TITAN-X GPU donated by NVIDIA to K. Drossos.

Harmonic/Percussive Separation On-line Demo
