Low-Latency Sound Source Separation Using Deep Neural Networks

Naithani, Gaurav; Parascandolo, Giambattista; Barker, Tom; Pontoppidan, Niels Henrik; Virtanen, Tuomas

Low-latency sound source separation requires that each incoming frame of audio be processed and output with very little delay. For practical purposes involving human listeners, an algorithmic delay of 20 ms is the upper limit that remains comfortable to the listener. In this paper, we propose a low-latency (algorithmic delay ≤ 20 ms) deep neural network (DNN) based source separation method. The proposed method takes advantage of an extended past context and outputs soft time-frequency masking filters, which are then applied to incoming audio frames, giving better separation performance than a non-negative matrix factorization (NMF) baseline. Acoustic mixtures from five pairs of speakers from the CMU Arctic database were used for the experiments. An average improvement of at least 1 dB in source-to-distortion ratio (SDR) was observed for our DNN-based system over a low-latency NMF baseline across different processing and analysis frame lengths. Incorporating previous temporal context into the DNN inputs yielded significant improvements in SDR for short processing frame lengths.
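
The two ideas named in the abstract, past-context inputs and soft time-frequency masking, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the ratio-style mask, and the toy spectra are assumptions made for illustration, and the DNN itself is replaced by hand-picked source estimates.

```python
import numpy as np


def stack_past_context(frames, n_past):
    """Build causal network inputs from magnitude spectra.

    frames: array of shape (T, F), one spectrum per processing frame.
    Returns shape (T, (n_past + 1) * F): row t holds frames
    t - n_past, ..., t (oldest first), zero-padded before the signal
    starts. No future frames are used, so no look-ahead delay is added.
    """
    n_frames, n_bins = frames.shape
    padded = np.vstack([np.zeros((n_past, n_bins)), frames])
    return np.hstack([padded[i:i + n_frames] for i in range(n_past + 1)])


def soft_mask(est_target, est_interferer, eps=1e-8):
    """Ratio-style soft mask in [0, 1] (assumed form, for illustration)."""
    return est_target / (est_target + est_interferer + eps)


def apply_mask(mixture_spec, mask):
    """Filter the incoming mixture frame with the soft mask, bin by bin."""
    return mask * mixture_spec


# Toy data: 3 frames with 2 frequency bins each.
frames = np.arange(6, dtype=float).reshape(3, 2)
context = stack_past_context(frames, n_past=1)

# Stand-in for the network output: per-bin estimates of the two sources.
mask = soft_mask(np.array([3.0, 1.0]), np.array([1.0, 3.0]))
separated = apply_mask(np.array([4.0, 4.0]), mask)
```

In this sketch the context length `n_past` enlarges only the network input, not the delay, because every stacked frame has already been received when frame t arrives.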


Keywords:
Source separation; Deep neural networks; Low-latency

Book title:
IEEE Global Conference on Signal and Information Processing