Online Blind Speech Separation using Multiple Acoustic Speaker Tracking and Time-Frequency Masking

Pertilä, Pasi
Abstract

Separating the speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. Real-time applications require online solutions that separate the signals as they are observed, in contrast to offline solutions that separate them after observation. A talker may also move, which the separation system should take into account. This work proposes an online method for speaker detection, speaker direction tracking, and speech separation. The separation is based on multiple acoustic source tracking (MAST) using Bayesian filtering and time–frequency masking. Measurements from three room environments with varying amounts of reverberation, made with two different microphone array designs, are used to evaluate the capability of the method to separate up to four simultaneously active speakers. Separation of moving talkers is also considered. Results are compared to two reference methods: ideal binary masking (IBM) and oracle tracking (O-T). Simulations are used to evaluate the effect of the number of microphones and their spacing.
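
As a rough illustration of the ideal binary masking (IBM) reference named in the abstract, the sketch below builds a 0/1 time-frequency mask from known target and interference magnitudes and applies it to a mixture. It is a minimal sketch and not the paper's implementation: the function names `ideal_binary_mask` and `apply_mask`, the 0 dB threshold, the `eps` guard, and the toy magnitude values are all assumptions made for illustration. IBM needs the separate source signals in advance, which is why it serves only as a reference and not as a blind method.

```python
import math


def ideal_binary_mask(target_mag, interferer_mag, threshold_db=0.0):
    """Return a 0/1 mask that is 1 where the target dominates the interference.

    Both inputs are 2-D lists (frequency bins x time frames) of non-negative
    STFT magnitudes with identical shapes. The 0 dB default is an assumption.
    """
    eps = 1e-12  # guards against log(0) and division by zero in silent cells
    mask = []
    for t_row, i_row in zip(target_mag, interferer_mag):
        row = []
        for t, i in zip(t_row, i_row):
            local_snr_db = 20.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_mag, mask):
    """Multiply each time-frequency cell of the mixture by its mask value."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture_mag, mask)
    ]


# Toy 2 x 2 example (2 frequency bins, 2 frames); the values are made up.
target = [[1.0, 0.1], [0.2, 2.0]]
interferer = [[0.5, 0.4], [0.3, 0.1]]
mixture = [[1.5, 0.5], [0.5, 2.1]]

mask = ideal_binary_mask(target, interferer)
separated = apply_mask(mixture, mask)
print(mask)       # [[1, 0], [0, 1]]
print(separated)  # [[1.5, 0.0], [0.0, 2.1]]
```

In an online blind setting the oracle magnitudes are unavailable, so a mask would have to be estimated from observed cues such as tracked speaker directions.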

Keywords

Blind source separation; Acoustic source tracking; Particle filtering; Time-frequency masking; Microphone arrays; Spatial sound source separation

Year: 2013
Journal: Computer Speech & Language
Volume: 27
Number: 3
Pages: 683–702
Month: May
Note: DOI: https://dx.doi.org/10.1016/j.csl.2012.08.003