Supervised Model Training for Overlapping Sound Events Based on Unsupervised Source Separation
Abstract
Sound event detection is addressed in the presence of overlapping sounds. Unsupervised sound source separation into streams is used as a preprocessing step to minimize the interference of overlapping events. This poses a problem in supervised model training, since it is not known which separated stream contains the targeted sound source. We propose two iterative approaches based on the EM algorithm to select the stream most likely to contain the target sound: one always selects the most likely stream, and the other gradually eliminates the most unlikely streams from the training. The approaches were evaluated on a database containing recordings from various contexts, against a baseline system trained without stream selection. Both proposed approaches increased the detection accuracy by 8 percentage units.
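The two stream selection strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the acoustic model or features, so a single diagonal Gaussian stands in for the event model, and all names (`fit_gaussian`, `avg_loglik`, `train_with_stream_selection`, the `eliminate` flag) are hypothetical. Each training instance is assumed to arrive as a list of separated streams, each a NumPy array of feature frames.

```python
import numpy as np

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (a stand-in for the event model) to (n_frames, n_dims) data."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small variance floor for numerical stability
    return mean, var

def avg_loglik(frames, model):
    """Average per-frame log-likelihood of the frames under the diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def train_with_stream_selection(examples, n_iter=5, eliminate=False):
    """Alternate between model training and stream selection.

    examples[i][s] holds the feature frames of separated stream s of
    training instance i. With eliminate=False, only the most likely stream
    of each instance is kept after every iteration. With eliminate=True,
    the least likely remaining stream of each instance is dropped per
    iteration, until one stream is left.
    """
    # Assumption: training starts from all separated streams.
    active = [list(range(len(streams))) for streams in examples]
    model = None
    for _ in range(n_iter):
        # Training step: refit the model on the currently selected streams.
        data = np.vstack([examples[i][s]
                          for i, idx in enumerate(active) for s in idx])
        model = fit_gaussian(data)
        # Selection step: score streams under the current model.
        for i, streams in enumerate(examples):
            if eliminate:
                candidates = active[i]
                if len(candidates) > 1:
                    scores = [avg_loglik(streams[s], model) for s in candidates]
                    worst = candidates[int(np.argmin(scores))]
                    active[i] = [s for s in candidates if s != worst]
            else:
                scores = [avg_loglik(frames, model) for frames in streams]
                active[i] = [int(np.argmax(scores))]
    return model, active
```

In this sketch the returned `active` list reflects the selection made after the last model fit, so a final refit on those streams would be a natural extra step. The hard-selection variant commits to one stream immediately, while the elimination variant keeps more streams in training during the early iterations, when the model is still unreliable.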
Year: 2013
Book title: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Address: Vancouver, Canada