Sound event detection using spatial features and convolutional recurrent neural network


Adavanne, Sharath; Pertila, Pasi; Virtanen, Tuomas

Abstract

This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when the features are presented as separate layers of a volume. Using the proposed spatial features instead of monaural features on the same network gives an absolute F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and 2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.
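
The difference between the two input layouts contrasted in the abstract can be illustrated with a small NumPy sketch. This is only an illustration of the tensor shapes involved, not the authors' implementation: the frame count, band count, channel count, and random feature values are all assumptions made for the example.

```python
import numpy as np

# Hypothetical sizes chosen for illustration; they are not taken from the paper.
n_frames, n_bands, n_channels = 256, 40, 2

rng = np.random.default_rng(0)
# One (frames x bands) feature matrix per audio channel, filled with random
# placeholder values standing in for real per-channel features.
per_channel = [rng.standard_normal((n_frames, n_bands)) for _ in range(n_channels)]

# Layout A: concatenate the channels along the feature axis, giving a single
# feature vector of length n_channels * n_bands for each frame.
concatenated = np.concatenate(per_channel, axis=1)

# Layout B: stack the channels as separate layers of a volume, so that a
# convolutional layer sees each channel as its own input layer.
volume = np.stack(per_channel, axis=0)

print(concatenated.shape)  # (frames, channels * bands)
print(volume.shape)        # (channels, frames, bands)
```

Both layouts hold the same numbers. Only the arrangement changes: in the volume layout the same time-frequency position in every channel is aligned across layers, whereas concatenation places the channels side by side along the feature axis.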

Keywords

Sound event detection; multichannel audio; spatial features; convolutional recurrent neural network

Year: 2017
Book title: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017)