Compact Long Context Spectral Factorisation Models for Noise Robust Recognition of Medium Vocabulary Speech


Hurmalainen, Antti; Gemmeke, Jort; Virtanen, Tuomas

Abstract

In environments containing multiple non-stationary sound sources, it becomes increasingly difficult to recognise speech from its short-time spectra alone. Long-context speech and noise models, in which phonetic patterns and noise events may span hundreds of milliseconds, have been found beneficial in such separation tasks. Thus far, most work applying non-negative matrix factorisation (NMF) to long-context spectrogram separation has been conducted on small-vocabulary tasks, exploiting large speech and noise dictionaries containing thousands of atoms. In this work we study whether the previously proposed factorisation methods are applicable to more natural speech and limited noise context while keeping the model sizes practically feasible. Results are evaluated on the WSJ0 5k-based 2nd CHiME Challenge Track 2 corpus, where the proposed enhancement framework achieves an approximately 4% absolute improvement in speech recognition rates compared to the baseline.
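The abstract refers to separating speech from noise by factorising long-context spectrogram windows over fixed speech and noise dictionaries. As a rough illustration of that general idea only, the following minimal sketch assumes a generalised Kullback-Leibler cost with an L1 sparsity penalty, multiplicative updates for the activations, and a Wiener-style mask. The function names (`stack_context`, `estimate_activations`, `speech_mask`), the `sparsity` and `iterations` parameters, and the toy dictionaries are hypothetical; this is not the authors' implementation.

```python
"""Hypothetical sketch: sparse NMF activations over a fixed long-context
speech + noise dictionary, followed by a Wiener-style speech mask.
Not the paper's code; cost, penalty and mask are illustrative assumptions."""

from typing import List

Vector = List[float]
Matrix = List[List[float]]  # a list of frames, or a list of atoms (dictionary columns)

EPS = 1e-12  # guards against division by zero


def stack_context(frames: Matrix, context: int) -> Matrix:
    """Concatenate `context` consecutive spectral frames into one window vector."""
    return [
        [v for frame in frames[t:t + context] for v in frame]
        for t in range(len(frames) - context + 1)
    ]


def estimate_activations(x: Vector, atoms: Matrix, sparsity: float = 0.1,
                         iterations: int = 200) -> Vector:
    """Multiplicative updates for h >= 0 minimising
    KL(x || sum_k h_k * a_k) + sparsity * sum_k h_k, with the atoms held fixed."""
    h = [1.0] * len(atoms)
    for _ in range(iterations):
        # Current model reconstruction y = A h (EPS avoids zero denominators).
        y = [sum(h[k] * atoms[k][i] for k in range(len(atoms))) + EPS
             for i in range(len(x))]
        ratio = [x[i] / y[i] for i in range(len(x))]
        # All activations are updated from the same reconstruction.
        for k, atom in enumerate(atoms):
            num = sum(a * r for a, r in zip(atom, ratio))
            den = sum(atom) + sparsity + EPS
            h[k] *= num / den
    return h


def speech_mask(x: Vector, speech_atoms: Matrix, noise_atoms: Matrix,
                **kwargs) -> Vector:
    """Per-element mask: speech model / (speech model + noise model)."""
    h = estimate_activations(x, speech_atoms + noise_atoms, **kwargs)
    hs, hn = h[:len(speech_atoms)], h[len(speech_atoms):]
    s = [sum(w * a[i] for w, a in zip(hs, speech_atoms)) for i in range(len(x))]
    n = [sum(w * a[i] for w, a in zip(hn, noise_atoms)) for i in range(len(x))]
    return [si / (si + ni + EPS) for si, ni in zip(s, n)]


if __name__ == "__main__":
    # Toy 2-bin spectrogram; context of 2 frames gives 4-dimensional windows.
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    windows = stack_context(frames, context=2)
    speech = [[1.0, 0.0, 1.0, 0.0]]  # hypothetical speech atom
    noise = [[0.0, 1.0, 0.0, 1.0]]   # hypothetical noise atom
    print(speech_mask(windows[0], speech, noise))
```

The mask is applied to the noisy window to retain the speech-dominated elements. In the setting the abstract describes, dictionaries may hold thousands of atoms, so a practical version would use vectorised matrix operations rather than these readability-oriented Python loops.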

Keywords

spectral factorisation; speech recognition; noise robustness

Year: 2013
Book title: Proceedings of the 2nd CHiME workshop
Pages: 13-18
Month: June