Non-negative matrix deconvolution in noise robust speech recognition
Abstract
High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictionary is required to cover sufficiently many spectral patterns and their temporal alignments within the windows. We propose a recognition system based on a shift-invariant convolutive model, in which exemplar activations at all possible temporal positions jointly reconstruct an utterance. Recognition rates are evaluated on the AURORA-2 database, which contains spoken digits at noise levels ranging from clean speech down to -5 dB SNR. The results are superior to those obtained when the activations are found independently for each overlapping window.
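To make the convolutive model concrete, the following is a minimal sketch of non-negative matrix deconvolution with a fixed exemplar dictionary. It is not necessarily the algorithm used in the paper: it assumes a KL-divergence cost with standard multiplicative updates, omits any sparsity penalty, and all names (`reconstruct`, `nmd_activations`, the array layout of `A`) are illustrative choices. Each exemplar is a window of `T` spectral frames, and an activation at frame `n` places that whole window starting at `n`, so activations at all positions contribute jointly to the reconstruction of the utterance.

```python
import numpy as np

def shift_right(M, t):
    """Shift columns of M right by t frames, zero-filling on the left."""
    if t == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, t:] = M[:, :-t]
    return out

def shift_left(M, t):
    """Shift columns of M left by t frames, zero-filling on the right."""
    if t == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, :-t] = M[:, t:]
    return out

def reconstruct(A, X):
    """Convolutive model: sum over t of A[t] @ (X shifted right by t).

    A: (T, B, K) array, frame t of each of K exemplars with B spectral bands.
    X: (K, N) array of non-negative activations over N utterance frames.
    """
    return sum(A[t] @ shift_right(X, t) for t in range(A.shape[0]))

def kl_divergence(Y, L, eps=1e-12):
    """Generalised KL divergence between observation Y and model L."""
    return float(np.sum(Y * np.log((Y + eps) / (L + eps)) - Y + L))

def nmd_activations(Y, A, n_iter=100, eps=1e-12, seed=0):
    """Estimate activations X for a fixed dictionary A (assumed update rule)."""
    T, B, K = A.shape
    N = Y.shape[1]
    rng = np.random.default_rng(seed)
    X = rng.random((K, N)) + 0.1          # strictly positive initialisation
    ones = np.ones_like(Y)
    # The denominator does not depend on X, so it is computed once.
    den = sum(A[t].T @ shift_left(ones, t) for t in range(T)) + eps
    for _ in range(n_iter):
        R = Y / (reconstruct(A, X) + eps)  # element-wise ratio to the model
        num = sum(A[t].T @ shift_left(R, t) for t in range(T))
        X *= num / den                     # multiplicative update keeps X >= 0
    return X
```

In a windowed baseline, each overlapping window would instead be factorised on its own with the exemplars flattened into vectors; the sketch above differs in that a single activation matrix explains the whole utterance.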
Keywords: non-negative matrix deconvolution; noise robustness; speech recognition
Year: 2011
Book title: Proceedings of the International Conference on Acoustics, Speech and Signal Processing
Address: Prague, Czech Republic
Organization: IEEE Signal Processing Society
Month: May