The TUM+TUT+KUL Approach to the CHiME Challenge 2013: Multi-Stream ASR Exploiting BLSTM Networks and Sparse NMF

Geiger, Jürgen; Weninger, Felix; Hurmalainen, Antti; Gemmeke, Jort; Wöllmer, Martin; Schuller, Björn; Rigoll, Gerhard; Virtanen, Tuomas

We present our joint contribution to the 2nd CHiME Speech Separation and Recognition Challenge. Our system combines speech enhancement by supervised sparse non-negative matrix factorisation (NMF) with a multi-stream speech recognition system. In addition to a conventional MFCC HMM recogniser, predictions from a bidirectional Long Short-Term Memory recurrent neural network (BLSTM-RNN) and from non-negative sparse classification (NSC) are integrated into a triple-stream recogniser. Experiments are carried out on the small-vocabulary and medium-vocabulary recognition tasks of the Challenge. Consistent improvements over the Challenge baselines demonstrate the efficacy of the proposed system, which achieves an average word accuracy of 92.8% in the small-vocabulary task and an average word error rate of 41.42% in the medium-vocabulary task.


Long Short-Term Memory; recurrent neural networks; non-negative matrix factorisation; dynamic Bayesian networks

Book title: Proceedings of the 2nd CHiME Workshop