Comparative study of letter encoding for text-to-phoneme mapping

Bilcu, Enikö Beatrice; Astola, Jaakko; Saarinen, Jukka

Text-to-phoneme mapping is a very important preliminary step in any text-to-speech synthesis system. In this paper, we study the performance of the multilayer perceptron (MLP) neural network on the problem of text-to-phoneme mapping. Specifically, we study the influence of the input letter encoding on the conversion accuracy of such a system. We show that, for large network complexities, the orthogonal binary codes (as introduced in NetTalk) give better performance. On the other hand, in applications that require very small memory load and computational complexity, other compact codes may be more suitable. This study is a first step toward implementing a neural-network-based text-to-phoneme mapping in mobile devices.
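
The contrast between orthogonal and compact letter codes can be illustrated with a minimal sketch. It is not the authors' implementation: the 27-symbol alphabet, the padding symbol "_", the 5-letter context window, and all function names are assumptions chosen for illustration, and the compact code shown (the symbol index in plain binary) is only one possible compact code.

```python
import math
import string

# Assumed alphabet: 26 lowercase letters plus "_" as a padding symbol.
ALPHABET = string.ascii_lowercase + "_"


def orthogonal_code(letter):
    """Orthogonal (one-hot) code: one bit per symbol, exactly one bit set."""
    index = ALPHABET.index(letter)
    return [1 if i == index else 0 for i in range(len(ALPHABET))]


def compact_code(letter):
    """Compact code: the symbol index written in ceil(log2(N)) bits."""
    width = math.ceil(math.log2(len(ALPHABET)))
    index = ALPHABET.index(letter)
    # Most significant bit first.
    return [(index >> bit) & 1 for bit in reversed(range(width))]


def encode_window(window, code):
    """Concatenate per-letter codes into one network input vector."""
    return [bit for letter in window for bit in code(letter)]


window = "_cat_"  # hypothetical 5-letter context window around "a"
print(len(encode_window(window, orthogonal_code)))  # 5 letters x 27 bits
print(len(encode_window(window, compact_code)))     # 5 letters x 5 bits
```

Under these assumptions the orthogonal code needs 135 inputs for the window while the compact code needs 25, which is why the choice of encoding bears directly on memory load and computational complexity.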

Book title:
Proceedings of the 13th European Signal Processing Conference, EUSIPCO
Antalya, Turkey