Analysis of Duration Prediction Accuracy in HMM-Based Speech Synthesis
Silén, Hanna; Helander, Elina; Nurminen, Jani; Gabbouj, Moncef
Abstract
Appropriate phoneme durations are essential for high-quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. The use of rich context features enables synthesis without high-level linguistic knowledge. In this paper, we analyze the accuracy of state duration modeling against phone duration modeling using simple prediction techniques. In addition to decision tree-based techniques, regression techniques for rich context features with high collinearity are discussed and evaluated.
- Year: 2010
- Book title: The Fifth International Conference on Speech Prosody
- Month: May