Misc resources (data, scripts etc.)

  • Speech processing textbook by Bäckström, Räsänen, Zewoudie, Zarazaga & Das. An introductory, open-access, wiki-based textbook for Master's-level speech processing. New contributions are welcome, and anyone can contribute.

  • Probabilistic dynamic time-warping (PDTW) algorithm for unsupervised discovery of recurring patterns from multivariate time-series data such as speech features. Winner of the Zero Resource Speech Challenge 2020 speech discovery task at Interspeech-2020.

  • Automatic LInguistic Unit Count Estimator (ALICE) tool for automatic analysis of children's linguistic exposure from child-centered daylong audio recordings (Räsänen et al., 2020, Behavior Research Methods).

  • SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech. A tool for language-independent syllable count estimation (Seshadri & Räsänen, IEEE Signal Processing Letters, in press). Supports adaptation to new datasets and languages if syllable counts of the training signals are available, but also provides state-of-the-art performance out of the box. Runs on Python with a TensorFlow backend.

  • PiENet: A noise-robust neural network F0 estimator for speech (Airaksinen, Juvela, Alku & Räsänen, Proc. ICASSP-2019). High-performance F0 estimation from clean and noisy recordings. Please see the paper for more information. Runs on Python with a TensorFlow backend.

  • Word count estimation (WCE) tools for child-centered daylong recordings, as described in Räsänen et al. (submitted): "Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech". Includes MATLAB/Python scripts and a corresponding Linux standalone executable (for MATLAB MCR).

  • ACLEW Diarization Virtual Machine (DiViMe): A Linux virtual machine (in development) that will contain a pre-installed set of tools for the automatic analysis of child-centered daylong recordings. Currently includes a number of speech activity detectors, diarization tools (broad class + normal), and a tool for automatic word count estimation (see above). Obsolete! Use ALICE (above) instead.

  • TensorFlow (Python) implementation of CycleGANs with a Convolutional Neural Network (CNN) model with Gated activations, Residual connections, dilations and PostNets (Seshadri, Juvela, Yamagishi, Räsänen & Alku: "Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion", submitted to ICASSP-2019).

  • MATLAB scripts for Bayesian Gaussian mixture model (BGMM) regression, used in our paper: Lopez et al.: "Speaking style conversion from normal to Lombard speech using a glottal vocoder and Bayesian GMMs" (Proc. Interspeech 2017).

  • Sonority-envelope based algorithm for automatic syllabification of speech (from Räsänen, Doyle & Frank, Cognition, 2018). See here for Adriana Stan's re-implementation of the algorithm in Python.

  • MATLAB toolbox for approximate variational inference of Dirichlet and Pitman-Yor process-based Bayesian mixture models with Gaussian or von Mises-Fisher mixture components. Used in: Seshadri S., Remes U. & Räsänen O. "Dirichlet process mixture models for clustering i-vector data" (2017) and in "Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing" (2017).

  • Syllable-based algorithms for Zero Resource Speech Processing. Unsupervised word discovery code for MATLAB from the paper by Räsänen, Doyle & Frank (Proc. Interspeech-2015), written as part of the Zero Resource Speech Processing Challenge.

  • Feature selection algorithms. MATLAB implementations of the mutual information (MI), statistical dependency (SD), and random subset feature selection (RSFS) algorithms presented in Pohjalainen, Räsänen & Kadioglu (Comp. Speech and Language, 2015).


  • Contact: firstname.surname@tuni.fi