Speech and Cognition Research Group
Unit of Computing Sciences,
Faculty of Information Technology and Communication Sciences,
Tampere University, Finland
Miscellaneous resources (data, scripts, etc.)
Speech processing textbook by Bäckström, Räsänen, Zewoudie, Zarazaga & Das. An introductory, open-access, wiki-based textbook for Master's-level speech processing. New contributions are welcome from anyone.
Probabilistic dynamic time-warping (PDTW) algorithm for unsupervised discovery of recurring patterns from multivariate time-series data such as speech features.
Winner of the Zero Resource Speech Challenge 2020 speech discovery task at Interspeech-2020.
Automatic LInguistic Unit Count Estimator (ALICE) tool for automatic analysis of children's linguistic exposure from child-centered daylong audio recordings (Räsänen et al., 2020, Behavior Research Methods).
SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech. A tool for language-independent syllable count estimation (Seshadri & Räsänen, IEEE Signal Processing Letters, in press).
Provides state-of-the-art performance out of the box, and can also be adapted to new datasets and languages if syllable counts of the training signals are available. Runs on Python with a TensorFlow backend.
PiENet: A noise-robust neural network F0 estimator for speech (Airaksinen, Juvela, Alku & Räsänen, Proc. ICASSP-2019).
High-performance F0 estimation from clean and noisy recordings. Please see the paper for more information. Runs on Python with a TensorFlow backend.
Word count estimation (WCE) tools for child-centered daylong recordings, as described in Räsänen et al. (submitted): "Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech". Includes MATLAB/Python scripts and a corresponding Linux standalone executable (for MATLAB MCR).
ACLEW Diarization Virtual Machine (DiViMe): A Linux virtual machine (in development) that will contain a pre-installed set of tools for the automatic analysis of child-centered daylong recordings. Currently includes a number of speech activity detectors, diarization tools (broad class + normal), and a tool for automatic word count estimation (see above). Obsolete! Use ALICE (above) instead.
TensorFlow (Python) implementation of CycleGANs with a Convolutional Neural Network (CNN) model with gated activations, residual connections, dilations and PostNets (Seshadri, Juvela, Yamagishi, Räsänen & Alku: "Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion", submitted to ICASSP-2019).
MATLAB scripts for Bayesian Gaussian mixture model (BGMM) regression, used in our paper: Lopez et al.: "Speaking style conversion from normal to Lombard speech using a glottal vocoder and Bayesian GMMs" (Proc. Interspeech 2017).
Sonority-envelope based algorithm for automatic syllabification of speech (from Räsänen, Doyle & Frank, Cognition, 2018).
A Python re-implementation of the algorithm by Adriana Stan is also available.
MATLAB toolbox for approximate variational inference of Dirichlet process and Pitman-Yor process based Bayesian mixture models with Gaussian or von Mises-Fisher mixture components. Used in: Seshadri S., Remes U. & Räsänen O. "Dirichlet process mixture models for clustering i-vector data" (2017) and in "Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing" (2017).
Syllable-based algorithms for Zero Resource Speech Processing. Unsupervised word discovery code for MATLAB from the paper by Räsänen, Doyle & Frank (Proc. Interspeech-2015), written as part of the Zero Resource Speech Processing Challenge.
Feature selection algorithms. MATLAB implementations of the mutual information (MI), statistical dependency (SD), and random subset feature selection (RSFS) algorithms presented in Pohjalainen, Räsänen & Kadioglu (Comp. Speech and Language, 2015).
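For readers unfamiliar with the mutual information (MI) criterion named in the feature selection entry above, the following is a minimal Python sketch of MI-based feature ranking. It is not the MATLAB implementation from the toolbox and may differ from the method in the paper: the function names (`mutual_information`, `quantize`, `rank_features_by_mi`), the uniform quantization, and the default of 8 bins are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


def quantize(values, n_bins=8):
    """Map continuous values to bin indices 0..n_bins-1 (uniform bins; an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_features_by_mi(rows, labels, n_bins=8):
    """Score each feature column by its MI with the class labels.

    rows   -- list of feature vectors (one list of floats per sample)
    labels -- list of discrete class labels, one per sample
    Returns (feature indices sorted from most to least informative, scores).
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = quantize([row[j] for row in rows], n_bins)
        scores.append(mutual_information(column, labels))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data: feature 0 copies the label, feature 1 is unrelated to it.
    labels = [i % 2 for i in range(100)]
    rows = [[float(lab), float(i % 5)] for i, lab in enumerate(labels)]
    order, scores = rank_features_by_mi(rows, labels)
    print("ranking:", order)
    print("scores (bits):", scores)
```

Ranking features one at a time like this ignores redundancy between features; it only illustrates the scoring idea behind an MI criterion.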
Contact: firstname.surname@tuni.fi