Speech and Cognition research group

Unit of Computing Sciences,
Faculty of Information Technology and Communication Sciences,
Tampere University,
Finland

News

June 2024
Does self-supervised statistical learning from realistic-scale audiovisual speech input enable bootstrapping of infant language learning? Our new modeling study suggests that's the case: https://arxiv.org/abs/2406.05259.

August 2021
Could infants learn linguistic units as a side product of their developing understanding of multimodal experiences? Check out our new paper in Language Development Research where we explore this idea!

June 2021
Our Bachelor's thesis students did an amazing job implementing a complete virtual avatar capable of spoken conversation with its interlocutors. The thesis on the avatar's ASR module by Noora Pöysti is available here, and the TTS module description by Väino-Waltteri Granat here. See also the thesis on dialogue implementation by Tariq Harb here.

October 2020
Our paper on probabilistic DTW won the spoken term discovery track of the Zero Resource Speech Challenge 2020, held at Interspeech-2020, Shanghai, China!

August 2020
ALICE, an open-source alternative to LENA software for analyzing linguistic content in child-centered daylong audio recordings, is now available here. More information can be found in the paper, available here (Behavior Research Methods, in press).

January 2020
Our paper on tracking of infant posture and spontaneous movement using a smart jumpsuit has been published in Scientific Reports. Open access full-text is available at nature.com.

June 2019
SylNet, our new algorithm for end-to-end automatic syllable count estimation from acoustic speech signals (Seshadri & Räsänen, IEEE Signal Processing Letters, link) is now available for download.

How do human children learn to understand and produce speech without explicit teaching? What aspects of language development are built into our brains and bodies, and how much is actually learnable from the environment using generic cognitive skills? How can we make machines use and understand language in the way humans do, not necessarily through textual representations, but by truly understanding and communicating meanings in the signal?

These are some of the key questions that we work on in the Speech and Cognition research group. Our primary research method is computational modeling that applies signal processing and machine learning to (potentially large-scale) language and multimodal data in order to address these questions. In addition, we work on various other topics related to speech technology and signal processing, such as automatic detection of neurophysiological problems in infants and technological tools for large-scale audio and language data analysis.

Some selected publications

Khorrami, K. & Räsänen, O. (2025). A model of early word acquisition based on realistic-scale audiovisual naming events. Speech Communication, 167, 103169, https://doi.org/10.1016/j.specom.2024.103169. Slides of the associated ICIS-2024 presentation here.

Cruz Blandón, M. A., Gonzalez-Gomez, N., Lavechin, M., & Räsänen, O. (2025). Simulating prenatal language exposure in computational models: an exploration study. Cognition, 256, 106044, https://doi.org/10.1016/j.cognition.2024.106044.

Cruz Blandón, M. A., Cristia, A., & Räsänen, O. (2023). Introducing meta-analysis in the evaluation of computational models of infant language development. Cognitive Science, 47, e13307, https://doi.org/10.1111/cogs.13307.

Airaksinen, M., Gallen, A., Kivi, A., Vijayakrishnan, P., Häyrinen, T., Ilen, E., Räsänen, O., Haataja, L., & Vanhatalo, S. (2022). Intelligent wearable allows out-of-the-lab tracking of developing motor abilities in infants. Communications Medicine, 2, 69, https://doi.org/10.1038/s43856-022-00131-6.

Khorrami, K. & Räsänen, O. (2021). Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? – A computational investigation. Language Development Research, https://doi.org/10.34842/w3vw-s845.

Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A., & Casillas, M. (2021). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods, 53, 818–835, https://doi.org/10.3758/s13428-020-01460-x (source code).

Räsänen, O., Doyle, G., & Frank, M. C. (2018). Pre-linguistic segmentation of speech into syllable-like units. Cognition, 171, 130–150, https://doi.org/10.1016/j.cognition.2017.11.003 (.pdf).

Räsänen, O., Kakouros, S. & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? — Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206 (.pdf).

Räsänen, O. & Rasilo, H. (2015). A joint model of word segmentation and meaning acquisition through cross-situational learning. Psychological Review, 122(4), 792–829 (.pdf).

Contact: firstname.surname@tuni.fi