Speech and Cognition research group

Unit of Computing Sciences,
Faculty of Information Technology and Communication Sciences,
Tampere University,
Finland

News

September 2025
How to get enough representative yet controllable speech data to train computational models of infant language learning? Check out our solution for this at https://doi.org/10.3758/s13428-025-02772-6 published in Behavior Research Methods.

June 2025
Did you know that substantial proportions of child-centered long-form audio recordings can be transcribed automatically? We now have a solution for this long-standing technical problem: https://arxiv.org/abs/2506.11747.

April 2025
Are you looking for a self-supervised learning algorithm for time-series data that works out-of-the-box on different data types, including small datasets, and doesn't suffer from representation collapse? If so, check this out: https://doi.org/10.1109/ACCESS.2025.3556957.

March 2025
We've just built and published a simulation setup to model statistical language learning from speech input received before (and after) birth. Check out our new paper published in Cognition: https://doi.org/10.1016/j.cognition.2024.106044.

January 2025
Can infants use statistical learning to learn phonemes, words, and word meanings merely from realistic amounts of the speech and visual input they perceive during the first year of life, and without any linguistic priors? Our new model suggests that this is indeed the case: https://doi.org/10.1016/j.specom.2024.103169.

How do human children learn to understand and produce speech without explicit teaching? What aspects of language development are built into our brains and bodies, and how much is actually learnable from the environment using generic cognitive skills? How can we make machines use and understand language in the way humans do, not necessarily through textual representations, but by truly understanding and communicating meanings in the signal?

These are some of the key questions that we work on in the Speech and Cognition research group. Our primary research method is computational modeling, in which we apply signal processing and machine learning to (potentially large-scale) language and multimodal data in order to address these questions. In addition, we work on various other topics related to speech technology and signal processing, such as automatic detection of neurophysiological problems in infants and technological tools for large-scale audio and language data analysis.

Selected publications

Räsänen, O. & Kocharov, D. (in press). A pipeline for stochastic and controlled generation of realistic language input for simulating infant language acquisition. Behavior Research Methods. https://doi.org/10.3758/s13428-025-02772-6.

Khorrami, K. & Räsänen, O. (2025). A model of early word acquisition based on realistic-scale audiovisual naming events. Speech Communication, 167, 103169, https://doi.org/10.1016/j.specom.2024.103169. Slides of the associated ICIS-2024 presentation here.

Cruz Blandón, M. A., Gonzalez-Gomez, N., Lavechin, M., & Räsänen, O. (2025). Simulating prenatal language exposure in computational models: an exploration study. Cognition, 256, 106044, https://doi.org/10.1016/j.cognition.2024.106044.

Cruz Blandón, M. A., Cristia, A., & Räsänen, O. (2023). Introducing meta-analysis in the evaluation of computational models of infant language development. Cognitive Science, 47, e13307, https://doi.org/10.1111/cogs.13307.

Airaksinen, M., Gallen, A., Kivi, A., Vijayakrishnan, P., Häyrinen, T., Ilen, E., Räsänen, O., Haataja, L., & Vanhatalo, S. (2022). Intelligent wearable allows out-of-the-lab tracking of developing motor abilities in infants. Communications Medicine, 2, 69, https://doi.org/10.1038/s43856-022-00131-6.

Khorrami, K. & Räsänen, O. (2021). Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? – A computational investigation. Language Development Research, https://doi.org/10.34842/w3vw-s845.

Räsänen, O., Seshadri, S., Lavechin, M., Cristia, A., & Casillas, M. (2021). ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings. Behavior Research Methods, 53, 818–835, https://doi.org/10.3758/s13428-020-01460-x (source code).

Räsänen, O., Doyle, G., & Frank, M. C. (2018). Pre-linguistic segmentation of speech into syllable-like units. Cognition, 171, 130–150, https://doi.org/10.1016/j.cognition.2017.11.003 (.pdf).

Räsänen, O., Kakouros, S. & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? — Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206 (.pdf).

Räsänen, O. & Rasilo, H. (2015). A joint model of word segmentation and meaning acquisition through cross-situational learning. Psychological Review, 122(4), 792–829 (.pdf).

Contact: firstname.surname@tuni.fi