Online and personal music collections are nowadays huge, no longer limited by storage space. Such collections have created the need for efficient music information retrieval (MIR) techniques to automatically organize and search through them. Our research team is a world-leading research group in audio-based MIR, in which information about the music is extracted automatically from the audio signals of the music pieces.


Automatic musical instrument recognition

Understanding the timbre of pitched musical instruments and drums is important for automatic music transcription, music information retrieval, and computational auditory scene analysis. The recent worldwide popularization of online music distribution services and portable digital music players has made musical instrument recognition even more important. Besides musical genre, the instrumentation is one of the main criteria by which a certain type of music can be searched for in music databases. Some classical music pieces are even characterized by the instruments used (e.g., piano sonata, string quartet).
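
As a rough sketch of the basic approach, the example below (assuming the librosa and scikit-learn Python libraries, with hypothetical file names and labels) summarizes the timbre of each recording by MFCC statistics and trains a standard classifier on them; real systems use richer feature sets and far more training data.

    import librosa
    import numpy as np
    from sklearn.svm import SVC

    def timbre_features(path):
        # Summarize the timbre of a recording as the per-coefficient
        # mean and standard deviation of its MFCCs.
        y, sr = librosa.load(path, sr=22050, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    # Hypothetical training data: (audio file, instrument label) pairs.
    train = [("violin_01.wav", "violin"), ("piano_01.wav", "piano")]
    X = np.array([timbre_features(f) for f, _ in train])
    labels = [inst for _, inst in train]

    clf = SVC(kernel="rbf").fit(X, labels)
    print(clf.predict([timbre_features("unknown_clip.wav")]))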


Singer identification

Singing is the production of musically relevant sounds with the human voice, and it is employed in most cultures for entertainment or self-expression. The singing voice immediately becomes the main focus of attention when we listen to musical pieces with a vocal part. Singing consists of two main aspects: the melodic (represented by the time-varying pitch) and the verbal (represented by the lyrics). Both the melody and the lyrics allow us to identify a song, while at the same time the singing voice reflects the identity of the singer. The singing voice thus carries a great deal of information, and it can be used for various music information retrieval tasks and other applications.

Most people use the singer's voice as the primary cue for identifying a song. Besides genre, the artist's name (often equivalent to the singer's name) is also a natural criterion for classifying music. A singer identification system would therefore be a useful component of music information retrieval systems. The inherent difficulties lie in the nature of the problem: the voice is usually accompanied by other musical instruments, and even though humans are extremely skillful at recognizing sounds in acoustic mixtures, interfering sounds usually make automatic recognition very difficult.
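
A classic baseline, sketched below under the assumption that the singing is reasonably prominent in the mix, models each singer with a Gaussian mixture over MFCC frames and picks the best-scoring model for a test clip; the file names are hypothetical, and practical systems first attenuate or separate the accompaniment.

    import librosa
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mfcc_frames(path):
        # Frame-level MFCCs, one row per analysis frame.
        y, sr = librosa.load(path, sr=22050, mono=True)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    # Hypothetical training material: one recording per singer.
    models = {}
    for singer, path in [("singer_a", "a_train.wav"),
                         ("singer_b", "b_train.wav")]:
        gmm = GaussianMixture(n_components=8, covariance_type="diag")
        models[singer] = gmm.fit(mfcc_frames(path))

    # Identify: pick the model with the highest average frame log-likelihood.
    test = mfcc_frames("unknown_song.wav")
    print(max(models, key=lambda s: models[s].score(test)))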


Automatic alignment of singing and lyrics

This topic deals with aligning music that contains singing voice and instrumental accompaniment with the corresponding textual lyrics, i.e., finding the temporal relationship between the two inputs. The alignment is based on a phonetic transcription of the lyrics, which is aligned with the phonemes in the singing-voice content of the audio. The alignment can be directly applied in automated karaoke annotation systems, but it also has potential in automatic labeling of singing databases and in keyword spotting for singing database search. The problem can be viewed as an intermediate goal on the way to the significantly harder problem of recognizing lyrics in polyphonic audio.
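
The sketch below illustrates only the matching step with dynamic time warping, assuming we already have a frame-level feature sequence from the audio and a synthetic feature sequence generated from the phonetic transcription (both replaced by random placeholders here); actual systems typically use HMM-based forced alignment of phoneme models instead.

    import numpy as np
    import librosa

    # Placeholders: features extracted from the audio, and features
    # synthesized from the phonetic transcription of the lyrics.
    audio_feats = np.random.rand(13, 500)
    lyric_feats = np.random.rand(13, 120)

    # Accumulated-cost matrix D and warping path wp; each (i, j) pair on
    # the path maps lyric/phoneme frame i to audio frame j.
    D, wp = librosa.sequence.dtw(X=lyric_feats, Y=audio_feats,
                                 metric="euclidean")
    print("alignment path length:", len(wp))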


Lyrics recognition

The transcription of lyrics using a large-vocabulary continuous speech recognizer (LVCSR) is still regarded as a nearly impossible task, for several reasons. First, the performance of automatic speech recognition using an LVCSR is limited even on speech. Second, there are important phonetic and timing differences between speech and the singing voice that must be dealt with. Last but not least, real-world music is polyphonic: even with a system that can recognize clean singing, interference from the instrumental background would significantly degrade its performance. In polyphonic music, the lyrics recognition problem therefore becomes more difficult and relies on separating the vocals from the polyphonic mixture. Still, singing and speech convey similar information and originate from the same physical production mechanism, so it is plausible that singing recognition can be done using standard automatic speech recognition techniques. Even though the results are far from perfect, they show potential for particular tasks such as keyword spotting, automatic tagging, or song retrieval.


Music transcription

Music transcription means converting previously unannotated music into a symbolic form (e.g., MIDI). In order to transcribe music automatically, the notes and the tempo or beat have to be detected. Over the years, our research group has produced several state-of-the-art results in multiple fundamental frequency analysis, beat tracking, and musical meter analysis. The aim of multiple fundamental frequency analysis is to find the fundamental frequencies of multiple simultaneous sounds. The aim of beat tracking is to find the rhythmic pulse in music that corresponds to the tempo of the piece and matches the "foot-tapping" times of a human listener. The meter consists of the beat (tactus) pulse together with faster and slower pulses at different time scales.
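
As a toy illustration of the simplest special case, the snippet below estimates a single fundamental frequency from one synthetic frame by autocorrelation; analyzing multiple simultaneous sounds in real polyphonic music is a far harder problem than this.

    import numpy as np

    sr = 22050
    t = np.arange(2048) / sr
    frame = np.sin(2 * np.pi * 220.0 * t)        # synthetic 220 Hz tone

    # Autocorrelation peaks at lags that are multiples of the period.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 1000, sr // 50                # search roughly 50-1000 Hz
    lag = lo + np.argmax(ac[lo:hi])
    print("estimated f0: %.1f Hz" % (sr / lag))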

Research results in multiple fundamental frequency analysis and musical meter analysis have been successfully applied to the transcription of polyphonic music, and more specifically to singing, bass lines, and percussion. Here too, our group has produced several state-of-the-art results:

  • Singing
    Singing transcription aims at automatically converting a recorded singing signal into a parametric representation, e.g., a MIDI file. Examples include singing melody transcription in both monophonic and polyphonic music (a small note-segmentation sketch follows this list).
  • Bass line
    Bass line transcription aims at automatically transcribing the bass line in polyphonic music signals.
  • Percussion
    Percussion transcription recognizes the percussive content (drums) of a musical performance and creates a symbolic representation of it: given an input signal, the system produces a score of the played drums.
  • Polyphonic music
    Polyphonic music transcription aims at transcribing the simultaneously sounding notes played by pitched instruments in real-world music of any genre.
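
As mentioned above, here is a minimal sketch of the note-segmentation step in singing transcription: turning a frame-level pitch track into MIDI-like note events. The pitch track is a placeholder, and real systems additionally model voicing, onsets, and pitch drift.

    import numpy as np

    hop = 0.01                     # assumed frame hop in seconds
    # Placeholder pitch track in Hz; 0 marks unvoiced frames.
    f0 = np.array([0, 220, 221, 219, 220, 0, 330, 331, 330, 0], dtype=float)

    # Voiced frames to rounded MIDI note numbers; -1 marks unvoiced.
    midi = np.where(f0 > 0,
                    np.round(69 + 12 * np.log2(np.maximum(f0, 1) / 440.0)), -1)

    notes, start, pitch = [], None, None
    for i, m in enumerate(np.append(midi, -1)):  # sentinel closes the last note
        if start is None:
            if m > 0:
                start, pitch = i, m
        elif m != pitch:
            notes.append((int(pitch), start * hop, i * hop))  # (note, on, off)
            start, pitch = (i, m) if m > 0 else (None, None)

    print(notes)                   # [(57, 0.01, 0.05), (64, 0.06, 0.09)]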


Music structure analysis

Music structure analysis means subdividing a musical piece into parts and sections at the largest time scale. Popular music pieces in particular have a distinct structure defined by repetitions of different parts (e.g., verse and chorus). Being able to infer the structure from the audio enables several applications, such as easier navigation within the piece, music thumbnailing, and mash-ups.
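
A common first step, sketched below with the librosa library and a placeholder file name, is to compute a self-similarity matrix over chroma features: repeated sections such as recurring choruses show up as diagonal stripes, which later processing turns into a segmentation.

    import librosa
    import numpy as np

    y, sr = librosa.load("song.wav", sr=22050, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=2048)
    chroma = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)

    # Cosine similarity between every pair of frames.
    ssm = chroma.T @ chroma
    print("self-similarity matrix:", ssm.shape)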

Music classification

Music classification means categorizing music according to its style, its genre, or the musical instruments involved.


Sound source separation

In music, there are often several instruments or singers active at the same time, which makes automatic analysis difficult. In sound source separation, the key idea is to estimate the signal produced by each sound source from a mixture signal in which several sources are present.

Structured audio coding

In object-based audio coding, the original signal is represented as a set of objects that have time-dependent gains. The extraction of these sound objects makes object-based coding closely related to sound source separation. Recently, non-negative matrix factorization (NMF) applied to the audio spectrogram has been studied for sound separation purposes, and it has provided promising results for extracting the sound sources from a mixture signal. This motivated our research team to launch a study of audio compression based on the NMF representation.
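
A minimal sketch of the underlying decomposition, assuming a placeholder file name and component count: the magnitude spectrogram is factored into spectral templates W with time-varying gains H, and one component is resynthesized using the mixture phase. This is the separation step only; building a compression scheme on top of W and H is the coding problem mentioned above.

    import librosa
    import numpy as np
    from sklearn.decomposition import NMF

    y, sr = librosa.load("mixture.wav", sr=22050, mono=True)
    S = librosa.stft(y, n_fft=1024)
    mag, phase = np.abs(S), np.angle(S)

    model = NMF(n_components=8, init="nndsvda", max_iter=300)
    W = model.fit_transform(mag)   # spectral templates (freq x components)
    H = model.components_          # time-varying gains (components x frames)

    # Resynthesize the first component, reusing the mixture phase.
    comp = np.outer(W[:, 0], H[0]) * np.exp(1j * phase)
    source = librosa.istft(comp)
    print(source.shape)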