Automatic musical instrument recognition

Eronen, Antti

This thesis concerns the automatic recognition of musical instruments, where the idea is to build computer systems that “listen” to musical sounds and recognize which instrument is playing. The experimental material consisted of 5286 single notes from Western orchestral instruments, whose timbres have been studied in great depth. The literature review introduces studies on the sound of musical instruments, as well as related knowledge of instrument acoustics. Together with the state of the art in automatic sound source recognition systems, these form the foundation for the most important part of this thesis: the extraction of perceptually relevant features from acoustic musical signals. Several feature extraction algorithms were developed and implemented, and used as a front end for a pattern recognition system. The performance of the system was evaluated in several experiments. Using feature vectors that included cepstral coefficients and features relating to the type of excitation, brightness, modulations, asynchrony and fundamental frequency of tones, an accuracy of 35 % was obtained on a database including several examples of 29 instruments. The recognition of the instrument family among six possible classes was successful in 77 % of the cases. The performance of the system and the confusions it made were compared to the results reported for human perception. The comparison shows that the system performs worse than humans in a similar task (46 % in individual instrument and 92 % in instrument family recognition [Martin99]), although it is comparable to other reported systems. The confusions of the system resemble those of human subjects, indicating that the feature extraction algorithms have managed to capture perceptually relevant information from the acoustic signals.
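As a rough illustration of two of the feature types named above, the following sketch computes a brightness measure (the spectral centroid) and a few cepstral coefficients for a single frame. It is not the thesis's implementation: the function names, the Hann window, the 13-coefficient default and the synthetic test tones are all assumptions made for this example, and the definitions used are the generic textbook ones.

```python
import numpy as np


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz, a common proxy for brightness."""
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitude.sum()
    return float((freqs * magnitude).sum() / total) if total > 0 else 0.0


def real_cepstrum(signal, n_coeffs=13):
    """First n_coeffs of the real cepstrum (inverse FFT of the log magnitude spectrum)."""
    windowed = signal * np.hanning(len(signal))
    # Small offset avoids log(0) for empty spectral bins.
    log_magnitude = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    return np.fft.irfft(log_magnitude)[:n_coeffs]


def feature_vector(signal, sample_rate, n_coeffs=13):
    """Concatenate cepstral coefficients and the centroid into one vector."""
    cepstral = real_cepstrum(signal, n_coeffs)
    centroid = spectral_centroid(signal, sample_rate)
    return np.concatenate([cepstral, [centroid]])


# Hypothetical test tones: a pure 440 Hz sine and the same tone with a strong
# high partial added, which should register as "brighter".
sample_rate = 16000
t = np.arange(2048) / sample_rate
dull_tone = np.sin(2 * np.pi * 440 * t)
bright_tone = dull_tone + 0.8 * np.sin(2 * np.pi * 3520 * t)

dull_centroid = spectral_centroid(dull_tone, sample_rate)
bright_centroid = spectral_centroid(bright_tone, sample_rate)
features = feature_vector(bright_tone, sample_rate)
```

In a recognition system of the kind described, vectors like `features` would be computed for each note and passed to a classifier; the classifier itself is outside the scope of this sketch.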