Spatial and Content-based Audio Processing using Stochastic Optimization Methods

Mäkinen, Toni

Stochastic optimization (SO) is a category of numerical optimization approaches in which randomness is used constructively in the search for the optimal solution. As this thesis also shows, stochastic optimization techniques and models have become an important paradigm in a wide range of application areas, including transportation models, financial instruments, and network design. Stochastic optimization is particularly suited to problems that are too difficult or impossible to solve analytically with deterministic optimization approaches. This thesis focuses on applying several stochastic optimization algorithms to two audio-specific application areas: sniper positioning, and content-based audio classification and retrieval. The first application belongs to the area of spatial audio, whereas the latter is a topic of machine learning and, more specifically, multimedia information retrieval. The SO algorithms considered in the thesis are particle filtering (PF), particle swarm optimization (PSO), and simulated annealing (SA), which are extended, combined and applied to the specified problems in a novel manner. Owing to their iterative and evolving nature, PSO algorithms in particular are often classified as evolutionary algorithms.
For the sniper positioning application, the PF and SA algorithms are employed to optimize the parameters of a mathematical shock wave model based on observed wavefronts of firing events. Such an inverse problem lends itself to a Bayesian approach, which is the main motivation for including PF among the considered optimization methods. It is shown, with SA as well as with PF, that by applying the stated shock wave model, the proposed stochastic parameter estimation approach provides statistically reliable, high-quality results.
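The shock wave model itself is not specified in this abstract, so the following is only a minimal sketch of a generic simulated annealing loop applied to a parameter-estimation (inverse) problem of the same kind. It assumes a hypothetical two-parameter decay model `y = a * exp(-b * t)` as a stand-in for the real model; the function names (`model`, `cost`, `simulated_annealing`), the Gaussian proposal, the geometric cooling schedule and all numeric settings are illustrative assumptions, not the method used in the thesis.

```python
import math
import random


def model(params, t):
    # HYPOTHETICAL stand-in model y = a * exp(-b * t); NOT the thesis's shock wave model.
    a, b = params
    return a * math.exp(-b * t)


def cost(params, ts, ys):
    # Sum of squared residuals between observed values and model predictions.
    return sum((model(params, t) - y) ** 2 for t, y in zip(ts, ys))


def simulated_annealing(f, x0, step=0.1, t0=1.0, cooling=0.995, iters=5000, seed=0):
    """Minimize f with a basic Metropolis-acceptance simulated annealing loop."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    temp = t0
    for _ in range(iters):
        # Propose a random Gaussian perturbation of the current solution.
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = f(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        temp *= cooling  # geometric cooling schedule (assumed)
    return best, fbest


if __name__ == "__main__":
    # Synthetic noisy observations from known "true" parameters (a=2.0, b=0.7).
    rng = random.Random(1)
    ts = [0.25 * k for k in range(20)]
    ys = [model((2.0, 0.7), t) + rng.gauss(0.0, 0.01) for t in ts]
    est, err = simulated_annealing(lambda p: cost(p, ts, ys), [1.0, 0.1])
    print(est, err)
```

Accepting occasional uphill moves while the temperature is high is what lets SA escape poor regions early on; as the temperature falls, the loop becomes effectively greedy. The step size and cooling rate would need tuning for any real model and data.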
The content-based audio classification part of the thesis is based on a dedicated framework consisting of several individual binary classifiers. In this work, artificial neural networks (ANNs) are used within the framework, and their parameters and network structures are optimized based on the desired outputs for each item, i.e. the ground-truth class labels. The optimization is carried out using a multi-dimensional extension of the regular PSO algorithm (MD PSO). The audio retrieval experiments are performed in the context of feature generation (synthesis), an approach for generating new audio features (attributes) from conventional features originally extracted from a particular audio database. Here, the MD PSO algorithm is applied to optimize the parameters of the feature generation process, including the dimensionality of the generated feature vector. From both a practical perspective and the viewpoint of complexity theory, stochastic optimization techniques are often computationally demanding. For this reason, the practical implementations discussed in this thesis are designed to be directly applicable to parallel computing. This is an important and topical issue given the continuing growth of computing grids and cloud services. Indeed, many of the results in this thesis were computed using a grid of several computers. Furthermore, since personal computers and mobile handsets also include an increasing number of processor cores, such parallel implementations are not limited to grid servers.
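MD PSO builds on the regular PSO algorithm. As a point of reference, a minimal global-best PSO with an inertia weight can be sketched as follows; the multi-dimensional extension, in which particles additionally search over the solution dimension, is not shown. The sum-of-squares objective `sphere` is a placeholder for a real fitness (such as a classifier's error), and the names and coefficient values (`w`, `c1`, `c2`, common textbook settings) are assumptions for illustration only.

```python
import random


def sphere(x):
    # Placeholder objective (sum of squares); stands in for a real fitness function.
    return sum(xi * xi for xi in x)


def pso(f, dim, n_particles=20, iters=200, bounds=(-5.0, 5.0),
        w=0.72, c1=1.49, c2=1.49, seed=0):
    """Basic global-best PSO over a FIXED dimension, minimizing f."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Fitness evaluations are independent per particle, so this map could be
    # swapped for a parallel map without changing the algorithm.
    fit = list(map(f, pos))
    pbest = [p[:] for p in pos]
    pbest_fit = fit[:]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
        fit = list(map(f, pos))
        for i in range(n_particles):
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
                # A new global best is necessarily also a new personal best.
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


if __name__ == "__main__":
    best, best_fit = pso(sphere, dim=5)
    print(best, best_fit)
```

Because the per-particle fitness evaluations within one iteration do not depend on each other, the two `map` calls are the natural places to distribute work across cores or grid nodes, which is the property that makes such swarm methods straightforward to parallelize.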


Awarding institution: Tampereen teknillinen yliopisto - Tampere University of Technology
Submitted by Toni Mäkinen on 2013-05-08T13:23:45Z. No. of bitstreams: 1 (Makinen.pdf: 7970430 bytes, checksum: cbecd78d78c862fd2e1ea9e24fde900f (MD5))
Approved for entry into archive by Kaisa Kulkki on 2013-05-15T09:52:40Z (GMT).
Made available in DSpace on 2013-05-15T09:52:40Z (GMT).