Microphone arrays link the physical locations of sound sources to the computer software that processes their signals, and they allow the sound field to be captured spatially. Applications of microphone arrays include determining the physical location of a source (speaker localization), signal enhancement, and source separation. A distant speech interface system can benefit from enhancing the captured signal with a microphone array.

Sound localization

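Sound source localization can be approached by capturing signals spatially and detecting when sources are active. During active segments, the spatial information carried by the observed wave, such as the time differences of arrival between microphones, can be extracted with microphone array processing. A standard localization method is based on the steered response power (SRP), which evaluates the output power of the array when it is steered towards each candidate source position. The placement of the microphones, i.e., the array geometry, affects the capability of the array, for example its spatial resolution.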
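
As an illustration of the steered response power idea, the following is a minimal sketch of SRP with the phase transform weighting (SRP-PHAT), not the specific implementation used in this research. It assumes free-field propagation, a speed of sound of 343 m/s, known microphone coordinates, and a precomputed grid of candidate source positions; the function names `gcc_phat` and `srp_phat` are hypothetical. For every microphone pair it computes the PHAT-weighted cross-correlation and accumulates its value at the time difference of arrival expected for each candidate.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def gcc_phat(x1, x2, n_fft):
    """PHAT-weighted cross-correlation of x1 and x2 (hypothetical helper).

    Lag 0 is at index 0; negative lags wrap around to the end of the array.
    A peak at lag k means x1 lags x2 by k samples.
    """
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # PHAT: keep only the phase
    return np.fft.irfft(cross, n_fft)


def srp_phat(signals, mic_pos, candidates, fs):
    """Steered response power with PHAT weighting (illustrative sketch).

    signals:    (M, N) microphone signals
    mic_pos:    (M, 3) microphone coordinates in metres
    candidates: (K, 3) candidate source positions in metres
    Returns an array of K SRP values; the largest marks the estimate.
    """
    M, N = signals.shape
    n_fft = 2 * N  # zero-pad so circular correlation equals linear correlation
    power = np.zeros(len(candidates))
    for i in range(M):
        for j in range(i + 1, M):
            r = gcc_phat(signals[i], signals[j], n_fft)
            # Expected TDOA (in samples) of mic i relative to mic j per candidate
            d_i = np.linalg.norm(candidates - mic_pos[i], axis=1)
            d_j = np.linalg.norm(candidates - mic_pos[j], axis=1)
            lag = np.round((d_i - d_j) / C * fs).astype(int)
            power += r[lag % n_fft]
    return power
```

The source estimate is then `candidates[np.argmax(srp_phat(signals, mic_pos, candidates, fs))]`. Practical systems usually refine the grid around the maximum or interpolate the correlation instead of rounding lags to whole samples.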

Self-localization

Knowledge of the microphone locations is required by multichannel signal processing methods that rely on geometry, such as beamforming and speaker localization. Devices that can serve as distant talking interfaces, such as smartphones and laptops, are ubiquitous and inherently asynchronous, and each has a known internal microphone and loudspeaker layout. Using such devices jointly in geometry-based multichannel signal processing applications depends on accurate knowledge of the microphone placements, i.e., the rotation and translation of each device. This information is too cumbersome to measure by hand. This research therefore focuses on the automatic localization and synchronization of the device microphones. Self-localization provides useful spatial information and makes it possible to apply array signal processing methods, originally developed for fixed arrays with known geometry, to such device setups.
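
One common building block for recovering array geometry, shown here as an illustrative stand-in rather than the method of this research, is classical multidimensional scaling (MDS). Given a matrix of pairwise microphone distances (assumed here to be already estimated, for example from times of arrival of calibration sounds after synchronization), it recovers the microphone coordinates up to an arbitrary rotation, reflection, and translation. The function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(D, dim=3):
    """Recover point coordinates from an (M, M) Euclidean distance matrix.

    The result is unique only up to rotation, reflection and translation,
    which is why anchor points or device layouts are needed to fix the frame.
    """
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim]     # keep the `dim` largest
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale            # (M, dim) coordinates
```

With noisy distance estimates, the recovered coordinates are a least-squares fit to the Gram matrix; a subsequent nonlinear refinement is often applied, and the known intra-device microphone layout can be used to resolve rotation and translation per device.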

Speaker position tracking

Real-life sound sources are typically non-stationary and do not emit sound continuously. Therefore, to keep track of their physical locations while they move and while they are inactive, tracking methods such as Kalman filtering, extended Kalman filtering, and particle filtering have been employed. When multiple speakers are present, the association of each observation (measurement) with the correct speaker, as well as the creation and deletion of tracks when speakers become active or inactive, must also be handled.
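
To make the role of tracking concrete, here is a minimal sketch of a linear Kalman filter for a single speaker, assuming a 2-D constant-velocity motion model with white acceleration noise and per-frame position estimates from a localizer; frames where the speaker is silent are marked with `None`, and the filter simply coasts on its prediction through them. The parameter values (`q`, `r`, `dt`) and function names are hypothetical, and multi-speaker data association is not covered.

```python
import numpy as np


def make_cv_model(dt, q=1.0, r=0.05):
    """Constant-velocity model for the state [x, y, vx, vy] (assumed values)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only position is observed
    G = np.array([[dt ** 2 / 2, 0], [0, dt ** 2 / 2], [dt, 0], [0, dt]])
    Q = q * G @ G.T                             # white-acceleration process noise
    R = r ** 2 * np.eye(2)                      # localization (measurement) noise
    return F, H, Q, R


def kalman_track(measurements, dt=0.1):
    """Track one speaker from per-frame positions; None marks inactive frames."""
    F, H, Q, R = make_cv_model(dt)
    x = np.zeros(4)
    P = np.eye(4) * 10.0                        # large initial uncertainty
    track = []
    for z in measurements:
        # Predict every frame so the track continues through silence
        x = F @ x
        P = F @ P @ F.T + Q
        if z is not None:
            # Update with the localization result of this frame
            y = np.asarray(z, dtype=float) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(4) - K @ H) @ P
        track.append(x[:2].copy())
    return np.array(track)
```

Nonlinear measurement models, such as tracking directly from time differences of arrival, are where the extended Kalman filter or particle filters mentioned in this section become necessary.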

Beamforming and enhancement

Beamforming linearly combines multiple input signals using a set of complex weights with the aim of enhancing the target signal. The literature offers many methods for estimating the beamforming weights under various criteria, and recent developments in deep learning have further improved the estimation of such parameters. Our research contribution in this field is a new type of spatial feature that has been shown to be important for learning to predict a post-filter for the beamformer; the post-filter further enhances the signal by removing unwanted interference components and noise.
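
The simplest instance of such a weighted combination is the frequency-domain delay-and-sum beamformer, sketched below as an illustration only; the learned weight estimation and post-filter described in this section are not shown. It assumes a known source position, free-field propagation, a speed of sound of 343 m/s, and whole-signal FFTs instead of a short-time Fourier transform; the function names are hypothetical. In each frequency bin the output is $y(f) = \mathbf{w}^H(f)\,\mathbf{x}(f)$ with $\mathbf{w}(f) = \mathbf{a}(f)/M$, where $\mathbf{a}(f)$ is the steering vector of the source.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def steering_vector(mic_pos, src_pos, freqs):
    """Near-field steering vectors a(f), shape (F, M), relative to the closest mic."""
    d = np.linalg.norm(mic_pos - src_pos, axis=1)   # (M,) source distances
    tau = (d - d.min()) / C                         # relative propagation delays
    return np.exp(-2j * np.pi * freqs[:, None] * tau[None, :])


def delay_and_sum(signals, mic_pos, src_pos, fs):
    """Beamform (M, N) signals with y(f) = w(f)^H x(f), where w(f) = a(f) / M."""
    M, N = signals.shape
    X = np.fft.rfft(signals, axis=1).T              # (F, M) per-bin snapshots
    freqs = np.fft.rfftfreq(N, 1 / fs)
    w = steering_vector(mic_pos, src_pos, freqs) / M  # complex weights
    Y = np.sum(np.conj(w) * X, axis=1)              # w^H x in every bin
    return np.fft.irfft(Y, N)
```

Because the weights undo the relative delays, the target adds coherently while spatially uncorrelated noise adds incoherently, reducing its power by roughly a factor of M. Adaptive criteria such as MVDR replace the fixed weights `a(f) / M` with data-dependent ones.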