Source Separation and Reconstruction of Spatial Audio Using Spectrogram Factorization
This chapter introduces methods for factorizing the spectrogram of multichannel audio into repetitive spectral objects and apply the introduced models to the analysis of spatial audio and modification of spatial sound through source separation. The purpose of decomposing an audio spectrogram using spectral templates is to learn the underlying structures (audio objects) from the observed data. The chapter discusses two main scenarios such as parameterization of multichannel surround sound and parameterization of microphone array signals. It explains the principles of source separation by time-frequency filtering using separation masks constructed from the spectrogram models. The chapter introduces a spatial covariance matrix model based on the directions of arrival of sound events and spectral templates, and discusses its relationship to conventional spatial audio signal processing. Source separation using spectrogram factorization models is achieved via time- frequency filtering of the original observation short-time Fourier transform (STFT) by a generalized Wiener filter obtained from the spectrogram model parameters.
- Ville Pulkki and Symeon Delikaris-Manias and Archontis Politis
- John Wiley & Sons