68 research outputs found

    Array processing based on time-frequency analysis and higher-order statistics

    Doctor of Philosophy (Ph.D.)

    Source Separation for Hearing Aid Applications


    Multiple source localization using spherical microphone arrays

    Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, multiple simultaneously active sources are common; DOA estimation that is robust in Multi-Source (MS) scenarios is therefore of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has grown rapidly due to their ability to analyse the sound field with equal resolution in all directions. This symmetry makes SMAs suitable for applications such as robot audition, where a wide range of talker heights and positions is expected. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD), which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimation methods for real-world scenarios address one or more performance-degrading factors, such as noise, reverberation and multi-source activity, or tackle problems such as source counting and computational complexity. This thesis addresses various problems in MS DOA estimation for speech sources, each focusing on one or more performance-degrading factors. Firstly, a narrowband DOA estimator is proposed that utilizes high-order spatial information in two computationally efficient ways. Secondly, an autonomous source counting technique is proposed which uses density-based clustering in an evolutionary framework. Thirdly, a confidence metric for the validity of the Single Source (SS) assumption in a Time-Frequency (TF) bin is proposed; it is based on an MS assumption over a short time interval in which the number of active sources and their TF bins are adaptively estimated. Finally, two analytical narrowband MS DOA estimators are proposed based on an MS assumption in a TF bin.
The proposed methods are evaluated using simulations and real recordings. Each proposed technique outperforms comparative baseline methods and performs at least as accurately as the state-of-the-art.
    Open Access
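As a generic illustration of narrowband DOA estimation from array snapshots (a textbook MUSIC-style subspace estimator for a linear array, not the thesis's spherical-harmonic-domain methods; all names here are hypothetical):

```python
import numpy as np

def music_doa(X, n_sources, mic_positions, freq, c=343.0, grid=None):
    """Narrowband MUSIC pseudospectrum for a linear array.

    X: (n_mics, n_snapshots) complex STFT snapshots at one frequency bin.
    mic_positions: sensor positions along the array axis, in metres.
    Returns the candidate DOA grid (degrees) and the pseudospectrum.
    """
    if grid is None:
        grid = np.linspace(-90, 90, 361)       # candidate DOAs, 0.5 deg steps
    R = X @ X.conj().T / X.shape[1]            # spatial covariance estimate
    w, V = np.linalg.eigh(R)                   # eigenvalues ascending
    En = V[:, : X.shape[0] - n_sources]        # noise subspace eigenvectors
    k = 2 * np.pi * freq / c                   # wavenumber
    spectrum = []
    for theta in np.deg2rad(grid):
        a = np.exp(-1j * k * mic_positions * np.sin(theta))  # steering vector
        a = a / np.linalg.norm(a)
        denom = np.linalg.norm(En.conj().T @ a) ** 2
        spectrum.append(1.0 / max(denom, 1e-12))             # peak at a DOA
    return grid, np.array(spectrum)
```

The pseudospectrum peaks where the steering vector is orthogonal to the noise subspace; with multiple sources, the `n_sources` largest peaks are taken as the DOA estimates.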

    High-resolution imaging methods in array signal processing


    Object-based Modeling of Audio for Coding and Source Separation

    This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters is coupled with audio-specific criteria, such as spectral redundancy, sparsity, perceptual relevance and the spatial position of sounds. The objective is to obtain an audio signal representation composed of meaningful entities, called audio objects, that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy, using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings. In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element. The proposed object-based spatial audio coding algorithm is evaluated via listening tests, comparing the overall perceptual quality to conventional time-frequency block-wise methods at the same bitrates.
The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels. For sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model and spatial covariance matrix estimation is considered. A direction-of-arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria, and the proposed method is shown to improve over state-of-the-art sound source separation methods on recordings made with a small microphone array.
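The magnitude-spectrogram factorization at the core of such object-based models can be illustrated with textbook non-negative matrix factorization using Lee-Seung multiplicative updates for the KL divergence; this is a minimal single-channel sketch, not the multichannel or complex-valued extensions developed in the thesis, and all names are illustrative:

```python
import numpy as np

def nmf_kl(V, n_objects, n_iter=200, seed=0):
    """Factor a non-negative magnitude spectrogram V (freq x time) as
    V ~ W @ H using multiplicative updates for the KL divergence.

    W: spectral basis, one column per audio object.
    H: time-varying gains, one row per audio object.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_objects)) + 1e-3
    H = rng.random((n_objects, T)) + 1e-3
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        W *= (V / WH) @ H.T / H.sum(axis=1)          # update spectral basis
        WH = W @ H + 1e-12
        H *= W.T @ (V / WH) / W.sum(axis=0)[:, None]  # update gains
    return W, H
```

A single object `k` can then be pulled out by time-frequency filtering, e.g. masking the mixture spectrogram with `(W[:, [k]] @ H[[k], :]) / (W @ H)`, which is the Wiener-style masking idea mentioned in the abstract.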

    Spatial dissection of a soundfield using spherical harmonic decomposition

    A real-world soundfield typically contains contributions from multiple desired and undesired sound sources. The performance of many acoustic systems, such as automatic speech recognition, audio surveillance and teleconferencing, relies on their ability to extract the desired sound components in such a mixed environment. The existing solutions to this problem are constrained by various fundamental limitations and require enforcing different priors depending on the acoustic conditions, such as reverberation and the spatial distribution of sound sources. With the growing emphasis on and integration of audio applications in diverse technologies such as smart home and virtual reality appliances, it is imperative to advance source separation technology in order to overcome the limitations of the traditional approaches. To that end, we exploit the spherical harmonic decomposition model to dissect a mixed soundfield into its underlying desired and undesired components based on source and signal characteristics. By analysing the spatial projection of a soundfield, we achieve multiple outcomes: (i) soundfield separation with respect to distinct source regions, (ii) source separation in a mixed soundfield using a modal coherence model, and (iii) direction of arrival (DOA) estimation of multiple overlapping sound sources through pattern recognition of the modal coherence of a soundfield. We first employ an array of higher-order microphones for soundfield separation in order to reduce the hardware requirements and implementation complexity. Subsequently, we develop novel mathematical models for the modal coherence of noisy and reverberant soundfields that facilitate convenient ways of estimating DOAs and power spectral densities, leading to robust source separation algorithms. The modal-domain approach to soundfield/source separation allows us to circumvent several practical limitations of the existing techniques and enhances the performance and robustness of the system.
The proposed methods are presented with several practical applications and performance evaluations using simulated and real-life datasets.
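The spherical harmonic decomposition underlying this modal analysis can be sketched for an idealized continuous field sampled on a quadrature grid (the thesis works with practical higher-order microphone arrays; the function names and grid choices below are illustrative):

```python
import numpy as np
from math import factorial
from scipy.special import lpmv  # associated Legendre function P_n^m

def sph_harm_nm(n, m, az, colat):
    """Orthonormal spherical harmonic Y_n^m (Condon-Shortley convention)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi)
                   * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(colat)) * np.exp(1j * am * az)
    if m < 0:
        y = (-1) ** am * np.conj(y)   # Y_n^{-m} = (-1)^m conj(Y_n^m)
    return y

def sh_coefficients(f, order):
    """Expand f(azimuth, colatitude) on the sphere into spherical harmonic
    coefficients up to `order`, via Gauss-Legendre quadrature in colatitude
    and a uniform azimuth grid (exact for band-limited fields)."""
    n_az = 2 * order + 2
    x, w = np.polynomial.legendre.leggauss(order + 1)
    beta = np.arccos(x)                               # colatitude nodes
    alpha = 2 * np.pi * np.arange(n_az) / n_az        # azimuth nodes
    A, B = np.meshgrid(alpha, beta)
    F = f(A, B)
    coeffs = {}
    for n in range(order + 1):
        for m in range(-n, n + 1):
            integrand = F * np.conj(sph_harm_nm(n, m, A, B))
            coeffs[(n, m)] = (2 * np.pi / n_az) * np.sum(w[:, None] * integrand)
    return coeffs
```

The resulting modal coefficients are the SHD representation of the field; quantities such as the modal coherence used in the thesis are then statistics computed across these coefficients over time and frequency.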