446 research outputs found

    Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

    This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries with a novel Euclidean distance matrix (EDM) completion algorithm that alternates between low-rank matrix completion and projection onto the Euclidean distance space. This approach confines the recovered matrix to the EDM cone at each iteration of the matrix completion algorithm. Theoretical guarantees on the calibration performance are obtained for random and locally structured missing entries as well as for measurement noise on the known distances. This study elucidates the links between the calibration error and the number of microphones, the noise level, and the ratio of missing distances. Thorough experiments on real data recordings and simulated setups are conducted to demonstrate these theoretical insights. The proposed Euclidean distance matrix completion algorithm achieves a significant improvement over the state-of-the-art techniques for ad hoc microphone array calibration. Comment: In Press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 201
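The alternation described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the function names, the mean-value initialisation, the fixed iteration count, and the Gram-matrix heuristic used in place of an exact projection are all assumptions made for illustration, and the matrix is assumed to hold squared distances.

```python
import numpy as np

def project_rank(M, r):
    """Best rank-r approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def project_edm(D, dim):
    """Map D to a nearby squared-distance matrix of embedding dimension <= dim.

    Heuristic (an assumption, not an exact projection): centre D into a Gram
    matrix, keep its `dim` largest non-negative eigenvalues, and rebuild the
    squared distances from the truncated Gram matrix.
    """
    n = D.shape[0]
    D = 0.5 * (D + D.T)
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    G = -0.5 * J @ D @ J                          # Gram matrix of centred points
    w, V = np.linalg.eigh(0.5 * (G + G.T))        # eigenvalues in ascending order
    w = np.clip(w[-dim:], 0.0, None)
    V = V[:, -dim:]
    G = (V * w) @ V.T
    g = np.diag(G)
    E = g[:, None] + g[None, :] - 2.0 * G         # squared distances from Gram
    E = np.maximum(0.5 * (E + E.T), 0.0)
    np.fill_diagonal(E, 0.0)
    return E

def complete_edm(D_obs, mask, dim=2, n_iter=200):
    """Fill the entries of D_obs where mask is False.

    D_obs holds squared distances; mask is True where an entry was measured.
    """
    mask = np.asarray(mask, dtype=bool)
    X = np.where(mask, D_obs, D_obs[mask].mean())  # crude initial guess
    for _ in range(n_iter):
        X = project_rank(X, dim + 2)               # an EDM in `dim` dims has rank <= dim + 2
        X = np.where(mask, D_obs, X)               # re-impose the measured entries
        X = project_edm(X, dim)                    # return to a valid distance matrix
    return X
```

Ending each iteration with `project_edm` means the returned matrix is always symmetric, non-negative, and zero on the diagonal, at the cost of the measured entries being matched only approximately when the data are noisy.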

    Self-Localization of Ad-Hoc Arrays Using Time Difference of Arrivals

    This work was supported by the U.K. Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/K007491/1

    Dual input neural networks for positional sound source localization

    In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method, and a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN on a test dataset of real recordings.
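The abstract does not specify which least-squares method serves as the baseline. As a hedged illustration of how microphone coordinates enter a classical LS localizer, the sketch below assumes noise-free or noisy source-to-microphone ranges are available and linearises the range equations by subtracting the first one. The function name `localize_ls` and the range-based formulation are assumptions, not taken from the paper.

```python
import numpy as np

def localize_ls(mics, ranges):
    """Estimate a source position from microphone positions and ranges.

    Each range gives ||x - m_i||^2 = r_i^2. Subtracting the equation for
    microphone 0 cancels ||x||^2 and leaves a linear system in x:
        2 (m_i - m_0)^T x = ||m_i||^2 - ||m_0||^2 - r_i^2 + r_0^2
    """
    mics = np.asarray(mics, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    A = 2.0 * (mics[1:] - mics[0])
    b = (np.sum(mics[1:] ** 2, axis=1) - np.sum(mics[0] ** 2)
         - ranges[1:] ** 2 + ranges[0] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)     # least-squares solution
    return x

# Usage: four microphones at the corners of a 4 m x 3 m rectangle.
mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [4.0, 3.0]])
source = np.array([1.2, 2.1])
ranges = np.linalg.norm(mics - source, axis=1)
estimate = localize_ls(mics, ranges)
```

With exact ranges and microphones that are not collinear, the linear system recovers the source position; with noisy ranges it returns the least-squares compromise.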

    Detecting multiple, simultaneous talkers through localising speech recorded by ad-hoc microphone arrays

    This paper proposes a novel approach to detecting multiple, simultaneous talkers in multi-party meetings using localisation of active speech sources recorded with an ad-hoc microphone array. Cues indicating the relative distance between sources and microphones are derived from speech signals and room impulse responses recorded by each of the microphones distributed at unknown locations within a room. Multiple active sources are localised by analysing a surface formed from these cues, derived at different locations within the room. The number of localised active sources in each frame or utterance is then counted to estimate when multiple sources are active. The proposed approach does not require prior information about the number and locations of sources or microphones. Synchronisation between microphones is also not required. A meeting scenario with competing speakers is simulated, and results show that simultaneously active sources can be detected with an average accuracy of 75% and that the number of active sources is counted accurately 65% of the time.

    Acoustic sensor network geometry calibration and applications

    In the modern world, we are increasingly surrounded by computing devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied to teleconferencing, camera control, automation, or assisted living. For such applications, awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known. Therefore, different novel geometry calibration methods were developed.
    Sound classification: The first method addresses the task of identifying auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification. The use of both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy. By using soft quantization and introducing supervised training for the BoF model, superior accuracy is achieved. The method generalizes well from limited training data. It works online and can be computed in a fraction of real time. By a dedicated training strategy based on a hierarchy of stationarity, the detection of speech in mixtures with noise was realized. This makes the method robust against severe noise levels corrupting the speech signal. Thus it is possible to provide control information to a beamformer in order to realize blind speech enhancement. A reliable improvement is achieved in the presence of one or more stationary noise sources.
    Speaker localization: The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced. Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates, together with an estimate of the spectral distribution, with the network. As this information is relatively sparse, it can be transmitted with low bandwidth. The system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity to correctly associate concurrent speakers. By incorporating the intersection angle in the triangulation, the precision of the Euclidean localization is improved. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room.
    Geometry calibration: The central task of geometry calibration has been solved with special focus on sensor nodes equipped with multiple microphones. Novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios. The DoA estimates are fused with visual speaker tracking in order to provide sensor positions in a common coordinate system. A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods that only infer the positioning of distributed microphones, the DoA is incorporated and thus it becomes possible to calibrate the orientation of the nodes with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. As speech events can be used, the calibration becomes possible without the requirement of playing dedicated calibration sounds. Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room.
    The informed acoustic sensor network: All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results, even in reverberant enclosures. The high robustness and reliability can be improved even more by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positioning of the speakers and sensor nodes are automatically acquired in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms. The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.
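The thesis abstract mentions triangulating speaker positions from the DoA estimates of several nodes and taking the intersection angle into account. A minimal two-node, 2D sketch of that geometry follows. It assumes both bearings are already expressed in a common global frame, and it only reports the intersection angle; how the thesis actually weights or uses that angle is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def triangulate_2d(p1, theta1, p2, theta2):
    """Intersect two bearing lines and report their acute intersection angle.

    p1, p2: node positions; theta1, theta2: global bearing angles in radians.
    Raises numpy.linalg.LinAlgError when the bearings are exactly parallel.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    u1 = np.array([np.cos(theta1), np.sin(theta1)])
    u2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + t1 * u1 = p2 + t2 * u2 for the line parameters (t1, t2).
    A = np.column_stack((u1, -u2))
    t = np.linalg.solve(A, p2 - p1)
    point = p1 + t[0] * u1
    # Acute angle between the two lines; small values mean an
    # ill-conditioned intersection that is sensitive to DoA errors.
    angle = np.arccos(np.clip(abs(u1 @ u2), 0.0, 1.0))
    return point, angle
```

Bearings that cross at nearly a right angle give the most stable position estimate, which is one plausible reason to factor the intersection angle into the fusion step.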

    Ad-Hoc Microphone Array Calibration from Partial Distance Measurements

    We address the problem of ad hoc microphone array calibration where some of the distances between the microphones cannot be measured. Conventional techniques require information about all the distances for accurate reconstruction of the array geometry. To relax this requirement, we propose to exploit the properties of Euclidean distance matrices within the framework of low-rank matrix completion to recover the missing entries. We provide a rigorous analysis to bound the calibration error using noisy measurements. This study elucidates the links between the performance and the structure of the missing distances, along with the size of the network. The experiments carried out on real data recordings demonstrate these theoretical insights. The proposed Euclidean distance matrix completion algorithm achieves a significant improvement over the state-of-the-art techniques for ad hoc microphone array calibration.
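For context on why conventional techniques need every pairwise distance, the sketch below shows classical multidimensional scaling (MDS), a standard way to recover array geometry from a complete matrix of squared distances. It is offered as a generic illustration and is not claimed to be the specific baseline used in the paper; the function names are chosen here for illustration.

```python
import numpy as np

def squared_distances(P):
    """Pairwise squared Euclidean distances between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return (diff ** 2).sum(axis=-1)

def classical_mds(D_sq, dim=2):
    """Recover coordinates (up to rotation, reflection, translation)
    from a complete matrix of squared distances."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    G = -0.5 * J @ D_sq @ J                   # Gram matrix of centred points
    w, V = np.linalg.eigh(0.5 * (G + G.T))    # eigenvalues in ascending order
    w = np.clip(w[-dim:], 0.0, None)          # keep the `dim` largest
    return V[:, -dim:] * np.sqrt(w)
```

Because the centring step sums over whole rows and columns, a single missing distance leaves the Gram matrix undefined, which is the gap that matrix completion is meant to fill.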