14 research outputs found

    Detecting multiple, simultaneous talkers through localising speech recorded by ad-hoc microphone arrays

    This paper proposes a novel approach to detecting multiple, simultaneous talkers in multi-party meetings using localisation of active speech sources recorded with an ad-hoc microphone array. Cues indicating the relative distance between sources and microphones are derived from speech signals and room impulse responses recorded by each of the microphones distributed at unknown locations within a room. Multiple active sources are localised by analysing a surface formed from these cues and derived at different locations within the room. The number of localised active sources in each frame or utterance is then counted to estimate when multiple sources are active. The proposed approach does not require prior information about the number and locations of sources or microphones. Synchronisation between microphones is also not required. A meeting scenario with competing speakers is simulated, and results show that simultaneously active sources can be detected with an average accuracy of 75% and that the number of active sources is counted accurately 65% of the time.

    Multi-Channel Compression and Coding of Reverberant Ad-Hoc Recordings Through Spatial Autoregressive Modelling

    Autoregressive modelling techniques such as multi-channel linear prediction are widely used for applications such as coding, dereverberation and compression of speech signals. State-of-the-art multi-channel linear prediction methods do not take into account the locations of the microphones and assume single-distance compact microphone arrays. In this paper, a spatially modified multi-channel autoregressive compression and coding method is proposed and successfully tested in order to adapt the standard multi-channel method to virtual reality and immersive video conferencing applications, where the microphones can be meters away from each other. The proposed method estimates the spatial distances between each microphone and the source to optimise the joint compression of the signals recorded within a wide area. The results suggest that the proposed method outperforms standard multi-channel compression and coding when applied to ad-hoc scenarios.

    Informed source location and DOA estimation using acoustic room impulse response parameters

    This paper investigates sound source localisation using an ad-hoc microphone array, where the microphones are arbitrarily distributed in a room without knowledge of their locations. The proposed method is applicable to scenarios where microphone positions do not comply with conventional microphone arrays such as a Uniform Linear Array (ULA) and large aperture microphone panels. The proposed method utilizes the amplitude attenuation of the Room Impulse Responses (RIRs) and Time Difference of Arrival (TDOA) cues to fit two surfaces across the room and estimate the location of the source. The areas with the least TDOA and the highest direct path impulse amplitude are highlighted as the source areas. Also proposed is an approach to forming subsets of microphones that are chosen such that they can localise the source with the highest accuracy when compared to using all microphones in the room. The approach is investigated for various numbers of source positions and with additive noise. Results show that the localisation accuracy is increased by utilizing the formed subset of microphones.
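
As a rough illustration of what a TDOA cue is, the sketch below estimates the delay between two channels by locating the peak of their cross-correlation. It is a generic textbook estimator, not the surface-fitting method of the paper; the function name `estimate_tdoa`, the parameters `fs` and `max_lag`, and the assumption that both channels share a common clock are all illustrative choices.

```python
import math


def estimate_tdoa(x, y, fs, max_lag):
    """Estimate the delay of y relative to x, in seconds.

    Assumes (hypothetically) that x and y are equally sampled at fs Hz
    and share a common clock. The delay is taken as the lag that
    maximises the cross-correlation r[lag] = sum_t x[t] * y[t + lag].
    """
    n = min(len(x), len(y))
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Restrict t so that both t and t + lag stay inside [0, n).
        lo = max(0, -lag)
        hi = min(n, n - lag)
        val = sum(x[t] * y[t + lag] for t in range(lo, hi))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs
```

A positive result means the sound reached the second microphone later than the first. In a truly ad-hoc setting the shared-clock assumption generally does not hold, so a sketch like this would only apply after some form of alignment.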

    Blind speaker counting in highly reverberant environments by clustering coherence features

    This paper proposes the use of the frequency-domain Magnitude Squared Coherence (MSC) between two ad-hoc recordings of speech as a reliable speaker discrimination feature for source counting applications in highly reverberant environments. The proposed source counting method does not require knowledge of the microphone spacing and does not assume any relative distance between the sources and the microphones. Source counting is based on clustering the frequency-domain MSC of the speech signals derived from short time segments. Experiments show that the frequency-domain MSC is speaker-dependent and the method was successfully used to obtain highly accurate source counting results for up to six active speakers for varying levels of reverberation and microphone spacing.
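
For readers unfamiliar with the feature, the MSC between two signals is |Pxy|² / (Pxx · Pyy) per frequency bin, with the spectra averaged over several segments. The following is a minimal, dependency-free sketch of that computation only; the clustering step is not shown, and the segment length, Hann window and 50% overlap are assumptions rather than the settings used in the paper.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns the non-negative-frequency bins."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def msc(x, y, seg_len=32):
    """Magnitude squared coherence per frequency bin.

    Auto- and cross-spectra are averaged over Hann-windowed segments
    with 50% overlap (both are illustrative choices).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / seg_len)
              for t in range(seg_len)]
    bins = seg_len // 2 + 1
    pxx = [0.0] * bins
    pyy = [0.0] * bins
    pxy = [0j] * bins
    hop = seg_len // 2
    for start in range(0, min(len(x), len(y)) - seg_len + 1, hop):
        xf = dft([w * v for w, v in zip(window, x[start:start + seg_len])])
        yf = dft([w * v for w, v in zip(window, y[start:start + seg_len])])
        for k in range(bins):
            pxx[k] += abs(xf[k]) ** 2
            pyy[k] += abs(yf[k]) ** 2
            pxy[k] += xf[k] * yf[k].conjugate()
    # Bins with no energy in either channel are reported as 0.
    return [
        abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] * pyy[k] > 0 else 0.0
        for k in range(bins)
    ]
```

Values lie between 0 and 1: two identical channels give 1 in every bin, while unrelated signals give values near 0 once enough segments are averaged.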

    Coherence Based Source Counter: v1.0.1

    Included a verbose option to suppress rir_generator output to the command window and included the missing tightPlots package.

    A Survey on Ad Hoc Signal Processing: Applications, Challenges and State-of-the-Art Techniques

    © 2019 IEEE. In an era of ubiquitous digital devices with built-in microphones and recording capability, distributed microphone arrays of a few digital recording devices are the emerging recording tool in hands-free speech communications and immersive meetings. Such so-called ad hoc microphone arrays can facilitate high-quality spontaneous recording experiences for a wide range of applications and scenarios, though critical challenges have limited their applications. These challenges include unknown and changeable positions of the recording devices and sound sources, resulting in varying time delays of arrival between microphones in the ad hoc array as well as varying recorded sound power levels. This paper reviews state-of-the-art techniques to overcome these issues and provides insight into possible ways to make existing methods more effective and flexible. The focus of this paper is on scenarios in which the microphones are arbitrarily located in an acoustic scene and do not communicate directly or through a fusion centre.

    Forming ad-hoc microphone arrays through clustering of acoustic room impulse responses

    This paper investigates the formation of ad-hoc microphone arrays for the purpose of recording multiple sound sources by clustering microphones spatially distributed within a room. A novel codebook-based unsupervised method for cluster formation using features derived from the Room Impulse Responses (RIRs) corresponding to each microphone is proposed and compared with baseline clustering and classification methods. The features correspond to the sequence of arrival times and time delays of echoes, as estimated from peaks of the RIRs, along with peak amplitudes. Results suggest that the proposed codebook-based clustering algorithm can outperform the supervised KNN classification method and the unsupervised k-means clustering method when applied to microphone segmentation and clustering, in terms of clustering success rate and noise robustness.

    Distributed microphone arrays, emerging speech and audio signal processing platforms: A review

    © 2020 ASTES Publishers. All rights reserved. Given ubiquitous digital devices with recording capability, distributed microphone arrays are emerging recording tools for hands-free communications and spontaneous teleconferencing. However, the analysis of signals recorded with diverse sampling rates, time delays, and qualities by distributed microphone arrays is not straightforward and entails important considerations. The crucial challenges include the unknown/changeable geometry of distributed arrays, asynchronous recording, sampling rate mismatch, and gain inconsistency. Researchers have recently proposed solutions to these problems for applications such as source localization and dereverberation, though there is less literature on real-time practical issues. This article reviews recent research on distributed signal processing techniques and applications. New applications benefitting from the wide coverage of distributed microphones are reviewed and their limitations are discussed. This survey does not cover partially or fully connected wireless acoustic sensor networks.

    Towards real-time source counting by estimation of coherent-to-diffuse ratios from ad-hoc microphone array recordings

    Coherent-to-diffuse ratio (CDR) estimates over short time frames are utilized for source counting using ad-hoc microphone arrays to record speech from multiple participants in scenarios such as a meeting. It is shown that the CDR estimates obtained at ad-hoc dual (two-channel) microphone nodes, located at unknown locations within an unknown reverberant room, can detect time frames with more than one active source and are informative for source counting applications. Results show that interfering sources can be detected with accuracies ranging from 69% to 89% for delays ranging from 20 ms to 300 ms, with source counting accuracies ranging from 61% to 81% for two sources over the same range of delays.
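
As background, one signal model commonly used in the CDR literature treats the coherence observed at a two-microphone node as a power-weighted mixture of a coherent (direct) component and a diffuse component. The relation below is a sketch under that assumption and is not necessarily the estimator used in the paper; the symbols Γ_x (observed coherence), Γ_s (coherence of the direct sound) and Γ_n (coherence of the diffuse field) are introduced here for illustration.

```latex
\Gamma_x(f) = \frac{\mathrm{CDR}(f)\,\Gamma_s(f) + \Gamma_n(f)}{\mathrm{CDR}(f) + 1}
\quad\Longrightarrow\quad
\mathrm{CDR}(f) = \frac{\Gamma_n(f) - \Gamma_x(f)}{\Gamma_x(f) - \Gamma_s(f)}
```

With estimated coherences the right-hand side is generally complex-valued, so practical estimators typically take a real part or magnitude and clip the result to non-negative values.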