16 research outputs found

    Detecting multiple, simultaneous talkers through localising speech recorded by ad-hoc microphone arrays

    This paper proposes a novel approach to detecting multiple, simultaneous talkers in multi-party meetings using localisation of active speech sources recorded with an ad-hoc microphone array. Cues indicating the relative distance between sources and microphones are derived from speech signals and room impulse responses recorded by each of the microphones distributed at unknown locations within a room. Multiple active sources are localised by analysing a surface formed from these cues and derived at different locations within the room. The number of localised active sources in each frame or utterance is then counted to estimate when multiple sources are active. The proposed approach does not require prior information about the number and locations of sources or microphones. Synchronisation between microphones is also not required. A meeting scenario with competing speakers is simulated, and results show that simultaneously active sources can be detected with an average accuracy of 75% and that the number of active sources is counted accurately 65% of the time.

    Analysis and Enhancement of Spatial Sound Scenes Recorded using Ad-Hoc Microphone Arrays

    Ad-hoc microphone arrays formed from the microphones of mobile devices such as smart phones, tablets and notebooks are emerging recording platforms for meetings, press conferences and other sound scenes. As opposed to Wireless Acoustic Sensor Networks (WASNs), ad-hoc microphones do not communicate within the array and the location of each microphone is unknown. Analysing speech signals and the acoustic scene in the context of ad-hoc microphones is the goal of this thesis. Unlike conventional microphone arrays of known geometry (e.g. a Uniform Linear Array), ad-hoc arrays do not have fixed geometries and structures, and therefore standard speech processing techniques such as beamforming and dereverberation cannot be directly applied to them. The main reasons for this include the unknown distances between microphones, and hence unknown relative time delays, and the changeable array topology. This thesis focuses on utilising the side information obtained by acoustic scene analysis to improve speech enhancement by ad-hoc microphone arrays randomly distributed within a reverberant environment. New discriminative features are proposed, applied and tested for various signal and audio processing applications such as microphone clustering, source localisation, multi-channel dereverberation, source counting and multi-talk detection. The main contributions of this thesis fall into two categories: 1) novel spatial features extracted from Room Impulse Responses (RIRs) and speech signals, and 2) speech enhancement and acoustic scene analysis methods specifically designed for ad-hoc arrays. Microphone clustering, source localisation, speech enhancement, source counting and multi-talk detection in the context of ad-hoc arrays are investigated in this thesis and novel methods are proposed and tested.
A clustered speech enhancement and dereverberation method tailored for ad-hoc microphones is proposed, and it is concluded that exclusively using a cluster of microphones located closer to the source improves the dereverberation performance. Also proposed is a multi-channel speech dereverberation method based on a novel spatial multi-channel linear prediction analysis approach for ad-hoc microphones. The spatially modified multi-channel linear prediction approach takes into account the estimated relative distances between the source and the microphones and improves the dereverberation performance. Coherence-based features are applied to multi-talk detection and source counting in highly reverberant environments, and it is shown that the proposed features are reliable source counting features in the context of ad-hoc microphones. Highly accurate offline source counting and pseudo real-time multi-talk detection results are achieved by the proposed methods.

    Clustered multi-channel dereverberation for ad hoc microphone arrays

    This paper proposes a novel unsupervised multi-channel dereverberation approach for ad-hoc microphone arrays. It removes microphones with a relatively higher level of reverberation from the array and applies the dereverberation method to the subset of microphones with a lower level of reverberation. The approach does not require any prior information about the number of microphones or their relative locations; instead, microphones located close to the active source are detected from the kurtosis of their Linear Prediction (LP) residual signals and used for the dereverberation process. The proposed method is a clustered enhancement method which can be applied with any dereverberation algorithm. It is not dependent on the recording setup, so it requires no predefined threshold and can be applied to unknown rooms with unseen speakers. Dereverberation results suggest that, regardless of the applied dereverberation method, using a deliberately chosen subset of microphones always yields better dereverberation results than blind use of all microphones.
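
The kurtosis cue described in this abstract can be illustrated with a small, standard-library-only sketch. It is not the paper's implementation: the synthetic "speech" (a pulse train through a two-pole resonator), the two random exponentially decaying RIRs, the LP order of 10 and all function names are assumptions made for illustration. In particular, `select_microphones` splits the microphones at the mean kurtosis, which is a simplification and not the clustering used by the authors.

```python
import math
import random


def lpc(x, order):
    """Levinson-Durbin recursion; returns a[0..order] with a[0] = 1."""
    r = [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(order + 1)]
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # nothing left to predict
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k] + [0.0] * (order - i)
        err *= 1.0 - k * k
    return a


def lp_residual(x, order=10):
    """Prediction error e[n] = sum_k a[k] * x[n - k]."""
    a = lpc(x, order)
    return [sum(a[k] * x[n - k] for k in range(order + 1)) for n in range(order, len(x))]


def kurtosis(x):
    """Non-excess kurtosis m4 / m2^2 (equals 3 for Gaussian data)."""
    mean = sum(x) / len(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    return m4 / (m2 * m2)


def convolve(x, h):
    """Convolution of x with h, truncated to len(x)."""
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h[: len(x) - n]):
            y[n + m] += xn * hm
    return y


def simulate_mics(seed=0, length=1600, period=80):
    """Hypothetical scene: one near and one far microphone (assumed RIRs)."""
    rng = random.Random(seed)
    speech = [0.0] * length
    for n in range(length):  # pulse train through a two-pole resonator
        prev1 = speech[n - 1] if n >= 1 else 0.0
        prev2 = speech[n - 2] if n >= 2 else 0.0
        speech[n] = (1.0 if n % period == 0 else 0.0) + 1.3 * prev1 - 0.6 * prev2

    def rir(direct, tail_gain):
        tail = [tail_gain * rng.gauss(0, 1) * math.exp(-n / 100) for n in range(1, 400)]
        return [direct] + tail

    return {"near": convolve(speech, rir(1.0, 0.05)),
            "far": convolve(speech, rir(0.3, 0.3))}


def select_microphones(kurt_by_mic):
    """Keep microphones at or above the mean kurtosis (illustrative rule only)."""
    mean = sum(kurt_by_mic.values()) / len(kurt_by_mic)
    return [name for name, k in kurt_by_mic.items() if k >= mean]
```

Calling `kurtosis(lp_residual(signal))` for each signal returned by `simulate_mics()` gives one score per microphone; the less reverberant signal keeps a peakier residual and therefore scores higher, which is the property the selection relies on.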

    Multi-Channel Compression and Coding of Reverberant Ad-Hoc Recordings Through Spatial Autoregressive Modelling

    Autoregressive modelling techniques such as multi-channel linear prediction are widely used for applications such as coding, dereverberation and compression of speech signals. State-of-the-art multi-channel linear prediction methods do not take into account the locations of the microphones and assume compact microphone arrays at a single distance from the source. In this paper, a spatially modified multi-channel autoregressive compression and coding method is proposed and successfully tested in order to adapt the standard multi-channel method to virtual reality and immersive video conferencing applications, where the microphones can be meters away from each other. The proposed method estimates the spatial distances between each microphone and the source to optimise the joint compression of the signals recorded within a wide area. The results suggest that the proposed method outperforms standard multi-channel compression and coding when applied to ad-hoc scenarios.

    Informed source location and DOA estimation using acoustic room impulse response parameters

    This paper investigates sound source localisation using an ad-hoc microphone array, where the microphones are arbitrarily distributed in a room without knowledge of their locations. The proposed method is applicable to scenarios where microphone positions do not comply with conventional microphone arrays such as a Uniform Linear Array (ULA) and large aperture microphone panels. The proposed method utilizes the amplitude attenuation of the Room Impulse Responses (RIRs) and Time Difference Of Arrival (TDOA) cues to fit two surfaces across the room and estimate the location of the source. The areas with the least TDOA and the highest direct path impulse amplitude are highlighted as the source areas. Also proposed is an approach to forming subsets of microphones that are chosen such that they can localise the source with the highest accuracy when compared to using all microphones in the room. The approach is investigated for various numbers of source positions and with additive noise. Results show that the localisation accuracy is increased by utilizing the formed subset of microphones.
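
The TDOA cue mentioned in this abstract can be illustrated with a generic cross-correlation estimator. This is a minimal sketch and not necessarily the estimator used in the paper; the function names, the white-noise source and the sample delays are assumptions chosen for illustration, and the surface-fitting step is not shown.

```python
import random


def cross_correlation(x, y, lag):
    """Sum of x[n] * y[n + lag] over the overlapping samples."""
    if lag >= 0:
        return sum(a * b for a, b in zip(x, y[lag:]))
    return sum(a * b for a, b in zip(x[-lag:], y))


def estimate_tdoa(x, y, max_lag):
    """Lag in samples that best aligns y with x; positive means y arrives later."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(x, y, lag))


def delayed_copy(x, delay, gain):
    """Hypothetical direct-path-only microphone signal: delayed, attenuated source."""
    return [0.0] * delay + [gain * v for v in x[: len(x) - delay]]


rng = random.Random(1)
source = [rng.gauss(0, 1) for _ in range(500)]
mic_a = delayed_copy(source, delay=3, gain=0.9)   # closer to the source
mic_b = delayed_copy(source, delay=10, gain=0.5)  # further away, more attenuated
tdoa_ab = estimate_tdoa(mic_a, mic_b, max_lag=20)
```

In this toy setup the microphone with the smaller delay also has the larger direct-path amplitude, which mirrors the two cues (least TDOA, highest direct-path amplitude) that the abstract associates with the source area.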

    Blind speaker counting in highly reverberant environments by clustering coherence features

    This paper proposes the use of the frequency-domain Magnitude Squared Coherence (MSC) between two ad-hoc recordings of speech as a reliable speaker discrimination feature for source counting applications in highly reverberant environments. The proposed source counting method does not require knowledge of the microphone spacing and does not assume any relative distance between the sources and the microphones. Source counting is based on clustering the frequency-domain MSC of the speech signals derived from short time segments. Experiments show that the frequency-domain MSC is speaker-dependent and the method was successfully used to obtain highly accurate source counting results for up to six active speakers for varying levels of reverberation and microphone spacing.
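
For readers unfamiliar with the MSC feature named in this abstract, the sketch below computes a Welch-style estimate, |Pxy|^2 / (Pxx Pyy) per frequency bin, using only the standard library. The segment length, hop size, Hann window and function names are assumptions, and the clustering step that performs the actual source counting is not shown. In practice a library routine such as `scipy.signal.coherence` would normally be used instead of the naive DFT here.

```python
import cmath
import math


def half_spectrum(frame):
    """Naive DFT of a real frame, keeping bins 0..N/2."""
    n = len(frame)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def msc(x, y, nperseg=64, hop=32):
    """Magnitude squared coherence averaged over overlapping Hann-windowed segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / nperseg) for t in range(nperseg)]
    nbins = nperseg // 2 + 1
    pxx, pyy, pxy = [0.0] * nbins, [0.0] * nbins, [0j] * nbins
    for start in range(0, min(len(x), len(y)) - nperseg + 1, hop):
        fx = half_spectrum([w * v for w, v in zip(window, x[start:start + nperseg])])
        fy = half_spectrum([w * v for w, v in zip(window, y[start:start + nperseg])])
        for k in range(nbins):
            pxx[k] += abs(fx[k]) ** 2                 # auto-spectrum of x
            pyy[k] += abs(fy[k]) ** 2                 # auto-spectrum of y
            pxy[k] += fx[k].conjugate() * fy[k]       # cross-spectrum
    return [
        abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] * pyy[k] > 0 else 0.0
        for k in range(nbins)
    ]
```

Each call returns one value per frequency bin in the range 0 to 1. Two identical recordings give values of 1, while two unrelated noise recordings give values close to 0 once enough segments are averaged.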

    Coherence Based Source Counter: v1.0.1

    Added a verbose option to suppress rir_generator output to the command window and included the missing tightPlots package.

    A Survey on Ad Hoc Signal Processing: Applications, Challenges and State-of-the-Art Techniques

    © 2019 IEEE. In an era of ubiquitous digital devices with built-in microphones and recording capability, distributed microphone arrays of a few digital recording devices are the emerging recording tool in hands-free speech communications and immersive meetings. Such so-called ad hoc microphone arrays can facilitate high-quality spontaneous recording experiences for a wide range of applications and scenarios, though critical challenges have limited their applications. These challenges include unknown and changeable positions of the recording devices and sound sources, resulting in varying time delays of arrival between microphones in the ad hoc array as well as varying recorded sound power levels. This paper reviews state-of-the-art techniques to overcome these issues and provides insight into possible ways to make existing methods more effective and flexible. The focus of this paper is on scenarios in which the microphones are arbitrarily located in an acoustic scene and do not communicate directly or through a fusion centre

    Forming ad-hoc microphone arrays through clustering of acoustic room impulse responses

    This paper investigates the formation of ad-hoc microphone arrays for the purpose of recording multiple sound sources by clustering microphones spatially distributed within a room. A novel codebook-based unsupervised method for cluster formation using features derived from the Room Impulse Responses (RIRs) corresponding to each microphone is proposed and compared with baseline clustering and classification methods. The features correspond to the sequence of arrival times and time delays of echoes, as estimated from the peaks of the RIRs, along with the peak amplitudes. Results suggest that the proposed codebook-based clustering algorithm can outperform the KNN supervised classification method and the k-means unsupervised clustering method when applied to microphone segmentation and clustering, in terms of clustering success rate and noise robustness.
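
The kind of RIR feature described in this abstract (arrival time, echo delays and peak amplitudes) can be sketched with a simple peak picker. This is an assumed, simplified extraction for illustration only: the function name, the number of peaks kept and the relative amplitude threshold are hypothetical, and the codebook-based clustering itself is not shown.

```python
def rir_peak_features(rir, num_peaks=3, min_rel_amp=0.1):
    """Arrival index of the first peak, delays of later peaks, and peak amplitudes."""
    mags = [abs(v) for v in rir]
    floor = min_rel_amp * max(mags)
    padded = [0.0] + mags + [0.0]  # lets the first and last samples qualify as peaks
    peaks = [
        n for n in range(len(mags))
        if mags[n] >= floor
        and padded[n + 1] >= padded[n]      # not smaller than the previous sample
        and padded[n + 1] > padded[n + 2]   # strictly larger than the next sample
    ][:num_peaks]
    if not peaks:
        raise ValueError("no peaks found in the impulse response")
    return {
        "arrival": peaks[0],
        "delays": [n - peaks[0] for n in peaks[1:]],
        "amplitudes": [mags[n] for n in peaks],
    }


# Synthetic impulse response: direct path plus two echoes and one negligible late echo.
toy_rir = [0.0] * 200
toy_rir[20], toy_rir[45], toy_rir[70], toy_rir[120] = 1.0, 0.6, -0.4, 0.05
features = rir_peak_features(toy_rir)
```

Feature vectors of this form, one per microphone, are what a clustering method could then group; microphones near the same source would be expected to share similar arrival times and echo patterns.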

    Spatial multi-channel linear prediction for dereverberation of ad-hoc microphones

    A spatially modified multi-channel linear prediction analysis is proposed and tested for the dereverberation of ad-hoc microphone arrays. The proposed spatial multi-channel linear prediction takes into account the estimated spatial distances between each microphone and the source and is applied for short-term dereverberation (pre-whitening). Delayed linear prediction is then applied to suppress the late reverberation. Results suggest that the proposed method outperforms standard linear prediction based methods when applied to ad-hoc microphones. It is also concluded that the kurtosis of the linear prediction residual signal is a reliable distance feature when the microphone gains are inconsistent and the source energy levels vary.