1,981 research outputs found

    Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework

    Full text link
    This paper proposes a novel joint multi-speaker tracking-and-separation method based on the generalized labeled multi-Bernoulli (GLMB) multi-target tracking filter, using sound mixtures recorded by microphones. Standard multi-speaker tracking algorithms usually only track speaker locations, and ambiguity occurs when speakers are spatially close. The proposed multi-feature GLMB tracking filter treats the set of vectors of associated speaker features (location, pitch and sound) as the multi-target multi-feature observation, characterizes transitioning features with corresponding transition models and overall likelihood function, thus jointly tracks and separates each multi-feature speaker, and addresses the spatial ambiguity problem. Numerical evaluation verifies that the proposed method can correctly track locations of multiple speakers and meanwhile separate speech signals

    Speaker Diarization Based on Intensity Channel Contribution

    Get PDF
    The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC) based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement in our development data, a 3.6% DER relative improvement in the RT07 evaluation data and a 7.9% DER relative improvement in the last year's RT09 evaluation data

    Statistically Efficient Methods for Pitch and DOA Estimation

    Get PDF

    Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

    Get PDF
    The thesis investigates the challenges of speaker localization in presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods

    Joint Pitch and DOA Estimation Using the ESPRIT method

    Get PDF

    ‘Did the speaker change?’: Temporal tracking for overlapping speaker segmentation in multi-speaker scenarios

    Get PDF
    Diarization systems are an essential part of many speech processing applications, such as speaker indexing, improving automatic speech recognition (ASR) performance and making single speaker-based algorithms available for use in multi-speaker domains. This thesis will focus on the first task of the diarization process, that being the task of speaker segmentation which can be thought of as trying to answer the question ‘Did the speaker change?’ in an audio recording. This thesis starts by showing that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. It is then highlighted that an individual’s pitch is smoothly varying and, therefore, can be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, then this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this approach of pitch prediction for speaker change detection. This thesis then goes on to demonstrate how voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker’s utterance in the presence of an additional active speaker. This thesis then extends this work to explore the use of a new multimodal approach for overlapping speaker segmentation that tracks both the fundamental frequency (F0) and direction of arrival (DoA) of each speaker simultaneously. The proposed multiple hypothesis tracking system, which simultaneously tracks both features, shows an improvement in segmentation performance when compared to tracking these features separately. Lastly, this thesis focuses on the DoA estimation part of the newly proposed multimodal approach. It does this by exploring a polynomial extension to the multiple signal classification (MUSIC) algorithm, spatio-spectral polynomial (SSP)-MUSIC, and evaluating its performance when using speech sound sources.Open Acces

    Nonlinear Least Squares Methods for Joint DOA and Pitch Estimation

    Get PDF
    • …
    corecore