
    Joint optimization of diffusion probabilistic-based multichannel speech enhancement with far-field speaker verification

    Today's smart devices that use speaker verification are increasingly equipped with multiple microphones, resulting in improved spatial ambiguity and directivity. However, unlike other speech-based applications, the performance of speaker verification degrades in far-field scenarios due to the adverse effects of a noisy environment and room reverberation. This paper presents a novel multichannel speech enhancement module based on the diffusion probabilistic model. It is used as the front-end of the ECAPA-TDNN speaker verification system in far-field scenarios under a noisy-reverberant environment. The proposed system incorporates a two-stage training approach. In the first stage, the speech enhancement and speaker verification modules are trained individually. In the second stage, the two modules are combined and trained jointly. We use a similarity-preserving knowledge distillation loss that guides the network to produce activations for enhanced signals similar to those for clean speech signals. Joint optimization with the knowledge distillation loss achieved the best performance both on the evaluation set composed of synthetic clips similar to those used in training and on unseen recorded clips from the VOiCES dataset.
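    The abstract does not spell out the loss, so the following is only a minimal sketch of one common formulation of similarity-preserving knowledge distillation (the pairwise-similarity form usually attributed to Tung and Mori), not the authors' implementation. It assumes that activations for a batch of enhanced and clean utterances have already been extracted from the same layer of the verification network and flattened to vectors. The function names `sp_kd_loss` and `_normalized_gram` are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def _normalized_gram(acts):
    """Pairwise similarity (Gram) matrix of a batch, with each row L2-normalized.

    `acts` is a list of b flattened activation vectors, one per utterance.
    """
    gram = [[sum(a * b for a, b in zip(x, y)) for y in acts] for x in acts]
    normalized = []
    for row in gram:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard all-zero rows
        normalized.append([v / norm for v in row])
    return normalized


def sp_kd_loss(enhanced_acts, clean_acts):
    """Assumed similarity-preserving loss: squared Frobenius distance between
    the two row-normalized similarity matrices, divided by b**2."""
    b = len(enhanced_acts)
    if b == 0 or b != len(clean_acts):
        raise ValueError("both batches must be non-empty and the same size")
    g_enh = _normalized_gram(enhanced_acts)
    g_cln = _normalized_gram(clean_acts)
    squared_diff = sum(
        (g_enh[i][j] - g_cln[i][j]) ** 2 for i in range(b) for j in range(b)
    )
    return squared_diff / (b * b)


if __name__ == "__main__":
    clean = [[1.0, 0.0], [0.0, 1.0]]      # two clearly distinct utterances
    enhanced = [[1.0, 0.0], [1.0, 0.0]]   # enhancement collapsed them together
    print(sp_kd_loss(clean, clean))       # identical similarity structure
    print(sp_kd_loss(enhanced, clean))    # mismatched similarity structure
```

    Because the loss compares similarity patterns across the batch rather than the activations themselves, the enhanced and clean activations only need to share a batch size, not a feature dimension. In a joint-training setup of the kind the abstract describes, a term like this would presumably be added to the speaker verification loss with a weighting factor, which the abstract does not specify.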