
    Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

    Audio-visual learning has demonstrated promising results in many classical speech tasks (e.g., speech separation, automatic speech recognition, wake-word spotting). We believe that introducing the visual modality will also benefit speaker diarization. To date, Target-Speaker Voice Activity Detection (TS-VAD) plays an important role in highly accurate speaker diarization. However, previous TS-VAD models take audio features and use the speaker's acoustic footprint to distinguish his or her personal speech activities, an approach that is easily affected by overlapped speech in multi-speaker scenarios. Although visual information naturally tolerates overlapped speech, it suffers from spatial occlusion, low resolution, etc. This potential modality-missing problem hinders extending TS-VAD to an audio-visual approach. This paper proposes a novel Multi-Input Multi-Output Target-Speaker Voice Activity Detection (MIMO-TSVAD) framework for speaker diarization. The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework. Experimental results show that the MIMO-TSVAD framework achieves state-of-the-art performance on the VoxConverse, DIHARD-III, and MISP 2022 datasets under the corresponding evaluation metrics, obtaining Diarization Error Rates (DERs) of 4.18%, 10.10%, and 8.15%, respectively. In addition, it performs robustly in heavy lip-missing scenarios. Comment: Under review at IEEE/ACM Transactions on Audio, Speech, and Language Processing

    Eigenvalues of the Laplacian on Riemannian manifolds

    For a bounded domain $\Omega$ with a piecewise smooth boundary in a complete Riemannian manifold $M$, we study eigenvalues of the Dirichlet eigenvalue problem of the Laplacian. By making use of the fact that eigenfunctions form an orthonormal basis of $L^2(\Omega)$ in place of the Rayleigh-Ritz formula, we obtain inequalities for eigenvalues of the Laplacian. In particular, for lower order eigenvalues, our results extend the results of Chen and Cheng \cite{CC}. Comment: 17 pages