Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
Audio-visual learning has demonstrated promising results in many classical
speech tasks (e.g., speech separation, automatic speech recognition, wake-word
spotting). We believe that introducing the visual modality will also benefit
speaker diarization. To date, Target-Speaker Voice Activity Detection (TS-VAD)
has played an important role in highly accurate speaker diarization. However,
previous TS-VAD models take only audio features and rely on the speaker's
acoustic footprint to distinguish his or her personal speech activities, which
makes them easily affected by overlapped speech in multi-speaker scenarios.
Although visual information naturally tolerates overlapped speech, it suffers
from spatial occlusion, low resolution, etc. This potential modality-missing
problem hinders the extension of TS-VAD to an audio-visual approach.
This paper proposes a novel Multi-Input Multi-Output Target-Speaker Voice
Activity Detection (MIMO-TSVAD) framework for speaker diarization. The proposed
method can take audio-visual input and leverage the speaker's acoustic
footprint or lip track to flexibly conduct audio-based, video-based, and
audio-visual speaker diarization in a unified sequence-to-sequence framework.
Experimental results show that the MIMO-TSVAD framework achieves
state-of-the-art performance on the VoxConverse, DIHARD-III, and MISP 2022
datasets under the corresponding evaluation metrics, obtaining Diarization
Error Rates (DERs) of 4.18%, 10.10%, and 8.15%, respectively. In addition, it
can perform robustly in heavy lip-missing scenarios.
Comment: Under review of IEEE/ACM Transactions on Audio, Speech, and Language
Processing
Eigenvalues of the Laplacian on Riemannian manifolds
For a bounded domain $\Omega$ with a piecewise smooth boundary in a complete
Riemannian manifold $M$, we study eigenvalues of the Dirichlet eigenvalue
problem of the Laplacian. By making use of the fact that eigenfunctions form an
orthonormal basis of $L^2(\Omega)$ in place of the Rayleigh-Ritz formula, we
obtain inequalities for eigenvalues of the Laplacian. In particular, for lower
order eigenvalues, our results extend the results of Chen and Cheng \cite{CC}.
Comment: 17 pages