The FlySpeech Audio-Visual Speaker Diarization System for MISP Challenge 2022

Li, Yue; Pang, Bowen; Rao, Wei; Wang, Hongji; Wang, Qing; Wang, Yannan; Xie, Lei; Zhang, Li; Zhao, Huan

Publication date: 28 July 2023

Abstract

This paper describes the FlySpeech speaker diarization system submitted to the second Multimodal Information Based Speech Processing (MISP) Challenge held at ICASSP 2022. We develop an end-to-end audio-visual speaker diarization (AVSD) system, which consists of a lip encoder, a speaker encoder, and an audio-visual decoder. Specifically, to mitigate the degradation of diarization performance caused by separate training, we jointly train the speaker encoder and the audio-visual decoder. In addition, we use a speaker extractor pretrained on large-scale data to initialize the speaker encoder.

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2307.15400

Last time updated on 04/08/2023