Far-Field Speaker Recognition

Author: Jin Qin
Pan Yue
Schultz Tanja
Publication date: 18/06/2008

KITopen

Far-Field Speaker Recognition

Author: Jin Qin
Schultz Tanja
Waibel Alex
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 05/06/2008

KITopen

The Sheffield Wargames Corpus.

Author: Fox C.W.
Hain T.
Liu Y.
Zwyssig E.
Publication date: 01/01/2013

Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business style meetings where people are mostly in a static position in a room. Only limited data is available that contains high quality near and far field data from real interactions between participants. In this paper we present a new corpus for research on speech recognition, speaker tracking and diarisation, based on recordings of native speakers of English playing a table-top wargame. The Sheffield Wargames Corpus comprises 7 hours of data from 10 recording sessions, obtained from 96 microphones, 3 video cameras and, most importantly, 3D location data provided by a sensor tracking system. The corpus represents a unique resource that provides for the first time location tracks (1.3Hz) of speakers that are constantly moving and talking. The corpus is available for research purposes, and includes annotated development and evaluation test sets. Baseline results for close-talking and far field sets are included in this paper.

CiteSeerX

Edinburgh Research Explorer

White Rose Research Online

Robust Far-Field Speaker Recognition under Mismatched Conditions

Author: Jin Qin
Schultz Tanja
Publication date: 04/08/2008

KITopen

The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

Author: Chen Hang
Chen Jingdong
Du Jun
Gao Jianqing
He Mao-Kui
Lee Chin-Hui
Liu Cong
Liu Diyuan
Pan Jia
Scharenborg Odette
Siniscalchi Sabato
Wang Zhe
Watanabe Shinji
Wu Shilong
Yin Baocai
Publication date: 11/03/2023

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD), aiming to solve "who spoke when" using both audio and visual data; 2) a novel audio-visual diarization and recognition (AVDR) task that focuses on addressing "who spoke what when" with audio-visual speaker diarization results. Both tracks focus on the Chinese language, and use far-field audio and video in real home-TV scenarios: 2-6 people communicating with each other with TV noise in the background. This paper introduces the dataset, track settings, and baselines of the MISP2022 challenge. Our analyses of experiments and examples indicate the good performance of the AVDR baseline system, and the potential difficulties in this challenge due to, e.g., the far-field video quality, the presence of TV noise in the background, and the indistinguishable speakers. Comment: 5 pages, 4 figures, to be published in ICASSP202

arXiv.org e-Print Archive