54 research outputs found

Leveraging speaker diarization for meeting recognition from distant microphones

Authors: Andreas Stolcke, David Imseng, Gerald Friedland
Publication date: 01/01/2010

We investigate using state-of-the-art speaker diarization output for speech recognition purposes. While it seems obvious that speech recognition could benefit from the output of speaker diarization ("Who spoke when") for effective feature normalization and model adaptation, such benefits have remained elusive in the very challenging domain of meeting recognition from distant microphones. In this study, we show that recognition gains are possible by careful postprocessing of the diarization output. Still, recognition accuracy may suffer when the underlying diarization system performs worse than expected, even compared to far less sophisticated speaker-clustering techniques. We obtain a more accurate and robust overall system by combining recognition output with multiple speaker segmentations and clusterings. We evaluate our methods on data from the 2009 NIST Rich Transcription meeting recognition evaluation.

CiteSeerX

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

Authors: Xuankai Chang, Samuele Cornell, Paola Garcia, Sanjeev Khudanpur, Matthew Maciejewski, Yoshiki Masuyama, Desh Raj, Stefano Squartini, Zhong-Qiu Wang, Shinji Watanabe, Matthew Wiesner
Publication date: 14/07/2023

The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Unlike previous challenges, we evaluate systems on three diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The goal is for participants to devise a single system that can generalize across different array geometries and use cases with no a priori information. Another departure from earlier CHiME iterations is that participants are allowed to use open-source pre-trained models and datasets. In this paper, we describe the challenge design, motivation, and fundamental research questions in detail. We also present the baseline system, which is fully array-topology agnostic and features multi-channel diarization, channel selection, guided source separation, and a robust ASR model that leverages self-supervised speech representations (SSLR).

arXiv.org e-Print Archive

Overlapped Speech Detection in Multi-Party Meetings

Authors: Thaw Mie Mie, Zaw Thein Htay
Publication venue: International Journal of Computer Engineering and Applications
Publication date: 29/07/2020

Detecting simultaneous speech in meeting recordings is difficult because of both the complexity of the meeting itself and the surrounding environment. The proposed system uses gammatone-like spectrogram-based linear predictor coefficients, computed on distant-microphone channel data, for overlap detection. Model performance is assessed on the Augmented Multiparty Interaction (AMI) meeting corpus. The proposed system improves on baseline feature-set models for classification.

International Journal of Computer (IJC - Global Society of Scientific Research and Researchers, GSSRR)