3,538 research outputs found
Robust overlapping speech recognition based on neural networks
We address issues for improving hands-free speech recognition performance in the presence of multiple simultaneous speakers using multiple distant microphones. In this paper, a log spectral mapping is proposed to estimate the log mel-filterbank outputs of clean speech from multiple noisy speech using neural networks. Both the mapping of the far-field speech and combination of the enhanced speech and the estimated interfering speech are investigated. Our neural network based feature enhancement method incorporates the noise information and can be viewed as a non-linear log spectral subtraction. Experimental studies on MONC corpus showed that MLP-based mapping techniques yields a improvement in the recognition accuracy for the overlapping speech
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
The CHiME challenges have played a significant role in the development and
evaluation of robust automatic speech recognition (ASR) systems. We introduce
the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task
comprises joint ASR and diarization in far-field settings with multiple, and
possibly heterogeneous, recording devices. Different from previous challenges,
we evaluate systems on 3 diverse scenarios: CHiME-6, DiPCo, and Mixer 6. The
goal is for participants to devise a single system that can generalize across
different array geometries and use cases with no a-priori information. Another
departure from earlier CHiME iterations is that participants are allowed to use
open-source pre-trained models and datasets. In this paper, we describe the
challenge design, motivation, and fundamental research questions in detail. We
also present the baseline system, which is fully array-topology agnostic and
features multi-channel diarization, channel selection, guided source separation
and a robust ASR model that leverages self-supervised speech representations
(SSLR)
- …