Environmentally robust ASR front-end for deep neural network acoustic models
This paper examines the individual and combined impacts of various front-end approaches on the performance of deep neural network (DNN) based speech recognition systems in distant talking situations, where acoustic environmental distortion degrades the recognition performance. Training of a DNN-based acoustic model consists of generation of state alignments followed by learning the network parameters. This paper first shows that the network parameters are more sensitive to the speech quality than the alignments and thus this stage requires improvement. Then, various front-end robustness approaches to addressing this problem are categorised based on functionality. The degree to which each class of approaches impacts the performance of DNN-based acoustic models is examined experimentally. Based on the results, a front-end processing pipeline is proposed for efficiently combining different classes of approaches. Using this front-end, the combined effects of different classes of approaches are further evaluated in a single distant microphone-based meeting transcription task with both speaker independent (SI) and speaker adaptive training (SAT) set-ups. By combining multiple speech enhancement results, multiple types of features, and feature transformation, the front-end shows relative performance gains of 7.24% and 9.83% in the SI and SAT scenarios, respectively, over competitive DNN-based systems using log mel-filter bank features. This is the final version of the article. It first appeared from Elsevier via http://dx.doi.org/10.1016/j.csl.2014.11.00
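As a reference point for the log mel-filter bank baseline features mentioned in this abstract, a minimal NumPy sketch might look as follows. This is an illustrative implementation only (the function name and all parameter defaults are assumptions, not the authors' code), and it omits toolkit details such as pre-emphasis and dithering:

```python
import numpy as np

def log_mel_filterbank(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Compute log mel-filter bank features, a common DNN front-end input.

    Minimal sketch; production toolkits add pre-emphasis, dithering and
    other framing details omitted here.
    """
    # Frame the signal with a Hann window and take the power spectrum.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i*hop:i*hop+n_fft] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

    # Build triangular mel filters between 0 Hz and the Nyquist frequency.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    return np.log(power @ fbank.T + 1e-10)   # shape: (n_frames, n_mels)

feats = log_mel_filterbank(np.random.randn(16000))   # 1 s of 16 kHz audio
print(feats.shape)   # (97, 40)
```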
Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews
In automatic speech recognition, little training data is often available for specific challenging tasks, yet training of state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, where a relative average reduction of the word error rate by 19.3% is achieved. Comment: Accepted for IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, July 201
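The noise-and-reverberation data augmentation this abstract describes can be sketched as below. This is a hedged illustration under assumed conventions, not the authors' pipeline: the `augment` helper, the decaying-exponential room impulse response (RIR), and the white-noise source are all synthetic stand-ins.

```python
import numpy as np

def augment(clean, noise, rir, snr_db):
    """Simulate a degraded recording: reverberate the clean speech with a
    room impulse response (RIR), then add noise at a target SNR.
    Illustrative sketch; real augmentation would draw measured RIRs and
    recorded noise from a corpus.
    """
    # Reverberation: convolve the clean speech with the RIR.
    reverb = np.convolve(clean, rir)[:len(clean)]

    # Scale the noise so the mixture has the requested SNR in dB.
    speech_pow = np.mean(reverb ** 2)
    noise_pow = np.mean(noise[:len(reverb)] ** 2)
    gain = np.sqrt(speech_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverb + gain * noise[:len(reverb)]

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                               # stand-in speech
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800) # toy decaying RIR
noisy = augment(clean, rng.standard_normal(16000), rir, snr_db=10)
print(noisy.shape)   # (16000,)
```

Transfer learning then forms the second stage: a model pre-trained on the augmented large corpus is fine-tuned on the small in-domain data.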
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.
Auditory processing-based features for improving speech recognition in adverse acoustic conditions
n/a
Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
We investigate the use of generative adversarial networks (GANs) for speech dereverberation in robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but their ability to perform speech dereverberation has not yet been examined, and the advantages of using GANs have not been fully established. In this paper, we provide a deep investigation of GAN-based dereverberation front-ends for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM yields a significant improvement over feed-forward DNNs and CNNs on our dataset. Second, adding residual connections to the deep LSTMs boosts performance further. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch of data during training. Moreover, using the reverberant spectrogram as a condition for the discriminator, as suggested in previous studies, may degrade performance. In summary, our GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction over the baseline DNN dereverberation network when tested with a strong multi-condition trained acoustic model. Comment: Interspeech 201