
    Data-driven Speech Enhancement: from Non-negative Matrix Factorization to Deep Representation Learning


    Spectral Domain Speech Enhancement Using HMM State-Dependent Super-Gaussian Priors


    Light Gated Recurrent Units for Speech Recognition

    A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech recognizers often employ acoustic models based on Recurrent Neural Networks (RNNs), which are naturally able to exploit large time contexts and long-term speech modulations. It is thus of great interest to continue the study of proper techniques for improving the effectiveness of RNNs in processing speech signals. In this paper, we revise one of the most popular RNN models, namely Gated Recurrent Units (GRUs), and propose a simplified architecture that turns out to be very effective for ASR. The contribution of this work is two-fold: First, we analyze the role played by the reset gate, showing that a significant redundancy with the update gate occurs. As a result, we propose to remove the former from the GRU design, leading to a more efficient and compact single-gate model. Second, we propose to replace the hyperbolic tangent with ReLU activations. This variation couples well with batch normalization and could help the model learn long-term dependencies without numerical issues. Results show that the proposed architecture, called Light GRU (Li-GRU), not only reduces the per-epoch training time by more than 30% over a standard GRU, but also consistently improves the recognition accuracy across different tasks, input features, noisy conditions, as well as across different ASR paradigms, ranging from standard DNN-HMM speech recognizers to end-to-end CTC models.

    Comment: Copyright 2018 IEEE
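    The single-gate recurrence described in the abstract can be sketched in a few lines. This is an illustrative sketch only: the weight names are invented, and the batch normalization that the paper applies to the feed-forward connections is omitted for brevity.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def li_gru_step(x, h_prev, Wz, Uz, Wh, Uh):
        """One Li-GRU step: single update gate, ReLU candidate.

        The reset gate of the standard GRU is removed; the candidate
        state uses ReLU instead of tanh. Weight names (Wz, Uz, Wh, Uh)
        are illustrative, and batch normalization is omitted.
        """
        z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
        h_cand = np.maximum(0.0, Wh @ x + Uh @ h_prev)  # ReLU candidate
        return z * h_prev + (1.0 - z) * h_cand          # interpolated state

    # toy usage: 4-dim input, 3-dim hidden state, 5 random frames
    rng = np.random.default_rng(0)
    n_in, n_h = 4, 3
    Wz, Wh = rng.standard_normal((n_h, n_in)), rng.standard_normal((n_h, n_in))
    Uz, Uh = rng.standard_normal((n_h, n_h)), rng.standard_normal((n_h, n_h))
    h = np.zeros(n_h)
    for _ in range(5):
        h = li_gru_step(rng.standard_normal(n_in), h, Wz, Uz, Wh, Uh)
    print(h.shape)  # (3,)
    ```

    Because the candidate is non-negative and the update gate interpolates between it and the previous state, the recurrence stays numerically tame while avoiding the saturating tanh, which is what lets it pair well with batch normalization.
    
    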

    Adaptive Hidden Markov Noise Modelling for Speech Enhancement

    A robust and reliable noise estimation algorithm is required in many speech enhancement systems. The aim of this thesis is to propose and evaluate a robust noise estimation algorithm for highly non-stationary noisy environments. In this work, we model the non-stationary noise using a set of discrete states, with each state representing a distinct noise power spectrum. In this approach, the state sequence over time is conveniently represented by a Hidden Markov Model (HMM). We first present an online HMM re-estimation framework that models time-varying noise and tracks changes in its characteristics during the absence of speech via a sequential model update procedure. In addition, the algorithm will, when necessary, create new model states to represent novel noise spectra and will merge existing states that have similar characteristics. We then extend our work to robust noise estimation during speech activity by incorporating a speech model into our existing noise model. The noise characteristics within each state are updated based on a speech presence probability derived from a modified Minima-Controlled Recursive Averaging (MCRA) method. We have demonstrated the effectiveness of our noise HMM in tracking both stationary and highly non-stationary noise, and shown that it gives improved performance over other conventional noise estimation methods when incorporated into a standard speech enhancement algorithm.
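    A minimal sequential update of such a noise HMM might look like the following sketch. The log-spectral likelihood, the fixed smoothing factor, and the hard assignment of the update to the most likely state are all assumptions for illustration; the thesis's state creation/merging and speech-presence weighting are not reproduced here.

    ```python
    import numpy as np

    def update_noise_hmm(frame_psd, state_psds, trans, probs, alpha=0.9):
        """One sequential update of a noise-tracking HMM (illustrative).

        frame_psd:  (F,) observed power spectrum of the current frame
        state_psds: (K, F) one noise power spectrum per HMM state
        trans:      (K, K) state transition matrix
        probs:      (K,) previous state posterior
        alpha:      smoothing factor for the spectrum update (assumed)
        """
        prior = trans.T @ probs  # propagate posterior through transitions
        # crude log-spectral distance likelihood (an assumption, not the
        # thesis's exact observation model)
        d = np.sum((np.log(frame_psd) - np.log(state_psds)) ** 2, axis=1)
        like = np.exp(-0.5 * d)
        post = prior * like
        post /= post.sum()
        # recursively smooth the most likely state's spectrum toward the frame
        k = int(np.argmax(post))
        state_psds[k] = alpha * state_psds[k] + (1 - alpha) * frame_psd
        return state_psds, post

    # toy usage: two states (quiet vs. loud noise), four frequency bins
    K, F = 2, 4
    state_psds = np.ones((K, F)) * np.array([[1.0], [10.0]])
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])
    probs = np.array([0.5, 0.5])
    frame = np.full(F, 1.2)  # frame close to the quiet state
    state_psds, post = update_noise_hmm(frame, state_psds, trans, probs)
    ```

    After the update, the posterior concentrates on the quiet state, and only that state's spectrum drifts toward the observed frame, which is the core idea behind tracking distinct noise types with separate states.
    
    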

    A novel and robust parameter training approach for HMMs under noisy and partial access to states

    This paper proposes a new estimation algorithm for the parameters of an HMM so as to best account for the observed data. In this model, in addition to the observation sequence, we have partial and noisy access to the hidden state sequence as side information. This access can be seen as "partial labeling" of the hidden states. Furthermore, we model possible mislabeling in the side information in a joint framework and derive the corresponding EM updates accordingly. In our simulations, we observe that using this side information considerably improves the state recognition performance, by up to 70%, with respect to the "achievable margin" defined by the baseline algorithms. Moreover, our algorithm is shown to be robust to the training conditions. (C) 2013 Elsevier B.V. All rights reserved.
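    One simple way to realize "partial labeling" inside a standard forward-backward E-step is to weight each frame's observation likelihoods by a label model with an assumed mislabeling probability. Everything below (the function name, the `eps` parameter, the uniform-error label model) is an illustrative assumption, not the paper's exact joint derivation.

    ```python
    import numpy as np

    def label_weighted_likelihoods(obs_like, labels, n_states, eps=0.2):
        """Fold noisy partial state labels into HMM frame likelihoods.

        obs_like: (T, K) observation likelihoods p(o_t | state k)
        labels:   length-T sequence; labels[t] is a state index, or None
                  when the frame is unlabeled
        eps:      assumed mislabeling probability (illustrative)

        Labeled frames are multiplied by p(label | state):
        1 - eps if the label matches the state, else eps / (K - 1).
        The result can be fed into the usual forward-backward E-step.
        """
        out = obs_like.copy()
        for t, lab in enumerate(labels):
            if lab is None:
                continue  # unlabeled frame: likelihoods unchanged
            w = np.full(n_states, eps / (n_states - 1))
            w[lab] = 1.0 - eps
            out[t] *= w
        return out

    # toy usage: 3 frames, 2 states; frame 0 labeled as state 0,
    # frame 1 unlabeled, frame 2 labeled as state 1
    weighted = label_weighted_likelihoods(np.ones((3, 2)), [0, None, 1], 2)
    ```

    Because the weighting is just another factor in the per-frame likelihood, the EM machinery itself is unchanged; only the E-step statistics shift toward (but are not forced to agree with) the noisy labels.
    
    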