3 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
Speech Recognition in noisy environment using Deep Learning Neural Network
Recent researches in the field of automatic speaker recognition have shown that methods based
on deep learning neural networks provide better performance than other statistical classifiers. On
the other hand, these methods usually require adjustment of a significant number of parameters.
The goal of this thesis is to show that selecting appropriate value of parameters can significantly
improve speaker recognition performance of methods based on deep learning neural networks.
The reported study introduces an approach to automatic speaker recognition based on deep
neural networks and the stochastic gradient descent algorithm. It particularly focuses on three
parameters of the stochastic gradient descent algorithm: the learning rate, and the hidden and
input layer dropout rates. Additional attention was devoted to the research question of speaker
recognition under noisy conditions.
Thus, two experiments were conducted in the scope of this thesis. The first experiment was
intended to demonstrate that the optimization of the observed parameters of the stochastic
gradient descent algorithm can improve speaker recognition performance under no presence of
noise. This experiment was conducted in two phases. In the first phase, the recognition rate is
observed when the hidden layer dropout rate and the learning rate are varied, while the input
layer dropout rate was constant. In the second phase of this experiment, the recognition rate is
observed when the input layers dropout rate and learning rate are varied, while the hidden layer
dropout rate was constant. The second experiment was intended to show that the optimization of
the observed parameters of the stochastic gradient descent algorithm can improve speaker
recognition performance even under noisy conditions. Thus, different noise levels were
artificially applied on the original speech signal
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among the other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate on approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts were based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks.Comment: PhD Thesis Unitn, 201