Towards Structured Deep Neural Network for Automatic Speech Recognition
In this paper we propose the Structured Deep Neural Network (Structured DNN),
a structured and deep learning algorithm that learns to find the best
structured object (such as a label sequence) given a structured input (such as
a vector sequence) by globally considering the mapping relationships between
the structures, rather than item by item.
When automatic speech recognition (ASR) is viewed as a special case of such a
structured learning problem, with the acoustic vector sequence as the input
and the phoneme label sequence as the output, the model can learn
comprehensively utterance by utterance as a whole, rather than frame by
frame.
The Structured Support Vector Machine (structured SVM) was previously proposed
to perform ASR with structured learning, but it is limited by the linear
nature of the SVM. Here we propose the structured DNN, which applies nonlinear
transformations across multiple layers as a structured and deep learning
algorithm. In preliminary experiments on TIMIT, it was shown to outperform
the structured SVM.
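The global scoring idea above can be sketched in a few lines. The following is a minimal toy illustration, not the paper's actual model: all dimensions, weights, and the particular decomposition into per-frame network scores plus label-transition scores are assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only).
N_FRAMES, FEAT_DIM, N_LABELS, HIDDEN = 5, 8, 3, 16

# A tiny two-layer (nonlinear) network that scores one (frame, label) pair;
# summing these plus transition scores yields one global sequence score.
W1 = rng.normal(size=(FEAT_DIM + N_LABELS, HIDDEN)) * 0.1
W2 = rng.normal(size=(HIDDEN, 1)) * 0.1
trans = rng.normal(size=(N_LABELS, N_LABELS)) * 0.1

def sequence_score(frames, labels):
    """Score a whole label sequence globally, not frame by frame."""
    score = 0.0
    for t, (x, y) in enumerate(zip(frames, labels)):
        onehot = np.eye(N_LABELS)[y]
        h = np.tanh(np.concatenate([x, onehot]) @ W1)  # nonlinear layer
        score += float(h @ W2)                         # per-frame network score
        if t > 0:
            score += float(trans[labels[t - 1], y])    # label-transition term
    return score

frames = rng.normal(size=(N_FRAMES, FEAT_DIM))
# Decoding reduces to picking the candidate sequence with the best global score.
candidates = [[0, 0, 1, 1, 2], [2, 1, 0, 1, 2]]
best = max(candidates, key=lambda y: sequence_score(frames, y))
print(best)
```

In a real system the argmax over all label sequences would be found with a dynamic-programming or beam search rather than by enumerating candidates; this sketch only shows how the sequence, not each frame, receives a single score.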
DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation
In previous works, only the parameter weights of ASR models are optimized
under a fixed-topology architecture. However, designing a successful model
architecture has always relied on human experience and intuition, and many
hyperparameters related to the architecture must be tuned manually.
Therefore, in this paper we propose DARTS-ASR, an ASR approach with efficient
gradient-based architecture search. To examine the generalizability of
DARTS-ASR, we apply our approach not only to many languages for monolingual
ASR, but also in a multilingual ASR setting. Following previous works, we
conducted experiments on the multilingual dataset IARPA BABEL. The results
show that our approach outperformed the fixed-topology baseline with relative
character error rate reductions of 10.2% and 10.0% under the monolingual and
multilingual settings, respectively. Furthermore, we analyze the
architectures found by DARTS-ASR.

Comment: Accepted at INTERSPEECH 202
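The core mechanism behind gradient-based architecture search of the DARTS family is the continuous relaxation of a discrete choice of operation: each edge computes a softmax-weighted mixture of all candidate operations, so the architecture parameters become differentiable. The sketch below illustrates only that relaxation; the candidate operation set, names, and dimensions are illustrative assumptions, not the search space used in DARTS-ASR.

```python
import numpy as np

rng = np.random.default_rng(1)

# Candidate operations for one edge of the searched cell
# (this set is illustrative, not the paper's actual operation space).
def identity(x): return x
def conv_like(x): return np.tanh(x)   # stand-in for a learned conv op
def zero(x): return np.zeros_like(x)  # "no connection"

OPS = [identity, conv_like, zero]

# Architecture parameters alpha: one logit per candidate operation,
# optimized by gradient descent jointly with the model weights.
alpha = rng.normal(size=len(OPS))

def mixed_op(x, alpha):
    """DARTS relaxation: softmax-weighted sum over all candidate ops,
    which makes the architecture choice differentiable."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * op(x) for wi, op in zip(w, OPS))

x = rng.normal(size=4)
y = mixed_op(x, alpha)
# After search, the edge is discretized to the op with the largest alpha.
chosen = OPS[int(np.argmax(alpha))].__name__
print(chosen, y.shape)
```

Discretizing via the argmax over alpha is what turns the searched continuous mixture back into a concrete fixed architecture for final training.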