Non-autoregressive Transformer-based End-to-end ASR using BERT
Transformer-based models have driven significant innovation across a variety of classic and practical fields, including speech processing, natural language processing, and computer vision. Built on the Transformer, attention-based end-to-end automatic speech recognition (ASR) models have gained popularity in recent years. In particular, non-autoregressive modeling, which achieves fast inference while delivering performance comparable to conventional autoregressive methods, is an emerging research topic. In natural language processing, the bidirectional encoder representations from transformers (BERT) model has received widespread attention, in part for its ability to infer contextualized word representations and to achieve superior performance on downstream tasks with only simple fine-tuning. To inherit the advantages of non-autoregressive ASR modeling while also benefiting from a pre-trained language model such as BERT, this paper presents a non-autoregressive Transformer-based end-to-end ASR model built on BERT. A series of experiments conducted on the AISHELL-1 dataset shows that the proposed model achieves results competitive with or superior to state-of-the-art ASR systems.
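
As a rough illustration of the core idea, the sketch below wires a pretrained BERT in as the body of a non-autoregressive decoder: projected acoustic encoder states are fed to BERT as input embeddings, and every output position is classified in one parallel forward pass. This is a minimal sketch under stated assumptions, not the authors' actual architecture; it assumes PyTorch and the Hugging Face `transformers` library, and the names `NARBertDecoder` and `encoder_dim`, along with the random tensor standing in for encoder output, are hypothetical.

```python
# Minimal sketch (not the paper's exact method): a pretrained BERT used as a
# non-autoregressive decoder over acoustic encoder states.
import torch
import torch.nn as nn
from transformers import BertModel

class NARBertDecoder(nn.Module):
    def __init__(self, encoder_dim: int, bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Project acoustic encoder states into BERT's embedding space.
        self.proj = nn.Linear(encoder_dim, hidden)
        # Token classifier over BERT's vocabulary.
        self.head = nn.Linear(hidden, self.bert.config.vocab_size)

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, frames, encoder_dim) from an acoustic encoder.
        # Feeding the projected states as `inputs_embeds` lets BERT attend
        # over them bidirectionally; all positions are predicted at once,
        # so decoding takes a single forward pass (non-autoregressive).
        hidden = self.bert(inputs_embeds=self.proj(enc_states)).last_hidden_state
        return self.head(hidden)  # (batch, frames, vocab) logits

# Usage: one parallel pass instead of a token-by-token autoregressive loop.
decoder = NARBertDecoder(encoder_dim=256)
with torch.no_grad():
    logits = decoder(torch.randn(2, 50, 256))  # fake encoder output
    tokens = logits.argmax(dim=-1)             # every position decoded at once
```

Because every token is emitted in the same forward pass, inference cost no longer grows with one decoder call per output token, which is the source of the speedup non-autoregressive models offer over autoregressive decoding.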