Subword and Crossword Units for CTC Acoustic Models
This paper proposes a novel approach to creating a unit set for CTC-based
speech recognition systems. Using Byte Pair Encoding, we learn a unit set of
an arbitrary size from a given training text. In contrast to using characters or
words as units, this allows us to find a good trade-off between the size of our
unit set and the available training data. We evaluate both crossword units,
which may span multiple words, and subword units. By combining this approach with
decoding methods that use a separate language model, we achieve state-of-the-art
results for grapheme-based CTC systems.
Comment: Current version accepted at Interspeech 201
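The core of the unit-learning step is the standard Byte Pair Encoding procedure: start from character units and repeatedly merge the most frequent adjacent symbol pair until the unit set reaches a chosen size. The sketch below is a minimal illustration of that general algorithm, not the paper's implementation; the function name, corpus, and target size are illustrative assumptions.

```python
# Minimal BPE unit learning sketch (general algorithm, assumed details):
# grow a unit set from characters by merging frequent adjacent pairs.
from collections import Counter

def learn_bpe_units(corpus_words, target_size):
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter(tuple(w) for w in corpus_words)
    units = {c for word in vocab for c in word}
    merges = []
    while len(units) < target_size:
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + b
        merges.append((a, b))
        units.add(merged)
        # Re-segment every word with the new merge applied.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return units, merges
```

Because `target_size` is a free parameter, the same procedure yields anything from near-character units (small set) to near-word units (large set), which is exactly the trade-off the abstract describes.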
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Exploiting effective target modeling units is important and has long been
a concern in end-to-end automatic speech recognition (ASR). In this work,
we propose a phonetic-assisted multi-target units (PMU) modeling approach to
enhance the Conformer-Transducer ASR system in a progressive representation
learning manner. Specifically, PMU first uses pronunciation-assisted
subword modeling (PASM) and byte pair encoding (BPE) to produce
phonetic-induced and text-induced target units separately. Then, three new
frameworks are investigated to enhance the acoustic encoder: a basic
PMU, a paraCTC, and a pcaCTC, which integrate the PASM and BPE units at different
levels for CTC and transducer multi-task training. Experiments on both
LibriSpeech and accented ASR tasks show that the proposed PMU significantly
outperforms conventional BPE, reducing the WER on the LibriSpeech clean set,
the other set, and six accented ASR test sets by a relative 12.7%, 6.0%, and 7.7%,
respectively.
Comment: 5 pages, 1 figure, submitted to ICASSP 202
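In multi-task setups of this kind, the auxiliary CTC objectives over the two unit inventories are typically combined with the main transducer loss as a weighted sum. The sketch below shows only that generic combination; the function name and weight values are assumptions for illustration, not the paper's actual configuration.

```python
# Generic multi-task loss combination (illustrative sketch):
# main transducer loss plus weighted auxiliary CTC losses over
# phonetic-induced (PASM) and text-induced (BPE) unit targets.
def pmu_multitask_loss(transducer_loss, ctc_loss_pasm, ctc_loss_bpe,
                       w_pasm=0.3, w_bpe=0.3):
    # The weights are hypothetical; in practice they are tuned
    # hyperparameters balancing the auxiliary objectives.
    return transducer_loss + w_pasm * ctc_loss_pasm + w_bpe * ctc_loss_bpe
```

In a real training loop each argument would be the scalar output of the corresponding loss module on a batch; here plain floats stand in for them.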