Subword and Crossword Units for CTC Acoustic Models
This paper proposes a novel approach to creating a unit set for CTC-based
speech recognition systems. Using Byte Pair Encoding, we learn a unit set of
an arbitrary size from a given training text. In contrast to using characters or
words as units, this allows us to find a good trade-off between the size of our
unit set and the available training data. We evaluate both crossword units,
which may span multiple words, and subword units. By combining this approach with
decoding methods that use a separate language model, we achieve state-of-the-art
results for grapheme-based CTC systems.
Comment: Current version accepted at Interspeech 201
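The core of the unit-learning step is the standard Byte Pair Encoding procedure: start from character units and repeatedly merge the most frequent adjacent symbol pair until the unit set reaches a chosen size. The sketch below is a minimal illustration of that general algorithm, not the paper's implementation; the function name, corpus, and target size are illustrative assumptions.

```python
# Minimal BPE unit learning sketch (general algorithm, assumed details):
# grow a unit set from characters by merging frequent adjacent pairs.
from collections import Counter

def learn_bpe_units(corpus_words, target_size):
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = Counter(tuple(w) for w in corpus_words)
    units = {c for word in vocab for c in word}
    merges = []
    while len(units) < target_size:
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merged = a + b
        merges.append((a, b))
        units.add(merged)
        # Re-segment every word with the new merge applied.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return units, merges
```

Because `target_size` is a free parameter, the same procedure yields anything from near-character units (small set) to near-word units (large set), which is exactly the trade-off the abstract describes.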
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Exploiting effective target modeling units is important and has long been
a concern in end-to-end automatic speech recognition (ASR). In this work,
we propose a phonetic-assisted multi-target units (PMU) modeling approach to
enhance the Conformer-Transducer ASR system in a progressive representation
learning manner. Specifically, PMU first uses pronunciation-assisted
subword modeling (PASM) and byte pair encoding (BPE) to produce
phonetic-induced and text-induced target units separately. Then, three new
frameworks are investigated to enhance the acoustic encoder: a basic
PMU, a paraCTC, and a pcaCTC, which integrate the PASM and BPE units at different
levels for CTC and transducer multi-task training. Experiments on both
LibriSpeech and accented ASR tasks show that the proposed PMU significantly
outperforms conventional BPE, reducing the WER on the LibriSpeech clean set,
the other set, and six accented ASR test sets by a relative 12.7%, 6.0%, and 7.7%,
respectively.
Comment: 5 pages, 1 figure, submitted to ICASSP 202
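In multi-task setups of this kind, the auxiliary CTC objectives over the two unit inventories are typically combined with the main transducer loss as a weighted sum. The sketch below shows only that generic combination; the function name and weight values are assumptions for illustration, not the paper's actual configuration.

```python
# Generic multi-task loss combination (illustrative sketch):
# main transducer loss plus weighted auxiliary CTC losses over
# phonetic-induced (PASM) and text-induced (BPE) unit targets.
def pmu_multitask_loss(transducer_loss, ctc_loss_pasm, ctc_loss_bpe,
                       w_pasm=0.3, w_bpe=0.3):
    # The weights are hypothetical; in practice they are tuned
    # hyperparameters balancing the auxiliary objectives.
    return transducer_loss + w_pasm * ctc_loss_pasm + w_bpe * ctc_loss_bpe
```

In a real training loop each argument would be the scalar output of the corresponding loss module on a batch; here plain floats stand in for them.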