256 research outputs found
Subword and Crossword Units for CTC Acoustic Models
This paper proposes a novel approach to create an unit set for CTC based
speech recognition systems. By using Byte Pair Encoding we learn an unit set of
an arbitrary size on a given training text. In contrast to using characters or
words as units this allows us to find a good trade-off between the size of our
unit set and the available training data. We evaluate both Crossword units,
that may span multiple word, and Subword units. By combining this approach with
decoding methods using a separate language model we are able to achieve state
of the art results for grapheme based CTC systems.Comment: Current version accepted at Interspeech 201
Hierarchical Multi Task Learning With CTC
In Automatic Speech Recognition it is still challenging to learn useful
intermediate representations when using high-level (or abstract) target units
such as words. For that reason, character or phoneme based systems tend to
outperform word-based systems when just few hundreds of hours of training data
are being used. In this paper, we first show how hierarchical multi-task
training can encourage the formation of useful intermediate representations. We
achieve this by performing Connectionist Temporal Classification at different
levels of the network with targets of different granularity. Our model thus
performs predictions in multiple scales for the same input. On the standard
300h Switchboard training setup, our hierarchical multi-task architecture
exhibits improvements over single-task architectures with the same number of
parameters. Our model obtains 14.0% Word Error Rate on the Eval2000 Switchboard
subset without any decoder or language model, outperforming the current
state-of-the-art on acoustic-to-word models.Comment: In Proceedings at SLT 201
- …