Attending to characters in neural sequence labeling models
Sequence labeling architectures use word embeddings for capturing similarity, but suffer when
handling previously unseen or rare words. We investigate character-level extensions to such
models and propose a novel architecture for combining alternative word representations. By
using an attention mechanism, the model is able to dynamically decide how much information to
use from a word- or character-level component. We evaluate different architectures on a range of
sequence labeling datasets, and character-level extensions are found to improve performance
on every benchmark. In addition, the proposed attention-based architecture delivers the best
results even with a smaller number of trainable parameters.
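The sketch below illustrates the kind of gating this abstract describes: a learned attention gate that mixes a word-level embedding with a character-derived embedding per dimension. The module name, layer sizes, and the exact sigmoid-gate form are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (PyTorch) of an attention gate over word- and character-level
# representations. Names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class GatedWordCharEmbedding(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Scores how much to trust the word-level vs. character-level vector.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, word_emb: torch.Tensor, char_emb: torch.Tensor) -> torch.Tensor:
        # word_emb, char_emb: (batch, seq_len, dim)
        z = self.gate(torch.cat([word_emb, char_emb], dim=-1))  # per-dimension weights in [0, 1]
        return z * word_emb + (1.0 - z) * char_emb              # convex combination of the two views

# Usage: combine 100-d word embeddings with 100-d character-level outputs.
layer = GatedWordCharEmbedding(dim=100)
w = torch.randn(2, 7, 100)   # word-level embeddings
c = torch.randn(2, 7, 100)   # character-level embeddings
x = layer(w, c)              # (2, 7, 100), fed to the downstream sequence labeler
```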
Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
The success of self-attention in NLP has led to recent applications in
end-to-end encoder-decoder architectures for speech recognition. Separately,
connectionist temporal classification (CTC) has matured as an alignment-free,
non-autoregressive approach to sequence transduction, either by itself or in
various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully
self-attentional network for CTC, and show it is tractable and competitive for
end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing
CTC models and most encoder-decoder models, with character error rates (CERs)
of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean,
with a fixed architecture and one GPU. Similar improvements hold for WERs after
LM decoding. We motivate the architecture for speech, evaluate position and
downsampling approaches, and explore how label alphabets (character, phoneme,
subword) affect attention heads and performance.

Comment: Accepted to ICASSP 2019
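A rough sketch of the two ingredients the abstract names: a self-attentional (Transformer-style) encoder whose per-frame outputs are trained with the alignment-free CTC objective. The layer sizes, the use of torch.nn.TransformerEncoder, and the toy shapes are assumptions for illustration; this is not the SAN-CTC implementation.

```python
# Minimal sketch (PyTorch): a self-attentional encoder trained with CTC,
# in the spirit of SAN-CTC. Sizes and modules are illustrative assumptions.
import torch
import torch.nn as nn

num_classes, d_model = 32, 256            # e.g. character labels plus the CTC blank (index 0)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, dim_feedforward=1024),
    num_layers=6)
proj = nn.Linear(d_model, num_classes)    # per-frame label scores
ctc = nn.CTCLoss(blank=0)

# Toy batch: 2 utterances of 100 downsampled acoustic frames, already projected to d_model.
frames = torch.randn(100, 2, d_model)                   # (time, batch, feature)
log_probs = proj(encoder(frames)).log_softmax(dim=-1)   # (time, batch, classes)

targets = torch.randint(1, num_classes, (2, 20))        # label sequences (no blanks)
input_lengths = torch.full((2,), 100, dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # alignment-free training signal, no autoregressive decoder needed
```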