Attending to characters in neural sequence labeling models
Sequence labeling architectures use word embeddings for capturing similarity, but suffer when
handling previously unseen or rare words. We investigate character-level extensions to such
models and propose a novel architecture for combining alternative word representations. By
using an attention mechanism, the model is able to dynamically decide how much information to
use from a word- or character-level component. We evaluate different architectures on a range of
sequence labeling datasets, and character-level extensions are found to improve performance
on every benchmark. In addition, the proposed attention-based architecture delivers the best
results even with a smaller number of trainable parameters.
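The sketch below illustrates the kind of gating this abstract describes: a learned attention gate that mixes a word-level embedding with a character-derived embedding per dimension. The module name, layer sizes, and the exact sigmoid-gate form are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (PyTorch) of an attention gate over word- and character-level
# representations. Names and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class GatedWordCharEmbedding(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Scores how much to trust the word-level vs. character-level vector.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, word_emb: torch.Tensor, char_emb: torch.Tensor) -> torch.Tensor:
        # word_emb, char_emb: (batch, seq_len, dim)
        z = self.gate(torch.cat([word_emb, char_emb], dim=-1))  # per-dimension weights in [0, 1]
        return z * word_emb + (1.0 - z) * char_emb              # convex combination of the two views

# Usage: combine 100-d word embeddings with 100-d character-level outputs.
layer = GatedWordCharEmbedding(dim=100)
w = torch.randn(2, 7, 100)   # word-level embeddings
c = torch.randn(2, 7, 100)   # character-level embeddings
x = layer(w, c)              # (2, 7, 100), fed to the downstream sequence labeler
```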
Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
The success of self-attention in NLP has led to recent applications in
end-to-end encoder-decoder architectures for speech recognition. Separately,
connectionist temporal classification (CTC) has matured as an alignment-free,
non-autoregressive approach to sequence transduction, either by itself or in
various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully
self-attentional network for CTC, and show it is tractable and competitive for
end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing
CTC models and most encoder-decoder models, with character error rates (CERs)
of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean,
with a fixed architecture and one GPU. Similar improvements hold for WERs after
LM decoding. We motivate the architecture for speech, evaluate position and
downsampling approaches, and explore how label alphabets (character, phoneme,
subword) affect attention heads and performance.

Comment: Accepted to ICASSP 2019
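A rough sketch of the two ingredients the abstract names: a self-attentional (Transformer-style) encoder whose per-frame outputs are trained with the alignment-free CTC objective. The layer sizes, the use of torch.nn.TransformerEncoder, and the toy shapes are assumptions for illustration; this is not the SAN-CTC implementation.

```python
# Minimal sketch (PyTorch): a self-attentional encoder trained with CTC,
# in the spirit of SAN-CTC. Sizes and modules are illustrative assumptions.
import torch
import torch.nn as nn

num_classes, d_model = 32, 256            # e.g. character labels plus the CTC blank (index 0)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, dim_feedforward=1024),
    num_layers=6)
proj = nn.Linear(d_model, num_classes)    # per-frame label scores
ctc = nn.CTCLoss(blank=0)

# Toy batch: 2 utterances of 100 downsampled acoustic frames, already projected to d_model.
frames = torch.randn(100, 2, d_model)                   # (time, batch, feature)
log_probs = proj(encoder(frames)).log_softmax(dim=-1)   # (time, batch, classes)

targets = torch.randint(1, num_classes, (2, 20))        # label sequences (no blanks)
input_lengths = torch.full((2,), 100, dtype=torch.long)
target_lengths = torch.full((2,), 20, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # alignment-free training signal, no autoregressive decoder needed
```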