A Global Context Mechanism for Sequence Labeling
Sequence labeling tasks require computing a sentence representation for each
word in a given sentence. With the advent of advanced pretrained language
models, one common approach is to add a BiLSTM layer to strengthen the sequence
structure information at the output level. Nevertheless, it has been
empirically demonstrated (P.-H. Li et al., 2020) that the potential of BiLSTM
for generating sentence representations for sequence labeling tasks is
constrained, primarily because each complete sentence representation is formed
by combining fragments from past and future sentence representations. In this
study, we found that strategically integrating the whole-sentence
representation, which exists in the first cell and last cell of the BiLSTM,
into the sentence representation of each cell can markedly improve the F1 score
and accuracy. Using BERT embedded within BiLSTM as an illustration, we
conducted extensive experiments on nine datasets for sequence labeling tasks,
encompassing named entity recognition (NER), part-of-speech (POS) tagging, and
End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA). We noted significant
improvements in F1 scores and accuracy across all examined datasets.
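
The idea of feeding the whole-sentence representation back into every cell can
be illustrated with a minimal sketch. It assumes plain concatenation as the
integration step, which is a simplification chosen for illustration and not
necessarily the mechanism the paper uses; the function name
`add_global_context` and the list-of-floats vector type are hypothetical.

```python
from typing import List

Vector = List[float]


def add_global_context(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """Append a whole-sentence vector to every per-token BiLSTM state.

    forward[t]  : forward-direction hidden state at token t (has read tokens 0..t)
    backward[t] : backward-direction hidden state at token t (has read tokens t..T-1)

    Only forward[-1] and backward[0] have read the entire sentence, so their
    concatenation serves as the global vector here. Concatenation is an
    assumption made for this sketch.
    """
    if not forward or len(forward) != len(backward):
        raise ValueError("forward and backward must be non-empty and equally long")

    # Whole-sentence representation: last forward cell + first backward cell.
    global_vec = forward[-1] + backward[0]

    # Each token keeps its local states and gains the same global vector.
    return [f + b + global_vec for f, b in zip(forward, backward)]


if __name__ == "__main__":
    fwd = [[1.0], [2.0], [3.0]]
    bwd = [[30.0], [20.0], [10.0]]
    for row in add_global_context(fwd, bwd):
        print(row)
```

In this sketch every token representation grows by the size of the global
vector, so a downstream tagging layer would need a correspondingly wider input.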
Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
The success of self-attention in NLP has led to recent applications in
end-to-end encoder-decoder architectures for speech recognition. Separately,
connectionist temporal classification (CTC) has matured as an alignment-free,
non-autoregressive approach to sequence transduction, either by itself or in
various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully
self-attentional network for CTC, and show it is tractable and competitive for
end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing
CTC models and most encoder-decoder models, with character error rates (CERs)
of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean,
with a fixed architecture and one GPU. Similar improvements hold for WERs after
LM decoding. We motivate the architecture for speech, evaluate position and
downsampling approaches, and explore how label alphabets (character, phoneme,
subword) affect attention heads and performance. Comment: Accepted to ICASSP 201
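
For readers unfamiliar with the metrics quoted above, the following sketch
shows how a character or word error rate is commonly computed: the Levenshtein
edit distance between reference and hypothesis, divided by the reference
length. This is the standard definition, not code from the paper; the function
names `edit_distance` and `error_rate` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by reference length (CER on characters, WER on words)."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Character error rate: pass strings.
    print(error_rate("kitten", "sitting"))
    # Word error rate: pass token lists.
    print(error_rate("the cat sat".split(), "the cat sat down".split()))
```

Passing strings yields a CER and passing token lists yields a WER, so the same
routine covers both metrics mentioned in the abstract.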