Hierarchically-Refined Label Attention Network for Sequence Labeling
CRF has been used as a powerful model for statistical sequence labeling. For
neural sequence labeling, however, BiLSTM-CRF does not always lead to better
results than BiLSTM-softmax local classification. One possible reason is that
the simple Markov label-transition model of CRF yields little information gain
over a strong neural encoder. To better represent label sequences, we
investigate a hierarchically-refined label attention network, which explicitly
leverages label embeddings and captures potential long-term label dependency by
giving each word incrementally refined label distributions with hierarchical
attention. Results on POS tagging, NER and CCG supertagging show that the
proposed model not only improves overall tagging accuracy with a similar
number of parameters, but also significantly speeds up training and testing
compared to BiLSTM-CRF.
Comment: EMNLP 2019
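One label-attention layer can be sketched as follows (a minimal NumPy illustration of attending over a label embedding matrix, not the authors' implementation; `word_reprs` and `label_embs` are hypothetical names): each word attends over all label embeddings, producing an explicit per-word label distribution that can be refined layer by layer.

```python
import numpy as np

def label_attention(word_reprs, label_embs):
    # scores[i, j]: similarity between word i and candidate label j
    scores = word_reprs @ label_embs.T                  # (n_words, n_labels)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)           # softmax over labels
    # attention-weighted label summary; in a hierarchical model this would be
    # combined with word_reprs and passed to the next refinement layer
    label_summary = probs @ label_embs                  # (n_words, dim)
    return probs, label_summary
```

The last layer's `probs` would serve directly as the tagging output, which is what lets the model drop the CRF decoding step.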
Speaker-change Aware CRF for Dialogue Act Classification
Recent work in Dialogue Act (DA) classification approaches the task as a
sequence labeling problem, using neural network models coupled with a
Conditional Random Field (CRF) as the last layer. CRF models the conditional
probability of the target DA label sequence given the input utterance sequence.
However, the task involves another important input sequence, that of speakers,
which is ignored by previous work. To address this limitation, this paper
proposes a simple modification of the CRF layer that takes speaker-change into
account. Experiments on the SwDA corpus show that our modified CRF layer
outperforms the original one, with very wide margins for some DA labels.
Further, visualizations demonstrate that our CRF layer can learn meaningful,
sophisticated transition patterns between DA label pairs conditioned on
speaker-change in an end-to-end way. Code is publicly available.
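The idea of conditioning transitions on speaker change can be sketched as a Viterbi decoder that keeps two transition matrices and selects one per step (a simplified illustration of the concept, not the paper's code; all names are hypothetical):

```python
import numpy as np

def viterbi_speaker_aware(emissions, speaker_change, trans_same, trans_diff):
    # emissions: (T, L) per-utterance label scores
    # speaker_change: length-T bools; speaker_change[t] is True when the
    # speaker of utterance t differs from utterance t-1 (index 0 unused)
    # trans_same / trans_diff: (L, L) transition scores, chosen per step
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        trans = trans_diff if speaker_change[t] else trans_same
        cand = score[:, None] + trans          # cand[i, j]: prev label i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]               # backtrace the best sequence
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The only change relative to a standard CRF decoder is the per-step choice of transition matrix, which is why the modification is described as simple.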
Unifying Token and Span Level Supervisions for Few-Shot Sequence Labeling
Few-shot sequence labeling aims to identify novel classes based on only a few
labeled samples. Existing methods solve the data scarcity problem mainly by
designing token-level or span-level labeling models based on metric learning.
However, these methods are trained at only a single granularity (i.e., either
the token level or the span level) and inherit the weaknesses of that
granularity. In this paper, we first unify token-level and span-level supervision
and propose a Consistent Dual Adaptive Prototypical (CDAP) network for few-shot
sequence labeling. CDAP contains a token-level and a span-level network,
jointly trained at their respective granularities. To align the outputs of the two
networks, we further propose a consistent loss to enable them to learn from
each other. During the inference phase, we propose a consistent greedy
inference algorithm that first adjusts the predicted probability and then
greedily selects non-overlapping spans with maximum probability. Extensive
experiments show that our model achieves new state-of-the-art results on three
benchmark datasets.
Comment: Accepted by ACM Transactions on Information Systems
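The greedy step of the inference procedure can be sketched as follows (a minimal illustration of selecting non-overlapping spans by descending probability; the probability-adjustment step that precedes it in CDAP is omitted, and the function name is hypothetical):

```python
def greedy_span_select(spans):
    # spans: list of (start, end, prob) with inclusive ends; repeatedly take
    # the highest-probability span that does not overlap any chosen span
    chosen = []
    for start, end, prob in sorted(spans, key=lambda s: -s[2]):
        if all(end < s or start > e for s, e, _ in chosen):
            chosen.append((start, end, prob))
    return sorted(chosen)
```

For example, given candidates `(0, 2, 0.9)`, `(1, 3, 0.8)` and `(4, 5, 0.7)`, the overlapping middle span is discarded and the other two are kept.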