A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods
Multi-task learning (MTL) has become increasingly popular in natural language processing (NLP) because it improves the performance of related tasks by exploiting their commonalities and differences. Nevertheless, it is still not well understood how multi-task learning can be implemented based on the relatedness of training tasks. In this survey, we review recent advances in multi-task learning methods in NLP, with the aim of summarizing them into two general multi-task training methods based on their task relatedness: (i) joint training and (ii) multi-step training. We present examples across various downstream NLP applications, summarize the task relationships, and discuss future directions of this promising topic.
Comment: Accepted to EACL 2023 as a regular long paper
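To make the survey's dichotomy concrete, the following is a minimal PyTorch sketch contrasting the two regimes: joint training sums the losses of related tasks into a single update of shared parameters, while multi-step training optimizes the tasks in sequence over a shared encoder. The encoder, heads, and task shapes are hypothetical placeholders, not a method from the survey.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared encoder with one head per task (toy dimensions, hypothetical tasks).
encoder = nn.Sequential(nn.Embedding(1000, 64), nn.Flatten(1), nn.Linear(8 * 64, 64))
head_a = nn.Linear(64, 3)  # e.g., a 3-class task
head_b = nn.Linear(64, 2)  # e.g., a related 2-class task
params = list(encoder.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def joint_step(batch_a, batch_b):
    """Joint training: sum both task losses, then update shared parameters once."""
    (xa, ya), (xb, yb) = batch_a, batch_b
    loss = F.cross_entropy(head_a(encoder(xa)), ya) + F.cross_entropy(head_b(encoder(xb)), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()

def multi_step_training(loader_a, loader_b):
    """Multi-step training: finish task A first, then adapt the shared encoder to task B."""
    for xa, ya in loader_a:
        opt.zero_grad()
        F.cross_entropy(head_a(encoder(xa)), ya).backward()
        opt.step()
    for xb, yb in loader_b:
        opt.zero_grad()
        F.cross_entropy(head_b(encoder(xb)), yb).backward()
        opt.step()

# Example usage with random length-8 token sequences:
joint_step((torch.randint(0, 1000, (16, 8)), torch.randint(0, 3, (16,))),
           (torch.randint(0, 1000, (16, 8)), torch.randint(0, 2, (16,))))
```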
Cross-Language Speech Emotion Recognition Using Multimodal Dual Attention Transformers
Despite the recent progress in speech emotion recognition (SER),
state-of-the-art systems are unable to achieve improved performance in
cross-language settings. In this paper, we propose a Multimodal Dual Attention
Transformer (MDAT) model to improve cross-language SER. Our model utilises
pre-trained models for multimodal feature extraction and is equipped with a
dual attention mechanism including graph attention and co-attention to capture
complex dependencies across different modalities and achieve improved
cross-language SER results using minimal target-language data. In addition, our model exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. In this way, MDAT refines the feature representation at various stages and provides emotionally salient features to the classification layer. This novel approach also ensures the preservation of modality-specific emotional information while enhancing cross-modality and cross-language interactions. We assess our model's performance on four publicly available SER datasets and show that it outperforms recent approaches and baseline models.
Comment: Under review at IEEE TM
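As a rough illustration of the co-attention half of such a dual attention design (the graph-attention component is omitted here), the PyTorch sketch below lets each modality query the other, keeps residual connections so modality-specific information is preserved, and refines the fused sequence with a transformer encoder layer. All dimensions, pooling, and module choices are assumptions rather than the MDAT implementation.

```python
import torch
import torch.nn as nn

d = 256
audio = torch.randn(4, 50, d)   # (batch, audio frames, dim) from a pretrained extractor
text  = torch.randn(4, 30, d)   # (batch, tokens, dim) from a pretrained LM

co_attn_a = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
co_attn_t = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
refine = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
classify = nn.Linear(d, 7)       # e.g., 7 emotion classes (illustrative)

a2t, _ = co_attn_a(audio, text, text)    # audio queries text
t2a, _ = co_attn_t(text, audio, audio)   # text queries audio
# Residual adds keep modality-specific features; concatenation fuses modalities.
fused = refine(torch.cat([audio + a2t, text + t2a], dim=1))
logits = classify(fused.mean(dim=1))     # mean-pooled emotion logits, shape (4, 7)
```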
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
Large-scale cross-lingual language models (LMs), such as mBERT, Unicoder and
XLM, have achieved great success in cross-lingual representation learning.
However, when applied to zero-shot cross-lingual transfer tasks, most existing
methods use only single-language input for LM finetuning, without leveraging
the intrinsic cross-lingual alignment between different languages that proves
essential for multilingual tasks. In this paper, we propose FILTER, an enhanced
fusion method that takes cross-lingual data as input for XLM finetuning.
Specifically, FILTER first encodes text input in the source language and its
translation in the target language independently in the shallow layers, then
performs cross-language fusion to extract multilingual knowledge in the
intermediate layers, and finally performs further language-specific encoding.
During inference, the model makes predictions based on the text input in the
target language and its translation in the source language. For simple tasks
such as classification, translated text in the target language shares the same
label as the source language. However, this shared label becomes less accurate
or even unavailable for more complex tasks such as question answering, NER and
POS tagging. To tackle this issue, we further propose an additional
KL-divergence self-teaching loss for model training, based on auto-generated
soft pseudo-labels for translated text in the target language. Extensive
experiments demonstrate that FILTER achieves a new state of the art on two challenging multilingual multi-task benchmarks, XTREME and XGLUE.
Comment: Accepted to AAAI 2021; Top-1 performance on the XTREME (https://sites.research.google/xtreme, September 8, 2020) and XGLUE (https://microsoft.github.io/XGLUE, September 14, 2020) benchmarks
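The KL-divergence self-teaching loss lends itself to a short sketch: the model's own soft predictions on one side of a translation pair act as soft pseudo-labels for the other side. In the minimal version below, it is assumed that detached source-side predictions teach the target-language predictions, with a hypothetical temperature parameter; FILTER's exact pseudo-label generation may differ.

```python
import torch
import torch.nn.functional as F

def self_teaching_loss(target_logits, source_logits, temperature=1.0):
    """KL divergence between frozen soft pseudo-labels (teacher) and
    target-language predictions (student)."""
    teacher = F.softmax(source_logits.detach() / temperature, dim=-1)  # soft pseudo-labels
    student = F.log_softmax(target_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

# Example: token-level logits for a tagging task, 32 tokens, 9 tags.
loss = self_teaching_loss(torch.randn(32, 9), torch.randn(32, 9))
```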
Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning
In cross-lingual named entity recognition (NER), self-training is commonly
used to bridge the linguistic gap by training on pseudo-labeled target-language
data. However, due to sub-optimal performance on target languages, the pseudo
labels are often noisy and limit the overall performance. In this work, we aim
to improve self-training for cross-lingual NER by combining representation
learning and pseudo label refinement in one coherent framework. Our proposed method, ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling. Our contrastive
self-training facilitates span classification by separating clusters of
different classes, and enhances cross-lingual transferability by producing
closely-aligned representations between the source and target language.
Meanwhile, prototype-based pseudo-labeling effectively improves the accuracy of
pseudo labels during training. We evaluate ContProto on multiple transfer pairs, and experimental results show our method brings substantial improvements over current state-of-the-art methods.
Comment: Accepted by ACL 2023
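A minimal sketch of the prototype-based pseudo-labeling component: class prototypes are kept as momentum-updated averages of span representations, and each unlabeled span is relabeled by its nearest prototype. The momentum value, cosine normalization, and update order here are illustrative guesses, not ContProto's exact procedure.

```python
import torch
import torch.nn.functional as F

num_classes, dim, momentum = 5, 128, 0.99
prototypes = F.normalize(torch.randn(num_classes, dim), dim=-1)  # one unit vector per class

def refine_pseudo_labels(span_reprs, current_labels):
    """Update each class prototype with the mean of its current spans, then
    reassign every span to the class of its most similar prototype."""
    reprs = F.normalize(span_reprs, dim=-1)
    for c in range(num_classes):
        mask = current_labels == c
        if mask.any():
            update = reprs[mask].mean(dim=0)
            prototypes[c] = F.normalize(momentum * prototypes[c] + (1 - momentum) * update, dim=0)
    similarity = reprs @ prototypes.t()   # cosine similarity to each prototype
    return similarity.argmax(dim=-1)      # refined pseudo labels

# Example: refine noisy labels for 10 spans of dimension 128.
labels = refine_pseudo_labels(torch.randn(10, dim), torch.randint(0, num_classes, (10,)))
```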
Neural Unsupervised Domain Adaptation in NLP—A Survey
Deep neural networks excel at learning from labeled data and achieve
state-of-the-art results on a wide array of Natural Language Processing tasks.
In contrast, learning from unlabeled data, especially under domain shift,
remains a challenge. Motivated by the latest advances, in this survey we review
neural unsupervised domain adaptation techniques which do not require labeled
target domain data. This is a more challenging yet more widely applicable setup. We outline methods, from early traditional non-neural approaches to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which have received the most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future intelligent NLP.