Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus
Much research effort has been devoted to semantic role labeling (SRL), which is
crucial for natural language understanding. Supervised approaches have achieved
impressive performance when large-scale corpora are available for
resource-rich languages such as English, but for low-resource languages with
no annotated SRL dataset it remains challenging to obtain competitive results.
Cross-lingual SRL is one promising way to address the problem, and it has made
great advances with the help of model transfer and annotation projection. In
this paper, we propose a novel alternative based on corpus translation,
constructing high-quality training datasets for the target languages from the
source gold-standard SRL annotations. Experimental results on Universal
Proposition Bank show that the translation-based method is highly effective,
and the automatic pseudo datasets can improve target-language
SRL performance significantly.
Comment: Accepted at ACL 2020
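
As a rough illustration of the translation-plus-projection idea behind such pseudo datasets, the minimal Python sketch below projects gold SRL labels from a source sentence onto its translation through a word alignment. The tokens, labels, and the 1-to-1 alignment are hypothetical toy data; the paper's actual pipeline (machine translation, alignment, quality filtering, corpus construction) is considerably more involved.

# Minimal sketch of alignment-based SRL label projection (toy data only;
# the paper's corpus-translation pipeline is more sophisticated).

def project_labels(src_labels, alignment, tgt_len, default="O"):
    """Copy token-level SRL labels to the target side via a word alignment."""
    tgt_labels = [default] * tgt_len
    for src_i, tgt_i in alignment:
        tgt_labels[tgt_i] = src_labels[src_i]
    return tgt_labels

# English source sentence with gold labels (predicate plus A0/A1 arguments).
src_tokens = ["Mary", "bought", "a", "book"]
src_labels = ["B-A0", "B-V", "B-A1", "I-A1"]

# Hypothetical German translation and a 1-to-1 word alignment (src_idx, tgt_idx).
tgt_tokens = ["Mary", "kaufte", "ein", "Buch"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

print(list(zip(tgt_tokens, project_labels(src_labels, alignment, len(tgt_tokens)))))
# [('Mary', 'B-A0'), ('kaufte', 'B-V'), ('ein', 'B-A1'), ('Buch', 'I-A1')]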
On the Importance of Word Order Information in Cross-lingual Sequence Labeling
Word order generally varies across languages. In this paper, we hypothesize
that cross-lingual models that fit the word order of the source language might
fail to handle target languages. To verify this hypothesis, we investigate
whether making models insensitive to the word order of the source language can
improve the adaptation performance in target languages. To do so, we reduce
the source-language word order information fitted to sequence encoders and
observe the performance changes. In addition, based on this hypothesis, we
propose a new method for fine-tuning multilingual BERT on downstream
cross-lingual sequence labeling tasks. Experimental results on dialogue
natural language understanding, part-of-speech tagging, and named entity
recognition tasks show that reducing the word order information fitted to the
model can achieve better zero-shot cross-lingual performance. Furthermore, our
proposed methods can also be applied to strong cross-lingual baselines and
improve their performance.
Comment: Accepted at AAAI-2021
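
One simple way to reduce the word order information a sequence encoder can fit to is to shuffle each training sentence's tokens together with their labels before fine-tuning. The Python sketch below illustrates only that generic idea on hypothetical POS-tagging data; it is not the paper's specific order-reduction or mBERT fine-tuning procedure.

import random

# Illustrative only: jointly shuffle tokens and their sequence labels so the
# encoder cannot rely on source-language word order during fine-tuning.
def shuffle_example(tokens, labels, seed=None):
    assert len(tokens) == len(labels)
    pairs = list(zip(tokens, labels))
    random.Random(seed).shuffle(pairs)
    shuffled_tokens, shuffled_labels = zip(*pairs)
    return list(shuffled_tokens), list(shuffled_labels)

# Hypothetical POS-tagging example in the source language (English).
tokens = ["She", "reads", "old", "books"]
labels = ["PRON", "VERB", "ADJ", "NOUN"]

print(shuffle_example(tokens, labels, seed=0))
# e.g. (['reads', 'She', 'books', 'old'], ['VERB', 'PRON', 'NOUN', 'ADJ'])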
Cross-lingual Emotion Detection
Emotion detection is of great importance for understanding humans.
Constructing annotated datasets to train automated models can be expensive. We
explore the efficacy of cross-lingual approaches that would use data from a
source language to build models for emotion detection in a target language. We
compare three approaches, namely: i) using inherently multilingual models; ii)
translating training data into the target language; and iii) using an
automatically tagged parallel corpus. In our study, we consider English as the
source language with Arabic and Spanish as target languages. We study the
effectiveness of different classification models such as BERT and SVMs trained
with different features. Our BERT-based monolingual models that are trained on
target language data surpass the state of the art (SOTA) by 4% and 5% absolute
Jaccard score for Arabic and Spanish, respectively. Next, we show that, using
cross-lingual approaches with English data alone, we can achieve more than 90%
and 80% relative effectiveness of the Arabic and Spanish BERT models,
respectively. Lastly, we use LIME to interpret the differences between the
models.
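
For reference, the Jaccard score mentioned above is presumably the sample-averaged multi-label Jaccard similarity commonly used for emotion classification (for instance in SemEval-2018 Task 1), where each text can carry several emotion labels at once. The short Python sketch below computes that metric on hypothetical toy predictions; the exact evaluation setup of the paper is an assumption here.

# Sample-averaged multi-label Jaccard similarity on toy emotion predictions.
# Assumed to correspond to the "Jaccard score" reported in the abstract.
def jaccard_score(gold, pred):
    total = 0.0
    for g, p in zip(gold, pred):
        union = g | p
        total += len(g & p) / len(union) if union else 1.0
    return total / len(gold)

gold = [{"joy", "love"}, {"anger"}, {"sadness", "fear"}]
pred = [{"joy"}, {"anger"}, {"sadness"}]

print(round(jaccard_score(gold, pred), 3))  # (0.5 + 1.0 + 0.5) / 3 = 0.667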