Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing
We introduce a novel method for multilingual transfer that utilizes deep
contextual embeddings, pretrained in an unsupervised fashion. While contextual
embeddings have been shown to yield richer representations of meaning compared
to their static counterparts, aligning them poses a challenge due to their
dynamic nature. To this end, we construct context-independent variants of the
original monolingual spaces and utilize their mapping to derive an alignment
for the context-dependent spaces. This mapping readily supports processing of a
target language, improving transfer by context-aware embeddings. Our
experimental results demonstrate the effectiveness of this approach for
zero-shot and few-shot learning of dependency parsing. Specifically, our method
consistently outperforms the previous state-of-the-art on 6 tested languages,
yielding an improvement of 6.8 LAS points on average.
Comment: NAACL 201
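The alignment idea in this abstract can be illustrated with a minimal sketch. It assumes that the context-independent variant of a word is the average of its contextual vectors (called an "anchor" here), and that a linear map `W` between the two anchor spaces has already been learned by some standard static-embedding alignment method. The names `compute_anchors`, `apply_mapping`, and the toy matrix `W` are hypothetical and only serve the illustration; the abstract itself does not specify these details.

```python
from collections import defaultdict


def compute_anchors(tokens, vectors):
    """Average the contextual vectors of each word type into one anchor.

    Assumption: a context-independent representation is the mean of all
    contextual vectors observed for that word.
    """
    sums = {}
    counts = defaultdict(int)
    for tok, vec in zip(tokens, vectors):
        if tok not in sums:
            sums[tok] = [0.0] * len(vec)
        sums[tok] = [s + v for s, v in zip(sums[tok], vec)]
        counts[tok] += 1
    return {tok: [s / counts[tok] for s in total] for tok, total in sums.items()}


def apply_mapping(W, vec):
    """Apply a linear map W (given as a list of rows) to one vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in W]


# Toy data: "bank" occurs twice with different contextual vectors.
tokens = ["bank", "river", "bank"]
vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 2.0]]
anchors = compute_anchors(tokens, vectors)

# Hypothetical map learned on the anchors (here it simply swaps the axes).
W = [[0.0, 1.0], [1.0, 0.0]]

# The same map is then reused on a context-dependent vector.
aligned = apply_mapping(W, vectors[2])
```

The point of the sketch is that the map is estimated on the static anchors but applied to individual contextual vectors, so the target-language embeddings stay context-aware after alignment.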
XL-NBT: A Cross-lingual Neural Belief Tracking Framework
Task-oriented dialog systems are becoming pervasive, and many companies
heavily rely on them to complement human agents for customer service in call
centers. With globalization, the need for providing cross-lingual customer
support becomes more urgent than ever. However, cross-lingual support poses
great challenges---it requires a large amount of additional annotated data from
native speakers. In order to bypass the expensive human annotation and achieve
the first step towards the ultimate goal of building a universal dialog system,
we set out to build a cross-lingual state tracking framework. Specifically, we
assume that there exists a source language with dialog belief tracking
annotations while the target languages have no annotated dialog data of any
form. Then, we pre-train a state tracker for the source language as a teacher,
which is able to exploit easy-to-access parallel data. We then distill and
transfer its own knowledge to the student state tracker in target languages. We
specifically discuss two types of common parallel resources: bilingual corpora
and bilingual dictionaries, and design different transfer learning strategies
accordingly. Experimentally, we successfully use an English state tracker as the
teacher to transfer its knowledge to both Italian and German trackers and
achieve promising results.
Comment: 13 pages, 5 figures, 3 tables, accepted to EMNLP 2018 conference
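The teacher-student transfer described above can be sketched as a soft-label loss. This is an illustrative assumption, not the paper's exact objective: the teacher scores candidate slot values for a source-language utterance, the student scores the same candidates for the parallel target-language utterance, and the student is trained to match the teacher's distribution. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    p = softmax(teacher_logits)  # teacher belief over slot values (source language)
    q = softmax(student_logits)  # student belief over the same values (target language)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Toy scores over three candidate slot values.
teacher = [2.0, 0.0, -1.0]
loss_match = distillation_loss(teacher, [2.0, 0.0, -1.0])
loss_mismatch = distillation_loss(teacher, [-1.0, 0.0, 2.0])
```

The loss is smallest when the student reproduces the teacher's distribution, which is why no target-language annotations are needed: the parallel data only has to pair a source utterance with its target counterpart.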
Cross-lingual transfer learning and multitask learning for capturing multiword expressions
This is an accepted manuscript of an article published by Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119
The accepted version of the publication may differ from the final published version.
Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks. We specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance compared to standard neural approaches.
Zero-Shot Cross-Lingual Transfer with Meta Learning
Learning what to share between tasks has been a topic of great importance
recently, as strategic sharing of knowledge has been shown to improve
downstream task performance. This is particularly important for multilingual
applications, as most languages in the world are under-resourced. Here, we
consider the setting of training models on multiple different languages at the
same time, when little or no data is available for languages other than
English. We show that this challenging setup can be approached using
meta-learning, where, in addition to training a source language model, another
model learns to select which training instances are the most beneficial to the
first. We experiment using standard supervised, zero-shot cross-lingual, as
well as few-shot cross-lingual settings for different natural language
understanding tasks (natural language inference, question answering). Our
extensive experimental setup demonstrates the consistent effectiveness of
meta-learning for a total of 15 languages. We improve upon the state-of-the-art
for zero-shot and few-shot NLI (on MultiNLI and XNLI) and QA (on the MLQA
dataset). A comprehensive error analysis indicates that the correlation of
typological features between languages can partly explain when parameter
sharing learned via meta-learning is beneficial.
Comment: Accepted as a long paper in EMNLP 2020 main conference
Empirical Gaussian priors for cross-lingual transfer learning
Sequence model learning algorithms typically maximize log-likelihood minus
the norm of the model (or minimize Hamming loss + norm). In cross-lingual
part-of-speech (POS) tagging, our target language training data consists of
sequences of sentences with word-by-word labels projected from translations in
languages for which we have labeled data, via word alignments. Our training
data is therefore very noisy, and if Rademacher complexity is high, learning
algorithms are prone to overfit. Norm-based regularization assumes a constant
width and zero mean prior. We instead propose to use the source language
models to estimate the parameters of a Gaussian prior for learning new POS
taggers. This leads to significantly better performance in multi-source
transfer set-ups. We also present a drop-out version that injects (empirical)
Gaussian noise during online learning. Finally, we note that using empirical
Gaussian priors leads to much lower Rademacher complexity, and is superior to
optimally weighted model interpolation.
Comment: Presented at NIPS 2015 Workshop on Transfer and Multi-Task Learning
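A minimal sketch of the contrast drawn in this abstract, under stated assumptions: standard norm-based regularization corresponds to a zero-mean prior with one shared width, whereas an empirical Gaussian prior takes a per-parameter mean and variance from the weights of the source-language models. The functions `empirical_gaussian_prior` and `prior_penalty`, the smoothing constant `eps`, and the toy weights are hypothetical and not taken from the paper.

```python
def empirical_gaussian_prior(source_weights):
    """Estimate a per-parameter mean and variance across source-model weight vectors."""
    n = len(source_weights)
    dim = len(source_weights[0])
    means = [sum(w[j] for w in source_weights) / n for j in range(dim)]
    variances = [
        sum((w[j] - means[j]) ** 2 for w in source_weights) / n for j in range(dim)
    ]
    return means, variances


def prior_penalty(weights, means, variances, eps=1e-8):
    """Negative log of the Gaussian prior, up to an additive constant.

    eps is an assumed smoothing term that avoids division by zero when all
    source models agree on a parameter.
    """
    return sum(
        (w - m) ** 2 / (2.0 * (v + eps))
        for w, m, v in zip(weights, means, variances)
    )


# Two toy source-language taggers with two parameters each.
source = [[1.0, 0.0], [3.0, 0.0]]
means, variances = empirical_gaussian_prior(source)

penalty_at_mean = prior_penalty(means, means, variances)
penalty_off_mean = prior_penalty([3.0, 0.0], means, variances)
```

In this sketch, parameters on which the source models disagree get a wide prior and are penalized lightly, while parameters on which they agree are held close to the shared value.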