6 research outputs found
On the Importance of Word Order Information in Cross-lingual Sequence Labeling
Word order generally varies across languages. In this paper,
we hypothesize that cross-lingual models that overfit to the word order of the
source language may fail to handle target languages. To verify this
hypothesis, we investigate whether making models insensitive to the word order
of the source language can improve the adaptation performance in target
languages. To do so, we reduce the source language word order information
fitted to sequence encoders and observe the performance changes. In addition,
based on this hypothesis, we propose a new method for fine-tuning multilingual
BERT in downstream cross-lingual sequence labeling tasks. Experimental results
on dialogue natural language understanding, part-of-speech tagging, and named
entity recognition tasks show that reducing word order information fitted to
the model can achieve better zero-shot cross-lingual performance. Furthermore,
our proposed methods can also be applied to strong cross-lingual baselines and
further improve their performance. Comment: Accepted at AAAI-202
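As a rough illustration of the general idea (not necessarily the paper's exact recipe), one simple way to make an encoder less sensitive to source-language word order is to randomly permute the positional indices during fine-tuning, so the model cannot rely on absolute order cues. The sketch below assumes a Hugging Face multilingual BERT token-classification setup; the shuffling helper and the label count are illustrative assumptions.

```python
# Minimal sketch: reduce reliance on source-language word order by shuffling
# position ids during fine-tuning (an assumed illustration, not the paper's method).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=9)  # e.g. NER tags

def shuffled_position_ids(attention_mask):
    """Return position ids where each sequence's real positions are randomly
    permuted, so the encoder cannot fit the source word order."""
    bsz, seq_len = attention_mask.shape
    pos = torch.arange(seq_len).unsqueeze(0).repeat(bsz, 1)
    for i in range(bsz):
        length = int(attention_mask[i].sum())
        pos[i, :length] = torch.randperm(length)
    return pos

batch = tokenizer(["the cat sat on the mat"], return_tensors="pt")
pos = shuffled_position_ids(batch["attention_mask"])
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                position_ids=pos)
print(outputs.logits.shape)  # (batch, seq_len, num_labels)
```

Because only the position ids are permuted, the tokens themselves stay in place, so token-label alignment for sequence labeling is preserved.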
How Decoding Strategies Affect the Verifiability of Generated Text
Recent progress in pre-trained language models has led to systems that are able
to generate text of increasingly high quality. While several works have
investigated the fluency and grammatical correctness of such models, it is
still unclear to what extent the generated text is consistent with factual
world knowledge. Here, we go beyond fluency and also investigate the
verifiability of text generated by state-of-the-art pre-trained language
models. A generated sentence is verifiable if it can be corroborated or
disproved by Wikipedia, and we find that the verifiability of generated text
strongly depends on the decoding strategy. In particular, we discover a
tradeoff between factuality (i.e., the ability to generate Wikipedia-corroborated
text) and repetitiveness. While decoding strategies such as top-k
and nucleus sampling lead to less repetitive generations, they also produce
less verifiable text. Based on these findings, we introduce a simple and
effective decoding strategy which, in comparison to previously used decoding
strategies, produces less repetitive and more verifiable text. Comment: Accepted at Findings of EMNLP 202
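For readers who want to try the decoding strategies discussed above, the sketch below contrasts greedy decoding with top-k and nucleus (top-p) sampling via the Hugging Face generate API; the GPT-2 model, prompt, and hyperparameter values are illustrative choices, not the paper's experimental setup.

```python
# Illustrative comparison of decoding strategies (model and hyperparameters
# are assumptions, not the paper's setup).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Eiffel Tower is located in"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: deterministic, prone to repetition.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# Top-k sampling: sample from the k most likely tokens at each step.
top_k = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)

# Nucleus (top-p) sampling: sample from the smallest set of tokens whose
# cumulative probability exceeds p.
nucleus = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)

for name, out in [("greedy", greedy), ("top-k", top_k), ("nucleus", nucleus)]:
    print(name, tokenizer.decode(out[0], skip_special_tokens=True))
```

In line with the tradeoff described in the abstract, the sampled outputs will typically be less repetitive than the greedy output but also less constrained to statements that can be corroborated.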
Itzulpen automatiko gainbegiratu gabea (Unsupervised Machine Translation)
192 p. Modern machine translation relies on strong supervision in the form of parallel corpora. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for low-resource language pairs. In this thesis, we develop a new paradigm that removes the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through iterative back-translation. While previous attempts at learning machine translation systems from monolingual corpora had strong limitations, our work, along with other contemporaneous developments, is the first to report positive results in standard, large-scale settings, establishing the foundations of unsupervised machine translation and opening exciting opportunities for future research.
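As a rough illustration of the iterative back-translation loop described above, the sketch below uses toy word-for-word "translators" as stand-ins for the neural or statistical systems; all class and function names are hypothetical, and the seed dictionaries play the role of the cross-lingually aligned word embeddings.

```python
# Schematic sketch of iterative back-translation (toy placeholder models;
# the thesis uses neural / phrase-based statistical MT systems instead).

class ToyTranslator:
    """Stand-in for an MT system initialized from cross-lingually aligned
    word embeddings; here it just learns a word-for-word mapping."""
    def __init__(self):
        self.table = {}

    def train(self, pairs):
        # pairs: list of (source_sentence, target_sentence) strings
        for src, tgt in pairs:
            for s, t in zip(src.split(), tgt.split()):
                self.table[s] = t

    def translate(self, sentence):
        return " ".join(self.table.get(w, w) for w in sentence.split())


def iterative_back_translation(mono_src, mono_tgt, src2tgt, tgt2src, rounds=3):
    """Alternately generate synthetic parallel data with one direction and
    retrain the other direction on it."""
    for _ in range(rounds):
        # Back-translate target monolingual data into the source language,
        # then train src->tgt on (synthetic source, real target) pairs.
        synthetic_src = [tgt2src.translate(t) for t in mono_tgt]
        src2tgt.train(list(zip(synthetic_src, mono_tgt)))

        # Symmetrically for the other direction.
        synthetic_tgt = [src2tgt.translate(s) for s in mono_src]
        tgt2src.train(list(zip(synthetic_tgt, mono_src)))
    return src2tgt, tgt2src


# Tiny usage example with made-up monolingual corpora and seed dictionaries.
src2tgt, tgt2src = ToyTranslator(), ToyTranslator()
src2tgt.table = {"etxe": "house", "txakur": "dog"}   # assumed seed from aligned embeddings
tgt2src.table = {"house": "etxe", "dog": "txakur"}
iterative_back_translation(["etxe txakur"], ["dog house"], src2tgt, tgt2src)
print(src2tgt.translate("etxe txakur"))
```

Each round, the synthetic pairs produced by one translation direction become training data for the other, which is the core of the iterative scheme described in the abstract.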