Duality Regularization for Unsupervised Bilingual Lexicon Induction
Unsupervised bilingual lexicon induction naturally exhibits duality, which
results from symmetry in back-translation. For example, EN-IT and IT-EN
induction form a primal-dual pair of problems. Current state-of-the-art
methods, however, consider the two tasks independently. In this paper, we
propose to train primal and dual models jointly, using regularizers to
encourage consistency in back-translation cycles. Experiments across 6 language
pairs show that the proposed method significantly outperforms competitive
baselines, obtaining the best published results on a standard benchmark.
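A minimal sketch of how such a cycle-consistency regularizer could be wired up (the row-stochastic translation matrices, the identity target, and the loss weighting are illustrative assumptions, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def back_translation_cycle_loss(P_fwd, P_bwd):
    """Hypothetical duality regularizer.

    P_fwd: (V_en, V_it) row-stochastic EN->IT translation probabilities (primal)
    P_bwd: (V_it, V_en) row-stochastic IT->EN translation probabilities (dual)
    Encourages the EN->IT->EN round trip to map each word back to itself.
    """
    round_trip = P_fwd @ P_bwd                 # (V_en, V_en) cycle probabilities
    identity = torch.eye(round_trip.size(0))   # ideal cycle: each word returns to itself
    return F.mse_loss(round_trip, identity)

# Joint training objective (sketch): primal loss + dual loss + cycle penalty
# loss = loss_en_it + loss_it_en + lam * back_translation_cycle_loss(P_fwd, P_bwd)
```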
Bilingual Lexicon Induction through Unsupervised Machine Translation
A recent research line has obtained strong results on bilingual lexicon
induction by aligning independently trained word embeddings in two languages
and using the resulting cross-lingual embeddings to induce word translation
pairs through nearest neighbor or related retrieval methods. In this paper, we
propose an alternative approach to this problem that builds on the recent work
on unsupervised machine translation. This way, instead of directly inducing a
bilingual lexicon from cross-lingual embeddings, we use them to build a
phrase-table, combine it with a language model, and use the resulting machine
translation system to generate a synthetic parallel corpus, from which we
extract the bilingual lexicon using statistical word alignment techniques. As
such, our method can work with any word embedding and cross-lingual mapping
technique, and it does not require any additional resources besides the
monolingual corpus used to train the embeddings. When evaluated on the exact
same cross-lingual embeddings, our proposed method obtains an average
improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS
retrieval, establishing a new state of the art on the standard MUSE dataset.
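For reference, CSLS (the stronger retrieval baseline that the 4-point gain is measured against) rescales cosine similarity by each word's average similarity to its k nearest cross-lingual neighbors, which counteracts hubness. A NumPy sketch under the usual definition (variable names and k=10 are our choices, not taken from the paper):

```python
import numpy as np

def csls_scores(X_src, Y_tgt, k=10):
    """CSLS retrieval scores between L2-normalized embedding matrices.

    X_src: (n, d) source-language embeddings mapped into the shared space
    Y_tgt: (m, d) target-language embeddings
    Returns an (n, m) score matrix; argmax over each row induces a translation.
    """
    sims = X_src @ Y_tgt.T                               # cosine similarities
    r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)   # mean sim. of k nearest targets
    r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)   # mean sim. of k nearest sources
    return 2 * sims - r_src[:, None] - r_tgt[None, :]

# translations = csls_scores(X_src, Y_tgt).argmax(axis=1)
```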
Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents
Automatically highlighting words that cause semantic differences between two documents could be useful for a wide range of applications. We formulate recognizing semantic differences (RSD) as a token-level regression task and study three unsupervised approaches that rely on a masked language model. To assess the approaches, we begin with basic English sentences and gradually move to more complex, cross-lingual document pairs. Our results show that an approach based on word alignment and sentence-level contrastive learning correlates robustly with gold labels. However, all unsupervised approaches still leave a large margin for improvement.
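One way to picture the alignment-based variant: embed every token of both documents with a masked language model, align tokens across documents by cosine similarity, and flag tokens that align poorly. A hedged sketch using Hugging Face transformers (the model name and the 1 - max-similarity scoring rule are our illustration, not the paper's exact method):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL = "xlm-roberta-base"  # any multilingual masked LM would do here
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

def difference_scores(doc_a: str, doc_b: str):
    """Per-token scores for doc_a: 1 - max cosine similarity to any
    token of doc_b (higher = more likely a semantic difference)."""
    embs = []
    for text in (doc_a, doc_b):
        batch = tok(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state[0]  # (seq_len, dim)
        embs.append(F.normalize(hidden, dim=-1))
    sims = embs[0] @ embs[1].T                            # token-to-token cosines
    return (1.0 - sims.max(dim=1).values).tolist()
```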
A Theory of Unsupervised Translation Motivated by Understanding Animal Communication
Recent years have seen breakthroughs in neural language models that capture
nuances of language, culture, and knowledge. Neural networks are capable of
translating between languages -- in some cases even between two languages where
there is little or no access to parallel translations, in what is known as
Unsupervised Machine Translation (UMT). Given this progress, it is intriguing
to ask whether machine learning tools can ultimately enable understanding
animal communication, particularly that of highly intelligent animals. Our work
is motivated by an ambitious interdisciplinary initiative, Project CETI, which
is collecting a large corpus of sperm whale communications for machine
analysis.
We propose a theoretical framework for analyzing UMT when no parallel data
are available and when it cannot be assumed that the source and target corpora
address related subject domains or possess similar linguistic structure. The
framework requires access to a prior probability distribution that should
assign non-zero probability to possible translations. We instantiate our
framework with two models of language. Our analysis suggests that the accuracy
of translation depends on the complexity of the source language and the amount
of "common ground" between the source language and the target prior.
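In symbols, the setup can be pictured roughly as follows (our notation, not the paper's formal definitions): choose the translator whose outputs are most plausible under the target-language prior,

```latex
% Hedged sketch: maximize the expected log-probability that the prior \mu
% assigns to translated source text.
\hat{\theta} \;=\; \operatorname*{arg\,max}_{\theta \in \Theta}\;
  \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{src}}}
  \left[ \log \mu\big(f_\theta(x)\big) \right]
```

where $f_\theta$ maps source text to candidate target-language text; the requirement that the prior $\mu$ assign non-zero probability to possible translations is what keeps such an objective well defined.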
We also prove upper bounds on the amount of data required from the source
language in the unsupervised setting as a function of the amount of data
required in a hypothetical supervised setting. Surprisingly, our bounds suggest
that the amount of source data required for unsupervised translation is
comparable to the supervised setting. For one of the language models we
analyze, we also prove a nearly matching lower bound.
Our analysis is purely information-theoretic and, as such, can inform how much
source data needs to be collected, but it does not yield a computationally
efficient procedure.