Dependency Parsing using Prosody Markers from a Parallel Text
Proceedings of the Ninth International Workshop
on Treebanks and Linguistic Theories.
Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti.
NEALT Proceedings Series, Vol. 9 (2010), 127-138.
© 2010 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/15891
An Effective Approach to Unsupervised Machine Translation
While machine translation has traditionally relied on large amounts of
parallel corpora, a recent research line has managed to train both Neural
Machine Translation (NMT) and Statistical Machine Translation (SMT) systems
using monolingual corpora only. In this paper, we identify and address several
deficiencies of existing unsupervised SMT approaches by exploiting subword
information, developing a theoretically well-founded unsupervised tuning
method, and incorporating a joint refinement procedure. Moreover, we use our
improved SMT system to initialize a dual NMT model, which is further fine-tuned
through on-the-fly back-translation. Together, we obtain large improvements
over the previous state-of-the-art in unsupervised machine translation. For
instance, we get 22.5 BLEU points in English-to-German WMT 2014, 5.5 points
more than the previous best unsupervised system, and 0.5 points more than the
(supervised) shared task winner back in 2014. Comment: ACL 201
Machine learning for ancient languages: a survey
Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.
Itzulpen automatiko gainbegiratu gabea (Unsupervised Machine Translation)
192 p. Modern machine translation relies on strong supervision in the form of parallel corpora. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for low-resource language pairs. In this thesis, we develop a new paradigm that removes the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through iterative back-translation. While previous attempts at learning machine translation systems from monolingual corpora had strong limitations, our work, along with other contemporaneous developments, is the first to report positive results in standard, large-scale settings, establishing the foundations of unsupervised machine translation and opening exciting opportunities for future research.