Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing
We introduce a novel method for multilingual transfer that utilizes deep
contextual embeddings, pretrained in an unsupervised fashion. While contextual
embeddings have been shown to yield richer representations of meaning compared
to their static counterparts, aligning them poses a challenge due to their
dynamic nature. To this end, we construct context-independent variants of the
original monolingual spaces and utilize their mapping to derive an alignment
for the context-dependent spaces. This mapping readily supports processing of a
target language, improving transfer through context-aware embeddings. Our
experimental results demonstrate the effectiveness of this approach for
zero-shot and few-shot learning of dependency parsing. Specifically, our method
consistently outperforms the previous state-of-the-art on 6 tested languages,
yielding an improvement of 6.8 LAS points on average. Comment: NAACL 2019
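The anchor-and-map procedure described above lends itself to a compact summary. Below is a minimal sketch, assuming src_ctx and tgt_ctx map each word type to a list of its contextual vectors and dictionary is a seed bilingual lexicon; these names, and the use of orthogonal Procrustes over averaged anchors, illustrate the described approach rather than the authors' released code.

```python
import numpy as np

def anchors(ctx_by_word):
    """Context-independent anchor per word type: the mean of its contextual vectors."""
    return {w: np.mean(np.stack(vecs), axis=0) for w, vecs in ctx_by_word.items()}

def learn_alignment(src_ctx, tgt_ctx, dictionary):
    """Learn an orthogonal map W (source -> target) from anchor pairs via Procrustes."""
    a_src, a_tgt = anchors(src_ctx), anchors(tgt_ctx)
    pairs = [(s, t) for s, t in dictionary if s in a_src and t in a_tgt]
    X = np.stack([a_src[s] for s, _ in pairs])  # source anchors, one row per pair
    Y = np.stack([a_tgt[t] for _, t in pairs])  # corresponding target anchors
    U, _, Vt = np.linalg.svd(X.T @ Y)           # orthogonal Procrustes solution
    return U @ Vt                               # W such that X @ W approximates Y

# At parsing time the same W is applied to the *contextual* source vectors,
# e.g. aligned_vec = contextual_vec @ W, so the target-language parser can
# consume context-aware embeddings in the shared space.
```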
Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?
This paper presents an approach combining lexico-semantic resources and distributed word representations for evaluation in machine translation (MT). The study is carried out by enriching a well-known MT evaluation metric, METEOR, which allows approximate matches (synonymy or morphological similarity) between an automatic translation and a reference translation. Our experiments are conducted within the framework of the Metrics task of WMT 2014. We show that distributed representations are a good alternative to lexico-semantic resources for MT evaluation and can even contribute useful additional information. The augmented versions of METEOR, using vector representations, are made available on our GitHub page.
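As a hedged illustration of how an embedding-based matcher can slot into METEOR's matching pipeline, the sketch below pairs leftover hypothesis and reference words whose word2vec cosine similarity clears a threshold. Here emb (a word-to-vector dictionary) and the 0.8 threshold are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def vector_matches(hyp_words, ref_words, emb, threshold=0.8):
    """Greedily pair hypothesis and reference words by embedding similarity.

    Intended to run after METEOR's exact/stem/synonym stages, over the
    words those stages left unmatched. `emb` and `threshold` are assumed.
    """
    matches, used = [], set()
    for i, h in enumerate(hyp_words):
        if h not in emb:
            continue
        for j, r in enumerate(ref_words):
            if j in used or r not in emb:
                continue
            if cosine(emb[h], emb[r]) >= threshold:
                matches.append((i, j))  # record the aligned pair
                used.add(j)
                break
    return matches
```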
Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling
As a special machine translation task, dialect translation has two main
characteristics: 1) a lack of parallel training corpora; and 2) similar
grammar on the two sides of the translation. In this paper, we investigate how
to exploit the commonality and diversity between dialects to build
unsupervised translation models that access only monolingual data.
Specifically, we leverage pivot-private embeddings, layer coordination, and
parameter sharing to model the commonality and diversity between source and
target, ranging from the lexical, through the syntactic, to the semantic level.
To examine the effectiveness of the proposed models, we collect 20 million
monolingual sentences for each of Mandarin and Cantonese, the official
language of China and its most widely used dialect, respectively. Experimental
results reveal that our methods outperform rule-based Simplified-Traditional
Chinese conversion and conventional unsupervised translation models
by over 12 BLEU points. Comment: AAAI 2020
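One plausible reading of the pivot-private embedding is sketched below: a pivot table shared by both dialects captures lexical commonality, a private table per dialect captures diversity, and the two are combined by summation. The class name, the summation, and all sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PivotPrivateEmbedding(nn.Module):
    """Shared (pivot) + per-dialect (private) embedding tables, summed."""

    def __init__(self, vocab_size, dim, num_langs=2):
        super().__init__()
        self.pivot = nn.Embedding(vocab_size, dim)  # shared across dialects
        self.private = nn.ModuleList(
            [nn.Embedding(vocab_size, dim) for _ in range(num_langs)]
        )                                           # one table per dialect

    def forward(self, token_ids, lang_id):
        # Commonality (pivot) plus diversity (private) at the lexical level.
        return self.pivot(token_ids) + self.private[lang_id](token_ids)

# Usage with made-up sizes and token ids:
emb = PivotPrivateEmbedding(vocab_size=30000, dim=512)
x = emb(torch.tensor([[5, 17, 42]]), lang_id=0)  # e.g. the Mandarin side
```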