120,888 research outputs found
Neural Machine Translation into Language Varieties
Both research and commercial machine translation have so far neglected the
importance of properly handling the spelling, lexical and grammar divergences
occurring among language varieties. Notable cases are standard national
varieties such as Brazilian and European Portuguese, and Canadian and European
French, which popular online machine translation services are not keeping
distinct. We show that an evident side effect of modeling such varieties as
unique classes is the generation of inconsistent translations. In this work, we
investigate the problem of training neural machine translation from English to
specific pairs of language varieties, assuming both labeled and unlabeled
parallel texts, and low-resource conditions. We report experiments from English
to two pairs of dialects, EuropeanBrazilian Portuguese and European-Canadian
French, and two pairs of standardized varieties, Croatian-Serbian and
Indonesian-Malay. We show significant BLEU score improvements over baseline
systems when translation into similar languages is learned as a multilingual
task with shared representations.Comment: Published at EMNLP 2018: third conference on machine translation (WMT
2018
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
- …