Cross-Lingual Dependency Parsing for Closely Related Languages - Helsinki's Submission to VarDial 2017
This paper describes the submission from the University of Helsinki to the
shared task on cross-lingual dependency parsing at VarDial 2017. We present
work on annotation projection and treebank translation that gave good results
for all three target languages in the test set. In particular, Slovak seems to
work well with information coming from the Czech treebank, which is in line
with related work. The attachment scores for cross-lingual models even surpass
those of the fully supervised models trained on the target language treebank. Croatian
is the most difficult language in the test set and the improvements over the
baseline are rather modest. Norwegian works best with information coming from
Swedish, whereas Danish contributes surprisingly little.
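For readers unfamiliar with the metric, the attachment scores mentioned in this abstract are conventionally the unlabeled and labeled attachment scores (UAS and LAS). The following is a minimal sketch of how they are typically computed, not the shared task's official scorer; the function name `attachment_scores` and the toy sentence are assumptions made for illustration.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one or more sentences flattened into token lists.

    Each token is a (head_index, relation_label) pair. This layout is an
    illustrative assumption, not the format used by the shared task.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")

    total = len(gold)
    # UAS: the predicted head matches the gold head.
    uas_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted)
                   if g_head == p_head)
    # LAS: both the head and the relation label match.
    las_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return uas_hits / total, las_hits / total


# Toy three-token sentence (hypothetical data): all heads are correct,
# but the label of the last token is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
```

In this toy case every head is correct but one label is not, so UAS is higher than LAS.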
Reconstructing Native Language Typology from Foreign Language Usage
Linguists and psychologists have long been studying cross-linguistic
transfer, the influence of native language properties on linguistic performance
in a foreign language. In this work we provide empirical evidence for this
process in the form of a strong correlation between language similarities
derived from structural features in English as Second Language (ESL) texts and
equivalent similarities obtained from the typological features of the native
languages. We leverage this finding to recover native language typological
similarity structure directly from ESL text, and perform prediction of
typological features in an unsupervised fashion with respect to the target
languages. Our method achieves 72.2% accuracy on the typology prediction task,
a result that is highly competitive with equivalent methods that rely on
typological resources. Comment: CoNLL 201
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Argumentation mining (AM) requires the identification of complex discourse
structures and has lately been applied with success monolingually. In this
work, we show that the existing resources are, however, not adequate for
assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
We therefore create suitable parallel corpora by (human and machine)
translating a popular AM dataset consisting of persuasive student essays into
German, French, Spanish, and Chinese. We then compare (i) annotation projection
and (ii) bilingual word embeddings based direct transfer strategies for
cross-lingual AM, finding that the former performs considerably better and
almost eliminates the loss from cross-lingual transfer. Moreover, we find that
annotation projection works equally well when using either costly human or
cheap machine translations. Our code and data are available at
http://github.com/UKPLab/coling2018-xling_argument_mining. Comment: Accepted at Coling 201
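The abstract does not spell out how annotation projection is carried out. A common formulation copies token-level labels from a source sentence to its translation through word alignments. The sketch below illustrates that general idea only; the function `project_labels`, the alignment format, and the example labels are assumptions, not the authors' implementation.

```python
def project_labels(source_labels, alignment, target_length, default="O"):
    """Project token-level labels from a source sentence onto its translation.

    source_labels: one label per source token.
    alignment: iterable of (source_index, target_index) pairs, e.g. produced
        by a word aligner (hypothetical input format).
    target_length: number of tokens in the translated sentence.
    Unaligned target tokens keep the default label.
    """
    projected = [default] * target_length
    for src_idx, tgt_idx in sorted(alignment):
        # If several source tokens align to one target token,
        # the leftmost source token wins (a simplifying assumption).
        if projected[tgt_idx] == default:
            projected[tgt_idx] = source_labels[src_idx]
    return projected


# Toy example (hypothetical data): the first two tokens swap order in the
# translation, and the last target token has no aligned source token.
source_labels = ["Claim", "Claim", "O", "Premise"]
alignment = [(0, 1), (1, 0), (3, 2)]
projected = project_labels(source_labels, alignment, target_length=4)
```

Real systems usually add heuristics on top of this, for example to repair label spans that become discontinuous after projection.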
Lost in translation: the problems of using mainstream MT evaluation metrics for sign language translation
In this paper we consider the problems of applying corpus-based techniques to minority languages that are neither politically recognised nor have a formally accepted writing system, namely sign languages. We discuss the adoption of an annotated form of sign language data as a suitable corpus for the development of a data-driven machine translation (MT) system, and deal with issues that arise from its use. Useful software tools that facilitate easy annotation of video data are also discussed. Furthermore, we address the problems of using traditional MT evaluation metrics for sign language translation. Based on the candidate translations produced from our example-based machine translation system, we discuss why standard metrics fall short of providing an accurate evaluation and suggest more suitable evaluation methods.