1,747 research outputs found
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Argumentation mining (AM) requires the identification of complex discourse
structures and has lately been applied with success monolingually. In this
work, we show that the existing resources are, however, not adequate for
assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
We therefore create suitable parallel corpora by (human and machine)
translating a popular AM dataset consisting of persuasive student essays into
German, French, Spanish, and Chinese. We then compare (i) annotation projection
and (ii) bilingual word embeddings based direct transfer strategies for
cross-lingual AM, finding that the former performs considerably better and
almost eliminates the loss from cross-lingual transfer. Moreover, we find that
annotation projection works equally well when using either costly human or
cheap machine translations. Our code and data are available at
\url{http://github.com/UKPLab/coling2018-xling_argument_mining}.Comment: Accepted at Coling 201
Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
We construct a multilingual common semantic space based on distributional
semantics, where words from multiple languages are projected into a shared
space to enable knowledge and resource transfer across languages. Beyond word
alignment, we introduce multiple cluster-level alignments and enforce the word
clusters to be consistently distributed across multiple languages. We exploit
three signals for clustering: (1) neighbor words in the monolingual word
embedding space; (2) character-level information; and (3) linguistic properties
(e.g., apposition, locative suffix) derived from linguistic structure knowledge
bases available for thousands of languages. We introduce a new
cluster-consistent correlational neural network to construct the common
semantic space by aligning words as well as clusters. Intrinsic evaluation on
monolingual and multilingual QVEC tasks shows our approach achieves
significantly higher correlation with linguistic features than state-of-the-art
multi-lingual embedding learning methods do. Using low-resource language name
tagging as a case study for extrinsic evaluation, our approach achieves up to
24.5\% absolute F-score gain over the state of the art.Comment: 10 page
The TALP–UPC Spanish–English WMT biomedical task: bilingual embeddings and char-based neural language model rescoring in a phrase-based system
This paper describes the TALP–UPC system in the Spanish–English WMT 2016 biomedical shared task. Our system is a standard phrase-based system enhanced with vocabulary expansion using bilingual word embeddings and a characterbased neural language model with rescoring. The former focuses on resolving outof- vocabulary words, while the latter enhances the fluency of the system. The two modules progressively improve the final translation as measured by a combination of several lexical metrics.Postprint (published version
XL-NBT: A Cross-lingual Neural Belief Tracking Framework
Task-oriented dialog systems are becoming pervasive, and many companies
heavily rely on them to complement human agents for customer service in call
centers. With globalization, the need for providing cross-lingual customer
support becomes more urgent than ever. However, cross-lingual support poses
great challenges---it requires a large amount of additional annotated data from
native speakers. In order to bypass the expensive human annotation and achieve
the first step towards the ultimate goal of building a universal dialog system,
we set out to build a cross-lingual state tracking framework. Specifically, we
assume that there exists a source language with dialog belief tracking
annotations while the target languages have no annotated dialog data of any
form. Then, we pre-train a state tracker for the source language as a teacher,
which is able to exploit easy-to-access parallel data. We then distill and
transfer its own knowledge to the student state tracker in target languages. We
specifically discuss two types of common parallel resources: bilingual corpus
and bilingual dictionary, and design different transfer learning strategies
accordingly. Experimentally, we successfully use English state tracker as the
teacher to transfer its knowledge to both Italian and German trackers and
achieve promising results.Comment: 13 pages, 5 figures, 3 tables, accepted to EMNLP 2018 conferenc
- …