7,577 research outputs found
Hierarchical Character-Word Models for Language Identification
Social media messages' brevity and unconventional spelling pose a challenge
to language identification. We introduce a hierarchical model that learns
character and contextualized word-level representations for language
identification. Our method performs well against strong base- lines, and can
also reveal code-switching
Neural Word Segmentation with Rich Pretraining
Neural word segmentation research has benefited from large-scale raw texts by
leveraging them for pretraining character and word embeddings. On the other
hand, statistical segmentation research has exploited richer sources of
external information, such as punctuation, automatic segmentation and POS. We
investigate the effectiveness of a range of external training sources for
neural word segmentation by building a modular segmentation model, pretraining
the most important submodule using rich external sources. Results show that
such pretraining significantly improves the model, leading to accuracies
competitive to the best methods on six benchmarks.Comment: Accepted by ACL 201
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Argumentation mining (AM) requires the identification of complex discourse
structures and has lately been applied with success monolingually. In this
work, we show that the existing resources are, however, not adequate for
assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
We therefore create suitable parallel corpora by (human and machine)
translating a popular AM dataset consisting of persuasive student essays into
German, French, Spanish, and Chinese. We then compare (i) annotation projection
and (ii) bilingual word embeddings based direct transfer strategies for
cross-lingual AM, finding that the former performs considerably better and
almost eliminates the loss from cross-lingual transfer. Moreover, we find that
annotation projection works equally well when using either costly human or
cheap machine translations. Our code and data are available at
\url{http://github.com/UKPLab/coling2018-xling_argument_mining}.Comment: Accepted at Coling 201
- …