Search CORE

1,104 research outputs found

Universal Dependencies Parsing for Colloquial Singaporean English

Author: Chan GuangYong Leonard
Chieu Hai Leong
Wang Hongmin
Yang Jie
Zhang Yue
Publication venue
Publication date: 01/01/2017
Field of study

Singlish can be interesting to the ACL community both linguistically as a major creole based on English, and computationally for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to 25% relative error reduction, resulting in a parser of 84.47% accuracies. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and parser available for further research.Comment: Accepted by ACL 201

arXiv.org e-Print Archive

Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!

Author: Daxenberger Johannes
Eger Steffen
Gurevych Iryna
Stab Christian
Publication venue
Publication date: 08/06/2018
Field of study

Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually. In this work, we show that the existing resources are, however, not adequate for assessing cross-lingual AM, due to their heterogeneity or lack of complexity. We therefore create suitable parallel corpora by (human and machine) translating a popular AM dataset consisting of persuasive student essays into German, French, Spanish, and Chinese. We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. Moreover, we find that annotation projection works equally well when using either costly human or cheap machine translations. Our code and data are available at \url{http://github.com/UKPLab/coling2018-xling_argument_mining}.Comment: Accepted at Coling 201

arXiv.org e-Print Archive

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT

Author: Dredze Mark
Wu Shijie
Publication venue
Publication date: 01/01/2019
Field of study

Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of mBERT (multilingual) as a zero shot language transfer model on 5 NLP tasks covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best-published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language specific features, and measure factors that influence cross-lingual transfer.Comment: EMNLP 2019 Camera Read

arXiv.org e-Print Archive

A resource-light approach to morpho-syntactic tagging

Author: Macken Lieve
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010
Field of study