

Sentences and Documents in Native Language Identification

Author: Brunato Dominique
Cimino Andrea
Dell’Orletta Felice
Venturi Giulia
Publication venue: 'OpenEdition'
Publication date: 01/01/2018

Starting from a wide set of linguistic features, we present the first in-depth feature analysis in two different Native Language Identification (NLI) scenarios. We compare the results obtained in a traditional NLI document classification task and in a newly introduced sentence classification task, investigating the different roles played by the features considered. Finally, we study the impact on document classification of a set of selected features extracted from the sentence classifier.

Crossref

OpenEdition

Towards using web-crawled data for domain adaptation in statistical machine translation

Author: Giagkou Maria
Papavassiliou Vassilis
Pecina Pavel
Prokopidis Prokopis
Toral Antonio
Way Andy
Publication date: 30/05/2011

This paper reports on ongoing work on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and for exploiting them in testing, language modelling, and system tuning in a phrase-based machine translation framework. The proposed approach is evaluated on two domains, Natural Environment and Labour Legislation, and two language pairs: English–French and English–Greek.

DCU Online Research Access Service

Complex Word Identification: Challenges in Data Annotation and System Performance

Author: Malmasi Shervin
Paetzold Gustavo
Specia Lucia
Zampieri Marcos
Publication date: 13/10/2017

This paper revisits the problem of complex word identification (CWI), following up on the SemEval CWI shared task. We use ensemble classifiers to investigate how well computational methods can discriminate between complex and non-complex words. Furthermore, we analyze the classification performance to understand what makes lexical complexity challenging. Our findings show that most systems performed poorly on the SemEval CWI dataset, and one of the reasons for that is the way in which human annotation was performed. Comment: Proceedings of the 4th Workshop on NLP Techniques for Educational Applications (NLPTEA 2017)

arXiv.org e-Print Archive

ZENODO