Towards shared datasets for normalization research

Author: De Clercq Orphée
Desmet Bart
Hoste Veronique
Schulz Sarah
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2014

In this paper we present Dutch and English datasets that can serve as a gold standard for evaluating text normalization approaches. With the combination of text messages, message board posts and tweets, these datasets represent a variety of user-generated content. All data was manually normalized to its standard form using newly developed guidelines. We perform automatic lexical normalization experiments on these datasets using statistical machine translation techniques. We focus on both the word and character level and find that we can improve the BLEU score by ca. 20% for both languages. In order for this user-generated content data to be released publicly to the research community, some issues first need to be resolved. These are discussed in closer detail by focussing on the current legislation and by investigating previous similar data collection projects. With this discussion we hope to shed some light on various difficulties researchers are facing when trying to share social media data.

A Three-Way Perspective on Scientific Discourse Annotation for Knowledge Extraction

Author: Ananiadou S
de Waard A
Liakata M
Nawaz R
Pander Maat H
Thompson P
Publication date: 01/07/2012

A study of systems implementation languages for the POCCNET system

Author: Basili V. R.
Franklin J. W.
Publication date: 27/08/1976

The results of a study of systems implementation languages for the Payload Operations Control Center Network (POCCNET) are presented. Criteria are developed for evaluating the languages, and fifteen existing languages are evaluated on the basis of these criteria.

Parsing Argumentation Structures in Persuasive Essays

Author: Gurevych Iryna
Stab Christian
Publication date: 22/07/2016

In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves the performance of base classifiers and significantly outperforms challenging heuristic baselines. Moreover, we introduce a novel corpus of persuasive essays annotated with argumentation structures. We show that our annotation scheme and annotation guidelines successfully guide human annotators to substantial agreement. This corpus and the annotation guidelines are freely available to ensure reproducibility and to encourage future research in computational argumentation. Comment: Under review in Computational Linguistics. First submission: 26 October 2015. Revised submission: 15 July 201

arXiv.org e-Print Archive

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)

Key stage 3 English : roots and research

Author: Harrison Colin
Publication venue: Department for Education and Skills
Publication date: 01/01/2002