Search CORE

8,792 research outputs found

Vagueness and referential ambiguity in a large-scale annotated corpus

Author: Versley Yannick
Publication venue
Publication date: 01/01/2008
Field of study

In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus serves to corroborate this point, using a quantitative investigation to assess which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail, and we then propose a generalisation of Poesio, Reyle and Stevenson’s Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement and argue that a deeper understanding of the phenomena involved allows to tackle problematic cases in a more principled fashion than would be possible using only pre-theoretic intuitions

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

Author: A Roberts
A Shah
Aleksandar Savkov
B Efron
G Hripcsak
G Savova
J Cohen
J Foster
J-W Fan
Jackie Cassell
John Carroll
K Verspoor
KH Krippendorff
LK Tanabe
M Bada
MP Marcus
Rob Koeling
S Abney
W Sun
Ö Uzuner
Ö Uzuner
Ö Uzuner
Ö Uzuner
Ö Uzuner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning

Crossref

Springer - Publisher Connector

PubMed Central

Sussex Research Online

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
Denny
Franzén
Friedman
Gennari
George Demetriou
Hersh
Hripcsak
Ian Roberts
Kim
Lindberg
Mark Hepple
Meystre
Pestian
Robert Gaizauskas
Roberts
Tanabe
Yikun Guo
Publication venue: 'Elsevier BV'
Publication date: 01/10/2009
Field of study

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Temporal expression normalisation in natural language texts

Author: Filannino Michele
Publication venue
Publication date: 01/06/2012
Field of study

Automatic annotation of temporal expressions is a research challenge of great interest in the field of information extraction. In this report, I describe a novel rule-based architecture, built on top of a pre-existing system, which is able to normalise temporal expressions detected in English texts. Gold standard temporally-annotated resources are limited in size and this makes research difficult. The proposed system outperforms the state-of-the-art systems with respect to TempEval-2 Shared Task (value attribute) and achieves substantially better results with respect to the pre-existing system on top of which it has been developed. I will also introduce a new free corpus consisting of 2822 unique annotated temporal expressions. Both the corpus and the system are freely available on-line.Comment: 7 pages, 1 figure, 5 table

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Customizing BPMN Diagrams Using Timelines

Author: Combi Carlo
Oliboni Barbara
Sala Pietro
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 26th International Symposium on Temporal Representation and Reasoning (TIME 2019)
Publication date: 01/01/2019
Field of study

BPMN (Business Process Model and Notation) is widely used standard modeling technique for representing Business Processes by using diagrams, but lacks in some aspects. Representing execution-dependent and time-dependent decisions in BPMN Diagrams may be a daunting challenge [Carlo Combi et al., 2017]. In many cases such constraints are omitted in order to preserve the simplicity and the readability of the process model. However, for purposes such as compliance checking, process mining, and verification, formalizing such constraints could be very useful. In this paper, we propose a novel approach for annotating BPMN Diagrams with Temporal Synchronization Rules borrowed from the timeline-based planning field. We discuss the expressivity of the proposed approach and show that it is able to capture a lot of complex temporally-related constraints without affecting the structure of BPMN diagrams. Finally, we provide a mapping from annotated BPMN diagrams to timeline-based planning problems that allows one to take advantage of the last twenty years of theoretical and practical developments in the field

Dagstuhl Research Online Publication Server

Catalogo dei prodotti della ricerca

Information structure

Author: Endriss Cornelia
Fiedler Ines
Götze Michael
Hinterwimmer Stefan
Petrova Svetlana
Schwarz Anne
Skopeteas Stavros
Stoel Ruben
Weskott Thomas
Publication venue
Publication date: 05/05/2009
Field of study

The guidelines for Information Structure include instructions for the annotation of Information Status (or ‘givenness’), Topic, and Focus, building upon a basic syntactic annotation of nominal phrases and sentences. A procedure for the annotation of these features is proposed

Hochschulschriftenserver - Universität Frankfurt am Main

Active learning in annotating micro-blogs dealing with e-reputation

Author: Cossu Jean-Valère
Molina-Villegas Alejandro
Tello-Signoret Mariana
Publication venue
Publication date: 25/09/2017
Field of study

Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro blogs dealing with politics has recently attracted researchers in several fields including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches are limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper intends to develop a so-called active learning process for automatically annotating French language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinion from two French politicians over time. We therefore review state of the art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues about active learning while building a large annotated data set from noise. This will be introduced by human annotators, abundance of data and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can be considered as the bearing point to not only improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information.Comment: Journal of Interdisciplinary Methodologies and Issues in Science - Vol 3 - Contextualisation digitale - 201

arXiv.org e-Print Archive

Episciences.org