Search CORE

3,520 research outputs found

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
Denny
Franzén
Friedman
Gennari
George Demetriou
Hersh
Hripcsak
Ian Roberts
Kim
Lindberg
Mark Hepple
Meystre
Pestian
Robert Gaizauskas
Roberts
Tanabe
Yikun Guo
Publication venue: 'Elsevier BV'
Publication date: 01/10/2009
Field of study

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Argumentation Mining in User-Generated Web Discourse

Author: Gurevych Iryna
Habernal Ivan
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015
Field of study

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

arXiv.org e-Print Archive

TUbiblio

Crossref

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)

Automated Analysis of Digital Gonioscopy Images Using Deep Learning

Author: Peroni Andrea
Publication venue
Publication date: 01/01/2022
Field of study

University of Dundee Online Publications

ImageNet Large Scale Visual Recognition Challenge

Author: A Geiger
A Torralba
Aditya Khosla
Alexander C. Berg
Andrej Karpathy
B Alexe
B Yao
C Liu
C Vondrick
DG Lowe
GA Miller
Hao Su
J Uijlings
Jia Deng
Jonathan Krause
K Crammer
KEA Sande van de
KEA Sande van de
Li Fei-Fei
M Everingham
M Everingham
Michael Bernstein
Olga Russakovsky
P Arbelaez
P Felzenszwalb
S Thorpe
Sanjeev Satheesh
Sean Ma
T Ahonen
Zhiheng Huang
Publication venue
Publication date: 01/01/2015
Field of study

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional reference

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Carolina Digital Repository

Annotating Argument Schemes

Author: Lawrence John
Reed Chris
Visser Jacky
Wagemans Jean
Walton Douglas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2021
Field of study

University of Dundee Online Publications

International Migration, Integration and Social Cohesion online publications

UvA-DARE

An Investigation into the Pedagogical Features of Documents

Author: Burns Gully
Gordon Jonathan
Natarajan Prem
Sheng Emily
Publication venue
Publication date: 01/01/2017
Field of study

Characterizing the content of a technical document in terms of its learning utility can be useful for applications related to education, such as generating reading lists from large collections of documents. We refer to this learning utility as the "pedagogical value" of the document to the learner. While pedagogical value is an important concept that has been studied extensively within the education domain, there has been little work exploring it from a computational, i.e., natural language processing (NLP), perspective. To allow a computational exploration of this concept, we introduce the notion of "pedagogical roles" of documents (e.g., Tutorial and Survey) as an intermediary component for the study of pedagogical value. Given the lack of available corpora for our exploration, we create the first annotated corpus of pedagogical roles and use it to test baseline techniques for automatic prediction of such roles.Comment: 12th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at EMNLP 2017; 12 page

arXiv.org e-Print Archive

Crossref

Is it worth it? Budget-related evaluation metrics for model selection

Author: Kelleher John D.
Klubička Filip
Salton Giancarlo D.
Publication venue
Publication date: 01/01/2018
Field of study

Creating a linguistic resource is often done by using a machine learning model that filters the content that goes through to a human annotator, before going into the final resource. However, budgets are often limited, and the amount of available data exceeds the amount of affordable annotation. In order to optimize the benefit from the invested human work, we argue that deciding on which model one should employ depends not only on generalized evaluation metrics such as F-score, but also on the gain metric. Because the model with the highest F-score may not necessarily have the best sequencing of predicted classes, this may lead to wasting funds on annotating false positives, yielding zero improvement of the linguistic resource. We exemplify our point with a case study, using real data from a task of building a verb-noun idiom dictionary. We show that, given the choice of three systems with varying F-scores, the system with the highest F-score does not yield the highest profits. In other words, in our case the cost-benefit trade off is more favorable for a system with a lower F-score.Comment: 7 pages, 1 figure, 5 tables, In proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018

arXiv.org e-Print Archive

Arrow@TUDublin