brat: a Web-based Tool for NLP-Assisted Text Annotation
We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques. We discuss several case studies of real-world annotation projects using pre-release versions of BRAT and present an evaluation of annotation assisted by semantic class disambiguation on a multicategory entity mention annotation task, showing a 15% decrease in total annotation time. BRAT is available under an open-source license from
Annotation graphs as a framework for multidimensional linguistic data analysis
In recent work we have presented a formal framework for linguistic annotation
based on labeled acyclic digraphs. These `annotation graphs' offer a simple yet
powerful method for representing complex annotation structures incorporating
hierarchy and overlap. Here, we motivate and illustrate our approach using
discourse-level annotations of text and speech data drawn from the CALLHOME,
COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain
specialists, we have constructed a hybrid multi-level annotation for a fragment
of the Boston University Radio Speech Corpus which includes the following
levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named
entity. We show how annotation graphs can represent hybrid multi-level
structures which derive from a diverse set of file formats. We also show how
the approach facilitates substantive comparison of multiple annotations of a
single signal based on different theoretical models. The discussion shows how
annotation graphs open the door to wide-ranging integration of tools, formats
and corpora.
Comment: 10 pages, 10 figures, Towards Standards and Tools for Discourse
Tagging, Proceedings of the Workshop. pp. 1-10. Association for Computational
Linguistic
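The abstract above describes annotation graphs as labeled acyclic digraphs that can hold both hierarchy and overlap. A minimal sketch of that idea follows; the class and field names (`AnnotationGraph`, `Arc`, `tier`) and the toy sentence are assumptions made for illustration, not the authors' formalism or any published API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int    # id of the node where the annotation starts
    dst: int    # id of the node where the annotation ends
    tier: str   # annotation level, e.g. "word" or "phrase" (hypothetical names)
    label: str  # annotation content


class AnnotationGraph:
    """Toy labeled digraph: nodes are anchors, arcs carry annotations."""

    def __init__(self):
        self.times = {}  # node id -> optional time offset into the signal
        self.arcs = []

    def add_node(self, node_id, time=None):
        self.times[node_id] = time

    def add_arc(self, src, dst, tier, label):
        if src not in self.times or dst not in self.times:
            raise KeyError("both endpoints must be added as nodes first")
        self.arcs.append(Arc(src, dst, tier, label))

    def tier(self, name):
        """Return the arcs of one annotation level, in insertion order."""
        return [a for a in self.arcs if a.tier == name]

    def is_acyclic(self):
        """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
        indegree = {n: 0 for n in self.times}
        successors = defaultdict(list)
        for a in self.arcs:
            successors[a.src].append(a.dst)
            indegree[a.dst] += 1
        ready = [n for n, d in indegree.items() if d == 0]
        removed = 0
        while ready:
            node = ready.pop()
            removed += 1
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        return removed == len(self.times)


def build_example():
    g = AnnotationGraph()
    for node_id, t in enumerate([0.0, 0.3, 0.7, 1.1]):
        g.add_node(node_id, t)
    for i, word in enumerate(["the", "cat", "sat"]):
        g.add_arc(i, i + 1, "word", word)
    g.add_arc(0, 2, "phrase", "NP")        # hierarchy: spans two word arcs
    g.add_arc(1, 3, "breath", "exhale")    # overlap: crosses the NP boundary
    return g
```

Because every level shares the same nodes, arcs from different tiers can nest or cross without any tier needing to know about the others.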
A Practical Incremental Learning Framework For Sparse Entity Extraction
This work addresses challenges arising from extracting entities from textual
data, including the high cost of data annotation, model accuracy, selecting
appropriate evaluation criteria, and the overall quality of annotation. We
present a framework that integrates Entity Set Expansion (ESE) and Active
Learning (AL) to reduce the annotation cost of sparse data and provide an
online evaluation method as feedback. This incremental and interactive learning
framework allows for rapid annotation and subsequent extraction of sparse data
while maintaining high accuracy. We evaluate our framework on three publicly
available datasets and show that it drastically reduces the cost of sparse
entity annotation by an average of 85% and 45% to reach 0.9 and 1.0 F-Scores
respectively. Moreover, the method exhibited robust performance across all
datasets.
Comment: https://www.aclweb.org/anthology/C18-1059
Information structure
The guidelines for Information Structure include instructions for the annotation of Information Status (or ‘givenness’), Topic, and Focus, building upon a basic syntactic annotation of nominal phrases and sentences. A procedure for the annotation of these features is proposed.
Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation
Image segmentation is a fundamental problem in biomedical image analysis.
Recent advances in deep learning have achieved promising results on many
biomedical image segmentation benchmarks. However, due to large variations in
biomedical images (different modalities, image settings, objects, noise, etc.),
applying deep learning to a new application usually requires a new set of
training data. This can incur a great deal of annotation effort and cost,
because only biomedical experts can annotate effectively, and often there are
too many instances in images (e.g., cells) to annotate. In this paper, we aim
to address the following question: With limited effort (e.g., time) for
annotation, what instances should be annotated in order to attain the best
performance? We present a deep active learning framework that combines a fully
convolutional network (FCN) with active learning to significantly reduce
annotation effort by making judicious suggestions on the most effective
annotation areas. We utilize uncertainty and similarity information provided by
the FCN and formulate a generalized version of the maximum set cover problem to
determine the most representative and uncertain areas for annotation. Extensive
experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node
ultrasound image segmentation dataset show that, using annotation suggestions
by our method, state-of-the-art segmentation performance can be achieved by
using only 50% of training data.
Comment: Accepted at MICCAI 201
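The selection step described in the abstract above (prefer uncertain items, then pick a subset that is representative of the whole pool) can be sketched with a greedy coverage heuristic. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `suggest`, its parameters, and the precomputed uncertainty scores and similarity matrix are hypothetical, whereas in the paper these quantities are derived from the FCN.

```python
def suggest(uncertainty, similarity, pool_size, budget):
    """Pick up to `budget` items to annotate.

    uncertainty: list of per-item uncertainty scores (higher = less certain).
    similarity:  square matrix, similarity[i][j] in [0, 1].
    pool_size:   how many of the most uncertain items are kept as candidates.
    budget:      how many candidates are finally suggested.
    """
    n = len(uncertainty)
    # Step 1: keep only the most uncertain items as candidates.
    candidates = sorted(range(n), key=lambda i: uncertainty[i], reverse=True)
    candidates = candidates[:pool_size]

    chosen = []
    # best_cover[j] = highest similarity between item j and any chosen item.
    best_cover = [0.0] * n

    # Step 2: greedily add the candidate that most improves total coverage,
    # the usual approximation strategy for maximum-coverage style problems.
    for _ in range(min(budget, len(candidates))):
        remaining = [c for c in candidates if c not in chosen]

        def coverage_if_added(c):
            return sum(max(best_cover[j], similarity[c][j]) for j in range(n))

        pick = max(remaining, key=coverage_if_added)
        chosen.append(pick)
        best_cover = [max(best_cover[j], similarity[pick][j]) for j in range(n)]
    return chosen


# Items 0 and 1 are near-duplicates; item 2 resembles the confident item 3.
example_uncertainty = [0.9, 0.8, 0.7, 0.1]
example_similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
```

With these toy inputs, a pool of three and a budget of two, the heuristic skips the near-duplicate item 1 in favour of item 2, which covers a different part of the data.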
