A Practical Incremental Learning Framework For Sparse Entity Extraction
This work addresses challenges arising from extracting entities from textual
data, including the high cost of data annotation, model accuracy, selecting
appropriate evaluation criteria, and the overall quality of annotation. We
present a framework that integrates Entity Set Expansion (ESE) and Active
Learning (AL) to reduce the annotation cost of sparse data and provide an
online evaluation method as feedback. This incremental and interactive learning
framework allows for rapid annotation and subsequent extraction of sparse data
while maintaining high accuracy. We evaluate our framework on three publicly
available datasets and show that it drastically reduces the cost of sparse
entity annotation by an average of 85% and 45% to reach 0.9 and 1.0 F-Scores
respectively. Moreover, the method exhibited robust performance across all
datasets.
Comment: https://www.aclweb.org/anthology/C18-1059
Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities
Scholars in inter-disciplinary fields like the Digital Humanities are increasingly interested in semantic annotation of specialized corpora. Yet, under-resourced languages, imperfect or noisily structured data, and user-specific classification tasks make it difficult to meet their needs using off-the-shelf models. Manual annotation of large corpora from scratch, meanwhile, can be prohibitively expensive. Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. Our system robustly handles any domain or user-defined label set and requires no external resources, enabling quality named entity recognition for Humanities corpora where such resources are not available. Evaluating on typologically disparate languages and datasets, we reduce required annotation by 20-60% and greatly outperform a competitive active learning baseline.
New York University–Paris Sciences Lettres Global Alliance grant; National Endowment for the Humanities grant, award HAA-256078-17; Computational Approaches to Modeling Language lab at New York University Abu Dhab
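The abstract above does not say which selection strategy the system uses. As a generic illustration of pool-based active learning for sequence labeling, here is a minimal sketch that assumes least-confidence sampling: each unlabeled sentence is scored by its least certain token, and the highest-scoring sentences are sent to the annotator first. The names `sentence_uncertainty` and `select_batch`, and the per-token probability lists, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def sentence_uncertainty(token_label_probs: List[List[float]]) -> float:
    """Score a sentence by its least confident token.

    Each inner list holds a model's label probabilities for one token
    (an assumed input format). The score is 1 minus the lowest
    top-label probability found in the sentence.
    """
    if not token_label_probs:
        return 0.0  # an empty sentence carries nothing to annotate
    return 1.0 - min(max(probs) for probs in token_label_probs)


def select_batch(pool: Dict[str, List[List[float]]], batch_size: int) -> List[str]:
    """Return the ids of the `batch_size` most uncertain sentences."""
    ranked = sorted(
        pool,
        key=lambda sent_id: sentence_uncertainty(pool[sent_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Toy pool: three sentences with made-up token-level probabilities.
pool = {
    "s1": [[0.9, 0.1], [0.8, 0.2]],
    "s2": [[0.55, 0.45]],
    "s3": [[0.99, 0.01]],
}
chosen = select_batch(pool, batch_size=2)
```

In a full loop, the chosen sentences would be labeled by hand, added to the training set, and the model retrained before the next round of selection.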
Challenges and solutions for Latin named entity recognition
Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists, for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality
- …