Reducing Annotation Effort on Unbalanced Corpus based on Cost Matrix

Authors: Chan, Joel; Litman, Diane J.; Luo, Wencan
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 13/06/2013

Annotated corpora play a significant role in many NLP applications. However, annotation by humans is time-consuming and costly. In this paper, a high-recall predictor based on a cost-sensitive learner is proposed as a method to semi-automate the annotation of unbalanced classes. We demonstrate the effectiveness of our approach in the context of one form of unbalanced task: annotation of transcribed human-human dialogues for presence/absence of uncertainty. In two data sets, our cost-matrix-based method of uncertainty annotation achieved high levels of recall while maintaining acceptable levels of accuracy. The method is able to reduce human annotation effort by about 80% without a significant loss in data quality, as demonstrated by an extrinsic evaluation showing that results originally achieved using manually obtained uncertainty annotations can be replicated using semi-automatically obtained uncertainty annotations.
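
As a rough illustration of how a cost matrix can push a classifier toward high recall, the sketch below applies the standard minimum-expected-cost decision rule to a predicted probability. It is not the authors' implementation: the cost values, the function names (`min_expected_cost_label`, `route_for_annotation`), the example probabilities, and the assumption that only predicted positives are passed to a human annotator are all hypothetical.

```python
# Hypothetical cost matrix, indexed as COST[true_label][predicted_label].
# Missing a positive (false negative) is assumed to cost 10 times as much
# as a false alarm (false positive); these values are illustrative only.
COST = {
    0: {0: 0.0, 1: 1.0},   # true negative, false positive
    1: {0: 10.0, 1: 0.0},  # false negative, true positive
}


def min_expected_cost_label(p_positive, cost):
    """Return the label (0 or 1) with the lowest expected cost.

    p_positive is the classifier's estimated probability of the positive
    (rare) class for one item.
    """
    p = {0: 1.0 - p_positive, 1: p_positive}
    expected = {
        pred: sum(p[true] * cost[true][pred] for true in (0, 1))
        for pred in (0, 1)
    }
    return min(expected, key=expected.get)


def route_for_annotation(probabilities, cost):
    """Return indices of items predicted positive under the cost matrix.

    Assumption: these are the items a human would still check, while
    predicted negatives are accepted automatically.
    """
    return [
        i for i, prob in enumerate(probabilities)
        if min_expected_cost_label(prob, cost) == 1
    ]


if __name__ == "__main__":
    # Invented probabilities, not data from the paper.
    probs = [0.02, 0.50, 0.05, 0.20, 0.03]
    to_review = route_for_annotation(probs, COST)
    print("items sent to a human:", to_review)
```

With these assumed costs, an item is labeled positive whenever its estimated probability exceeds 1/11, so recall on the rare class rises at the price of more false positives for the human to reject.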

CiteSeerX

D-Scholarship@Pitt