Reducing Label Cost by Combining Feature Labels and Crowdsourcing

By Jay Pujara and Lise Getoor

Abstract

Decreasing technology costs, increasing computational power and ubiquitous network connectivity are contributing to an unprecedented increase in the amount of publicly available data. Yet this surge of data has not been accompanied by a complementary increase in annotation. This lack of annotated data complicates data mining tasks in which supervised learning is preferred or required. In response, researchers have proposed many approaches to cheaply construct training sets. One approach, referred to as feature labels (McCallum & Nigam, 1999), chooses features that strongly correlate with the label space and uses instances containing those features as labeled data for training a classifier. These high-precision examples help bootstrap the learning process. Another technique, crowdsourcing, exploits our ever-increasing connectivity to request annotation from a broader community (who may or may not be domain experts), thereby refining and expanding the labeled data. Combining these techniques provides a means to obtain supervision from large, unlabeled data sources. In this paper, we investigate using active learning to combine these approaches in a unified framework which we call active bootstrapping. We show that this technique produces more reliable labels than either approach individually, resulting in a better classifier at mini
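
The feature-label idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature-to-label map, the function names, and the rule that conflicting features leave an instance unlabeled are all assumptions made for illustration. Instances containing a labeled feature become bootstrapped training examples; the rest remain unlabeled, and would be the natural candidates for annotation by other means such as crowdsourcing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical feature labels: each feature (here, a token) is mapped to the
# class it is assumed to correlate with strongly. These values are invented.
FEATURE_LABELS: Dict[str, str] = {
    "touchdown": "sports",
    "election": "politics",
}


def bootstrap_label(text: str, feature_labels: Dict[str, str]) -> Optional[str]:
    """Return a label when the labeled features found in the text all agree.

    Returns None when no labeled feature occurs, or when the features that
    occur point to different classes (an assumption of this sketch).
    """
    tokens = set(text.lower().split())
    labels = {label for feature, label in feature_labels.items() if feature in tokens}
    return labels.pop() if len(labels) == 1 else None


def build_training_set(
    docs: List[str], feature_labels: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split documents into bootstrapped (doc, label) pairs and unlabeled docs."""
    labeled: List[Tuple[str, str]] = []
    unlabeled: List[str] = []
    for doc in docs:
        label = bootstrap_label(doc, feature_labels)
        if label is None:
            unlabeled.append(doc)
        else:
            labeled.append((doc, label))
    return labeled, unlabeled
```

In this sketch, the `labeled` list would seed a classifier, while the `unlabeled` list holds the instances whose labels would have to come from another source.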

Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.353.1042
Provided by: CiteSeerX
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://linqs.cs.umd.edu/basili... (external link)
  • http://linqs.cs.umd.edu/basili... (external link)