Enhancing Named Entity Extraction by Effectively Incorporating the Crowd

By Katrin Braunschweig, Maik Thiele, Julian Eberius and Wolfgang Lehner (Technische Universität Dresden)


Abstract: Named entity extraction is an established research area in the field of information extraction. When tailored to a specific domain and with sufficient pre-labeled training data, state-of-the-art extraction algorithms have achieved near-human performance. However, when presented with semi-structured data, informal text or unknown domains where training data is not available, extraction results can deteriorate significantly. Recent research has focused on crowdsourcing as an alternative to automatic named entity extraction or as a tool to generate the required training data. While humans easily adapt to semi-structured data and informal style, a crowd-based approach also introduces new issues due to monetary costs or spamming. We address these issues by combining automatic named entity extraction algorithms with crowdsourcing into a hybrid approach. We have conducted a wide range of experiments on real-world data to identify a set of subtasks, or operators, that can be performed either by the crowd or automatically. Results show that a meaningful combination of these operators into complex processing pipelines can significantly enhance the quality of named entity extraction in challenging scenarios, while at the same time reducing the monetary costs of crowdsourcing and the risk of misuse.

Year: 2013
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX