Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks.
In order to reduce noise in training data, most natural language crowdsourcing annotation tasks gather redundant labels and aggregate them into an integrated label, which is provided to the classifier. However, aggregation discards potentially useful information from linguistically ambiguous instances. For five natural language tasks, we pass item agreement on to the task classifier via soft labeling and low-agreement filtering of the training dataset. We find a statistically significant benefit from low item agreement training filtering in four of our five tasks, and no systematic benefit from soft labeling.
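The two strategies the abstract describes can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function name, threshold value, and data layout are assumptions:

```python
import numpy as np

def agreement_based_labels(label_matrix, num_classes, threshold=0.7):
    """For each item's redundant crowd labels, compute a soft label
    (the distribution of annotator votes) and a keep-mask that drops
    low-agreement items. Threshold value is illustrative."""
    n_items = len(label_matrix)
    soft = np.zeros((n_items, num_classes))
    agreement = np.zeros(n_items)
    for i, labels in enumerate(label_matrix):
        counts = np.bincount(labels, minlength=num_classes)
        soft[i] = counts / counts.sum()              # soft label: vote proportions
        agreement[i] = counts.max() / counts.sum()   # item agreement: majority fraction
    keep = agreement >= threshold                    # low-agreement filtering
    return soft, keep

# Toy example: 4 items, 3 annotators each, binary task.
labels = [[0, 0, 0], [0, 1, 0], [1, 1, 1], [0, 1, 1]]
soft, keep = agreement_based_labels(labels, num_classes=2, threshold=0.7)
# Items with only 2/3 agreement fall below the 0.7 threshold and are filtered.
```

Soft labeling would train the classifier against `soft` directly (e.g. with a cross-entropy loss over the vote distribution), while filtering would train on hard majority labels restricted to `keep`.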