Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible
to task people with small jobs, such as labeling images or looking up phone
numbers, via a programmatic interface. MTurk tasks for processing datasets with
humans are currently designed with significant reimplementation of common
workflows and ad-hoc selection of parameters such as price to pay per task. We
describe how we have integrated crowds into a declarative workflow engine
called Qurk to reduce the burden on workflow designers. In this paper, we focus
on how to use humans to compare items for sorting and joining data, two of the
most common operations in DBMSs. We describe our basic query interface and the
user interface of the tasks we post to MTurk. We also propose a number of
optimizations, including task batching, replacing pairwise comparisons with
numerical ratings, and pre-filtering tables before joining them, which
dramatically reduce the overall cost of running sorts and joins on the crowd.
In an experiment joining two sets of images, we reduce the overall cost from
67inanaiveimplementationtoabout3, without substantially affecting
accuracy or latency. In an end-to-end experiment, we reduced cost by a factor
of 14.5.Comment: VLDB201