Crowd-Sourcing Fuzzy and Faceted Classification for Concept Search
Searching for concepts in science and technology is often a difficult task.
To facilitate concept search, different types of human-generated metadata have
been created to define the content of scientific and technical disclosures.
Classification schemes such as the International Patent Classification (IPC)
and MEDLINE's MeSH are structured and controlled, but require trained experts
and central management to restrict ambiguity (Mork, 2013). While unstructured
tags of folksonomies can be processed to produce a degree of structure
(Kalendar, 2010; Karampinas, 2012; Sarasua, 2012; Bragg, 2013), the freedom
enjoyed by the crowd typically results in less precision (Stock, 2007).
Existing classification schemes suffer from inflexibility and ambiguity.
Since humans understand language, inference, implication, abstraction and hence
concepts better than computers, we propose to harness the collective wisdom of
the crowd. To do so, we propose a novel classification scheme that is
sufficiently intuitive for the crowd to use, yet powerful enough to facilitate
search by analogy, and flexible enough to deal with ambiguity. The system will
enhance existing classification information. Linking up with the semantic web
and computer intelligence, a Citizen Science effort (Good, 2013) would support
innovation by improving the quality of granted patents, reducing duplicative
research, and stimulating problem-oriented solution design.
A prototype of our design is in preparation. A crowd-sourced fuzzy and
faceted classification scheme will allow for better concept search and improved
access to prior art in science and technology.
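To make the idea of a fuzzy, faceted scheme concrete, the sketch below indexes documents by (facet, value) pairs with graded membership degrees instead of hard labels. All facet names, documents, and degrees are hypothetical illustrations, not part of the proposed system's actual design.

```python
# Illustrative sketch of a fuzzy, faceted classification index.
# All facets, documents and membership degrees are hypothetical.
from collections import defaultdict

# Each document is tagged with (facet, value, degree) triples, where
# degree in [0, 1] expresses fuzzy membership rather than a hard label.
tags = [
    ("doc1", "function", "fastening", 0.9),
    ("doc1", "material", "polymer", 0.4),
    ("doc2", "function", "fastening", 0.7),
    ("doc2", "material", "metal", 0.8),
]

# Inverted index: (facet, value) -> {document: membership degree}
index = defaultdict(dict)
for doc, facet, value, degree in tags:
    index[(facet, value)][doc] = degree

def search(facet, value, threshold=0.5):
    """Return documents whose fuzzy membership exceeds the threshold,
    ranked by degree -- ambiguity is retained rather than discarded."""
    hits = index.get((facet, value), {})
    return sorted(
        ((d, g) for d, g in hits.items() if g >= threshold),
        key=lambda x: -x[1],
    )

print(search("function", "fastening"))  # -> [('doc1', 0.9), ('doc2', 0.7)]
```

Because membership is graded, a query can loosen or tighten the threshold to trade recall against precision, which is one way such a scheme could tolerate the ambiguity that rigid schemes like the IPC must exclude.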
Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems
Crowdsourcing systems commonly face the problem of aggregating multiple
judgments provided by potentially unreliable workers. In addition, several
aspects of the design of efficient crowdsourcing processes, such as defining
workers' bonuses, fair prices and time limits of the tasks, involve knowledge
of the likely duration of the task at hand. Bringing this together, in this
work we introduce a new time-sensitive Bayesian aggregation method that
simultaneously estimates a task's duration and obtains reliable aggregations of
crowdsourced judgments. Our method, called BCCTime, builds on the key insight
that the time taken by a worker to perform a task is an important indicator of
the likely quality of the produced judgment. To capture this, BCCTime uses
latent variables to represent the uncertainty about the workers' completion
time, the tasks' duration and the workers' accuracy. To relate the quality of a
judgment to the time a worker spends on a task, our model assumes that each
task is completed within a latent time window within which all workers with a
propensity to genuinely attempt the labelling task (i.e., no spammers) are
expected to submit their judgments. In contrast, workers with a lower
propensity to valid labelling, such as spammers, bots or lazy labellers, are
assumed to perform tasks considerably faster or slower than the time required
by normal workers. Specifically, we use efficient message-passing Bayesian
inference to learn approximate posterior probabilities of (i) the confusion
matrix of each worker, (ii) the propensity to valid labelling of each worker,
(iii) the unbiased duration of each task and (iv) the true label of each task.
Using two real-world public datasets for entity linking tasks, we show that
BCCTime produces up to 11% more accurate classifications and up to 100% more
informative estimates of a task's duration compared to state-of-the-art
methods.
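The core intuition of the latent time window can be sketched with a simple heuristic: judgments submitted far outside a task's typical completion time are down-weighted before aggregation. This is a deliberately simplified stand-in for BCCTime's message-passing inference, and all judgments, timings, and window bounds below are hypothetical.

```python
# Sketch of the time-window intuition: down-weight judgments whose
# completion times fall outside the task's "genuine effort" window.
# This is a simplified heuristic, not BCCTime's actual Bayesian
# inference; all data and parameters are hypothetical.
import statistics
from collections import Counter

# (worker, label, seconds spent) judgments for a single task
judgments = [
    ("w1", "A", 42.0),
    ("w2", "A", 55.0),
    ("w3", "B", 2.0),    # suspiciously fast: likely spammer or bot
    ("w4", "A", 480.0),  # suspiciously slow: likely lazy labeller
    ("w5", "B", 48.0),
]

times = [t for _, _, t in judgments]
med = statistics.median(times)

# Treat times within [median/3, median*3] as the latent genuine window
# (the factor 3 is an arbitrary illustrative choice).
lo, hi = med / 3, med * 3

votes = Counter()
for worker, label, t in judgments:
    weight = 1.0 if lo <= t <= hi else 0.1  # down-weight outliers
    votes[label] += weight

aggregated, _ = votes.most_common(1)[0]
print(aggregated)  # -> A (outlier votes barely count)
```

An unweighted majority vote over the same judgments would be a 3-2 split dominated by full-weight spam votes; coupling each judgment's weight to its plausibility in time is what lets the model recover the reliable answer, which BCCTime does jointly with learning each worker's confusion matrix.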