6,337 research outputs found
Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems
Crowdsourcing systems commonly face the problem of aggregating multiple
judgments provided by potentially unreliable workers. In addition, several
aspects of the design of efficient crowdsourcing processes, such as defining
worker's bonuses, fair prices and time limits of the tasks, involve knowledge
of the likely duration of the task at hand. Bringing this together, in this
work we introduce a new time--sensitive Bayesian aggregation method that
simultaneously estimates a task's duration and obtains reliable aggregations of
crowdsourced judgments. Our method, called BCCTime, builds on the key insight
that the time taken by a worker to perform a task is an important indicator of
the likely quality of the produced judgment. To capture this, BCCTime uses
latent variables to represent the uncertainty about the workers' completion
time, the tasks' duration and the workers' accuracy. To relate the quality of a
judgment to the time a worker spends on a task, our model assumes that each
task is completed within a latent time window within which all workers with a
propensity to genuinely attempt the labelling task (i.e., no spammers) are
expected to submit their judgments. In contrast, workers with a lower
propensity to valid labeling, such as spammers, bots or lazy labelers, are
assumed to perform tasks considerably faster or slower than the time required
by normal workers. Specifically, we use efficient message-passing Bayesian
inference to learn approximate posterior probabilities of (i) the confusion
matrix of each worker, (ii) the propensity to valid labeling of each worker,
(iii) the unbiased duration of each task and (iv) the true label of each task.
Using two real-world public datasets for entity linking tasks, we show that
BCCTime produces up to 11% more accurate classifications and up to 100% more
informative estimates of a task's duration compared to state-of-the-art
methods
Privacy-Friendly Mobility Analytics using Aggregate Location Data
Location data can be extremely useful to study commuting patterns and
disruptions, as well as to predict real-time traffic volumes. At the same time,
however, the fine-grained collection of user locations raises serious privacy
concerns, as this can reveal sensitive information about the users, such as,
life style, political and religious inclinations, or even identities. In this
paper, we study the feasibility of crowd-sourced mobility analytics over
aggregate location information: users periodically report their location, using
a privacy-preserving aggregation protocol, so that the server can only recover
aggregates -- i.e., how many, but not which, users are in a region at a given
time. We experiment with real-world mobility datasets obtained from the
Transport For London authority and the San Francisco Cabs network, and present
a novel methodology based on time series modeling that is geared to forecast
traffic volumes in regions of interest and to detect mobility anomalies in
them. In the presence of anomalies, we also make enhanced traffic volume
predictions by feeding our model with additional information from correlated
regions. Finally, we present and evaluate a mobile app prototype, called
Mobility Data Donors (MDD), in terms of computation, communication, and energy
overhead, demonstrating the real-world deployability of our techniques.Comment: Published at ACM SIGSPATIAL 201
Whom to Ask? Jury Selection for Decision Making Tasks on Micro-blog Services
It is universal to see people obtain knowledge on micro-blog services by
asking others decision making questions. In this paper, we study the Jury
Selection Problem(JSP) by utilizing crowdsourcing for decision making tasks on
micro-blog services. Specifically, the problem is to enroll a subset of crowd
under a limited budget, whose aggregated wisdom via Majority Voting scheme has
the lowest probability of drawing a wrong answer(Jury Error Rate-JER). Due to
various individual error-rates of the crowd, the calculation of JER is
non-trivial. Firstly, we explicitly state that JER is the probability when the
number of wrong jurors is larger than half of the size of a jury. To avoid the
exponentially increasing calculation of JER, we propose two efficient
algorithms and an effective bounding technique. Furthermore, we study the Jury
Selection Problem on two crowdsourcing models, one is for altruistic
users(AltrM) and the other is for incentive-requiring users(PayM) who require
extra payment when enrolled into a task. For the AltrM model, we prove the
monotonicity of JER on individual error rate and propose an efficient exact
algorithm for JSP. For the PayM model, we prove the NP-hardness of JSP on PayM
and propose an efficient greedy-based heuristic algorithm. Finally, we conduct
a series of experiments to investigate the traits of JSP, and validate the
efficiency and effectiveness of our proposed algorithms on both synthetic and
real micro-blog data.Comment: VLDB201
Agent-based pedestrian modelling
When the focus of interest in geographical systems is at the very fine scale, at the level of
streets and buildings for example, movement becomes central to simulations of how spatial
activities are used and develop. Recent advances in computing power and the acquisition of
fine scale digital data now mean that we are able to attempt to understand and predict such
phenomena with the focus in spatial modelling changing to dynamic simulations of the
individual and collective behaviour of individual decision-making at such scales. In this
Chapter, we develop ideas about how such phenomena can be modelled showing first how
randomness and geometry are all important to local movement and how ordered spatial
structures emerge from such actions. We focus on developing these ideas for pedestrians
showing how random walks constrained by geometry but aided by what agents can see,
determine how individuals respond to locational patterns. We illustrate these ideas with three
types of example: first for local scale street scenes where congestion and flocking is all
important, second for coarser scale shopping centres such as malls where economic
preference interferes much more with local geometry, and finally for semi-organised street
festivals where management and control by police and related authorities is integral to the
way crowds move
- …