Learning to Predict the Wisdom of Crowds
The problem of "approximating the crowd" is that of estimating the crowd's
majority opinion by querying only a subset of it. Algorithms that approximate
the crowd can intelligently stretch a limited budget for a crowdsourcing task.
We present an algorithm, "CrowdSense," that works in an online fashion to
dynamically sample subsets of labelers based on an exploration/exploitation
criterion. The algorithm produces a weighted combination of a subset of the
labelers' votes that approximates the crowd's opinion.
Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991).
Wisely Using a Budget for Crowdsourcing
The problem of "approximating the crowd" is that of estimating the crowd's majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, "CrowdSense," that works in an online fashion where examples come one at a time. CrowdSense dynamically samples subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers' votes that approximates the crowd's opinion. We then introduce two variations of CrowdSense that make various distributional assumptions to handle distinct crowd characteristics. In particular, the first algorithm makes a statistical independence assumption of the probabilities for large crowds, whereas the second algorithm finds a lower bound on how often the current sub-crowd agrees with the crowd majority vote. Our experiments on CrowdSense and several baselines demonstrate that we can reliably approximate the entire crowd's vote by collecting opinions from a representative subset of the crowd.
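The exploration/exploitation idea behind CrowdSense can be illustrated with a minimal sketch. This is not the published algorithm: the stopping margin, the fixed labeler weights, and the function names are all simplifications introduced here for illustration.

```python
import random

def crowdsense_step(labelers, weights, x, eps=0.1, margin=1.5):
    """One online step: query labelers until the weighted vote margin is
    decisive, mixing exploitation (highest-weight labeler first) with
    exploration (a random labeler with probability eps)."""
    pool = list(range(len(labelers)))
    pool.sort(key=lambda i: -weights[i])   # best-weighted labelers first
    queried, score = [], 0.0
    while pool:
        if queried and random.random() < eps:
            i = pool.pop(random.randrange(len(pool)))  # explore
        else:
            i = pool.pop(0)                            # exploit
        vote = labelers[i](x)                          # vote in {+1, -1}
        score += weights[i] * vote
        queried.append(i)
        # stop once the unqueried labelers cannot flip the weighted majority
        if abs(score) > sum(weights[j] for j in pool) + margin:
            break
    return (1 if score >= 0 else -1), queried
```

In this toy version the weights are fixed; the actual algorithm updates labeler weights online as their agreement with the estimated majority is observed.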
A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification
Crowdsourcing has become widely used in supervised scenarios where training
sets are scarce and difficult to obtain. Most crowdsourcing models in the
literature assume labelers can provide answers to full questions. In
classification contexts, full questions require a labeler to discern among all
possible classes. Unfortunately, discernment is not always easy in realistic
scenarios. Labelers may not be experts in differentiating all classes. In this
work, we provide a full probabilistic model for a shorter type of queries. Our
shorter queries only require "yes" or "no" responses. Our model estimates a
joint posterior distribution of matrices related to labelers' confusions and
the posterior probability of the class of every object. We developed an
approximate inference approach, using Monte Carlo Sampling and Black Box
Variational Inference, which provides the derivation of the necessary
gradients. We built two realistic crowdsourcing scenarios to test our model.
The first scenario queries for irregular astronomical time-series. The second
scenario relies on the image classification of animals. We achieved results
that are comparable with those of full query crowdsourcing. Furthermore, we
show that modeling labelers' failures plays an important role in estimating
true classes. Finally, we provide the community with two real datasets obtained
from our crowdsourcing experiments. All our code is publicly available.
Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages.
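The core mechanic of aggregating yes/no responses can be sketched with a simple Bayesian update. This is a heavily simplified stand-in for the paper's full probabilistic model: a single sensitivity/specificity pair replaces the per-labeler confusion matrices, and the function name and parameters are illustrative assumptions.

```python
import numpy as np

def class_posterior(prior, answers, sens=0.8, spec=0.8):
    """Posterior over K classes from yes/no answers.
    Each answer is (k, resp): 'is this object of class k?' with resp=1
    (yes) or 0 (no). The labeler's confusion is modeled crudely as
    P(yes | class is k) = sens and P(no | class is not k) = spec."""
    post = np.array(prior, dtype=float)
    for k, resp in answers:
        # likelihood of this response under each candidate class
        like = np.full(len(post), 1 - spec if resp else spec)
        like[k] = sens if resp else 1 - sens
        post *= like
    return post / post.sum()
```

Two "yes" answers to "is it class 0?" already concentrate the posterior on class 0; the paper instead infers the confusion parameters jointly with the classes via Black Box Variational Inference.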
A Semi-Lagrangian scheme for a modified version of the Hughes model for pedestrian flow
In this paper we present a Semi-Lagrangian scheme for a regularized version
of the Hughes model for pedestrian flow. Hughes originally proposed a coupled
nonlinear PDE system describing the evolution of a large pedestrian group
trying to exit a domain as fast as possible. The original model corresponds to
a system of a conservation law for the pedestrian density and an Eikonal
equation to determine the weighted distance to the exit. We consider this model
in presence of small diffusion and discuss the numerical analysis of the
proposed Semi-Lagrangian scheme. Furthermore, we illustrate the effect of small
diffusion on the exit time with various numerical experiments.
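The semi-Lagrangian idea of following characteristics backward can be illustrated on a toy 1D advection-diffusion equation. This sketch shows only the transport-plus-small-diffusion step on a periodic grid; the Eikonal coupling that produces the velocity field in the Hughes model is omitted, and the explicit diffusion split is an assumption made here for brevity.

```python
import numpy as np

def semi_lagrangian_step(rho, v, dx, dt, eps):
    """One semi-Lagrangian step for rho_t + (v rho)_x = eps * rho_xx on a
    periodic 1D grid: trace the characteristic backward to its foot point,
    interpolate the density there, then apply a small explicit diffusion
    step (a crude operator splitting of the eps term)."""
    n = len(rho)
    x = np.arange(n) * dx
    feet = (x - dt * v) % (n * dx)            # backward characteristic feet
    idx = feet / dx
    i0 = np.floor(idx).astype(int) % n
    w = idx - np.floor(idx)
    rho_adv = (1 - w) * rho[i0] + w * rho[(i0 + 1) % n]  # linear interpolation
    lap = (np.roll(rho_adv, -1) - 2 * rho_adv + np.roll(rho_adv, 1)) / dx**2
    return rho_adv + dt * eps * lap
```

A constant density and constant velocity leave the profile unchanged, which is a quick sanity check for the interpolation and the diffusion stencil.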
Kernel-based high-dimensional histogram estimation for visual tracking
Presented at the 15th IEEE International Conference on Image Processing, October 12–15, 2008, San Diego, California, U.S.A. DOI: 10.1109/ICIP.2008.4711862.
We propose an approach for non-rigid tracking that represents objects by their set of distribution parameters. Compared to joint histogram representations, a set of parameters such as mixed moments provides a significantly reduced size representation. The discriminating power is comparable to that of the corresponding full high-dimensional histogram, yet at far less spatial and computational complexity. The proposed method is robust in the presence of noise and illumination changes, and provides a natural extension to the use of mixture models. Experiments demonstrate that the proposed method outperforms both full color mean-shift and global covariance searches.
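The size argument behind the moment representation is easy to make concrete. The following sketch (an illustration, not the paper's feature set) computes means plus second-order mixed central moments of the color channels: for 3 channels that is 3 + 6 = 9 numbers, versus e.g. 16^3 = 4096 bins for a joint RGB histogram.

```python
import numpy as np

def mixed_moments(pixels, order=2):
    """Compact distribution descriptor for an (N, d) array of pixel
    colors: the d channel means followed by all mixed central moments
    of second order (i.e. the upper triangle of the covariance)."""
    mu = pixels.mean(axis=0)
    centered = pixels - mu
    d = pixels.shape[1]
    feats = list(mu)
    for i in range(d):
        for j in range(i, d):          # upper triangle, including diagonal
            feats.append((centered[:, i] * centered[:, j]).mean())
    return np.array(feats)
```

Tracking then compares these short vectors between candidate windows instead of comparing full joint histograms bin by bin.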
Entropic Wasserstein Gradient Flows
This article details a novel numerical scheme to approximate gradient flows
for optimal transport (i.e. Wasserstein) metrics. These flows have proved
useful to tackle theoretically and numerically non-linear diffusion equations
that model for instance porous media or crowd evolutions. These gradient flows
define a suitable notion of weak solutions for these evolutions and they can be
approximated in a stable way using discrete flows. These discrete flows are
implicit Euler time stepping according to the Wasserstein metric. A bottleneck
of these approaches is the high computational load induced by the resolution of
each step. Indeed, this corresponds to the resolution of a convex optimization
problem involving a Wasserstein distance to the previous iterate. Following
several recent works on the approximation of Wasserstein distances, we consider
a discrete flow induced by an entropic regularization of the transportation
coupling. This entropic regularization allows one to trade the initial
Wasserstein fidelity term for a Kullback-Leibler divergence, which is easier to
deal with numerically. We show how KL proximal schemes, and in particular
Dykstra's algorithm, can be used to compute each step of the regularized flow.
The resulting algorithm is both fast, parallelizable and versatile, because it
only requires multiplications by a Gibbs kernel. On Euclidean domains
discretized on a uniform grid, this corresponds to a linear filtering (for
instance a Gaussian filtering when the ground cost is the squared Euclidean distance) which
can be computed in nearly linear time. On more general domains, such as
(possibly non-convex) shapes or on manifolds discretized by a triangular mesh,
following a recently proposed numerical scheme for optimal transport, this
Gibbs kernel multiplication is approximated by a short-time heat diffusion.
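The claim that each step "only requires multiplications by a Gibbs kernel" is visible already in plain entropic optimal transport between two fixed histograms, which is the two-constraint special case of Dykstra's algorithm (Sinkhorn iterations). The sketch below illustrates that inner loop; the full gradient-flow step in the article involves a KL proximal operator rather than two fixed marginals.

```python
import numpy as np

def sinkhorn(p, q, C, gamma=0.1, iters=200):
    """Entropic OT between histograms p and q via alternating KL
    projections onto the two marginal constraints. Every iteration is
    just a multiplication by the Gibbs kernel K = exp(-C / gamma)."""
    K = np.exp(-C / gamma)
    b = np.ones_like(q)
    for _ in range(iters):
        a = p / (K @ b)        # project onto the first marginal
        b = q / (K.T @ a)      # project onto the second marginal
    return a[:, None] * K * b[None, :]   # regularized transport plan
```

On a uniform grid with squared Euclidean cost, `K @ b` is exactly the Gaussian filtering mentioned in the abstract, so each iteration runs in nearly linear time with a separable convolution.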
Information Gathering with Peers: Submodular Optimization with Peer-Prediction Constraints
We study a problem of optimal information gathering from multiple data
providers that need to be incentivized to provide accurate information. This
problem arises in many real world applications that rely on crowdsourced data
sets, but where the process of obtaining data is costly. A notable example of
such a scenario is crowd sensing. To this end, we formulate the problem of
optimal information gathering as maximization of a submodular function under a
budget constraint, where the budget represents the total expected payment to
data providers. Contrary to the existing approaches, we base our payments on
incentives for accuracy and truthfulness, in particular, {\em peer-prediction}
methods that score each of the selected data providers against its best peer,
while ensuring that the minimum expected payment is above a given threshold. We
first show that the problem at hand is hard to approximate within a constant
factor that is not dependent on the properties of the payment function.
However, for given topological and analytical properties of the instance, we
construct two greedy algorithms, respectively called PPCGreedy and
PPCGreedyIter, and establish theoretical bounds on their performance w.r.t. the
optimal solution. Finally, we evaluate our methods using a realistic crowd
sensing testbed.
Comment: Longer version of AAAI'18 paper.
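The structure of a greedy algorithm for this kind of budgeted selection can be sketched as a cost-benefit rule. This is a generic simplification in the spirit of the PPCGreedy variants, not their actual pseudocode: the names, the minimum-payment filter, and the marginal-gain interface are assumptions made here.

```python
def greedy_budget(providers, payment, gain, budget, min_pay=0.0):
    """Cost-benefit greedy for submodular selection under a payment budget:
    repeatedly add the provider with the best marginal gain per unit of
    expected payment, excluding anyone whose payment falls below the
    min_pay threshold or would exceed the remaining budget."""
    chosen, spent = [], 0.0
    remaining = [p for p in providers if payment(p) >= min_pay]
    while remaining:
        best = max(remaining, key=lambda p: gain(chosen, p) / payment(p))
        if spent + payment(best) > budget or gain(chosen, best) <= 0:
            break
        chosen.append(best)
        spent += payment(best)
        remaining.remove(best)
    return chosen, spent
```

With a modular gain this reduces to a knapsack-style heuristic; the hardness result in the abstract explains why the paper's guarantees must additionally depend on properties of the payment function.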