2,281 research outputs found
Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers
We consider worker skill estimation for the single-coin
Dawid-Skene crowdsourcing model. In
practice, skill estimation is challenging because
worker assignments are sparse and irregular due
to the arbitrary and uncontrolled availability of
workers. We formulate skill estimation as a
rank-one correlation-matrix completion problem,
where the observed components correspond to
observed label correlation between workers. We
show that the correlation matrix can be successfully
recovered and the skills are identifiable if and only
if the sampling matrix (observed components) is
irreducible and aperiodic. We then propose an
efficient gradient descent scheme and show that
the skill estimates converge to the desired global optima
for such sampling matrices. Our proof is
original and the results are surprising in light of
the fact that even the weighted rank-one matrix
factorization problem is NP-hard in general. Next,
we derive sample complexity bounds for the noisy
case in terms of spectral properties of the signless
Laplacian of the sampling matrix. Our proposed
scheme achieves state-of-the-art performance on a
number of real-world datasets. Published version
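The rank-one formulation in the abstract above can be illustrated with a minimal sketch. It assumes labels in {-1, +1} and defines the skill of worker i as s_i = 2 p_i - 1, where p_i is the probability of a correct label, so that the expected label correlation between workers i and j is s_i * s_j. The function name `estimate_skills`, the step size, the iteration count, and the positive initialisation are illustrative assumptions, not the authors' published scheme. The observed pairs in the example form a triangle plus one extra edge, i.e. a connected, non-bipartite sampling graph.

```python
import numpy as np

def estimate_skills(C, mask, steps=5000, lr=0.05, seed=0):
    """Plain gradient descent on sum over observed (i, j) of (s_i * s_j - C_ij)^2.

    C    : symmetric matrix of pairwise label correlations
    mask : symmetric 0/1 matrix marking observed off-diagonal pairs
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    # Assumption: start with positive skills (workers better than chance),
    # which resolves the global sign ambiguity s -> -s.
    s = rng.uniform(0.1, 0.5, size=n)
    for _ in range(steps):
        residual = mask * (np.outer(s, s) - C)  # zero on unobserved entries
        grad = 4.0 * residual @ s               # gradient for symmetric mask and C
        s = s - lr * grad
    return s

# Noise-free toy example with hypothetical skills.
s_true = np.array([0.8, 0.6, 0.4, 0.7])
C = np.outer(s_true, s_true)

mask = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:  # triangle 0-1-2 plus edge 2-3
    mask[i, j] = mask[j, i] = 1.0

s_hat = estimate_skills(C, mask)
print(np.round(s_hat, 3))  # should be close to s_true
```

Dropping the edge (0, 2) would leave a path, which is bipartite; the observed products then no longer pin down the individual skills, which is the kind of failure the irreducible-and-aperiodic condition rules out.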
Conflating point of interest (POI) data: A systematic review of matching methods
Point of interest (POI) data provide digital representations of places in the
real world, and have been increasingly used to understand human-place
interactions, support urban management, and build smart cities. Many POI
datasets have been developed, which often have different geographic coverages,
attribute focuses, and data quality. From time to time, researchers may need to
conflate two or more POI datasets in order to build a better representation of
the places in the study areas. While various POI conflation methods have been
developed, a systematic review is lacking and, consequently, it is difficult
for researchers new to POI conflation to quickly grasp and use these existing
methods. This paper fills such a gap. Following the protocol of Preferred
Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), we conduct a
systematic review by searching through three bibliographic databases using
reproducible syntax to identify related studies. We then focus on a main step
of POI conflation, i.e., POI matching, and systematically summarize and
categorize the identified methods. Current limitations and future opportunities
are discussed afterwards. We hope that this review can provide some guidance
for researchers interested in conflating POI datasets for their research.
Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
Paucity of large curated hand-labeled training data for every
domain-of-interest forms a major bottleneck in the deployment of machine
learning models in computer vision and other fields. Recent work (Data
Programming) has shown how distant supervision signals in the form of labeling
functions can be used to obtain labels for given data in near-constant time. In
this work, we present Adversarial Data Programming (ADP), an
adversarial methodology that generates data as well as a curated aggregated label,
given a set of weak labeling functions. We validated our method on the
MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many
state-of-the-art models. We conducted extensive experiments to study its
usefulness, and showed how the proposed ADP framework can be used for
transfer learning as well as multi-task learning, where data from two domains
are generated simultaneously using the framework along with the label
information. Our future work will involve understanding the theoretical
implications of this new framework from a game-theoretic perspective, as well
as exploring the performance of the method on more complex datasets. Comment: CVPR 2018 main conference paper