2,281 research outputs found

    Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers

    We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular, owing to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be recovered, and the skills identified, if and only if the sampling matrix (observed components) is irreducible and aperiodic. We then propose an efficient gradient descent scheme and show that the skill estimates converge to the desired global optimum for such sampling matrices. Our proof is original, and the results are surprising given that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds for the noisy case in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets. Published version
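
The rank-one completion idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' scheme: the function name `complete_rank_one`, the step size, the iteration count, the constant initialization, the toy skills, and the sign convention (skills summing to a positive value) are all assumptions made for illustration. It assumes the noiseless single-coin setting, where the correlation between workers i and j equals the product of their skills, and fits a vector `x` so that `x[i] * x[j]` matches the observed entries.

```python
import numpy as np

def complete_rank_one(corr, mask, lr=0.1, steps=5000):
    """Plain gradient descent on the masked rank-one fit (illustrative only).

    corr : (n, n) symmetric matrix of observed worker-worker correlations
    mask : (n, n) symmetric 0/1 matrix, 1 where a pair was observed
    Minimizes the sum over observed unordered pairs of (corr_ij - x_i x_j)^2.
    """
    n = corr.shape[0]
    x = np.full(n, 0.5)  # assumed initialization, not taken from the paper
    for _ in range(steps):
        residual = mask * (corr - np.outer(x, x))
        grad = -2.0 * residual @ x  # gradient of the masked squared error
        x -= lr * grad
    # Skills are identifiable only up to a global sign; assume the sum is positive.
    if x.sum() < 0:
        x = -x
    return x

# Hypothetical toy instance: four workers, skills in [-1, 1].
true_skills = np.array([0.8, 0.6, 0.4, 0.7])
# Observed pairs form a connected graph containing a triangle (an odd cycle),
# i.e. a small example of an irreducible, aperiodic sampling pattern.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
mask = np.zeros((4, 4))
for i, j in edges:
    mask[i, j] = mask[j, i] = 1.0
corr = mask * np.outer(true_skills, true_skills)  # noiseless observations

estimate = complete_rank_one(corr, mask)
print(np.round(estimate, 3))
```

In this toy instance the triangle among workers 0, 1, and 2 pins down their skills up to sign, and the extra edge to worker 3 propagates that information. If the observed pairs formed a bipartite graph instead, the skills would no longer be uniquely determined, which matches the identifiability condition stated in the abstract.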

    A survey of spatial crowdsourcing

    Multi-modal Spatial Crowdsourcing for Enriching Spatial Datasets

    Conflating point of interest (POI) data: A systematic review of matching methods

    Point of interest (POI) data provide digital representations of places in the real world and are increasingly used to understand human-place interactions, support urban management, and build smart cities. Many POI datasets have been developed, and they often differ in geographic coverage, attribute focus, and data quality. Researchers may therefore need to conflate two or more POI datasets to build a better representation of the places in their study areas. While various POI conflation methods have been developed, a systematic review is lacking, and it is consequently difficult for researchers new to POI conflation to quickly grasp and use the existing methods. This paper fills that gap. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol, we conduct a systematic review by searching three bibliographic databases with reproducible syntax to identify related studies. We then focus on a main step of POI conflation, namely POI matching, and systematically summarize and categorize the identified methods. Current limitations and future opportunities are then discussed. We hope that this review can provide some guidance for researchers interested in conflating POI datasets for their research.

    Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

    The paucity of large, curated, hand-labeled training data for every domain of interest is a major bottleneck in deploying machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology that generates data as well as a curated aggregated label, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness and showed how the proposed ADP framework can be used for transfer learning and for multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets. Comment: CVPR 2018 main conference paper