    LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy

    High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, has recently been proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP in high-dimensional crowdsourced data publication poses great challenges in terms of both computational efficiency and data utility. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub can keep, on average, 80% and 60% accuracy over the released datasets in terms of SVM and random forest classification, respectively.
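    As a minimal sketch of LDP distribution estimation with EM, restricted to a single categorical attribute, the block below perturbs each value with k-ary randomized response and then runs EM to recover the true category distribution from the noisy reports. This is an illustrative assumption, not LoPub itself: the abstract's method estimates multi-dimensional joint distributions and uses Lasso regression and attribute correlations, none of which appear here. The functions `krr_perturb` and `em_estimate`, the choice of mechanism, and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter

def krr_perturb(value, k, epsilon, rng):
    """k-ary randomized response: keep the true value with probability p,
    otherwise report one of the other k - 1 values uniformly at random."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    return other if other < value else other + 1  # skip the true value

def em_estimate(reports, k, epsilon, iterations=200):
    """EM estimate of the true category distribution from k-RR reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P(report = x | true = x)
    q = 1.0 / (math.exp(epsilon) + k - 1)                # P(report = y | true = x), y != x
    counts = Counter(reports)
    n = len(reports)
    theta = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(iterations):
        expected = [0.0] * k
        for y, c in counts.items():
            # E-step: posterior weight of each candidate true value x given report y
            weights = [theta[x] * (p if x == y else q) for x in range(k)]
            total = sum(weights)
            for x in range(k):
                expected[x] += c * weights[x] / total
        # M-step: normalise the expected counts into a distribution
        theta = [v / n for v in expected]
    return theta

# Usage with hypothetical data: 20,000 users, 4 categories, epsilon = 2.
rng = random.Random(0)
true_dist = [0.5, 0.3, 0.15, 0.05]
values = rng.choices(range(4), weights=true_dist, k=20000)
reports = [krr_perturb(v, 4, 2.0, rng) for v in values]
est = em_estimate(reports, 4, 2.0)
```

    The collector only ever sees `reports`; the known perturbation probabilities p and q are what let EM undo the randomization in aggregate.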

    Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

    Smartphones have become the most pervasive devices in people's lives and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as accelerometer, gyroscope, microphone, and camera. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution monitoring, crime control, and wildlife monitoring, to name just a few. Unlike prior sensing paradigms, humans are now the primary actors of the sensing process, since they are fundamental to retrieving reliable and up-to-date information about the event being monitored. As humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing QoI in mobile crowdsensing, and analyze the current state of the art on the topic in depth. We also outline novel research challenges, along with possible directions for future work.
    Comment: To appear in ACM Transactions on Sensor Networks (TOSN).

    Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data

    Accurately modeling traffic speeds is a fundamental part of efficient intelligent transportation systems. Nowadays, with the widespread deployment of GPS-enabled devices, it has become possible to crowdsource the collection of speed information to road users (e.g., through mobile applications or dedicated in-vehicle devices). Despite its rather wide spatial coverage, crowdsourced speed data also brings important challenges, such as highly variable measurement noise due to a variety of driving behaviors and sample sizes. When not properly accounted for, this noise can severely compromise any application that relies on accurate traffic data. In this article, we propose the use of heteroscedastic Gaussian processes (HGP) to model the time-varying uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop an HGP conditioned on sample size and traffic regime (SRC-HGP), which uses sample size information (probe vehicles per minute) as well as previously observed speeds to model the uncertainty in observed speeds more accurately. Using 6 months of crowdsourced traffic data from Copenhagen, we empirically show that the proposed heteroscedastic models produce significantly better predictive distributions than current state-of-the-art methods for both speed imputation and short-term forecasting tasks.
    Comment: 22 pages, Transportation Research Part C: Emerging Technologies (Elsevier).
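    A minimal sketch of why per-observation noise matters, assuming that the noise variance of each one-minute speed average equals a base per-vehicle variance divided by the number of probe vehicles in that minute. That rule is a hypothetical simplification: the article's HGP and SRC-HGP models learn the noise process instead of fixing it. The sketch is plain GP regression with a heteroscedastic (diagonal, non-constant) noise term; the kernel settings, the names `rbf_kernel` and `gp_posterior`, and all data values are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale, variance):
    """Squared-exponential covariance between two 1-D input arrays."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_train, y_train, noise_var, x_test, lengthscale=2.0, variance=25.0):
    """GP regression where each observation has its own noise variance."""
    y_mean = y_train.mean()  # constant prior mean; the GP models deviations from it
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + np.diag(noise_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(x_test, x_train, lengthscale, variance)
    mean = y_mean + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    latent_var = variance - np.sum(v ** 2, axis=0)  # variance of the underlying speed
    return mean, latent_var

# Hypothetical minute-by-minute speeds (km/h) and probe counts on one road segment.
minutes = np.arange(6, dtype=float)
speeds = np.array([48.0, 50.0, 35.0, 47.0, 49.0, 51.0])
probes = np.array([2, 3, 1, 25, 4, 2])
base_var = 30.0  # hypothetical per-vehicle speed variance

hetero_noise = base_var / probes  # more probe vehicles -> less noisy average
homo_noise = np.full_like(hetero_noise, base_var / probes.mean())

hetero_mean, hetero_var = gp_posterior(minutes, speeds, hetero_noise, minutes)
homo_mean, homo_var = gp_posterior(minutes, speeds, homo_noise, minutes)
```

    The low reading at minute 2 comes from a single probe vehicle, so the heteroscedastic model trusts it less and its posterior mean there stays closer to the neighbouring speeds than the homoscedastic model's does, while minute 3 (25 probes) gets a much tighter posterior.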

    A survey of spatial crowdsourcing

    Trust-Based Fusion of Untrustworthy Information in Crowdsourcing Applications

    In this paper, we address the problem of fusing untrustworthy reports provided by a crowd of observers, while simultaneously learning the trustworthiness of individuals. To achieve this, we construct a likelihood model of the users' trustworthiness by scaling the uncertainty of each user's multiple estimates with trustworthiness parameters. We incorporate our trust model into a fusion method that merges estimates based on the trust parameters, and we provide an inference algorithm that jointly computes the fused output and the individual trustworthiness of the users within a maximum likelihood framework. We apply our algorithm to cell tower localisation using real-world data from the OpenSignal project and show that it outperforms state-of-the-art methods in both accuracy, by up to 21%, and the consistency of its predictions, by up to 50%. Copyright © 2013, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
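    A minimal sketch of jointly estimating fused values and per-user trust, assuming scalar reports where each report's variance is its own sigma² divided by the reporting user's trust parameter. In this toy model, pure maximum likelihood can collapse onto a single user (that user's residuals shrink to zero and its trust diverges), so the sketch adds a Gamma(a, b) prior on each trust parameter and performs MAP coordinate ascent instead. The prior, its hyperparameters, the function name `fuse_with_trust`, and the data are all assumptions; the paper's cell tower localisation fuses positions, not scalars.

```python
from collections import defaultdict

def fuse_with_trust(reports, iterations=100, a=2.0, b=1.0):
    """reports: list of (user, target, value, sigma) tuples.

    Hypothetical model: value ~ Normal(x_target, sigma**2 / trust_user), with a
    Gamma(a, b) prior on each trust_user. Alternates closed-form updates of the
    fused values x and the per-user trust parameters.
    """
    users = {u for u, _, _, _ in reports}
    trust = {u: 1.0 for u in users}
    fused = {}
    for _ in range(iterations):
        # Fused value per target: trust- and precision-weighted mean of its reports.
        num, den = defaultdict(float), defaultdict(float)
        for u, t, y, s in reports:
            w = trust[u] / s ** 2
            num[t] += w * y
            den[t] += w
        fused = {t: num[t] / den[t] for t in num}
        # Trust per user: MAP update given the current fused values.
        n_reports, sq_err = defaultdict(int), defaultdict(float)
        for u, t, y, s in reports:
            n_reports[u] += 1
            sq_err[u] += (y - fused[t]) ** 2 / s ** 2
        trust = {u: (n_reports[u] + 2 * a - 2) / (sq_err[u] + 2 * b) for u in users}
    return fused, trust

# Hypothetical crowd: two honest users and one misleading user, two targets.
reports = [
    ("h1", "A", 10.1, 1.0), ("h1", "B", 19.9, 1.0),
    ("h2", "A", 9.9, 1.0), ("h2", "B", 20.2, 1.0),
    ("bad", "A", 15.0, 1.0), ("bad", "B", 14.0, 1.0),
]
fused, trust = fuse_with_trust(reports)
```

    A plain average would place target A at about 11.7; down-weighting the user whose reports disagree with the consensus pulls the fused values back toward the honest reports.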

    Linear and Range Counting under Metric-based Local Differential Privacy

    Local differential privacy (LDP) enables private data sharing and analytics without the need for a trusted data collector. Error-optimal primitives (e.g., for estimating means and item frequencies) under LDP have been well studied. For analytical tasks such as range queries, however, the best known error bound depends on the domain size of the private data, which is potentially prohibitive. This deficiency is inherent, as LDP enforces the same level of indistinguishability between any pair of private data values for each data owner. In this paper, we utilize an extension of ε-LDP called Metric-LDP or E-LDP, where a metric E defines heterogeneous privacy guarantees for different pairs of private data values and thus provides a more flexible knob than ε does to relax LDP and tune utility-privacy trade-offs. We show that, under such privacy relaxations, we can achieve significant gains in utility for analytical workloads such as linear counting, multi-dimensional range counting queries, and quantile queries. In particular, for range queries under E-LDP where the metric E is the L¹-distance function scaled by ε, we design mechanisms with errors independent of the domain sizes; instead, their errors depend on the metric E, which specifies at what granularity the private data is protected. We believe that the primitives we design for E-LDP will be useful in developing mechanisms for other analytical tasks, and will encourage the adoption of LDP in practice.
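    One standard mechanism that satisfies metric-based LDP with E(x, x') = ε·|x − x'| on integer data is two-sided geometric noise: the probability of any report changes by at most a factor exp(ε·|x − x'|) between inputs x and x'. The sketch below illustrates that guarantee and a linear query (the mean) under this assumption; it is not necessarily the mechanism the paper designs, and the helper names and data are hypothetical. The noise variance, 2α/(1 − α)² with α = exp(−ε), depends only on ε and not on the size of the data domain.

```python
import math
import random

def geometric_pmf(noise, epsilon):
    """P(Z = noise) for two-sided geometric noise with alpha = exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(noise)

def sample_geometric(alpha, rng):
    """Failures before the first success, with success probability 1 - alpha."""
    u = 1.0 - rng.random()  # u in (0, 1], avoids log(0)
    return math.floor(math.log(u) / math.log(alpha))

def perturb(value, epsilon, rng):
    """Report value + Z, where Z is the difference of two i.i.d. geometrics."""
    alpha = math.exp(-epsilon)
    return value + sample_geometric(alpha, rng) - sample_geometric(alpha, rng)

def worst_case_ratio(x1, x2, epsilon, span=50):
    """Largest P(report = y | x1) / P(report = y | x2) over a window of outputs y."""
    lo, hi = min(x1, x2) - span, max(x1, x2) + span
    return max(geometric_pmf(y - x1, epsilon) / geometric_pmf(y - x2, epsilon)
               for y in range(lo, hi + 1))

# The privacy loss between 3 and 0 is bounded by exp(epsilon * |3 - 0|).
ratio = worst_case_ratio(3, 0, 0.5)

# Linear query: the noise has mean 0, so averaging the reports is unbiased.
rng = random.Random(1)
epsilon = 1.0
true_values = [i % 100 for i in range(10_000)]  # hypothetical private integers in [0, 99]
reports = [perturb(v, epsilon, rng) for v in true_values]
estimated_mean = sum(reports) / len(reports)
```

    Nearby values (for example 41 and 42) remain hard to distinguish, while values far apart are protected less strongly; that relaxation is what keeps the error from growing with the domain.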

    Crowdsourcing traffic data for travel time estimation

    Travel time estimation is a fundamental measure used in routing and navigation applications, in particular in emerging intelligent transportation systems (ITS). For example, many users may prefer the fastest route to their destination and would rely on real-time predicted travel times. Travel time estimation also helps real-time traffic management and traffic light control. Accurate estimation of travel time requires collecting large amounts of real-time data from road networks. This data can be collected from a wide variety of sources, such as inductive loop detectors, video cameras, and radio frequency identification (RFID) transponders. However, these systems require deploying infrastructure, which has limitations and drawbacks: mainly high cost, a high probability of error caused by frequent equipment malfunctions, and, for sensor-based methods, limited spatial coverage.
    As an alternative to collecting data with expensive equipment, the development of cellular and mobile technology makes it possible to leverage the GPS sensors embedded in smartphones carried by millions of road users. Crowdsourcing GPS data will allow building traffic monitoring systems that exploit this opportunity for accurate, real-time prediction of traffic measures. However, the effectiveness of these systems has not yet been proven in real applications. In this thesis, we study some of the currently available data sets and identify the requirements for accurate prediction. We propose the design of a crowdsourcing traffic application, including an Android-based mobile client and a server architecture. We also develop a map-matching method. More importantly, we present prediction methods using machine learning techniques such as support vector regression.
    Machine learning provides an alternative to traditional statistical methods, such as using averaged historical data to estimate travel time. Machine learning techniques have played a key role in estimation over the last two decades and have been shown to provide better accuracy in both estimation and classification. However, employing a machine learning technique in any application requires creative modeling of the system and its sensory data. In this thesis, we model the road network as a graph and train different models for different links on the road. Modeling a road network as a graph with nodes and links enables the learner to capture patterns occurring on each road segment, thereby providing better accuracy. To evaluate the prediction models, we use three data sets, two collected using mobile probing and one generated using the VISSIM traffic simulator. The results show that the crowdsourcing approach is more accurate than traditional statistical methods only if the input data values are very close to the actual values. In particular, when vehicle speeds on a link are concerned, the machine learning model must be provided with data that is only a few minutes old; using the average vehicle speed over, for example, the past half hour, as many web-based traffic information sources do, may not yield better performance.
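    A minimal sketch of the per-link graph model and of the data-freshness finding, assuming each link's speed is predicted by a plain windowed average of its probe speeds. This average is a stand-in for the thesis's support vector regression models, not a reproduction of them; the `Link` class, `route_time_min`, link lengths, and speeds are hypothetical. Route travel time is the sum of per-link times along a node sequence.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    """A directed road segment with timestamped probe speeds (minute, km/h)."""
    length_km: float
    observations: list = field(default_factory=list)

    def speed_kmh(self, now_min, window_min):
        """Average probe speed over the last `window_min` minutes."""
        recent = [s for t, s in self.observations if now_min - window_min <= t <= now_min]
        if not recent:
            raise ValueError("no probe data in the requested window")
        return sum(recent) / len(recent)

def route_time_min(graph, route, now_min, window_min):
    """Sum per-link travel times (minutes) along a sequence of nodes."""
    total = 0.0
    for u, v in zip(route, route[1:]):
        link = graph[(u, v)]
        total += 60.0 * link.length_km / link.speed_kmh(now_min, window_min)
    return total

# Hypothetical data: link A->B flows at 60 km/h until congestion starts at minute 25.
ab = Link(2.0, [(t, 60.0) for t in range(25)] + [(t, 20.0) for t in range(25, 31)])
bc = Link(1.0, [(t, 30.0) for t in range(31)])
graph = {("A", "B"): ab, ("B", "C"): bc}

fresh = route_time_min(graph, ["A", "B", "C"], now_min=30, window_min=5)
stale = route_time_min(graph, ["A", "B", "C"], now_min=30, window_min=30)
```

    With a 5-minute window the estimate reflects the congestion that began at minute 25 (8 minutes for the route); a 30-minute average dilutes it to about 4.3 minutes, which mirrors the abstract's point about stale averages.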