Location Privacy in Spatial Crowdsourcing
Spatial crowdsourcing (SC) is a new platform that engages individuals in
collecting and analyzing environmental, social and other spatiotemporal
information. With SC, requesters outsource their spatiotemporal tasks to a set
of workers, who will perform the tasks by physically traveling to the tasks'
locations. This chapter identifies privacy threats toward both workers and
requesters during the two main phases of spatial crowdsourcing, tasking and
reporting. Tasking is the process of identifying which tasks should be assigned
to which workers. This process is handled by a spatial crowdsourcing server
(SC-server). The latter phase is reporting, in which workers travel to the
tasks' locations, complete the tasks and upload their reports to the SC-server.
The challenge is to enable effective and efficient tasking as well as reporting
in SC without disclosing the actual locations of workers (at least until they
agree to perform a task) and the tasks themselves (at least to workers who are
not assigned to those tasks). This chapter aims to provide an overview of the
state-of-the-art in protecting users' location privacy in spatial
crowdsourcing. We provide a comparative study of a diverse set of solutions in
terms of task publishing modes (push vs. pull), problem focuses (tasking and
reporting), threats (server, requester and worker), and underlying technical
approaches (from pseudonymity, cloaking, and perturbation to exchange-based and
encryption-based techniques). The strengths and drawbacks of the techniques are
highlighted, leading to a discussion of open problems and future work.
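As one illustration of the perturbation family of techniques surveyed here, a worker can obfuscate her reported location with planar Laplace noise in the style of geo-indistinguishability. A minimal sketch in Python, assuming locations are already projected to planar coordinates in meters (the function name and interface are illustrative, not taken from any particular system in the survey):

```python
import math
import random

def perturb_location(x, y, eps, rng=random):
    """Perturb a planar location with polar Laplace noise.

    The angle is uniform and the radius follows a Gamma(2, 1/eps)
    distribution, which is the radial density of the planar Laplace
    mechanism; the expected displacement is 2/eps.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.gammavariate(2.0, 1.0 / eps)  # expected radius: 2 / eps
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

A smaller eps gives stronger privacy but pushes the reported location further from the true one, degrading tasking quality; this trade-off is exactly what the surveyed cloaking and perturbation schemes negotiate.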
SoK: Differentially Private Publication of Trajectory Data
Trajectory analysis holds many promises, from improvements in traffic management to routing advice or infrastructure development. However, learning users' paths is extremely privacy-invasive. Therefore, there is a necessity to protect trajectories such that we preserve the global properties, useful for analysis, while specific and private information of individuals remains inaccessible. Trajectories, however, are difficult to protect, since they are sequential, highly dimensional, correlated, bound to geophysical restrictions, and easily mapped to semantic points of interest.
This paper aims to establish a systematic framework on protective masking and synthetic-generation measures for trajectory databases with syntactic and differentially private (DP) guarantees, including utility properties derived from the ideas and limitations of existing proposals. To reach this goal, we systematize the utility metrics used throughout the literature, deeply analyze the DP granularity notions, explore and elaborate on the state of the art on privacy-enhancing mechanisms and their problems, and expose the main limitations of DP notions in the context of trajectories.
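For reference, the guarantee that the surveyed mechanisms target can be stated as follows, where D and D' are neighboring trajectory databases under the chosen granularity notion (event-, trajectory- or user-level) and M is the release mechanism:

```latex
\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\qquad \text{for all measurable sets } S \text{ and all neighboring } D, D'.
\]
```

The granularity notion fixes what "neighboring" means, which is precisely where most trajectory-DP proposals diverge.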
Privacy-Preserving Trajectory Data Publishing via Differential Privacy
Over the past decade, the collection of data by individuals, businesses and government agencies has increased tremendously. Due to the widespread use of mobile computing and the advances in location-acquisition techniques, an immense amount of data concerning the mobility of moving objects has been generated. The movement data of an object (e.g. an individual) might include specific information about the locations it visited, the times those locations were visited, or both. While it is beneficial to share data for the purpose of mining and analysis, data sharing might risk the privacy of the individuals involved in the data. Privacy-Preserving Data Publishing (PPDP) provides techniques that utilize several privacy models for the purpose of publishing useful information while preserving data privacy.
The objective of this thesis is to answer the following question: how can a data owner publish trajectory data while simultaneously safeguarding the privacy of the data and maintaining its usefulness? We propose an algorithm for anonymizing and publishing trajectory data that ensures the output is differentially private while maintaining high utility and scalability. Our solution comprises a twofold approach. First, we generalize trajectories by generalizing and then partitioning the timestamps at each location in a differentially private manner. Next, we add noise to the real count of the generalized trajectories according to the given privacy budget to enforce differential privacy. As a result, our approach achieves overall epsilon-differential privacy on the output trajectory data. We perform an experimental evaluation on real-life data and demonstrate that our approach can effectively answer count and range queries, as well as mine frequent sequential patterns. We also show that our algorithm is efficient with respect to the privacy budget and the number of partitions, and scalable with increasing data size.
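The second step above, perturbing the counts of generalized trajectories under a privacy budget, can be sketched with the standard Laplace mechanism. The helper names are illustrative, and the sensitivity-1 assumption (each individual contributes to exactly one generalized trajectory) is ours, not a detail taken from the thesis:

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_trajectory_counts(counts, eps, rng=random):
    """Add Laplace(sensitivity/eps) noise to each generalized-trajectory count.

    Assumes sensitivity 1, i.e. each individual contributes to exactly one
    generalized trajectory, so the released histogram is eps-DP.
    """
    return {traj: n + laplace_noise(1.0 / eps, rng) for traj, n in counts.items()}
```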
Privacy-preserving Sanitization in Data Sharing
In the era of big data, the prospect of analyzing, monitoring and investigating all sources of data stands out in every aspect of our life. The benefit of such practices becomes concrete only when analysts or investigators have access to the information shared by data owners. However, privacy is one of the main barriers that disrupt sharing, due to the fear of disclosing sensitive information. This dissertation describes data sanitization methods that disguise the sensitive information in a dataset before sharing it; our criterion throughout is to protect privacy while preserving as much utility as possible.
In particular, we provide solutions for tasks that require different types of shared data. In the case of sharing partial content of a dataset, we consider the problem of releasing a database under retention restrictions such that the auditing job can still be carried out. While obeying a retention policy often results in the wholesale destruction of the audit log in existing solutions, our framework allows data to be expired at a fine granularity and supports audit queries on a database with incompleteness. Secondly, in the case of sharing the entire dataset, we solve the problem of untrusted system evaluation using released database synthesis under differential privacy. Our synthetic database accurately preserves the core performance measures of a given query workload, and satisfies differential privacy with crucial extensions to multi-relation databases. Lastly, in the case of sharing derived information from the data source, we focus on distributing results of network modeling under differential privacy. Our mechanism can safely output estimated parameters of the exponential random graph model, by employing a decomposition of the estimation problem into two steps: obtaining private sufficient statistics first and then estimating the model parameters. We show that our privacy mechanism provides provably less error than common baselines and that our redesigned estimation algorithm offers better accuracy.
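The two-step recipe of the last contribution (private sufficient statistics first, then parameter estimation) can be illustrated on the simplest exponential random graph model, a Bernoulli graph whose sufficient statistic is the edge count. The edge-level-privacy and sensitivity-1 assumptions below are ours for illustration, not the dissertation's actual setting:

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_density_estimate(num_edges, num_nodes, eps, rng=random):
    """Step 1: release the sufficient statistic (edge count) with Laplace
    noise, assuming sensitivity 1 under edge-level privacy. Step 2: estimate
    the model parameter (edge probability) from the noisy statistic alone."""
    noisy_edges = num_edges + laplace_noise(1.0 / eps, rng)
    pairs = num_nodes * (num_nodes - 1) // 2
    return min(max(noisy_edges / pairs, 0.0), 1.0)
```

Because estimation touches only the already-privatized statistic, the final parameter inherits the DP guarantee by post-processing.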
Machine learning and privacy preserving algorithms for spatial and temporal sensing
Sensing of physical and social environments is ubiquitous in modern mobile phones,
IoT devices, and infrastructure-based settings. The information engraved in such
data, especially its time and location attributes, has unprecedented potential
to characterize individual and crowd behaviour as well as natural and technological processes.
However, it is challenging to extract abstract knowledge from the data
due to its massive size, sequential structure, asynchronous operation, noisy characteristics,
privacy concerns, and real-time analysis requirements. Therefore, the
primary goal of this thesis is to propose theoretically grounded and practically
useful algorithms to learn from location and time stamps in sensor data. The
proposed methods are inspired by tools from geometry, topology, and statistics.
They leverage structures in the temporal and spatial data by probabilistically
modeling noise, exploring topological structures embedded, and utilizing statistical
structure to protect personal information and simultaneously learn aggregate
information. Proposed algorithms are geared towards streaming and distributed
operation for efficiency. The usefulness of the methods is argued using mathematical
analysis and empirical experiments on real and artificial datasets.
GLOVE: towards privacy-preserving publishing of record-level-truthful mobile phone trajectories
Datasets of mobile phone trajectories collected by network operators offer an unprecedented opportunity to discover new knowledge from the activity of large populations of millions. However, publishing such trajectories also raises significant privacy concerns, as they contain personal data in the form of individual movement patterns. Privacy risks induce network operators to enforce restrictive confidentiality agreements on the rare occasions when they grant access to collected trajectories, whereas a less involved circulation of these data would fuel research and enable reproducibility in many disciplines. In this work, we contribute a building block toward the design of privacy-preserving datasets of mobile phone trajectories that are truthful at the record level. We present GLOVE, an algorithm that implements k-anonymity, hence solving the crucial unicity problem that affects this type of data while ensuring that the anonymized trajectories correspond to real-life users. GLOVE builds on original insights about the root causes behind the undesirable unicity of mobile phone trajectories, and leverages generalization and suppression to remove them. Proof-of-concept validations with large-scale real-world datasets demonstrate that the approach adopted by GLOVE preserves a substantial level of accuracy in the data, higher than that granted by previous methodologies. This work was supported by the Atracción de Talento Investigador program of the Comunidad de Madrid under Grant No. 2019-T1/TIC-16037 NetSense.
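The generalization-and-suppression idea behind this kind of k-anonymization can be sketched as follows. This toy version coarsens every point of a trajectory to a fixed grid cell and suppresses any generalized trajectory shared by fewer than k users; GLOVE itself chooses generalizations adaptively per record, so treat this only as a sketch of the k-anonymity mechanics:

```python
from collections import defaultdict

def generalize(traj, cell):
    # Coarsen each planar point (x, y) to the grid cell of side `cell` containing it.
    return tuple((int(x // cell), int(y // cell)) for x, y in traj)

def k_anonymize(trajectories, k, cell):
    """Release only the generalized trajectories that at least k users share."""
    groups = defaultdict(int)
    for traj in trajectories:
        groups[generalize(traj, cell)] += 1
    return [g for g, count in groups.items() if count >= k]
```

Every released record still corresponds to real users' movements (record-level truthfulness) and none is unique, which is the unicity problem the paper targets.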
A Study on Privacy Preserving Data Publishing With Differential Privacy
In the era of digitization, it is important to preserve the privacy of the many kinds of sensitive information around us, e.g., personal information, the private information of users of social communication and video streaming sites and services, the salary information and structure of an organization, the census and statistical data of a country, and so on. These data can be represented in different formats, such as numerical and categorical data, graph data, tree-structured data, and so on. To prevent these data from being illegally exploited and to protect them from privacy threats, an efficient privacy model must be applied to the sensitive data. There have been a great number of studies on privacy-preserving data publishing over the last decades. Differential Privacy (DP) is one of the state-of-the-art methods for preserving the privacy of a database. However, applying DP to high-dimensional tabular data (numerical and categorical) is challenging in terms of the required time, memory, and computational resources. A well-known solution is to reduce the dimension of the given database while keeping its originality and preserving the relations among all of its entities. In this thesis, we propose PrivFuzzy, a simple and flexible differentially private method that can publish differentially private data after reducing their original dimension with the help of fuzzy logic. Exploiting fuzzy mapping, PrivFuzzy can (1) reduce the database columns and create a new low-dimensional correlated database, (2) inject noise into each attribute to ensure differential privacy on the newly created low-dimensional database, and (3) sample each entry in the database and release the synthesized database. The existing literature shows the difficulty of applying differential privacy over a high-dimensional dataset, which we overcome with this novel fuzzy-based approach.
By applying our novel fuzzy mapping technique, PrivFuzzy transforms a high-dimensional dataset into an equivalent low-dimensional one without losing any relationship within the dataset. Our experiments with real data, and comparisons with the existing privacy-preserving models PrivBayes and PrivGene, show that PrivFuzzy outperforms existing solutions in terms of the strength of privacy preservation, simplicity and improved utility.
Preserving the privacy of graph-structured data while making part of it available is still one of the major problems in data privacy. Most existing models have tried to solve this issue with complex solutions that mix signal and noise, making them ineffective in real-world use. One state-of-the-art solution is to apply differential privacy to queries on graph data and its statistics. The challenge is to reduce the error introduced when publishing the data, since the differential privacy mechanism adds a large amount of noise and produces erroneous results that reduce the utility of the data. In this thesis, we propose a novel Expectation Maximization (EM) based differentially private model for graph datasets. By applying the EM method iteratively in conjunction with the Laplace mechanism, our model applies differentially private noise to the results of several subgraph queries on a graph dataset. Moreover, by selecting a maximal noise level, our system can generate noisy results with the expected utility. Comparing against existing models on several subgraph counting queries, we show that our model can add much less noise than existing models to achieve the expected utility while still preserving privacy.
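The kind of subgraph counting query discussed above can be made differentially private with the plain Laplace mechanism. The sketch below takes the query's sensitivity as an explicit argument because, for triangle counts under edge-level privacy, it is not 1 but can be as large as n - 2; this naive baseline is ours, and it is precisely the large noise level that an EM-based model would aim to beat:

```python
import itertools
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def triangle_count(edges):
    """Exact (non-private) triangle count of an undirected graph."""
    adj = {frozenset(e) for e in edges}
    nodes = sorted({v for e in edges for v in e})
    return sum(
        1
        for a, b, c in itertools.combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= adj
    )

def dp_triangle_count(edges, eps, sensitivity, rng=random):
    # Laplace mechanism: noise scale = sensitivity / eps.
    return triangle_count(edges) + laplace_noise(sensitivity / eps, rng)
```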
Differentially Private Event Stream Filtering with an Application to Traffic Estimation
Many large-scale systems such as intelligent transportation systems, smart grids or smart buildings
require individuals to contribute their private data streams in order to amass, store, manipulate and
analyze information for signal processing and decision-making purposes. In a typical scenario,
swarms of sensors produce discrete-valued input signals that describe the occurrence of events involving
these users and several statistics of interest need to be continuously published in real-time.
This can however engender a privacy loss for the users in exchange for the utility provided by the
application. This thesis considers the problem of providing differential privacy guarantees for such
multi-input multi-output systems operating continuously. In particular, we consider the privacy issues
in a system theoretic context, and address the problem of releasing filtered signals that respect
the privacy of users who activate the sensors. As a result of this thesis we present a new architecture
for privacy-preserving estimation of traffic flows. We also introduce differentially private monitoring
and forecasting of occupancy in a building equipped with a dense network of motion-detection sensors, which is useful, for example, to control its HVAC system.
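A crude event-level baseline for the continual-release setting described here perturbs every discrete input before it enters the running statistic. The thesis's filtering architecture is more sophisticated, so the following is only an illustrative sketch under an event-level, sensitivity-1 assumption of ours:

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_running_count(events, eps, rng=random):
    """Release a running count of a 0/1 event stream in real time.

    Each increment is perturbed independently, so the released stream is
    eps-DP at the event level by post-processing of the noisy increments;
    the error of the running count grows like sqrt(t), which is the
    baseline smarter filtering schemes try to beat.
    """
    total, released = 0.0, []
    for e in events:
        total += e + laplace_noise(1.0 / eps, rng)
        released.append(total)
    return released
```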
- âŠ