Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices
Smart devices with built-in sensors, computational capabilities, and network
connectivity have become increasingly pervasive. Crowds of smart devices offer opportunities to collectively sense and perform computing tasks at an unprecedented scale. This paper presents Crowd-ML, a privacy-preserving machine
learning framework for a crowd of smart devices, which can solve a wide range
of learning problems for crowdsensing data with differential privacy
guarantees. Crowd-ML endows a crowdsensing system with an ability to learn
classifiers or predictors online from crowdsensing data privately with minimal
computational overhead on devices and servers, making it suitable for practical, large-scale deployment. We analyze the performance and scalability of Crowd-ML, and implement the system with off-the-shelf smartphones as a proof of concept. We demonstrate the advantages of Crowd-ML with real and simulated experiments under various conditions.
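The paper defines Crowd-ML's actual protocol and privacy analysis; purely as a hedged illustration of the general pattern the abstract describes, where each device privatizes its gradient before a server aggregates, here is a minimal Python sketch. The logistic-regression loss, clipping rule, and Laplace calibration are simplifying assumptions of ours, not the authors' design.

```python
import numpy as np

def private_gradient(w, x, y, epsilon, clip=1.0):
    """One device's noisy gradient for logistic regression: clip the L1
    norm to bound sensitivity, then add Laplace noise. The calibration
    is simplified for illustration, not a formal DP accounting."""
    grad = -y * x / (1.0 + np.exp(y * np.dot(w, x)))  # logistic-loss gradient
    grad *= min(1.0, clip / (np.linalg.norm(grad, 1) + 1e-12))
    return grad + np.random.laplace(scale=clip / epsilon, size=grad.shape)

# Server loop: average the crowd's noisy gradients and take a step.
rng = np.random.default_rng(0)
dim, n_devices, epsilon = 10, 200, 0.5
w = np.zeros(dim)
for _ in range(50):
    grads = [private_gradient(w, rng.normal(size=dim),      # simulated device data
                              rng.choice([-1.0, 1.0]), epsilon)
             for _ in range(n_devices)]
    w -= 0.1 * np.mean(grads, axis=0)
```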
Generating Artificial Data for Private Deep Learning
In this paper, we propose generating artificial data that retain statistical
properties of real data as the means of providing privacy with respect to the
original dataset. We use a generative adversarial network to draw
privacy-preserving artificial data samples and derive an empirical method to
assess the risk of information disclosure in a differential-privacy-like way.
Our experiments show that we are able to generate artificial data of high
quality and successfully train and validate machine learning models on this
data while limiting potential privacy loss.
Comment: Privacy-Enhancing Artificial Intelligence and Language Technologies, AAAI Spring Symposium Series, 201
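The release pattern described here is compact in code: the curator trains a generator on the real data and publishes only samples drawn from it. In the sketch below the toy linear `generator` is a hypothetical stand-in for a trained GAN generator; the paper's empirical disclosure-risk assessment is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained GAN generator: maps latent noise
# to artificial samples that mimic the real data distribution.
W = rng.normal(size=(8, 2))
def generator(z):
    return np.tanh(z @ W)

# The curator releases only artificial samples; the real dataset never
# leaves its hands. Downstream models train and validate on these.
artificial_data = generator(rng.normal(size=(1000, 8)))
print(artificial_data.shape)  # (1000, 2)
```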
PrivCheck: Privacy-Preserving Check-in Data Publishing for Personalized Location Based Services
With the widespread adoption of smartphones, we have observed an increasing popularity of Location-Based Services (LBSs) in the past decade. To improve user experience, LBSs often provide personalized recommendations to users by mining their activity (i.e., check-in) data from location-based social networks. However, releasing user check-in data makes users vulnerable to inference attacks, as private data (e.g., gender) can often be inferred from the users' check-in data. In this paper, we propose PrivCheck, a customizable and continuous privacy-preserving check-in data publishing framework providing users with continuous privacy protection against inference attacks. The key idea of PrivCheck is to obfuscate user check-in data such that the privacy leakage of user-specified private data is minimized under a given data distortion budget, which ensures the utility of the obfuscated data to empower personalized LBSs. Since users often give LBS providers access to both their historical check-in data and future check-in streams, we develop two data obfuscation methods for historical and online check-in publishing, respectively. An empirical evaluation on two real-world datasets shows that our framework can efficiently provide effective and continuous protection of user-specified private data, while still preserving the utility of the obfuscated data for personalized LBSs.
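PrivCheck itself solves an optimization that minimizes the leakage of user-specified private data under a distortion budget; the toy sketch below only conveys the budgeted-obfuscation idea, swapping each check-in for a same-category venue with probability bounded by the budget. The venue pool and budget semantics are invented for illustration.

```python
import random

def obfuscate(checkins, same_category, budget, rng=random.Random(0)):
    """Toy budgeted obfuscation (not PrivCheck's actual optimization):
    each check-in is swapped for another venue of the same category
    with probability `budget`, so the expected fraction of distorted
    check-ins stays within the distortion budget while the data keeps
    its category-level utility for recommendation."""
    return [rng.choice(same_category[v]) if rng.random() < budget else v
            for v in checkins]

pool = {"cafe_a": ["cafe_b", "cafe_c"], "cafe_b": ["cafe_a", "cafe_c"],
        "gym_x": ["gym_y"]}
print(obfuscate(["cafe_a", "gym_x", "cafe_b"], pool, budget=0.3))
```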
Lessons Learned: Surveying the Practicality of Differential Privacy in the Industry
Since its introduction in 2006, differential privacy has emerged as a
predominant statistical tool for quantifying data privacy in academic works.
Yet despite the plethora of research and open-source utilities that have
accompanied its rise, with limited exceptions, differential privacy has failed
to achieve widespread adoption in the enterprise domain. Our study aims to shed
light on the fundamental causes underlying this academic-industrial utilization
gap through detailed interviews of 24 privacy practitioners across 9 major
companies. We analyze the results of our survey to provide key findings and
suggestions for companies striving to improve privacy protection in their data
workflows and highlight the necessary and missing requirements of existing
differential privacy tools, with the goal of guiding researchers working
towards the broader adoption of differential privacy. Our findings indicate
that analysts suffer from lengthy bureaucratic processes for requesting access
to sensitive data, yet once access is granted, only scarcely enforced privacy policies
stand between rogue practitioners and misuse of private information. We thus
argue that differential privacy can significantly improve the processes of
requesting and conducting data exploration across silos, and conclude that with
a few of the improvements suggested herein, the practical use of differential
privacy across the enterprise is within striking distance.
Assessing Data Usefulness for Failure Analysis in Anonymized System Logs
System logs are a valuable source of information for the analysis and
understanding of systems behavior for the purpose of improving their
performance. Such logs contain various types of information, including
sensitive information. Information deemed sensitive can be extracted directly from system log entries or by correlating several of them, or it can be inferred by combining the (non-sensitive) information contained within system logs with other logs and/or additional datasets. The analysis of
system logs containing sensitive information compromises data privacy.
Therefore, various anonymization techniques, such as generalization and
suppression have been employed, over the years, by data and computing centers
to protect the privacy of their users, their data, and the system as a whole.
Privacy-preserving data resulting from anonymization via generalization and
suppression may lead to significantly decreased data usefulness, thus,
hindering the intended analysis for understanding the system behavior.
Maintaining a balance between data usefulness and privacy preservation,
therefore, remains an open and important challenge. Irreversible encoding of
system logs using collision-resistant hashing algorithms, such as SHAKE-128, is
a novel approach previously introduced by the authors to mitigate data privacy
concerns. The present work describes a study of the applicability of the
encoding approach from earlier work on the system logs of a production high
performance computing system. Moreover, a metric is introduced to assess the
data usefulness of the anonymized system logs to detect and identify the
failures encountered in the system.
Comment: 11 pages, 3 figures, submitted to 17th IEEE International Symposium on Parallel and Distributed Computing
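The encoding idea is directly expressible with Python's standard library: hashing sensitive tokens with SHAKE-128 maps equal tokens to equal digests, so failures can still be correlated across anonymized entries without exposing the original values. A minimal sketch follows; the regexes and digest length are our illustrative choices, not the authors' exact pipeline.

```python
import hashlib
import re

def shake(token, digest_bytes=8):
    """Irreversible, collision-resistant encoding of a sensitive token."""
    return hashlib.shake_128(token.encode()).hexdigest(digest_bytes)

def anonymize_line(line):
    # Equal tokens map to equal digests, so entries remain correlatable
    # for failure analysis while the raw values are not recoverable.
    line = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b",
                  lambda m: shake(m.group(0)), line)               # IPv4 addresses
    line = re.sub(r"(user=)(\S+)",
                  lambda m: m.group(1) + shake(m.group(2)), line)  # user field
    return line

print(anonymize_line("2023-04-01 node17 user=alice 10.0.3.42 job failed"))
```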
Protecting privacy of semantic trajectory
The growing ubiquity of GPS-enabled devices in everyday life has made large-scale collection of trajectories feasible, providing ever-growing opportunities for human movement analysis. However, publishing this sensitive data is accompanied by increasing concerns about individuals’ geoprivacy. This thesis has two objectives: (1) propose a privacy protection framework for semantic trajectories and (2) develop a Python toolbox in the ArcGIS Pro environment that enables non-expert users to anonymize trajectory data. The former aims to prevent users’ re-identification by adversaries who know their important locations or any random spatiotemporal points, by swapping their important locations to new locations with the same semantics and unlinking the users from their trajectories. This is accomplished by converting GPS points into sequences of visited meaningful locations and moves and integrating several anonymization techniques. The second component of this thesis offers an all-in-one toolbox so that even users without deep knowledge of anonymization or coding skills can anonymize their data. By proposing and implementing this framework and toolbox, we hope that trajectory privacy is better protected in research.
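As a rough sketch of the location-swapping step alone (the thesis integrates several further anonymization techniques), each important stop can be replaced by a different venue carrying the same semantic label, with a consistent mapping so a user's repeated visits stay coherent. The semantic pool and venue names below are invented for illustration.

```python
import random

# Hypothetical pool of venues per semantic label.
semantic_pool = {
    "home": ["res_block_03", "res_block_12", "res_block_47"],
    "work": ["office_park_2", "office_park_9"],
}

def swap_stops(semantic_trajectory, rng=random.Random(0)):
    """Replace each venue with another of the same semantics, reusing
    the same replacement for repeated visits to keep the trajectory
    internally consistent while unlinking the user from real places."""
    mapping = {}
    out = []
    for label, venue in semantic_trajectory:
        if venue not in mapping:
            mapping[venue] = rng.choice(
                [v for v in semantic_pool[label] if v != venue])
        out.append((label, mapping[venue]))
    return out

traj = [("home", "res_block_12"), ("work", "office_park_2"),
        ("home", "res_block_12")]
print(swap_stops(traj))
```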
PA-iMFL: Communication-Efficient Privacy Amplification Method against Data Reconstruction Attack in Improved Multi-Layer Federated Learning
Recently, big data has seen explosive growth in the Internet of Things (IoT).
Multi-layer Federated Learning (MFL) based on a cloud-edge-end architecture can promote model training efficiency and model accuracy while preserving IoT data privacy. This paper considers an improved MFL (iMFL), where edge-layer devices own private data and can join the training process. iMFL can improve edge resource utilization and
also alleviate the strict requirement of end devices, but suffers from the
issues of Data Reconstruction Attack (DRA) and unacceptable communication
overhead. This paper aims to address these issues with iMFL. We propose a
Privacy Amplification scheme on iMFL (PA-iMFL). Differing from standard MFL, we
design privacy operations in end and edge devices after local training,
including three sequential components: local differential privacy with the Laplace mechanism, privacy amplification by subsampling, and gradient sign reset. Benefiting from these privacy operations, PA-iMFL reduces communication overhead and achieves privacy preservation. Extensive results demonstrate that against
State-Of-The-Art (SOTA) DRAs, PA-iMFL can effectively mitigate private data
leakage and reach the same level of protection capability as the SOTA defense
model. Moreover, due to adopting privacy operations in edge devices, PA-iMFL
improves communication efficiency by up to 2.8 times over the SOTA compression method without compromising model accuracy.
Comment: 12 pages, 11 figures
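A rough Python rendering of the three sequential operations named in the abstract follows; the clipping, noise calibration, and sampling rate are our simplifying assumptions, and the paper defines the exact rules and their privacy analysis.

```python
import numpy as np

def pa_privacy_ops(grad, epsilon, sample_rate=0.1, clip=1.0,
                   rng=np.random.default_rng(0)):
    """Illustrative chain of the three components: (1) local DP via the
    Laplace mechanism, (2) subsampling for privacy amplification, which
    also shrinks the upload, and (3) gradient sign reset."""
    grad = grad * min(1.0, clip / (np.linalg.norm(grad, 1) + 1e-12))   # bound sensitivity
    noisy = grad + rng.laplace(scale=clip / epsilon, size=grad.shape)  # (1) local DP
    keep = rng.random(grad.shape) < sample_rate                        # (2) subsample
    return np.where(keep, np.sign(noisy), 0.0)                         # (3) sign reset

upload = pa_privacy_ops(np.random.default_rng(1).normal(size=1000), epsilon=1.0)
print(f"coordinates uploaded: {int(np.count_nonzero(upload))} of {upload.size}")
```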
SoK: Chasing Accuracy and Privacy, and Catching Both in Differentially Private Histogram Publication
Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, containing no obvious sensitive attributes, can result in privacy leakage. To enable data analysis, a strong notion of privacy is required to avoid risking unintended privacy violations. Such a strong notion of privacy is differential privacy, a statistical notion of privacy that makes privacy leakage quantifiable. The caveat regarding differential privacy is that while it has strong guarantees for privacy, privacy comes at a cost of accuracy. Despite this trade-off being a central and important issue in the adoption of differential privacy, there exists a gap in the literature regarding providing an understanding of the trade-off and how to address it appropriately. Through a systematic literature review (SLR), we investigate the state of the art in accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we identify trends and connections in the contributions to the field of differential privacy for histograms and synthetic data, and 2) we provide an understanding of the privacy/accuracy trade-off challenge by crystallizing different dimensions of accuracy improvement. Accordingly, we position and visualize the ideas in relation to each other and to external work, and deconstruct each algorithm to examine its building blocks separately with the aim of pinpointing which dimension of accuracy improvement each technique/approach targets. Hence, this systematization of knowledge (SoK) provides an understanding of in which dimensions, and how, accuracy improvement can be pursued without sacrificing privacy.
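For reference, the baseline that the surveyed accuracy-improving algorithms refine is the plain Laplace mechanism on bin counts, sketched below; this is the standard construction rather than any one surveyed algorithm.

```python
import numpy as np

def dp_histogram(data, bins, epsilon, rng=np.random.default_rng(0)):
    """epsilon-DP histogram: adding or removing one record changes one
    bin count by 1, so the L1 sensitivity is 1 and Laplace noise of
    scale 1/epsilon per bin suffices."""
    counts, edges = np.histogram(data, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return np.clip(noisy, 0, None), edges  # clamping is post-processing, keeps DP

ages = np.random.default_rng(1).integers(0, 100, size=10_000)
noisy_counts, edges = dp_histogram(ages, bins=10, epsilon=0.5)
print(np.round(noisy_counts))
```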