Evaluating Privacy-Friendly Mobility Analytics on Aggregate Location Data
Information about people's movements and the locations they visit enables a wide range of mobility analytics applications, e.g., real-time traffic maps or urban planning, aiming to improve quality of life in modern smart cities. Alas, the availability of users' fine-grained location data reveals sensitive information about them, such as home and work locations, lifestyle, and political or religious inclinations. To mitigate this, aggregation is often employed as a strategy that enables analytics and machine learning tasks while protecting the privacy of individual users' location traces. In this thesis, we perform an end-to-end evaluation of crowdsourced privacy-friendly location aggregation, aiming to understand its usefulness for analytics as well as its privacy implications for users who contribute their data. First, we present a time-series methodology which, along with privacy-friendly crowdsourcing of aggregate locations, supports mobility analytics such as traffic forecasting and mobility anomaly detection. Next, we design quantification frameworks and methodologies that let us reason about the privacy loss stemming from the collection or release of aggregate location information against knowledgeable adversaries aiming to infer users' profiles, locations, or membership. We then use these frameworks to evaluate defenses ranging from generalization and hiding to differential privacy, which can be employed to prevent inferences on aggregate location statistics, in terms of both privacy protection and utility loss for analytics tasks. Our results highlight that, while location aggregation is useful for mobility analytics, it is a weak privacy protection mechanism in this setting, and that additional defenses can only protect privacy if some statistical utility is sacrificed. Overall, the tools presented in this thesis can be used by providers who wish to assess the quality of privacy protection before data release, and our results have several implications for current location data practices and applications.
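To make the differential privacy defense mentioned above concrete, here is a minimal sketch of the Laplace mechanism applied to aggregate location counts. The function name, the unit-sensitivity assumption (each user contributes at most one visit per location and epoch), and the epsilon value are illustrative assumptions, not the thesis's exact mechanism.

```python
import numpy as np

def laplace_counts(true_counts, epsilon):
    """Add Laplace noise to per-location visit counts.

    Assumes each user contributes at most one visit per location-epoch,
    so the L1 sensitivity of each count is 1.
    """
    scale = 1.0 / epsilon  # sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=len(true_counts))
    return np.maximum(np.round(true_counts + noise), 0)  # clip to valid counts

# Example: hourly visit counts for 5 regions of interest
counts = np.array([120, 45, 3, 0, 78])
print(laplace_counts(counts, epsilon=1.0))
```

Smaller epsilon values add more noise, which is exactly the privacy-utility trade-off the thesis evaluates: stronger protection against inference at the cost of less accurate aggregate statistics.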
Quantifying Sample Anonymity in Score-Based Generative Models with Adversarial Fingerprinting
Recent advances in score-based generative models have led to a surge in downstream applications, ranging from data augmentation over image and video generation to anomaly detection. Despite publicly available trained models, their potential for privacy-preserving data sharing has not yet been fully explored. Training diffusion models on private data and disseminating the models and weights rather than the raw dataset paves the way for innovative large-scale data-sharing strategies, particularly in healthcare, where safeguarding patients' personal health information is paramount. However, publishing such models without the individual consent of, e.g., the patients from whom the data was acquired, necessitates guarantees that identifiable training samples will never be reproduced, thus protecting personal health data and satisfying the requirements of policymakers and regulatory bodies. This paper introduces a method for estimating an upper bound on the probability of reproducing identifiable training images during the sampling process. This is achieved by designing an adversarial approach that searches for anatomic fingerprints, such as medical devices or dermal art, which could potentially be employed to re-identify training images. Our method harnesses the learned score-based model to estimate the probability mass of the subspace of the score function that may be utilized for one-to-one reproduction of training samples. To validate our estimates, we generate anomalies containing a fingerprint and investigate whether samples generated from the trained models can be uniquely mapped back to the original training samples. Overall, our results show that privacy-breaching images are reproduced at sampling time if the models were trained without care.
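As a crude illustration of that final validation step, the following sketch flags generated samples that map one-to-one onto training images via a nearest-neighbour check in pixel space. This is a simplified proxy, not the paper's adversarial fingerprinting method; the function name and the distance threshold are assumptions.

```python
import numpy as np

def flag_memorized(generated, training, threshold):
    """Flag generated images that are near-duplicates of training images.

    generated, training: arrays of shape (n, d) holding flattened images.
    threshold: maximum L2 distance at which a pair counts as a reproduction.
    """
    flagged = []
    for i, g in enumerate(generated):
        dists = np.linalg.norm(training - g, axis=1)  # distance to every training image
        j = int(np.argmin(dists))
        if dists[j] < threshold:
            flagged.append((i, j, float(dists[j])))  # generated idx, training idx, distance
    return flagged
```

In practice such checks are often run in a learned feature space rather than raw pixel space, since near-duplicates may differ by small shifts or noise.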
The Role of Signal Processing in Meeting Privacy Challenges [An Overview]
With the increasing growth and sophistication of information technology, personal information is easily accessible electronically. This flood of released personal data raises important privacy concerns. However, electronic data sources exist to be used and have tremendous value (utility) to their users and collectors, leading to a tension between privacy and utility. This article aims to quantify that tension by means of an information-theoretic framework and to motivate signal processing approaches to privacy problems. The framework is applied to a number of case studies to illustrate concretely how signal processing can be harnessed to provide data privacy.
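One standard way to formalize this privacy-utility tension information-theoretically (a common formulation in this literature, not necessarily the article's exact notation) is to pick a release channel that leaks as little as possible about a private attribute S while keeping the released data Y close to the original data X under a distortion measure d with budget D:

```latex
% Choose the release channel p(y|x) that minimizes leakage about the
% private attribute S, subject to a bound on average distortion.
\min_{p(y \mid x)} \; I(S; Y)
\quad \text{subject to} \quad
\mathbb{E}\!\left[ d(X, Y) \right] \le D
```

Here I(S; Y) is the mutual information between the private attribute and the released data; sweeping D traces out the privacy-utility trade-off curve.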
Adversarial Robustness in Unsupervised Machine Learning: A Systematic Review
As the adoption of machine learning models increases, ensuring robust models
against adversarial attacks is increasingly important. With unsupervised
machine learning gaining more attention, ensuring it is robust against attacks
is vital. This paper conducts a systematic literature review on the robustness
of unsupervised learning, collecting 86 papers. Our results show that most
research focuses on privacy attacks, which have effective defenses; however,
many attacks lack effective and general defensive measures. Based on the
results, we formulate a model on the properties of an attack on unsupervised
learning, contributing to future research by providing a model to use.Comment: 38 pages, 11 figure
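For concreteness, here is a minimal sketch of one widely studied privacy attack in this setting: membership inference against a reconstruction-based unsupervised model such as an autoencoder. The `reconstruct` interface and the decision threshold are assumptions for illustration only, not a method from the review.

```python
import numpy as np

def membership_scores(model, samples):
    """Score samples by reconstruction error; lower error suggests membership.

    model: any fitted unsupervised model exposing a reconstruct(x) method
    (e.g., an autoencoder); this interface is assumed for illustration.
    samples: array of shape (n, d).
    """
    recon = model.reconstruct(samples)
    return -np.mean((samples - recon) ** 2, axis=1)  # higher = more likely a member

def infer_members(model, samples, threshold):
    """Predict membership for each sample by thresholding the score."""
    return membership_scores(model, samples) > threshold
```

The intuition is that models tend to reconstruct training members more faithfully than unseen points, which is precisely the kind of leakage the surveyed defenses aim to suppress.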