Privacy-Friendly Mobility Analytics using Aggregate Location Data
Location data can be extremely useful to study commuting patterns and
disruptions, as well as to predict real-time traffic volumes. At the same time,
however, the fine-grained collection of user locations raises serious privacy
concerns, as it can reveal sensitive information about users, such as lifestyle, political and religious inclinations, or even identities. In this
paper, we study the feasibility of crowd-sourced mobility analytics over
aggregate location information: users periodically report their location, using
a privacy-preserving aggregation protocol, so that the server can only recover
aggregates -- i.e., how many, but not which, users are in a region at a given
time. We experiment with real-world mobility datasets obtained from the
Transport for London authority and the San Francisco Cabs network, and present
a novel methodology based on time series modeling that is geared to forecast
traffic volumes in regions of interest and to detect mobility anomalies in
them. In the presence of anomalies, we also make enhanced traffic volume
predictions by feeding our model with additional information from correlated
regions. Finally, we present and evaluate a mobile app prototype, called
Mobility Data Donors (MDD), in terms of computation, communication, and energy
overhead, demonstrating the real-world deployability of our techniques.
Comment: Published at ACM SIGSPATIAL 201
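The abstract does not spell out the aggregation protocol itself; as a purely illustrative sketch of the kind of privacy-preserving aggregation it relies on, the following Python snippet shows additive secret sharing, where each user's presence bit is split into random shares that only reveal the total when recombined. All names and parameters here (MODULUS, make_shares, the number of peers) are assumptions for illustration, not the paper's protocol.

```python
import random

MODULUS = 2**32  # hypothetical modulus for the modular arithmetic

def make_shares(value, n_peers):
    """Split `value` into n_peers + 1 additive shares modulo MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_peers)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

# Three users, each reporting 1 if present in the region, 0 otherwise.
presence = [1, 0, 1]
all_shares = [make_shares(p, n_peers=2) for p in presence]

# Each "column" of shares goes to a different aggregator; the server only
# ever sees the per-column sums, never an individual user's report.
column_sums = [sum(col) % MODULUS for col in zip(*all_shares)]

# Recombining the column sums yields the aggregate count, not who was present.
aggregate = sum(column_sums) % MODULUS
print(aggregate)  # -> 2
```

In a real deployment the shares would be distributed across non-colluding aggregators (or masked pairwise between users), so no single party ever observes an individual's presence bit.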
Measuring Membership Privacy on Aggregate Location Time-Series
While location data is extremely valuable for various applications,
disclosing it poses serious threats to individuals' privacy. To limit such
concerns, organizations often provide analysts with aggregate time-series that
indicate, e.g., how many people are in a location at a time interval, rather
than raw individual traces. In this paper, we perform a measurement study to
understand Membership Inference Attacks (MIAs) on aggregate location
time-series, where an adversary tries to infer whether a specific user
contributed to the aggregates.
We find that the volume of contributed data, as well as the regularity and
particularity of users' mobility patterns, play a crucial role in the attack's
success. We experiment with a wide range of defenses based on generalization,
hiding, and perturbation, and evaluate their ability to thwart the attack
vis-à-vis the utility loss they introduce for various mobility analytics tasks.
Our results show that some defenses fail across the board, while others work
for specific tasks on aggregate location time-series. For instance, suppressing
small counts can be used for ranking hotspots, data generalization for
forecasting traffic, hotspot discovery, and map inference, while sampling is
effective for location labeling and anomaly detection when the dataset is
sparse. Differentially private techniques provide reasonable accuracy only in
very specific settings, e.g., discovering hotspots and forecasting their
traffic, and more so when using weaker privacy notions like crowd-blending
privacy. Overall, our measurements show that no single generic defense can preserve the utility of the analytics for arbitrary applications, and they provide useful insights regarding the disclosure of sanitized aggregate location time-series.
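As a rough, hypothetical illustration of the distinguishability-game framing of such membership inference attacks, the sketch below trains a classifier to tell apart aggregates computed with and without a target user, on synthetic traces. The group size, classifier, and data are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy setup: 50 users, 24 time slots, one region; traces[u, t] = 1 if user u
# is in the region at slot t (synthetic data, purely illustrative).
n_users, n_slots, group_size = 50, 24, 10
traces = (rng.random((n_users, n_slots)) < 0.2).astype(int)
target = 0  # the user whose membership the adversary wants to infer

# Distinguishability game: build aggregates over random groups that either
# include or exclude the target, and label them accordingly.
X, y = [], []
for _ in range(500):
    include = rng.integers(0, 2)
    if include:
        others = rng.choice(np.arange(1, n_users), size=group_size - 1, replace=False)
        group = np.append(others, target)
    else:
        group = rng.choice(np.arange(1, n_users), size=group_size, replace=False)
    X.append(traces[group].sum(axis=0))  # aggregate time-series for the group
    y.append(include)

attack = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))

# "Released" aggregate: here, a group that does contain the target user.
released = traces[np.arange(group_size)].sum(axis=0)
print("P(target contributed) ~", attack.predict_proba([released])[0, 1])
```

The attack's success on real data depends on exactly the factors the abstract highlights: how much data the target contributed and how regular and distinctive their mobility pattern is.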
Machine Learning Models that Remember Too Much
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and
services are available to data holders who are not ML experts but want to train
predictive models on their data. It is important that ML models trained on
sensitive inputs (e.g., personal images or documents) not leak too much
information about the training data.
We consider a malicious ML provider who supplies model-training code to the
data holder, does not observe the training, but then obtains white- or
black-box access to the resulting model. In this setting, we design and
implement practical algorithms, some of them very similar to standard ML
techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model while keeping it as accurate and predictive as a conventionally trained model. We then explain how
the adversary can extract memorized information from the model.
We evaluate our techniques on standard ML tasks for image classification
(CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20
Newsgroups and IMDB). In all cases, we show how our algorithms create models
that have high predictive power yet allow accurate extraction of subsets of
their training data.
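One of the white-box channels described in this line of work hides training data in the low-order bits of the model's parameters. The sketch below is a minimal, self-contained illustration of that idea on random stand-in weights; the helper names and the choice of 8 payload bits per float32 parameter are assumptions for illustration, not the paper's exact encoding.

```python
import numpy as np

def hide_in_lsbs(params, secret, n_bits=8):
    """Overwrite the n_bits least-significant mantissa bits of each float32
    parameter with one byte of `secret`. Purely illustrative."""
    raw = params.astype(np.float32).view(np.uint32).copy()
    mask = np.uint32((1 << n_bits) - 1)
    for i, byte in enumerate(secret[: len(raw)]):
        raw[i] = (raw[i] & ~mask) | np.uint32(byte)
    return raw.view(np.float32)

def recover_from_lsbs(params, n_bytes, n_bits=8):
    """Read the payload back out of the parameters' low-order bits."""
    raw = params.view(np.uint32)
    mask = (1 << n_bits) - 1
    return bytes(int(raw[i]) & mask for i in range(n_bytes))

weights = np.random.randn(64).astype(np.float32)  # stand-in for trained weights
secret = b"training record"                        # stand-in for training data
stego = hide_in_lsbs(weights, secret)

print(np.max(np.abs(stego - weights)))     # tiny perturbation to each weight
print(recover_from_lsbs(stego, len(secret)))  # b'training record'
```

Because only the lowest mantissa bits change, the perturbation to each weight is tiny, which is why accuracy is essentially unaffected while the payload stays exactly recoverable by anyone with white-box access to the parameters.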
Privacy Meets Explainability: A Comprehensive Impact Benchmark
Since the mid-2010s, the era of Deep Learning (DL) has continued to this day, bringing forth new records and innovations each year. Nevertheless, the
speed with which these innovations translate into real applications lags behind
this fast pace. Safety-critical applications, in particular, are subject to strict regulatory and ethical requirements that must be addressed and remain active areas of debate. eXplainable AI (XAI) and privacy-preserving
machine learning (PPML) are both crucial research fields, aiming at mitigating
some of the drawbacks of prevailing data-hungry black-box models in DL. Despite
brisk research activity in the respective fields, no attention has yet been
paid to their interaction. This work is the first to investigate the impact of
private learning techniques on generated explanations for DL-based models. In
an extensive experimental analysis covering various image and time series
datasets from multiple domains, as well as varying privacy techniques, XAI
methods, and model architectures, the effects of private training on generated
explanations are studied. The findings suggest that introducing privacy leads to non-negligible changes in explanations. Apart from reporting
individual effects of PPML on XAI, the paper gives clear recommendations for
the choice of techniques in real applications. By unveiling the
interdependencies of these pivotal technologies, this work is a first step
towards overcoming the remaining hurdles for practically applicable AI in
safety-critical domains.
Comment: Under Submission
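The abstract describes a benchmark rather than a single algorithm, but the core measurement, quantifying how much an explanation changes once a privacy mechanism is introduced, can be sketched very simply. In the hypothetical example below, coefficient magnitudes stand in for an XAI attribution method and noisy output perturbation stands in for DP-SGD; both substitutions, as well as the noise scale, are illustrative assumptions, not the benchmark's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Baseline model vs. a crudely "privatized" one (noise added to the learned
# coefficients as a stand-in for a proper private training procedure).
base = LogisticRegression(max_iter=1000).fit(X, y)
private = LogisticRegression(max_iter=1000).fit(X, y)
private.coef_ = private.coef_ + rng.normal(scale=0.5, size=private.coef_.shape)

# A simple "explanation": per-feature importance from coefficient magnitudes.
expl_base = np.abs(base.coef_).ravel()
expl_priv = np.abs(private.coef_).ravel()

# Quantify how much the explanation changed under the privacy mechanism.
cosine = expl_base @ expl_priv / (np.linalg.norm(expl_base) * np.linalg.norm(expl_priv))
print(f"explanation similarity (cosine): {cosine:.3f}")
print(f"accuracy base={base.score(X, y):.3f}  private={private.score(X, y):.3f}")
```

The benchmark in the paper performs this kind of comparison at scale, across datasets, privacy techniques, XAI methods, and architectures.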
Supporting Regularized Logistic Regression Privately and Efficiently
As one of the most popular statistical and machine learning models, logistic
regression with regularization has found wide adoption in biomedicine, social
sciences, information technology, and so on. These domains often involve data from human subjects that is subject to strict privacy regulations.
Increasing concerns over data privacy make it more and more difficult to
coordinate and conduct large-scale collaborative studies, which typically rely
on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a machine learning model widely used across disciplines that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario
of multi-institution collaborative studies, such as research consortia or networks widely seen in genetics, epidemiology, social
sciences, etc. To make our privacy-enhancing solution practical, we demonstrate
a non-conventional and computationally efficient method leveraging distributed
computing and strong cryptography to provide comprehensive protection over
individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency, and scalability of our
proposal. We also discuss the practical implications of our solution for
large-scale studies and applications from various disciplines, including
genetic and biomedical studies, smart grid, network analysis, etc.
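The paper's exact protocol is not given in the abstract; as one hedged sketch of the general pattern (each institution reveals only a masked gradient contribution, so a coordinator can update a shared regularized logistic regression model from the aggregate gradient alone), the Python below uses pairwise-cancelling masks over fixed-point values. The modulus, scaling factor, learning rate, and synthetic data are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_gradient(w, X, y, lam):
    """Gradient of L2-regularized logistic loss on one institution's data."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y) + lam * w

# Three institutions with local data they cannot share (synthetic stand-ins).
d = 5
w_true = rng.normal(size=d)
institutions = []
for _ in range(3):
    X = rng.normal(size=(200, d))
    y = (1.0 / (1.0 + np.exp(-X @ w_true)) > rng.random(200)).astype(float)
    institutions.append((X, y))

w = np.zeros(d)
MOD = 2**32
SCALE = 1e6  # fixed-point scaling before modular masking

for step in range(200):
    grads = [local_gradient(w, X, y, lam=0.01) for X, y in institutions]
    # Secure-aggregation stand-in: pairwise masks cancel in the sum, so the
    # coordinator learns only the total gradient, never an individual one.
    masks = [rng.integers(0, MOD, size=d) for _ in range(len(grads))]
    zero_sum = [masks[i] - masks[(i + 1) % len(masks)] for i in range(len(masks))]
    masked = [(np.round(g * SCALE).astype(np.int64) + z) % MOD
              for g, z in zip(grads, zero_sum)]
    total = sum(masked) % MOD
    total = np.where(total > MOD // 2, total - MOD, total)  # map back to signed
    w -= 0.5 * (total / SCALE) / len(institutions)

print("learned weights:", np.round(w, 2))
print("ground truth:   ", np.round(w_true, 2))
```

The actual solution in the paper combines distributed computing with strong cryptography and also protects summary statistics; the snippet only illustrates why masked aggregation suffices to run gradient-based training without pooling individual-level data.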
Towards privacy preserving cooperative cloud based intrusion detection systems
Cloud systems are becoming more sophisticated, dynamic, and vulnerable to attacks. It is therefore increasingly difficult for a single cloud-based Intrusion Detection System (IDS) to detect all attacks, because of limited and incomplete knowledge about attacks and their implications. Recent work on cybersecurity has shown that cooperation among cloud-based IDSs can bring higher detection accuracy in such complex computer systems. Through collaboration, cloud-based IDSs can consult and share knowledge with other IDSs to enhance detection accuracy and achieve mutual benefits. One fundamental barrier within cooperative IDS is the anonymity of the data the IDSs exchange: a malicious IDS can obtain sensitive information from other IDSs by making inferences from the observed data. To address this problem, we propose a new framework for achieving a privacy-preserving cooperative cloud-based IDS. Specifically, we design a unified framework that integrates privacy-preserving techniques into machine learning-based IDSs to obtain privacy-aware cooperative IDSs. This allows an IDS to hide private and sensitive information in the shared data while improving or maintaining detection accuracy. The proposed framework has been implemented using several machine learning and privacy-preserving techniques. The results suggest that the consulted IDSs can detect intrusions without needing the original data: no significant degradation in accuracy was recorded when using the newly generated data, which is similar to the original data semantically but not syntactically.
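As a hypothetical, very simplified instance of the thesis idea (sharing a transformed view of the data that differs syntactically from the raw records while still supporting detection), the sketch below trains a second detector on a randomly projected and noised copy of the first detector's training data. The random-projection-plus-noise transform and the synthetic flow features are illustrative assumptions, not the specific techniques evaluated in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for network-flow features labelled benign/attack.
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9, 0.1],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# IDS A shares a transformed view of its data instead of the raw records:
# a random projection plus noise, so shared vectors differ syntactically from
# the originals while (hopefully) keeping the structure needed for detection.
rng = np.random.default_rng(1)
projection = rng.normal(size=(20, 20)) / np.sqrt(20)
X_train_shared = X_train @ projection + rng.normal(scale=0.1, size=X_train.shape)
X_test_shared = X_test @ projection + rng.normal(scale=0.1, size=X_test.shape)

# IDS B trains on the shared (transformed) data only.
on_raw    = RandomForestClassifier(random_state=1).fit(X_train, y_train)
on_shared = RandomForestClassifier(random_state=1).fit(X_train_shared, y_train)

print("accuracy on raw data:   ", on_raw.score(X_test, y_test))
print("accuracy on shared data:", on_shared.score(X_test_shared, y_test))
```

The point of the comparison mirrors the thesis finding: a cooperating IDS can be trained and consulted without ever seeing the original records, at little or no cost in detection accuracy.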