Search CORE

157 research outputs found

Of Wines and Reviews: Measuring and Modeling the Vivino Wine Social Network

Author: De Cristofaro Emiliano
De Cristofaro Paolo
Kotonya Neema
Publication venue
Publication date: 13/09/2018
Field of study

This paper presents an analysis of social experiences around wine consumption through the lens of Vivino, a social network for wine enthusiasts with over 26 million users worldwide. We compare users' perceptions of various wine types and regional styles across both New and Old World wines, examining them across price ranges, vintages, regions, varietals, and blends. Among other things, we find that ratings provided by Vivino users are not biased by cost. We then study how wine characteristics, language in wine reviews, and the distribution of wine ratings can be combined to develop prediction models. More specifically, we model user behavior to develop a regression model for predicting wine ratings, and a classifier for determining user review preferences.Comment: A preliminary version of this paper appears in the Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018). This is the full versio

arXiv.org e-Print Archive

Crossref

UCL Discovery

Danger is My Middle Name: Experimenting with SSL Vulnerabilities in Android Apps

Author: De Cristofaro Emiliano
Onwuzurike Lucky
Publication venue
Publication date: 04/05/2015
Field of study

This paper presents a measurement study of information leakage and SSL vulnerabilities in popular Android apps. We perform static and dynamic analysis on 100 apps, downloaded at least 10M times, that request full network access. Our experiments show that, although prior work has drawn a lot of attention to SSL implementations on mobile platforms, several popular apps (32/100) accept all certificates and all hostnames, and four actually transmit sensitive data unencrypted. We set up an experimental testbed simulating man-in-the-middle attacks and find that many apps (up to 91% when the adversary has a certificate installed on the victim's device) are vulnerable, allowing the attacker to access sensitive information, including credentials, files, personal details, and credit card numbers. Finally, we provide a few recommendations to app developers and highlight several open research problems.Comment: A preliminary version of this paper appears in the Proceedings of ACM WiSec 2015. This is the full versio

arXiv.org e-Print Archive

UCL Discovery

Privacy-Preserving Genetic Relatedness Test

Author: De Cristofaro Emiliano
Liang Kaitai
Zhang Yuruo
Publication venue
Publication date: 09/11/2016
Field of study

An increasing number of individuals are turning to Direct-To-Consumer (DTC) genetic testing to learn about their predisposition to diseases, traits, and/or ancestry. DTC companies like 23andme and Ancestry.com have started to offer popular and affordable ancestry and genealogy tests, with services allowing users to find unknown relatives and long-distant cousins. Naturally, access and possible dissemination of genetic data prompts serious privacy concerns, thus motivating the need to design efficient primitives supporting private genetic tests. In this paper, we present an effective protocol for privacy-preserving genetic relatedness test (PPGRT), enabling a cloud server to run relatedness tests on input an encrypted genetic database and a test facility's encrypted genetic sample. We reduce the test to a data matching problem and perform it, privately, using searchable encryption. Finally, a performance evaluation of hamming distance based PP-GRT attests to the practicality of our proposals.Comment: A preliminary version of this paper appears in the Proceedings of the 3rd International Workshop on Genome Privacy and Security (GenoPri'16

arXiv.org e-Print Archive

UCL Discovery

Measuring Membership Privacy on Aggregate Location Time-Series

Author: De Cristofaro Emiliano
Pyrgelis Apostolos
Troncoso Carmela
Publication venue
Publication date: 27/04/2020
Field of study

While location data is extremely valuable for various applications, disclosing it prompts serious threats to individuals' privacy. To limit such concerns, organizations often provide analysts with aggregate time-series that indicate, e.g., how many people are in a location at a time interval, rather than raw individual traces. In this paper, we perform a measurement study to understand Membership Inference Attacks (MIAs) on aggregate location time-series, where an adversary tries to infer whether a specific user contributed to the aggregates. We find that the volume of contributed data, as well as the regularity and particularity of users' mobility patterns, play a crucial role in the attack's success. We experiment with a wide range of defenses based on generalization, hiding, and perturbation, and evaluate their ability to thwart the attack vis-a-vis the utility loss they introduce for various mobility analytics tasks. Our results show that some defenses fail across the board, while others work for specific tasks on aggregate location time-series. For instance, suppressing small counts can be used for ranking hotspots, data generalization for forecasting traffic, hotspot discovery, and map inference, while sampling is effective for location labeling and anomaly detection when the dataset is sparse. Differentially private techniques provide reasonable accuracy only in very specific settings, e.g., discovering hotspots and forecasting their traffic, and more so when using weaker privacy notions like crowd-blending privacy. Overall, our measurements show that there does not exist a unique generic defense that can preserve the utility of the analytics for arbitrary applications, and provide useful insights regarding the disclosure of sanitized aggregate location time-series

arXiv.org e-Print Archive

Crossref

UCL Discovery

Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective

Author: De Cristofaro Emiliano
Malin Bradley
Mittos Alexandros
Publication venue
Publication date: 17/08/2018
Field of study

Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy expertsComment: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue

arXiv.org e-Print Archive

UCL Discovery

Privacy-Friendly Mobility Analytics using Aggregate Location Data

Author: De Cristofaro Emiliano
Pyrgelis Apostolos
Ross Gordon
Publication venue
Publication date: 09/10/2016
Field of study

Location data can be extremely useful to study commuting patterns and disruptions, as well as to predict real-time traffic volumes. At the same time, however, the fine-grained collection of user locations raises serious privacy concerns, as this can reveal sensitive information about the users, such as, life style, political and religious inclinations, or even identities. In this paper, we study the feasibility of crowd-sourced mobility analytics over aggregate location information: users periodically report their location, using a privacy-preserving aggregation protocol, so that the server can only recover aggregates -- i.e., how many, but not which, users are in a region at a given time. We experiment with real-world mobility datasets obtained from the Transport For London authority and the San Francisco Cabs network, and present a novel methodology based on time series modeling that is geared to forecast traffic volumes in regions of interest and to detect mobility anomalies in them. In the presence of anomalies, we also make enhanced traffic volume predictions by feeding our model with additional information from correlated regions. Finally, we present and evaluate a mobile app prototype, called Mobility Data Donors (MDD), in terms of computation, communication, and energy overhead, demonstrating the real-world deployability of our techniques.Comment: Published at ACM SIGSPATIAL 201

arXiv.org e-Print Archive

UCL Discovery

On Collaborative Predictive Blacklisting

Author: De Cristofaro Emiliano
Melis Luca
Pyrgelis Apostolos
Publication venue
Publication date: 01/10/2018
Field of study

Collaborative predictive blacklisting (CPB) allows to forecast future attack sources based on logs and alerts contributed by multiple organizations. Unfortunately, however, research on CPB has only focused on increasing the number of predicted attacks but has not considered the impact on false positives and false negatives. Moreover, sharing alerts is often hindered by confidentiality, trust, and liability issues, which motivates the need for privacy-preserving approaches to the problem. In this paper, we present a measurement study of state-of-the-art CPB techniques, aiming to shed light on the actual impact of collaboration. To this end, we reproduce and measure two systems: a non privacy-friendly one that uses a trusted coordinating party with access to all alerts (Soldo et al., 2010) and a peer-to-peer one using privacy-preserving data sharing (Freudiger et al., 2015). We show that, while collaboration boosts the number of predicted attacks, it also yields high false positives, ultimately leading to poor accuracy. This motivates us to present a hybrid approach, using a semi-trusted central entity, aiming to increase utility from collaboration while, at the same time, limiting information disclosure and false positives. This leads to a better trade-off of true and false positive rates, while at the same time addressing privacy concerns.Comment: A preliminary version of this paper appears in ACM SIGCOMM's Computer Communication Review (Volume 48 Issue 5, October 2018). This is the full versio

arXiv.org e-Print Archive

UCL Discovery