2,273 research outputs found
Quantifying Privacy Loss of Human Mobility Graph Topology
Human mobility is often represented as a mobility network, or graph, with nodes representing places of significance which an individual visits, such as their home, work, places of social amenity, etc., and edge weights corresponding to probability estimates of movements between these places. Previous research has shown that individuals can be identified by a small number of geolocated nodes in their mobility network, rendering mobility trace anonymization a hard task. In this paper we build on prior work and demonstrate that even when all location and timestamp information is removed from nodes, the graph topology of an individual mobility network is itself often uniquely identifying. Further, we observe that a mobility network is often unique even when only a small number of the most popular nodes and edges are considered. We evaluate our approach using a large dataset of cell-tower location traces from 1,500 smartphone handsets with a mean duration of 430 days. We process the data to derive the top-N places visited by the device in each trace, and find that 93% of traces have a unique top-10 mobility network, and that all traces are unique when considering top-15 mobility networks. Since mobility patterns, and therefore mobility networks, vary over time for an individual, we use graph kernel distance functions to determine whether two mobility networks, taken at different points in time, represent the same individual. We then show that our distance metrics, while imperfect predictors, perform significantly better than a random strategy, and therefore our approach represents a significant loss in privacy.
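A minimal sketch of the pipeline this abstract describes: building an anonymized top-N mobility network from a visit sequence and comparing two such networks by topology alone. The spectral distance below is an illustrative stand-in for the paper's graph kernel distance functions, and all function names and parameters are assumptions, not the authors' implementation.

```python
# Illustrative sketch: anonymized top-N mobility networks compared by
# topology alone. The spectral distance is a simple stand-in for the
# graph kernels used in the paper.
from collections import Counter

import networkx as nx
import numpy as np


def top_n_mobility_network(visits, n=10):
    """Build a mobility graph from a sequence of (anonymous) place IDs,
    keeping only the n most-visited places. Edge weights are empirical
    transition frequencies between consecutive retained visits."""
    top_places = {p for p, _ in Counter(visits).most_common(n)}
    kept = [p for p in visits if p in top_places]
    transitions = Counter(zip(kept, kept[1:]))
    totals = Counter(kept[:-1])
    g = nx.DiGraph()
    for (a, b), c in transitions.items():
        if a != b:                       # drop self-transitions
            g.add_edge(a, b, weight=c / totals[a])
    return g


def topology_distance(g1, g2):
    """Label-free distance: compare sorted normalized-Laplacian spectra
    of the undirected skeletons, zero-padding to equal length."""
    spectra = []
    for g in (g1, g2):
        lap = nx.normalized_laplacian_matrix(g.to_undirected()).toarray()
        spectra.append(np.sort(np.linalg.eigvalsh(lap)))
    k = max(map(len, spectra))
    s1, s2 = (np.pad(s, (0, k - len(s))) for s in spectra)
    return float(np.linalg.norm(s1 - s2))
```

Because the distance uses only the eigenvalue spectrum, it never consults node labels, locations, or timestamps, matching the threat model in which all such information has been stripped.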
Data Summarizations for Scalable, Robust and Privacy-Aware Learning in High Dimensions
The advent of large-scale datasets has offered unprecedented amounts of information for building statistically powerful machines but, at the same time, has introduced a remarkable computational challenge: how can we efficiently process massive data? This thesis presents a suite of data reduction methods that make learning algorithms scale on large datasets by extracting a succinct, model-specific representation that summarizes the full data collection: a coreset. Our frameworks support by design datasets of arbitrary dimensionality, and can be used for general-purpose Bayesian inference under real-world constraints, including privacy preservation and robustness to outliers, encompassing diverse uncertainty-aware data analysis tasks such as density estimation, classification and regression.
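To make the coreset idea concrete, here is a minimal sketch of the classical importance-sampling construction: a small weighted subsample whose weighted log-likelihood approximates that of the full dataset. This generic construction is standard background rather than the thesis's own method, and the sensitivity proxy and names are illustrative assumptions.

```python
# Minimal coreset sketch: importance-sample a weighted subset so that
# sum_i w_i * loglik(x_i) approximates the full-data log-likelihood.
# The per-point "sensitivity" proxy is a crude illustrative choice.
import numpy as np


def importance_coreset(x, loglik, size, rng=None):
    """x: (n, d) data; loglik: per-point log-likelihood under a rough
    preliminary fit; returns (subset, weights) of the requested size."""
    rng = rng or np.random.default_rng(0)
    scores = np.abs(loglik(x))            # proxy for per-point sensitivity
    probs = scores / scores.sum()         # sampling distribution
    idx = rng.choice(len(x), size=size, replace=True, p=probs)
    weights = 1.0 / (size * probs[idx])   # unbiased importance weights
    return x[idx], weights
```

With weights w_i = 1 / (m p_i), the weighted sum over the m sampled points is an unbiased estimator of the sum over all n points, which is what lets downstream inference run on the summary instead of the full data.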
We motivate the necessity for novel data reduction techniques in the first place by developing a re-identification attack on coarsened representations of private behavioural data. Analysing longitudinal records of human mobility, we detect privacy-revealing structural patterns that remain preserved in reduced graph representations of individuals' information of manageable size. These unique patterns enable mounting linkage attacks via structural similarity computations on longitudinal mobility traces, revealing an overlooked yet real privacy threat.
We then propose a scalable variational inference scheme for approximating posteriors on large datasets via learnable weighted pseudodata, termed pseudocoresets. We show that the use of pseudodata overcomes the constraints on minimum summary size for a given approximation quality that data dimensionality imposes on all existing Bayesian coreset constructions. Moreover, it allows us to develop a pseudocoreset-based summarization scheme that satisfies the standard framework of differential privacy by construction; in this way, we can release reduced-size privacy-preserving representations of sensitive datasets that are amenable to arbitrary post-processing.
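As a much-simplified illustration of releasing a summary under differential privacy, the sketch below privatizes a clipped mean with the standard Gaussian mechanism; the thesis's pseudocoreset construction is substantially more involved, and every name and parameter here is an assumption for illustration.

```python
# Gaussian-mechanism sketch: release a clipped data mean under
# (epsilon, delta)-differential privacy. A toy stand-in for the far
# richer private pseudocoreset construction described above.
import numpy as np


def dp_mean(x, clip_norm, epsilon, delta, rng=None):
    """x: (n, d) data. Clip each row to L2 norm <= clip_norm, average,
    then add Gaussian noise calibrated to the mean's L2 sensitivity."""
    rng = rng or np.random.default_rng(0)
    norms = np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    clipped = x * np.minimum(1.0, clip_norm / norms)
    sensitivity = 2.0 * clip_norm / len(x)   # replace-one neighbouring datasets
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=x.shape[1])
```

The post-processing property of differential privacy then guarantees that any downstream computation on the released summary inherits the same privacy budget, which is the property the abstract highlights.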
Subsequently, we consider summarizations for large-scale Bayesian inference in scenarios where observed datapoints depart from the statistical assumptions of our model. Using robust divergences, we develop a method for constructing coresets that are resilient to model misspecification. Crucially, this method automatically discards outliers from the generated data summaries. We thus deliver robustified, scalable representations for inference that are suitable for applications involving contaminated and unreliable data sources.
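A toy illustration of the outlier-discarding behaviour: score each point under a preliminary fit and drop the worst-fitting fraction before building the summary. This trimming heuristic is an illustrative stand-in for the robust-divergence construction; the names and the trim fraction are assumptions.

```python
# Toy robust summarization: drop the points the preliminary model fits
# worst (likely outliers) before subsampling. A crude stand-in for the
# robust-divergence coresets described above.
import numpy as np


def trimmed_subsample(x, loglik, size, trim_frac=0.05, rng=None):
    """Discard the trim_frac lowest log-likelihood points, then draw a
    uniform subsample with equal weights from the remainder."""
    rng = rng or np.random.default_rng(0)
    scores = loglik(x)
    inliers = x[scores >= np.quantile(scores, trim_frac)]
    idx = rng.choice(len(inliers), size=size, replace=False)
    weights = np.full(size, len(inliers) / size)  # rescale to full mass
    return inliers[idx], weights
```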
We demonstrate the performance of the proposed summarization techniques on multiple parametric statistical models and on diverse simulated and real-world datasets, from music genre features to hospital readmission records, considering a wide range of data dimensionalities.
Nokia Bell Labs, Lundgren Fund, Darwin College, University of Cambridge
Department of Computer Science & Technology, University of Cambridge
Influence of tracking duration on the privacy of individual mobility graphs
Location graphs, compact representations of human mobility without geocoordinates, can be used to personalise location-based services. While they are more privacy-preserving than raw tracking data, it has been shown that they still carry a considerable risk of users being re-identified solely from the graph topology. However, it is unclear how this risk depends on the tracking duration. Here, we consider a scenario where the attacker wants to match new tracking data of a user to a pool of previously recorded mobility profiles, and we analyse how the re-identification performance depends on the tracking duration. We find that the re-identification accuracy varies between 0.41% and 20.97%; it is affected by both the pool duration and the test-user tracking duration, is greater when both have the same duration, and is not significantly affected by socio-demographics such as age or gender, but can to some extent be explained by differences in mobility and graph features. Overall, the influence of tracking duration on user privacy has clear implications for data collection and storage strategies. We advise data collectors to limit the tracking duration or to reset user IDs regularly when storing long-term tracking data.
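A minimal sketch of the matching scenario this abstract describes: given a pool of previously recorded graphs and a fresh test graph per user, assign each test graph to its nearest pool graph and score top-1 accuracy. The distance function is left abstract (any graph distance, such as the spectral one sketched earlier, can be plugged in), and all names are illustrative.

```python
# Sketch of the re-identification experiment: match each test user's
# fresh mobility graph to the closest graph in a pool of stored
# profiles, then report top-1 accuracy. Any graph distance can be
# plugged in as `dist`; everything here is illustrative.
import numpy as np


def reidentification_accuracy(pool, tests, dist):
    """pool, tests: dicts mapping user_id -> mobility graph.
    Returns the fraction of test users matched to their own profile."""
    pool_ids = list(pool)
    hits = 0
    for uid, g_test in tests.items():
        d = np.array([dist(g_test, pool[pid]) for pid in pool_ids])
        if pool_ids[int(d.argmin())] == uid:
            hits += 1
    return hits / len(tests)
```

Varying the durations over which the pool graphs and the test graphs were recorded, and re-running this matching, is what exposes the duration dependence the paper reports.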
Where you go is who you are -- A study on machine learning based semantic privacy attacks
Concerns about data privacy are omnipresent, given the increasing usage of digital applications and their underlying business model that includes selling user data. Location data is particularly sensitive, since it allows us to infer activity patterns and interests of users, e.g., by categorizing visited locations based on nearby points of interest (POI). On top of that, machine learning methods provide new powerful tools to interpret big data. In light of these considerations, we raise the following question: What is the actual risk that realistic, machine-learning-based privacy attacks can obtain meaningful semantic information from raw location data, subject to inaccuracies in the data? In response, we present a systematic analysis of two attack scenarios, namely location categorization and user profiling. Experiments on the Foursquare dataset and tracking data demonstrate the potential for abuse of high-quality spatial information, leading to a significant privacy loss even with location inaccuracy of up to 200 m. With location obfuscation of more than 1 km, spatial information hardly adds any value, but a high privacy risk from temporal information alone remains. The availability of public context data such as POIs plays a key role in inference based on spatial information. Our findings point out the risks of ever-growing databases of tracking data and spatial context data, which policymakers should consider for privacy regulations, and which could guide individuals in their personal location protection measures.
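To illustrate the location-categorization attack and the effect of obfuscation, the sketch below labels each (noisy) visit with the category of the nearest POI and measures how accuracy degrades as Gaussian obfuscation noise grows. Coordinates are treated as planar metres for simplicity, and all names and parameters are assumptions rather than the paper's setup.

```python
# Illustrative location-categorization attack: label each obfuscated
# visit with the category of the nearest POI and measure accuracy as
# obfuscation noise grows. Planar coordinates in metres for simplicity.
import numpy as np


def categorize(visits, poi_xy, poi_cat):
    """Assign each visit the category of its nearest POI.
    visits: (n, 2), poi_xy: (m, 2), poi_cat: (m,) category labels."""
    d = np.linalg.norm(visits[:, None, :] - poi_xy[None, :, :], axis=2)
    return poi_cat[d.argmin(axis=1)]


def accuracy_vs_noise(visits, true_cat, poi_xy, poi_cat, radii, rng=None):
    """radii: obfuscation noise scales in metres; returns accuracies."""
    rng = rng or np.random.default_rng(0)
    out = []
    for r in radii:
        noisy = visits + rng.normal(0.0, r, size=visits.shape)
        out.append(float(np.mean(categorize(noisy, poi_xy, poi_cat) == true_cat)))
    return out
```

Sweeping `radii` from tens of metres up past 1 km reproduces, in miniature, the qualitative finding above: nearest-POI inference survives moderate noise but collapses under heavy obfuscation.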
- …