18 research outputs found
An Automated Social Graph De-anonymization Technique
We present a generic and automated approach to re-identifying nodes in
anonymized social networks which enables novel anonymization techniques to be
quickly evaluated. It uses machine learning (decision forests) to matching
pairs of nodes in disparate anonymized sub-graphs. The technique uncovers
artefacts and invariants of any black-box anonymization scheme from a small set
of examples. Despite a high degree of automation, classification succeeds with
significant true positive rates even when small false positive rates are
sought. Our evaluation uses publicly available real world datasets to study the
performance of our approach against real-world anonymization strategies, namely
the schemes used to protect datasets of The Data for Development (D4D)
Challenge. We show that the technique is effective even when only small numbers
of samples are used for training. Further, since it detects weaknesses in the
black-box anonymization scheme it can re-identify nodes in one social network
when trained on another.Comment: 12 page
Quantification of De-anonymization Risks in Social Networks
The risks of publishing privacy-sensitive data have received considerable
attention recently. Several de-anonymization attacks have been proposed to
re-identify individuals even if data anonymization techniques were applied.
However, there is no theoretical quantification for relating the data utility
that is preserved by the anonymization techniques and the data vulnerability
against de-anonymization attacks.
In this paper, we theoretically analyze the de-anonymization attacks and
provide conditions on the utility of the anonymized data (denoted by anonymized
utility) to achieve successful de-anonymization. To the best of our knowledge,
this is the first work on quantifying the relationships between anonymized
utility and de-anonymization capability. Unlike previous work, our
quantification analysis requires no assumptions about the graph model, thus
providing a general theoretical guide for developing practical
de-anonymization/anonymization techniques.
Furthermore, we evaluate state-of-the-art de-anonymization attacks on a
real-world Facebook dataset to show the limitations of previous work. By
comparing these experimental results and the theoretically achievable
de-anonymization capability derived in our analysis, we further demonstrate the
ineffectiveness of previous de-anonymization attacks and the potential of more
powerful de-anonymization attacks in the future.Comment: Published in International Conference on Information Systems Security
and Privacy, 201
Using Metrics Suites to Improve the Measurement of Privacy in Graphs
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Social graphs are widely used in research (e.g., epidemiology) and business (e.g., recommender systems). However, sharing these graphs poses privacy risks because they contain sensitive information about individuals. Graph anonymization techniques aim to protect individual users in a graph, while graph de-anonymization aims to re-identify users. The effectiveness of anonymization and de-anonymization algorithms is usually evaluated with privacy metrics. However, it is unclear how strong existing privacy metrics are when they are used in graph privacy. In this paper, we study 26 privacy metrics for graph anonymization and de-anonymization and evaluate their strength in terms of three criteria: monotonicity indicates whether the metric indicates lower privacy for stronger adversaries; for within-scenario comparisons, evenness indicates whether metric values are spread evenly; and for between-scenario comparisons, shared value range indicates whether metrics use a consistent value range across scenarios. Our extensive experiments indicate that no single metric fulfills all three criteria perfectly. We therefore use methods from multi-criteria decision analysis to aggregate multiple metrics in a metrics suite, and we show that these metrics suites improve monotonicity compared to the best individual metric. This important result enables more monotonic, and thus more accurate, evaluations of new graph anonymization and de-anonymization algorithms
Gépi tanulási módszerek alkalmazása deanonimizálásra
Számos olyan adathalmaz áll a rendelkezésünkre, amelyek jelentős üzleti és kutatási potenciált hordoznak. Azonban – gondoljunk például a hordozható eszközök által gyűjtött egészségügyi adatokra – a hasznosítás mellett kiemelkedő kockázati tényező a privátszféra sérülése, amelynek elkerülésére többek között anonimizálási algoritmusokat alkalmaznak. Jelen tanulmányban az anonimizálás „visszafordítására” szakosodott algoritmusokat, az úgynevezett deanonimizációs eljárásokat, illetve azoknak egy speciális és újnak tekinthető szegmensét tekintjük át, amelyeknél gépi tanulási eljárásokat alkalmaznak a robusztusság, illetve a hatékonyság növelése érdekében. A tanulmányban a privátszféra-sértő üzleti célú támadások és a biztonsági alkalmazások hasonlóságára is rámutatunk: ugyanaz az algoritmus hogyan tud biztonsági indokkal a privátszférával szemben dolgozni, kontextustól függően
Graph matching by graph neural network
Graph matching or network alignment refers to the problem of matching two correlated graphs. This thesis presents a deep Q learning based method, which represents the matching process by a graph neural network. By breaking the symmetry, the parameterized graph neural network is able to capture a wide range of neighborhoods. Extensive experiments on various training and testing data have shown better performance, strong scalability and the ability to adapt to different domains
Quantifying Privacy Loss of Human Mobility Graph Topology
Human mobility is often represented as a mobility network, or graph, with nodes representing places of significance which an individual visits, such as their home, work, places of social amenity, etc., and edge weights corresponding to probability estimates of movements between these places. Previous research has shown that individuals can be identified by a small number of geolocated nodes in their mobility network, rendering mobility trace anonymization a hard task. In this paper we build on prior work and demonstrate that even when all location and timestamp information is removed from nodes, the graph topology of an individual mobility network itself is often uniquely identifying. Further, we observe that a mobility network is often unique, even when only a small number of the most popular nodes and edges are considered. We evaluate our approach using a large dataset of cell-tower location traces from 1 500 smartphone handsets with a mean duration of 430 days. We process the data to derive the top−N places visited by the device in the trace, and find that 93% of traces have a unique top−10 mobility network, and all traces are unique when considering top−15 mobility networks. Since mobility patterns, and therefore mobility networks for an individual, vary over time, we use graph kernel distance functions, to determine whether two mobility networks, taken at different points in time, represent the same individual. We then show that our distance metrics, while imperfect predictors, perform significantly better than a random strategy and therefore our approach represents a significant loss in privacy