Quantification of De-anonymization Risks in Social Networks
The risks of publishing privacy-sensitive data have received considerable attention recently. Several de-anonymization attacks have been proposed that re-identify individuals even when data anonymization techniques have been applied. However, there has been no theoretical quantification relating the data utility preserved by anonymization techniques to the data's vulnerability to de-anonymization attacks.
In this paper, we theoretically analyze de-anonymization attacks and derive conditions on the utility of the anonymized data (denoted anonymized utility) under which de-anonymization can succeed. To the best of our knowledge,
this is the first work on quantifying the relationships between anonymized
utility and de-anonymization capability. Unlike previous work, our
quantification analysis requires no assumptions about the graph model, thus
providing a general theoretical guide for developing practical
de-anonymization/anonymization techniques.
Furthermore, we evaluate state-of-the-art de-anonymization attacks on a
real-world Facebook dataset to show the limitations of previous work. By
comparing these experimental results and the theoretically achievable
de-anonymization capability derived in our analysis, we further demonstrate the
ineffectiveness of previous de-anonymization attacks and the potential of more
powerful de-anonymization attacks in the future.
Comment: Published in the International Conference on Information Systems Security and Privacy, 201
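As a concrete illustration of the structure-based attacks analyzed above, the following minimal Python sketch implements a generic seed-and-propagate matcher between an anonymized graph and an auxiliary graph. The function, its threshold rule, and the use of networkx are illustrative assumptions, not the authors' algorithm.

import networkx as nx

def propagate_matching(g_anon, g_aux, seeds, threshold=2):
    """Grow a mapping {anonymized node -> auxiliary node} from known seed
    pairs; a candidate pair is accepted once at least `threshold` of its
    neighbors are already mapped to each other."""
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        for u in g_anon.nodes():
            if u in mapping:
                continue
            # Score auxiliary candidates by how many already-mapped
            # neighbors they share with u.
            scores = {}
            for nbr in g_anon.neighbors(u):
                if nbr in mapping:
                    for v in g_aux.neighbors(mapping[nbr]):
                        if v not in mapping.values():
                            scores[v] = scores.get(v, 0) + 1
            if scores:
                best, score = max(scores.items(), key=lambda kv: kv[1])
                if score >= threshold:
                    mapping[u] = best
                    changed = True
    return mapping

# Toy usage: match a small graph against itself, starting from two seeds.
g = nx.krackhardt_kite_graph()
print(propagate_matching(g, g, {0: 0, 1: 1}))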
Privacy-enhancing Aggregation of Internet of Things Data via Sensors Grouping
Big data collection practices using Internet of Things (IoT) pervasive technologies are often privacy-intrusive and result in surveillance, profiling, and discriminatory actions against citizens, which in turn undermine citizens' participation in the development of sustainable smart cities.
Nevertheless, real-time data analytics and aggregate information from IoT
devices open up tremendous opportunities for managing smart city
infrastructures. The privacy-enhancing aggregation of distributed sensor data,
such as residential energy consumption or traffic information, is the research
focus of this paper. Citizens have the option to choose their privacy level by reducing the quality of the shared data, at the cost of lower accuracy in data analytics services. A baseline scenario is considered in which IoT sensor data
are shared directly with an untrustworthy central aggregator. A grouping
mechanism is introduced that improves privacy by sharing data that is first aggregated at the group level, as opposed to sharing it directly with the central aggregator. Group-level aggregation obfuscates the sensor data of individuals, in a fashion similar to differential privacy and homomorphic encryption schemes, so that inferring privacy-sensitive information from single sensors becomes computationally harder than in the baseline scenario. The proposed system
is evaluated using real-world data from two smart city pilot projects. Privacy
under grouping increases while the accuracy of the baseline scenario is preserved. The intra-group influence of one member's privacy choice on the other members is measured, and fairness in privacy is found to be maximized between group members with similar privacy choices. Several grouping strategies are compared; grouping by proximity of privacy choices provides the highest privacy gains. The implications of this strategy for the design of incentive mechanisms are discussed.
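To make the grouping mechanism concrete, here is a minimal Python sketch of the idea under simplifying assumptions: readings are numeric, the citizen-chosen privacy level coarsens each shared value, and only per-group sums reach the central aggregator. Function and variable names are illustrative, not the paper's implementation.

def reduce_quality(reading, privacy_level, step=0.5):
    """The citizen-chosen privacy level coarsens the shared reading:
    higher levels quantize more aggressively."""
    q = step * (1 + privacy_level)
    return round(reading / q) * q

def group_aggregate(groups, privacy_levels):
    """Share only per-group sums with the central aggregator, so it
    never sees an individual sensor's (coarsened) value."""
    return [
        sum(reduce_quality(r, privacy_levels[s]) for s, r in group)
        for group in groups
    ]

# Baseline: every coarsened reading goes straight to the aggregator.
# Grouped: the aggregator learns only the per-group sums, while the
# city-wide total (their sum) stays close to the true total.
levels = {"s1": 0, "s2": 2, "s3": 2, "s4": 1}
groups = [[("s1", 1.9), ("s2", 3.2)], [("s3", 0.7), ("s4", 2.4)]]
print(group_aggregate(groups, levels))  # e.g., [5.0, 2.0]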
Towards matching user mobility traces in large-scale datasets
The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and with different approaches, with a focus on preserving privacy or matching records from different data sources. With an increasing number of service providers now routinely collecting location traces of their users on unprecedented scales, there is pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on the reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e., between two datasets that each contain several million people's mobility traces, one from a mobile network operator and one from transportation smart card usage. We extract the relevant statistical properties that influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve only a 16.8% success rate after a one-week observation of their mobility traces, but over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that with higher-frequency data collection becoming more common, we can expect much higher success rates in even shorter intervals.
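The co-occurrence-based matching described above can be sketched in a few lines of Python. Records are modeled as (user, time_bin, cell_id) triples; the scoring follows the general idea of counting shared spatiotemporal events, not the paper's exact estimator.

from collections import defaultdict

def cooccurrence_scores(records_a, records_b):
    """For each user in dataset A, count co-occurring events with every
    user in dataset B and return the best candidate match with its score."""
    index_b = defaultdict(set)  # (time_bin, cell_id) -> users in B
    for user, t, cell in records_b:
        index_b[(t, cell)].add(user)

    scores = defaultdict(lambda: defaultdict(int))
    for user_a, t, cell in records_a:
        for user_b in index_b[(t, cell)]:
            scores[user_a][user_b] += 1

    return {
        a: max(cands.items(), key=lambda kv: kv[1])
        for a, cands in scores.items()
    }

# Toy example: user "cdr_7" in the mobile-network data co-occurs three
# times with "card_42" in the smart-card data, so it is the top match.
a = [("cdr_7", 1, "X"), ("cdr_7", 2, "Y"), ("cdr_7", 3, "Z")]
b = [("card_42", 1, "X"), ("card_42", 2, "Y"), ("card_42", 3, "Z"),
     ("card_9", 1, "X")]
print(cooccurrence_scores(a, b))  # {'cdr_7': ('card_42', 3)}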
Sanitization of Transportation Data: Policy Implications and Gaps
UC-ITS-2020-04
Data about mobility provides information to improve city planning, identify traffic patterns, detect traffic jams, and route vehicles around them. This data often contains proprietary and personal information that companies and individuals do not wish others to know, for competitive and personal reasons. This sets up a paradox: the data needs to be analyzed, but it cannot be without revealing information that must be kept secret. A solution is to sanitize the data, i.e., remove or suppress the sensitive information. The goal of sanitization is to protect sensitive information while enabling analyses of the data that will produce the same results as analyses of the unsanitized data. However, protecting information requires that sanitized data cannot be linked to data from other sources in a manner that leads to desanitization. This project reviews typical strategies used to sanitize datasets, the research on how some of these strategies are unsuccessful, and the questions that must be addressed to better understand the risks of desanitization.
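As a hedged illustration of the kinds of strategies the report surveys, the Python sketch below applies two common ones to a trip record: suppression of direct identifiers and generalization of quasi-identifiers. The field names and granularities are illustrative assumptions, not the report's prescription.

def sanitize_trip(trip, grid=0.01, time_bucket=3600):
    """Suppress identifying fields; coarsen coordinates to a ~1 km grid
    and timestamps to the hour, trading precision for linkage resistance."""
    def snap(x):
        return round(x / grid) * grid
    return {
        "origin": (snap(trip["origin"][0]), snap(trip["origin"][1])),
        "dest": (snap(trip["dest"][0]), snap(trip["dest"][1])),
        "depart": (trip["depart"] // time_bucket) * time_bucket,
        # "vehicle_id", "plate", "driver", etc. are suppressed entirely.
    }

trip = {"origin": (37.8716, -122.2727), "dest": (37.8044, -122.2712),
        "depart": 1700000000, "vehicle_id": "CA-1234"}
print(sanitize_trip(trip))

Even a record sanitized this way can remain linkable to external datasets containing overlapping trips, which is precisely the desanitization risk the report highlights.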
An Empirical Study on Android for Saving Non-shared Data on Public Storage
With millions of apps available for download from official or third-party markets, Android has become one of the most popular mobile platforms today. These apps help people in all kinds of ways and thus have access to a great deal of user data, which in general falls into three categories: sensitive data, data to be shared with other apps, and non-sensitive data not to be shared with others. For the first and second types of data, Android provides sound storage models: an app's private sensitive data are saved to its private folder, which can only be accessed by the app itself, and the data to be shared are saved to public storage (either the external SD card or the emulated SD card area on internal FLASH memory). But for the last type, i.e., an app's non-sensitive and non-shared data, Android's current storage model has a serious problem: it essentially encourages an app to save its non-sensitive data to shared public storage that can be accessed by other apps. At first glance, this seems harmless, as those data are non-sensitive after all, but it implicitly assumes that app developers can correctly identify all sensitive data and prevent all possible information leakage from private-but-non-sensitive data. In this paper, we demonstrate that this assumption is invalid through a thorough survey of information leaks in apps that follow Android's recommended storage model for non-sensitive data. Our study shows that highly sensitive information belonging to billions of users can easily be harvested by exploiting the problematic storage model described above. Although our empirical study is based on a limited set of apps, the identified problems are not isolated or accidental bugs of the apps investigated. On the contrary, the problem is rooted in the vulnerable storage model recommended by Android. To mitigate the threat, we also propose a defense framework.
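To make the threat concrete, the Python sketch below shows roughly what any party able to read shared public storage (for example, an app holding the READ_EXTERNAL_STORAGE permission, or an analyst with a dump of the shared partition) could do: walk the storage tree and flag other apps' files containing sensitive-looking strings. The paths and patterns are illustrative assumptions, not the paper's measurement code.

import os
import re

PATTERNS = {
    "email": re.compile(rb"[\w.+-]+@[\w-]+\.[\w.]+"),
    "token": re.compile(rb"(?:access_token|auth)=[A-Za-z0-9._-]{16,}"),
}

def scan_public_storage(root="/sdcard"):
    """Walk the shared storage tree and report files that appear to
    contain personal data, despite being stored as 'non-sensitive'."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read(1 << 20)  # first 1 MiB suffices to triage
            except OSError:
                continue  # unreadable file; skip
            for label, pat in PATTERNS.items():
                if pat.search(data):
                    hits.append((path, label))
    return hits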
Using Metrics Suites to Improve the Measurement of Privacy in Graphs
Social graphs are widely used in research (e.g., epidemiology) and business (e.g., recommender systems). However, sharing these graphs poses privacy risks because they contain sensitive information about individuals. Graph anonymization techniques aim to protect individual users in a graph, while graph de-anonymization aims to re-identify users. The effectiveness of anonymization and de-anonymization algorithms is usually evaluated with privacy metrics. However, it is unclear how strong existing privacy metrics are when they are applied to graph privacy. In this paper, we study 26 privacy metrics for graph anonymization and de-anonymization and evaluate their strength in terms of three criteria: monotonicity indicates whether a metric reports lower privacy for stronger adversaries; for within-scenario comparisons, evenness indicates whether metric values are spread evenly; and for between-scenario comparisons, shared value range indicates whether metrics use a consistent value range across scenarios. Our extensive experiments indicate that no single metric fulfills all three criteria perfectly. We therefore use methods from multi-criteria decision analysis to aggregate multiple metrics into a metrics suite, and we show that these metrics suites improve monotonicity compared with the best individual metric. This important result enables more monotonic, and thus more accurate, evaluations of new graph anonymization and de-anonymization algorithms.
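As a rough illustration of aggregating metrics into a suite, the Python sketch below uses a weighted sum of min-max-normalized metric values, one simple multi-criteria decision analysis scheme; the paper's exact aggregation may differ, and the metrics here are assumed to be oriented so that higher values mean more privacy.

def normalize(values):
    """Min-max normalize a list of metric values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.5 if hi == lo else (v - lo) / (hi - lo) for v in values]

def metrics_suite(metric_values, weights):
    """metric_values: {metric_name: [value per scenario]};
    weights: {metric_name: weight}. Returns one suite score per scenario,
    so scenarios (e.g., adversary strengths) compare on a single axis."""
    names = list(metric_values)
    normed = {m: normalize(metric_values[m]) for m in names}
    n = len(next(iter(metric_values.values())))
    total_w = sum(weights[m] for m in names)
    return [
        sum(weights[m] * normed[m][i] for m in names) / total_w
        for i in range(n)
    ]

# Three adversary strengths (weak -> strong), two privacy metrics:
suite = metrics_suite(
    {"uncertainty": [4.0, 2.5, 0.5],   # adversary's entropy
     "error": [0.8, 0.5, 0.1]},        # adversary's error rate
    {"uncertainty": 1.0, "error": 1.0},
)
print(suite)  # decreases with adversary strength, i.e., monotonic here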