Trajectory and Policy Aware Sender Anonymity in Location Based Services
We consider Location-based Service (LBS) settings, where a LBS provider logs
the requests sent by mobile device users over a period of time and later wants
to publish/share these logs. Log sharing can be extremely valuable for
advertising, data mining research and network management, but it poses a
serious threat to the privacy of LBS users. Sender anonymity solutions prevent
a malicious attacker from inferring the interests of LBS users by associating
them with their service requests after gaining access to the anonymized logs.
With the fast-increasing adoption of smartphones and the concern that historic
user trajectories are becoming more accessible, it becomes necessary for any
sender anonymity solution to protect against attackers that are
trajectory-aware (i.e. have access to historic user trajectories) as well as
policy-aware (i.e. they know the log anonymization policy). We call such
attackers TP-aware.
This paper introduces a first privacy guarantee against TP-aware attackers,
called TP-aware sender k-anonymity. It turns out that there are many possible
TP-aware anonymizations for the same LBS log, each with a different utility to
the consumer of the anonymized log. The problem of finding the optimal TP-aware
anonymization is investigated. We show that trajectory-awareness renders the
problem computationally harder than the trajectory-unaware variants found in
the literature (NP-complete in the size of the log, versus PTIME). We describe
a PTIME l-approximation algorithm for trajectories of length l and empirically
show that it scales to large LBS logs (up to 2 million users).
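As a rough illustration of the baseline notion that the abstract builds on, the sketch below checks plain sender k-anonymity for a log of cloaked requests: every cloaked region must contain at least k candidate senders. It does not model the TP-aware guarantee described above. The `Region` type, the function names, and the representation of user locations as a dictionary are all assumptions made for this example, not taken from the paper.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Region(NamedTuple):
    """Axis-aligned rectangle used as a cloaked location (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def users_in_region(region: Region,
                    locations: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return the users whose known location falls inside the region."""
    return {
        user
        for user, (x, y) in locations.items()
        if region.x_min <= x <= region.x_max
        and region.y_min <= y <= region.y_max
    }


def is_sender_k_anonymous(cloaked_requests: Iterable[Region],
                          locations: Dict[str, Tuple[float, float]],
                          k: int) -> bool:
    """True if every cloaked request has at least k candidate senders."""
    return all(
        len(users_in_region(region, locations)) >= k
        for region in cloaked_requests
    )


# Toy data: three users, one cloaked request covering two of them.
locations = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0)}
log = [Region(0.0, 0.0, 2.0, 2.0)]
```

With this toy data, `is_sender_k_anonymous(log, locations, 2)` holds while the same call with `k=3` does not, because only users `a` and `b` lie inside the cloaked region.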
Generating realistic scaled complex networks
Research on generative models is a central project in the emerging field of
network science, and it studies how statistical patterns found in real networks
could be generated by formal rules. Output from these generative models is then
the basis for designing and evaluating computational methods on networks, and
for verification and simulation studies. During the last two decades, a variety
of models has been proposed with an ultimate goal of achieving comprehensive
realism for the generated networks. In this study, we (a) introduce a new
generator, termed ReCoN; (b) explore how ReCoN and some existing models can be
fitted to an original network to produce a structurally similar replica, (c)
use ReCoN to produce networks much larger than the original exemplar, and
finally (d) discuss open problems and promising research directions. In a
comparative experimental study, we find that ReCoN is often superior to many
other state-of-the-art network generation methods. We argue that ReCoN is a
scalable and effective tool for modeling a given network while preserving
important properties at both micro- and macroscopic scales, and for scaling the
exemplar data by orders of magnitude in size.
Comment: 26 pages, 13 figures, extended version, a preliminary version of the
paper was presented at the 5th International Workshop on Complex Networks and
their Application
Towards trajectory anonymization: a generalization-based approach
Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
A data recipient centered de-identification method to retain statistical attributes
Privacy has always been a great concern of patients and medical service providers. As a result of the recent advances in information technology and the government’s push for the use of Electronic Health Record (EHR) systems, a large amount of medical data is collected and stored electronically. This data needs to be made available for analysis but at the same time patient privacy has to be protected through de-identification. Although biomedical researchers often describe their research plans when they request anonymized data, most existing anonymization methods do not use this information when de-identifying the data. As a result, the anonymized data may not be useful for the planned research project. This paper proposes a data recipient centered approach to tailor the de-identification method based on input from the recipient of the data. We demonstrate our approach through an anonymization project for biomedical researchers with specific goals to improve the utility of the anonymized data for statistical models used for their research project. The selected algorithm improves a privacy protection method called Condensation by Aggarwal et al. Our methods were tested and validated on real cancer surveillance data provided by the Kentucky Cancer Registry.
You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information
Metadata are associated with most of the information we produce in our daily
interactions and communication in the digital world. Yet, surprisingly,
metadata are often still categorized as non-sensitive. Indeed, in the past,
researchers and practitioners have mainly focused on the problem of the
identification of a user from the content of a message.
In this paper, we use Twitter as a case study to quantify the uniqueness of
the association between metadata and user identity and to understand the
effectiveness of potential obfuscation strategies. More specifically, we
analyze atomic fields in the metadata and systematically combine them in an
effort to classify new tweets as belonging to an account using different
machine learning algorithms of increasing complexity. We demonstrate that
through the application of a supervised learning algorithm, we are able to
identify any user in a group of 10,000 with approximately 96.7% accuracy.
Moreover, if we broaden the scope of our search and consider the 10 most likely
candidates we increase the accuracy of the model to 99.22%. We also found that
data obfuscation is hard and ineffective for this type of data: even after
perturbing 60% of the training data, it is still possible to classify users
with an accuracy higher than 95%. These results have strong implications in
terms of the design of metadata obfuscation strategies, for example for data
set release, not only for Twitter, but, more generally, for most social media
platforms.
Comment: 11 pages, 13 figures. Published in the Proceedings of the 12th
International AAAI Conference on Web and Social Media (ICWSM 2018). June
2018. Stanford, CA, US
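The two accuracy figures in the abstract above differ in how a prediction is counted as correct: the first requires the single most likely account to be the true one, the second only requires the true account to be among the 10 most likely candidates. A minimal sketch of that top-k accuracy measure follows. The function name and the input format (one dictionary of per-account scores per tweet) are assumptions for illustration; the paper's classifiers and data are not reproduced here.

```python
from typing import Dict, List


def top_k_accuracy(scores: List[Dict[str, float]],
                   true_accounts: List[str],
                   k: int) -> float:
    """Fraction of items whose true account is among the k highest-scored.

    scores[i] maps each candidate account to the classifier's score for
    item i (hypothetical format); true_accounts[i] is the real author.
    """
    if len(scores) != len(true_accounts):
        raise ValueError("scores and true_accounts must have equal length")
    hits = 0
    for row, truth in zip(scores, true_accounts):
        # Rank candidate accounts from most to least likely, keep the top k.
        ranked = sorted(row, key=row.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_accounts)


# Toy example with three candidate accounts and two tweets.
example_scores = [
    {"u1": 0.7, "u2": 0.2, "u3": 0.1},
    {"u1": 0.5, "u2": 0.1, "u3": 0.4},
]
example_truth = ["u1", "u3"]
```

On the toy data, `top_k_accuracy(example_scores, example_truth, 1)` counts only the first tweet as correct, while `k=2` counts both, which mirrors why the top-10 figure is higher than the top-1 figure.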
Location cloaking for location privacy protection and location safety protection
Many applications today rely on location information, yet disclosing such information can present heightened privacy and safety risks. A person's whereabouts, for example, may reveal sensitive private information such as health condition and lifestyle. Location information also has the potential to allow an adversary to physically locate and destroy a subject, which is a particular concern in digital battlefields.
This research investigates two problems. The first one is location privacy protection in location-based services. Our goal is to provide a desired level of guarantee that the location data collected by the service providers cannot be correlated with restricted spaces such as home and office to derive who's where at what time. We propose 1) leveraging historical location samples for location depersonalization and 2) allowing a user to express her location privacy requirement by identifying a spatial region. With these two ideas in place, we develop a suite of techniques for location-privacy aware uses of location-based services, which can be either sporadic or continuous. An experimental system has been implemented with these techniques. The second problem investigated in this research is location safety protection in ad hoc networks. Unlike location privacy intrusion, the adversary here is not interested in finding the individual identities of the nodes in a spatial region, but simply wants to locate and destroy them. We define the safety level of a spatial region as the inverse of its node density and develop a suite of techniques for location safety-aware cloaking and routing. These schemes allow nodes to disclose their location as accurately as possible, while preventing such information from being used to identify any region with a safety level lower than a required threshold. The performance of the proposed techniques is evaluated through analysis and simulation.
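The safety-level definition in the abstract above can be written out directly: node density is the number of nodes divided by the region's area, so its inverse is the area per node. The sketch below computes that value and compares it with a required threshold. The function names, the treatment of an empty region as infinitely safe, and the example numbers are assumptions for illustration, not details from the work itself.

```python
import math


def safety_level(area: float, node_count: int) -> float:
    """Inverse of node density: region area divided by number of nodes.

    Assumption: a region with no nodes is treated as infinitely safe.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if node_count == 0:
        return math.inf
    return area / node_count


def meets_safety_threshold(area: float, node_count: int,
                           threshold: float) -> bool:
    """True if the region's safety level is at least the required threshold."""
    return safety_level(area, node_count) >= threshold
```

For example, a region of area 100 holding 4 nodes has a safety level of 25, so it meets a threshold of 20 but not a threshold of 30.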