1,449 research outputs found
Location Anonymization With Considering Errors and Existence Probability
Mobile devices that can sense their location using GPS or Wi-Fi have become extremely popular. However, many users hesitate to provide their accurate location information to unreliable third parties if it means that their identities or sensitive attribute values will be disclosed by doing so. Many approaches for anonymization, such as k-anonymity, have been proposed to tackle this issue. Existing studies for k-anonymity usually anonymize each user\u27s location so that the anonymized area contains k or more users. Existing studies, however, do not consider location errors and the probability that each user actually exists at the anonymized area. As a result, a specific user might be identified by untrusted third parties. We propose novel privacy and utility metrics that can treat the location and an efficient algorithm to anonymize the information associated with users\u27 locations. This is the first work that anonymizes location while considering location errors and the probability that each user is actually present at the anonymized area. By means of simulations, we have proven that our proposed method can reduce the risk of the user\u27s attributes being identified while maintaining the utility of the anonymized data
Assessing Data Usefulness for Failure Analysis in Anonymized System Logs
System logs are a valuable source of information for the analysis and
understanding of systems behavior for the purpose of improving their
performance. Such logs contain various types of information, including
sensitive information. Information deemed sensitive can either directly be
extracted from system log entries by correlation of several log entries, or can
be inferred from the combination of the (non-sensitive) information contained
within system logs with other logs and/or additional datasets. The analysis of
system logs containing sensitive information compromises data privacy.
Therefore, various anonymization techniques, such as generalization and
suppression have been employed, over the years, by data and computing centers
to protect the privacy of their users, their data, and the system as a whole.
Privacy-preserving data resulting from anonymization via generalization and
suppression may lead to significantly decreased data usefulness, thus,
hindering the intended analysis for understanding the system behavior.
Maintaining a balance between data usefulness and privacy preservation,
therefore, remains an open and important challenge. Irreversible encoding of
system logs using collision-resistant hashing algorithms, such as SHAKE-128, is
a novel approach previously introduced by the authors to mitigate data privacy
concerns. The present work describes a study of the applicability of the
encoding approach from earlier work on the system logs of a production high
performance computing system. Moreover, a metric is introduced to assess the
data usefulness of the anonymized system logs to detect and identify the
failures encountered in the system.Comment: 11 pages, 3 figures, submitted to 17th IEEE International Symposium
on Parallel and Distributed Computin
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniquesa) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy.Comment: VLDB201
Publishing Microdata with a Robust Privacy Guarantee
Today, the publication of microdata poses a privacy threat. Vast research has
striven to define the privacy condition that microdata should satisfy before it
is released, and devise algorithms to anonymize the data so as to achieve this
condition. Yet, no method proposed to date explicitly bounds the percentage of
information an adversary gains after seeing the published data for each
sensitive value therein. This paper introduces beta-likeness, an appropriately
robust privacy model for microdata anonymization, along with two anonymization
schemes designed therefor, the one based on generalization, and the other based
on perturbation. Our model postulates that an adversary's confidence on the
likelihood of a certain sensitive-attribute (SA) value should not increase, in
relative difference terms, by more than a predefined threshold. Our techniques
aim to satisfy a given beta threshold with little information loss. We
experimentally demonstrate that (i) our model provides an effective privacy
guarantee in a way that predecessor models cannot, (ii) our generalization
scheme is more effective and efficient in its task than methods adapting
algorithms for the k-anonymity model, and (iii) our perturbation method
outperforms a baseline approach. Moreover, we discuss in detail the resistance
of our model and methods to attacks proposed in previous research.Comment: VLDB201
Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing
Advancements in Industry 4.0 brought tremendous improvements in the healthcare sector, such as better quality of treatment, enhanced communication, remote monitoring, and reduced cost. Sharing healthcare data with healthcare providers is crucial for harnessing the benefits of such improvements. In general, healthcare data holds sensitive information about individuals. Hence, sharing such data is challenging because of various security and privacy issues. According to privacy regulations and ethical requirements, it is essential to preserve the privacy of patients before sharing data for medical research. State-of-the-art literature on privacy preserving studies either uses cryptographic approaches to protect the privacy or uses anonymizing techniques regardless of the type of attributes, this results in poor protection and data utility. In this paper, we propose an attribute-focused privacy preserving data publishing scheme. The proposed scheme is two-fold, comprising a fixed-interval approach to protect numerical attributes and an improved l -diverse slicing approach to protect the categorical and sensitive attributes. In the fixed-interval approach, the original values of the healthcare data are replaced with an equivalent computed value. The improved l -diverse slicing approach partitions the data both horizontally and vertically to avoid privacy leaks. Extensive experiments with real-world datasets are conducted to evaluate the performance of the proposed scheme. The classification models built on anonymized dataset yields approximately 13% better accuracy than benchmarked algorithms. Experimental analyses show that the average information loss which is measured by normalized certainty penalty (NCP) is reduced by 12% compared to similar approaches. The attribute focused scheme not only provides data utility but also prevents the data from membership disclosures, attribute disclosures, and identity disclosures
A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period of 2007-2022 were collected from the Scopus Database then
pre-processed. Following this, the VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure
- …