341 research outputs found
Privacy-preserving publishing of hierarchical data
Many applications today rely on storage and management of semi-structured information, for example, XML databases and document-oriented databases. These data often have to be shared with untrusted third parties, which makes individuals’ privacy a fundamental problem. In this article, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We extend two standards for privacy protection in tabular data (k-anonymity and ℓ-diversity) and apply them to hierarchical data. We present utility-aware algorithms that enforce these definitions of privacy using generalizations and suppressions of data values. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees
Utility-Based Privacy Preserving Data Publishing
Advances in data collection techniques and need for automation triggered in proliferation of a huge amount of data. This exponential increase in the collection of
personal information has for some time represented a serious threat to privacy. With the advancement of technologies for data storage, data mining, machine learning, social networking and cloud computing, the problem is further fueled. Privacy is a fundamental
right of every human being and needs to be preserved. As a counterbalance to the socio-technical transformations, most nations have both general policies on preserving
privacy and specic legislation to control access to and use of data. Privacy preserving data publishing is the ability to control the dissemination and use of one's personal
information.
Mere publishing (or sharing) of original data in raw form results in identity disclosure with linkage attacks. To overcome linkage attacks, the techniques of statistical disclosure control are employed. One such approach is k-anonymity that reduce data across a set of key variables to a set of classes. In a k-anonymized dataset each record is indistinguishable from at least k-1 others, meaning that an attacker cannot link the data records to population units with certainty thus reducing the probability of disclosure. Algorithms that have been proposed to enforce k-anonymity are Samarati's algorithm and Sweeney's
Datafly algorithm. Both of these algorithms adhere to full domain generalization with global recording. These methods have a tradeo between utility, computing time and
information loss. A good privacy preserving technique should ensure a balance of utility and privacy, giving good performance and level of uncertainty. In this thesis, we propose an improved greedy heuristic that maintains a balance between utility, privacy, computing
time and information loss.
Given a dataset and k, constructing the dataset to k-anonymous dataset can be done by the above-mentioned schemes. One of the challenges is to nd the best value of k,
when the dataset is provided. In this thesis, a scheme has been proposed to achieve the best value of k for a given dataset.
The k-anonymity scheme suers from homogeneity attack. As a result, the l-diverse scheme was developed. It states that the diversity of domain values of the dataset in an equivalence class should be l. The l-diversity scheme suers from background knowledge attack. To address this problem, t-closeness scheme was proposed. The t-closeness principle states that the distribution of records in an equivalence class and the distribution of records in the table should not exceed more than t. The drawback with this scheme is that, the distance metric deployed in constructing a table, satisfying t-closeness, does not follow the distance characteristics. In this thesis, we have deployed
an alternative distance metric namely, Hellinger metric, for constructing a t-closeness table. The t-closeness scheme with this alternative distance metric performed better with respect to the discernability metric and computing time.
The k-anonymity, l-diversity and t-closeness schemes can be used to anonymize the dataset before publishing (releasing or sharing). This is generally in a static environment.
There are also data that need to be published in a dynamic environment. One such example is a social network. Anonymizing social networks poses great challenges.
Solutions suggested till date do not consider utility of the data while anonymizing. In this thesis, we propose a novel scheme to anonymize the users depending on their importance and take utility into consideration. Importance of a node was decided by the centrality
and prestige measures. Hence, the utility and privacy of the users are balanced
Local and global recoding methods for anonymizing set-valued data
In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive, but we consider them both as potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, the k m-anonymity, to limit the effects of the data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution, however, at a high cost that makes it inapplicable for large, realistic problems. Then, we propose a greedy heuristic, which performs generalizations in an Apriori, level-wise fashion. The heuristic scales much better and in most of the cases finds a solution close to the optimal. Finally, we investigate the application of techniques that partition the database and perform anonymization locally, aiming at the reduction of the memory consumption and further scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice. © 2010 Springer-Verlag.postprin
Privacy preserving publishing of hierarchical data
Many applications today rely on storage and management of semi-structured information, e.g., XML databases and document-oriented databases. This data often has to be shared with untrusted third parties, which makes individuals' privacy a fundamental problem. In this thesis, we propose anonymization techniques for privacy preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We addressed these challenges by utilizing two major privacy techniques; generalization and anatomization. Data generalization encapsulates data by mapping nearly low-level values (e.g., influenza) to higher-level concepts (e.g., respiratory system diseases). Using generalizations and suppression of data values, we revised two standards for privacy protection: kanonymity that hides individuals within groups of k members and `-diversity that bounds the probability of linking sensitive values with individuals.We then apply these standards to hierarchical data and present utility-aware algorithms that enforce the standards. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees. Data anatomization masks the link between identifying attributes and sensitive attributes. This mechanism removes the necessity for generalization and opens up the possibility for higher utility. While this is so, anatomization has not been proposed for hierarchical data where utility is a serious concern due to high dimensionality. In this thesis we show, how one can perform the non-trivial task of defining anatomization in the context of hierarchical data. Moreover, we extend the definition of classical `-diversity and introduce (p,m)-privacy that bounds the probability of being linked to more than m occurrences of any sensitive values by p. Again, in our experiments we have observed that even under stricter privacy conditions our method performs exemplary
Privacy-preserving data outsourcing in the cloud via semantic data splitting
Even though cloud computing provides many intrinsic benefits, privacy
concerns related to the lack of control over the storage and management of the
outsourced data still prevent many customers from migrating to the cloud.
Several privacy-protection mechanisms based on a prior encryption of the data
to be outsourced have been proposed. Data encryption offers robust security,
but at the cost of hampering the efficiency of the service and limiting the
functionalities that can be applied over the (encrypted) data stored on cloud
premises. Because both efficiency and functionality are crucial advantages of
cloud computing, in this paper we aim at retaining them by proposing a
privacy-protection mechanism that relies on splitting (clear) data, and on the
distributed storage offered by the increasingly popular notion of multi-clouds.
We propose a semantically-grounded data splitting mechanism that is able to
automatically detect pieces of data that may cause privacy risks and split them
on local premises, so that each chunk does not incur in those risks; then,
chunks of clear data are independently stored into the separate locations of a
multi-cloud, so that external entities cannot have access to the whole
confidential data. Because partial data are stored in clear on cloud premises,
outsourced functionalities are seamlessly and efficiently supported by just
broadcasting queries to the different cloud locations. To enforce a robust
privacy notion, our proposal relies on a privacy model that offers a priori
privacy guarantees; to ensure its feasibility, we have designed heuristic
algorithms that minimize the number of cloud storage locations we need; to show
its potential and generality, we have applied it to the least structured and
most challenging data type: plain textual documents
A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period of 2007-2022 were collected from the Scopus Database then
pre-processed. Following this, the VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure
- …