Anonymization of Sensitive Quasi-Identifiers for l-diversity and t-closeness
A number of studies on privacy-preserving data mining have been proposed. Most of them assume that quasi-identifiers (QIDs) can be separated from sensitive attributes: for instance, that address, job, and age are QIDs but not sensitive attributes, and that a disease name is a sensitive attribute but not a QID. In practice, however, all of these attributes can act as both sensitive attributes and QIDs. In this paper, we refer to such attributes as sensitive QIDs, and we propose novel privacy models, namely (l1, ..., lq)-diversity and (t1, ..., tq)-closeness, together with a method that can handle sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, conducted by data holders, is simple but effective, whereas the reconstruction algorithm, conducted by data analyzers, can be tailored to each analyzer's objective. The proposed method was experimentally evaluated using real data sets.
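As a concrete illustration of the baseline model this paper extends, here is a minimal l-diversity check on an already-grouped table (records and attribute names are hypothetical; the paper's (l1, ..., lq)-diversity generalises this check per sensitive QID):

```python
from collections import defaultdict

# Hypothetical anonymised records: (QID tuple, sensitive value).
records = [
    (("40s", "engineer"), "flu"),
    (("40s", "engineer"), "diabetes"),
    (("40s", "engineer"), "asthma"),
    (("30s", "teacher"), "flu"),
    (("30s", "teacher"), "flu"),
]

def satisfies_l_diversity(records, l):
    """l-diversity: every QID group must contain at least l
    distinct sensitive values."""
    groups = defaultdict(set)
    for qid, sensitive in records:
        groups[qid].add(sensitive)
    return all(len(values) >= l for values in groups.values())

# False: the ("30s", "teacher") group holds only one sensitive value.
print(satisfies_l_diversity(records, 2))
```

When an attribute is both a QID and sensitive, the same column appears on both sides of this check, which is exactly the situation the paper's sensitive-QID models address.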
Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining
Data mining techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions. However, they pose severe threats to individuals’ privacy because they can be exploited to allow inferences to be made on sensitive data. Researchers have proposed several privacy-preserving data mining techniques to address this challenge. One unique method is to extend anonymisation privacy models in data mining processes to enhance both privacy and utility. Several published works in this area have utilised clustering techniques to enforce anonymisation models on private data; these work by grouping the data into clusters using a quality measure and then generalising the data in each group separately to achieve an anonymisation threshold. Although these approaches are highly efficient and practical, guaranteeing an adequate balance between data utility and privacy protection remains a challenge. In addition, existing approaches do not work well with high-dimensional data, since it is difficult to develop good groupings without incurring excessive information loss. Our work aims to overcome these challenges by proposing a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. The main contribution of this paper is to show that dimensionality reduction techniques can improve the anonymisation process by incurring less information loss, thus producing a more desirable balance between privacy and utility properties.
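The cluster-then-generalise scheme described above can be sketched in miniature. This toy version partitions a single numeric attribute into groups of at least k records and publishes each group's range; the attribute, data, and threshold are illustrative, not the paper's algorithm:

```python
def anonymise_by_clustering(values, k):
    """Group sorted 1-D values into clusters of size >= k and generalise
    each cluster to its [min, max] range (a minimal k-anonymity sketch)."""
    values = sorted(values)
    clusters = [values[i:i + k] for i in range(0, len(values), k)]
    # Merge a trailing undersized cluster into its predecessor.
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())
    return [(min(c), max(c)) for c in clusters]

ages = [21, 23, 25, 34, 36, 52, 57]
print(anonymise_by_clustering(ages, 3))  # [(21, 25), (34, 57)]
```

The information loss here is the width of each published range; in many dimensions these ranges blow up quickly, which is the high-dimensionality problem the abstract points at.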
Towards Training Graph Neural Networks with Node-Level Differential Privacy
Graph Neural Networks (GNNs) have achieved great success in mining
graph-structured data. Despite the superior performance of GNNs in learning
graph representations, serious privacy concerns have been raised for the
trained models which could expose the sensitive information of graphs. We
conduct the first formal study of training GNN models to ensure utility while
satisfying rigorous node-level differential privacy, covering the private
information of both node features and edges. We adopt a training framework
utilizing personalized PageRank to decouple the message-passing process from
feature aggregation during training GNN models and propose differentially
private PageRank algorithms to protect graph topology information formally.
Furthermore, we analyze the privacy degradation caused by the sampling process
dependent on the differentially private PageRank results during model training
and propose a differentially private GNN (DPGNN) algorithm to further protect
node features and achieve rigorous node-level differential privacy. Extensive
experiments on real-world graph datasets demonstrate the effectiveness of the
proposed algorithms for providing node-level differential privacy while
preserving good model utility.
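The differential-privacy building block behind such algorithms is noise calibrated to sensitivity. Below is a minimal output-perturbation sketch using the Laplace mechanism; it is illustrative only — the paper privatises the PageRank computation itself rather than simply perturbing its output:

```python
import random

def dp_perturb_scores(scores, epsilon, sensitivity=1.0):
    """Add Laplace(sensitivity / epsilon) noise to each released score.
    A difference of two i.i.d. exponential samples with scale b is
    Laplace-distributed with scale b."""
    scale = sensitivity / epsilon
    return [s + random.expovariate(1 / scale) - random.expovariate(1 / scale)
            for s in scores]

# Smaller epsilon (stronger privacy) means larger noise on the released scores.
random.seed(0)
noisy = dp_perturb_scores([0.2, 0.8], epsilon=1.0)
```

Node-level DP is stricter than this sketch suggests: the sensitivity must account for everything one node can influence (its features and all its incident edges), which is why the paper needs dedicated analysis for the sampling step as well.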
Complementing Privacy and Utility Trade-Off with Self-Organising Maps
In recent years, data-enabled technologies have intensified the rate and scale at which organisations
collect and analyse data. Data mining techniques are applied to realise the full potential
of large-scale data analysis. These techniques are highly efficient in sifting through big data to extract
hidden knowledge and assist evidence-based decisions, offering significant benefits to their adopters.
However, this capability is constrained by important legal, ethical and reputational concerns. These
concerns arise because they can be exploited to allow inferences to be made on sensitive data, thus
posing severe threats to individuals’ privacy. Studies have shown Privacy-Preserving Data Mining
(PPDM) can adequately address this privacy risk and permit knowledge extraction in mining processes.
Several published works in this area have utilised clustering techniques to enforce anonymisation
models on private data, which work by grouping the data into clusters using a quality measure and
generalising the data in each group separately to achieve an anonymisation threshold. However, existing
approaches do not work well with high-dimensional data, since it is difficult to develop good groupings
without incurring excessive information loss. Our work aims to complement this balancing act by
optimising utility in PPDM processes. To illustrate this, we propose a hybrid approach that combines
self-organising maps with conventional privacy-based clustering algorithms. We demonstrate through
experimental evaluation that our approach produces more utility for data mining tasks and
outperforms conventional privacy-based clustering algorithms. This approach can significantly enable
large-scale analysis of data in a privacy-preserving and trustworthy manner.
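To illustrate the SOM side of such a hybrid, here is a toy one-dimensional self-organising map that maps high-dimensional rows onto a small grid of units; the unit assignments can then feed a privacy-based clustering step. Grid size, learning rate, and data are illustrative, not the paper's configuration:

```python
import math
import random

def train_som(data, grid=4, epochs=50, lr=0.5):
    """Toy 1-D self-organising map: learn `grid` prototype vectors that
    summarise high-dimensional rows on a low-dimensional grid."""
    dim = len(data[0])
    random.seed(1)                               # deterministic toy run
    weights = [[random.random() for _ in range(dim)] for _ in range(grid)]
    for t in range(epochs):
        rate = lr * (1 - t / epochs)             # decaying learning rate
        for x in data:
            # Best-matching unit: prototype nearest to the input row.
            bmu = min(range(grid), key=lambda i: sum(
                (w - v) ** 2 for w, v in zip(weights[i], x)))
            for i in range(grid):
                h = math.exp(-abs(i - bmu))      # neighbourhood falloff on the grid
                weights[i] = [w + rate * h * (v - w) for w, v in zip(weights[i], x)]
    return weights

def assign(data, weights):
    """Map each row to its best-matching unit; these low-dimensional
    assignments can then be grouped by a privacy-based clustering step."""
    return [min(range(len(weights)), key=lambda i: sum(
        (w - v) ** 2 for w, v in zip(weights[i], x))) for x in data]
```

Rows that are close in the original space land on the same or neighbouring units, so the subsequent clustering operates on far fewer effective dimensions and tends to incur less generalisation loss.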
Privacy-preserving data mashup model for trading person-specific information
© 2016 Elsevier B.V. All rights reserved. Business enterprises adopt cloud integration services to improve collaboration with their trading partners and to deliver quality data mining services. Data-as-a-Service (DaaS) mashup allows multiple enterprises to integrate their data on the demand of consumers. Business enterprises face challenges not only in protecting private data over the cloud but also in legally adhering to privacy compliance rules when trading person-specific data. They need an effective privacy-preserving business model to deal with these challenges in emerging markets. We propose a model that allows multiple enterprises to collaborate in integrating their data and derives the contribution of each data provider by evaluating the incorporated cost factors. This model serves as a guide for business decision-making, such as estimating the potential risk and finding the optimal value for publishing mashup data. Experiments on real-life data demonstrate that our approach can identify the optimal value in data mashup for different privacy models, including K-anonymity, LKC-privacy, and ε-differential privacy, with various anonymization algorithms and privacy parameters.
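The "optimal value" search described above can be sketched with a hypothetical cost model: pick the privacy parameter that maximises utility payoff minus expected risk cost. Both cost functions below are invented for illustration; the paper derives its factors from each provider's actual contribution:

```python
def optimal_publication(k_values, utility, risk_cost):
    """Pick the anonymisation parameter k maximising the net value
    utility(k) - risk_cost(k)."""
    return max(k_values, key=lambda k: utility(k) - risk_cost(k))

# Toy monotone model: analytic value decays with k, while the expected
# re-identification liability decays faster.
best = optimal_publication(
    range(2, 21),
    utility=lambda k: 100 / k,
    risk_cost=lambda k: 400 / (k * k),
)
# best == 8
```

The same search applies unchanged to other privacy models by substituting their parameter (e.g. L and K for LKC-privacy, or ε for differential privacy) and the corresponding cost curves.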
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.
Comment: 2018 IEEE International Conference on Big Data, 10 pages
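For readers new to utility mining, here is a minimal sketch of the quantity-times-profit utility measure that PPUM algorithms aim to protect: itemsets whose utility exceeds a threshold are the high-utility patterns a sanitisation step would hide. Transactions, profits, and the threshold are hypothetical:

```python
from itertools import combinations

# Hypothetical transactions ({item: quantity}) and per-unit profits.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"b": 2, "c": 1},
]
profit = {"a": 5, "b": 2, "c": 1}

def itemset_utility(itemset, transactions, profit):
    """Utility of an itemset: summed quantity-times-profit over the
    transactions that contain every item in the set."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )

# Itemsets at or above the utility threshold are what PPUM would sanitise.
threshold = 10
huis = [s for n in (1, 2) for s in combinations(sorted(profit), n)
        if itemset_utility(s, transactions, profit) >= threshold]
# huis == [("a",), ("a", "b")]
```

Unlike frequency-based pattern hiding, a low-frequency itemset can still be high-utility (one transaction with a large quantity of a profitable item), which is why PPUM needs its own sanitisation and evaluation criteria.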