
    Anonymization of Sensitive Quasi-Identifiers for l-diversity and t-closeness

    A number of studies on privacy-preserving data mining have been proposed. Most of them assume that quasi-identifiers (QIDs) can be separated from sensitive attributes: for instance, that address, job, and age are QIDs but not sensitive attributes, and that a disease name is a sensitive attribute but not a QID. In practice, however, all of these attributes can act as both sensitive attributes and QIDs. In this paper, we refer to such attributes as sensitive QIDs, and we propose novel privacy models, namely (l1, ..., lq)-diversity and (t1, ..., tq)-closeness, together with a method that can handle sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm, conducted by data holders, which is simple but effective, and a reconstruction algorithm, which each data analyzer can run according to their own objective. The proposed method was experimentally evaluated on real data sets.
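    As a toy illustration of the per-attribute diversity requirement described above (not the paper's anonymization or reconstruction algorithm), the sketch below checks whether generalised groups satisfy a distinct-values requirement l_j for each sensitive QID; all names and data are illustrative assumptions.

```python
from collections import defaultdict

# Minimal sketch (not the paper's algorithm): checking a per-attribute
# diversity requirement (l_1, ..., l_q) over generalised groups.
# Each record: (generalized_qid_tuple, {attr_name: value}), where every
# attribute is treated as a sensitive QID. Names here are illustrative.

def satisfies_l_diversity(records, l_req):
    """l_req maps each sensitive-QID name to its required l_j."""
    groups = defaultdict(list)
    for gen_qid, sensitive in records:
        groups[gen_qid].append(sensitive)
    for members in groups.values():
        for attr, l_j in l_req.items():
            distinct = {m[attr] for m in members}
            if len(distinct) < l_j:   # group fails diversity for attr
                return False
    return True

records = [
    (("30-39", "Tokyo"), {"job": "engineer", "disease": "flu"}),
    (("30-39", "Tokyo"), {"job": "teacher",  "disease": "cancer"}),
]
print(satisfies_l_diversity(records, {"job": 2, "disease": 2}))  # True
```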

    Utility Promises of Self-Organising Maps in Privacy Preserving Data Mining

    Data mining techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions. However, they pose severe threats to individuals' privacy because they can be exploited to make inferences about sensitive data. Researchers have proposed several privacy-preserving data mining techniques to address this challenge. One notable approach extends anonymisation privacy models into the data mining process to enhance both privacy and utility. Several published works in this area have used clustering techniques to enforce anonymisation models on private data: the data are grouped into clusters using a quality measure, and the data in each group are then generalised separately to achieve an anonymisation threshold. Although these approaches are efficient and practical, guaranteeing an adequate balance between data utility and privacy protection remains a challenge. In addition, existing approaches do not work well with high-dimensional data, since it is difficult to form good groupings without incurring excessive information loss. Our work aims to overcome these challenges with a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. The main contribution of this paper is to show that dimensionality reduction techniques can improve the anonymisation process by incurring less information loss, thus producing a more desirable balance between privacy and utility.
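    As a rough illustration of the hybrid idea (not the authors' implementation), the sketch below trains a small self-organising map in plain NumPy and assigns each record to its best-matching unit; the resulting low-dimensional groups could then be fed to a conventional privacy-based clustering and generalisation step. Grid size, learning rate, and neighbourhood schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 10))                 # 200 records, 10 attributes

grid_w, grid_h = 4, 4
n_units, dim = grid_w * grid_h, data.shape[1]
weights = rng.random((n_units, dim))
coords = np.array([(i // grid_h, i % grid_h) for i in range(n_units)])

for t in range(2000):                        # plain online SOM training
    x = data[rng.integers(len(data))]
    lr = 0.5 * (1.0 - t / 2000)              # decaying learning rate
    sigma = 2.0 * (1.0 - t / 2000) + 0.5     # decaying neighbourhood width
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian neighbourhood
    weights += lr * h[:, None] * (x - weights)

# each record's cluster = its best-matching unit on the 2-D map
bmus = np.argmin(((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1), axis=1)
print(np.bincount(bmus, minlength=n_units))  # group sizes to generalise next
```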

    Towards Training Graph Neural Networks with Node-Level Differential Privacy

    Graph Neural Networks (GNNs) have achieved great success in mining graph-structured data. Despite their superior performance in learning graph representations, serious privacy concerns have been raised because trained models can expose sensitive information about the underlying graphs. We conduct the first formal study of training GNN models that preserve utility while satisfying rigorous node-level differential privacy, covering the private information of both node features and edges. We adopt a training framework that uses personalized PageRank to decouple the message-passing process from feature aggregation during GNN training, and we propose differentially private PageRank algorithms to formally protect graph topology information. Furthermore, we analyze the privacy degradation caused by the sampling process, which depends on the differentially private PageRank results during model training, and propose a differentially private GNN (DPGNN) algorithm to further protect node features and achieve rigorous node-level differential privacy. Extensive experiments on real-world graph datasets demonstrate the effectiveness of the proposed algorithms in providing node-level differential privacy while preserving good model utility.
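    The paper's differentially private PageRank and DPGNN algorithms are more involved than anything shown here; the sketch below only illustrates the generic output-perturbation idea of computing a personalised PageRank vector and adding calibrated Laplace noise. The epsilon value and the (loose) sensitivity constant are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

def ppr(adj, source, alpha=0.15, iters=50):
    """Personalised PageRank by power iteration with restart."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    P = adj / deg                          # row-stochastic transition matrix
    e = np.zeros(len(adj)); e[source] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1 - alpha) * P.T @ pi
    return pi

def noisy_ppr(adj, source, eps=1.0, sensitivity=1.0):
    """Output perturbation: add Laplace noise scaled to sensitivity/eps."""
    pi = ppr(adj, source)
    noise = np.random.laplace(scale=sensitivity / eps, size=pi.shape)
    return np.clip(pi + noise, 0.0, None)  # keep scores non-negative

adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
print(noisy_ppr(adj, source=0, eps=2.0))
```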

    Complementing Privacy and Utility Trade-Off with Self-Organising Maps

    In recent years, data-enabled technologies have intensified the rate and scale at which organisations collect and analyse data. Data mining techniques are applied to realise the full potential of large-scale data analysis. These techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions, offering significant benefits to their adopters. However, this capability is constrained by important legal, ethical and reputational concerns, which arise because the techniques can be exploited to make inferences about sensitive data, posing severe threats to individuals' privacy. Studies have shown that Privacy-Preserving Data Mining (PPDM) can adequately address this privacy risk while permitting knowledge extraction in mining processes. Several published works in this area have used clustering techniques to enforce anonymisation models on private data: the data are grouped into clusters using a quality measure, and the data in each group are then generalised separately to achieve an anonymisation threshold. However, existing approaches do not work well with high-dimensional data, since it is difficult to form good groupings without incurring excessive information loss. Our work aims to complement this balancing act by optimising utility in PPDM processes. To illustrate this, we propose a hybrid approach that combines self-organising maps with conventional privacy-based clustering algorithms. We demonstrate through experimental evaluation that our approach produces more utility for data mining tasks and outperforms conventional privacy-based clustering algorithms. This approach can significantly enable large-scale analysis of data in a privacy-preserving and trustworthy manner.
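    One way to make the utility claim concrete is an information-loss score. The sketch below computes a Normalized Certainty Penalty (NCP)-style measure for numeric attributes, a standard metric in the anonymisation literature; the paper's exact utility measures may differ, and the data and groupings here are synthetic assumptions.

```python
import numpy as np

# NCP-style information loss: for each group, the width of the generalised
# range per attribute, relative to the attribute's full range, weighted by
# group size. Lower total = less information loss = more utility.

def ncp(data, labels):
    spans = data.max(axis=0) - data.min(axis=0)
    spans[spans == 0] = 1.0                      # avoid division by zero
    total = 0.0
    for g in np.unique(labels):
        grp = data[labels == g]
        width = grp.max(axis=0) - grp.min(axis=0)
        total += len(grp) * (width / spans).mean()
    return total / len(data)                     # normalised to [0, 1]

data = np.random.default_rng(1).random((100, 5))
coarse = np.zeros(100, dtype=int)                # everything in one group
fine = np.arange(100) // 10                      # ten groups of ten
print(ncp(data, coarse), ncp(data, fine))        # finer grouping loses less
```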

    Privacy-preserving data mashup model for trading person-specific information

    Business enterprises adopt cloud integration services to improve collaboration with their trading partners and to deliver quality data mining services. Data-as-a-Service (DaaS) mashup allows multiple enterprises to integrate their data on the demand of consumers. Business enterprises face challenges not only in protecting private data over the cloud but also in legally adhering to privacy compliance rules when trading person-specific data. They need an effective privacy-preserving business model to deal with these challenges in emerging markets. We propose a model that allows multiple enterprises to collaborate in integrating their data and derives the contribution of each data provider by valuating the incorporated cost factors. The model serves as a guide for business decision-making, such as estimating the potential risk and finding the optimal value for publishing mashup data. Experiments on real-life data demonstrate that our approach can identify the optimal value in data mashup for different privacy models, including K-anonymity, LKC-privacy, and ε-differential privacy, with various anonymization algorithms and privacy parameters.
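    The paper's valuation model incorporates specific cost factors; the toy sketch below only conveys the shape of the decision problem: scan a privacy parameter (here, k for K-anonymity) and pick the value that maximises utility minus expected risk cost. All functions and constants below are made-up assumptions, not the paper's model.

```python
# Toy trade-off: utility of the published mashup decays as k grows, while
# a crude re-identification risk (and its expected breach cost) shrinks.

def publishing_value(k, base_value=100.0, breach_cost=500.0):
    utility = base_value / (1 + 0.1 * (k - 1))   # utility decays with k
    re_id_risk = 1.0 / k                         # crude risk proxy
    return utility - breach_cost * re_id_risk

best_k = max(range(2, 51), key=publishing_value)
print(best_k, publishing_value(best_k))          # interior optimum for k
```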

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contain rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore such ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of data containing sensitive private information raises privacy concerns. To achieve a better trade-off between utility maximization and privacy preservation, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, along with their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.
    Comment: 2018 IEEE International Conference on Big Data, 10 pages
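    At the core of many PPUM methods is an itemset-hiding step. The following sketch (not any specific published algorithm) greedily sanitises the highest-utility transactions until a sensitive itemset's total utility falls below the mining threshold; the transaction encoding and greedy strategy are illustrative assumptions.

```python
# Transactions map item -> (quantity, unit_profit); an itemset's utility in
# a transaction is the summed quantity * profit of its items, counted only
# when the transaction contains the whole itemset.

def itemset_utility(tx, itemset):
    if not itemset <= tx.keys():
        return 0
    return sum(q * p for item, (q, p) in tx.items() if item in itemset)

def hide(transactions, sensitive, min_util):
    total = sum(itemset_utility(t, sensitive) for t in transactions)
    # greedily sanitise the transactions contributing the most utility
    for t in sorted(transactions, key=lambda t: -itemset_utility(t, sensitive)):
        if total < min_util:
            break
        total -= itemset_utility(t, sensitive)
        for item in sensitive:
            t.pop(item, None)            # delete the sensitive items
    return transactions

txs = [{"a": (2, 5), "b": (1, 3)}, {"a": (1, 5), "c": (4, 1)}]
hide(txs, {"a"}, min_util=10)
print(txs)                               # "a" removed from the top transaction
```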