Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices
This paper proposes a homomorphic, privacy-preserving association rule mining
algorithm that can be deployed on resource-constrained devices (RCD). The
privacy-preserving exchange of itemset counts among distributed mining sites is
a vital part of the association rule mining process. Existing
cryptography-based privacy-preserving solutions are computationally expensive
because of the complex mathematical operations involved. Solutions that require
less computation are therefore needed to deploy mining applications on RCD.
In this algorithm, a semi-trusted mixer combines the itemset counts encrypted
by all mining sites without revealing individual values. The proposed algorithm
builds on count distribution (CD), a well-known, communication-efficient
association rule mining algorithm. Security proofs, performance analysis, and
comparison show the acceptability and effectiveness of the proposed algorithm.
Its efficient and straightforward privacy model and satisfactory performance
make the protocol a candidate for deploying data mining applications on RCD.
Comment: IEEE Publication format, International Journal of Computer Science
and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947
5500, http://sites.google.com/site/ijcsis
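The mixer's role in the abstract above (combining per-site itemset counts without seeing any individual value) can be illustrated with a toy sketch. This is not the paper's homomorphic protocol: it swaps in plain additive masking, and it assumes the sites have somehow agreed on random masks that sum to zero. The modulus and the names `make_zero_sum_masks`, `mask_count`, and `mixer_sum` are hypothetical.

```python
import random

# Assumed public modulus; it must exceed any possible global count.
MODULUS = 2**61 - 1


def make_zero_sum_masks(num_sites, rng):
    """Return one random mask per site; the masks sum to 0 modulo MODULUS.

    Assumption: the sites obtain these masks without the mixer learning them.
    """
    masks = [rng.randrange(MODULUS) for _ in range(num_sites - 1)]
    masks.append((-sum(masks)) % MODULUS)
    return masks


def mask_count(local_count, mask):
    """Hide a site's local itemset count behind its mask."""
    return (local_count + mask) % MODULUS


def mixer_sum(masked_counts):
    """The mixer adds masked values; the masks cancel, leaving the global count."""
    return sum(masked_counts) % MODULUS


# Three sites hold local support counts for the same itemset.
local_counts = [12, 7, 30]
masks = make_zero_sum_masks(len(local_counts), random.Random(0))
masked = [mask_count(c, m) for c, m in zip(local_counts, masks)]
global_count = mixer_sum(masked)
print(global_count)
```

Each masked value on its own looks random to the mixer, yet the sum equals the true global count that count distribution needs for its support test.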
EsPRESSo: Efficient Privacy-Preserving Evaluation of Sample Set Similarity
Electronic information is increasingly often shared among entities without
complete mutual trust. To address related security and privacy issues, a few
cryptographic techniques have emerged that support privacy-preserving
information sharing and retrieval. One interesting open problem in this context
involves two parties that need to assess the similarity of their datasets, but
are reluctant to disclose their actual content. This paper presents an
efficient and provably-secure construction supporting the privacy-preserving
evaluation of sample set similarity, where similarity is measured as the
Jaccard index. We present two protocols: the first securely computes the
(Jaccard) similarity of two sets, and the second approximates it, using MinHash
techniques, with lower complexities. We show that our novel protocols are
attractive in many compelling applications, including document/multimedia
similarity, biometric authentication, and genetic tests. In the process, we
demonstrate that our constructions are appreciably more efficient than prior
work.
Comment: A preliminary version of this paper was published in the Proceedings
of the 7th ESORICS International Workshop on Digital Privacy Management (DPM
2012). This is the full version, appearing in the Journal of Computer
Securit
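For reference, the two quantities named in the abstract above, the exact Jaccard index and its MinHash approximation, can be sketched in the clear. This minimal example carries no privacy protection and is not the paper's protocol; the hash construction (seeded SHA-256) and the function names are assumptions chosen for illustration.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(item, seed):
    """Hypothetical hash family: SHA-256 over 'seed:item', truncated to 64 bits."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Keep the minimum hash per seed. The input set must be non-empty."""
    items = set(items)
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates the Jaccard index."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


set_a = {"a", "b", "c", "d"}
set_b = {"c", "d", "e"}
exact = jaccard(set_a, set_b)  # 2 shared items out of 5 distinct items
approx = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
print(exact, approx)
```

The signature length trades accuracy for cost: comparing two fixed-length signatures is cheaper than comparing two large sets, which is the source of the lower complexity the abstract attributes to the approximate protocol.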
Privacy Preserving Utility Mining: A Survey
In the big data era, collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve a better trade-off between utility maximization and privacy preservation,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.
Comment: 2018 IEEE International Conference on Big Data, 10 page
Privacy Preserving Data Mining, A Data Quality Approach
Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of sanitizing the database in such a way as to prevent the discovery of sensitive information (e.g. association rules). A drawback of such algorithms is that the sanitization they introduce may degrade the quality of the data itself. In this report we introduce a new methodology and algorithms for performing useful PPDM operations while preserving the data quality of the underlying database.
JRC.G.6-Sensors, radar technologies and cybersecurit
State of the Art in Privacy Preserving Data Mining
Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Such a trend, especially in the context of public databases or of sensitive information related to critical infrastructures, represents a non-negligible threat. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. This is a very complex task, and the scientific literature contains several different approaches to the problem. In this work we present a survey of the current PPDM methodologies that seem promising for the future.
JRC.G.6-Sensors, radar technologies and cybersecurit
Exploring the Existing and Unknown Side Effects of Privacy Preserving Data Mining Algorithms
The data mining sanitization process involves converting the data by masking the sensitive data and then releasing it to the public domain. During the sanitization process, side effects such as hiding failure, missing cost and artificial cost of the data were observed. Privacy Preserving Data Mining (PPDM) algorithms were developed for the sanitization process to overcome information loss and yet maintain data integrity. While these PPDM algorithms did provide benefits for privacy preservation, they also aimed to resolve the side effects that occurred during the sanitization process. Many PPDM algorithms were developed to reduce these side effects, and several are based on different PPDM techniques. However, previous studies have not explored or justified why non-traditional side effects were not given much importance.
This study reports the side effects of PPDM algorithms in a newly created web repository. The research methodology adopted was Design Science Research (DSR). The research was conducted in four phases. The first phase addressed the characteristics, similarities, differences, and relationships of existing side effects. The second phase identified the characteristics of non-traditional side effects. The third phase used the Privacy Preservation and Security Framework (PPSF) tool to test whether non-traditional side effects occur in PPDM algorithms, and also attempted to find additional unknown side effects not found in prior studies. The PPDM algorithms considered were Greedy, POS2DT, SIF_IDF, cpGA2DT, pGA2DT, and sGA2DT. The associated PPDM techniques were anonymization, perturbation, randomization, condensation, heuristic, reconstruction, and cryptography. The final phase involved creating a new online web repository to report all the side effects found for the PPDM algorithms. The web repository was built using full-stack web development, with the AngularJS, Spring, Spring Boot and Hibernate frameworks. The results of the study indicated various PPDM algorithms and their side effects. Additionally, the relationship and impact that hiding failure, missing cost, and artificial cost have on each other were also understood. Interestingly, a relationship between the side effects and the type of data (sensitive, non-sensitive, or new) was observed. The web repository acts as a quick reference for PPDM algorithms. Developing, improving, inventing, and reporting PPDM algorithms is necessary. This study will influence researchers or organizations to report, use, reuse, or develop better PPDM algorithms.
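The three traditional side effects named above can be made concrete with a small sketch. It uses definitions that are common in the PPDM literature (the abstract does not give the study's own formulas, so treat these as assumptions): hiding failure is a sensitive pattern that is still discoverable after sanitization, missing cost is a non-sensitive pattern that was lost, and artificial cost is a new pattern that did not exist before. The function name `side_effects` and the example patterns are hypothetical.

```python
def side_effects(original, sanitized, sensitive):
    """Classify patterns mined before and after sanitization.

    original:  patterns mined from the original database
    sanitized: patterns mined from the sanitized database
    sensitive: patterns in `original` that were supposed to be hidden
    """
    original, sanitized, sensitive = set(original), set(sanitized), set(sensitive)
    non_sensitive = original - sensitive
    return {
        # Sensitive patterns that survived sanitization.
        "hiding_failure": sensitive & sanitized,
        # Non-sensitive patterns that were lost as collateral damage.
        "missing_cost": non_sensitive - sanitized,
        # Patterns that appear only after sanitization.
        "artificial_cost": sanitized - original,
    }


result = side_effects(
    original={"AB", "AC", "BC", "CD"},
    sanitized={"AB", "AC", "DE"},
    sensitive={"AB"},
)
for name, patterns in result.items():
    print(name, sorted(patterns))
```

The three categories line up with the data types mentioned in the abstract: hiding failure concerns sensitive data, missing cost concerns non-sensitive data, and artificial cost concerns new data.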
Privacy Preserving Data Mining, Evaluation Methodologies
Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical access control techniques are not sufficient to guarantee privacy when data mining techniques are used. Privacy Preserving Data Mining (PPDM) algorithms have recently been introduced with the aim of modifying the database in such a way as to prevent the discovery of sensitive information. Because of the large number of possible techniques that can be used to achieve this goal, it is necessary to provide some standard evaluation metrics to determine the best algorithms for a specific application or context. Currently, however, there is no common set of parameters that can be used for this purpose. Moreover, because sanitization modifies the data, an important issue, especially for critical data, is to preserve the quality of data. However, to the best of our knowledge, no approaches have been developed dealing with the issue of data quality in the context of PPDM algorithms. This report explores the problem of PPDM algorithm evaluation, starting from the key goal of preserving data quality. To achieve this goal, we propose a formal definition of data quality specifically tailored for use in the context of PPDM algorithms, a set of evaluation parameters and an evaluation algorithm. Moreover, because of the "environment related" nature of data quality, a structure to represent constraints and information relevance related to data is presented. The resulting core evaluation process is then presented as part of a more general three-step evaluation framework that also takes into account other aspects of algorithm evaluation, such as efficiency, scalability and level of privacy.
JRC.G.6-Sensors, radar technologies and cybersecurit