42,562 research outputs found

    Secret charing vs. encryption-based techniques for privacy preserving data mining

    Get PDF
    Privacy preserving querying and data publishing has been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts increased privacy concerns which motivated data mining researchers to investigate privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining focusing on distributed data sources, then we compare two technologies used in privacy preserving data mining. The first technology is encryption based, and it is used in earlier approaches. The second technology is secret-sharing which is recently being considered as a more efficient approach

    Privacy Preserving Utility Mining: A Survey

    Full text link
    In big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve better trade-off between utility maximizing and privacy preserving, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page

    Privacy preserving data mining

    Get PDF
    A fruitful direction for future data mining research will be the development of technique that incorporates privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We analyze the possibility of privacy in data mining techniques in two phasesrandomization and reconstruction. Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. To preserve client privacy in the data mining process, techniques based on random perturbation of data records are used. Suppose there are many clients, each having some personal information, and one server, which is interested only in aggregate, statistically significant, properties of this information. The clients can protect privacy of their data by perturbing it with a randomization algorithm and then submitting the randomized version. This approach is called randomization. The randomization algorithm is chosen so that aggregate properties of the data can be recovered with sufficient precision, while individual entries are significantly distorted. For the concept of using value distortion to protect privacy to be useful, we need to be able to reconstruct the original data distribution so that data mining techniques can be effectively utilized to yield the required statistics. Analysis Let xi be the original instance of data at client i. We introduce a random shift yi using randomization technique explained below. The server runs the reconstruction algorithm (also explained below) on the perturbed value zi = xi + yi to get an approximate of the original data distribution suitable for data mining applications. Randomization We have used the following randomizing operator for data perturbation: Given x, let R(x) be x+€ (mod 1001) where € is chosen uniformly at random in {-100…100}. Reconstruction of discrete data set P(X=x) = f X (x) ----Given P(Y=y) = F y (y) ---Given P (Z=z) = f Z (z) ---Given f (X/Z) = P(X=x | Z=z) = P(X=x, Z=z)/P (Z=z) = P(X=x, X+Y=Z)/ f Z (z) = P(X=x, Y=Z - X)/ f Z (z) = P(X=x)*P(Y=Z-X)/ f Z (z) = P(X=x)*P(Y=y)/ f Z (z) Results In this project we have done two aspects of privacy preserving data mining. The first phase involves perturbing the original data set using ‘randomization operator’ techniques and the second phase deals with reconstructing the randomized data set using the proposed algorithm to get an approximate of the original data set. The performance metrics like percentage deviation, accuracy and privacy breaches were calculated. In this project we studied the technical feasibility of realizing privacy preserving data mining. The basic promise was that the sensitive values in a user’s record will be perturbed using a randomizing function and an approximate of the perturbed data set be recovered using reconstruction algorithm

    Privacy Preserving Mining in Code Profiling Data

    Get PDF
    Privacy preserving data mining is an important issue nowadays for data mining. Since various organizations and people are generating sensitive data or information these days. They don’t want to share their sensitive data however that data can be useful for data mining purpose. So, due to privacy preserving mining that data can be mined usefully without harming the privacy of that data. Privacy can be preserved by applying encryption on database which is to be mined because now the data is secure due to encryption. Code profiling is a field in software engineering where we can apply data mining to discover some knowledge so that it will be useful in future development of software. In this work we have applied privacy preserving mining in code profiling data such as software metrics of various codes. Results of data mining on actual and encrypted data are compared for accuracy. We have also analyzed the results of privacy preserving mining in code profiling data and found interesting results

    Privacy Preserving Data Mining

    Get PDF
    corecore