
    Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data

    We consider privacy preserving decision tree induction via ID3 in the case where the training data is horizontally or vertically distributed. Furthermore, we consider the same problem in the case where the data is both horizontally and vertically distributed, a situation we refer to as grid partitioned data. We give an algorithm for privacy preserving ID3 over horizontally partitioned data involving more than two parties. For grid partitioned data, we discuss two different evaluation methods for privacy preserving ID3: first merging horizontally and then developing vertically, or first merging vertically and then developing horizontally. Besides introducing privacy preserving data mining over grid partitioned data, the main contribution of this paper is to show, by means of a complexity analysis, that the former evaluation method is the more efficient. (Comment: 25 pages)
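
    The abstract's core primitive in the horizontally partitioned case is that each party holds complete records, so the class counts that drive ID3's information-gain criterion are simply sums of local counts. The sketch below illustrates that structure only; it is not the paper's protocol, and the plain aggregation marked in the comments stands in for the secure summation the parties would actually run. All function and field names (e.g. the "label" key) are illustrative assumptions.

```python
# Sketch: ID3 attribute scoring over horizontally partitioned data.
# The plain Counter addition below is a stand-in for secure summation.
import math
from collections import Counter

def local_counts(records, attribute, class_key="label"):
    """One party's counts of (attribute value, class label) pairs."""
    counts = Counter()
    for r in records:
        counts[(r[attribute], r[class_key])] += 1
    return counts

def entropy(class_counts):
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c)

def information_gain(parties_records, attribute, class_key="label"):
    # Aggregate per-party counts (in the protocol: secure summation).
    joint = Counter()
    for records in parties_records:
        joint += local_counts(records, attribute, class_key)

    total = sum(joint.values())
    overall, per_value = Counter(), {}
    for (value, label), c in joint.items():
        overall[label] += c
        per_value.setdefault(value, Counter())[label] += c

    remainder = sum((sum(cnt.values()) / total) * entropy(cnt)
                    for cnt in per_value.values())
    return entropy(overall) - remainder
```

    ID3 would call this for every candidate attribute and split on the one with the largest gain; only the aggregation step touches data from more than one party.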

    Privacy-preserving Data Sharing on Vertically Partitioned Data

    In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, \emph{i.e.}, where data of the same individuals is distributed across multiple data holders or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm to train a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), so that differentially private MPC can be used effectively to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. Finally, MPC is used to compute the aggregate of the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain results comparable to those in the non-partitioned case.
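
    For readers unfamiliar with the DP-SGD building block the abstract relies on, the sketch below shows the generic update: clip each per-example gradient, add Gaussian noise, and average. It deliberately omits the paper's MPC aggregation and variational mixture model; `grad_fn`, the hyperparameter values, and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a generic DP-SGD step: per-example gradient clipping plus
# Gaussian noise. grad_fn(params, example) is assumed to return an array
# with the same shape as params.
import numpy as np

def dp_sgd_step(params, batch, grad_fn, lr=0.1, clip_norm=1.0,
                noise_mult=1.1, rng=np.random.default_rng(0)):
    clipped = []
    for example in batch:
        g = grad_fn(params, example)                      # per-example gradient
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(batch)
    return params - lr * noisy_mean
```

    In the vertically partitioned setting described above, each party would compute only its factor of the objective, and the aggregation of the parties' contributions would happen inside MPC rather than in the clear.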

    Privacy-Preserving Sequential Pattern Mining Over Vertically Partitioned Data

    Privacy-preserving data mining in distributed environments is an important issue in the field of data mining. In this paper, we study how to conduct sequential pattern mining, one of the core data mining tasks, on private data in the following scenario: multiple parties, each holding a private data set, want to jointly conduct sequential pattern mining. Since no party wants to disclose its private data to the others, a secure method is needed to make such a computation feasible. We develop a practical solution to this problem in this paper.
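
    A typical step in vertically partitioned sequential pattern mining is counting the support of a candidate pattern whose items are split between parties: each party builds a private indicator vector over the shared record IDs, and the joint support is the dot product of those vectors. The sketch below shows only that structure, with a plaintext dot product standing in for the secure scalar-product protocol such schemes use; the data layout and names are assumptions, not the paper's actual construction.

```python
# Sketch: support counting for a candidate pattern split across two parties.
# The plain dot product stands in for a secure scalar-product protocol.

def contains_subsequence(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def indicator_vector(local_db, customer_ids, pattern_part):
    """One party's private bit per customer: does its part of the pattern occur?"""
    return [1 if contains_subsequence(local_db.get(cid, []), pattern_part) else 0
            for cid in customer_ids]

def joint_support(vec_a, vec_b):
    # In a privacy-preserving protocol this sum is computed securely.
    return sum(a * b for a, b in zip(vec_a, vec_b))
```

    A candidate pattern is frequent if its joint support meets the agreed minimum-support threshold; neither party learns the other's indicator vector.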

    Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data

    We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. In C-VFL, a server and multiple parties collaboratively train a model on their respective features, utilizing several local iterations and sharing compressed intermediate results periodically. Our work provides the first theoretical analysis of the effect message compression has on distributed training over vertically partitioned data. We prove convergence of non-convex objectives at a rate of $O(\frac{1}{\sqrt{T}})$ when the compression error is bounded over the course of training. We provide specific requirements for convergence with common compression techniques, such as quantization and top-$k$ sparsification. Finally, we experimentally show compression can reduce communication by over 90% without a significant decrease in accuracy over VFL without compression.
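
    Top-$k$ sparsification, one of the compressors named in the abstract, simply keeps the $k$ largest-magnitude entries of a party's intermediate output and zeros the rest before transmission. The snippet below is a minimal, generic illustration of that operator; the function name and the example vector are illustrative and do not come from C-VFL's code.

```python
# Sketch of top-k sparsification: keep the k largest-magnitude entries,
# zero everything else, before sending the embedding to the server.
import numpy as np

def top_k_sparsify(embedding, k):
    compressed = np.zeros_like(embedding)
    if k <= 0:
        return compressed
    idx = np.argpartition(np.abs(embedding), -k)[-k:]
    compressed[idx] = embedding[idx]
    return compressed

# Example: compress an 8-dimensional intermediate output to 2 nonzero entries.
h = np.array([0.1, -2.3, 0.05, 1.7, -0.4, 0.9, -0.02, 3.1])
print(top_k_sparsify(h, k=2))   # keeps -2.3 and 3.1, zeros elsewhere
```

    Quantization plays the same role with a different error profile; the paper's analysis applies whenever the resulting compression error stays bounded during training.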

    Privacy-Preserving Naive Bayesian Classification Over Vertically Partitioned Data

    Protection of privacy is a critical problem in data mining, and preserving data privacy in distributed data mining is even more challenging. In this paper, we consider the problem of privacy-preserving naive Bayesian classification over vertically partitioned data, an important issue in privacy-preserving distributed data mining. Our approach is based on homomorphic encryption. The scheme is very efficient in terms of computation and communication cost.
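
    The reason naive Bayes suits vertical partitioning is that its class score factorizes per feature: each party can compute a private partial sum of log-likelihoods over its own features, and only those partial sums need to be combined. The sketch below shows that factorization in the clear; in the paper the combination would be done under an additively homomorphic encryption scheme, and the nested-dictionary model layout here is an illustrative assumption.

```python
# Sketch (plaintext stand-in): naive Bayes scoring with features split
# across parties. The addition of per-party shares is where homomorphic
# aggregation would take place in a privacy-preserving protocol.
import math

def local_log_likelihood(party_model, party_features, label):
    """One party's share: sum of log P(feature=value | label) over its features."""
    return sum(math.log(party_model[label][f][v]) for f, v in party_features.items())

def classify(prior, party_shares_by_label, labels):
    # party_shares_by_label[label] is a list of per-party partial sums.
    scores = {lbl: math.log(prior[lbl]) + sum(party_shares_by_label[lbl])
              for lbl in labels}
    return max(scores, key=scores.get)
```

    The predicted class maximizes the combined score, and no party ever reveals its per-feature probabilities or attribute values to the others.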