Privacy Preserving ID3 over Horizontally, Vertically and Grid Partitioned Data
We consider privacy preserving decision tree induction via ID3 in the case
where the training data is horizontally or vertically distributed. Furthermore,
we consider the same problem in the case where the data is both horizontally
and vertically distributed, a situation we refer to as grid partitioned data.
We give an algorithm for privacy preserving ID3 over horizontally partitioned
data involving more than two parties. For grid partitioned data, we discuss two
different evaluation methods for privacy preserving ID3, namely, first merging
horizontally and developing vertically, or first merging vertically and then
developing horizontally. Besides introducing privacy preserving data mining
over grid partitioned data, the main contribution of this paper is that we
show, by means of a complexity analysis, that the former evaluation method is
the more efficient.
Comment: 25 pages
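The horizontally partitioned case above can be illustrated with a minimal sketch: ID3 only needs per-attribute (value, class) counts, so each party can count locally and the pooled counts determine the information gain. The plain addition of counts below is a stand-in for a secure aggregation step and is not the protocol from the paper; the names `local_counts`, `information_gain`, and the toy records are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(class_counts):
    """Shannon entropy (base 2) of a class-count histogram."""
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c > 0)

def local_counts(rows, attr, label):
    """Count (attribute value, class) pairs held by one party."""
    return Counter((r[attr], r[label]) for r in rows)

def information_gain(pooled):
    """ID3 information gain computed only from pooled (value, class) counts."""
    total = sum(pooled.values())
    by_class = Counter()
    by_value = {}
    for (value, cls), n in pooled.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    remainder = sum(sum(cc.values()) / total * entropy(cc)
                    for cc in by_value.values())
    return entropy(by_class) - remainder

# Hypothetical toy data: two parties hold different rows over the same schema.
party_a = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
party_b = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]

# In a private protocol this sum would be computed securely; here it is in the clear.
pooled = local_counts(party_a, "outlook", "play") + local_counts(party_b, "outlook", "play")
gain = information_gain(pooled)
```

Because the gain depends on the data only through summed counts, any secure-sum primitive that hides the individual parties' counts could in principle replace the plain addition.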
Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection
The effective detection of evidence of financial anomalies requires
collaboration among multiple entities who own a diverse set of data, such as a
payment network system (PNS) and its partner banks. Trust among these financial
institutions is limited by regulation and competition. Federated learning (FL)
enables entities to collaboratively train a model when data is either
vertically or horizontally partitioned across the entities. However, in
real-world financial anomaly detection scenarios, the data is partitioned both
vertically and horizontally and hence it is not possible to use existing FL
approaches in a plug-and-play manner.
Our novel solution, PV4FAD, combines fully homomorphic encryption (HE),
secure multi-party computation (SMPC), differential privacy (DP), and
randomization techniques to balance privacy and accuracy during training and to
prevent inference threats at model deployment time. Our solution provides input
privacy through HE and SMPC, and output privacy against inference time attacks
through DP. Specifically, we show that, in the honest-but-curious threat model,
banks do not learn any sensitive features about PNS transactions, and the PNS
does not learn any information about the banks' dataset but only learns
prediction labels. We also develop and analyze a DP mechanism to protect output
privacy during inference. Our solution generates high-utility models by
significantly reducing the per-bank noise level while satisfying distributed
DP. To ensure high accuracy, our approach produces an ensemble model, in
particular, a random forest. This enables us to take advantage of the
well-known properties of ensembles to reduce variance and increase accuracy.
Our solution won second prize in the first phase of the U.S. Privacy Enhancing
Technologies (PETs) Prize Challenge.
Comment: Prize Winner in the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge
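The idea of lowering the per-bank noise level under distributed DP can be sketched as follows. This is a generic illustration and not the PV4FAD mechanism: it assumes Gaussian noise, assumes that only the sum of the noisy contributions is ever revealed (for example through secure aggregation), and uses the hypothetical names `per_party_sigma` and `noisy_sum`. Under those assumptions, independent noise with standard deviation sigma / sqrt(n) from each of n parties adds up to noise with standard deviation sigma on the released sum.

```python
import math
import random

def per_party_sigma(target_sigma, n_parties):
    """Noise std each party adds so that the summed noise has std target_sigma."""
    return target_sigma / math.sqrt(n_parties)

def noisy_sum(local_values, target_sigma, rng):
    """Sum of locally perturbed values; only this sum is assumed to be released."""
    sigma_i = per_party_sigma(target_sigma, len(local_values))
    return sum(v + rng.gauss(0.0, sigma_i) for v in local_values)

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
sigma_i = per_party_sigma(target_sigma=4.0, n_parties=16)
released = noisy_sum([3.0] * 16, target_sigma=4.0, rng=rng)
```

With 16 parties and a target standard deviation of 4.0, each party adds noise with standard deviation 1.0 instead of 4.0.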
Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables
Vertical federated learning (VFL) has recently emerged as an appealing
distributed paradigm empowering multi-party collaboration for training
high-quality models over vertically partitioned datasets. Gradient boosting has
been popularly adopted in VFL, which builds an ensemble of weak learners
(typically decision trees) to achieve promising prediction performance.
Recently, there has been growing interest in using decision tables as an
alternative weak learner in gradient boosting, due to their simpler
structure, good interpretability, and promising performance. In the literature,
there have been works on privacy-preserving VFL for gradient boosted decision
trees, but no prior work has been devoted to the emerging case of decision
tables. Training and inference on decision tables differ from the case of
generic decision trees, not to mention gradient boosting with decision
tables in VFL. In light of this, we design, implement, and evaluate Privet, the
first system framework enabling privacy-preserving VFL service for gradient
boosted decision tables. Privet delicately builds on lightweight cryptography
and allows an arbitrary number of participants holding vertically partitioned
datasets to securely train gradient boosted decision tables. Extensive
experiments over several real-world datasets and synthetic datasets demonstrate
that Privet achieves promising performance, with utility comparable to
plaintext centralized learning.
Comment: Accepted in IEEE Transactions on Services Computing (TSC)
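To make the contrast with generic decision trees concrete, here is a plaintext sketch of inference on a decision table, under the common assumption that a decision table applies one (feature, threshold) test per level to every sample, so the leaf index is simply the concatenation of the test outcomes. This is not Privet's secure protocol; the class `DecisionTable` and the function `boosted_predict` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class DecisionTable:
    features: List[int]      # one feature index per level
    thresholds: List[float]  # one threshold per level
    leaves: List[float]      # 2 ** depth leaf values

    def predict(self, x: Sequence[float]) -> float:
        index = 0
        # Every sample passes through the same tests; each outcome adds one bit.
        for f, t in zip(self.features, self.thresholds):
            index = (index << 1) | int(x[f] > t)
        return self.leaves[index]

def boosted_predict(tables, x, learning_rate=1.0):
    """Gradient boosting output: scaled sum of the weak learners' predictions."""
    return sum(learning_rate * t.predict(x) for t in tables)

table = DecisionTable(features=[0, 1], thresholds=[0.5, 2.0],
                      leaves=[0.1, 0.2, 0.3, 0.4])
score = table.predict([0.9, 1.0])  # test outcomes 1 then 0 select leaf index 2
```

In a vertically partitioned setting each level's test would be evaluated by whichever party holds that feature, which is one reason the fixed per-level structure is attractive.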
An Enhanced CART Algorithm for Preserving Privacy of Distributed Data and Provide Access Control over Tree Data
Distributed applications are increasingly used because they can serve more than one client at a time. In distributed databases, data distribution and management are key concerns. Organizations are often unwilling to participate in data mining because of the risk of leaking private data, so data must be collected from the different parties in a secure way. This paper shows how the CART algorithm can be used by multiple parties in a vertically partitioned environment. To address the privacy and security issues, the proposed model incorporates server-side random key generation and key distribution. Finally, the performance of the proposed classification technique is evaluated in terms of memory consumption, training time, search time, accuracy, and error rate.
Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing
In this paper, we propose a privacy preserving distributed
clustering protocol for horizontally partitioned data based on a very efficient
homomorphic additive secret sharing scheme. The model we use
for the protocol is novel in the sense that it utilizes two non-colluding
third parties. We provide a brief security analysis of our protocol from
an information-theoretic point of view, which is a stronger security model.
We present a communication and computation complexity analysis of our
protocol along with another protocol previously proposed for the same
problem. We also include experimental results for computation and communication
overhead of these two protocols. Our protocol not only outperforms
the others in execution time and communication overhead on
data holders, but also uses a more efficient model for many data mining
applications.
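The role of additive secret sharing with two non-colluding third parties can be sketched as a secure sum. This is a generic illustration, not the clustering protocol from the paper: the modulus `PRIME`, the functions `share` and `aggregate`, and the toy values are assumptions. Each data holder splits its value into two random-looking shares, one per third party; each third party adds up the shares it receives, and only the combination of the two partial sums reveals the total.

```python
import secrets

PRIME = 2**61 - 1  # illustrative modulus; must exceed any possible true sum

def share(value, modulus=PRIME):
    """Split value into two additive shares that sum to value mod modulus."""
    r = secrets.randbelow(modulus)
    return r, (value - r) % modulus

def aggregate(shares, modulus=PRIME):
    """Work done by one third party: add up the shares it received."""
    return sum(shares) % modulus

# Hypothetical private inputs held by three data holders.
holder_values = [12, 30, 7]

# Each holder sends its first share to third party 1 and its second to third party 2.
shares_tp1, shares_tp2 = zip(*(share(v) for v in holder_values))

# Neither partial sum alone reveals anything; together they reconstruct the total.
total = (aggregate(shares_tp1) + aggregate(shares_tp2)) % PRIME
```

The scheme is additively homomorphic in the sense that adding shares share-wise corresponds to adding the underlying values, which is what lets cluster statistics such as per-cluster sums be accumulated without exposing individual contributions.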