2,156 research outputs found
Privacy-preserving distributed data mining
This thesis is concerned with privacy-preserving distributed data mining algorithms. The main challenges in this setting are inference attacks and the formation of collusion groups. The inference problem is the reconstruction of sensitive data by attackers from non-sensitive sources, such as intermediate results, exchanged messages, or public information. Moreover, in a distributed scenario, malicious insiders can organize collusion groups to deploy more effective inference attacks. This thesis shows that existing privacy measures do not adequately protect privacy against inference and collusion. Therefore, in this thesis, new measures based on information theory are developed to overcome the identiffied limitations. Furthermore, a new distributed data clustering algorithm is presented. The clustering approach is based on a kernel density estimates approximation that generates a controlled amount of ambiguity in the density estimates and provides privacy to original data. Besides, this thesis also introduces the first privacy-preserving algorithms for frequent pattern discovery in a distributed time series. Time series are transformed into a set of n-dimensional data points and finding frequent patterns reduced to finding local maxima in the n-dimensional density space. The proposed algorithms are linear in the size of the dataset with low communication costs, validated by experimental evaluation using different datasets.Diese Arbeit befasst sich mit vertraulichkeitsbewahrendem Data Mining in verteilten Umgebungen mit Schwerpunkt auf ausgewählten N-Agenten-Angriffsszenarien für das Inferenzproblem im Data-Clustering und der Zeitreihenanalyse. Dabei handelt es sich um Angriffe von einzelnen oder Teilgruppen von Agenten innerhalb einer verteilten Data Mining-Gruppe oder von einem einzelnen Agenten außerhalb dieser Gruppe. Zunächst werden in dieser Arbeit zwei neue Privacy-Maße vorgestellt, die im Gegensatz zu bislang existierenden, die im verteilten Data Mining allgemein geforderte Eigenschaften zur Vertraulichkeitsbewahrung erfüllen und bei denen sich der gemessene Grad der Vertraulichkeit auf die verwendete Datenanalysemethode und die Anzahl von Angreifern bezieht. Für den Zweck eines vertraulichkeitsbewahrenden, verteilten Data-Clustering wird ein neues Kernel-Dichteabschätzungsbasiertes Verfahren namens KDECS vorgestellt. KDECS verwendet eine Approximation der originalen, lokalen Kernel-Dichteschätzung, so dass die ursprünglichen Daten anderer Agenten in der Data Mining-Gruppe mit einer höheren Wahrscheinlichkeit als einem hierfür vorgegebenen Wert nicht mehr zu rekonstruieren sind. Das Verfahren ist nachweislich sicherer als Data-Clustering mit generativen Mixture Modellen und SMC-basiert sicherem k-means Data-Clustering. Zusätzlich stellen wir neue Verfahren, namens DPD-TS, DPD-HE und DPDFS, für eine vertraulichkeitsbewahrende, verteilte Mustererkennung in Zeitreihen vor, deren Komplexität und Sicherheitsgrad wir mit den zuvor erwähnten neuen Privacy-Maßen analysieren. Dabei hängt ein von einzelnen Agenten einer Data Mining-Gruppe jeweils vorgegebener, minimaler Sicherheitsgrad von DPD-TS und DPD-FS nur von der Dimensionsreduktion der Zeitreihenwerte und ihrer Diskretisierung ab und kann leicht überprüft werden. Einen noch besseren Schutz von sensiblen Daten bietet das Verfahren DPD HE mit Hilfe von homomorpher Verschlüsselung. Neben der theoretischen Analyse wurden die experimentellen Leistungsbewertungen der entwickelten Verfahren mit verschiedenen, öffentlich verfügbaren Datensätzen durchgeführt
Enabling Multi-level Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data record. A widely studied \emph{perturbation-based PPDM}
approach introduces random perturbation to individual values to preserve
privacy before data is published. Previous solutions of this approach are
limited in their tacit assumption of single-level trust on data miners.
In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the
more trusted a data miner is, the less perturbed copy of the data it can
access. Under this setting, a malicious data miner may have access to
differently perturbed copies of the same data through various means, and may
combine these diverse copies to jointly infer additional information about the
original data that the data owner does not intend to release. Preventing such
\emph{diversity attacks} is the key challenge of providing MLT-PPDM services.
We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity
attacks with respect to our privacy goal. That is, for data miners who have
access to an arbitrary collection of the perturbed copies, our solution prevent
them from jointly reconstructing the original data more accurately than the
best effort using any individual copy in the collection. Our solution allows a
data owner to generate perturbed copies of its data for arbitrary trust levels
on-demand. This feature offers data owners maximum flexibility.Comment: 20 pages, 5 figures. Accepted for publication in IEEE Transactions on
Knowledge and Data Engineerin
Peer-to-Peer Secure Multi-Party Numerical Computation Facing Malicious Adversaries
We propose an efficient framework for enabling secure multi-party numerical
computations in a Peer-to-Peer network. This problem arises in a range of
applications such as collaborative filtering, distributed computation of trust
and reputation, monitoring and other tasks, where the computing nodes is
expected to preserve the privacy of their inputs while performing a joint
computation of a certain function. Although there is a rich literature in the
field of distributed systems security concerning secure multi-party
computation, in practice it is hard to deploy those methods in very large scale
Peer-to-Peer networks. In this work, we try to bridge the gap between
theoretical algorithms in the security domain, and a practical Peer-to-Peer
deployment.
We consider two security models. The first is the semi-honest model where
peers correctly follow the protocol, but try to reveal private information. We
provide three possible schemes for secure multi-party numerical computation for
this model and identify a single light-weight scheme which outperforms the
others. Using extensive simulation results over real Internet topologies, we
demonstrate that our scheme is scalable to very large networks, with up to
millions of nodes. The second model we consider is the malicious peers model,
where peers can behave arbitrarily, deliberately trying to affect the results
of the computation as well as compromising the privacy of other peers. For this
model we provide a fourth scheme to defend the execution of the computation
against the malicious peers. The proposed scheme has a higher complexity
relative to the semi-honest model. Overall, we provide the Peer-to-Peer network
designer a set of tools to choose from, based on the desired level of security.Comment: Submitted to Peer-to-Peer Networking and Applications Journal (PPNA)
200
zPROBE: Zero Peek Robustness Checks for Federated Learning
Privacy-preserving federated learning allows multiple users to jointly train
a model with coordination of a central server. The server only learns the final
aggregation result, thus the users' (private) training data is not leaked from
the individual model updates. However, keeping the individual updates private
allows malicious users to perform Byzantine attacks and degrade the accuracy
without being detected. Best existing defenses against Byzantine workers rely
on robust rank-based statistics, e.g., median, to find malicious updates.
However, implementing privacy-preserving rank-based statistics is nontrivial
and not scalable in the secure domain, as it requires sorting all individual
updates. We establish the first private robustness check that uses high break
point rank-based statistics on aggregated model updates. By exploiting
randomized clustering, we significantly improve the scalability of our defense
without compromising privacy. We leverage our statistical bounds in
zero-knowledge proofs to detect and remove malicious updates without revealing
the private user updates. Our novel framework, zPROBE, enables Byzantine
resilient and secure federated learning. Empirical evaluations demonstrate that
zPROBE provides a low overhead solution to defend against state-of-the-art
Byzantine attacks while preserving privacy.Comment: ICCV 202
- …