13,098 research outputs found
Privacy-Preserving Community Detection for Locally Distributed Multiple Networks
Modern multi-layer networks are commonly stored and analyzed in a local and
distributed fashion because of the privacy, ownership, and communication costs.
The literature on the model-based statistical methods for community detection
based on these data is still limited. This paper proposes a new method for
consensus community detection and estimation in a multi-layer stochastic block
model using locally stored and computed network data with privacy protection. A
novel algorithm named privacy-preserving Distributed Spectral Clustering
(ppDSC) is developed. To preserve the edges' privacy, we adopt the randomized
response (RR) mechanism to perturb the network edges, which satisfies the
strong notion of differential privacy. The ppDSC algorithm is performed on the
squared RR-perturbed adjacency matrices to prevent possible cancellation of
communities among different layers. To remove the bias incurred by RR and the
squared network matrices, we develop a two-step bias-adjustment procedure. Then
we perform eigen-decomposition on the debiased matrices, aggregation of the
local eigenvectors using an orthogonal Procrustes transformation, and k-means
clustering. We provide theoretical analysis on the statistical errors of ppDSC
in terms of eigen-vector estimation. In addition, the blessings and curses of
network heterogeneity are well-explained by our bounds
Privacy-preserving distributed data mining
This thesis is concerned with privacy-preserving distributed data mining algorithms. The main challenges in this setting are inference attacks and the formation of collusion groups. The inference problem is the reconstruction of sensitive data by attackers from non-sensitive sources, such as intermediate results, exchanged messages, or public information. Moreover, in a distributed scenario, malicious insiders can organize collusion groups to deploy more effective inference attacks. This thesis shows that existing privacy measures do not adequately protect privacy against inference and collusion. Therefore, in this thesis, new measures based on information theory are developed to overcome the identiffied limitations. Furthermore, a new distributed data clustering algorithm is presented. The clustering approach is based on a kernel density estimates approximation that generates a controlled amount of ambiguity in the density estimates and provides privacy to original data. Besides, this thesis also introduces the first privacy-preserving algorithms for frequent pattern discovery in a distributed time series. Time series are transformed into a set of n-dimensional data points and finding frequent patterns reduced to finding local maxima in the n-dimensional density space. The proposed algorithms are linear in the size of the dataset with low communication costs, validated by experimental evaluation using different datasets.Diese Arbeit befasst sich mit vertraulichkeitsbewahrendem Data Mining in verteilten Umgebungen mit Schwerpunkt auf ausgewählten N-Agenten-Angriffsszenarien für das Inferenzproblem im Data-Clustering und der Zeitreihenanalyse. Dabei handelt es sich um Angriffe von einzelnen oder Teilgruppen von Agenten innerhalb einer verteilten Data Mining-Gruppe oder von einem einzelnen Agenten außerhalb dieser Gruppe. Zunächst werden in dieser Arbeit zwei neue Privacy-Maße vorgestellt, die im Gegensatz zu bislang existierenden, die im verteilten Data Mining allgemein geforderte Eigenschaften zur Vertraulichkeitsbewahrung erfüllen und bei denen sich der gemessene Grad der Vertraulichkeit auf die verwendete Datenanalysemethode und die Anzahl von Angreifern bezieht. Für den Zweck eines vertraulichkeitsbewahrenden, verteilten Data-Clustering wird ein neues Kernel-Dichteabschätzungsbasiertes Verfahren namens KDECS vorgestellt. KDECS verwendet eine Approximation der originalen, lokalen Kernel-Dichteschätzung, so dass die ursprünglichen Daten anderer Agenten in der Data Mining-Gruppe mit einer höheren Wahrscheinlichkeit als einem hierfür vorgegebenen Wert nicht mehr zu rekonstruieren sind. Das Verfahren ist nachweislich sicherer als Data-Clustering mit generativen Mixture Modellen und SMC-basiert sicherem k-means Data-Clustering. Zusätzlich stellen wir neue Verfahren, namens DPD-TS, DPD-HE und DPDFS, für eine vertraulichkeitsbewahrende, verteilte Mustererkennung in Zeitreihen vor, deren Komplexität und Sicherheitsgrad wir mit den zuvor erwähnten neuen Privacy-Maßen analysieren. Dabei hängt ein von einzelnen Agenten einer Data Mining-Gruppe jeweils vorgegebener, minimaler Sicherheitsgrad von DPD-TS und DPD-FS nur von der Dimensionsreduktion der Zeitreihenwerte und ihrer Diskretisierung ab und kann leicht überprüft werden. Einen noch besseren Schutz von sensiblen Daten bietet das Verfahren DPD HE mit Hilfe von homomorpher Verschlüsselung. Neben der theoretischen Analyse wurden die experimentellen Leistungsbewertungen der entwickelten Verfahren mit verschiedenen, öffentlich verfügbaren Datensätzen durchgeführt
Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data
The k-means clustering is one of the most popular clustering algorithms in
data mining. Recently a lot of research has been concentrated on the algorithm
when the dataset is divided into multiple parties or when the dataset is too
large to be handled by the data owner. In the latter case, usually some servers
are hired to perform the task of clustering. The dataset is divided by the data
owner among the servers who together perform the k-means and return the cluster
labels to the owner. The major challenge in this method is to prevent the
servers from gaining substantial information about the actual data of the
owner. Several algorithms have been designed in the past that provide
cryptographic solutions to perform privacy preserving k-means. We provide a new
method to perform k-means over a large set using multiple servers. Our
technique avoids heavy cryptographic computations and instead we use a simple
randomization technique to preserve the privacy of the data. The k-means
computed has exactly the same efficiency and accuracy as the k-means computed
over the original dataset without any randomization. We argue that our
algorithm is secure against honest but curious and passive adversary.Comment: 19 pages, 4 tables. International Conference on Information Systems
Security. Springer, Cham, 201
Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing
In this paper, we propose a privacy preserving distributed
clustering protocol for horizontally partitioned data based on a very efficient
homomorphic additive secret sharing scheme. The model we use
for the protocol is novel in the sense that it utilizes two non-colluding
third parties. We provide a brief security analysis of our protocol from
information theoretic point of view, which is a stronger security model.
We show communication and computation complexity analysis of our
protocol along with another protocol previously proposed for the same
problem. We also include experimental results for computation and communication
overhead of these two protocols. Our protocol not only outperforms
the others in execution time and communication overhead on
data holders, but also uses a more efficient model for many data mining
applications
Privacy-Preserving and Outsourced Multi-User k-Means Clustering
Many techniques for privacy-preserving data mining (PPDM) have been
investigated over the past decade. Often, the entities involved in the data
mining process are end-users or organizations with limited computing and
storage resources. As a result, such entities may want to refrain from
participating in the PPDM process. To overcome this issue and to take many
other benefits of cloud computing, outsourcing PPDM tasks to the cloud
environment has recently gained special attention. We consider the scenario
where n entities outsource their databases (in encrypted format) to the cloud
and ask the cloud to perform the clustering task on their combined data in a
privacy-preserving manner. We term such a process as privacy-preserving and
outsourced distributed clustering (PPODC). In this paper, we propose a novel
and efficient solution to the PPODC problem based on k-means clustering
algorithm. The main novelty of our solution lies in avoiding the secure
division operations required in computing cluster centers altogether through an
efficient transformation technique. Our solution builds the clusters securely
in an iterative fashion and returns the final cluster centers to all entities
when a pre-determined termination condition holds. The proposed solution
protects data confidentiality of all the participating entities under the
standard semi-honest model. To the best of our knowledge, ours is the first
work to discuss and propose a comprehensive solution to the PPODC problem that
incurs negligible cost on the participating entities. We theoretically estimate
both the computation and communication costs of the proposed protocol and also
demonstrate its practical value through experiments on a real dataset.Comment: 16 pages, 2 figures, 5 table
- …