Privacy-Preserving Federated Deep Clustering based on GAN
Federated clustering (FC) is an essential extension of centralized clustering
designed for the federated setting, wherein the challenge lies in constructing
a global similarity measure without the need to share private data.
Conventional approaches to FC typically adopt extensions of centralized
methods, like K-means and fuzzy c-means. However, these methods are susceptible
to non-independent-and-identically-distributed (non-IID) data among clients,
leading to suboptimal performance, particularly with high-dimensional data. In
this paper, we present a novel approach to address these limitations by
proposing a Privacy-Preserving Federated Deep Clustering based on Generative
Adversarial Networks (GANs). Each client trains a generative adversarial
network (GAN) locally and uploads the synthetic data to the server. The server
applies a deep clustering network on the synthetic data to establish
cluster centroids, which are then downloaded to the clients for cluster
assignment. Theoretical analysis demonstrates that the GAN-generated samples,
shared among clients, inherently uphold certain privacy guarantees,
safeguarding the confidentiality of individual data. Furthermore, extensive
experimental evaluations showcase the effectiveness and utility of our proposed
method in achieving accurate and privacy-preserving federated clustering.
Federated clustering with GAN-based data synthesis
Federated clustering (FC) is an extension of centralized clustering in
federated settings. The key here is how to construct a global similarity
measure without sharing private data, since the local similarity may be
insufficient to group local data correctly and the similarity of samples across
clients cannot be directly measured due to privacy constraints. The most
straightforward approach to FC is to employ methods extended from
centralized ones, such as K-means (KM) and fuzzy c-means (FCM). However, they
are vulnerable to non-independent-and-identically-distributed (non-IID) data
among clients. To handle this, we propose a new federated clustering framework,
named synthetic data aided federated clustering (SDA-FC). It trains a generative
adversarial network locally in each client and uploads the generated synthetic
data to the server, where KM or FCM is performed on the synthetic data. The
synthetic data can make the model immune to the non-IID problem and enable us
to capture the global similarity characteristics more effectively without
sharing private data. Comprehensive experiments reveal the advantages of
SDA-FC, including superior performance in addressing the non-IID problem and
device failures.
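
The pipeline described in the abstract above (local generation, upload of synthetic data, server-side K-means, assignment against the resulting centroids) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the hypothetical `synthesize` function uses a per-feature Gaussian as a stand-in for a trained GAN, and the names `kmeans`, `nearest`, `client_a`, and `client_b` are illustrative only.

```python
import math
import random


def synthesize(local_data, n_samples, rng):
    """Stand-in for a locally trained GAN (assumption): sample from a
    per-feature Gaussian fitted to the client's private data."""
    dims = list(zip(*local_data))
    means = [sum(d) / len(d) for d in dims]
    stds = [math.sqrt(sum((v - m) ** 2 for v in d) / len(d))
            for d, m in zip(dims, means)]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(n_samples)]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))


def kmeans(data, k, rng, n_iter=20):
    """Plain Lloyd's K-means, run by the server on synthetic data only."""
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in data:
            groups[nearest(p, centroids)].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


rng = random.Random(0)

# Extreme non-IID case: each client privately holds one cluster.
client_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
client_b = [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(100)]

# Clients upload synthetic samples only; private data stays local.
synthetic = synthesize(client_a, 100, rng) + synthesize(client_b, 100, rng)

# The server clusters the pooled synthetic data.
centroids = kmeans(synthetic, 2, rng)

# Each client assigns its own private points to the returned centroids.
labels_a = [nearest(p, centroids) for p in client_a]
labels_b = [nearest(p, centroids) for p in client_b]
```

In this toy setting neither client could recover two clusters from its own data, but the pooled synthetic data exposes both. A Gaussian stand-in carries none of the modelling capacity of a GAN, so the sketch illustrates only the data flow.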
Privacy-Preserving and Outsourced Multi-User k-Means Clustering
Many techniques for privacy-preserving data mining (PPDM) have been
investigated over the past decade. Often, the entities involved in the data
mining process are end-users or organizations with limited computing and
storage resources. As a result, such entities may want to refrain from
participating in the PPDM process. To overcome this issue and to take advantage
of the many other benefits of cloud computing, outsourcing PPDM tasks to the cloud
environment has recently gained special attention. We consider the scenario
where n entities outsource their databases (in encrypted format) to the cloud
and ask the cloud to perform the clustering task on their combined data in a
privacy-preserving manner. We term this process privacy-preserving and
outsourced distributed clustering (PPODC). In this paper, we propose a novel
and efficient solution to the PPODC problem based on the k-means clustering
algorithm. The main novelty of our solution lies in avoiding the secure
division operations required in computing cluster centers altogether through an
efficient transformation technique. Our solution builds the clusters securely
in an iterative fashion and returns the final cluster centers to all entities
when a pre-determined termination condition holds. The proposed solution
protects data confidentiality of all the participating entities under the
standard semi-honest model. To the best of our knowledge, ours is the first
work to discuss and propose a comprehensive solution to the PPODC problem that
incurs negligible cost on the participating entities. We theoretically estimate
both the computation and communication costs of the proposed protocol and also
demonstrate its practical value through experiments on a real dataset.
Comment: 16 pages, 2 figures, 5 tables
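
The abstract above does not spell out its transformation technique, so the following is only an illustration of one way division can be avoided when choosing the nearest cluster. It assumes each cluster is kept as an integer coordinate sum and a member count, and it runs on plaintext integers rather than encrypted values; the names `scaled_sq_dist` and `nearest_cluster` are hypothetical. Since the squared distance from x to the center S_j / n_j equals ||n_j x - S_j||^2 / n_j^2, multiplying every cluster's score by the squared product of all counts preserves the ordering while using only addition and multiplication.

```python
from math import prod


def scaled_sq_dist(x, total, count, other_counts):
    """Squared distance from x to the center total/count, scaled by the
    squared product of ALL cluster counts, computed without division."""
    base = sum((count * xi - si) ** 2 for xi, si in zip(x, total))
    return base * prod(other_counts) ** 2


def nearest_cluster(x, sums, counts):
    """Pick the nearest cluster from (sum, count) pairs; counts must be > 0."""
    scores = []
    for j, (total, count) in enumerate(zip(sums, counts)):
        others = counts[:j] + counts[j + 1:]
        scores.append(scaled_sq_dist(x, total, count, others))
    return min(range(len(scores)), key=scores.__getitem__)


# Two clusters with centers (2, 3) and (10, 11), stored as sums and counts.
sums = [[6, 9], [40, 44]]
counts = [3, 4]

choice_near_first = nearest_cluster([3, 4], sums, counts)
choice_near_second = nearest_cluster([9, 9], sums, counts)
```

Every score here carries the same factor (3 * 4)^2 = 144, so comparing scores gives the same result as comparing true squared distances.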
Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data
k-means clustering is one of the most popular clustering algorithms in
data mining. Recently, much research has focused on the algorithm
when the dataset is divided among multiple parties or when the dataset is too
large to be handled by the data owner. In the latter case, usually some servers
are hired to perform the task of clustering. The dataset is divided by the data
owner among the servers, which together perform k-means and return the cluster
labels to the owner. The major challenge in this method is to prevent the
servers from gaining substantial information about the actual data of the
owner. Several algorithms have been designed in the past that provide
cryptographic solutions to perform privacy preserving k-means. We provide a new
method to perform k-means over a large set using multiple servers. Our
technique avoids heavy cryptographic computations and instead uses a simple
randomization technique to preserve the privacy of the data. The k-means
computed has exactly the same efficiency and accuracy as the k-means computed
over the original dataset without any randomization. We argue that our
algorithm is secure against an honest-but-curious, passive adversary.
Comment: 19 pages, 4 tables. International Conference on Information Systems
Security. Springer, Cham, 201
On Collaborative Predictive Blacklisting
Collaborative predictive blacklisting (CPB) makes it possible to forecast future attack
sources based on logs and alerts contributed by multiple organizations.
Unfortunately, research on CPB has focused only on increasing the
number of predicted attacks but has not considered the impact on false
positives and false negatives. Moreover, sharing alerts is often hindered by
confidentiality, trust, and liability issues, which motivates the need for
privacy-preserving approaches to the problem. In this paper, we present a
measurement study of state-of-the-art CPB techniques, aiming to shed light on
the actual impact of collaboration. To this end, we reproduce and measure two
systems: a non-privacy-friendly one that uses a trusted coordinating party with
access to all alerts (Soldo et al., 2010) and a peer-to-peer one using
privacy-preserving data sharing (Freudiger et al., 2015). We show that, while
collaboration boosts the number of predicted attacks, it also yields high false
positives, ultimately leading to poor accuracy. This motivates us to present a
hybrid approach, using a semi-trusted central entity, aiming to increase
utility from collaboration while, at the same time, limiting information
disclosure and false positives. This leads to a better trade-off of true and
false positive rates, while at the same time addressing privacy concerns.
Comment: A preliminary version of this paper appears in ACM SIGCOMM's Computer
Communication Review (Volume 48 Issue 5, October 2018). This is the full
version.