51,517 research outputs found
A Clustering-Anonymity Approach for Trajectory Data Publishing Considering both Distance and Direction
Trajectory data contains rich spatio-temporal information about moving objects. Publishing it directly for mining and analysis results in severe privacy disclosure problems. Most existing clustering-anonymity methods cluster trajectories according to either distance-based or direction-based similarity, which leads to high information loss. To bridge this gap, we present a clustering-anonymity approach that considers both types of similarity. As trajectories may not be synchronized, we first design a trajectory synchronization algorithm. Then, two similarity metrics between trajectories are quantitatively defined, followed by a comprehensive metric that combines them. Furthermore, we propose a clustering-anonymity algorithm for privacy-preserving trajectory data publishing. It groups trajectories into clusters according to the comprehensive similarity metric, and the clusters are then anonymized. Experimental results show that our algorithm is effective in preserving privacy with low information loss.
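As a rough illustration of how a distance term and a direction term might be folded into one score, here is a minimal sketch. The abstract does not give the paper's metric definitions, so everything here is an assumption: the function names, the equal default `weight`, the `distance_scale` normalizer, and the choice of mean point-wise distance and mean heading difference. It also assumes the two trajectories are already synchronized, i.e. they have the same number of time-aligned 2D points.

```python
import math


def mean_distance(a, b):
    """Average Euclidean distance between time-aligned points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def mean_heading_gap(a, b):
    """Average absolute heading difference, in radians within [0, pi]."""
    gaps = []
    for (p0, p1), (q0, q1) in zip(zip(a, a[1:]), zip(b, b[1:])):
        heading_a = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        heading_b = math.atan2(q1[1] - q0[1], q1[0] - q0[0])
        d = abs(heading_a - heading_b) % (2 * math.pi)
        gaps.append(min(d, 2 * math.pi - d))  # wrap around the circle
    return sum(gaps) / len(gaps)


def combined_dissimilarity(a, b, weight=0.5, distance_scale=1.0):
    """Weighted sum of a distance term and a direction term (lower = more alike).

    weight and distance_scale are hypothetical tuning parameters,
    not values taken from the paper.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("trajectories must be synchronized and have >= 2 points")
    distance_term = mean_distance(a, b) / distance_scale
    direction_term = mean_heading_gap(a, b) / math.pi  # scaled to [0, 1]
    return weight * distance_term + (1 - weight) * direction_term


eastbound = [(0, 0), (1, 0), (2, 0)]
parallel = [(0, 1), (1, 1), (2, 1)]   # same heading, one unit away
westbound = [(2, 0), (1, 0), (0, 0)]  # same path, opposite heading
```

With these inputs, `parallel` differs from `eastbound` only in the distance term, while `westbound` is also penalized by the direction term. That is the case a purely distance-based measure would treat too leniently.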
Privacy preservation in peer-to-peer gossiping networks in presence of a passive adversary
In Web 2.0, more and more personal data are released by users (queries, social networks, geo-located data, ...), which creates a huge pool of useful information to leverage in the context of search or recommendation, for instance. In fully decentralized systems, tapping into the power of this information usually involves a clustering process that relies on an exchange of personal data (such as user profiles) to compute the similarity between users. In this internship, we address the problem of computing similarity between users while preserving their privacy and without relying on a central entity, with regard to a passive adversary.
Improving Spectral Clustering Using Spectrum-Preserving Node Reduction
Spectral clustering is one of the most popular clustering methods. However,
the high computational cost due to the involved eigen-decomposition procedure
can immediately hinder its applications in large-scale tasks. In this paper we
use spectrum-preserving node reduction to accelerate eigen-decomposition and
generate concise representations of data sets. Specifically, we create a small
number of pseudo-nodes based on spectral similarity. Then, a standard spectral
clustering algorithm is performed on the smaller node set. Finally, each data
point in the original data set is assigned to the same cluster as its representative
pseudo-node. The proposed framework runs in nearly-linear time. Meanwhile, the
clustering accuracy can be significantly improved by mining concise
representations. The experimental results show dramatically improved clustering
performance when compared with state-of-the-art methods.
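The final assignment step in the abstract above can be sketched in a few lines. This is only an illustration of label propagation from pseudo-nodes back to the original points. The reduction itself and the spectral clustering of the pseudo-nodes are not shown, and the names `representative` and `pseudo_labels`, along with their values, are hypothetical.

```python
def propagate_labels(representative, pseudo_labels):
    """Give each original point the cluster label of its representative pseudo-node.

    representative[i] -- index of the pseudo-node standing in for point i
    pseudo_labels[j]  -- cluster label assigned to pseudo-node j
    """
    return [pseudo_labels[r] for r in representative]


# Six original points reduced to three pseudo-nodes (illustrative values only).
representative = [0, 0, 1, 2, 2, 1]
pseudo_labels = [0, 1, 1]  # assumed output of clustering the pseudo-nodes
point_labels = propagate_labels(representative, pseudo_labels)
```

Because the expensive clustering runs only on the pseudo-nodes, this lookup is the only step that touches every original point.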
Privacy-Preserving Federated Deep Clustering based on GAN
Federated clustering (FC) is an essential extension of centralized clustering
designed for the federated setting, wherein the challenge lies in constructing
a global similarity measure without the need to share private data.
Conventional approaches to FC typically adopt extensions of centralized
methods, like K-means and fuzzy c-means. However, these methods are susceptible
to non-independent-and-identically-distributed (non-IID) data among clients,
leading to suboptimal performance, particularly with high-dimensional data. In
this paper, we present a novel approach to address these limitations by
proposing a Privacy-Preserving Federated Deep Clustering based on Generative
Adversarial Networks (GANs). Each client trains a generative adversarial
network (GAN) locally and uploads the synthetic data to the server. The server
applies a deep clustering network on the synthetic data to establish
cluster centroids, which are then downloaded to the clients for cluster
assignment. Theoretical analysis demonstrates that the GAN-generated samples,
shared among clients, inherently uphold certain privacy guarantees,
safeguarding the confidentiality of individual data. Furthermore, extensive
experimental evaluations showcase the effectiveness and utility of our proposed
method in achieving accurate and privacy-preserving federated clustering.
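The last step of the protocol described above, where clients use downloaded centroids to assign clusters, might look like the following sketch. The abstract does not state the assignment rule, so nearest-centroid assignment under Euclidean distance is an assumption here. The function name and the sample values are illustrative only.

```python
import math


def assign_to_centroids(points, centroids):
    """Label each local point with the index of its nearest centroid.

    Euclidean distance is an assumed choice, not taken from the paper.
    """
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


# Centroids as a client might receive them from the server (made-up values).
centroids = [(0.0, 0.0), (5.0, 5.0)]
local_points = [(0.2, -0.1), (4.8, 5.3), (1.0, 1.0)]
labels = assign_to_centroids(local_points, centroids)
```

In this sketch the private `local_points` never leave the client. Only the centroids travel from the server to the clients.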
Similarity Learning via Kernel Preserving Embedding
Data similarity is a key concept in many data-driven applications. Many
algorithms are sensitive to similarity measures. To tackle this fundamental
problem, automatic learning of similarity information from data via
self-expression has been developed and successfully applied in various models,
such as low-rank representation, sparse subspace learning, and semi-supervised
learning. However, self-expression only tries to reconstruct the original data, and some
valuable information, e.g., the manifold structure, is largely ignored. In this
paper, we argue that it is beneficial to preserve the overall relations when we
extract similarity information. Specifically, we propose a novel similarity
learning framework by minimizing the reconstruction error of kernel matrices,
rather than the reconstruction error of original data adopted by existing work.
Taking the clustering task as an example to evaluate our method, we observe
considerable improvements compared to other state-of-the-art methods. More
importantly, our proposed framework is very general and provides a novel and
fundamental building block for many other similarity-based tasks. Besides, our
proposed kernel preserving opens up a large number of possibilities to embed
high-dimensional data into low-dimensional space.
Comment: Published in AAAI 201