51,517 research outputs found

    A Clustering-Anonymity Approach for Trajectory Data Publishing Considering both Distance and Direction

    Get PDF
    Trajectory data contains rich spatio-temporal information of moving objects. Directly publishing it for mining and analysis will result in severe privacy disclosure problems. Most existing clustering-anonymity methods cluster trajectories according to either distance- or direction-based similarities, leading to a high information loss. To bridge this gap, in this paper, we present a clustering-anonymity approach considering both these two types of similarities. As trajectories may not be synchronized, we first design a trajectory synchronization algorithm to synchronize them. Then, two similarity metrics between trajectories are quantitatively defined, followed by a comprehensive one. Furthermore, a clustering-anonymity algorithm for trajectory data publishing with privacy-preserving is proposed. It groups trajectories into clusters according to the comprehensive similarity metric. These clusters are finally anonymized. Experimental results show that our algorithm is effective in preserving privacy with low information loss

    Privacy preservation in peer-to-peer gossiping networks in presence of a passive adversary

    No full text
    In the Web 2.0, more and more personal data are released by users (queries, social networks, geo-located data, ...), which create a huge pool of useful information to leverage in the context of search or recommendation for instance. In fully decentralized systems, tapping on the power of this information usually involves a clustering process that relies on an exchange of personal data (such as user proles) to compute the similarity between users. In this internship, we address the problem of computing similarity between users while preserving their privacy and without relying on a central entity, with regards to a passive adversary

    Improving Spectral Clustering Using Spectrum-Preserving Node Reduction

    Full text link
    Spectral clustering is one of the most popular clustering methods. However, the high computational cost due to the involved eigen-decomposition procedure can immediately hinder its applications in large-scale tasks. In this paper we use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets. Specifically, we create a small number of pseudonodes based on spectral similarity. Then, standard spectral clustering algorithm is performed on the smaller node set. Finally, each data point in the original data set is assigned to the cluster as its representative pseudo-node. The proposed framework run in nearly-linear time. Meanwhile, the clustering accuracy can be significantly improved by mining concise representations. The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods

    Privacy-Preserving Federated Deep Clustering based on GAN

    Full text link
    Federated clustering (FC) is an essential extension of centralized clustering designed for the federated setting, wherein the challenge lies in constructing a global similarity measure without the need to share private data. Conventional approaches to FC typically adopt extensions of centralized methods, like K-means and fuzzy c-means. However, these methods are susceptible to non-independent-and-identically-distributed (non-IID) data among clients, leading to suboptimal performance, particularly with high-dimensional data. In this paper, we present a novel approach to address these limitations by proposing a Privacy-Preserving Federated Deep Clustering based on Generative Adversarial Networks (GANs). Each client trains a local generative adversarial network (GAN) locally and uploads the synthetic data to the server. The server applies a deep clustering network on the synthetic data to establish kk cluster centroids, which are then downloaded to the clients for cluster assignment. Theoretical analysis demonstrates that the GAN-generated samples, shared among clients, inherently uphold certain privacy guarantees, safeguarding the confidentiality of individual data. Furthermore, extensive experimental evaluations showcase the effectiveness and utility of our proposed method in achieving accurate and privacy-preserving federated clustering

    Similarity Learning via Kernel Preserving Embedding

    Full text link
    Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatically learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, semi-supervised learning. However, it just tries to reconstruct the original data and some valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework by minimizing the reconstruction error of kernel matrices, rather than the reconstruction error of original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements compared to other state-of-the-art methods. More importantly, our proposed framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, our proposed kernel preserving opens up a large number of possibilities to embed high-dimensional data into low-dimensional space.Comment: Published in AAAI 201
    corecore