21,834 research outputs found
Revisiting "privacy preserving clustering by data transformation".
Preserving the privacy of individuals when data are shared for clustering is a complex problem. The challenge is how to protect the underlying data values subjected to clustering without jeopardizing the similarity between objects under analysis. In this short paper, we revisit a family of geometric data transformation methods (GDTMs) that distort numerical attributes by translations, scalings, rotations, or even by the combination of these geometric transformations. Such a method was designed to address privacy-preserving clustering, in scenarios where data owners must not only meet privacy requirements but also guarantee valid clustering results. We offer a detailed, comprehensive and up-to-date picture of methods for privacy-preserving clustering by data transformation
Privacy Preserving Optics Clustering
OPTICS is a well-known density-based clustering algorithm which uses DBSCAN theme without producing a clustering of a data set openly, but as a substitute, it creates an augmented ordering of that particular database which represents its density-based clustering structure. This resulted cluster-ordering comprises information which is similar to the density based clustering’s conforming to a wide range of parameter settings. The same algorithm can be applied in the field of privacy-preserving data mining, where extracting the useful information from data which is distributed over a network requires preservation of privacy of individuals’ information. The problem of getting the clusters of a distributed database is considered as an example of this algorithm, where two parties want to know their cluster numbers on combined database without revealing one party information to other party. This issue can be seen as a particular example of secure multi-party computation and such sort of issues can be solved with the assistance of proposed protocols in our work along with some standard protocols
Hybrid Cloud-Based Privacy Preserving Clustering as Service for Enterprise Big Data
Clustering as service is being offered by many cloud service providers. It helps enterprises to learn hidden patterns and learn knowledge from large, big data generated by enterprises. Though it brings lot of value to enterprises, it also exposes the data to various security and privacy threats. Privacy preserving clustering is being proposed a solution to address this problem. But the privacy preserving clustering as outsourced service model involves too much overhead on querying user, lacks adaptivity to incremental data and involves frequent interaction between service provider and the querying user. There is also a lack of personalization to clustering by the querying user. This work “Locality Sensitive Hashing for Transformed Dataset (LSHTD)” proposes a hybrid cloud-based clustering as service model for streaming data that address the problems in the existing model such as privacy preserving k-means clustering outsourcing under multiple keys (PPCOM) and secure nearest neighbor clustering (SNNC) models, The solution combines hybrid cloud, LSHTD clustering algorithm as outsourced service model. Through experiments, the proposed solution is able is found to reduce the computation cost by 23% and communication cost by 6% and able to provide better clustering accuracy with ARI greater than 4.59% compared to existing works
Privacy Preserving Clustering with Constraints
The k-center problem is a classical combinatorial optimization problem which asks to find k centers such that the maximum distance of any input point in a set P to its assigned center is minimized. The problem allows for elegant 2-approximations. However, the situation becomes significantly more difficult when constraints are added to the problem. We raise the question whether general methods can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that respects one constraint more. Our constraint of choice is privacy: Here, we are asked to only open a center when at least l clients will be assigned to it. We show how to combine privacy with several other constraints
Privacy-Preserving and Outsourced Multi-User k-Means Clustering
Many techniques for privacy-preserving data mining (PPDM) have been
investigated over the past decade. Often, the entities involved in the data
mining process are end-users or organizations with limited computing and
storage resources. As a result, such entities may want to refrain from
participating in the PPDM process. To overcome this issue and to take many
other benefits of cloud computing, outsourcing PPDM tasks to the cloud
environment has recently gained special attention. We consider the scenario
where n entities outsource their databases (in encrypted format) to the cloud
and ask the cloud to perform the clustering task on their combined data in a
privacy-preserving manner. We term such a process as privacy-preserving and
outsourced distributed clustering (PPODC). In this paper, we propose a novel
and efficient solution to the PPODC problem based on k-means clustering
algorithm. The main novelty of our solution lies in avoiding the secure
division operations required in computing cluster centers altogether through an
efficient transformation technique. Our solution builds the clusters securely
in an iterative fashion and returns the final cluster centers to all entities
when a pre-determined termination condition holds. The proposed solution
protects data confidentiality of all the participating entities under the
standard semi-honest model. To the best of our knowledge, ours is the first
work to discuss and propose a comprehensive solution to the PPODC problem that
incurs negligible cost on the participating entities. We theoretically estimate
both the computation and communication costs of the proposed protocol and also
demonstrate its practical value through experiments on a real dataset.Comment: 16 pages, 2 figures, 5 table
Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing
In this paper, we propose a privacy preserving distributed
clustering protocol for horizontally partitioned data based on a very efficient
homomorphic additive secret sharing scheme. The model we use
for the protocol is novel in the sense that it utilizes two non-colluding
third parties. We provide a brief security analysis of our protocol from
information theoretic point of view, which is a stronger security model.
We show communication and computation complexity analysis of our
protocol along with another protocol previously proposed for the same
problem. We also include experimental results for computation and communication
overhead of these two protocols. Our protocol not only outperforms
the others in execution time and communication overhead on
data holders, but also uses a more efficient model for many data mining
applications
- …