54 research outputs found
Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing
In this paper, we propose a privacy preserving distributed
clustering protocol for horizontally partitioned data based on a very efficient
homomorphic additive secret sharing scheme. The model we use
for the protocol is novel in the sense that it utilizes two non-colluding
third parties. We provide a brief security analysis of our protocol from
an information-theoretic point of view, which is a stronger security model.
We show communication and computation complexity analysis of our
protocol along with another protocol previously proposed for the same
problem. We also include experimental results for computation and communication
overhead of these two protocols. Our protocol not only outperforms
the others in execution time and communication overhead on
data holders, but also uses a more efficient model for many data mining
applications.
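As a rough illustration of the additive secret sharing idea mentioned in the abstract above, the following sketch shows how data holders could split private values into two shares, one for each of two non-colluding third parties, so that the parties can add shares locally and only the combined totals reveal the sum. The modulus, the function names (`split`, `aggregate`, `reconstruct`), and the sample values are assumptions made for illustration; this is not the protocol from the paper.

```python
import secrets

# Hypothetical modulus chosen for illustration only; the paper's parameters are not given here.
MODULUS = 2**61 - 1


def split(value, modulus=MODULUS):
    """Split value into two additive shares that sum to value modulo modulus."""
    share_a = secrets.randbelow(modulus)   # uniformly random, reveals nothing on its own
    share_b = (value - share_a) % modulus  # the complementary share
    return share_a, share_b


def reconstruct(share_a, share_b, modulus=MODULUS):
    """Combine two shares back into the shared value."""
    return (share_a + share_b) % modulus


def aggregate(values, modulus=MODULUS):
    """Simulate two third parties that each add up the shares they receive."""
    total_a = 0
    total_b = 0
    for value in values:
        a, b = split(value, modulus)
        total_a = (total_a + a) % modulus  # third party A sees only the a-shares
        total_b = (total_b + b) % modulus  # third party B sees only the b-shares
    return total_a, total_b


holder_values = [12, 30, 7]  # one private value per data holder (made-up numbers)
sum_a, sum_b = aggregate(holder_values)
print(reconstruct(sum_a, sum_b))  # 49
```

Because addition of shares commutes with addition of the underlying values, neither third party needs to see any individual input to contribute to the total.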
Efficient distributed privacy preserving clustering
With recent growing concerns about data privacy, researchers have focused their attention on developing new algorithms for privacy preserving data mining. However, the methods proposed so far are either too inefficient to handle large datasets or trade privacy against the accuracy of data mining results. Secure multiparty computation helps researchers develop privacy preserving data mining algorithms without having to trade the quality of data mining results against data privacy. It also provides formal guarantees about privacy. On the other hand, algorithms based on secure multiparty computation often rely on computationally expensive cryptographic operations, which makes them infeasible in real-world scenarios. In this thesis, we study the problem of privacy preserving distributed clustering, propose an efficient and secure algorithm for this problem based on secret sharing, and compare it to the state of the art. Experiments show that our algorithm has a lower communication overhead and a much lower computation overhead than the state of the art.
Secret sharing vs. encryption-based techniques for privacy preserving data mining
Privacy preserving querying and data publishing have been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts have increased privacy concerns, which has motivated data mining researchers to investigate the privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining, focusing on distributed data sources, and then compare two technologies used in privacy preserving data mining. The first technology is encryption based and was used in earlier approaches. The second technology is secret sharing, which has recently been considered as a more efficient approach.
Privacy-Preserving and Outsourced Multi-User k-Means Clustering
Many techniques for privacy-preserving data mining (PPDM) have been
investigated over the past decade. Often, the entities involved in the data
mining process are end-users or organizations with limited computing and
storage resources. As a result, such entities may want to refrain from
participating in the PPDM process. To overcome this issue and to gain the many
other benefits of cloud computing, outsourcing PPDM tasks to the cloud
environment has recently gained special attention. We consider the scenario
where n entities outsource their databases (in encrypted format) to the cloud
and ask the cloud to perform the clustering task on their combined data in a
privacy-preserving manner. We term such a process as privacy-preserving and
outsourced distributed clustering (PPODC). In this paper, we propose a novel
and efficient solution to the PPODC problem based on the k-means clustering
algorithm. The main novelty of our solution lies in avoiding the secure
division operations required in computing cluster centers altogether through an
efficient transformation technique. Our solution builds the clusters securely
in an iterative fashion and returns the final cluster centers to all entities
when a pre-determined termination condition holds. The proposed solution
protects data confidentiality of all the participating entities under the
standard semi-honest model. To the best of our knowledge, ours is the first
work to discuss and propose a comprehensive solution to the PPODC problem that
incurs negligible cost on the participating entities. We theoretically estimate
both the computation and communication costs of the proposed protocol and also
demonstrate its practical value through experiments on a real dataset.
Comment: 16 pages, 2 figures, 5 tables
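To make the idea of avoiding division when working with cluster centers more concrete, here is a plaintext sketch of one possible transformation: each center is kept as a pair (sum vector, member count), and distances are compared after cross-multiplying by the counts, so no quotient is ever computed. This is an assumption about how such a transformation could look, not the encrypted protocol from the paper; the names `scaled_sq_distance` and `nearest_cluster` and the sample data are hypothetical.

```python
def scaled_sq_distance(point, center_sum, count):
    """Return ||count * point - center_sum||^2.

    This equals count^2 times the squared distance from point to the
    center center_sum / count, but needs no division.
    """
    return sum((count * p - s) ** 2 for p, s in zip(point, center_sum))


def nearest_cluster(point, clusters):
    """Return the index of the nearest center using integer arithmetic only.

    clusters is a list of (sum_vector, count) pairs with count > 0.
    """
    best = 0
    for i in range(1, len(clusters)):
        sum_best, n_best = clusters[best]
        sum_i, n_i = clusters[i]
        # d_i < d_best  <=>  D_i * n_best^2 < D_best * n_i^2, where D is the scaled distance
        lhs = scaled_sq_distance(point, sum_i, n_i) * n_best ** 2
        rhs = scaled_sq_distance(point, sum_best, n_best) * n_i ** 2
        if lhs < rhs:
            best = i
    return best


# Two clusters whose true centers are (2, 2) and (10, 10); values are made up.
clusters = [((4, 4), 2), ((30, 30), 3)]
print(nearest_cluster((3, 3), clusters))  # 0
print(nearest_cluster((9, 8), clusters))  # 1
```

Keeping sums and counts separate means every comparison uses only additions and multiplications, which are the operations that additively homomorphic schemes typically support more cheaply than division.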
Privacy Preserving Mining on Vertically Partitioned Database
This system allows multiple data owners to outsource their data to a common cloud. This paper focuses mainly on privacy preserving mining on a vertically partitioned database. It provides a solution that protects each data owner's raw data from the other data owners. To achieve secure outsourcing, the system proposes a cloud-aided frequent itemset mining solution. The run time of this system is one order of magnitude higher than that of non-privacy-preserving mining algorithms.
CBTS: Correlation based transformation strategy for privacy preserving data mining
Mining useful knowledge from a corpus of data has become an important application in many fields. Data mining algorithms such as clustering and classification work on this data and provide crisp information for analysis. As these data become available in the public domain through various channels, privacy for the owners of the data is an increasing need. Although privacy can be provided by hiding sensitive data, doing so affects the ability of data mining algorithms to extract knowledge, so an effective mechanism is required that provides privacy to the data without affecting the data mining algorithms. Privacy concern is a primary hindrance to quality data analysis. Data mining algorithms, by contrast, focus on the mathematical nature of the information rather than its private nature. Therefore, instead of removing or encrypting sensitive data, we propose transformation strategies that retain the statistical, semantic and heuristic nature of the data while masking the sensitive information. The proposed Correlation Based Transformation Strategy (CBTS) combines Correlation Analysis with data transformation techniques such as Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Non Negative Matrix Factorization (NNMF), providing the intended level of privacy preservation while enabling data analysis. The outcome of CBTS is evaluated on standard datasets against popular data mining techniques with significant success, and Information Entropy is also taken into account.
A Toolbox for privacy preserving distributed data mining
The distributed structure of individual data makes it necessary for data holders to perform collaborative analysis over the collective database for better data mining results. However, each site has to ensure the privacy of its individual data, which means that no information is revealed about individual values. Privacy preserving distributed data mining is used for that purpose. In this study, we try to draw more attention to the topic of privacy preserving data mining by presenting a model that is realistic for data mining and allows for very efficient protocols. We give two protocols that are useful tools in data mining: a protocol for Yao's millionaires problem, and a protocol for numerical distance. Our solution to Yao's millionaires problem is of independent interest since it improves on known protocols with respect to both computation complexity and communication overhead. This protocol can be used for different purposes in privacy preserving data mining algorithms, such as comparison and equality tests of data records. Our numerical distance protocol is also applicable to a variety of algorithms. In this study, we applied our numerical distance protocol in a privacy preserving distributed clustering protocol for horizontally partitioned data. We show the application of our protocol to different attribute types such as interval-scaled, binary, nominal, ordinal, ratio-scaled, and alphanumeric. We present a proof of security of our protocol, and explain its communication and computation complexity analysis in detail.
Exploring Cloud Computing Challenges: A Thorough Examination of Issues in the Cloud Environment
Cloud computing has evolved into a critical component of contemporary enterprises, providing various advantages including scalability, adaptability, and cost efficiency. However, cloud computing also introduces a number of security vulnerabilities that can be exploited by cybercriminals. This article provides an overview of the most common cloud vulnerabilities and their impact on cyber security threats. Among these issues, DDoS is especially serious, so we identify its causes and the issues involved in addressing it.
- …