2,665 research outputs found
Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data
The k-means clustering is one of the most popular clustering algorithms in
data mining. Recently a lot of research has been concentrated on the algorithm
when the dataset is divided into multiple parties or when the dataset is too
large to be handled by the data owner. In the latter case, usually some servers
are hired to perform the task of clustering. The dataset is divided by the data
owner among the servers who together perform the k-means and return the cluster
labels to the owner. The major challenge in this method is to prevent the
servers from gaining substantial information about the actual data of the
owner. Several algorithms have been designed in the past that provide
cryptographic solutions to perform privacy preserving k-means. We provide a new
method to perform k-means over a large set using multiple servers. Our
technique avoids heavy cryptographic computations and instead we use a simple
randomization technique to preserve the privacy of the data. The k-means
computed has exactly the same efficiency and accuracy as the k-means computed
over the original dataset without any randomization. We argue that our
algorithm is secure against honest but curious and passive adversary.Comment: 19 pages, 4 tables. International Conference on Information Systems
Security. Springer, Cham, 201
Peer-to-Peer Secure Multi-Party Numerical Computation Facing Malicious Adversaries
We propose an efficient framework for enabling secure multi-party numerical
computations in a Peer-to-Peer network. This problem arises in a range of
applications such as collaborative filtering, distributed computation of trust
and reputation, monitoring and other tasks, where the computing nodes is
expected to preserve the privacy of their inputs while performing a joint
computation of a certain function. Although there is a rich literature in the
field of distributed systems security concerning secure multi-party
computation, in practice it is hard to deploy those methods in very large scale
Peer-to-Peer networks. In this work, we try to bridge the gap between
theoretical algorithms in the security domain, and a practical Peer-to-Peer
deployment.
We consider two security models. The first is the semi-honest model where
peers correctly follow the protocol, but try to reveal private information. We
provide three possible schemes for secure multi-party numerical computation for
this model and identify a single light-weight scheme which outperforms the
others. Using extensive simulation results over real Internet topologies, we
demonstrate that our scheme is scalable to very large networks, with up to
millions of nodes. The second model we consider is the malicious peers model,
where peers can behave arbitrarily, deliberately trying to affect the results
of the computation as well as compromising the privacy of other peers. For this
model we provide a fourth scheme to defend the execution of the computation
against the malicious peers. The proposed scheme has a higher complexity
relative to the semi-honest model. Overall, we provide the Peer-to-Peer network
designer a set of tools to choose from, based on the desired level of security.Comment: Submitted to Peer-to-Peer Networking and Applications Journal (PPNA)
200
CryptGraph: Privacy Preserving Graph Analytics on Encrypted Graph
Many graph mining and analysis services have been deployed on the cloud,
which can alleviate users from the burden of implementing and maintaining graph
algorithms. However, putting graph analytics on the cloud can invade users'
privacy. To solve this problem, we propose CryptGraph, which runs graph
analytics on encrypted graph to preserve the privacy of both users' graph data
and the analytic results. In CryptGraph, users encrypt their graphs before
uploading them to the cloud. The cloud runs graph analysis on the encrypted
graphs and obtains results which are also in encrypted form that the cloud
cannot decipher. During the process of computing, the encrypted graphs are
never decrypted on the cloud side. The encrypted results are sent back to users
and users perform the decryption to obtain the plaintext results. In this
process, users' graphs and the analytics results are both encrypted and the
cloud knows neither of them. Thereby, users' privacy can be strongly protected.
Meanwhile, with the help of homomorphic encryption, the results analyzed from
the encrypted graphs are guaranteed to be correct. In this paper, we present
how to encrypt a graph using homomorphic encryption and how to query the
structure of an encrypted graph by computing polynomials. To solve the problem
that certain operations are not executable on encrypted graphs, we propose hard
computation outsourcing to seek help from users. Using two graph algorithms as
examples, we show how to apply our methods to perform analytics on encrypted
graphs. Experiments on two datasets demonstrate the correctness and feasibility
of our methods
MPC for MPC: Secure Computation on a Massively Parallel Computing Architecture
Massively Parallel Computation (MPC) is a model of computation widely believed to best capture realistic parallel computing architectures such as large-scale MapReduce and Hadoop clusters. Motivated by the fact that many data analytics tasks performed on these platforms involve sensitive user data, we initiate the theoretical exploration of how to leverage MPC architectures to enable efficient, privacy-preserving computation over massive data. Clearly if a computation task does not lend itself to an efficient implementation on MPC even without security, then we cannot hope to compute it efficiently on MPC with security. We show, on the other hand, that any task that can be efficiently computed on MPC can also be securely computed with comparable efficiency. Specifically, we show the following results:
- any MPC algorithm can be compiled to a communication-oblivious counterpart while asymptotically preserving its round and space complexity, where communication-obliviousness ensures that any network intermediary observing the communication patterns learn no information about the secret inputs;
- assuming the existence of Fully Homomorphic Encryption with a suitable notion of compactness and other standard cryptographic assumptions, any MPC algorithm can be compiled to a secure counterpart that defends against an adversary who controls not only intermediate network routers but additionally up to 1/3 - ? fraction of machines (for an arbitrarily small constant ?) - moreover, this compilation preserves the round complexity tightly, and preserves the space complexity upto a multiplicative security parameter related blowup.
As an initial exploration of this important direction, our work suggests new definitions and proposes novel protocols that blend algorithmic and cryptographic techniques
Confidential Machine Learning on Untrusted Platforms: a Survey
With the ever-growing data and the need for developing powerful machine learning models, data owners increasingly depend on various untrusted platforms (e.g., public clouds, edges, and machine learning service providers) for scalable processing or collaborative learning. Thus, sensitive data and models are in danger of unauthorized access, misuse, and privacy compromises. A relatively new body of research confidentially trains machine learning models on protected data to address these concerns. In this survey, we summarize notable studies in this emerging area of research. With a unified framework, we highlight the critical challenges and innovations in outsourcing machine learning confidentially. We focus on the cryptographic approaches for confidential machine learning (CML), primarily on model training, while also covering other directions such as perturbation-based approaches and CML in the hardware-assisted computing environment. The discussion will take a holistic way to consider a rich context of the related threat models, security assumptions, design principles, and associated trade-offs amongst data utility, cost, and confidentiality
Secure Computation in Privacy Preserving Data Mining
Data mining is a process in which data collected from different sources is analyzed for useful information. Because data mini ng tools provides a base for upcoming trends and reactions by reading through databases for secret patterns, they allow organizations to make proactive, knowledge - driven actions and the problems that were previously too much time - consuming to resolve. Data mining software is one of a number of analytical tools for analyzing data. In the field of data mining the Privacy is most important issue when data is shared. A fruitful direction for future trends of data mining research will be the enhancement of methods that incorporate privacy concerns. Most of the methods use random permutation techniques to mask the data, for preserving the privacy of sensitive data. Randomize response techniques were dev eloped for the purpose of protecting surveys privacy and avoiding g biased answers. The proposed work is to enhance the privacy level in RR technique using four group schemes. First according to the algorithm random attributes a, b, c, d were considered, then the randomization hav e been performed on every dataset accordi ng to the values of theta. Then ID3 and CART algorithm are applied on the randomized data. The result shows that by increasing the group, the privacy level will increase. This work shows that as compared with three group scheme with four groups scheme the accuracy decreases 6% but the privacy increases 65
What Storage Access Privacy is Achievable with Small Overhead?
Oblivious RAM (ORAM) and private information retrieval (PIR) are classic
cryptographic primitives used to hide the access pattern to data whose storage
has been outsourced to an untrusted server. Unfortunately, both primitives
require considerable overhead compared to plaintext access. For large-scale
storage infrastructure with highly frequent access requests, the degradation in
response time and the exorbitant increase in resource costs incurred by either
ORAM or PIR prevent their usage. In an ideal scenario, a privacy-preserving
storage protocols with small overhead would be implemented for these heavily
trafficked storage systems to avoid negatively impacting either performance
and/or costs. In this work, we study the problem of the best $\mathit{storage\
access\ privacy}\mathit{small\ overhead}\mathit{differential\ privacy\ access}\mathit{oblivious\ access}\epsilon = \Omega(\log n)\epsilon = \Theta(\log n)O(1)\epsilon = \Theta(\log n)O(\log\log n)$
overhead. This construction uses a new oblivious, two-choice hashing scheme
that may be of independent interest.Comment: To appear at PODS'1
- …