2,665 research outputs found

    Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data

    Full text link
    The k-means clustering is one of the most popular clustering algorithms in data mining. Recently a lot of research has been concentrated on the algorithm when the dataset is divided into multiple parties or when the dataset is too large to be handled by the data owner. In the latter case, usually some servers are hired to perform the task of clustering. The dataset is divided by the data owner among the servers who together perform the k-means and return the cluster labels to the owner. The major challenge in this method is to prevent the servers from gaining substantial information about the actual data of the owner. Several algorithms have been designed in the past that provide cryptographic solutions to perform privacy preserving k-means. We provide a new method to perform k-means over a large set using multiple servers. Our technique avoids heavy cryptographic computations and instead we use a simple randomization technique to preserve the privacy of the data. The k-means computed has exactly the same efficiency and accuracy as the k-means computed over the original dataset without any randomization. We argue that our algorithm is secure against honest but curious and passive adversary.Comment: 19 pages, 4 tables. International Conference on Information Systems Security. Springer, Cham, 201

    Peer-to-Peer Secure Multi-Party Numerical Computation Facing Malicious Adversaries

    Full text link
    We propose an efficient framework for enabling secure multi-party numerical computations in a Peer-to-Peer network. This problem arises in a range of applications such as collaborative filtering, distributed computation of trust and reputation, monitoring and other tasks, where the computing nodes is expected to preserve the privacy of their inputs while performing a joint computation of a certain function. Although there is a rich literature in the field of distributed systems security concerning secure multi-party computation, in practice it is hard to deploy those methods in very large scale Peer-to-Peer networks. In this work, we try to bridge the gap between theoretical algorithms in the security domain, and a practical Peer-to-Peer deployment. We consider two security models. The first is the semi-honest model where peers correctly follow the protocol, but try to reveal private information. We provide three possible schemes for secure multi-party numerical computation for this model and identify a single light-weight scheme which outperforms the others. Using extensive simulation results over real Internet topologies, we demonstrate that our scheme is scalable to very large networks, with up to millions of nodes. The second model we consider is the malicious peers model, where peers can behave arbitrarily, deliberately trying to affect the results of the computation as well as compromising the privacy of other peers. For this model we provide a fourth scheme to defend the execution of the computation against the malicious peers. The proposed scheme has a higher complexity relative to the semi-honest model. Overall, we provide the Peer-to-Peer network designer a set of tools to choose from, based on the desired level of security.Comment: Submitted to Peer-to-Peer Networking and Applications Journal (PPNA) 200

    CryptGraph: Privacy Preserving Graph Analytics on Encrypted Graph

    Full text link
    Many graph mining and analysis services have been deployed on the cloud, which can alleviate users from the burden of implementing and maintaining graph algorithms. However, putting graph analytics on the cloud can invade users' privacy. To solve this problem, we propose CryptGraph, which runs graph analytics on encrypted graph to preserve the privacy of both users' graph data and the analytic results. In CryptGraph, users encrypt their graphs before uploading them to the cloud. The cloud runs graph analysis on the encrypted graphs and obtains results which are also in encrypted form that the cloud cannot decipher. During the process of computing, the encrypted graphs are never decrypted on the cloud side. The encrypted results are sent back to users and users perform the decryption to obtain the plaintext results. In this process, users' graphs and the analytics results are both encrypted and the cloud knows neither of them. Thereby, users' privacy can be strongly protected. Meanwhile, with the help of homomorphic encryption, the results analyzed from the encrypted graphs are guaranteed to be correct. In this paper, we present how to encrypt a graph using homomorphic encryption and how to query the structure of an encrypted graph by computing polynomials. To solve the problem that certain operations are not executable on encrypted graphs, we propose hard computation outsourcing to seek help from users. Using two graph algorithms as examples, we show how to apply our methods to perform analytics on encrypted graphs. Experiments on two datasets demonstrate the correctness and feasibility of our methods

    MPC for MPC: Secure Computation on a Massively Parallel Computing Architecture

    Get PDF
    Massively Parallel Computation (MPC) is a model of computation widely believed to best capture realistic parallel computing architectures such as large-scale MapReduce and Hadoop clusters. Motivated by the fact that many data analytics tasks performed on these platforms involve sensitive user data, we initiate the theoretical exploration of how to leverage MPC architectures to enable efficient, privacy-preserving computation over massive data. Clearly if a computation task does not lend itself to an efficient implementation on MPC even without security, then we cannot hope to compute it efficiently on MPC with security. We show, on the other hand, that any task that can be efficiently computed on MPC can also be securely computed with comparable efficiency. Specifically, we show the following results: - any MPC algorithm can be compiled to a communication-oblivious counterpart while asymptotically preserving its round and space complexity, where communication-obliviousness ensures that any network intermediary observing the communication patterns learn no information about the secret inputs; - assuming the existence of Fully Homomorphic Encryption with a suitable notion of compactness and other standard cryptographic assumptions, any MPC algorithm can be compiled to a secure counterpart that defends against an adversary who controls not only intermediate network routers but additionally up to 1/3 - ? fraction of machines (for an arbitrarily small constant ?) - moreover, this compilation preserves the round complexity tightly, and preserves the space complexity upto a multiplicative security parameter related blowup. As an initial exploration of this important direction, our work suggests new definitions and proposes novel protocols that blend algorithmic and cryptographic techniques

    Confidential Machine Learning on Untrusted Platforms: a Survey

    Get PDF
    With the ever-growing data and the need for developing powerful machine learning models, data owners increasingly depend on various untrusted platforms (e.g., public clouds, edges, and machine learning service providers) for scalable processing or collaborative learning. Thus, sensitive data and models are in danger of unauthorized access, misuse, and privacy compromises. A relatively new body of research confidentially trains machine learning models on protected data to address these concerns. In this survey, we summarize notable studies in this emerging area of research. With a unified framework, we highlight the critical challenges and innovations in outsourcing machine learning confidentially. We focus on the cryptographic approaches for confidential machine learning (CML), primarily on model training, while also covering other directions such as perturbation-based approaches and CML in the hardware-assisted computing environment. The discussion will take a holistic way to consider a rich context of the related threat models, security assumptions, design principles, and associated trade-offs amongst data utility, cost, and confidentiality

    Secure Computation in Privacy Preserving Data Mining

    Get PDF
    Data mining is a process in which data collected from different sources is analyzed for useful information. Because data mini ng tools provides a base for upcoming trends and reactions by reading through databases for secret patterns, they allow organizations to make proactive, knowledge - driven actions and the problems that were previously too much time - consuming to resolve. Data mining software is one of a number of analytical tools for analyzing data. In the field of data mining the Privacy is most important issue when data is shared. A fruitful direction for future trends of data mining research will be the enhancement of methods that incorporate privacy concerns. Most of the methods use random permutation techniques to mask the data, for preserving the privacy of sensitive data. Randomize response techniques were dev eloped for the purpose of protecting surveys privacy and avoiding g biased answers. The proposed work is to enhance the privacy level in RR technique using four group schemes. First according to the algorithm random attributes a, b, c, d were considered, then the randomization hav e been performed on every dataset accordi ng to the values of theta. Then ID3 and CART algorithm are applied on the randomized data. The result shows that by increasing the group, the privacy level will increase. This work shows that as compared with three group scheme with four groups scheme the accuracy decreases 6% but the privacy increases 65

    What Storage Access Privacy is Achievable with Small Overhead?

    Get PDF
    Oblivious RAM (ORAM) and private information retrieval (PIR) are classic cryptographic primitives used to hide the access pattern to data whose storage has been outsourced to an untrusted server. Unfortunately, both primitives require considerable overhead compared to plaintext access. For large-scale storage infrastructure with highly frequent access requests, the degradation in response time and the exorbitant increase in resource costs incurred by either ORAM or PIR prevent their usage. In an ideal scenario, a privacy-preserving storage protocols with small overhead would be implemented for these heavily trafficked storage systems to avoid negatively impacting either performance and/or costs. In this work, we study the problem of the best $\mathit{storage\ access\ privacy}thatisachievablewithonly that is achievable with only \mathit{small\ overhead}overplaintextaccess.Toanswerthisquestion,weconsider over plaintext access. To answer this question, we consider \mathit{differential\ privacy\ access}whichisageneralizationofthe which is a generalization of the \mathit{oblivious\ access}securitynotionthatareconsideredbyORAMandPIR.Quitesurprisingly,wepresentstrongevidencethatconstantoverheadstorageschemesmayonlybeachievedwithprivacybudgetsof security notion that are considered by ORAM and PIR. Quite surprisingly, we present strong evidence that constant overhead storage schemes may only be achieved with privacy budgets of \epsilon = \Omega(\log n).WepresentasymptoticallyoptimalconstructionsfordifferentiallyprivatevariantsofbothORAMandPIRwithprivacybudgets. We present asymptotically optimal constructions for differentially private variants of both ORAM and PIR with privacy budgets \epsilon = \Theta(\log n)withonly with only O(1)overhead.Inaddition,weconsideramorecomplexstorageprimitivecalledkey−valuestorageinwhichdataisindexedbykeysfromalargeuniverse(asopposedtoconsecutiveintegersinORAMandPIR).Wepresentadifferentiallyprivatekey−valuestorageschemewith overhead. In addition, we consider a more complex storage primitive called key-value storage in which data is indexed by keys from a large universe (as opposed to consecutive integers in ORAM and PIR). We present a differentially private key-value storage scheme with \epsilon = \Theta(\log n)and and O(\log\log n)$ overhead. This construction uses a new oblivious, two-choice hashing scheme that may be of independent interest.Comment: To appear at PODS'1
    • …
    corecore