53 research outputs found

    Privacy-preserving decision trees over vertically partitioned data

    Abstract. Privacy and security concerns can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery, if done correctly, can alleviate this problem. In this paper, we tackle the problem of classification. We introduce a generalized privacy preserving variant of the ID3 algorithm for vertically partitioned data distributed over two or more parties. Along with the algorithm, we give a complete proof of security that gives a tight bound on the information revealed.
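
For orientation, the attribute-selection step at the core of plain ID3 can be sketched as below. This is a minimal, non-private illustration on a single table, not the paper's protocol: in the vertically partitioned setting each party would hold only some of the columns, and the counts feeding the entropy computation would have to be obtained through a secure protocol. The names `entropy`, `information_gain`, `best_attribute`, and the toy rows are assumptions made for this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting `rows` on attribute `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attrs, target):
    """ID3 picks the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Toy data (hypothetical): "outlook" fully determines "play", "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "y", "play": "no"},
    {"outlook": "sunny", "windy": "n", "play": "no"},
    {"outlook": "rain", "windy": "y", "play": "yes"},
    {"outlook": "rain", "windy": "n", "play": "yes"},
]
chosen = best_attribute(rows, ["outlook", "windy"], "play")
```

On this toy table the split on `outlook` removes all uncertainty about `play`, so it is the attribute ID3 would place at the root.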

    A fast secure dot product protocol with application to privacy preserving association rule mining

    Data mining often causes privacy concerns. To ease the concerns, various privacy preserving data mining techniques have been proposed. However, those techniques are often too computationally intensive to be deployed in practice. Efficiency becomes a major challenge in privacy preserving data mining. In this paper we present an efficient secure dot product protocol and show its application in privacy preserving association rule mining, one of the most widely used data mining techniques. The protocol is orders of magnitude faster than previous protocols because it employs mostly cheap cryptographic operations, e.g. hashing and modular multiplication. The performance has been further improved by parallelization. We implemented the protocol and tested the performance. The test results show that on moderate commodity hardware, the dot product of two vectors of size 1 million can be computed within 1 minute. As a comparison, the currently most widely used protocol needs about 1 hour and 23 minutes.
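
To see why a dot product is the relevant primitive for association rule mining, consider two parties that each hold a 0/1 indicator vector over the same transactions (1 if the transaction contains that party's part of an itemset). The number of transactions containing the whole itemset is then the dot product of the two vectors. The sketch below computes this in the clear, with no privacy at all; a secure dot product protocol would produce the same number without either party seeing the other's vector. The function names, vectors, and threshold are assumptions for illustration.

```python
def support_count(party_a_vec, party_b_vec):
    """Number of transactions where both parties' indicator bits are 1."""
    if len(party_a_vec) != len(party_b_vec):
        raise ValueError("both parties must cover the same transactions")
    return sum(a * b for a, b in zip(party_a_vec, party_b_vec))


def is_frequent(count, num_transactions, min_support):
    """An itemset is frequent if its relative support reaches the threshold."""
    return count / num_transactions >= min_support


# Hypothetical data: five shared transactions.
a_has_item_x = [1, 1, 0, 1, 0]  # held by party A
b_has_item_y = [1, 0, 0, 1, 1]  # held by party B

count = support_count(a_has_item_x, b_has_item_y)
frequent = is_frequent(count, len(a_has_item_x), min_support=0.4)
```

Here transactions 1 and 4 contain both items, so the itemset {x, y} has a support count of 2 out of 5.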

    Distributed Data Federation without Disclosure of User Existence

    Part 9: Cloud Computing. International audience. Service providers collect users' personal information relevant to their businesses. Personal information stored by different service providers is expected to be combined to make new services. However, specific user records risk being identified from the combined personal information, and the users' sensitive information may be revealed. Also, personal information collected by a service provider must not be disclosed to other service providers because of security issues. Thus, several researchers have been investigating distributed anonymization protocols, which combine the personal information stored by the providers and sanitize it to ensure an anonymity policy with minimum disclosure. However, when providers have different sets of users, the existence of a user at a particular service provider may be revealed. This paper introduces a new notion, δ-max-site-presence, which indicates the probability of the existence of users being revealed in a distributed environment, and a new distributed anonymization protocol for hiding the existence of users. Our evaluation results show that the proposed protocol can anonymize users in accordance with the policy of hiding their existence and user anonymity without too much information loss.

    Collaborative, Privacy-Preserving Data Aggregation at Scale

    Combining and analyzing data collected at multiple locations is critical for a wide variety of applications, such as detecting and diagnosing malicious attacks or computing an accurate estimate of the popularity of Web sites. However, legitimate concerns about privacy often inhibit participation in collaborative data-analysis systems. In this paper, we design, implement, and evaluate a practical solution for privacy-preserving collaboration among a large number of participants. Scalability is achieved through a “semi-centralized” architecture that divides responsibility between a proxy that obliviously blinds the client inputs and a database that identifies the (blinded) keywords that have values satisfying some evaluation function. Our solution leverages a novel cryptographic protocol that provably protects the privacy of both the participants and the keywords. In the example of Web servers collaborating to detect source IP addresses responsible for denial-of-service attacks, our protocol would not reveal the traffic mix of the Web servers or the identity of the “good” IP addresses. We implemented a prototype of our design, including an amortized oblivious transfer protocol that substantially improves the efficiency of client-proxy interactions. Our experiments show that the implementation scales linearly with the computing resources, making it easy to improve performance by adding more cores or machines. For collaborative diagnosis of denial-of-service attacks, our system can handle millions of suspect IP addresses per hour when the proxy and the database each run on two quad-core machines.

    Secure Two-party Computation is Practical

    Abstract. Secure multi-party computation has been considered by the cryptographic community for a number of years. Until recently it has been a purely theoretical area, with few implementations with which to test various ideas. This has led to a number of optimisations being proposed which are quite restricted in their application. In this paper we describe an implementation of the two-party case, using Yao’s garbled circuits, and present various algorithmic protocol improvements. These optimisations are analysed both theoretically and empirically, using experiments of various adversarial situations. Our experimental data is provided for reasonably large circuits, including one which performs an AES encryption, a problem which we discuss in the context of various possible applications.
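
As a rough illustration of the idea behind Yao's garbled circuits, the toy sketch below garbles and evaluates a single AND gate. It is a textbook-style simplification, not the implementation described in the abstract: it omits oblivious transfer, uses a zero-padding check instead of point-and-permute, and includes none of the paper's optimisations. All names (`garble_and`, `evaluate`, the label length) are assumptions for this sketch.

```python
import hashlib
import secrets

LABEL_LEN = 16  # bytes per wire label (assumed for this sketch)


def _pad(key_a, key_b):
    """Derive a 32-byte one-time pad from the two input wire labels."""
    return hashlib.sha256(key_a + key_b).digest()


def _xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and():
    """Garbler: pick two random labels per wire and encrypt the truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    table = []
    for x in (0, 1):
        for y in (0, 1):
            # Output label followed by zero bytes that mark a valid decryption.
            plain = labels["out"][x & y] + bytes(LABEL_LEN)
            table.append(_xor(_pad(labels["a"][x], labels["b"][y]), plain))
    # Shuffle so that row position leaks nothing about the input bits.
    secrets.SystemRandom().shuffle(table)
    return labels, table


def evaluate(table, key_a, key_b):
    """Evaluator: holding one label per input wire, recover one output label."""
    pad = _pad(key_a, key_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == bytes(LABEL_LEN):
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")


labels, table = garble_and()
out_label = evaluate(table, labels["a"][1], labels["b"][1])
```

The evaluator learns only an opaque output label; mapping it back to a bit requires the garbler's `labels["out"]` pair, which is what keeps intermediate wire values hidden in a larger circuit.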