50 research outputs found

    Scalable Privacy-Preserving Data Sharing Methodology for Genome-Wide Association Studies

    The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, provides a rigorous definition of privacy with meaningful guarantees in the presence of arbitrary external information, although those guarantees may come at a serious price in data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially private χ²-statistics by allowing for an arbitrary number of cases and controls, and for releasing differentially private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially private release mechanism proposed by Johnson and Shmatikov (2013). Comment: 28 pages, 2 figures, source code available upon request.
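    The abstract does not spell out the release mechanism, but the standard building block for this kind of release is the Laplace mechanism: add noise calibrated to the statistic's sensitivity. Below is a minimal Python sketch of releasing a noisy χ²-statistic from a 2×3 case/control genotype table; the table, epsilon, and sensitivity values are illustrative placeholders, not the bounds derived in Uhler et al. (2013).

```python
import numpy as np

def chi2_statistic(table):
    """Pearson chi-squared statistic for a 2 x 3 case/control genotype table."""
    table = np.asarray(table, dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def release_private_chi2(table, epsilon, sensitivity):
    """Release an epsilon-differentially-private chi-squared statistic
    by adding Laplace noise scaled to sensitivity / epsilon."""
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return chi2_statistic(table) + noise

# Hypothetical 2 x 3 genotype counts (rows: cases/controls, columns: genotypes).
table = [[120, 230, 150], [100, 250, 150]]
# The sensitivity here is a placeholder; Uhler et al. (2013) derive bounds
# that depend on the numbers of cases and controls.
print(release_private_chi2(table, epsilon=1.0, sensitivity=4.0))
```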

    Secure Login of Statistical Data With Two Parties

    Privacy-preserving data publishing addresses the problem of releasing sensitive data while still permitting the mining of useful information. Among existing privacy models, the SHA-based privacy algorithm provides a stronger security and privacy model. In this paper, we address the problem of releasing private data when different datasets for the same set of users are held by two parties. We first present an algorithm for releasing sensitive private data on the web in the form of statistical data. We then propose a SHA-based algorithm that releases differentially private data securely during the privacy computation. Experimental results on a real-world scenario suggest that the proposed algorithm can effectively preserve information while mining private data. DOI: 10.17762/ijritcc2321-8169.150312
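    The abstract leaves the SHA-based mechanism unspecified, so the following is only a rough sketch of the general pattern it gestures at: the two parties pseudonymize shared user identifiers with salted SHA-256 before joining their datasets, and any released statistic is perturbed with Laplace noise for differential privacy. All names and parameters here are hypothetical.

```python
import hashlib
import numpy as np

def hash_id(user_id, salt):
    """Pseudonymize a user identifier with salted SHA-256 before sharing."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()

def private_count(records, epsilon):
    """Release a count with Laplace noise; a count has sensitivity 1,
    since adding or removing one user changes it by at most 1."""
    return len(records) + np.random.laplace(scale=1.0 / epsilon)

# Hypothetical example: two parties hold records for overlapping users.
salt = "shared-secret-salt"
party_a = {hash_id(u, salt) for u in ["alice", "bob", "carol"]}
party_b = {hash_id(u, salt) for u in ["bob", "carol", "dave"]}
overlap = party_a & party_b  # join on pseudonyms, not raw identifiers
print(private_count(overlap, epsilon=0.5))
```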

    Privacy via the Johnson-Lindenstrauss Transform

    Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask whether it is possible for party A to publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy: the more privacy is desired, the larger the variance of the Gaussian noise. Further, we show how to approximate the true distances between users from only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods, such as randomized response, and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is broader than preserving distances, this work shows that distance computation with privacy is an achievable goal. Comment: 24 pages
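    A minimal Python sketch of the pipeline described above: project each user's bit vector through a random matrix, add Gaussian noise to the projection, and estimate squared distances from the perturbed data alone. A dense Gaussian projection stands in for the paper's sparse Johnson-Lindenstrauss transform, and the noise scale is illustrative rather than privacy-calibrated.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, d, k = 100, 1024, 64  # users, original dim, projected dim
sigma = 0.1                    # noise scale (illustrative only)

# Private bit vectors held by party A.
X = rng.integers(0, 2, size=(n_users, d)).astype(float)

# Random projection with entries of variance 1/k, so expected squared
# norms are preserved; then Gaussian noise is added entrywise.
P = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
Y = X @ P + rng.normal(0.0, sigma, size=(n_users, k))  # published data

def estimated_sq_dist(i, j):
    """Party B's estimate of the squared distance between users i and j,
    using only the perturbed projections."""
    return np.sum((Y[i] - Y[j]) ** 2) - 2 * k * sigma ** 2

true = np.sum((X[0] - X[1]) ** 2)
print(true, estimated_sq_dist(0, 1))
```

    The `2 * k * sigma**2` term debiases the estimate: the difference of two independent noise vectors contributes variance 2σ² in each of the k projected coordinates.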

    Differential Private Data Collection and Analysis Based on Randomized Multiple Dummies for Untrusted Mobile Crowdsensing

    Mobile crowdsensing, which collects environmental information from mobile phone users, is growing in popularity. These data can be used by companies for marketing surveys or decision making. However, collecting sensing data from users may violate their privacy. Moreover, the data aggregator and/or the participants of crowdsensing may be untrusted entities. Recent studies have proposed randomized response schemes for anonymized data collection. This kind of data collection can statistically analyze users' sensing data without precise information about any individual user's sensing results. However, traditional randomized response schemes and their extensions require a large number of samples to achieve accurate estimation. In this paper, we propose a new anonymized data-collection scheme that can estimate data distributions more accurately. Through simulations with synthetic and real datasets, we demonstrate that our proposed method can reduce the mean squared error and the JS divergence by more than 85% compared with existing studies.
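    The multiple-dummies scheme itself is not detailed in the abstract, but it extends classical randomized response, which the sketch below illustrates: each user reports their true category with probability p and a uniformly random category otherwise, and the aggregator inverts the known perturbation to estimate the overall distribution. Parameters and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4    # number of categories
p = 0.6  # probability of reporting the true category

def perturb(value):
    """Classical randomized response: report truthfully with probability p,
    otherwise report a uniformly random category."""
    return value if rng.random() < p else rng.integers(K)

# Hypothetical true data: 10,000 users with a skewed distribution.
true_dist = np.array([0.5, 0.3, 0.15, 0.05])
data = rng.choice(K, size=10_000, p=true_dist)
reports = np.array([perturb(v) for v in data])

# Observed frequencies satisfy q_obs = p * q + (1 - p) / K, so the
# aggregator inverts that relation to estimate the true distribution.
q_obs = np.bincount(reports, minlength=K) / len(reports)
q_est = (q_obs - (1 - p) / K) / p
print(np.round(q_est, 3))
```

    As the abstract notes, the estimate is noisy unless the number of samples is large, which is the limitation the proposed dummy-based scheme targets.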