12 research outputs found

    Machine Learning SNP Based Prediction for Precision Medicine

    Get PDF
    In the past decade, precision genomics based medicine has emerged to provide tailored and effective healthcare for patients depending upon their genetic features. Genome Wide Association Studies have also identified population based risk genetic variants for common and complex diseases. In order to meet the full promise of precision medicine, research is attempting to leverage our increasing genomic understanding and further develop personalized medical healthcare through ever more accurate disease risk prediction models. Polygenic risk scoring and machine learning are two primary approaches for disease risk prediction. Despite recent improvements, the results of polygenic risk scoring remain limited due to the approaches that are currently used. By contrast, machine learning algorithms have increased predictive abilities for complex disease risk. This increase in predictive abilities results from the ability of machine learning algorithms to handle multi-dimensional data. Here, we provide an overview of polygenic risk scoring and machine learning in complex disease risk prediction. We highlight recent machine learning application developments and describe how machine learning approaches can lead to improved complex disease prediction, which will help to incorporate genetic features into future personalized healthcare. Finally, we discuss how the future application of machine learning prediction models might help manage complex disease by providing tissue-specific targets for customized, preventive interventions

    DPWeka: Achieving Differential Privacy in WEKA

    Get PDF
    Organizations belonging to the government, commercial, and non-profit industries collect and store large amounts of sensitive data, which include medical, financial, and personal information. They use data mining methods to formulate business strategies that yield high long-term and short-term financial benefits. While analyzing such data, the private information of the individuals present in the data must be protected for moral and legal reasons. Current practices such as redacting sensitive attributes, releasing only the aggregate values, and query auditing do not provide sufficient protection against an adversary armed with auxiliary information. In the presence of additional background information, the privacy protection framework, differential privacy, provides mathematical guarantees against adversarial attacks. Existing platforms for differential privacy employ specific mechanisms for limited applications of data mining. Additionally, widely used data mining tools do not contain differentially private data mining algorithms. As a result, for analyzing sensitive data, the cognizance of differentially private methods is currently limited outside the research community. This thesis examines various mechanisms to realize differential privacy in practice and investigates methods to integrate them with a popular machine learning toolkit, WEKA. We present DPWeka, a package that provides differential privacy capabilities to WEKA, for practical data mining. DPWeka includes a suite of differential privacy preserving algorithms which support a variety of data mining tasks including attribute selection and regression analysis. It has provisions for users to control privacy and model parameters, such as privacy mechanism, privacy budget, and other algorithm specific variables. We evaluate private algorithms on real-world datasets, such as genetic data and census data, to demonstrate the practical applicability of DPWeka

    IST Austria Thesis

    Get PDF
    In this thesis we discuss the exact security of message authentications codes HMAC , NMAC , and PMAC . NMAC is a mode of operation which turns a fixed input-length keyed hash function f into a variable input-length function. A practical single-key variant of NMAC called HMAC is a very popular and widely deployed message authentication code (MAC). PMAC is a block-cipher based mode of operation, which also happens to be the most famous fully parallel MAC. NMAC was introduced by Bellare, Canetti and Krawczyk Crypto’96, who proved it to be a secure pseudorandom function (PRF), and thus also a MAC, under two assumptions. Unfortunately, for many instantiations of HMAC one of them has been found to be wrong. To restore the provable guarantees for NMAC , Bellare [Crypto’06] showed its security without this assumption. PMAC was introduced by Black and Rogaway at Eurocrypt 2002. If instantiated with a pseudorandom permutation over n -bit strings, PMAC constitutes a provably secure variable input-length PRF. For adversaries making q queries, each of length at most ` (in n -bit blocks), and of total length σ ≤ q` , the original paper proves an upper bound on the distinguishing advantage of O ( σ 2 / 2 n ), while the currently best bound is O ( qσ/ 2 n ). In this work we show that this bound is tight by giving an attack with advantage Ω( q 2 `/ 2 n ). In the PMAC construction one initially XORs a mask to every message block, where the mask for the i th block is computed as τ i := γ i · L , where L is a (secret) random value, and γ i is the i -th codeword of the Gray code. Our attack applies more generally to any sequence of γ i ’s which contains a large coset of a subgroup of GF (2 n ). As for NMAC , our first contribution is a simpler and uniform proof: If f is an ε -secure PRF (against q queries) and a δ - non-adaptively secure PRF (against q queries), then NMAC f is an ( ε + `qδ )-secure PRF against q queries of length at most ` blocks each. We also show that this ε + `qδ bound is basically tight by constructing an f for which an attack with advantage `qδ exists. Moreover, we analyze the PRF-security of a modification of NMAC called NI by An and Bellare that avoids the constant rekeying on multi-block messages in NMAC and allows for an information-theoretic analysis. We carry out such an analysis, obtaining a tight `q 2 / 2 c bound for this step, improving over the trivial bound of ` 2 q 2 / 2 c . Finally, we investigate, if the security of PMAC can be further improved by using τ i ’s that are k -wise independent, for k > 1 (the original has k = 1). We observe that the security of PMAC will not increase in general if k = 2, and then prove that the security increases to O ( q 2 / 2 n ), if the k = 4. Due to simple extension attacks, this is the best bound one can hope for, using any distribution on the masks. Whether k = 3 is already sufficient to get this level of security is left as an open problem. Keywords: Message authentication codes, Pseudorandom functions, HMAC, PMAC

    GenoShare: Supporting Privacy-Informed Decisions for Sharing Exact Genomic Data

    Get PDF
    The academic community has proposed many solutions to address the privacy concerns associated with genomic-data sharing. However, practitioners have not adopted these solutions due to their impact on the data utility. To address this problem, we introduce GenoShare, a framework that helps practitioners to make informed decisions about the sharing of exact genomic data by providing means to systematically reason about the risk of disclosing privacy-sensitive attributes (e.g., health status, kinship, physical traits). We instantiate GenoShare with three of the most important genomics-oriented inference attacks, and demonstrate its capability to detect potential leakage of sensitive attributes using real data from the 1000 Genomes Project
    corecore