207 research outputs found

    Encrypted statistical machine learning: new privacy preserving methods

    We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and naïve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods. Comment: 39 pages

    kalis: a modern implementation of the Li & Stephens model for local ancestry inference in R

    Background: Approximating the recent phylogeny of N phased haplotypes at a set of variants along the genome is a core problem in modern population genomics and central to performing genome-wide screens for association, selection, introgression, and other signals. The Li & Stephens (LS) model provides a simple yet powerful hidden Markov model for inferring the recent ancestry at a given variant, represented as an N×N distance matrix based on posterior decodings. Results: We provide a high-performance engine to make these posterior decodings readily accessible with minimal pre-processing via an easy-to-use package, kalis, in the statistical programming language R. kalis enables investigators to rapidly resolve the ancestry at loci of interest and developers to build a range of variant-specific ancestral inference pipelines on top. kalis exploits both multi-core parallelism and modern CPU vector instruction sets to enable scaling to hundreds of thousands of genomes. Conclusions: The resulting distance matrices accessible via kalis enable local ancestry, selection, and association studies in modern large-scale genomic datasets.

    Encrypted accelerated least squares regression.

    Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present a detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme. Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption. The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique. This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy.

    Survival signature-based sensitivity analysis of systems with epistemic uncertainties

    The survival signature provides a basis for efficient reliability assessment of systems with more than one component type. Often a perfect probabilistic modelling of the system is not possible due to limited information, vagueness and imprecision. Hence generalized probabilistic methods need to be used. These methods allow the uncertainties to be modelled explicitly without the need for unjustified hypotheses and approximations. In this paper, a novel and efficient sensitivity approach is presented. The proposed approach is based on the survival signature and allows the components in a system to be identified and ranked. A numerical example is used to illustrate the above methods.

    Model updating after interventions paradoxically introduces bias

    Machine learning is increasingly being used to generate prediction models for use in a number of real-world settings, from credit risk assessment to clinical decision support. Recent discussions have highlighted potential problems in the updating of a predictive score for a binary outcome when an existing predictive score forms part of the standard workflow, driving interventions. In this setting, the existing score induces an additional causative pathway which leads to miscalibration when the original score is replaced. We propose a general causal framework to describe and address this problem, and demonstrate an equivalent formulation as a partially observed Markov decision process. We use this model to demonstrate the impact of such 'naive updating' when performed repeatedly. Namely, we show that successive predictive scores may converge to a point where they predict their own effect, or may eventually tend toward a stable oscillation between two values, and we argue that neither outcome is desirable. Furthermore, we demonstrate that even if model-fitting procedures improve, actual performance may worsen. We complement these findings with a discussion of several potential routes to overcome these issues. Comment: Sections of this preprint on 'Successive adjuvancy' (section 4, theorem 2, figures 4, 5, and associated discussions) were not included in the originally submitted version of this paper due to length. This material does not appear in the published version of this manuscript, and the reader should be aware that these sections did not undergo peer review