
207 research outputs found

Encrypted statistical machine learning: new privacy preserving methods

Author: Aslett Louis J. M.
Esperança Pedro M.
Holmes Chris C.
Publication date: 27/08/2015

We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and naïve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods. Comment: 39 pages

arXiv.org e-Print Archive

CiteSeerX

kalis: a modern implementation of the Li & Stephens model for local ancestry inference in R

Author: Aslett Louis J. M.
Christ Ryan R.
Publication venue: BioMed Central
Publication date: 28/02/2024

Background: Approximating the recent phylogeny of N phased haplotypes at a set of variants along the genome is a core problem in modern population genomics and central to performing genome-wide screens for association, selection, introgression, and other signals. The Li & Stephens (LS) model provides a simple yet powerful hidden Markov model for inferring the recent ancestry at a given variant, represented as an N×N distance matrix based on posterior decodings. Results: We provide a high-performance engine to make these posterior decodings readily accessible with minimal pre-processing via an easy-to-use package, kalis, in the statistical programming language R. kalis enables investigators to rapidly resolve the ancestry at loci of interest and developers to build a range of variant-specific ancestral inference pipelines on top. kalis exploits both multi-core parallelism and modern CPU vector instruction sets to enable scaling to hundreds of thousands of genomes. Conclusions: The resulting distance matrices accessible via kalis enable local ancestry, selection, and association studies in modern large-scale genomic datasets.

Durham Research Online

Encrypted accelerated least squares regression.

Author: Esperança P. M.
Aslett L. J. M.
Holmes C. C.
Publication venue: PMLR
Publication date: 01/01/2017

Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme. Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption. The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique. This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy.

Durham Research Online

Biblioteca Digital de la Comunidad de Madrid

Parasitic helminth genomics

Author: Aslett M.
Blaxter M.
Daub J.
Guiliano D.
Publication date: 01/01/1999

Edinburgh Research Explorer

Survival signature-based sensitivity analysis of systems with epistemic uncertainties

Author: Aslett L.J.
Beer M.
Coolen F.P.
Der Kiureghian A.
Ferson S.
Publication venue: CRC Press/Balkema
Publication date: 15/09/2015

The survival signature provides a basis for efficient reliability assessment of systems with more than one component type. Often a perfect probabilistic modelling of the system is not possible due to limited information, vagueness and imprecision. Hence generalized probabilistic methods need to be used. These methods allow the uncertainties to be modelled explicitly without the need for unjustified hypotheses and approximations. In this paper, a novel and efficient sensitivity approach is presented. The proposed approach is based on the survival signature and allows components in a system to be identified and ranked. A numerical example is used to illustrate the above methods.

Crossref

University of Strathclyde Institutional Repository

Model updating after interventions paradoxically introduces bias

Author: Aslett Louis J M
Emerson Samuel R
Liley James
Mateen Bilal A.
Vallejos Catalina A
Vollmer Sebastian J.
Publication date: 22/02/2021

Machine learning is increasingly being used to generate prediction models for use in a number of real-world settings, from credit risk assessment to clinical decision support. Recent discussions have highlighted potential problems in the updating of a predictive score for a binary outcome when an existing predictive score forms part of the standard workflow, driving interventions. In this setting, the existing score induces an additional causative pathway which leads to miscalibration when the original score is replaced. We propose a general causal framework to describe and address this problem, and demonstrate an equivalent formulation as a partially observed Markov decision process. We use this model to demonstrate the impact of such 'naive updating' when performed repeatedly. Namely, we show that successive predictive scores may converge to a point where they predict their own effect, or may eventually tend toward a stable oscillation between two values, and we argue that neither outcome is desirable. Furthermore, we demonstrate that even if model-fitting procedures improve, actual performance may worsen. We complement these findings with a discussion of several potential routes to overcome these issues. Comment: Sections of this preprint on 'Successive adjuvancy' (section 4, theorem 2, figures 4, 5, and associated discussions) were not included in the originally submitted version of this paper due to length. This material does not appear in the published version of this manuscript, and the reader should be aware that these sections did not undergo peer review

arXiv.org e-Print Archive

Edinburgh Research Explorer