
    A Predictive Approach to Bayesian Nonparametric Survival Analysis

    Bayesian nonparametric methods are a popular choice for analysing survival data due to their ability to flexibly model the distribution of survival times. These methods typically employ a nonparametric prior on the survival function that is conjugate with respect to right-censored data. Eliciting these priors, particularly in the presence of covariates, can be challenging, and inference typically relies on computationally intensive Markov chain Monte Carlo schemes. In this paper, we build on recent work that recasts Bayesian inference as assigning a predictive distribution on the unseen values of a population conditional on the observed samples, thus avoiding the need to specify a complex prior. We describe a copula-based predictive update which admits a scalable sequential importance sampling algorithm to perform inference that properly accounts for right-censoring. We provide theoretical justification through an extension of Doob’s consistency theorem and illustrate the method on a number of simulated and real data sets, including an example with covariates. Our approach enables analysts to perform Bayesian nonparametric inference through only the specification of a predictive distribution.
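    The update at the heart of this approach can be sketched in a few lines. Below is a minimal illustration, not the paper's exact algorithm: a bivariate-Gaussian-copula recursion updates a gridded predictive CDF one observation at a time, and right-censored times are crudely imputed from the current predictive restricted above the censoring time, a simplistic stand-in for the paper's sequential importance sampling scheme. The copula correlation rho, the step-size sequence, the grid, and all data are illustrative assumptions.

```python
# Hedged sketch of a copula-based predictive update for a survival CDF.
# Not the paper's algorithm; rho, alpha, grid and data are all invented.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def copula_update(P, u_obs, alpha, rho=0.7):
    # One bivariate-Gaussian-copula update of the predictive CDF on a grid:
    # P_{n+1}(t) = (1 - alpha) P_n(t) + alpha * C_rho(P_n(t) | P_n(x_{n+1})).
    P = np.clip(P, 1e-10, 1 - 1e-10)
    u_obs = np.clip(u_obs, 1e-10, 1 - 1e-10)
    H = norm.cdf((norm.ppf(P) - rho * norm.ppf(u_obs)) / np.sqrt(1 - rho**2))
    return (1 - alpha) * P + alpha * H

grid = np.linspace(0, 10, 501)          # time grid
P = 1 - np.exp(-0.2 * grid)             # initial predictive: Exp(0.2) CDF

times    = np.array([1.2, 3.4, 0.7, 5.1, 2.2])
censored = np.array([False, True, False, False, True])

for n, (t, c) in enumerate(zip(times, censored), start=1):
    if c:
        # Impute an event time from the predictive restricted to (t, inf);
        # a crude stand-in for the paper's importance-sampling treatment.
        u = rng.uniform(np.interp(t, grid, P), 1.0)
        t = np.interp(u, P, grid)       # inverse CDF via the grid
    alpha = (2 - 1 / n) / (n + 1)       # a common decaying step-size choice
    P = copula_update(P, np.interp(t, grid, P), alpha)

print("predictive survival S(t) at t = 2, 4, 6:",
      np.round(1 - np.interp([2, 4, 6], grid, P), 3))
```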

    Optimal strategies for learning multi-ancestry polygenic scores vary across traits

    Polygenic scores (PGSs) are individual-level measures that aggregate the genome-wide genetic predisposition to a given trait. As PGSs have predominantly been developed using European-ancestry samples, trait prediction using such European-ancestry-derived PGSs is less accurate in non-European-ancestry individuals. Although there has been recent progress in combining multiple PGSs trained on distinct populations, the problem of how to maximize performance given a multiple-ancestry cohort is largely unexplored. Here, we investigate the effect of sample size and ancestry composition on PGS performance for fifteen traits in UK Biobank. For some traits, a PGS estimated using a relatively small African-ancestry training set outperformed, on an African-ancestry test set, a PGS estimated using a much larger European-ancestry-only training set. We observe similar, but not identical, results when considering other minority-ancestry groups within UK Biobank. Our results emphasise the importance of targeted data collection from underrepresented groups in order to address existing disparities in PGS performance.
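    To make the objects concrete, here is a toy sketch, not the paper's pipeline: polygenic scores are computed as weighted allele-count sums, and two ancestry-specific scores are combined by a convex weight tuned on a validation split. All genotypes, GWAS weights, and the trait below are synthetic placeholders.

```python
# Toy illustration of computing and combining polygenic scores.
# All weights, genotypes and the trait are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_individuals, n_variants = 1000, 500

genotypes = rng.integers(0, 3, size=(n_individuals, n_variants))  # 0/1/2 allele counts
beta_eur = rng.normal(0, 0.05, n_variants)   # weights from a European-ancestry GWAS
beta_afr = rng.normal(0, 0.05, n_variants)   # weights from an African-ancestry GWAS

pgs_eur = genotypes @ beta_eur               # PGS_i = sum_j beta_j * g_ij
pgs_afr = genotypes @ beta_afr

# Synthetic trait correlated with both scores, for illustration only.
trait = 0.4 * pgs_eur + 0.6 * pgs_afr + rng.normal(0, 1, n_individuals)

val, test = slice(0, 500), slice(500, None)  # validation / test halves

def r2(score, y):
    return np.corrcoef(score, y)[0, 1] ** 2

# Pick the mixing weight maximising validation R^2, then report test R^2.
weights = np.linspace(0, 1, 21)
best_w = max(weights, key=lambda w: r2(w * pgs_eur[val] + (1 - w) * pgs_afr[val], trait[val]))
combined = best_w * pgs_eur[test] + (1 - best_w) * pgs_afr[test]
print(f"w = {best_w:.2f}, test R^2 = {r2(combined, trait[test]):.3f}")
```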

    Neural Score Matching for High-Dimensional Causal Inference

    Traditional methods for matching in causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching find exponentially fewer matches as the input dimension grows, and propensity score matching may match highly unrelated units together. To overcome this problem, we develop theoretical results which motivate the use of neural networks to obtain non-trivial, multivariate balancing scores of a chosen level of coarseness, in contrast to the classical, scalar propensity score. We leverage these balancing scores to perform matching for high-dimensional causal inference and call this procedure neural score matching. We show that our method is competitive against other matching approaches on semi-synthetic high-dimensional datasets, both in terms of treatment effect estimation and reducing imbalance.
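    A hedged sketch of the core idea, not the authors' implementation: train a small network to predict treatment, take its hidden layer as a multivariate balancing score whose width sets the coarseness, and match treated units to controls by nearest neighbour in that score space. The network size, data-generating process, and effect size below are invented for illustration.

```python
# Toy version of matching on a learned multivariate balancing score.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
n, d = 2000, 50                                    # high-dimensional covariates
X = rng.normal(size=(n, d))
propensity = 1 / (1 + np.exp(-X[:, :5].sum(axis=1)))
T = rng.binomial(1, propensity)
Y = X[:, :5].sum(axis=1) + 2.0 * T + rng.normal(size=n)   # true effect = 2

# Hidden-layer width controls the coarseness of the balancing score.
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                    random_state=0).fit(X, T)
# ReLU hidden-layer activations as the multivariate score.
score = np.maximum(X @ clf.coefs_[0] + clf.intercepts_[0], 0.0)

# Match each treated unit to its nearest control in score space.
nn = NearestNeighbors(n_neighbors=1).fit(score[T == 0])
_, idx = nn.kneighbors(score[T == 1])
att = (Y[T == 1] - Y[T == 0][idx.ravel()]).mean()
print(f"matched ATT estimate: {att:.2f} (true effect 2.00)")
```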

    Bayesian imputation of COVID-19 positive test counts for nowcasting under reporting lag

    Obtaining up-to-date information on the number of UK COVID-19 regional infections is hampered by the reporting lag in positive test results for people with COVID-19 symptoms. In the UK, for "Pillar 2" swab tests for those showing symptoms, it can take up to five days for results to be collated. We make use of the stability of the under-reporting process over time to motivate a statistical temporal model that infers the final total count given the partial count information as it arrives. We adopt a Bayesian approach that provides for subjective priors on parameters and a hierarchical structure for an underlying latent intensity process for the infection counts. This results in a smoothed time-series representation nowcasting the expected number of daily counts of positive tests, with uncertainty bands that can be used to aid decision making. Inference is performed using sequential Monte Carlo.
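    The following toy sketch, with invented counts and reporting proportions, illustrates the kind of sequential Monte Carlo computation described: a latent log-intensity random walk drives Poisson daily totals, only a Binomial fraction of each day's tests is assumed reported so far, and a bootstrap particle filter nowcasts the final counts with uncertainty bands.

```python
# Toy nowcast under reporting lag via a bootstrap particle filter.
# The counts, reporting fractions and random-walk scale are made up.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)

observed_partial = np.array([210, 190, 230, 160, 90])    # most recent day last
p_report = np.array([0.99, 0.98, 0.9, 0.7, 0.4])         # assumed fraction reported so far

n_particles = 5000
log_lam = np.full(n_particles, np.log(200.0))            # latent log-intensity particles
nowcasts = []

for y, p in zip(observed_partial, p_report):
    log_lam = log_lam + rng.normal(0, 0.1, n_particles)  # random-walk prior step
    N = rng.poisson(np.exp(log_lam))                     # proposed final totals
    w = binom.pmf(y, N, p)                               # likelihood of partial count
    w = w / w.sum()
    idx = rng.choice(n_particles, n_particles, p=w)      # multinomial resampling
    log_lam, N = log_lam[idx], N[idx]
    nowcasts.append((N.mean(), np.percentile(N, [2.5, 97.5])))

for day, (mean, (lo, hi)) in enumerate(nowcasts):
    print(f"day {day}: nowcast {mean:.0f} (95% band {lo:.0f}-{hi:.0f})")
```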

    Improving local prevalence estimates of SARS-CoV-2 infections using a causal debiasing framework.

    Funders: Oxford University | Jesus College, University of Oxford; Joint Biosecurity Centre.
    Global and national surveillance of SARS-CoV-2 epidemiology is mostly based on targeted schemes focused on testing individuals with symptoms. These tested groups are often unrepresentative of the wider population and exhibit test positivity rates that are biased upwards compared with the true population prevalence. Such data are routinely used to infer infection prevalence and the effective reproduction number, Rt, which affects public health policy. Here, we describe a causal framework that provides debiased fine-scale spatiotemporal estimates by combining targeted test counts with data from a randomized surveillance study in the United Kingdom called REACT. Our probabilistic model includes a bias parameter that captures the increased probability of an infected individual being tested, relative to a non-infected individual, and transforms observed test counts to debiased estimates of the true underlying local prevalence and Rt. We validated our approach on held-out REACT data over a 7-month period. Furthermore, our local estimates of Rt are indicative of 1-week- and 2-week-ahead changes in SARS-CoV-2-positive case numbers. We also observed increases in estimated local prevalence and Rt that reflect the spread of the Alpha and Delta variants. Our results illustrate how randomized surveys can augment targeted testing to improve statistical accuracy in monitoring the spread of emerging and ongoing infectious diseases.
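    The role of the bias parameter can be illustrated with a one-line calculation. Writing delta for the ratio P(tested | infected) / P(tested | not infected), test positivity rho relates to true prevalence pi via rho = delta*pi / (delta*pi + 1 - pi), which inverts to pi = rho / (delta*(1 - rho) + rho). The snippet below, a toy reduction of the paper's full spatiotemporal model, applies this inversion for some made-up values of delta.

```python
# Toy debiasing of test positivity via a single bias parameter delta.
# delta = P(tested | infected) / P(tested | not infected); values invented.
def debiased_prevalence(positivity, delta):
    # Invert rho = delta*pi / (delta*pi + 1 - pi) for pi.
    return positivity / (delta * (1.0 - positivity) + positivity)

rho = 0.15    # 15% of targeted tests come back positive
for delta in (1.0, 5.0, 25.0):
    print(f"delta={delta:5.1f}: prevalence ~ {debiased_prevalence(rho, delta):.4f}")
```

    With delta = 1 (no ascertainment bias), positivity equals prevalence; the larger the bias, the further the debiased prevalence falls below the raw positivity rate.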

    Interoperability of Statistical Models in Pandemic Preparedness: Principles and Reality

    We present "interoperability" as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems of statistical models for disease surveillance using probabilistic reasoning. We illustrate this through case studies for inferring spatio-temporal coronavirus disease 2019 (COVID-19) prevalence and reproduction numbers in England.