A Predictive Approach to Bayesian Nonparametric Survival Analysis
Bayesian nonparametric methods are a popular choice for analysing survival data due to their ability to flexibly model the distribution of survival times. These methods typically employ a nonparametric prior on the survival function that is conjugate with respect to right-censored data. Eliciting these priors, particularly in the presence of covariates, can be challenging, and inference typically relies on computationally intensive Markov chain Monte Carlo schemes. In this paper, we build on recent work that recasts Bayesian inference as assigning a predictive distribution on the unseen values of a population conditional on the observed samples, thus avoiding the need to specify a complex prior. We describe a copula-based predictive update which admits a scalable sequential importance sampling algorithm to perform inference that properly accounts for right-censoring. We provide theoretical justification through an extension of Doob’s consistency theorem and illustrate the method on a number of simulated and real data sets, including an example with covariates. Our approach enables analysts to perform Bayesian nonparametric inference through only the specification of a predictive distribution.
Optimal strategies for learning multi-ancestry polygenic scores vary across traits
Polygenic scores (PGSs) are individual-level measures that aggregate the genome-wide genetic predisposition to a given trait. As PGSs have predominantly been developed using European-ancestry samples, trait prediction using such European-ancestry-derived PGSs is less accurate in non-European-ancestry individuals. Although there has been recent progress in combining multiple PGSs trained on distinct populations, the problem of how to maximize performance given a multiple-ancestry cohort is largely unexplored. Here, we investigate the effect of sample size and ancestry composition on PGS performance for fifteen traits in UK Biobank. For some traits, a PGS estimated using a relatively small African-ancestry training set outperformed, on an African-ancestry test set, a PGS estimated using a much larger European-ancestry-only training set. We observe similar, but not identical, results when considering other minority-ancestry groups within UK Biobank. Our results emphasise the importance of targeted data collection from underrepresented groups in order to address existing disparities in PGS performance.
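At its core, a polygenic score is a weighted sum of an individual's risk-allele counts, with per-variant weights (effect sizes) estimated from a training cohort. A minimal sketch of that computation (the genotypes and effect sizes below are illustrative, not from the paper):

```python
import numpy as np

# Genotype matrix: rows = individuals, columns = variants,
# entries = risk-allele counts (0, 1, or 2).
genotypes = np.array([
    [0, 1, 2, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
])

# Per-variant effect sizes estimated from a training GWAS
# (illustrative values).
effect_sizes = np.array([0.12, -0.05, 0.30, 0.08])

# The polygenic score is the weighted sum over variants,
# giving one score per individual.
pgs = genotypes @ effect_sizes
print(pgs)
```

The ancestry question the paper studies enters through `effect_sizes`: weights estimated in one ancestry group transfer imperfectly to another, which is why the composition of the training set matters.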
Inferring differences between networks using Bayesian exponential random graph models
The goal of many neuroimaging studies is to better understand how the functional connectivity structure of the brain changes with a given phenotype such as age. Functional connectivity can be characterised as a network, with nodes corresponding to brain regions and edges corresponding to statistical dependencies between the respective regional time series of activity. A typical neuroimaging dataset will thus consist of one or more networks for each individual in the study. Most statistical network models, however, were originally proposed to describe a single underlying relational structure such as friendships between individuals or hyperlinks between web pages. As a result, the development of these models has largely been restricted to the single network case. While one could in principle fit a single network model to each individual separately, it is not always straightforward to combine these individual results into a single group result.
In the first half of the thesis, we propose a multilevel framework for populations of networks based on exponential random graph models. By pooling information across the individual networks, this framework provides a principled approach to characterise the relational structure for an entire population. We use the framework to assess group-level variations in functional connectivity, providing a method for the inference of differences in the topological structure between groups of networks. Our motivation stems from the Cam-CAN project, a neuroimaging study on healthy ageing. Using this dataset, we illustrate how our method can be used to detect differences in functional connectivity between a group of young individuals and a group of old individuals.
In the second half of the thesis, we shift our focus to dynamic functional connectivity (dFC). Recent studies have found that using static measures may average over informative fluctuations in functional connectivity. Several methods have been developed to measure dFC in functional magnetic resonance imaging (fMRI) data. However, spurious group differences in measured dFC may be caused by other sources of heterogeneity between people. We use a generic simulation framework for fMRI data to investigate the effect of such heterogeneity on estimates of dFC and find that, despite no differences in true dFC, individual differences in measured dFC can result from other (non-dynamic) features of the data. We then add a natural and novel extension to our multilevel framework by inserting time windows as an intermediate level between time points and subjects. Using magnetoencephalography data from the Cam-CAN study, we apply our method to detect differences in time-varying connectivity between a young group and an old group.
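A common way to measure the dFC described above is sliding-window correlation: compute a connectivity matrix over successive time windows rather than the whole recording. A minimal sketch on synthetic data (window length, step, and data are illustrative; the thesis's multilevel model sits on top of such window-level summaries):

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_regions = 200, 3
ts = rng.standard_normal((n_time, n_regions))  # synthetic regional time series

window = 50  # window length in time points (illustrative)
step = 25    # window offset (illustrative)

# One correlation matrix per window: a time-varying picture of connectivity.
dfc = [
    np.corrcoef(ts[start:start + window].T)
    for start in range(0, n_time - window + 1, step)
]

print(len(dfc), dfc[0].shape)  # number of windows, (n_regions, n_regions)
```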
Neural Score Matching for High-Dimensional Causal Inference
Traditional methods for matching in causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching find exponentially fewer matches as the input dimension grows, and propensity score matching may match highly unrelated units together. To overcome this problem, we develop theoretical results which motivate the use of neural networks to obtain non-trivial, multivariate balancing scores of a chosen level of coarseness, in contrast to the classical, scalar propensity score. We leverage these balancing scores to perform matching for high-dimensional causal inference and call this procedure neural score matching. We show that our method is competitive against other matching approaches on semi-synthetic high-dimensional datasets, both in terms of treatment effect estimation and reducing imbalance.
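For contrast with the neural balancing scores above, the classical baseline collapses all covariates to a single scalar, the propensity score, and matches on it. A minimal sketch of 1-nearest-neighbour propensity-score matching (the data, logistic coefficients, and score are synthetic; the paper's point is that this scalar can pair very different units):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.standard_normal((n, 5))  # covariates

# Treatment assigned with probability given by a logistic model
# (coefficients are illustrative).
beta = np.array([0.8, -0.5, 0.3, 0.0, 0.2])
p = 1.0 / (1.0 + np.exp(-(x @ beta)))  # propensity score
t = rng.random(n) < p                  # treatment indicator

# 1-nearest-neighbour matching on the scalar propensity score:
# each treated unit is paired with the control whose score is closest.
treated_idx = np.where(t)[0]
control_idx = np.where(~t)[0]
matches = {
    i: control_idx[np.argmin(np.abs(p[control_idx] - p[i]))]
    for i in treated_idx
}
print(len(matches), "treated units matched")
```

Two units with very different covariate vectors can share the same scalar score, which is the failure mode the multivariate balancing scores are designed to mitigate.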
Bayesian imputation of COVID-19 positive test counts for nowcasting under reporting lag
Obtaining up to date information on the number of UK COVID-19 regional infections is hampered by the reporting lag in positive test results for people with COVID-19 symptoms. In the UK, for "Pillar 2" swab tests for those showing symptoms, it can take up to five days for results to be collated. We make use of the stability of the under-reporting process over time to motivate a statistical temporal model that infers the final total count given the partial count information as it arrives. We adopt a Bayesian approach that provides for subjective priors on parameters and a hierarchical structure for an underlying latent intensity process for the infection counts. This results in a smoothed time-series representation nowcasting the expected number of daily counts of positive tests with uncertainty bands that can be used to aid decision making. Inference is performed using sequential Monte Carlo.
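The core idea, inferring the final total from partial counts via a stable reporting process, can be illustrated without the full hierarchical model. A minimal sketch (the paper's Bayesian latent-intensity model and sequential Monte Carlo inference are replaced here by a simple plug-in estimate; all numbers are illustrative):

```python
import numpy as np

# Historical data: fraction of the final total reported d days after the
# specimen date (illustrative; in practice estimated from past weeks).
reporting_fraction = np.array([0.30, 0.65, 0.85, 0.95, 1.00])  # days 0..4

# Partial counts observed so far for recent specimen dates
# (most recent day last), and how long each has had to accumulate.
partial_counts = np.array([480, 390, 260, 150, 60])
days_since_specimen = np.array([4, 3, 2, 1, 0])

# Plug-in nowcast: divide each partial count by its expected
# reported fraction to estimate the eventual total.
nowcast = partial_counts / reporting_fraction[days_since_specimen]
print(np.round(nowcast).astype(int))
```

The paper's approach replaces this point estimate with a posterior over the final counts, which is what yields the uncertainty bands for decision making.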
Improving local prevalence estimates of SARS-CoV-2 infections using a causal debiasing framework.
Funders: Oxford University | Jesus College, University of Oxford; Joint Biosecurity Centre
Global and national surveillance of SARS-CoV-2 epidemiology is mostly based on targeted schemes focused on testing individuals with symptoms. These tested groups are often unrepresentative of the wider population and exhibit test positivity rates that are biased upwards compared with the true population prevalence. Such data are routinely used to infer infection prevalence and the effective reproduction number, Rt, which affects public health policy. Here, we describe a causal framework that provides debiased fine-scale spatiotemporal estimates by combining targeted test counts with data from a randomized surveillance study in the United Kingdom called REACT. Our probabilistic model includes a bias parameter that captures the increased probability of an infected individual being tested, relative to a non-infected individual, and transforms observed test counts to debiased estimates of the true underlying local prevalence and Rt. We validated our approach on held-out REACT data over a 7-month period. Furthermore, our local estimates of Rt are indicative of 1-week- and 2-week-ahead changes in SARS-CoV-2-positive case numbers. We also observed increases in estimated local prevalence and Rt that reflect the spread of the Alpha and Delta variants. Our results illustrate how randomized surveys can augment targeted testing to improve statistical accuracy in monitoring the spread of emerging and ongoing infectious disease.
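The role of a bias parameter of this kind can be illustrated with a simplified closed-form calculation: if infected individuals are delta times as likely to be tested as non-infected individuals, the expected test positivity q relates to the true prevalence pi by q = delta*pi / (delta*pi + 1 - pi), which can be inverted to debias the observed rate. A minimal sketch (this collapses the paper's full spatiotemporal probabilistic model to a single adjustment; delta and the counts are illustrative):

```python
def debiased_prevalence(positives, tests, delta):
    """Invert q = delta*pi / (delta*pi + 1 - pi) for pi, where q is the
    observed test positivity and delta is the relative probability of an
    infected (vs non-infected) individual being tested."""
    q = positives / tests
    return q / (delta * (1.0 - q) + q)

# Targeted testing: 300 positives out of 2000 tests (positivity 15%),
# with infected people assumed 25x more likely to be tested.
print(debiased_prevalence(300, 2000, delta=25.0))
```

With delta = 1 (no ascertainment bias) the formula returns the raw positivity unchanged; larger delta shrinks the estimate, reflecting that symptomatic targeting inflates positivity above true prevalence. In the paper, the analogue of delta is learned by anchoring to the randomized REACT survey.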
Interoperability of Statistical Models in Pandemic Preparedness: Principles and Reality
We present "interoperability" as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems of statistical models for disease surveillance using probabilistic reasoning. We illustrate this through case studies for inferring spatial-temporal coronavirus disease 2019 (COVID-19) prevalence and reproduction numbers in England.