Causal Similarity-Based Hierarchical Bayesian Models
The key challenge underlying machine learning is generalisation to new data.
This work studies generalisation for datasets consisting of related tasks that
may differ in causal mechanisms. For example, observational medical data for
complex diseases suffers from heterogeneity in causal mechanisms of disease
across patients, creating challenges for machine learning algorithms that need
to generalise to new patients outside of the training dataset. Common
approaches for learning supervised models with heterogeneous datasets include
learning a global model for the entire dataset, learning local models for each
task's data, or utilising hierarchical, meta-learning and multi-task learning
approaches to learn how to generalise from data pooled across multiple tasks.
In this paper we propose causal similarity-based hierarchical Bayesian models
to improve generalisation to new tasks by learning how to pool data from
training tasks with similar causal mechanisms. We apply this general modelling
principle to Bayesian neural networks and compare a variety of methods for
estimating causal task similarity (for both known and unknown causal models).
We demonstrate the benefits of our approach and its applicability to real-world
problems through a range of experiments on simulated and real data.
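The pooling principle described above can be sketched in a few lines. The sketch below is illustrative only and is not the paper's model: ordinary least-squares coefficients stand in for each task's causal mechanism, and an RBF kernel over those coefficients stands in for causal task similarity; all task data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tasks: each has a (possibly different) linear mechanism y = w.x + noise.
# Tasks 0 and 1 share nearly the same mechanism; task 2 differs.
true_w = [np.array([2.0, -1.0]), np.array([2.1, -0.9]), np.array([-3.0, 0.5])]
tasks = []
for w in true_w:
    X = rng.normal(size=(50, 2))
    y = X @ w + 0.1 * rng.normal(size=50)
    tasks.append((X, y))

# Per-task least-squares fits stand in for each task's causal mechanism.
w_hat = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in tasks]

def similarity(wa, wb, scale=1.0):
    """RBF similarity between estimated mechanisms (a stand-in causal distance)."""
    return np.exp(-np.sum((wa - wb) ** 2) / scale)

def pooled_estimate(target_idx):
    """Similarity-weighted pooling: a task borrows strength from training tasks
    in proportion to how close their causal mechanisms appear to be."""
    sims = np.array([similarity(w_hat[target_idx], w) for w in w_hat])
    sims /= sims.sum()
    return sum(s * w for s, w in zip(sims, w_hat))

w_pooled = pooled_estimate(0)
```

Because task 2's mechanism is distant, its weight in the pooled estimate for task 0 is negligible, so pooling here mixes mostly tasks 0 and 1, which is the intended behaviour of similarity-based pooling.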
Characterizing personalized effects of family information on disease risk using graph representation learning
Family history is considered a risk factor for many diseases because it
implicitly captures shared genetic, environmental and lifestyle factors.
Finland's nationwide electronic health record (EHR) system spanning multiple
generations presents new opportunities for studying a connected network of medical
histories for entire families. In this work we present a graph-based deep
learning approach for learning explainable, supervised representations of how
each family member's longitudinal medical history influences a patient's
disease risk. We demonstrate that this approach is beneficial for predicting
10-year disease onset for 5 complex disease phenotypes, compared to
clinically-inspired and deep learning baselines for a nationwide EHR system
comprising 7 million individuals with up to third-degree relatives. Through the
use of graph explainability techniques, we illustrate that a graph-based
approach enables more personalized modeling of family information and disease
risk by identifying important relatives and features for prediction.
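A minimal, hypothetical sketch of the graph-based idea: relatives' history vectors are mixed into the index patient's representation by one symmetrically normalised graph-convolution step, after which a risk score is read off the patient's node. The random features, edges and weights below are placeholders, not the paper's trained model or data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy family graph: node 0 is the index patient; nodes 1-3 are relatives.
# Each node carries a vector summarising its longitudinal medical history
# (random here; in practice, e.g. diagnosis and medication counts over time).
features = rng.normal(size=(4, 8))
edges = [(0, 1), (0, 2), (1, 3)]  # patient-mother, patient-father, mother-grandparent

# Adjacency with self-loops, symmetrically normalised (one graph-convolution step).
A = np.eye(4)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))

W = rng.normal(size=(8, 4)) * 0.1        # projection weights (random stand-in)
hidden = np.tanh(A_norm @ features @ W)  # relatives' histories mix into node 0

# Risk score for the index patient from its aggregated representation.
w_out = rng.normal(size=4) * 0.1
risk = 1.0 / (1.0 + np.exp(-(hidden[0] @ w_out)))
```

Explainability techniques of the kind the abstract mentions would then attribute the score back through `A_norm` to individual relatives and features.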
HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes
MOTIVATION: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking.
RESULTS: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through the comparison of seven methods for polygenic risk scoring across multiple ancestry groups and different genetic architectures.
AVAILABILITY AND IMPLEMENTATION: A synthetic dataset of 1 008 000 individuals and nine traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data. Peer reviewed. Funded under EC/H2020 grant 101016775 (INTERVENE).
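For context on the polygenic risk scoring mentioned above, the standard PRS computation is an effect-weighted sum of allele dosages, which can be sketched as follows. This is a generic illustration with simulated numbers, not HAPNEST's API or the benchmarked methods.

```python
import numpy as np

rng = np.random.default_rng(2)
n_individuals, n_variants = 100, 500

# Genotype dosages in {0, 1, 2} and per-variant effect sizes (e.g. from a GWAS).
genotypes = rng.integers(0, 3, size=(n_individuals, n_variants))
effects = rng.normal(scale=0.05, size=n_variants)

# A polygenic risk score is the effect-weighted sum of allele dosages.
prs = genotypes @ effects

# Standardise for comparability across cohorts.
prs_z = (prs - prs.mean()) / prs.std()
```

Real PRS methods differ mainly in how the per-variant effects are estimated and shrunk, which is what a benchmark comparison across ancestries and genetic architectures evaluates.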