6 research outputs found
Graph Representation Forecasting of Patient's Medical Conditions: Toward a Digital Twin.
Objective: Modern medicine needs to shift from a wait-and-react, curative discipline to a preventative, interdisciplinary science that provides personalized, systemic, and precise treatment plans to patients. To this purpose, we propose a "digital twin" of patients, modeling the human body as a whole and providing a panoramic view of an individual's conditions. Methods: We propose a general framework that composes advanced artificial intelligence (AI) approaches and integrates mathematical modeling in order to provide a panoramic view of current and future pathophysiological conditions. Our modular architecture is based on a graph neural network (GNN) forecasting clinically relevant endpoints (such as blood pressure) and a generative adversarial network (GAN) providing a proof of concept of transcriptomic integrability. Results: We tested our digital twin model on two simulated clinical case studies combining information at the organ, tissue, and cellular levels. We provided a panoramic overview of the patient's current and future conditions by monitoring and forecasting clinically relevant endpoints representing the evolution of the patient's vital parameters using the GNN model. We showed how to use the GAN to generate multi-tissue expression data for blood and lung in order to find associations between cytokines conditioned on the expression of genes in the renin-angiotensin pathway. This approach detected inflammatory cytokines that are known to affect blood pressure and have previously been associated with SARS-CoV-2 infection (e.g., CXCR6 and XCL1). Significance: The graph representation of a computational patient has the potential to solve important technological challenges in integrating multiscale computational modeling with AI. We believe that this work represents a step forward toward next-generation devices for precision and predictive medicine.
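The abstract does not specify the GNN architecture. As an illustrative sketch only, a single graph-convolution step over a toy physiological graph can show the basic mechanism of forecasting a node-level endpoint from neighbour information; the three nodes, adjacency matrix, and random weights below are all hypothetical and not taken from the paper.

```python
import numpy as np

# Toy adjacency for a hypothetical 3-node physiological graph
# (e.g. heart rate, renal function, blood pressure).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                      # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalise by degree

def gcn_layer(H, W):
    """One graph-convolution step: average each node's neighbourhood
    features, then apply a linear map followed by a ReLU."""
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))     # node features: recent measurements
W1 = rng.normal(size=(4, 8))    # untrained weights, for shape only
w_out = rng.normal(size=(8,))

H1 = gcn_layer(H, W1)
forecast = H1 @ w_out           # one scalar forecast per node
```

In a real model the weights would be trained on longitudinal patient data and the readout restricted to the endpoint of interest (e.g. the blood-pressure node); here the forward pass only illustrates the message-passing structure.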
Large-scale inference and imputation for multi-tissue gene expression
Integrating molecular information across tissues and cell types is essential for understanding the coordinated biological mechanisms that drive disease and characterise homoeostasis. Effective multi-tissue omics integration promises a system-wide view of human physiology, with potential to shed light on intra- and multi-tissue molecular phenomena, but faces many complexities arising from the intricacies of biomedical data. This integration problem challenges single-tissue and conventional techniques for omics analysis, often unable to model a variable number of tissues with sufficient statistical strength, necessitating the development of scalable, non-linear, and flexible methods.
This dissertation develops inference and imputation methods for the analysis of gene expression data, an immensely rich and complex biomedical data modality, enabling integration across multiple tissues. The imputation task can strongly influence downstream applications, including differential expression analysis, co-expression network construction, and the characterisation of cross-tissue associations. Inferring tissue-specific gene expression may also play a fundamental role in clinical settings, where gene expression is often profiled in accessible tissues such as whole blood. Because gene expression is highly context-specific, imputation methods may facilitate the prediction of gene expression in inaccessible tissues, with applications in diagnosing and monitoring pathophysiological conditions.
The modelling approaches presented throughout the thesis address four important methodological problems. The first work introduces a flexible generative model for the in-silico generation of realistic gene expression data across multiple tissues and conditions, which may reveal tissue- and disease-specific differential expression patterns and may be useful for data augmentation. The second study proposes two deep learning methods to study whether the complete transcriptome of a tissue can be inferred from the expression of a minimal subset of genes, with potential application in the selection of tissue-specific biomarkers and the integration of large-scale biorepositories. The third work presents a novel method, hypergraph factorisation, for the joint imputation of multi-tissue and cell-type gene expression, providing a system-wide view of human physiology. The fourth study proposes a graph representation learning approach that leverages spatial information to improve the reconstruction of tissue architectures from spatial transcriptomic data. Collectively, this thesis develops flexible and powerful computational approaches for the analysis of tissue-specific gene expression data.
Funding: Fundació "la Caixa"; Fundación Rafael del Pino.
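The thesis's own methods (such as hypergraph factorisation) are not reproduced here. As a generic illustration of the imputation task it addresses, a minimal low-rank imputer in NumPy, in the style of hard-impute, fills missing entries by repeatedly fitting a truncated SVD; the function name, rank, and the toy "genes x tissues" matrix are all my own assumptions.

```python
import numpy as np

def lowrank_impute(X, rank=1, n_iter=50):
    """Fill NaNs by repeatedly fitting a rank-r truncated SVD to the
    current completion and copying its values into the missing cells
    (a simple hard-impute scheme). Observed entries are kept exact."""
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(mask, col_means, X)   # warm start: column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(mask, approx, X)  # overwrite missing cells only
    return filled

# Toy rank-1 "genes x tissues" matrix with one missing entry
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X[0, 2] = np.nan                # true value is 3.0
X_hat = lowrank_impute(X, rank=1)
```

Real multi-tissue expression data would require non-linear, tissue-aware models as argued in the abstract; this sketch only makes the structure of the completion problem concrete.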
The impact of imputation quality on machine learning classifiers for datasets with missing values
Background: Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete samples. The focus of the machine learning researcher is to optimise the classifier’s performance. Methods: We utilise three simulated and three real-world clinical datasets with different feature types and missingness patterns. Initially, we evaluate how the downstream classifier performance depends on the choice of classifier and imputation methods. We employ ANOVA to quantitatively evaluate how the choice of missingness rate, imputation method, and classifier method influences the performance. Additionally, we compare commonly used methods for assessing imputation quality and introduce a class of discrepancy scores based on the sliced Wasserstein distance. We also assess the stability of the imputations and the interpretability of models built on the imputed data. Results: The performance of the classifier is most affected by the percentage of missingness in the test data, with a considerable performance decline observed as the test missingness rate increases. We also show that the commonly used measures for assessing imputation quality tend to lead to imputed data that poorly matches the underlying data distribution, whereas our new class of discrepancy scores performs much better on this measure. Furthermore, we show that the interpretability of classifier models trained using poorly imputed data is compromised. Conclusions: It is imperative to consider the quality of the imputation when performing downstream classification, as the effects on the classifier can be considerable.
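The paper's exact discrepancy scores are not given in the abstract. As a hedged illustration of the underlying quantity, a basic Monte-Carlo estimate of the sliced Wasserstein distance between two equal-sized samples projects both onto random directions and uses the closed-form 1-D Wasserstein distance on each projection; the function name and defaults below are my own.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, seed=0):
    """Approximate the sliced Wasserstein-1 distance between two
    point clouds x and y of shape (n_samples, n_features).

    Each random unit-vector projection reduces the problem to 1-D,
    where the Wasserstein-1 distance is the mean absolute difference
    of the sorted projected samples (equal sample sizes assumed)."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)      # random direction on the sphere
        px = np.sort(x @ theta)
        py = np.sort(y @ theta)
        total += np.mean(np.abs(px - py))
    return total / n_projections
```

In the paper's setting a distributional score of this kind would presumably be evaluated between imputed values and held-out ground truth; that pairing is an assumption here, not a statement of the authors' protocol.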