2,441 research outputs found

Advancing Precision Medicine: Unveiling Disease Trajectories, Decoding Biomarkers, and Tailoring Individual Treatments

Author: Wang Yanfei
Publication venue: DigitalCommons@TMC
Publication date: 01/10/2023

Chronic diseases are not only prevalent but also place a considerable strain on the healthcare system, individuals, and communities. Nearly half of all Americans suffer from at least one chronic disease, and that share is still growing. The development of machine learning has brought new directions to chronic disease analysis. Many data scientists have devoted themselves to understanding how a disease progresses over time, which can lead to better patient management, identification of disease stages, and targeted interventions. However, because chronic diseases progress slowly, symptoms are barely noticed until the disease is advanced, which makes early detection difficult. Meanwhile, chronic diseases often have diverse underlying causes and can manifest differently among patients. Besides external factors, the development of chronic disease is also influenced by internal signals. DNA sequence-level differences have been shown to be responsible for a constant predisposition to chronic diseases. Given these challenges, data must be analyzed at various scales, ranging from single nucleotide polymorphisms (SNPs) to individuals and populations, to better understand disease mechanisms and provide precision medicine. Therefore, this research aimed to develop an automated pipeline that runs from building predictive models and estimating individual treatment effects on the structured data of general electronic health records (EHRs) to identifying genetic variations (e.g., SNPs) associated with diseases, in order to unravel the genetic underpinnings of chronic diseases. First, we used structured EHRs to uncover chronic disease progression patterns and assess the dynamic contribution of clinical features. In this step, we employed causal inference methods (constraint-based and functional causal models) for feature selection and utilized Markov chains, attention long short-term memory (LSTM), and Gaussian processes (GPs). SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME) further extended the work to identify important clinical features. Next, we developed a novel counterfactual-based method to predict individual treatment effects (ITE) from observational data. To learn a “balanced” representation in which treated and control distributions look similar, we disentangled the doctor’s preference from the covariates and rebuilt the representations of the treated and control groups. We used integral probability metrics to measure distances between distributions. The expected ITE estimation error of a representation was the sum of the standard generalization error of that representation and the distance between the distributions induced. Finally, we performed genome-wide association studies (GWAS) based on the stage information extracted from our unsupervised disease progression model to identify biomarkers and explore the genetic correlation between the disease and its phenotypes.
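The balancing idea in this abstract (a prediction error term plus a distance between the treated and control representation distributions) can be illustrated with a small sketch. This is not the thesis's method: it assumes the simplest integral probability metric, a maximum mean discrepancy with a linear kernel, which reduces to the squared distance between group means. The function names, the weight `alpha`, and the toy vectors are all hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd_squared(treated, control):
    """Squared MMD with a linear kernel: squared distance between group means."""
    mu_t = mean_vector(treated)
    mu_c = mean_vector(control)
    return sum((a - b) ** 2 for a, b in zip(mu_t, mu_c))


def balanced_objective(factual_loss, treated, control, alpha=1.0):
    """Prediction loss plus a weighted imbalance penalty.

    alpha is a hypothetical trade-off weight, not a value from the source.
    """
    return factual_loss + alpha * linear_mmd_squared(treated, control)


# Hypothetical 2-D representations of treated and control patients.
treated = [[1.0, 2.0], [3.0, 4.0]]
control = [[0.0, 2.0], [2.0, 4.0]]
imbalance = linear_mmd_squared(treated, control)
```

A representation that makes the two groups look alike drives the penalty toward zero, so the objective then depends only on the prediction loss.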

DigitalCommons@The Texas Medical Center

Modelling prognostic trajectories in Alzheimer’s disease

Author: Giorgio Joseph
Publication venue: University of Cambridge
Publication date: 25/05/2020

Progression to dementia due to Alzheimer’s Disease (AD) is a long and protracted process that involves multiple pathways of disease pathophysiology. Predicting these dynamic changes has major implications for timely and effective clinical management in AD. There are two reasons why at present we lack appropriate tools to make such predictions. First, a key feature of AD is the interactive nature of the relationships between biomarkers, such as accumulation of β-amyloid (a peptide that builds plaques between nerve cells), tau (a protein found in the axons of nerve cells), and widespread neurodegeneration. Current models fail to capture these relationships because they are unable to successfully reduce the high dimensionality of biomarkers while exploiting informative multivariate relationships. Second, current models focus on simply predicting in a binary manner whether an individual will develop dementia due to AD or not, without informing clinicians about their predicted disease trajectory. This can result in administering inefficient treatment plans and hindering appropriate stratification for clinical trials. In this thesis, we overcome these challenges by using applied machine learning to build predictive models of patient disease trajectories in the earliest stages of AD. Specifically, to exploit the multi-dimensionality of biomarker data, we used a novel feature generation methodology, Partial Least Squares regression with recursive feature elimination (PLSr-RFE). This method applies a hybrid feature selection and feature construction method that captures co-morbidities in cognition and pathophysiology, resulting in an index of Alzheimer’s disease atrophy from structural MRI. We validated our choice of biomarker and the efficacy of our methodology by showing that the learnt pattern of grey matter atrophy is highly predictive of tau accumulation in an independent sample. Next, to go beyond predicting binary outcomes, we used a novel trajectory modelling approach (Generalised Metric Learning Vector Quantization – Scalar projection) that mines multimodal data from large AD research cohorts. Using this approach, we derive individualised prognostic scores of cognitive decline due to AD, revealing interactive cognitive and biological factors that improve prediction accuracy. Next, we extended our machine learning framework to classify and stage early AD individuals based on future pathological tau accumulation. Our results show that the characteristic spreading pattern of tau in early AD can be predicted by baseline biomarkers, particularly when stratifying groups using multimodal data. Further, we showed that our prognostic index predicts individualised rates of future tau accumulation with high accuracy and regional specificity in an independent sample of cognitively unimpaired individuals. Overall, our work used machine learning to combine continuous information from AD biomarkers predicting pathophysiological changes at different stages in the AD cascade. The approaches presented in this thesis provide an excellent framework to support personalised clinical interventions and guide effective drug discovery trials.

Apollo (Cambridge)

Inferential stability in systems biology

Author: Kirk Paul
Publication venue: Division of Molecular Biosciences, Imperial College London
Publication date: 01/03/2011

The modern biological sciences are fraught with statistical difficulties. Biomolecular stochasticity, experimental noise, and the “large p, small n” problem all contribute to the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful conclusions from observations. In this thesis, we explore methods for assessing the effects of data variability upon downstream inference, in an attempt to quantify and promote the stability of the inferences we make. We start with a review of existing methods for addressing this problem, focusing upon the bootstrap and similar methods. The key requirement for all such approaches is a statistical model that approximates the data generating process. We move on to consider biomarker discovery problems. We present a novel algorithm for proposing putative biomarkers on the strength of both their predictive ability and the stability with which they are selected. In a simulation study, we find our approach to perform favourably in comparison to strategies that select on the basis of predictive performance alone. We then consider the real problem of identifying protein peak biomarkers for HAM/TSP, an inflammatory condition of the central nervous system caused by HTLV-1 infection. We apply our algorithm to a set of SELDI mass spectral data, and identify a number of putative biomarkers. Additional experimental work, together with known results from the literature, provides corroborating evidence for the validity of these putative biomarkers. Having focused on static observations, we then make the natural progression to time course data sets. We propose a (Bayesian) bootstrap approach for such data, and then apply our method in the context of gene network inference and the estimation of parameters in ordinary differential equation models. We find that the inferred gene networks are relatively unstable, and demonstrate the importance of finding distributions of ODE parameter estimates, rather than single point estimates.
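The notion of "the stability with which they are selected" can be made concrete with a generic bootstrap sketch. It is not the thesis's algorithm: it assumes features are ranked by absolute Pearson correlation with the outcome, and it simply records how often each feature lands in the top `k` across resamples. All names and the synthetic data are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def selection_frequencies(X, y, k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top k."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        # Resample rows with replacement, keeping each row paired with its outcome.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        scores = [abs(pearson([X[i][j] for i in idx], yb)) for j in range(p)]
        top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic data: feature 0 drives the outcome, features 1 and 2 are pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(30)]
y = [2.0 * row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]
freqs = selection_frequencies(X, y, k=1)
```

A feature with a selection frequency near 1 is chosen almost regardless of which observations were sampled, whereas a frequency that hovers near chance suggests the selection depends on the particular sample.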

Spiral - Imperial College Digital Repository

Discriminating active from latent tuberculosis in patients presenting to community clinics.

Author: Agranoff DD
Athanasakis D
Battaglia F
Ely BK
Evans CA
Fernandez-Reyes D
Friedland JS
Gilman RH
Montoya R
Sandhu G
Valencia T
Publication venue: Public Library of Science
Publication date: 01/01/2012

BACKGROUND: Because of the high global prevalence of latent TB infection (LTBI), a key challenge in endemic settings is distinguishing patients with active TB from patients with overlapping clinical symptoms without active TB but with co-existing LTBI. Current methods are insufficiently accurate. Plasma proteomic fingerprinting can resolve this difficulty by providing a molecular snapshot defining disease state that can be used to develop point-of-care diagnostics. METHODS: Plasma and clinical data were obtained prospectively from patients attending community TB clinics in Peru and from household contacts. Plasma was subjected to high-throughput proteomic profiling by mass spectrometry. Statistical pattern recognition methods were used to define mass spectral patterns that distinguished patients with active TB from symptomatic controls with or without LTBI. RESULTS: 156 patients with active TB and 110 symptomatic controls (patients with respiratory symptoms without active TB) were investigated. Active TB patients were distinguishable from undifferentiated symptomatic controls (accuracy 87%, sensitivity 84%, specificity 90%), from symptomatic controls with LTBI (accuracy 87%, sensitivity 89%, specificity 82%), and from symptomatic controls without LTBI (accuracy 90%, sensitivity 90%, specificity 92%). CONCLUSIONS: We show that active TB can be distinguished accurately from LTBI in symptomatic clinic attenders using a plasma proteomic fingerprint. Translation of biomarkers derived from this study into a robust and affordable point-of-care format will have significant implications for recognition and control of active TB in high prevalence settings.
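As a rough consistency check on the first comparison, overall accuracy can be recomputed from the quoted sensitivity, specificity, and group sizes. The sketch below assumes accuracy is the plain proportion of correct calls over all 266 participants; the abstract does not say how the figure was actually derived, and the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_positive, n_negative):
    """Overall accuracy implied by sensitivity, specificity, and group sizes."""
    true_positives = sensitivity * n_positive
    true_negatives = specificity * n_negative
    return (true_positives + true_negatives) / (n_positive + n_negative)


# Figures quoted for active TB (156) vs. undifferentiated symptomatic controls (110).
implied = accuracy_from_rates(0.84, 0.90, 156, 110)
```

Under that assumption the implied value is about 0.865, close to the reported 87%; a gap of this size is plausibly explained by rounding in the published rates.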

Crossref

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

St George's Online Research Archive

Sussex Research Online

FigShare

The Scarface Score: Deciphering Response to DNA Damage Agents in High-Grade Serous Ovarian Cancer—A GEICO Study

Author: De Sande-González Luis Miguel
Fernandez-Serra Antonio
Gallego Alejandro
López-Reig Raquel
Márquez Raúl
Oaknin Ana
Yubero Esteban Alfonso
Publication venue: MDPI
Publication date: 01/06/2023

Keywords: Genomic instability; Machine learning

Genomic Instability (GI) is a transversal phenomenon shared by several tumor types that provides both prognostic and predictive information. In the context of high-grade serous ovarian cancer (HGSOC), response to DNA-damaging agents such as platinum-based agents and poly(ADP-ribose) polymerase inhibitors (PARPi) has been closely linked to deficiencies in the DNA repair machinery by homologous recombination repair (HRR) and to GI. In this study, we have developed the Scarface score, an integrative algorithm based on genomic and transcriptomic data obtained from the NGS analysis of a prospective GEICO cohort of 190 formalin-fixed paraffin-embedded (FFPE) tumor samples from patients diagnosed with HGSOC, with a median follow-up of 31.03 months (5.87–159.27 months). In the first step, three single-source models were shown to predict response: the SNP-based model (accuracy = 0.8077), which analyzes 8 SNPs distributed along the genome; the GI-based model (accuracy = 0.9038), which interrogates 28 parameters of GI; and the HTG-based model (accuracy = 0.8077), which evaluates the expression of 7 genes related to tumor biology. Then, an ensemble model called the Scarface score was found to predict response to DNA-damaging agents with an accuracy of 0.9615 and a kappa index of 0.9128 (p < 0.0001). The Scarface Score moves toward the routine assessment of GI in the clinical setting, enabling its incorporation as a predictive and prognostic tool in the management of HGSOC.

This research was partially funded by GVA Grants “Subvencions per a la realització de projectes d’i+d+i desenvolupats per grups d’investigació emergents (GV/2020/158)” and “Ayudas para la contratación de personal investigador en formación de carácter predoctoral” (ACIF/2016/008) and “Beca de investigación traslacional Andrés Poveda 2020” from GEICO group. This study was awarded the Prize “Antonio Llombart Rodriguez-FINCIVO 2020” from the Royal Academy of Medicine of the Valencian Community.
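The abstract reports a kappa index alongside accuracy. For readers unfamiliar with it, the sketch below computes Cohen's kappa, which measures agreement beyond what chance alone would produce, from a confusion matrix. The matrix is invented for illustration and is not data from the study; the function name is hypothetical.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: truth, columns: prediction)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Made-up example: 35 of 50 cases classified correctly.
example = [[20, 5], [10, 15]]
kappa = cohens_kappa(example)
```

In this invented example the accuracy is 0.70 but kappa is only 0.40, because half of the agreement would be expected by chance given the marginal totals.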

Scientia, Dipòsit d’Informació Digital del Departament de Salut

Low-level visual processing and its relation to neurological disease

Author: Himmelberg Marc Mason
Publication venue: University of York
Publication date: 29/03/2019

Retinal neurons extract changes in image intensity across space, time, and wavelength. The retinal signal is transmitted to the early visual cortex, where the processing of low-level visual information occurs. The fundamental nature of these early visual pathways means that they are often compromised by neurological disease. This thesis had two aims. First, it aimed to investigate changes in visual processing in response to Parkinson’s disease (PD) by using electrophysiological recordings from animal models. Second, it aimed to use functional magnetic resonance imaging (fMRI) to investigate how low-level visual processes are represented in healthy human visual cortex, focusing on two pathways often compromised in disease: the magnocellular pathway and the chromatic S-cone pathway. First, we identified a pathological mechanism of excitotoxicity in the visual system of Drosophila PD models. Next, we found that we could apply machine learning classifiers to multivariate visual response profiles recorded from the eye and brain of Drosophila and rodent PD models to accurately classify these animals into their correct class. Using fMRI and psychophysics, we found that measurements of temporal contrast sensitivity differ as a function of visual space, with peripherally tuned voxels in early visual areas showing increased contrast sensitivity at a high temporal frequency. Finally, we used 7T fMRI to investigate systematic differences in achromatic and S-cone population receptive field (pRF) size estimates in the visual cortex of healthy humans. Unfortunately, we could not replicate the fundamental effect of pRF size increasing with eccentricity, indicating complications with our data and stimulus.

White Rose E-theses Online

Detection of Epigenomic Network Community Oncomarkers

Author: Bartlett Thomas E.
Zaikin Alexey
Publication date: 01/08/2016

In this paper we propose network methodology to infer prognostic cancer biomarkers based on the epigenetic pattern DNA methylation. Epigenetic processes such as DNA methylation reflect environmental risk factors, and are increasingly recognised for their fundamental role in diseases such as cancer. DNA methylation is a gene-regulatory pattern, and hence provides a means by which to assess genomic regulatory interactions. Network models are a natural way to represent and analyse groups of such interactions. The utility of network models also increases as the quantity of data and number of variables increase, making them increasingly relevant to large-scale genomic studies. We propose methodology to infer prognostic genomic networks from a DNA methylation-based measure of genomic interaction and association. We then show how to identify prognostic biomarkers from such networks, which we term 'network community oncomarkers'. We illustrate the power of our proposed methodology in the context of a large publicly available breast cancer dataset.

arXiv.org e-Print Archive

UCL Discovery

Dual-intended deep learning model for breast cancer diagnosis in ultrasound imaging

Author: Akhloufi Moulay Abdellatif
Amini Arya
Barry Madeline
Ma Lan
Maldague X.
Ren Lei
Vigil Nicolle
Yousefi Bardia
Publication venue: MDPI AG
Publication date: 01/05/2022

Automated medical data analysis plays a significant role in modern medicine, including in cancer diagnosis and prognosis, where the goal is highly reliable and generalizable systems. In this study, an automated breast cancer screening method for ultrasound imaging is proposed. A convolutional deep autoencoder model is presented for simultaneous segmentation and radiomic extraction: the model segments the breast lesions while concurrently extracting four low-dimensional deep-radiomic features. Similarly, we used high-dimensional conventional imaging throughputs and applied spectral embedding techniques to reduce their size from 354 to 12 radiomics. A total of 780 ultrasound images (437 benign, 210 malignant, and 133 normal) were used to train and validate the models in this study. To diagnose malignant lesions, we performed training, hyperparameter tuning, cross-validation, and testing with a random forest model. This resulted in a binary classification accuracy of 78.5% (65.1–84.1%) for the maximal (full multivariate) cross-validated model for a combination of radiomic groups.

Directory of Open Access Journals

PubMed Central

CorpusUL

Digital Repository at the University of Maryland