10 research outputs found
Computational biology applications in the study of complex systems
In biomedicine, the advent of digitalization and major improvements in computing power and high-throughput technologies have yielded an unprecedented amount of data. To harness this data’s full potential, research has become increasingly computational, with core tools of data science such as machine learning required to decipher patterns, help extract meaning and uncover underlying connections. In this work, we have explored key leading areas of computational research, from discovery science to translational medicine and complex system studies. First, a machine learning pipeline was developed to inform bioinformatic research. After validation, it was applied to a novel dataset, yielding insight into the predictive power of immune features in ultra-early trauma injury and generating leads on possible biomarkers associated with the development of MultiOrgan Dysfunction. Then, in order to make our pipeline accessible, an interactive, simple, free and open-source supervised machine learning webtool was developed. Adapting this same framework to the realm of clinical translation, we then deployed two prognostic models for decision support in surgeries during the COVID-19 pandemic. This was done in collaboration with the NIHR surgical team. Lastly, we explored systems biology approaches by studying the complexity behind ageing and health-to-disease transitions. For this, we leveraged the UK Biobank data as a rich source of deeply phenotyped information. By calculating biological age (PhenoAge) longitudinally for nearly 400,000 participants, we identified four distinct ageing categories ranging from healthy to unhealthy ageing. These different trajectories were characterised by their chronic diseases and genetic makeup, revealing a strong association of metabolic dysfunction with unhealthier phenotypes and immune-related signals for healthier ones.
Also, strong opposite-effect associations of longevity-related variants were found, with novel regulatory elements postulated as possible drivers of unhealthy phenotypes, opening new avenues for future study. Overall, our findings highlight the crucial role of computational methods in biomedicine and their potential to transform clinical practice.
Explainable AI-prioritized plasma and fecal metabolites in inflammatory bowel disease and their dietary associations
Fecal metabolites effectively discriminate inflammatory bowel disease (IBD) and show differential associations with diet. Metabolomics and AI-based models, including explainable AI (XAI), play crucial roles in understanding IBD. Using datasets from the UK Biobank and the Human Microbiome Project Phase II IBD Multi’omics Database (HMP2 IBDMDB), this study uses multiple machine learning (ML) classifiers and Shapley additive explanations (SHAP)-based XAI to prioritize plasma and fecal metabolites and analyze their diet correlations. Key findings include the identification of discriminative metabolites like glycoprotein acetyl and albumin in plasma, as well as nicotinic acid metabolites and urobilin in feces. Fecal metabolites provided a more robust disease predictor model (AUC [95% CI]: 0.93 [0.87–0.99]) compared to plasma metabolites (AUC [95% CI]: 0.74 [0.69–0.79]), with stronger and more group-differential diet-metabolite associations in feces. The study validates known metabolite associations and highlights the impact of IBD on the interplay between gut microbial metabolites and diet.
Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference
Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the different inference algorithms they adopt, rendering their meaningful comparison challenging. In this study, we used two methods, namely the MIDER (Mutual Information Distance and Entropy Reduction) and the PLSNET (Partial least square based feature selection) methods, to infer the structure of a GRN directly from data and computationally validated our results. Both methods were applied to different gene expression datasets resulting from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, in the IBD dataset, the UGT1A family genes were identified as key regulators, while in the PDAC dataset, the SULF1 and THBS2 genes were highlighted. We further demonstrate that an ensemble-based approach that combines the output of the MIDER and PLSNET algorithms can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of samples required for potential future validation studies. Our proposed analysis framework therefore provides not only predictions of candidate regulator genes for potential validation experiments but also an estimate of the number of samples required for these experiments.
-Omics biomarker identification pipeline for translational medicine
Background: Translational medicine (TM) is an emerging domain that aims to facilitate medical or biological advances efficiently from the scientist to the clinician. Central to the TM vision is to narrow the gap between basic science and applied science in terms of time, cost and early diagnosis of the disease state. Biomarker identification is one of the main challenges within TM. The identification of disease biomarkers from -omics data will not only help the stratification of diverse patient cohorts but will also provide early diagnostic information which could improve patient management and potentially prevent adverse outcomes. However, biomarker identification needs to be robust and reproducible. Hence a robust unbiased computational framework that can help clinicians identify those biomarkers is necessary. Methods: We developed a pipeline (workflow) that includes two different supervised classification techniques based on regularization methods to identify biomarkers from -omics or other high-dimensional clinical datasets. The pipeline includes several important steps such as quality control and stability of selected biomarkers. The process takes input files (outcome and independent variables or -omics data) and pre-processes them (normalization, missing values). After a random division of samples into training and test sets, Least Absolute Shrinkage and Selection Operator and Elastic Net feature selection methods are applied to identify the most important features representing potential biomarker candidates. The penalization parameters are optimised using 10-fold cross validation and the process undergoes 100 iterations and a combinatorial analysis to select the best performing multivariate model.
An empirical unbiased assessment of their quality as biomarkers for clinical use is performed through a Receiver Operating Characteristic curve and its Area Under the Curve analysis on both permuted and real data for 1000 different randomized training and test sets. We validated this pipeline against previously published biomarkers. Results: We applied this pipeline to three different datasets with previously published biomarkers: lipidomics data by Acharjee et al. (Metabolomics 13:25, 2017) and transcriptomics data by Rajamani and Bhasin (Genome Med 8:38, 2016) and Mills et al. (Blood 114:1063–1072, 2009). Our results demonstrate that our method was able to identify both previously published biomarkers and new variables that add value to the published results. Conclusions: We developed a robust pipeline to identify clinically relevant biomarkers that can be applied to different -omics datasets. Such identification reveals potentially novel drug targets and can be used as a part of a machine-learning based patient stratification framework in translational medicine settings.
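The comparison of AUC on real versus permuted data described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the scores and labels are toy values, the function names `auc` and `permutation_aucs` are hypothetical, and the labels are shuffled against fixed scores rather than refitting LASSO or Elastic Net models on each randomized training and test split.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def permutation_aucs(scores, labels, n_perm=1000, seed=0):
    """AUC values obtained after shuffling the labels (null distribution)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(auc(scores, shuffled))
    return null


# Toy example (assumed data): model scores and true binary outcomes.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

real_auc = auc(scores, labels)
null_aucs = permutation_aucs(scores, labels)
# Empirical p-value: how often a permuted AUC matches or beats the real one.
p_value = (1 + sum(a >= real_auc for a in null_aucs)) / (1 + len(null_aucs))
```

A real AUC far above the bulk of the permuted AUCs suggests that the selected features carry signal beyond chance; the permuted values should centre near 0.5.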
Link prediction in complex network using information flow
Link prediction in complex networks has recently attracted a great deal of attention in diverse scientific domains, including social and biological sciences. Given a snapshot of a network, the goal is to predict links that are missing in the network or that are likely to occur in the near future. This problem has both theoretical and practical significance; it not only helps us to identify missing links in a network more efficiently by avoiding expensive and time-consuming experimental processes, but also allows us to study the evolution of a network with time. To address the problem of link prediction, numerous attempts have been made over recent years that exploit the local and the global topological properties of the network to predict missing links. In this paper, we use the parametrised matrix forest index (PMFI) to predict missing links in a network. We show that, for small parameter values, this index is linked to a heat diffusion process on a graph and therefore encodes geometric properties of the network. We then develop a framework that combines the PMFI with a local similarity index to predict missing links in the network. The framework is applied to numerous networks obtained from diverse domains such as social, biological, and transport networks. The results show that the proposed method can predict missing links with higher accuracy when compared to other state-of-the-art link prediction methods.
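As a rough illustration of scoring missing links with a matrix forest index, the sketch below assumes the commonly used parametrised form S = (I + αL)^(-1), where L is the graph Laplacian and α > 0 is the parameter. The abstract does not state the exact definition, so this form, the toy graph, and the helper names are assumptions, and the combination with a local similarity index is not shown.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pmfi(n, edges, alpha):
    """Assumed parametrised matrix forest index: (I + alpha * L)^(-1)."""
    L = laplacian(n, edges)
    m = [[(1.0 if i == j else 0.0) + alpha * L[i][j] for j in range(n)]
         for i in range(n)]
    return invert(m)


def rank_missing_links(n, edges, alpha):
    """Non-adjacent node pairs sorted by decreasing similarity score."""
    S = pmfi(n, edges, alpha)
    present = {frozenset(e) for e in edges}
    candidates = [(S[i][j], i, j)
                  for i in range(n) for j in range(i + 1, n)
                  if frozenset((i, j)) not in present]
    return sorted(candidates, reverse=True)


# Toy graph (assumed): a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
S = pmfi(5, edges, alpha=1.0)
ranking = rank_missing_links(5, edges, alpha=1.0)
```

Under this assumed form, a first-order expansion gives (I + αL)^(-1) ≈ I − αL ≈ exp(−αL) for small α, which is consistent with the heat diffusion connection mentioned in the abstract.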
Multimorbidity prediction using link prediction
Multimorbidity, frequently associated with aging, can be operationally defined as the presence of two or more chronic conditions. Predicting the likelihood that a patient with multimorbidity will develop a further particular disease in the future is one of the key challenges in multimorbidity research. In this paper we use a network-based approach to analyze multimorbidity data and develop methods for predicting diseases that a patient is likely to develop. The multimorbidity data is represented using a temporal bipartite network whose nodes represent patients and diseases, and a link between these nodes indicates that the patient has been diagnosed with the disease. Disease prediction is then reduced to the problem of predicting those missing links in the network that are likely to appear in the future. We develop a novel link prediction method for static bipartite networks and validate the performance of the method on benchmark datasets. By using a probabilistic framework, we then report on the development of a method for predicting future links in the network, where links are labelled with a time-stamp. We apply the proposed method to three different multimorbidity datasets and report its performance measured by different performance metrics including AUC, Precision, Recall, and F-Score.
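To make the evaluation metrics concrete, the following minimal sketch computes Precision, Recall, and F-Score for a set of predicted patient-disease links against the links that actually appeared. The patient and disease identifiers are invented for illustration and the function name is hypothetical; the prediction method itself is not reproduced here.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall and F1 for predicted versus observed links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (assumed data): links are (patient, disease) pairs.
predicted_links = {("p1", "diabetes"), ("p1", "hypertension"),
                   ("p2", "asthma"), ("p3", "copd")}
future_links = {("p1", "diabetes"), ("p2", "asthma"), ("p2", "ckd")}

precision, recall, f1 = precision_recall_f1(predicted_links, future_links)
```

Here two of the four predicted links are correct and two of the three future links are recovered, so precision is 1/2, recall is 2/3, and the F-Score is 4/7.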
A causal web between chronotype and metabolic health traits
Observational and experimental evidence has linked chronotype to both psychological and cardiometabolic traits. Recent Mendelian randomization (MR) studies have investigated direct links between chronotype and several of these traits, often in isolation from potential mediating or moderating traits. We mined the EpiGraphDB MR database for calculated chronotype–trait associations (p-value < 5 × 10⁻⁸). We then re-analyzed those relevant to metabolic or mental health and investigated for statistical evidence of horizontal pleiotropy. Analyses passing multiple testing correction were then investigated for confounders, colliders, intermediates, and reverse intermediates using the EpiGraphDB database, creating multiple chronotype–trait interactions among each of the traits studied. We revealed 10 significant chronotype–exposure associations (false discovery rate < 0.05) exposed to 111 potential previously known confounders, 52 intermediates, 18 reverse intermediates, and 31 colliders. Chronotype–lipid causal associations collided with treatment and diabetes effects; chronotype–bipolar associations were mediated by breast cancer; and chronotype–alcohol intake associations were impacted by confounders and intermediate variables including known zeitgebers and molecular traits. We have reported the influence of chronotype on several cardiometabolic and behavioural traits, and identified potential confounding variables not reported on in studies while discovering new associations to drugs and disease.
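The false discovery rate threshold mentioned above can be illustrated with a short sketch. The abstract does not name the correction procedure, so the Benjamini-Hochberg method used here is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example (assumed data): raw p-values from six association tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
q_values = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(q_values) if q < 0.05]
```

In this toy example only the first two tests remain significant at a false discovery rate below 0.05, even though four raw p-values fall under 0.05.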
Evaluating the detection ability of a range of epistasis detection methods on simulated data for pure and impure epistatic models
BACKGROUND: Numerous approaches have been proposed for the detection of epistatic interactions within GWAS datasets in order to better understand the drivers of disease and genetics. METHODS: A selection of state-of-the-art approaches was assessed. These included the statistical tests fast-epistasis, BOOST, logistic regression and wtest; swarm intelligence methods, namely AntEpiSeeker, epiACO and CINOEDV; and data mining approaches, including MDR, GSS, SNPRuler and MPI3SNP. Data were simulated to provide randomly generated models with no individual main effects at different heritabilities (pure epistasis) as well as models based on penetrance tables with some main effects (impure epistasis). Detection of both two- and three-locus interactions was assessed across a total of 1,560 simulated datasets. The different methods were also applied to a section of the UK Biobank cohort for Atrial Fibrillation. RESULTS: For pure two-locus interactions, PLINK’s implementation of BOOST recovered the highest number of correct interactions (53.9%), performing significantly better than the other methods (p = 4.52e−36). For impure two-locus interactions, MDR exhibited the best performance, recovering 62.2% of the most significant impure epistatic interactions (p = 6.31e−90 for all but one test). The assessment of three-locus interaction prediction revealed that wtest recovered the highest number (17.2%) of pure epistatic interactions (p = 8.49e−14). wtest also recovered the highest number of three-locus impure epistatic interactions (p = 6.76e−48), while AntEpiSeeker ranked the highest number of such interactions (40.5%) as the most significant. Finally, the methods were applied to a real dataset for Atrial Fibrillation, most notably finding an interaction between SYNE2 and DTNB.
Early onset of immune-mediated diseases in minority ethnic groups in the UK
BACKGROUND: The prevalence of some immune-mediated diseases (IMDs) shows distinct differences between populations of different ethnicities. The aim of this study was to determine if the age at diagnosis of common IMDs also differed between different ethnic groups in the UK, suggestive of distinct influences of ethnicity on disease pathogenesis. METHODS: This was a population-based retrospective primary care study. Linear regression provided unadjusted and adjusted estimates of age at diagnosis for common IMDs within the following ethnic groups: White, South Asian, African-Caribbean and Mixed-race/Other. Potential disease risk confounders in the association between ethnicity and diagnosis age including sex, smoking, body mass index and social deprivation (Townsend quintiles) were adjusted for. The analysis was replicated using data from UK Biobank (UKB). RESULTS: After adjusting for risk confounders, we observed that individuals from South Asian, African-Caribbean and Mixed-race/Other ethnicities were diagnosed with IMDs at a significantly younger age than their White counterparts for almost all IMDs. The difference in the diagnosis age (ranging from 2 to 30 years earlier) varied for each disease and by ethnicity. For example, rheumatoid arthritis was diagnosed at age 49, 48 and 47 years in individuals of African-Caribbean, South Asian and Mixed-race/Other ethnicities respectively, compared to 56 years in White ethnicities. The earlier diagnosis of most IMDs observed was validated in UKB although with a smaller effect size. CONCLUSION: Individuals from non-White ethnic groups in the UK had an earlier age at diagnosis for several IMDs than White adults. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12916-022-02544-5
30-day Morbidity and Mortality after Cholecystectomy for Benign Gallbladder Disease (AMBROSE)
Objective: This study aimed to assess 30-day morbidity and mortality rates following cholecystectomy for benign gallbladder disease and identify the factors associated with complications. Summary background data: Although cholecystectomy is common for benign gallbladder disease, there is a gap in the knowledge of the current practice and variations on a global level. Methods: A prospective, international, observational collaborative cohort study of consecutive patients undergoing cholecystectomy for benign gallbladder disease from participating hospitals in 57 countries between January 1 and June 30, 2022, was performed. Univariate and multivariate logistic regression models were used to identify preoperative and operative variables associated with 30-day postoperative outcomes. Results: Data from 21,706 surgical patients from 57 countries were included in the analysis. A total of 10,821 (49.9%), 4,263 (19.7%), and 6,622 (30.5%) cholecystectomies were performed in the elective, emergency, and delayed settings, respectively. Thirty-day postoperative complications were observed in 1,738 patients (8.0%), including mortality in 83 patients (0.4%). Bile leaks (Strasberg grade A) were reported in 278 (1.3%) patients and severe bile duct injuries (Strasberg grades B-E) were reported in 48 (0.2%) patients. Patient age, ASA physical status class, surgical setting, operative approach and Nassar operative difficulty grade were identified as the five predictors demonstrating the highest relative importance in predicting postoperative complications. Conclusion: This multinational observational collaborative cohort study presents a comprehensive report of the current practices and outcomes of cholecystectomy for benign gallbladder disease. Ongoing global collaborative evaluations and initiatives are needed to promote quality assurance and improvement in cholecystectomy.