180 research outputs found
Artificial intelligence for predictive biomarker discovery in immuno-oncology: a systematic review
Background: The widespread use of immune checkpoint inhibitors (ICIs) has revolutionised treatment of multiple cancer types. However, selecting patients who may benefit from ICIs remains challenging. Artificial intelligence (AI) approaches allow exploitation of high-dimensional oncological data in research and development of precision immuno-oncology. Materials and methods: We conducted a systematic literature review of peer-reviewed original articles studying the prediction of ICI efficacy in cancer patients across five data modalities: genomics (including genomics, transcriptomics, and epigenomics), radiomics, digital pathology (pathomics), and real-world and multimodality data. Results: A total of 90 studies were included in this systematic review, with 80% published in 2021-2022. Among them, 37 studies included genomic, 20 radiomic, 8 pathomic, 20 real-world, and 5 multimodal data. Standard machine learning (ML) methods were used in 72% of studies, deep learning (DL) methods in 22%, and both in 6%. The most frequently studied cancer type was non-small-cell lung cancer (36%), followed by melanoma (16%), while 25% included pan-cancer studies. No prospective study design incorporated AI-based methodologies from the outset; rather, all implemented AI as a post hoc analysis. Novel biomarkers for ICI in radiomics and pathomics were identified using AI approaches, and molecular biomarkers have expanded past genomics into transcriptomics and epigenomics. Finally, complex algorithms and new types of AI-based markers, such as meta-biomarkers, are emerging by integrating multimodal/multi-omics data. Conclusion: AI-based methods have expanded the horizon for biomarker discovery, demonstrating the power of integrating multimodal data from existing datasets to discover new meta-biomarkers. While most of the included studies showed promise for AI-based prediction of benefit from immunotherapy, none provided high-level evidence for immediate practice change. A priori planned prospective trial designs are needed to cover all lifecycle steps of these software biomarkers, from development and validation to integration into clinical practice.
Machine Learning Approaches for the Prioritisation of Cardiovascular Disease Genes Following Genome-wide Association Study
Genome-wide association studies (GWAS) have revealed thousands of genetic loci, establishing themselves as a valuable method for unravelling the complex biology of many diseases. Although GWAS have grown in size and improved in study design to detect effects, identifying real causal signals and disentangling them from other highly correlated markers in linkage disequilibrium (LD) remains challenging. This has severely limited GWAS findings and brought the method’s value into question. Although thousands of disease susceptibility loci have been reported, causal variants and genes at these loci remain elusive. Post-GWAS analysis aims to dissect the heterogeneity of variant and gene signals. In recent years, machine learning (ML) models have been developed for post-GWAS prioritisation. These models have ranged from logistic regression to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models (i.e., neural networks). When combined with functional validation, these methods have shown important translational insights, providing a strong evidence-based approach to direct post-GWAS research. However, ML approaches are in their infancy across biological applications, and as they continue to evolve, an evaluation of their robustness for GWAS prioritisation is needed. Here, I investigate the landscape of ML across selected models, input features, bias risk, and output model performance, with a focus on building a prioritisation framework that is applied to blood pressure GWAS results and tested by re-application to blood lipid traits.
Investigating the metabolomics of treatment response in patients with inflammatory rheumatic diseases
Background:
Rheumatic and musculoskeletal diseases (RMDs) are autoimmune-mediated chronic diseases affecting the joints around the body, involving an inappropriate immune response being launched against the tissues of the joint. These devastating diseases include rheumatoid arthritis (RA) and psoriatic arthritis (PsA). If insufficiently managed – or indeed in severe cases – these diseases can substantially impact a patient’s quality of life, leading to joint damage, dysfunction, and disability. However, numerous treatments exist for these diseases that control the immune-mediated factors driving disease, described as disease modifying anti-rheumatic drugs (DMARDs). Despite the success of these drugs for patients in achieving remission, they are not effective in all patients, and those who do not respond well to first-line treatments will typically be given an alternative drug on a trial-and-error basis until they respond successfully. Given the rapid and irreversible damage these diseases can induce even in the early stages, the need for early and aggressive treatment is fundamental for reaching a good outcome for the patient. Biomarkers can be employed to identify the most suitable drug to administer on a patient-to-patient basis, using these to predict who will respond to which drug. Incorporating biomarkers into the clinical management of these diseases is expected to be fundamental for precision medicine. These may come from multiple molecular sources. For example, currently used biomarkers include autoantibodies while this project primarily focuses on discovering biomarkers from the metabolome.
Methodology:
This project involved secondary analyses of metabolomic and transcriptomic datasets generated from patients enrolled in multiple clinical studies. The data come from four cohorts: Targeting Synovitis in Early Rheumatoid Arthritis (TaSER) (n=72), Treatment in the Rotterdam Early Arthritis Cohort (tREACH) (n=82), Characterising the Centralised Pain Phenotype in Chronic Rheumatic Disease (CENTAUR) (n=50) and Mayo Clinic - Hur et al. (2021) (n=64). The translatability of the metabolic findings across cohorts was evaluated by incorporating datasets from various regions, including the United Kingdom, the Netherlands, and the United States of America.
These multi-omic datasets were analysed using an in-house workflow developed throughout this project’s duration, involving the use of the R environment to perform exploratory data analysis, supervised machine learning and an investigation of the biological relevance of the findings. Other methods were also employed, notably an exploration and evaluation of data integration methods.
Supervised machine learning was included to generate molecular profiles of treatment responses from multiple datasets. Doing so showed the value of combining multiple weakly-associated analytes in a model that could predict patient responses. However, an important component, the validation of these models, could not be performed in this work, although suggestions were made throughout of possible next steps.
Results and Discussion:
The analysis of the TaSER metabolomic data showed metabolites associated with methotrexate response after 3 months of treatment. Tryptophan- and arginine-related metabolites were included in the metabolic model predictive of the 3-month response. While the model was not directly validated using subsequent datasets, including the tREACH and Mayo Clinic cohorts, additional features from these pathways were associated with treatment response. Included across cohorts were several tryptophan metabolites, including those derived from indole. Since these are largely produced via the gut microbiome, it was suggested that the gut microbiome may influence the effectiveness of RMD treatments. Since RA and PsA were considered in this work as two archetypal RMDs, part of the project intended to investigate whether there were shared metabolic features associated with treatment response in both diseases. These common metabolites were not clearly identified, although arginine-related metabolites were observed in models generated from the TaSER and CENTAUR cohorts in association with response to treatment in both conditions.
Owing to the limitations of the untargeted metabolomic approach, this work was expected to provide an initial step in understanding the involvement of arginine- and tryptophan-related pathways in influencing treatment response in RMDs. Targeted metabolomics, which was not performed in this work, was expected to provide clearer insights into these metabolites, with absolute quantification and identification of the features of interest in the patient samples. It was expected that expanding the cohort sizes and incorporating other omics platforms would provide a greater understanding of the mechanisms of the resolution of RMDs and inform future therapeutic targets.
An important output from this project was the analytical pipeline developed and employed throughout for the omics analysis to inform biomarker discovery. Later work will involve generating a package in the R environment called markerHuntR. The R scripts for the functions with example datasets can be found at https://github.com/cambest202/markerHuntR.git. It is anticipated that the package will soon be described in more detail in a publication. The package will be available for researchers familiar with R to perform analyses similar to those described in this work.
Molecular Signatures of Severe Acute Infections in Hospitalised African Children
Despite the improvement in global health over the last three decades, infectious diseases are still a major cause of morbidity and mortality globally, with the highest burden in developing countries and in children under 5 years. The heterogeneity in the clinical presentation of severely ill patients and the lack of rapid diagnostic and prognostic tools for the aetiological distinction of infecting pathogens complicates care decisions, leading to the indiscriminate use of antibiotics and consequently, increased antimicrobial resistance and mortality. Reducing childhood mortality and morbidity due to infectious diseases requires better diagnostics and prognostics, particularly in low-resource settings. Understanding the molecular processes that underlie different aetiologies and survival outcomes would enable the initiation of appropriate and timely treatment.
This thesis aimed to characterise protein and transcriptomic signatures associated with (i) different microbial aetiologies of severe acute infection and (ii) an elevated risk of death in African children hospitalised with severe acute infections. An untargeted high-performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS) approach was used to characterise the plasma proteome of children admitted to hospital with different infectious aetiologies. The resulting data were analysed using a collection of machine learning algorithms to identify a protein signature with the highest power to discriminate between children with bacterial or viral infections. A novel protein microarray chip was then developed to validate the discovered signature in an independent cohort of hospitalised children with severe acute infections. Lastly, an RNA-seq-based transcriptomics approach was used to characterise gene expression changes associated with post-admission outcomes. Using gene set enrichment and modular analysis, the correlation between gene expression and impending inpatient mortality was characterised.
Key findings from this work include a validated protein signature that could classify children according to bacterial or viral aetiology and a description of the immune response to severe disease that could be correlated with an increased likelihood of death in children admitted to hospital with severe acute infection. In the proteomics analysis, a random forest-derived protein signature made up of CRP, LBP, AGT, SERPINA1, SERPINA3, PON1 and HRG was identified that could distinguish bacterial from viral infection in a held-out test set with an AUC of 0.84 (95% CI 0.72-0.95). The performance of individual proteins in the signature was assessed in an independent cohort of children using a novel protein microarray chip. AGT had the best discriminatory power, with an AUC of 0.7 (95% CI 0.64-0.75), compared with a clinically approved CRP test whose AUC was 0.76 (95% CI 0.68-0.84). In addition, the expression levels of AGT, SERPINA1, PON1 and HRG were associated with mortality. AGT was also associated with severe acute malnutrition in children. Functional analysis of the proteomics data showed that children with bacterial infections had an enrichment of acute phase responses and neutrophil degranulation pathways, while platelet degranulation was negatively associated with bacterial infections.
Analysis of the transcriptomics data showed that imminent inpatient death was marked by downregulation of CD8 T cell activation and type I interferon signalling, and by overexpression of the unfolded protein response and heme metabolism pathways.
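As a rough illustration of how a held-out AUC with a 95% CI, like the values quoted in the abstract above, might be estimated, the following Python sketch uses a percentile bootstrap. The abstract does not state how its intervals were computed, so the bootstrap itself, the function names (roc_auc, bootstrap_auc_ci) and the toy labels and scores are all assumptions made for illustration; none of this is the author's method or data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, so AUC is undefined; draw again
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo = aucs[int((alpha / 2) * n_boot)]
    hi = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return roc_auc(labels, scores), lo, hi


# Hypothetical held-out test set: 1 = bacterial, 0 = viral; scores are model outputs.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60]
auc, lo, hi = bootstrap_auc_ci(labels, scores)
print(f"AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With a test set this small the interval is very wide, which is one reason signatures of this kind are usually re-assessed in an independent cohort, as the abstract describes.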
Exploring Gender Bias in Semantic Representations for Occupational Classification in NLP: Techniques and Mitigation Strategies
Gender bias in Natural Language Processing (NLP) models is a non-trivial problem that can perpetuate and amplify existing societal biases. This thesis investigates gender bias in occupation classification and explores the effectiveness of different debiasing methods for language models to reduce the impact of bias in the model’s representations. The study employs a data-driven empirical methodology focusing heavily on experimentation and result investigation. The study uses five distinct semantic representations and models with varying levels of complexity to classify the occupation of individuals based on their biographies.
- …