The efficacy of various machine learning models for multi-class classification of RNA-seq expression data
Late diagnosis and high costs are key factors that negatively impact the care
of cancer patients worldwide. Although the availability of biological markers
for the diagnosis of cancer type is increasing, costs and reliability of tests
currently present a barrier to the adoption of their routine use. There is a
pressing need for accurate methods that enable early diagnosis and cover a
broad range of cancers. The use of machine learning and RNA-seq expression
analysis has shown promise in the classification of cancer type. However,
research is inconclusive about which types of machine learning model are
optimal. The suitability of five algorithms was assessed for the
classification of 17 different cancer types. Each algorithm was fine-tuned and
trained on the full array of 18,015 genes per sample, for 4,221 samples (75 %
of the dataset). They were then tested with 1,408 samples (25 % of the dataset)
for which cancer types were withheld to determine the accuracy of prediction.
The results show that ensemble algorithms achieve 100% accuracy in the
classification of 14 out of 17 types of cancer. The clustering and
classification models, while faster than the ensembles, performed poorly due to
the high level of noise in the dataset. When the features were reduced to a
list of 20 genes, the ensemble algorithms maintained an accuracy above 95%,
unlike the clustering and classification models.
Comment: 12 pages, 4 figures, 3 tables, conference paper: Computing Conference
2019, published at
https://link.springer.com/chapter/10.1007/978-3-030-22871-2_6
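As a rough illustration of the 75 % / 25 % hold-out described in the abstract above, the following sketch splits labelled samples class by class using only the Python standard library. The function name `stratified_split`, the seed, and the toy labels are assumptions made for illustration; the abstract does not say whether the authors' split was stratified or how it was implemented.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), holding out ~test_fraction of each class."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data (hypothetical): 3 cancer types with 40 samples each.
labels = [f"type_{i % 3}" for i in range(120)]
train_idx, test_idx = stratified_split(labels)
```

For reference, the sample counts reported in the abstract give 4,221 / (4,221 + 1,408) ≈ 0.75, consistent with the stated split.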
Computational Pathology: A Survey Review and The Way Forward
Computational Pathology (CPath) is an interdisciplinary science that augments
developments of computational approaches to analyze and model medical
histopathology images. The main objective for CPath is to develop
infrastructure and workflows of digital diagnostics as an assistive CAD system
for clinical pathology, facilitating transformational changes in the diagnosis
and treatment of cancer that are mainly addressed by CPath tools. With
ever-growing developments in deep learning and computer vision algorithms, and
the ease of the data flow from digital pathology, currently CPath is witnessing
a paradigm shift. Despite the sheer volume of engineering and scientific works
being introduced for cancer image analysis, there is still a considerable gap
in adopting and integrating these algorithms in clinical practice. This raises
a significant question regarding the direction and trends that are undertaken
in CPath. In this article we provide a comprehensive review of more than 800
papers to address the challenges faced from problem design all the way to the
application and implementation viewpoints. We have catalogued each paper into a
model-card by examining the key works and challenges faced to lay out the
current landscape in CPath. We hope this helps the community to locate relevant
works and facilitate understanding of the field's future directions. In a
nutshell, we oversee the CPath developments in a cycle of stages which are
required to be cohesively linked together to address the challenges associated
with such multidisciplinary science. We overview this cycle from different
perspectives of data-centric, model-centric, and application-centric problems.
We finally sketch remaining challenges and provide directions for future
technical developments and clinical integration of CPath
(https://github.com/AtlasAnalyticsLab/CPath_Survey).
Comment: Accepted in Elsevier Journal of Pathology Informatics (JPI) 202
INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE
Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines leveraging massive genomic variations and integrating with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, through applying high-dimensional omics genomic data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics.
1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by an increased presence of tumor-associated microglia. Then I have shown that the sole source of microglia is the intrinsic tumor bulk, and not the corresponding neurosphere cells, through both transcriptional and protein-level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, through longitudinal analysis of paired primary and recurrent GBM samples, I have demonstrated my hypothesis that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in contributing to intra-tumor heterogeneity and to the accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant clinical impact but also pave the way for preclinical cancer research.
2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that are associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other glioma subsets. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS).
3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling. Then I have predicted fusion proteins with kinase activity and hub function of pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts through integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes.
Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us towards the development of precision medicine.
Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity
In biomedical research, applied machine learning and bioinformatics are essential disciplines heavily involved in translating data-driven findings into medical practice. This task is accomplished in particular by developing computational tools and algorithms that assist in the detection and clarification of the underlying causes of diseases. The continuous advancements in high-throughput technologies, coupled with recently promoted data sharing policies, have contributed to the presence of a massive wealth of data with remarkable potential to improve human health care. In concordance with this massive boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data categories. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate the existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches in clinical decision making by improving patient risk stratification and prediction of disease outcomes.
This thesis comprises five studies explaining method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. In order to demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). In addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the importance of including genomic data for the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors.
Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex diseases with high heterogeneity. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice.
Artificial intelligence in digital pathology: a diagnostic test accuracy systematic review and meta-analysis
Ensuring diagnostic performance of AI models before clinical use is key to
the safe and successful adoption of these technologies. Studies reporting AI
applied to digital pathology images for diagnostic purposes have rapidly
increased in number in recent years. The aim of this work is to provide an
overview of the diagnostic accuracy of AI in digital pathology images from all
areas of pathology. This systematic review and meta-analysis included
diagnostic accuracy studies using any type of artificial intelligence applied
to whole slide images (WSIs) in any disease type. The reference standard was
diagnosis through histopathological assessment and / or immunohistochemistry.
Searches were conducted in PubMed, EMBASE and CENTRAL in June 2022. We
identified 2976 studies, of which 100 were included in the review and 48 in the
full meta-analysis. Risk of bias and concerns of applicability were assessed
using the QUADAS-2 tool. Data extraction was conducted by two investigators and
meta-analysis was performed using a bivariate random effects model. 100 studies
were identified for inclusion, equating to over 152,000 whole slide images
(WSIs) and representing many disease types. Of these, 48 studies were included
in the meta-analysis. These studies reported a mean sensitivity of 96.3% (CI
94.1-97.7) and mean specificity of 93.3% (CI 90.5-95.4) for AI. There was
substantial heterogeneity in study design and all 100 studies identified for
inclusion had at least one area at high or unclear risk of bias. This review
provides a broad overview of AI performance across applications in whole slide
imaging. However, there is huge variability in study design and available
performance data, with details around the conduct of the study and makeup of
the datasets frequently missing. Overall, AI offers good accuracy when applied
to WSIs but requires more rigorous evaluation of its performance.
Comment: 26 pages, 5 figures, 8 tables + Supplementary material
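The sensitivity and specificity pooled in the review above are per-study quantities derived from a 2×2 table of AI output against the reference standard. A minimal sketch of that per-study calculation follows; the function name and the counts are invented for illustration and are not taken from any included study, and the pooling step (the bivariate random effects model) is not reproduced here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compute sensitivity and specificity from 2x2 confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of diseased cases correctly flagged
    specificity = tn / (tn + fp)  # fraction of non-diseased cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts for a single study (not taken from the review).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=180, fp=20)
```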