Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis
Identifying complex biological processes associated with patients' survival time at the cellular and molecular level is critical not only for developing new treatments but also for accurate survival prediction. However, highly nonlinear, high-dimension, low-sample-size (HDLSS) data pose computational challenges in survival analysis. We developed a novel family of pathway-based, sparse deep neural networks (PASNet) for cancer survival analysis. The PASNet family is a biologically interpretable neural network model in which nodes correspond to specific genes and pathways, and which captures nonlinear and hierarchical effects of biological pathways associated with certain clinical outcomes. Furthermore, integration of heterogeneous types of biological data from biospecimens holds promise for improving survival prediction and personalized therapies in cancer. Specifically, the integration of genomic data and histopathological images enhances survival prediction and personalized treatment in cancer studies, while providing an in-depth understanding of genetic mechanisms and phenotypic patterns of cancer. Two proposed models will be introduced for integrating multi-omics data and pathological images, respectively. Each model in the PASNet family was evaluated by comparing its performance with that of current cutting-edge models on The Cancer Genome Atlas (TCGA) cancer data. In extensive experiments, the PASNet family outperformed the benchmark methods, and the improvement was statistically assessed. More importantly, the PASNet family showed the capability to interpret a multi-layered biological system. A number of studies in the GBM literature supported the biological interpretation of the proposed models. The open-source software of the PASNet family in PyTorch is publicly available at https://github.com/DataX-JieHao
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Radiomics analyses for outcome prediction in patients with locally advanced rectal cancer and glioblastoma multiforme using multimodal imaging data
Personalized treatment strategies for oncological patient management can improve outcomes of patient populations with heterogeneous treatment response. The implementation of such a concept requires the identification of biomarkers that can precisely predict treatment outcome. In the context of this thesis, we develop and validate biomarkers from multimodal imaging data for the outcome prediction after treatment in patients with locally advanced rectal cancer (LARC) and in patients with newly diagnosed glioblastoma multiforme (GBM), using conventional feature-based radiomics and deep-learning (DL) based radiomics. For LARC patients, we identify promising radiomics signatures combining computed tomography (CT) and T2-weighted (T2-w) magnetic resonance imaging (MRI) with clinical parameters to predict tumour response to neoadjuvant chemoradiotherapy (nCRT). Further, the analyses of externally available radiomics models for LARC reveal a lack of reproducibility and the need for standardization of the radiomics process. For patients with GBM, we use postoperative [11C] methionine positron emission tomography (MET-PET) and gadolinium-enhanced T1-w MRI for the detection of the residual tumour status and to prognosticate time-to-recurrence (TTR) and overall survival (OS). We show that DL models built on MET-PET have an improved diagnostic and prognostic value as compared to MRI
Optimization of treatment strategy
The purpose of this study was to predict the survival time of patients with malignant glioma after radiotherapy with high accuracy by considering additional clinical factors, and to optimize the prescription dose and treatment duration for individual patients by using a machine learning model. A total of 35 patients with malignant glioma were included in this study. The candidate features included 12 clinical features and 192 dose–volume histogram (DVH) features. For each of three feature sets (clinical, DVH, and combined clinical and DVH), the appropriate input features and parameters of the support vector machine (SVM) were selected using a genetic algorithm based on Akaike’s information criterion. The prediction accuracy of the SVM models was evaluated through a leave-one-out cross-validation test with residual error, which was defined as the absolute difference between the actual and predicted survival times after radiotherapy. Moreover, the influences of various values of prescription dose and treatment duration on the predicted survival time were evaluated. The prediction accuracy was significantly improved with the combined use of clinical and DVH features compared with the use of either feature set alone (P < 0.01, Wilcoxon signed rank test). The mean ± standard deviation of the leave-one-out residual error using the combined clinical and DVH features, only clinical features and only DVH features was 104.7 ± 96.5, 144.2 ± 126.1 and 204.5 ± 186.0 days, respectively. The prediction accuracy could be improved with the combination of clinical and DVH features, and our results show the potential to optimize the treatment strategy for individual patients based on a machine learning model
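The evaluation protocol described in this abstract (leave-one-out cross-validation scored by the absolute difference between actual and predicted survival time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the nearest-neighbour predictor is a hypothetical stand-in for the SVM, the genetic-algorithm feature selection is omitted, and the data are made-up toy values.

```python
from statistics import mean, stdev


def nearest_neighbor_predict(train_x, train_y, query):
    """Stand-in regressor (NOT the SVM from the abstract).

    Returns the survival time of the training sample closest to `query`.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return train_y[best]


def loocv_residual_errors(features, survival_days, predict=nearest_neighbor_predict):
    """Leave-one-out CV: hold out each patient once, predict from the rest.

    The residual error is |actual - predicted| survival time, in days.
    """
    errors = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = survival_days[:i] + survival_days[i + 1:]
        predicted = predict(train_x, train_y, features[i])
        errors.append(abs(survival_days[i] - predicted))
    return errors


# Toy data, invented for illustration only (one feature per patient).
features = [[0.0], [1.0], [3.0], [10.0]]
survival_days = [100.0, 110.0, 130.0, 400.0]

errors = loocv_residual_errors(features, survival_days)
print(f"residual error: {mean(errors):.1f} ± {stdev(errors):.1f} days")
```

Any model with the same `predict(train_x, train_y, query)` shape could be passed in place of the stand-in, which keeps the scoring loop independent of the choice of regressor.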
Machine Learning Decision Tree Models for Differentiation of Posterior Fossa Tumors Using Diffusion Histogram Analysis and Structural MRI Findings.
We applied machine learning algorithms for differentiation of posterior fossa tumors using apparent diffusion coefficient (ADC) histogram analysis and structural MRI findings. A total of 256 patients with intra-axial posterior fossa tumors were identified, of whom 248 were included in machine learning analysis, with at least 6 representative subjects per tumor pathology. The ADC histograms of solid components of tumors, structural MRI findings, and patients' age were applied to construct decision models using Classification and Regression Tree analysis. We also compared different machine learning classification algorithms (i.e., naïve Bayes, random forest, neural networks, support vector machine with linear and polynomial kernel) for dichotomized differentiation of the 5 most common tumors in our cohort: metastasis (n = 65), hemangioblastoma (n = 44), pilocytic astrocytoma (n = 43), ependymoma (n = 27), and medulloblastoma (n = 26). The decision tree model could differentiate seven tumor histopathologies with terminal nodes yielding up to 90% accurate classification rates. In receiver operating characteristic (ROC) analysis, the decision tree model achieved greater area under the curve (AUC) than the official clinical report for differentiation of pilocytic astrocytoma (p = 0.020) and atypical teratoid/rhabdoid tumor (ATRT) (p = 0.001) from other types of neoplasms. However, neuroradiologists' interpretations had greater accuracy in differentiating metastases (p = 0.001). Among different machine learning algorithms, random forest models yielded the highest accuracy in dichotomized classification of the 5 most common tumor types; in multiclass differentiation of all tumor types, random forest yielded an averaged AUC of 0.961 in training datasets and 0.873 in validation samples. Our study demonstrates the potential application of machine learning algorithms and decision trees for accurate differentiation of brain tumors based on pretreatment MRI.
Using easy-to-apply and understandable imaging metrics, the proposed decision tree model can help radiologists with differentiation of posterior fossa tumors, especially in tumors with similar qualitative imaging characteristics. In particular, our decision tree model provided more accurate differentiation of pilocytic astrocytomas from ATRT than neuroradiologists did in clinical reads
Machine learning analytics of resting-state functional connectivity predicts survival outcomes of glioblastoma multiforme patients
Glioblastoma multiforme (GBM) is the most frequently occurring brain malignancy. Due to its poor prognosis with currently available treatments, there is a pressing need for easily accessible, non-invasive techniques to help inform pre-treatment planning and patient counseling, and to improve outcomes. In this study we determined the feasibility of using resting-state functional connectivity (rsFC) to classify GBM patients into short-term and long-term survival groups with respect to reported median survival (14.6 months). We used a support vector machine with rsFC between regions of interest as predictive features. We employed a novel hybrid feature selection method whereby features were first filtered using correlations between rsFC and overall survival (OS), and the optimal feature subset was then selected using the established method of recursive feature elimination (RFE). Leave-one-subject-out cross-validation was used to evaluate the performance of the models. Classification accuracy between short- and long-term survival was 71.9%. Sensitivity and specificity were 77.1% and 65.5%, respectively. The area under the receiver operating characteristic curve was 0.752 (95% CI, 0.62-0.88). These findings suggest that highly specific features of rsFC may predict GBM survival. Taken together, the findings of this study suggest that resting-state fMRI and machine learning analytics could enable a radiomic biomarker for GBM, augmenting care and planning for individual patients
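The first stage of the hybrid feature selection described in this abstract (filtering rsFC features by their correlation with overall survival) and the median-based grouping can be sketched as below. This is a hedged illustration only: the function names, the `keep` parameter, the use of Pearson correlation, and the toy values are assumptions, and the subsequent RFE and SVM stages are not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def filter_by_correlation(feature_columns, overall_survival, keep):
    """Keep the `keep` features with the largest |correlation| with OS.

    `feature_columns` maps a (hypothetical) feature name to its values,
    one value per patient, in the same order as `overall_survival`.
    """
    scores = {
        name: abs(pearson(values, overall_survival))
        for name, values in feature_columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def survival_group(os_months, median_months=14.6):
    """Label a patient relative to the reported median survival."""
    return "long" if os_months > median_months else "short"


# Toy data, invented for illustration only.
os_months = [6.0, 10.0, 18.0, 24.0]
rsfc = {
    "edge_a": [0.1, 0.2, 0.4, 0.5],  # rises with survival
    "edge_b": [0.5, 0.4, 0.2, 0.1],  # falls with survival
    "edge_c": [0.3, 0.1, 0.3, 0.1],  # weakly related
}

selected = filter_by_correlation(rsfc, os_months, keep=2)
labels = [survival_group(m) for m in os_months]
```

In a cross-validated setting, a filter like this would normally be recomputed on the training subjects of each fold, so that the held-out subject does not influence which features are kept.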
Doctor of Philosophy
For decades, researchers have explored the effects of clinical and biomolecular factors on disease outcomes and have identified several candidate prognostic markers. Now, thanks to technological advances, researchers have at their disposal unprecedented quantities of biomolecular data that may add to existing knowledge about prognosis. However, commensurate challenges accompany these advances. For example, sophisticated informatics techniques are necessary to store, retrieve, and analyze large data sets. Additionally, advanced algorithms may be necessary to account for the joint effects of tens, hundreds, or thousands of variables. Moreover, it is essential that analyses evaluating such algorithms be conducted in a systematic and consistent way to ensure validity, repeatability, and comparability across studies. For this study, a novel informatics framework was developed to address these needs. Within this framework, the user can apply existing, general-purpose algorithms that are designed to make multivariate predictions for large, heterogeneous data sets. The framework also contains logic for aggregating evidence across multiple algorithms and data categories via ensemble-learning approaches. In this study, this informatics framework was applied to developing multivariate prognosis models for human glioblastoma multiforme, a highly aggressive form of brain cancer that results in a median survival of only 12-15 months. Data for this study came from The Cancer Genome Atlas, a publicly available repository containing clinical, treatment, histological, and biomolecular variables for hundreds of patients. A variety of variable-selection approaches and multivariate algorithms were applied in a cross-validated design, and the quality of the resulting models was measured using the error rate, area under the receiver operating characteristic curve, and log-rank statistic.
Although performance of the algorithms varied substantially across the data categories, some models performed well for all three metrics, particularly models based on age, treatments, and DNA methylation. Also encouragingly, the performance of ensemble-learning methods often approximated the best individual results. As multimodal data sets become more prevalent, analytic approaches that account for multiple data categories and algorithms will be increasingly relevant. This study suggests that such approaches hold promise to guide researchers and clinicians in their quest to improve outcomes for devastating diseases like GBM
- …