8 research outputs found

    Identification of hyperparameters with high effects on performance of deep neural networks: application to clinicopathological data of ovarian cancer

    Deep learning has recently emerged as an effective approach for precision medicine. In medicine, it has been applied mainly to medical image data rather than clinicopathological data. One challenge in applying deep learning models to clinicopathological data is optimizing hyperparameters to achieve high predictive power. In this study, we identified the hyperparameters of a deep learning model that have large effects on predictive power. Specifically, we focused on predicting platinum-based chemotherapy response in ovarian cancer patients. As a performance metric, we used the area under the curve. We optimized six hyperparameters: the number of hidden layers, number of hidden units, optimization algorithm, weight initialization, activation function, and dropout rate. We also identified significant interaction effects between hyperparameters. We found combinations of hyperparameters with large effects on prediction. These optimal combinations are expected to increase the prediction accuracy for the response to chemotherapy in a variety of cancer patients.

    Kernel-based hierarchical structural component models for pathway analysis

    Motivation: Pathway analyses have led to more insight into the underlying biological functions related to the phenotype of interest in various types of omics data. Pathway-based statistical approaches have been actively developed, but most of them do not consider correlations among pathways. Because many biomarkers are known to overlap between pathways, these approaches may provide misleading results. In addition, most pathway-based approaches assume that biomarkers within a pathway have linear associations with the phenotype of interest, even though the relationships are more complex. Results: To model complex effects, including non-linear effects, we propose a new approach, Hierarchical structural CoMponent analysis using Kernel (HisCoM-Kernel). The proposed method models non-linear associations between biomarkers and phenotype by extending kernel machine regression, and analyzes entire pathways simultaneously by using the biomarker-pathway hierarchical structure. HisCoM-Kernel is a flexible model that can be applied to various omics data. It was successfully applied to three omics datasets generated by different technologies. Our simulation studies showed that HisCoM-Kernel provided higher statistical power than other existing pathway-based methods in all datasets. The application of HisCoM-Kernel to three types of omics datasets showed its superior performance compared to existing methods in identifying more biologically meaningful pathways, including those reported in previous studies.

    Clinicopathologic and protein markers distinguishing the "polymerase epsilon exonuclease" from the "copy number low" subtype of endometrial cancer

    Objective: The need to perform genetic sequencing to diagnose the polymerase epsilon exonuclease (POLE) subtype of endometrial cancer (EC) hinders the adoption of molecular classification. We investigated clinicopathologic and protein markers that distinguish the POLE from the copy number (CN)-low subtype in EC. Methods: Ninety-one samples (15 POLE, 76 CN-low) were selected from The Cancer Genome Atlas EC dataset. Clinicopathologic and normalized reverse phase protein array expression data were analyzed for associations with the subtypes. A logistic model including selected markers was constructed by stepwise selection using the area under the curve (AUC) from 5-fold cross-validation (CV). The selected markers were validated using immunohistochemistry (IHC) in a separate cohort. Results: Body mass index (BMI) and tumor grade were significantly associated with the POLE subtype. With BMI and tumor grade as covariates, 5 proteins were associated with the EC subtypes. The stepwise selection method identified BMI, cyclin B1, caspase 8, and X-box binding protein 1 (XBP1) as markers distinguishing the POLE from the CN-low subtype. The mean CV AUC, sensitivity, specificity, and balanced accuracy of the selected model were 0.97, 0.91, 0.87, and 0.89, respectively. IHC validation showed that cyclin B1 expression was significantly higher in the POLE than in the CN-low subtype, and the receiver operating characteristic curve of cyclin B1 expression in IHC yielded an AUC of 0.683. Conclusion: BMI and expression of cyclin B1, caspase 8, and XBP1 are candidate markers distinguishing the POLE from the CN-low subtype. Cyclin B1 IHC may replace POLE sequencing in molecular classification of EC.

    Nonalcoholic fatty liver disease and early prediction of gestational diabetes mellitus using machine learning methods

    Background/Aims: To develop an early prediction model for gestational diabetes mellitus (GDM) using machine learning and to evaluate whether the inclusion of nonalcoholic fatty liver disease (NAFLD)-associated variables increases the performance of the model. Methods: This prospective cohort study evaluated pregnant women for NAFLD using ultrasound at 10-14 weeks and screened them for GDM at 24-28 weeks of gestation. The clinical variables before 14 weeks were used to develop prediction models for GDM (setting 1, conventional risk factors; setting 2, addition of new risk factors in recent guidelines; setting 3, addition of routine clinical variables; setting 4, addition of NAFLD-associated variables, including the presence of NAFLD and laboratory results; and setting 5, top 11 variables identified from a stepwise variable selection method). The predictive models were constructed using machine learning methods, including logistic regression, random forest, support vector machine, and deep neural networks. Results: Among 1,443 women, 86 (6.0%) were diagnosed with GDM. The highest performing prediction model among settings 1-4 was setting 4, which included both clinical and NAFLD-associated variables (area under the receiver operating characteristic curve [AUC] 0.563-0.697 in settings 1-3 vs. 0.740-0.781 in setting 4). Setting 5, with the top 11 variables (which included NAFLD and hepatic steatosis index), showed similar predictive power to setting 4 (AUC 0.719-0.819 in setting 5; the difference between settings 4 and 5 was not significant). Conclusions: We developed an early prediction model for GDM using machine learning. The inclusion of NAFLD-associated variables significantly improved the performance of GDM prediction.