MACHINE LEARNING APPROACHES FOR BIOMARKER IDENTIFICATION AND SUBGROUP DISCOVERY FOR POST-TRAUMATIC STRESS DISORDER
Post-traumatic stress disorder (PTSD) is a psychiatric disorder caused by environmental and genetic factors resulting from alterations in genetic variation, epigenetic changes and neuroimaging characteristics. There is a pressing need to identify reliable molecular and physiological biomarkers for accurate diagnosis, prognosis, and treatment, as well as to deepen the understanding of PTSD pathophysiology. Machine learning methods are widely used to infer patterns from biological data, identify biomarkers, and make predictions. The objective of this research is to apply machine learning methods for the accurate classification of human diseases from genome-scale datasets, focusing primarily on PTSD. The DoD-funded Systems Biology of PTSD Consortium has recruited combat veterans with and without PTSD for measurement of molecular and physiological data from blood or urine samples with the goal of identifying accurate and specific PTSD biomarkers. As a member of the Consortium with access to these PTSD multiple omics datasets, we first completed a project titled Clinical Subgroup-Specific PTSD Classification and Biomarker Discovery. We applied machine learning approaches to these data to build classification models consisting of molecular and clinical features to predict PTSD status. We also identified candidate biomarkers for diagnosis, improving our understanding of PTSD pathogenesis. In a second project, entitled Multi-Omic PTSD Subgroup Identification and Clinical Characterization, we applied methods for integrating multiple omics datasets to investigate the complex, multivariate nature of the biological systems underlying PTSD. Using two different machine learning approaches, we identified an optimal two PTSD subgroups among 82 PTSD-positive samples, and we found that the subgroups exhibited different remitting behavior as inferred from subjects recalled at a later time point.
The results from our association, differential expression, and classification analyses demonstrated the distinct clinical and molecular features characterizing these subgroups. Taken together, our work has advanced our understanding of PTSD biomarkers and subgroups through the use of machine learning approaches. Results from our work should strongly contribute to the precise diagnosis and eventual treatment of PTSD, as well as other diseases. Future work will involve continuing to leverage these results to enable precision medicine for PTSD.
Multi-Omic Data Integration to Stratify Population in Hepatocellular Carcinoma
M.S. University of Hawaii at Manoa 2016. Includes bibliographical references.
Prognostic analysis of histopathological images using pre-trained convolutional neural networks: Application to hepatocellular carcinoma
Histopathological images contain rich phenotypic descriptions of the molecular processes underlying disease progression. Convolutional neural networks (CNNs), state-of-the-art image analysis techniques in computer vision, automatically learn representative features from such images which can be useful for disease diagnosis, prognosis, and subtyping. Hepatocellular carcinoma (HCC) is the sixth most common type of primary liver malignancy. Despite the high mortality rate of HCC, little previous work has made use of CNN models to explore the use of histopathological images for prognosis and clinical survival prediction of HCC. We applied three pre-trained CNN models (VGG 16, Inception V3 and ResNet 50) to extract features from HCC histopathological images. Sample visualization and classification analyses based on these features showed a very clear separation between cancer and normal samples. In a univariate Cox regression analysis, 21.4% and 16% of image features on average were significantly associated with overall survival (OS) and disease-free survival (DFS), respectively. We also observed significant correlations between these features and integrated biological pathways derived from gene expression and copy number variation. Using an elastic net regularized Cox Proportional Hazards model of OS constructed from Inception image features, we obtained a concordance index (C-index) of 0.789 and a significant log-rank test (p = 7.6E-18). We also performed unsupervised classification to identify HCC subgroups from image features. The optimal two subgroups discovered using Inception model image features showed significant differences in both OS (C-index = 0.628 and p = 7.39E-07) and DFS (C-index = 0.558 and p = 0.012).
Our work demonstrates the utility of extracting image features using pre-trained models, both to build accurate prognostic models of HCC and to highlight significant correlations between these features, clinical survival, and relevant biological pathways. Image features extracted from HCC histopathological images using the pre-trained CNN models VGG 16, Inception V3 and ResNet 50 can accurately distinguish normal and cancer samples. Furthermore, these image features are significantly correlated with survival and relevant biological pathways.
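The concordance index reported above can be illustrated with a minimal sketch of Harrell's C-index. This is not the authors' implementation: the function name, the handling of ties, and the toy data are assumptions made purely for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (illustrative sketch).

    times:       observed follow-up times
    events:      1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times, or whose earlier subject
        # is censored, are treated as not comparable and skipped.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.3, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
print(toy_c)  # 0.8: four of the five comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.789 should be read.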
GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
Background: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. Results: In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely-applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities were enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that use of GEOlimma provides greater experimental power to detect DE genes compared to Limma, due to its increased effective sample size.
Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset. Conclusions: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses compared to the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
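The step of converting differential expression frequencies into DE prior probabilities can be sketched as follows. The abstract does not say how the conversion is done, so the Laplace-style pseudocount, the function name, and the gene labels below are all assumptions for illustration and should not be read as the GEOlimma procedure.

```python
def de_prior_probabilities(de_calls, pseudocount=1.0):
    """Turn per-gene DE call frequencies into smoothed prior probabilities.

    de_calls: dict mapping a gene name to a list of booleans, one per
              pairwise comparison (True if the gene was called DE).
    pseudocount: assumed smoothing constant; keeps priors away from
                 exactly 0 or 1 for rarely or always DE genes.
    """
    priors = {}
    for gene, calls in de_calls.items():
        n_comparisons = len(calls)
        n_de = sum(1 for called in calls if called)
        priors[gene] = (n_de + pseudocount) / (n_comparisons + 2 * pseudocount)
    return priors


# Hypothetical genes and calls across four comparisons.
example_calls = {
    "GENE_A": [True, True, True, False],     # frequently DE
    "GENE_B": [False, False, False, False],  # never DE
}
example_priors = de_prior_probabilities(example_calls)
print(example_priors)
```

Under this assumed scheme, a gene that is frequently DE across the curated comparisons receives a higher prior, which is the qualitative behavior the abstract describes.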
Multi-omic biomarker identification and validation for diagnosing warzone-related post-traumatic stress disorder
Post-traumatic stress disorder (PTSD) impacts many veterans and active duty soldiers, but diagnosis can be problematic due to biases in self-disclosure of symptoms, stigma within military populations, and limitations in identifying those at risk. Prior studies suggest that PTSD may be a systemic illness, affecting not just the brain, but the entire body. Therefore, disease signals likely span multiple biological domains, including genes, proteins, cells, tissues, and organism-level physiological changes. Identification of these signals could aid in diagnostics, treatment decision-making, and risk evaluation. In the search for PTSD diagnostic biomarkers, we ascertained over one million molecular, cellular, physiological, and clinical features from three cohorts of male veterans. In a discovery cohort of 83 warzone-related PTSD cases and 82 warzone-exposed controls, we identified a set of 343 candidate biomarkers. These candidate biomarkers were selected from an integrated approach using (1) data-driven methods, including Support Vector Machine with Recursive Feature Elimination and other standard or published methodologies, and (2) hypothesis-driven approaches, using previous genetic studies for polygenic risk, or other PTSD-related literature. After reassessment of ~30% of these participants, we refined this set of markers from 343 to 28, based on their performance and ability to track changes in phenotype over time. The final diagnostic panel of 28 features was validated in an independent cohort (26 cases, 26 controls) with good performance (AUC = 0.80, 81% accuracy, 85% sensitivity, and 77% specificity). The identification and validation of this diverse diagnostic panel represents a powerful and novel approach to improve accuracy and reduce bias in diagnosing combat-related PTSD.
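The validation metrics above follow from a 2x2 confusion matrix, as the short sketch below shows. The abstract reports only the cohort size (26 cases, 26 controls) and the rounded percentages; the individual counts used here are an assumption chosen to be consistent with those figures and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate among cases
        "specificity": tn / (tn + fp),            # true negative rate among controls
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Assumed counts: 22 of 26 cases and 20 of 26 controls classified correctly.
validation = diagnostic_metrics(tp=22, fn=4, tn=20, fp=6)
for name, value in validation.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts the rounded values are 85% sensitivity, 77% specificity and 81% accuracy, matching the reported performance.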
Prognosis and antiplatelet therapy of small single subcortical infarcts in penetrating artery territory: a post hoc analysis of the Third China National Stroke Registry
Background: Small single subcortical infarction (SSSI) may be classified as parent artery disease-related or only branch involved according to the stenosis of parent artery. The study aimed to evaluate short-term and long-term prognoses and the effectiveness of antiplatelet therapy in SSSI. Methods: We prospectively enrolled 2890 patients with SSSI from the Third China National Stroke Registry (CNSR-III) database from August 2015 to March 2018. We assessed clinical outcomes and antiplatelet treatment effects in patients with SSSI with and without parent artery stenosis (PAS) identified by magnetic resonance angiography. Results: Among 2890 patients with SSSI in the perforator territory of the middle cerebral artery and the basilar artery, there were 680 (23.53%) patients with PAS and 2210 (76.47%) patients without PAS. After adjusting for potential confounders, the PAS group had a greater initial stroke severity (OR 1.262, 95% CI 1.058 to 1.505; p=0.0097) and a higher risk of ischaemic stroke recurrence at 3 months (OR 2.266, 95% CI 1.631 to 3.149; p<0.0001) and 1 year (OR 2.054, 95% CI 1.561 to 2.702; p<0.0001), as well as composite vascular events at 3 months (OR 2.306, 95% CI 1.674 to 3.178; p<0.0001) and 1 year (OR 1.983, 95% CI 1.530 to 2.570; p<0.0001), compared with the non-PAS group. In both groups, dual antiplatelet therapy was not superior to single antiplatelet therapy in preventing stroke recurrence, composite vascular events and disability. Conclusion: PAS was related to significantly higher rates of short-term and long-term stroke recurrence and composite vascular events, suggesting heterogeneous mechanisms in SSSI subgroups. The effectiveness of antiplatelet therapy for SSSI needs further investigation.
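The group proportions and odds ratios above can be checked with a small sketch. It assumes the confidence intervals are Wald intervals computed on the log-odds scale, which is common for logistic regression but is not stated in the abstract; the function names are hypothetical.

```python
import math


def or_from_ci(lower, upper):
    """Point estimate implied by a log-symmetric CI (geometric mean of bounds)."""
    return math.sqrt(lower * upper)


def log_or_standard_error(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a 95% Wald interval (assumed)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Group proportions reported in the abstract.
total, with_pas = 2890, 680
pct_with_pas = round(100 * with_pas / total, 2)
pct_without_pas = round(100 * (total - with_pas) / total, 2)
print(pct_with_pas, pct_without_pas)

# Stroke recurrence at 3 months: reported OR 2.266, 95% CI 1.631 to 3.149.
implied_or = or_from_ci(1.631, 3.149)
implied_se = log_or_standard_error(1.631, 3.149)
print(round(implied_or, 3), round(implied_se, 3))
```

Under the stated assumption, the geometric mean of each interval's bounds reproduces the reported odds ratio to within rounding, a quick consistency check for figures of this kind.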