635 research outputs found

    Identification of a gene signature for discriminating metastatic from primary melanoma using a molecular interaction network approach

    Understanding the biological factors that are characteristic of metastasis in melanoma remains a key approach to improving treatment. In this study, we seek to identify a gene signature of metastatic melanoma. We configured a new network-based computational pipeline, combined with a machine learning method, to mine publicly available transcriptomic data from melanoma patient samples. Our method is unbiased and scans a genome-wide protein-protein interaction network using a novel formulation for network scoring. Using this, we identify the most influential, differentially expressed nodes in metastatic as compared to primary melanoma. We evaluated the shortlisted genes by a machine learning method to rank them by their discriminatory capacities. From this, we identified a panel of 6 genes (ALDH1A1, HSP90AB1, KIT, KRT16, SPRR3 and TMEM45B) whose expression values discriminated metastatic from primary melanoma (87% classification accuracy). In an independent transcriptomic data set derived from 703 primary melanomas, we showed that all six genes were significant in predicting melanoma-specific survival (MSS) in a univariate analysis, which was also consistent with AJCC staging. Further, 3 of these genes, HSP90AB1, SPRR3 and KRT16, remained significant predictors of MSS in a joint analysis (HR = 2.3, P = 0.03), although HSP90AB1 alone (HR = 1.9, P = 2 × 10⁻⁴) remained predictive after adjusting for clinical predictors.

    Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

    Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

    Typing tumors using pathways selected by somatic evolution.

    Many recent efforts to analyze cancer genomes involve aggregation of mutations within reference maps of molecular pathways and protein networks. Here, we find these pathway studies are impeded by molecular interactions that are functionally irrelevant to cancer or the patient's tumor type, as these interactions diminish the contrast of driver pathways relative to individual frequently mutated genes. This problem can be addressed by creating stringent tumor-specific networks of biophysical protein interactions, identified by signatures of epistatic selection during tumor evolution. Using such an evolutionarily selected pathway (ESP) map, we analyze the major cancer genome atlases to derive a hierarchical classification of tumor subtypes linked to characteristic mutated pathways. These pathways are clinically prognostic and predictive, including the TP53-AXIN-ARHGEF17 combination in liver and CYLC2-STK11-STK11IP in lung cancer, which we validate in independent cohorts. This ESP framework substantially improves the definition of cancer pathways and subtypes from tumor genome data

    Computational methods for breast cancer diagnosis, prognosis, and treatment prediction

    The research presented here develops a robust algorithm for identifying reliable protein interactions, which can be combined with a gene expression dataset to improve algorithm performance. It also develops novel diagnostic, prognostic, and treatment prediction algorithms for breast cancer, which take existing issues into account in order to provide a fair estimate of their performance.

    Machine Learning Approaches for Breast Cancer Survivability Prediction

    Breast cancer is one of the leading causes of cancer death in women. If not diagnosed early, the 5-year survival rate of patients is only about 26%. Furthermore, patients with similar phenotypes can respond differently to the same therapies, which means the therapies might not work well for some of them. Identifying biomarkers that can help predict a cancer class with high accuracy is at the heart of breast cancer studies because they are targets of treatments and drug development. Genomics data have been shown to carry useful information for breast cancer diagnosis and prognosis, as well as for uncovering the disease's mechanism. Machine learning methods are powerful tools for finding such information. Feature selection methods are often used in supervised and unsupervised learning tasks to deal with data containing a large number of features, of which only a small portion are useful to the classification task. On the other hand, analyzing only one type of data, without reference to existing knowledge about the disease and the therapies, might lead to misleading findings. Effective data integration approaches are necessary to understand this complex disease. In this thesis, we apply and develop machine learning methods to identify meaningful biomarkers for breast cancer survivability prediction after a certain treatment. These include applying feature selection methods to gene-expression data to derive gene signatures, where the initial genes are collected based on the mechanisms of some drugs used in breast cancer therapies. We also propose a new feature selection method, named PAFS, and apply it to discover accurate biomarkers. In addition, there is increasing support for the view that sub-network biomarkers are more robust and accurate than gene biomarkers. We propose two network-based approaches to identify sub-network biomarkers for breast cancer survivability prediction after a treatment. They integrate gene-expression data with protein-protein interactions during the optimal sub-network search and use cancer-related genes and pathways to prioritize the extracted sub-networks. The sub-network search space is usually huge, and many proteins interact with thousands of other proteins. Thus, we apply some heuristics to avoid generating and evaluating redundant sub-networks.

    Optimisation Models for Pathway Activity Inference in Cancer

    BACKGROUND: With advances in high-throughput technologies, there has been an enormous increase in data related to profiling the activity of molecules in disease. While such data provide more comprehensive information on cellular actions, their large volume and complexity pose difficulty in accurate classification of disease phenotypes. Therefore, novel modelling methods that can improve accuracy while offering interpretable means of analysis are required. Biological pathways can be used to incorporate a priori knowledge of biological interactions to decrease data dimensionality and increase the biological interpretability of machine learning models. METHODOLOGY: A mathematical optimisation model is proposed for pathway activity inference towards precise disease phenotype prediction and is applied to RNA-Seq datasets. The model is based on mixed-integer linear programming (MILP) mathematical optimisation principles and infers pathway activity as the linear combination of pathway member gene expression, multiplying expression values with model-determined gene weights that are optimised to maximise discrimination of phenotype classes and minimise incorrect sample allocation. RESULTS: The model is evaluated on the transcriptome of breast and colorectal cancer, and exhibits solution results of good optimality as well as good prediction performance on related cancer subtypes. Two baseline pathway activity inference methods and three advanced methods are used for comparison. Sample prediction accuracy, robustness against noise expression data, and survival analysis suggest competitive prediction performance of our model while providing interpretability and insight on key pathways and genes. Overall, our work demonstrates that the flexible nature of mathematical programming lends itself well to developing efficient computational strategies for pathway activity inference and disease subtype prediction
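As a rough illustration of the linear-combination idea in the abstract above, the following Python sketch computes one pathway activity score per sample as a weighted sum of member gene expression. The gene names, expression values, and weights are made up for illustration. In the described method the weights are determined by the MILP model, which is not reproduced here, so this shows only the scoring step under that assumption.

```python
def pathway_activity(expression, weights):
    """Return one activity score per sample for a single pathway.

    expression: dict mapping gene name -> list of per-sample expression values
    weights:    dict mapping gene name -> weight (hand-set here; the abstract
                describes weights optimised by a MILP model instead)
    """
    # Only genes that have both a weight and expression data contribute.
    genes = [g for g in weights if g in expression]
    if not genes:
        return []
    n_samples = len(expression[genes[0]])
    # Activity of sample s = sum over member genes of weight * expression.
    return [
        sum(weights[g] * expression[g][s] for g in genes)
        for s in range(n_samples)
    ]


# Hypothetical toy data: two genes measured in two samples.
expression = {"GENE_A": [1.0, 2.0], "GENE_B": [3.0, 0.0]}
weights = {"GENE_A": 0.5, "GENE_B": -1.0}

scores = pathway_activity(expression, weights)
print(scores)  # [-2.5, 1.0]
```

The resulting per-pathway scores replace the much larger set of per-gene values as classifier inputs, which is the dimensionality reduction the abstract refers to.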

    Identification of triple negative breast cancer genes using rough set based feature selection algorithm & ensemble classifier

    In recent decades, microarray datasets have played an important role in triple negative breast cancer (TNBC) detection. Microarray data classification is a challenging process due to the presence of numerous redundant and irrelevant features. Therefore, feature selection, which eliminates unneeded feature vectors from the system, becomes indispensable in this research field. Selecting an optimal number of features significantly reduces the difficulty of this NP-hard problem, so a rough set-based feature selection algorithm is used in this manuscript for selecting the optimal feature values. Initially, the datasets related to TNBC are acquired from the Gene Expression Omnibus, including GSE45827, GSE76275, GSE65194, GSE3744, GSE21653, and GSE7904. Then, a robust multi-array average technique is used for eliminating the outlier samples of TNBC/non-TNBC, which helps enhance classification performance. Further, the pre-processed microarray data are fed to a rough set theory for optimal gene selection, and the selected genes are then given as inputs to the ensemble classification technique for classifying low-risk genes (non-TNBC) and high-risk genes (TNBC). The experimental evaluation showed that the ensemble-based rough set model obtained a mean accuracy of 97.24%, which is superior to the other machine learning techniques compared. Web of Science, 12, art. no. 5

    Doctor of Philosophy

    Dissertation. Cancer is extremely challenging to treat as every patient responds differently to treatments, depending on the specific molecular aberrations and deregulated signaling pathways driving their tumors. To address this heterogeneity and improve patient outcomes, therapies targeting specific pathways have been developed. The use of computational pathway analysis tools and genomic data can help guide the use of targeted therapies by assessing which pathways are deregulated in patient subpopulations and individual tumors. However, most pathway analysis tools do not account for complex interactions inherent to signaling pathways, and are not capable of integrating different types of genomic data (multiomic data). To address these limitations, this dissertation focuses on developing user-friendly multiomic gene set analysis tools, and utilizing bioinformatics tools to measure pathway activation for multiple pathways simultaneously in cancer. Chapter 2 first describes the need for genomics and pathway-based analyses in cancer using the commonly aberrant RAS pathway as an example. Chapter 3 utilizes pathway-based gene expression signatures and the pathway analysis toolkit ASSIGN to interrogate pathways from the growth factor receptor network (GFRN) in breast cancer. Two discrete phenotypes, which correlated with mechanisms of apoptosis and drug response, were characterized from GFRN activity. These phenotypes have the potential to pinpoint more effective breast cancer treatments. Chapter 4 describes the development of Gene Set Omic Analysis (GSOA), a novel gene set analysis tool which uses machine learning to identify pathway differences between two given biological conditions from multiomic data. GSOA demonstrated its capacity to identify pathways known to play a role in various cancers, and improves upon other methods because of its ability to decipher complex multigene and multiomic patterns. Chapter 5 describes GSOA-shiny, a novel web application for GSOA, which gives biologists who lack bioinformatics experience access to multiomic gene set analysis through an easy-to-use interface. Overall, this dissertation presents novel breast cancer phenotypes with clinical implications, provides the research community with gene expression signatures for GFRN components, and presents an innovative method and web application for gene set analysis, all contributing to furthering the field of personalized oncology.

    Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential


    Doctor of Philosophy

    Dissertation. Despite advancements in therapies, next-generation sequencing, and our knowledge, breast cancer claims hundreds of thousands of lives around the world every year. We have therapy options that work for only a fraction of the population due to the heterogeneity of the disease. It is still overwhelmingly challenging to match a patient with the appropriate available therapy for the optimal outcome. This dissertation work focuses on using biomedical informatics approaches to develop pathway-based biomarkers that predict personalized drug response in breast cancer, and to assess the feasibility of integrating such biomarkers into current electronic health records to better implement genomics-based personalized medicine. The uncontrolled proliferation in breast cancer is frequently driven by the HER2/PI3K/AKT/mTOR pathway. In this pathway, the AKT node plays an important role in controlling signal transduction. In normal breast cells, the proliferation of cells is tightly maintained at a stable rate via AKT. However, in cancer, the balance is disrupted by amplification of upstream growth factor receptors (GFR) such as HER2 and IGF1R and/or deleterious mutations in PTEN and PIK3CA. Overexpression of AKT leads to increased proliferation and decreased apoptosis and autophagy, leading to cancer. Often these known amplifications and the mutation status associated with disease progression are used as biomarkers for determining targeted therapies. However, downstream known or unknown mutations and activations in the pathways, and crosstalk between the pathways, can make the targeted therapies ineffective. For example, one third of HER2 amplified breast cancer patients do not respond to HER2-targeting therapies such as trastuzumab, possibly due to downstream PTEN loss of mutation or PIK3CA mutations. To identify pathway aberration with better sensitivity and specificity, I first developed gene-expression-based pathway biomarkers that can identify the deregulation status of pathway activation in the sample of interest. Second, I developed drug response prediction models primarily based on pathway activity, breast cancer subtype, proteomics and mutation data. Third, I assessed the feasibility of including gene expression or transcriptomics data in current electronic health records so that such biomarkers can be implemented in routine clinical care.