85 research outputs found

    Improvement of Reproducibility in Cancer Classification Based on Pathway Markers and Subnetwork Markers

    Identification of robust biomarkers for cancer prognosis based on gene expression data is an important research problem in translational genomics. The high-dimensional, small-sample-size data setting makes the prediction of biomarkers very challenging. Early studies identified biomarkers based solely on gene expression data. However, very few of these biomarkers are shared among independent studies. To overcome this irreproducibility, an integrative approach has been proposed that identifies better biomarkers by overlaying gene expression data with available biological knowledge and investigating genes at the modular level. These module-based markers jointly analyze the gene expression activities of closely associated genes; for example, those that belong to a common biological pathway or genes whose protein products form a subnetwork module in a protein-protein interaction network. Several studies have shown that modular biomarkers lead to more accurate and reproducible prognostic predictions than single-gene markers and also provide a better understanding of the disease mechanisms. We propose novel methods for identifying modular markers which can be used to predict breast cancer prognosis. First, to improve identification of pathway markers, we propose using probabilistic pathway activity inference and relative expression analysis. Then, we propose a new method to identify subnetwork markers based on a message-passing clustering algorithm, and we further improve this method by incorporating topological attributes using association coefficients. Through extensive evaluations using multiple publicly available datasets, we demonstrate that all of the proposed methods can identify modular markers that are more reliable and reproducible across independent datasets than those identified by existing methods; hence, they have the potential to become more effective prognostic cancer classifiers

    Pre-Clinical Drug Prioritization via Prognosis-Guided Genetic Interaction Networks

    The high rates of failure in oncology drug clinical trials highlight the problems of using pre-clinical data to predict the clinical effects of drugs. Patient population heterogeneity and unpredictable physiology complicate pre-clinical cancer modeling efforts. We hypothesize that gene networks associated with cancer outcome in heterogeneous patient populations could serve as a reference for identifying drug effects. Here we propose a novel in vivo genetic interaction which we call ‘synergistic outcome determination’ (SOD), a concept similar to ‘Synthetic Lethality’. SOD is defined as the synergy of a gene pair with respect to cancer patients' outcome, whose correlation with outcome is due to cooperative, rather than independent, contributions of genes. The method combines microarray gene expression data with cancer prognostic information to identify synergistic gene-gene interactions that are then used to construct interaction networks based on gene modules (a group of genes which share similar function). In this way, we identified a cluster of important epigenetically regulated gene modules. By projecting drug sensitivity-associated genes on to the cancer-specific inter-module network, we defined a perturbation index for each drug based upon its characteristic perturbation pattern on the inter-module network. Finally, by calculating this index for compounds in the NCI Standard Agent Database, we significantly discriminated successful drugs from a broad set of test compounds, and further revealed the mechanisms of drug combinations. Thus, prognosis-guided synergistic gene-gene interaction networks could serve as an efficient in silico tool for pre-clinical drug prioritization and rational design of combinatorial therapies

    Incorporating topological information for predicting robust cancer subnetwork markers in human protein-protein interaction network

    BACKGROUND: Discovering robust markers for cancer prognosis based on gene expression data is an important yet challenging problem in translational bioinformatics. By integrating additional information from biological pathways or a protein-protein interaction (PPI) network, we can find better biomarkers that lead to more accurate and reproducible prognostic predictions. In fact, recent studies have shown that “modular markers” that integrate multiple genes with potential interactions can improve disease classification and also provide a better understanding of the disease mechanisms. RESULTS: In this work, we propose a novel algorithm for finding robust and effective subnetwork markers that can accurately predict cancer prognosis. To simultaneously discover multiple synergistic subnetwork markers in a human PPI network, we build on our previous work that uses affinity propagation, an efficient clustering algorithm based on a message-passing scheme. Using affinity propagation, we identify potential subnetwork markers that consist of discriminative genes that display coherent expression patterns and whose protein products are closely located on the PPI network. Furthermore, we incorporate the topological information from the PPI network to evaluate the potential of a given set of proteins to be involved in a functional module. Primarily, we adopt two widely made assumptions: that densely connected subnetworks are likely to be functional modules, and that proteins that are not directly connected but interact with similar sets of other proteins may share similar functions. CONCLUSIONS: Incorporating topological attributes based on these assumptions can enhance the prediction of potential subnetwork markers. We evaluate the performance of the proposed subnetwork marker identification method by performing classification experiments using multiple independent breast cancer gene expression datasets and PPI networks. 
We show that our method leads to the discovery of robust subnetwork markers that can improve cancer classification. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1224-1) contains supplementary material, which is available to authorized users
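
The second topological assumption in this abstract (proteins that are not directly connected but interact with similar sets of other proteins may share similar functions) can be illustrated with a neighbor-overlap score. The following is a minimal sketch, assuming the Jaccard index as the association coefficient; the abstract does not say which coefficient the authors use, and the function name `neighbor_overlap` and the toy network are hypothetical.

```python
def neighbor_overlap(ppi, a, b):
    """Jaccard index of the interaction partners of proteins a and b.

    ppi maps each protein to the set of proteins it interacts with.
    The Jaccard index is an assumed choice, not the paper's stated one.
    """
    # Exclude a and b themselves so a direct a-b edge does not affect the score.
    partners_a = ppi[a] - {a, b}
    partners_b = ppi[b] - {a, b}
    union = partners_a | partners_b
    return len(partners_a & partners_b) / len(union) if union else 0.0


# Hypothetical toy PPI network (undirected, stored as neighbor sets).
toy_ppi = {
    "P1": {"P3", "P4"},
    "P2": {"P3", "P4", "P5"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P2"},
}

shared = neighbor_overlap(toy_ppi, "P1", "P2")     # P1 and P2 share P3 and P4
unrelated = neighbor_overlap(toy_ppi, "P1", "P5")  # no partners in common
```

In this toy network P1 and P2 are not directly connected, yet they score high because two of their three distinct partners coincide. A score of this kind could be combined with expression-based similarity before clustering, although how the authors combine the two is not described here.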

    From Correlation to Causality: Does Network Information improve Cancer Outcome Prediction?

    Motivation: Disease progression in cancer can vary substantially between patients. Yet, patients often receive the same treatment. Recently, there has been much work on predicting disease progression and patient outcome variables from gene expression in order to personalize treatment options. A widely used approach is high-throughput experiments that aim to identify predictive signature genes that indicate the clinical outcome of a disease. Microarray data analysis helps to reveal underlying biological mechanisms of tumor progression, metastasis, and drug resistance in cancer studies. Although the first diagnostic kits are on the market, open problems remain, such as random gene signatures and noisy expression data. Experimental or computational noise in the data and the limited number of tissue samples collected from patients might further reduce the predictive power and biological interpretability of such signature genes. Moreover, signature genes predicted by different studies generally show poor similarity, even for the same type of cancer. Integration of network information with gene expression data could provide more efficient signatures for outcome prediction in cancer studies. One approach to deal with these problems employs gene-gene relationships and ranks genes using the random surfer model of Google's PageRank algorithm. Unfortunately, the majority of published network-based approaches tested their methods on only a small number of datasets, which calls into question the general applicability of network-based methods for outcome prediction. Methods: In this thesis, I provide a comprehensive and systematic evaluation of a network-based outcome prediction approach, NetRank (a PageRank derivative), applied to several types of gene expression cancer data and four different types of networks. The algorithm identifies a signature gene set for a specific cancer type by incorporating gene network information with given expression data. 
To assess the performance of NetRank, I created a benchmark dataset collection comprising 25 cancer outcome prediction datasets from the literature and one in-house dataset. Results: NetRank performs significantly better than classical methods such as fold change or t-test, improving prediction performance by 7% on average. In addition, we approach the accuracy level of the authors' signatures by applying a relatively unbiased but fully automated process for biomarker discovery. Despite an order of magnitude difference in network size, a regulatory network, a protein-protein interaction network and two predicted networks perform equally well. Signatures as published by the authors and the signatures generated with classical methods do not overlap, not even for the same cancer type, whereas the network-based signatures strongly overlap. I analyze and discuss these overlapping genes in terms of the Hallmarks of Cancer and in particular single out six transcription factors and seven proteins and discuss their specific roles in cancer progression. Furthermore, several tests are conducted for the identification of a Universal Cancer Signature. No Universal Cancer Signature could be identified so far, but a cancer-specific combination of general master regulators with specific cancer genes could be discovered that achieves the best results for all cancer types. As NetRank offers great value for cancer outcome prediction, first steps towards secure usage of NetRank in a public cloud are described. Conclusion: Experimental evaluation of network-based methods on a gene expression benchmark dataset suggests that these methods are especially suited for outcome prediction as they overcome the problems of random gene signatures and noisy expression data. 
Through the combination of network information with gene expression data, network-based methods identify highly similar signatures across all cancer types, in contrast to classical methods, which fail to identify common gene sets even within the same cancer type. In general, the integration of additional information in gene expression analysis allows the identification of more reliable, accurate and reproducible biomarkers and provides a deeper understanding of processes occurring in cancer development and progression.
Contents: 1 Definition of Open Problems; 2 Introduction (2.1 Problems in cancer outcome prediction; 2.2 Network-based cancer outcome prediction; 2.3 Universal Cancer Signature); 3 Methods (3.1 NetRank algorithm; 3.2 Preprocessing and filtering of the microarray data; 3.3 Accuracy; 3.4 Signature similarity; 3.5 Classical approaches; 3.6 Random signatures; 3.7 Networks; 3.8 Direct neighbor method; 3.9 Dataset extraction); 4 Performance of NetRank (4.1 Benchmark dataset for evaluation; 4.2 The influence of NetRank parameters; 4.3 Evaluation of NetRank; 4.4 General findings; 4.5 Computational complexity of NetRank; 4.6 Discussion); 5 Universal Cancer Signature (5.1 Signature overlap – a sign for Universal Cancer Signature; 5.2 NetRank genes are highly connected and confirmed in literature; 5.3 Hallmarks of Cancer; 5.4 Testing possible Universal Cancer Signatures; 5.5 Conclusion); 6 Cloud-based Biomarker Discovery (6.1 Introduction to secure Cloud computing; 6.2 Cancer outcome prediction; 6.3 Security analysis; 6.4 Conclusion); 7 Contributions and Conclusion
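
The idea of ranking genes with a PageRank-style random surfer, as this abstract describes for NetRank, can be sketched as follows. This is a minimal illustration, assuming a personalized-PageRank update in which each gene keeps part of its own outcome-association score and receives the rest from its network neighbors; the exact NetRank update rule and parameter values are not given in the abstract, so the function `network_rank`, the `damping` default and the toy data are all hypothetical.

```python
def network_rank(adjacency, scores, damping=0.3, iterations=100):
    """Personalized-PageRank-style gene ranking (illustrative, not NetRank itself).

    adjacency: dict mapping each gene to the set of its neighbors (undirected).
    scores:    dict mapping each gene to a non-negative outcome-association
               score, e.g. an absolute correlation with patient outcome.
    damping:   fraction of a gene's rank that comes from its neighbors.
    """
    genes = list(adjacency)
    total = sum(scores[g] for g in genes)
    prior = {g: scores[g] / total for g in genes}  # normalized own evidence
    rank = dict(prior)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            # Each neighbor splits its current rank evenly among its own neighbors.
            inflow = sum(rank[n] / len(adjacency[n]) for n in adjacency[g])
            updated[g] = (1 - damping) * prior[g] + damping * inflow
        rank = updated
    return rank


# Hypothetical toy network: "hub" touches three strongly associated genes,
# "leaf" touches only one. Both have the same weak score of their own.
toy_network = {
    "hub": {"g1", "g2", "g3"},
    "g1": {"hub"},
    "g2": {"hub"},
    "g3": {"hub", "leaf"},
    "leaf": {"g3"},
}
toy_scores = {"hub": 0.1, "leaf": 0.1, "g1": 0.6, "g2": 0.6, "g3": 0.6}

ranks = network_rank(toy_network, toy_scores)
signature = sorted(ranks, key=ranks.get, reverse=True)[:3]  # top-ranked genes
```

Here `hub` ends up ranked above `leaf` even though both start with the same score, because `hub` inherits rank from more outcome-associated neighbors. This is the kind of effect that lets a network-based ranking promote well-connected regulators that expression-only statistics would miss.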

    Computational identification of genetic subnetwork modules associated with maize defense response to Fusarium verticillioides

    BACKGROUND: Maize, a crop of global significance, is vulnerable to a variety of biotic stresses resulting in economic losses. Fusarium verticillioides (teleomorph Gibberella moniliformis) is one of the key fungal pathogens of maize, causing ear rots and stalk rots. To better understand the genetic mechanisms involved in maize defense as well as F. verticillioides virulence, a systematic investigation of the host-pathogen interaction is needed. The aim of this study was to computationally identify potential maize subnetwork modules associated with its defense response against F. verticillioides. RESULTS: We obtained time-course RNA-seq data from B73 maize inoculated with wild type F. verticillioides and a loss-of-virulence mutant, and subsequently established a computational pipeline for network-based comparative analysis. Specifically, we first analyzed the RNA-seq data by a cointegration-correlation-expression approach, where maize genes were jointly analyzed with known F. verticillioides virulence genes to find candidate maize genes likely associated with the defense mechanism. We predicted maize co-expression networks around the selected maize candidate genes based on partial correlation, and subsequently searched for subnetwork modules that were differentially activated when inoculated with two different fungal strains. Based on our analysis pipeline, we identified four potential maize defense subnetwork modules. Two were directly associated with maize defense response and were associated with significant GO terms such as GO:0009817 (defense response to fungus) and GO:0009620 (response to fungus). The other two predicted modules were indirectly involved in the defense response, where the most significant GO terms associated with these modules were GO:0046914 (transition metal ion binding) and GO:0046686 (response to cadmium ion). 
CONCLUSION: Through our RNA-seq data analysis, we have shown that a network-based approach can enhance our understanding of the complicated host-pathogen interactions between maize and F. verticillioides by interpreting the transcriptome data in a system-oriented manner. We expect that the proposed analytic pipeline can also be adapted for investigating potential functional modules associated with host defense response in diverse plant-pathogen interactions
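
Building co-expression networks from partial correlation, as in this abstract, means linking two genes only when they remain correlated after the influence of other genes is removed. The following is a minimal sketch, assuming the first-order case (controlling for a single third gene) with the standard formula; the estimator and thresholds used in the paper are not stated in the abstract, and the expression values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up profiles over eight samples: gene_x and gene_y both follow a common
# regulator but have no direct relationship with each other.
regulator = [1, 2, 3, 4, 5, 6, 7, 8]
gene_x = [2, 1, 4, 3, 4, 7, 6, 9]
gene_y = [2, 1, 2, 5, 6, 5, 6, 9]

raw = pearson(gene_x, gene_y)                             # high: shared driver
direct = partial_correlation(gene_x, gene_y, regulator)   # close to zero
```

The raw correlation between `gene_x` and `gene_y` is high, but the partial correlation is essentially zero, so a partial-correlation network would not draw an edge between them. This is why the approach tends to yield sparser networks than plain correlation.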

    Role of network topology based methods in discovering novel gene-phenotype associations

    The cell is governed by complex interactions among various types of biomolecules. Coupled with environmental factors, variations in DNA can cause alterations in normal gene function and lead to a disease condition. Often, such disease phenotypes involve coordinated dysregulation of multiple genes that implicate interconnected pathways. Towards a better understanding and characterization of the mechanisms underlying human diseases, I present GUILD, a network-based disease-gene prioritization framework. GUILD associates genes with diseases using the global topology of the protein-protein interaction network and an initial set of genes known to be implicated in the disease. Furthermore, I investigate the mechanistic relationships between disease genes and explain the robustness emerging from these relationships. I also introduce GUILDify, a user-friendly online tool that prioritizes genes for their association with any user-provided phenotype. Finally, I describe current state-of-the-art systems biology approaches in which network modeling has helped extend our view of complex diseases such as cancer

    Computational Identification of Functional Modules and Hub Genes Involved in Pathogenicity-Associated or Defense Response on Fusarium Verticillioides-Maize Interactions

    Fusarium verticillioides is one of the key pathogens causing stalk rot and ear rot in maize. While several genes associated with F. verticillioides pathogenicity and mycotoxin biosynthesis have been characterized, our knowledge of the cellular and genetic networks for these events is still very limited. Also, the underlying molecular and cellular mechanisms associated with the maize defense response against F. verticillioides pathogenicity are complex. Therefore, in order to better understand maize defense as well as F. verticillioides pathogenicity, an approach that systematically investigates the host-pathogen interactions is needed. In this PhD study, a systematic network-based comparative analysis approach using large-scale F. verticillioides-maize RNA-seq data was applied to identify F. verticillioides pathogenicity-associated subnetwork modules and key pathogenicity genes, as well as maize subnetwork modules involved in the defense response. For each study, we constructed corresponding co-expression networks through partial correlation based on the given comparable conditions. For the first work, predicting F. verticillioides pathogenicity-associated subnetwork modules, we established a pipeline that identifies functional modules by a branch-out technique with probabilistic subnetwork activity inference. For identifying maize defense modules, we first collected candidate maize genes by comparing the expression patterns of maize genes with those of four selected F. verticillioides pathogenicity genes through cointegration, correlation, and expression level change. Then, we inferred potential subnetwork modules among the candidate genes by adopting the previously established pipeline. For identifying specific key F. verticillioides pathogenicity genes based on the predicted subnetwork modules, we analytically investigated each gene in its predicted subnetwork module. 
In this investigation, we considered its influence on others, association to pathogenicity, and distinctive differentiation between the two conditions. Through our systematic investigation of the F. verticillioides–maize RNA-seq data, we identified pathogenicity-associated or defensive subnetwork modules, where the member genes were harmoniously coordinated and significantly differentially activated between the two different conditions. Also, we identified specific F. verticillioides pathogenicity genes playing a key role in the predicted pathogenicity-associated subnetwork modules

    Characterization of Fsr1-Interacting Complex and Its Downstream Pathogenic Subnetwork Modules in Fusarium verticillioides

    Fusarium verticillioides is an ascomycete fungus responsible for stalk and ear rots of maize. Previously, we identified a striatin-like protein Fsr1 that plays a key role in stalk rot pathogenesis. In mammals, striatin interacts with multiple proteins to form a STRIPAK (striatin-interacting phosphatase and kinase) complex that regulates a variety of developmental processes and cellular mechanisms. In this study, we identified the homolog of a key mammalian STRIPAK component STRIP1/2 in F. verticillioides, FvStp1, that interacts with Fsr1 in vivo. Gene deletion analysis showed that FvStp1 is critical for F. verticillioides stalk rot virulence. In addition, we identified three proteins, designated FvCyp1, FvScp1 and FvSel1, that interact with the Fsr1 CC domain by yeast-two-hybrid screen. Importantly, FvCyp1, FvScp1, and FvSel1 co-localize to endomembrane structures, each having preferred localization in the cell, and they are all required for F. verticillioides virulence in stalk rot. Moreover, these proteins are necessary for proper localization of Fsr1 to the endoplasmic reticulum (ER) and nuclear envelope. To further characterize genetic networks downstream of Fsr1, we performed RNA-Seq with maize B73 stalks inoculated with wild type and fsr1 mutant. We used a computationally efficient branch-out technique, along with an adopted probabilistic pathway activity inference method, to identify functional subnetwork modules likely involved in F. verticillioides virulence. From the potential virulence-associated subnetwork modules, we selected two putative hub genes, FvSYN1 and FvEBP1, for functional validation and network robustness studies, such as gene knockout, virulence assays and qPCR studies. Our results provide evidence that FvSYN1 and FvEBP1 are important virulence genes that can influence the expression of closely correlated genes, which suggests that they are important hub genes of their respective subnetworks. 
Further characterization of FvSYN1 showed that FvSyn1 is important for regulating spore germination and hyphal morphology. Furthermore, FvSyn1 is localized to vacuoles, plasma membranes, and septa, and has been shown to play a role in the response to cell wall stressors. Motif-deletion studies showed that both N-terminal SynN domain and C-terminal SNARE domain of FvSyn1 are required for pathogenicity but dispensable for fumonisin production and sexual mating
