9,059 research outputs found

    From data towards knowledge: Revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data

    Genetic and pharmacological perturbation experiments, such as deleting a gene and monitoring gene expression responses, are powerful tools for studying cellular signal transduction pathways. However, it remains a challenge to automatically derive knowledge of a cellular signaling system at a conceptual level from systematic perturbation-response data. In this study, we explored a framework that unifies knowledge mining and data mining approaches towards this goal. The framework consists of the following automated processes: 1) applying an ontology-driven knowledge mining approach to identify functional modules among the genes responding to a perturbation in order to reveal potential signals affected by the perturbation; 2) applying a graph-based data mining approach to search for perturbations that affect a common signal with respect to a functional module; and 3) revealing the architecture of a signaling system by organizing signaling units into a hierarchy based on their relationships. Applying this framework to a compendium of yeast perturbation-response data, we successfully recovered many well-known signal transduction pathways; in addition, our analysis has led to many hypotheses regarding the yeast signal transduction system; finally, our analysis automatically organized perturbed genes as a graph reflecting the architecture of the yeast signaling system. Importantly, this framework transformed molecular findings from a gene level to a conceptual level, which can readily be translated into computable knowledge in the form of rules regarding the yeast signaling system, such as "if genes involved in MAPK signaling are perturbed, genes involved in pheromone responses will be differentially expressed".

    How to understand the cell by breaking it: network analysis of gene perturbation screens

    Modern high-throughput gene perturbation screens are key technologies at the forefront of genetic research. Combined with rich phenotypic descriptors they enable researchers to observe detailed cellular reactions to experimental perturbations on a genome-wide scale. This review surveys the current state of the art in analyzing perturbation screens from a network point of view. We describe approaches to make the step from the parts list to the wiring diagram by using phenotypes for network inference and integrating them with complementary data sources. The first part of the review describes methods to analyze one- or low-dimensional phenotypes like viability or reporter activity; the second part concentrates on high-dimensional phenotypes showing global changes in cell morphology, transcriptome or proteome. Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio

    Network-based analysis of gene expression data

    The methods of molecular biology for the quantitative measurement of gene expression have undergone rapid development in the past two decades. High-throughput assays based on microarray and RNA-seq technology now enable whole-genome studies in which several thousand genes can be measured at a time. However, this has also imposed serious challenges on data storage and analysis, which are the subject of the young but rapidly developing field of computational biology. Explaining observations made on such a large scale requires suitable and accordingly scaled models of gene regulation. Detailed models, as available for single genes, need to be extended and assembled into larger networks of regulatory interactions between genes and gene products. Incorporating such networks into methods for data analysis is crucial for identifying the molecular mechanisms that drive the observed expression. As methods for this purpose emerge in parallel and without a known standard of truth, results need to be critically checked in a competitive setup and in the context of the rich available literature corpus. This work is centered on and contributes to the following subjects, each of which represents an important and distinct research topic in the field of computational biology: (i) construction of realistic gene regulatory network models; (ii) detection of subnetworks that are significantly altered in the data under investigation; and (iii) systematic biological interpretation of detected subnetworks. For the construction of regulatory networks, I review existing methods with a focus on curation and inference approaches. I first describe how literature curation can be used to construct a regulatory network for a specific process, using the well-studied diauxic shift in yeast as an example. In particular, I address the question of how a detailed understanding, as available for the regulation of single genes, can be scaled up to the level of larger systems. I subsequently inspect methods for large-scale network inference, showing that they are significantly skewed towards master regulators. A recalibration strategy is introduced and applied, yielding an improved genome-wide regulatory network for yeast. To detect significantly altered subnetworks, I introduce GGEA as a method for network-based enrichment analysis. The key idea is to score regulatory interactions within functional gene sets for consistency with the observed expression. Compared to other recently published methods, GGEA yields results that consistently and coherently align expression changes with known regulation types and that are thus easier to explain. I also suggest and discuss several significant enhancements to the original method that improve its applicability, outcome and runtime. For the systematic detection and interpretation of subnetworks, I have developed the EnrichmentBrowser software package. It implements several state-of-the-art methods besides GGEA and allows results to be combined and explored across methods. As part of the Bioconductor repository, the package provides unified access to the different methods and thus greatly simplifies their usage for biologists. Extensions to this framework that support the automation of biological interpretation routines are also presented. In conclusion, this work contributes substantially to the research field of network-based analysis of gene expression data with respect to regulatory network construction, subnetwork detection, and their biological interpretation. This also includes recent developments as well as areas of ongoing research, which are discussed in the context of current and future questions arising from the new generation of genomic data.
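The key idea stated in the abstract above, scoring regulatory interactions inside a gene set for consistency with observed expression, can be illustrated with a deliberately simplified sign check. This is not GGEA itself, whose scoring scheme is not given in the abstract; the function name `consistency_score`, the `"activation"`/`"inhibition"` labels, and the toy fold changes are all assumptions made for illustration.

```python
def consistency_score(gene_set, interactions, log_fold_change):
    """Mean sign consistency of regulatory edges inside a gene set.

    Simplified stand-in (an assumption, not the GGEA algorithm):
    an activation is consistent when regulator and target change in the
    same direction, an inhibition when they change in opposite directions.
    """
    score = 0.0
    counted = 0
    for regulator, target, effect in interactions:
        # Only edges with both endpoints in the gene set are scored.
        if regulator not in gene_set or target not in gene_set:
            continue
        if regulator not in log_fold_change or target not in log_fold_change:
            continue
        expected = 1 if effect == "activation" else -1
        product = log_fold_change[regulator] * log_fold_change[target]
        observed = (product > 0) - (product < 0)  # sign of the product
        score += expected * observed
        counted += 1
    return score / counted if counted else 0.0


# Hypothetical toy data: gene A activates B and inhibits C.
edges = [("A", "B", "activation"), ("A", "C", "inhibition")]
print(consistency_score({"A", "B", "C"}, edges, {"A": 2.0, "B": 1.5, "C": -1.0}))
```

A score near 1 means the observed changes agree with the annotated regulation types, and a score near -1 means they contradict them.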

    Controllability of protein-protein interaction phosphorylation-based networks: Participation of the hub 14-3-3 protein family

    Posttranslational regulation of protein function is a ubiquitous mechanism in eukaryotic cells. Here, we analyzed biological properties of nodes and edges of a human protein-protein interaction phosphorylation-based network, especially of those nodes critical for the network controllability. We found that the minimal fraction of critical nodes needed to control the whole network is 29%, which is considerably lower than in other real networks. These critical nodes are more regulated by posttranslational modifications and contain more binding domains to these modifications than other kinds of nodes in the network, suggesting an intra-group fast regulation. Also, when we analyzed the characteristics of the edges that connect critical and non-critical nodes, we found that the former are enriched in domain-to-eukaryotic linear motif interactions, whereas the latter are enriched in domain-domain interactions. Our findings suggest a possible structure for protein-protein interaction networks with a densely interconnected and self-regulated central core, composed of critical nodes with a high participation in the controllability of the full network, and less regulated peripheral nodes. Our study offers a deeper understanding of complex network control and bridges the controllability theorems for complex networks and biological protein-protein interaction phosphorylation-based networked systems. Affiliation: Uhart, Marina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos. Universidad Nacional de Cuyo. Facultad de Ciencias Médicas. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos; Argentina. Affiliation: Flores, Gabriel. Eventioz/eventbrite Company; Argentina. Affiliation: Bustos, Diego Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos. Universidad Nacional de Cuyo. Facultad de Ciencias Médicas. Instituto de Histología y Embriología de Mendoza Dr. Mario H. Burgos; Argentina
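The abstract above does not say how the critical nodes were computed. One standard result from structural controllability theory is that the minimum number of driver nodes of a directed network equals the number of nodes left unmatched by a maximum matching, with a floor of one. The sketch below assumes that formulation; `min_driver_nodes` and the toy networks are hypothetical, and the simple recursive augmenting-path search is only suitable for small graphs.

```python
def min_driver_nodes(nodes, edges):
    """Return (driver count, unmatched nodes) for a directed network.

    Assumes the maximum-matching formulation of structural controllability:
    N_D = max(N - |maximum matching|, 1).
    """
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)

    matched_by = {}  # target node -> source node of its matched incoming edge

    def try_match(u, seen):
        # Augmenting-path search (Kuhn's algorithm) starting from source u.
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or try_match(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    matching_size = sum(try_match(u, set()) for u in nodes)
    # Nodes without a matched incoming edge must receive an external input.
    unmatched = [v for v in nodes if v not in matched_by]
    return max(len(nodes) - matching_size, 1), unmatched


# A directed chain needs one input; a star needs an input for most leaves.
print(min_driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")]))
print(min_driver_nodes(["h", "x", "y", "z"], [("h", "x"), ("h", "y"), ("h", "z")]))
```

Dividing the driver count by the number of nodes gives a fraction comparable in kind to the 29% reported above, although the exact procedure used in the study may differ.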

    Network Approaches to the Study of Genomic Variation in Cancer

    Advances in genomic sequencing technologies opened the door to a wider study of cancer etiology. By analyzing datasets with thousands of exomes (or genomes), researchers gained a better understanding of the genomic alterations that confer a selective advantage towards cancerous growth. A predominant narrative in the field has been based on a dichotomy between alterations that confer a strong selective advantage, called cancer drivers, and the bulk of other alterations assumed to have a neutral effect, called passengers. Yet, a series of studies questioned this narrative and assigned potential roles to passengers, be it in terms of facilitating tumorigenesis or countering the effect of drivers. Consequently, the passenger mutational landscape received a higher level of attention in an attempt to prioritize the possible effects of its alterations and to identify new therapeutic targets. In this dissertation, we introduce interpretable network approaches to the study of genomic variation in cancer. We rely on two types of networks, namely functional biological networks and artificial neural nets. In the first chapter, we describe a propagation method that prioritizes 230 infrequently mutated genes with respect to their potential contribution to cancer development. In the second chapter, we further transcend the driver-passenger dichotomy and demonstrate a gradient of cancer relevance across human genes. In the last two chapters, we present methods that simplify neural network models to render them more interpretable with a focus on functional genomic applications in cancer and beyond.
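The abstract above does not specify which propagation method the first chapter uses. A common form of network propagation is a random walk with restart, in which scores placed on seed genes diffuse to their neighbours in a functional network. The sketch below assumes that form; `propagate`, the `restart` parameter, and the three-gene toy network are illustrative choices, not details from the dissertation.

```python
def propagate(adjacency, seed_scores, restart=0.5, iterations=100):
    """Random walk with restart on an undirected network.

    adjacency: dict mapping every node to a list of its neighbours
               (every neighbour must also appear as a key).
    seed_scores: dict of non-negative starting scores with a positive sum.
    """
    nodes = list(adjacency)
    total = sum(seed_scores.values())
    start = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    current = dict(start)
    for _ in range(iterations):
        # Every step returns a fixed share of the mass to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: its mass stays where it is.
                updated[u] += (1 - restart) * current[u]
                continue
            share = (1 - restart) * current[u] / len(neighbours)
            for v in neighbours:
                updated[v] += share
        current = updated
    return current


# Hypothetical path network A - B - C with a single seed at A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = propagate(network, {"A": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

Genes that are rarely mutated themselves can still rank highly under such a scheme when they sit close to many seed genes, which is the general intuition behind propagation-based prioritization.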

    Exploration of large molecular datasets using global gene networks : computational methods and tools

    Defining gene expression profiles and mapping complex interactions between molecular regulators and proteins is key to understanding biological processes and the functional properties of cells, and is therefore the focus of numerous experimental studies. Small-scale biochemical analyses deliver high-quality data but lack coverage, whereas high-throughput sequencing reveals thousands of interactions, which can be error-prone and require proper computational methods to discover true relations. Furthermore, all these approaches usually focus on one type of interaction at a time. This makes experimental mapping of the genome-wide network a cost- and time-intensive procedure. In the first part of the thesis, I present network analysis tools developed for exploring large-scale datasets in the context of a global network of functional coupling. Paper I introduces NEArender, a method that performs pathway analysis and determines the relations between gene sets using a global network. Traditionally, pathway analysis did not consider network relations, thereby covering only a minor part of the whole picture. Placing the gene sets in the context of a network provides additional information for pathway analysis, which reveals a more comprehensive picture. Paper II presents EviNet, a user-friendly web interface for using the NEArender algorithm. The user can either input gene lists or manage and integrate highly complex experimental designs via the interactive Venn diagram-based interface. The web resource provides access to biological networks and pathways from multiple public resources or users’ own resources. The analysis typically takes seconds or minutes, and the results are presented in graphic and tabular formats. Paper III describes NEAmarker, a method to predict anti-cancer drug targets from enrichment scores calculated by NEArender, thus presenting a practical use of the network enrichment tool. The method can integrate data from multiple omics platforms to model drug sensitivity with enrichment variables. In parallel, alternative methods for pathway enrichment analysis were benchmarked in the paper. The second part of the thesis is focused on identifying spatial and temporal mechanisms that govern the formation of neural cell diversity in the developing brain. High-throughput platforms for RNA- and ChIP-sequencing were applied to provide data for studying the underlying biological hypothesis at the genome-wide scale. In Paper IV, I defined the role of the transcription factor Foxa2 during the specification and differentiation of floor plate cells of the ventral neural tube. By RNA-seq analyses of Foxa2-/- cells, a large set of candidate genes involved in floor plate differentiation was identified. Analysis of a Foxa2 ChIP-seq dataset suggested that Foxa2 directly regulated more than 250 genes expressed by the floor plate and identified Rfx4 and Ascl1 as co-regulators of many floor plate genes. Experimental studies suggested a cooperative activator function for Foxa2 and Rfx4 and a suppressive role for Ascl1 in spatially constraining floor plate induction. Paper V addresses how time is measured during sequential specification of neurons from multipotent progenitor cells during the development of the ventral hindbrain. An underlying timer circuitry that leads to the sequential generation of motor neurons and serotonergic neurons has been identified by integrating experimental and computational data modeling.

    Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks

    Cross-experiment comparisons in public data compendia are challenged by unmatched conditions and technical noise. The ADAGE method, which performs unsupervised integration with denoising autoencoder neural networks, can identify biological patterns, but because ADAGE models, like many neural networks, are over-parameterized, different ADAGE models perform equally well. To enhance model robustness and better build signatures consistent with biological pathways, we developed an ensemble ADAGE (eADAGE) that integrated stable signatures across models. We applied eADAGE to a compendium of Pseudomonas aeruginosa gene expression profiling experiments performed in 78 media. eADAGE revealed a phosphate starvation response controlled by PhoB in media with moderate phosphate and predicted that a second stimulus provided by the sensor kinase, KinB, is required for this PhoB activation. We validated this relationship using both targeted and unbiased genetic approaches. eADAGE, which captures stable biological patterns, enables cross-experiment comparisons that can highlight measured but undiscovered relationships. Gordon and Betty Moore Foundation (GBMF 4552); National Institutes of Health (U.S.) (grant R01-AI091702); Cystic Fibrosis Foundation (STANTO15R0

    Pathway and network analysis in proteomics

    Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. Proteomics data have accumulated rapidly in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample conditions, instrument types, and analytical methods. To address this challenge in proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and little topological features (e.g., GO category analysis), tools with rich functional information and little topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; then we review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.
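The first tool type named in the abstract above, GO category analysis, is commonly carried out as an over-representation test. The sketch below assumes the usual one-sided hypergeometric test; the function name `enrichment_p_value` and the example counts are hypothetical and not taken from the review.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value for category over-representation.

    population: number of proteins in the background
    annotated:  background proteins annotated with the category
    selected:   proteins in the list of interest
    overlap:    selected proteins that carry the annotation
    Returns P(X >= overlap) for a draw without replacement.
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 40 of 200 selected proteins carry an annotation
# that covers 300 of 5000 background proteins.
print(enrichment_p_value(5000, 300, 200, 40))
```

This test uses only category membership, which is why the review places it in the group with basic functional information and little topological information. In practice the p-values for many categories would also need a multiple-testing correction.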