
    Integration of Gene Expression and Methylation to unravel Biological Networks in Glioblastoma Patients

    Peer reviewed. The vast amount of heterogeneous omics data, encompassing a broad range of biomolecular information, requires novel methods of analysis, including those that integrate the available levels of information. In this work we describe Regression2Net, a computational approach that integrates gene expression and genomic or methylome data in two steps. First, penalized regressions are used to build Expression-Expression (EEnet) and Expression-Genome or –Methylome (EMnet) networks. Second, network theory is used to highlight important communities of genes. When applying Regression2Net to gene expression and methylation profiles for individuals with glioblastoma multiforme, we identified 284 and 447 potentially interesting genes, respectively, in relation to glioblastoma pathology. These genes showed at least one connection in the integrated networks ANDnet and XORnet derived from the aforementioned EEnet and EMnet networks. Whereas the edges in ANDnet occur in both EEnet and EMnet, the edges in XORnet occur in EMnet but not in EEnet. In-depth biological analysis of connected genes in ANDnet and XORnet revealed genes that are related to energy metabolism, cell cycle control (AATF), immune system response and several cancer types. Importantly, we observed significant over-representation of cancer-related pathways including glioma, especially in the XORnet network, suggesting a non-ignorable role of methylation in glioblastoma multiforme. In the ANDnet, we furthermore identified potential glioma suppressor genes ACCN3 and ACCN4 linked to the NBPF1 neuroblastoma breakpoint family, as well as numerous ABC transporter genes (ABCA1, ABCB1), suggesting drug resistance of glioblastoma tumors.
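The ANDnet and XORnet definitions in the abstract above reduce to set operations on edge lists. The sketch below is only an illustration of those two definitions, not the authors' implementation: the function names (`edge`, `integrate_networks`, `connected_genes`) and the gene labels `G1` to `G4` are hypothetical, and treating edges as undirected is an assumption the abstract does not confirm.

```python
def edge(a, b):
    """Return an undirected edge as an order-independent key (assumption: edges are undirected)."""
    return frozenset((a, b))


def integrate_networks(ee_edges, em_edges):
    """Combine two edge lists following the definitions quoted in the abstract.

    ANDnet: edges that occur in both EEnet and EMnet.
    XORnet: edges that occur in EMnet but not in EEnet.
    """
    ee = {edge(a, b) for a, b in ee_edges}
    em = {edge(a, b) for a, b in em_edges}
    and_net = ee & em   # intersection
    xor_net = em - ee   # set difference, as the abstract describes it
    return and_net, xor_net


def connected_genes(net):
    """Return every gene that has at least one edge in the network."""
    return set().union(*net)


# Hypothetical toy input; G1..G4 are placeholder labels, not real results.
ee_edges = [("G1", "G2"), ("G2", "G3")]
em_edges = [("G2", "G1"), ("G3", "G4")]

and_net, xor_net = integrate_networks(ee_edges, em_edges)
print(sorted(connected_genes(and_net)))  # ['G1', 'G2']
print(sorted(connected_genes(xor_net)))  # ['G3', 'G4']
```

Note that XORnet, as worded in the abstract, is a one-sided difference (EMnet minus EEnet) rather than a symmetric exclusive-or, which is why the sketch uses `em - ee`.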

    ANIMA: Association Network Integration for Multiscale Analysis

    Contextual functional interpretation of -omics data derived from clinical samples is a classical and difficult problem in computational systems biology. The measurement of thousands of data points on single samples has become routine but relating ‘big data’ datasets to the complexities of human pathobiology is an area of ongoing research. Complicating this is the fact that many publicly available datasets use bulk transcriptomics data from complex tissues like blood. The most prevalent analytic approaches derive molecular ‘signatures’ of disease states or apply modular analysis frameworks to the data. Here we show, using a network-based data integration method that takes clinical phenotype and microarray data as inputs, that we can reconstruct multiple features (or endophenotypes) of disease states at various scales of organization, from transcript abundance patterns of individual genes through co-expression patterns of groups of genes to patterns of cellular behavior in whole blood samples, both in single experiments and in a meta-analysis of multiple datasets.

    ANIMA: Association network integration for multiscale analysis [version 1; referees: 2 approved with reservations]

    Contextual functional interpretation of -omics data derived from clinical samples is a classical and difficult problem in computational systems biology. The measurement of thousands of data points on single samples has become routine but relating ‘big data’ datasets to the complexities of human pathobiology is an area of ongoing research. Complicating this is the fact that many publicly available datasets use bulk transcriptomics data from complex tissues like blood. The most prevalent analytic approaches derive molecular ‘signatures’ of disease states or apply modular analysis frameworks to the data. Here we describe ANIMA (association network integration for multiscale analysis), a network-based data integration method using clinical phenotype and microarray data as inputs. ANIMA is implemented in R and Neo4j and runs in Docker containers. In short, the build algorithm iterates over one or more transcriptomics datasets to generate a large, multipartite association network by executing multiple independent analytic steps (differential expression, deconvolution, modular analysis based on co-expression, pathway analysis) and integrating the results. Once the network is built, it can be queried directly using Cypher, or via custom functions that communicate with the graph database via language-specific APIs. We developed a web application using Shiny, which provides fully interactive, multiscale views of the data. Using our approach, we show that we can reconstruct multiple features of disease states at various scales of organization, from transcript abundance patterns of individual genes through co-expression patterns of groups of genes to patterns of cellular behaviour in whole blood samples, both in single experiments and in a meta-analysis of multiple datasets.

    Updates in metabolomics tools and resources: 2014-2015

    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources, in the form of tools, software, and databases, is currently lacking. Thus, here we provide an overview of freely available, open-source tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of recent developments, in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and resources for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most tools described in this review are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described, including their analytical and computational platform dependencies, are summarized in an overview table.

    Exploration of large molecular datasets using global gene networks : computational methods and tools

    Defining gene expression profiles and mapping complex interactions between molecular regulators and proteins is key to understanding biological processes and the functional properties of cells, and is therefore the focus of numerous experimental studies. Small-scale biochemical analyses deliver high-quality data but lack coverage, whereas high-throughput sequencing reveals thousands of interactions, which can be error-prone and require proper computational methods to discover true relations. Furthermore, all these approaches usually focus on one type of interaction at a time. This makes experimental mapping of the genome-wide network a cost- and time-intensive procedure. In the first part of the thesis, I present the network analysis tools developed for exploring large-scale datasets in the context of a global network of functional coupling. Paper I introduces NEArender, a method that performs pathway analysis and determines the relations between gene sets using a global network. Traditionally, pathway analysis did not consider network relations, thereby covering a minor part of the whole picture. Placing the gene sets in the context of a network provides additional information for pathway analysis, which reveals a more comprehensive picture. Paper II presents EviNet, a user-friendly web interface for using the NEArender algorithm. The user can either input gene lists or manage and integrate highly complex experimental designs via the interactive Venn diagram-based interface. The web resource provides access to biological networks and pathways from multiple public or users’ own resources. The analysis typically takes seconds or minutes, and the results are presented in a graphic and tabular format. Paper III describes NEAmarker, a method to predict anti-cancer drug targets from enrichment scores calculated by NEArender, thus presenting a practical usage of the network enrichment tool. The method can integrate data from multiple omics platforms to model drug sensitivity with enrichment variables. In parallel, alternative methods for pathway enrichment analysis were benchmarked in the paper. The second part of the thesis is focused on identifying spatial and temporal mechanisms that govern the formation of neural cell diversity in the developing brain. High-throughput platforms for RNA- and ChIP-sequencing were applied to provide data for studying the underlying biological hypothesis at the genome-wide scale. In Paper IV, I defined the role of the transcription factor Foxa2 during the specification and differentiation of floor plate cells of the ventral neural tube. By RNA-seq analyses of Foxa2-/- cells, a large set of candidate genes involved in floor plate differentiation was identified. Analysis of a Foxa2 ChIP-seq dataset suggested that Foxa2 directly regulated more than 250 genes expressed by the floor plate and identified Rfx4 and Ascl1 as co-regulators of many floor plate genes. Experimental studies suggested a cooperative activator function for Foxa2 and Rfx4 and a suppressive role for Ascl1 in spatially constraining floor plate induction. Paper V addresses how time is measured during sequential specification of neurons from multipotent progenitor cells during the development of the ventral hindbrain. An underlying timer circuitry that leads to the sequential generation of motor neurons and serotonergic neurons has been identified by integrating experimental data and computational modeling.

    Bioinformatics for RNA‐Seq Data Analysis

    While RNA sequencing (RNA‐seq) has become increasingly popular for transcriptome profiling, the analysis of the massive amount of data generated by large‐scale RNA‐seq remains a challenge. RNA‐seq data analyses typically consist of (1) accurate mapping of millions of short sequencing reads to a reference genome, including the identification of splicing events; (2) quantifying expression levels of genes, transcripts, and exons; (3) differential analysis of gene expression among different biological conditions; and (4) biological interpretation of differentially expressed genes. Although multiple algorithms pertinent to basic analyses have been developed, a variety of questions remain unresolved. In this chapter, we review the main tools and algorithms currently available for RNA‐seq data analyses, with the goal of helping RNA‐seq data analysts make an informed choice of tools in practice. Meanwhile, RNA‐seq is evolving rapidly, and newer sequencing technologies are briefly introduced, including stranded RNA‐seq, targeted RNA‐seq, and single‐cell RNA‐seq.

    Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins

    One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins