    Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis

    Background: Combining multiple evidence types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that the modules can be used to infer the function of as-yet unannotated proteins, to discover previously unknown roles of proteins in diseases, and to better understand the regulation of, and interrelationships between, the different elements of complex biological systems.
    Results: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we constructed five relationship networks: four based on a single type of data (PPI, co-expression, co-occurrence of protein names in scientific literature abstracts, and sequence similarity) and a fifth combining all four evidence types. The ability of these networks to suggest biologically meaningful groupings of proteins was explored by applying Markov clustering and then measuring the functional coherence of the clusters.
    Conclusions: Relationship networks integrating multiple evidence types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integrating more data sources improves the ability to uncover functional associations between proteins, both by allowing more proteins to be linked and by producing a network whose modular structure more closely reflects the hierarchy of the Gene Ontology.
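
    The clustering step applies Markov clustering (MCL) to each relationship network, but the abstract does not give the parameters used. The following is only a minimal pure-Python sketch of the standard MCL iteration (add self-loops, then alternate expansion, inflation and column renormalization until the matrix stops changing) on a small dense adjacency matrix. The function names, the inflation value of 2.0 and the pruning threshold are illustrative assumptions; a real analysis would use an optimised implementation on sparse matrices.

```python
# Hypothetical minimal Markov clustering (MCL) sketch; dense lists, small graphs only.


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, prune=1e-5, max_iter=100, tol=1e-9):
    """Return clusters (frozensets of node indices) of a symmetric adjacency matrix."""
    n = len(adj)
    # Self-loops give every random walk a chance to stay put, as is usual in MCL.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random-walk probabilities
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        # Drop negligible entries, then restore column sums to 1.
        new = normalize_columns([[v if v > prune else 0.0 for v in row] for row in inflated])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that keep non-zero mass are attractors; their non-zero columns form a cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > prune)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Two disconnected triangles: {0, 1, 2} and {3, 4, 5}.
    adj = [[0, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 1],
           [0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0]]
    print(sorted(sorted(c) for c in mcl(adj)))  # [[0, 1, 2], [3, 4, 5]]
```

    Raising the inflation parameter generally yields more, smaller clusters; the value used for the networks above is not stated in the abstract.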

    Systematic identification of functional plant modules through the integration of complementary data sources

    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed significant enrichment for embryo lethality and evidence of cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation based on 197 recently experimentally characterized genes showed that 38.1% of their functions could be inferred from the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.
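
    The abstract does not spell out how the module-based functional predictions were made. One common guilt-by-association approach, shown here only as a hedged sketch and not as the authors' procedure, is to test each GO term for over-representation in a module with a one-sided hypergeometric test and transfer significantly enriched terms to the module's unannotated members. The function names, the threshold `alpha` and the absence of a multiple-testing correction are simplifying assumptions.

```python
# Hypothetical sketch: hypergeometric enrichment + guilt-by-association transfer.
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, K of which carry the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def predict_functions(module, annotations, n_genome, alpha=0.01):
    """Transfer terms enriched in `module` to its unannotated members.

    module: gene IDs in one module.
    annotations: gene ID -> set of GO terms; genes without an entry count as unannotated.
    n_genome: size N of the background gene set.
    alpha: per-term threshold (no multiple-testing correction in this sketch).
    """
    genome_counts, module_counts = {}, {}
    for terms in annotations.values():
        for term in terms:
            genome_counts[term] = genome_counts.get(term, 0) + 1
    for gene in module:
        for term in annotations.get(gene, set()):
            module_counts[term] = module_counts.get(term, 0) + 1
    enriched = {term for term, k in module_counts.items()
                if hypergeom_sf(k, n_genome, genome_counts[term], len(module)) < alpha}
    if not enriched:
        return {}
    return {gene: set(enriched) for gene in module if not annotations.get(gene)}


if __name__ == "__main__":
    # Toy data: 10 of 100 genes carry GO:A; 4 of them sit in a 5-gene module.
    annotations = {f"m{i}": {"GO:A"} for i in range(4)}
    annotations.update({f"o{i}": {"GO:A"} for i in range(6)})
    module = ["m0", "m1", "m2", "m3", "u0"]
    print(predict_functions(module, annotations, n_genome=100))  # {'u0': {'GO:A'}}
```

    In the toy example, seeing 4 of the 10 GO:A genes in a 5-gene module has a probability of roughly 2.5e-4 under random sampling, so the term is passed to the unannotated gene `u0`.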

    A functional and regulatory perspective on Arabidopsis thaliana

    Transcriptome-based Gene Networks for Systems-level Analysis of Plant Gene Functions

    Present-day genomic technologies are evolving at an unprecedented rate, allowing cellular activities to be interrogated with increasing breadth and depth. However, we still know very little about how the genome functions and what the identified genes do. The lack of functional gene annotations greatly limits the post-analytical interpretation of new high-throughput genomic datasets. For plant biologists, the problem is much more severe: less than 50% of all identified genes in the model plant Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa, have any aspect of their function assigned. There is therefore an urgent need to develop innovative methods that predict and expand the currently available functional annotations of plant genes. With open access now catching the ‘pulse’ of modern molecular research, integrating the copious transcriptome datasets allows rapid prediction of gene functions in specific biological contexts, which provides evidence beyond traditional homology-based functional inference. The main goal of this dissertation was to develop data analysis strategies and tools broadly applicable in systems biology research. Two user-friendly interactive web applications are presented. The Rice Regulatory Network (RRN) captures an abiotic-stress-conditioned gene regulatory network designed to facilitate the identification of transcription factor targets during induction of various environmental stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network that encapsulates various aspects of seed formation, including embryogenesis, endosperm development, and seed-coat formation. Further, an edge-set enrichment analysis algorithm is proposed that uses network density as a parameter to estimate the gain or loss in correlation of pathways between two conditionally independent coexpression networks.
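
    The edge-set enrichment algorithm itself is not described in this abstract. As a hypothetical illustration of the underlying idea only, the sketch below computes the edge density among a pathway's genes in each of two coexpression networks and reports the difference. A real method would also need a significance estimate (for example, against random gene sets of the same size), which is omitted here; the function names and edge-list representation are assumptions.

```python
# Hypothetical sketch: compare a pathway's edge density in two coexpression networks.
from itertools import combinations


def subnetwork_density(edges, genes):
    """Fraction of possible pairs among `genes` that are linked by an edge in `edges`."""
    genes = sorted(set(genes))
    possible = len(genes) * (len(genes) - 1) // 2
    if possible == 0:
        return 0.0
    edge_set = {frozenset(edge) for edge in edges}
    observed = sum(1 for a, b in combinations(genes, 2) if frozenset((a, b)) in edge_set)
    return observed / possible


def density_shift(edges_a, edges_b, pathway_genes):
    """Density in network B minus density in network A for one pathway.

    Positive values suggest a gain of coexpression in B, negative values a loss.
    """
    return subnetwork_density(edges_b, pathway_genes) - subnetwork_density(edges_a, pathway_genes)


if __name__ == "__main__":
    pathway = ["a", "b", "c"]
    control = [("a", "b"), ("c", "d")]
    stress = [("a", "b"), ("b", "c"), ("a", "c")]
    print(round(density_shift(control, stress, pathway), 3))  # 0.667
```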

    Discovering study-specific gene regulatory networks

    This article has been made available through the Brunel Open Access Publishing Fund. Microarrays are commonly used in biology because of their ability to simultaneously measure the expression of thousands of genes under different conditions. Because microarray data typically contain a large number of variables but far fewer samples, scalable network analysis techniques are often employed. In particular, consensus approaches that combine multiple microarray studies have recently been used to find more robust networks. The purpose of this paper, however, is to combine multiple microarray studies to automatically identify subnetworks that are distinctive to specific experimental conditions rather than common to them all. To better understand key regulatory mechanisms and how they change under different conditions, we derive unique networks from multiple independent networks built using glasso (the graphical lasso), which goes beyond standard correlations. This involves calculating cluster prediction accuracies to detect the most predictive genes for a specific set of conditions. We differentiate between accuracies calculated using cross-validation within a selected cluster of studies (the intra prediction accuracy) and those calculated on a set of independent studies belonging to different study clusters (the inter prediction accuracy). Finally, we compare our method's results with related state-of-the-art techniques. We explore how the proposed pipeline performs on both synthetic data and real data (wheat and Fusarium). Our results show that subnetworks specific to subsets of studies can be identified reliably and that these networks reflect key mechanisms fundamental to the experimental conditions in each of those subsets.
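
    The pipeline described above relies on glasso networks and prediction accuracies, which are beyond a short example. The core notion of a study-specific subnetwork can nevertheless be illustrated with a simplified, hypothetical sketch: given edge lists from networks inferred for a target cluster of studies and for all other studies, keep the edges that recur in the target cluster but are rare or absent elsewhere. The frequency thresholds and function names are illustrative assumptions, not parameters from the paper.

```python
# Hypothetical, simplified sketch: edges recurrent in one study cluster but rare elsewhere.
from collections import Counter


def edge_frequencies(networks):
    """Fraction of networks in which each undirected edge appears."""
    counts = Counter()
    for net in networks:
        counts.update({frozenset(edge) for edge in net})  # count each edge once per network
    return {edge: c / len(networks) for edge, c in counts.items()}


def specific_edges(target_networks, other_networks, min_target=1.0, max_other=0.0):
    """Edges seen in >= min_target of target networks and <= max_other of the others."""
    target = edge_frequencies(target_networks)
    other = edge_frequencies(other_networks)
    return {edge for edge, freq in target.items()
            if freq >= min_target and other.get(edge, 0.0) <= max_other}


if __name__ == "__main__":
    target = [[("g1", "g2"), ("g2", "g3")], [("g2", "g1"), ("g2", "g3")]]
    other = [[("g2", "g3")], [("g3", "g4")]]
    found = specific_edges(target, other)
    print(sorted(tuple(sorted(edge)) for edge in found))  # [('g1', 'g2')]
```

    Relaxing `max_other` trades specificity for recall: with `max_other=0.5`, the edge g2-g3, which also appears in one of the other networks, is kept as well.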

    A collection of conserved noncoding sequences to study gene regulation in flowering plants

    Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, whose identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used to detect conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help to reduce the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach to identify CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we used binding site information for 642 TFs from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, revealing significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have more coherent expression profiles than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops.
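
    Phylogenetic footprinting looks for noncoding regions that stay conserved across the orthologs of a gene in related species. A much simpler, hypothetical stand-in that conveys the idea is to find exact k-mers, counted on both strands, that are shared by the upstream sequences of orthologs from several species. The function names, the choice of k and the `min_species` threshold are assumptions; real CNS detection also has to handle alignment, mismatches and phylogenetic distance, none of which this toy version does.

```python
# Hypothetical toy stand-in for phylogenetic footprinting: shared exact k-mers.
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement (strand-neutral)."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_set(seq, k):
    """Canonical k-mers in `seq`, skipping windows with non-ACGT characters such as N."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return {canonical(w) for w in windows if set(w) <= set("ACGT")}


def conserved_kmers(upstream_by_species, k=8, min_species=None):
    """k-mers present upstream of the orthologs in at least `min_species` species."""
    if min_species is None:
        min_species = len(upstream_by_species)
    counts = Counter()
    for seq in upstream_by_species.values():
        counts.update(kmer_set(seq, k))  # each species contributes a k-mer at most once
    return {kmer for kmer, c in counts.items() if c >= min_species}


if __name__ == "__main__":
    upstream = {"sp1": "TTTTGATAAGTTTT", "sp2": "CCCGATAAGCCC", "sp3": "AAAGATAAGAAA"}
    print(conserved_kmers(upstream, k=6))  # {'CTTATC'}, i.e. GATAAG on the other strand
```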

    Studying the regulatory landscape of flowering plants

    Development of mathematical methods for modeling biological systems

    Integration strategies and data analysis methods for plant systems biology

    Understanding how function relates to multiple layers of interactions between biological entities is one of the key goals of bioinformatics research, particularly in areas such as systems biology. However, the realisation of this objective is hampered by the sheer volume and multi-level heterogeneity of potentially relevant information. This work addressed this issue by developing a set of integration pipelines and analysis methods as part of the Ondex data integration framework. The integration process incorporated both relevant data from a set of publicly available databases and information derived from prediction approaches, which were also implemented as part of this work. These methods were used to assemble integrated datasets relevant to the study of the model plant species Arabidopsis thaliana and suitable for network-driven analysis. Particular attention was paid to the evaluation and comparison of the different sources of these data. Approaches were implemented for the identification and characterisation of functional modules in integrated networks and were used to study and compare networks constructed from different types of data. The benefits of data integration were also demonstrated in three different bioinformatics research scenarios. The analysis of the constructed datasets also led to a better understanding of the functional role of genes identified in a study of a nitrogen uptake mutant and allowed candidate genes to be selected for further exploration.
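
    The sketch below does not use the Ondex API. As a hypothetical illustration of the basic integration step only, it merges edge lists from several evidence types into one network, maps identifier aliases to a canonical ID, and records which evidence types support each edge. The `synonyms` table, the identifiers and the function name are all invented for illustration.

```python
# Hypothetical sketch of evidence integration; it does not use the Ondex API.
from collections import defaultdict


def integrate(sources, synonyms):
    """Merge edge lists from several evidence types into one network.

    sources: evidence type -> iterable of (id1, id2) pairs.
    synonyms: alias -> canonical identifier (an invented mapping table).
    Returns frozenset({id1, id2}) -> set of evidence types supporting that edge.
    """
    merged = defaultdict(set)
    for evidence, edges in sources.items():
        for a, b in edges:
            a, b = synonyms.get(a, a), synonyms.get(b, b)  # resolve aliases first
            if a != b:  # drop self-loops created by alias resolution
                merged[frozenset((a, b))].add(evidence)
    return dict(merged)


if __name__ == "__main__":
    sources = {
        "ppi": [("GENE_A", "GENE_B")],
        "coexpression": [("alias_A", "GENE_B"), ("GENE_B", "GENE_C")],
        "literature": [("alias_A", "GENE_A")],
    }
    network = integrate(sources, synonyms={"alias_A": "GENE_A"})
    for edge, evidence in sorted(network.items(), key=lambda item: sorted(item[0])):
        print(sorted(edge), sorted(evidence))
    # ['GENE_A', 'GENE_B'] ['coexpression', 'ppi']
    # ['GENE_B', 'GENE_C'] ['coexpression']
```

    Keeping the set of supporting evidence types on each edge makes it straightforward to compare single-evidence networks with the combined one, or to weight edges by how many sources agree.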