1,149 research outputs found

    Systematic identification of functional plant modules through the integration of complementary data sources

    Get PDF
    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation

    Are there laws of genome evolution?

    Get PDF
    Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection, therefore the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as 'laws of evolutionary genomics' in the same sense 'law' is understood in modern physics.Comment: 17 pages, 2 figure

    Similarities and Differences in Genome-Wide Expression Data of Six Organisms

    Get PDF
    Comparing genomic properties of different organisms is of fundamental importance in the study of biological and evolutionary principles. Although differences among organisms are often attributed to differential gene expression, genome-wide comparative analysis thus far has been based primarily on genomic sequence information. We present a comparative study of large datasets of expression profiles from six evolutionarily distant organisms: S. cerevisiae, C. elegans, E. coli, A. thaliana, D. melanogaster, and H. sapiens. We use genomic sequence information to connect these data and compare global and modular properties of the transcription programs. Linking genes whose expression profiles are similar, we find that for all organisms the connectivity distribution follows a power-law, highly connected genes tend to be essential and conserved, and the expression program is highly modular. We reveal the modular structure by decomposing each set of expression data into coexpressed modules. Functionally related sets of genes are frequently coexpressed in multiple organisms. Yet their relative importance to the transcription program and their regulatory relationships vary among organisms. Our results demonstrate the potential of combining sequence and expression data for improving functional gene annotation and expanding our understanding of how gene expression and diversity evolved

    Modular reorganization of the global network of gene regulatory interactions during perinatal human brain development.

    Get PDF
    BACKGROUND During early development of the nervous system, gene expression patterns are known to vary widely depending on the specific developmental trajectories of different structures. Observable changes in gene expression profiles throughout development are determined by an underlying network of precise regulatory interactions between individual genes. Elucidating the organizing principles that shape this gene regulatory network is one of the central goals of developmental biology. Whether the developmental programme is the result of a dynamic driven by a fixed architecture of regulatory interactions, or alternatively, the result of waves of regulatory reorganization is not known. RESULTS Here we contrast these two alternative models by examining existing expression data derived from the developing human brain in prenatal and postnatal stages. We reveal a sharp change in gene expression profiles at birth across brain areas. This sharp division between foetal and postnatal profiles is not the result of pronounced changes in level of expression of existing gene networks. Instead we demonstrate that the perinatal transition is marked by the widespread regulatory rearrangement within and across existing gene clusters, leading to the emergence of new functional groups. This rearrangement is itself organized into discrete blocks of genes, each targeted by a distinct set of transcriptional regulators and associated to specific biological functions. CONCLUSIONS Our results provide evidence of an acute modular reorganization of the regulatory architecture of the brain transcriptome occurring at birth, reflecting the reassembly of new functional associations required for the normal transition from prenatal to postnatal brain development

    A general co-expression network-based approach to gene expression analysis: comparison and applications

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Co-expression network-based approaches have become popular in analyzing microarray data, such as for detecting functional gene modules. However, co-expression networks are often constructed by ad hoc methods, and network-based analyses have not been shown to outperform the conventional cluster analyses, partially due to the lack of an unbiased evaluation metric.</p> <p>Results</p> <p>Here, we develop a general co-expression network-based approach for analyzing both genes and samples in microarray data. Our approach consists of a simple but robust rank-based network construction method, a parameter-free module discovery algorithm and a novel reference network-based metric for module evaluation. We report some interesting topological properties of rank-based co-expression networks that are very different from that of value-based networks in the literature. Using a large set of synthetic and real microarray data, we demonstrate the superior performance of our approach over several popular existing algorithms. Applications of our approach to yeast, Arabidopsis and human cancer microarray data reveal many interesting modules, including a fatal subtype of lymphoma and a gene module regulating yeast telomere integrity, which were missed by the existing methods.</p> <p>Conclusions</p> <p>We demonstrated that our novel approach is very effective in discovering the modular structures in microarray data, both for genes and for samples. As the method is essentially parameter-free, it may be applied to large data sets where the number of clusters is difficult to estimate. The method is also very general and can be applied to other types of data. A MATLAB implementation of our algorithm can be downloaded from <url>http://cs.utsa.edu/~jruan/Software.html</url>.</p

    Extracting expression modules from perturbational gene expression compendia

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Compendia of gene expression profiles under chemical and genetic perturbations constitute an invaluable resource from a systems biology perspective. However, the perturbational nature of such data imposes specific challenges on the computational methods used to analyze them. In particular, traditional clustering algorithms have difficulties in handling one of the prominent features of perturbational compendia, namely partial coexpression relationships between genes. Biclustering methods on the other hand are specifically designed to capture such partial coexpression patterns, but they show a variety of other drawbacks. For instance, some biclustering methods are less suited to identify overlapping biclusters, while others generate highly redundant biclusters. Also, none of the existing biclustering tools takes advantage of the staple of perturbational expression data analysis: the identification of differentially expressed genes.</p> <p>Results</p> <p>We introduce a novel method, called ENIGMA, that addresses some of these issues. ENIGMA leverages differential expression analysis results to extract expression modules from perturbational gene expression data. The core parameters of the ENIGMA clustering procedure are automatically optimized to reduce the redundancy between modules. In contrast to the biclusters produced by most other methods, ENIGMA modules may show internal substructure, i.e. subsets of genes with distinct but significantly related expression patterns. The grouping of these (often functionally) related patterns in one module greatly aids in the biological interpretation of the data. We show that ENIGMA outperforms other methods on artificial datasets, using a quality criterion that, unlike other criteria, can be used for algorithms that generate overlapping clusters and that can be modified to take redundancy between clusters into account. Finally, we apply ENIGMA to the Rosetta compendium of expression profiles for <it>Saccharomyces cerevisiae </it>and we analyze one pheromone response-related module in more detail, demonstrating the potential of ENIGMA to generate detailed predictions.</p> <p>Conclusion</p> <p>It is increasingly recognized that perturbational expression compendia are essential to identify the gene networks underlying cellular function, and efforts to build these for different organisms are currently underway. We show that ENIGMA constitutes a valuable addition to the repertoire of methods to analyze such data.</p
    corecore