
    Are there laws of genome evolution?

    Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as 'laws of evolutionary genomics' in the same sense in which 'law' is understood in modern physics. Comment: 17 pages, 2 figures

    Universal Features in the Genome-level Evolution of Protein Domains

    Protein domains are found on genomes with notable statistical distributions, which bear a high degree of similarity. Previous work has shown how these distributions can be accounted for by simple models, where the main ingredients are probabilities of duplication, innovation, and loss of domains. However, no one so far has addressed the issue that these distributions follow definite trends depending on protein-coding genome size only. We present a stochastic duplication/innovation model, falling in the class of so-called Chinese Restaurant Processes, able to explain this feature of the data. Using only two universal parameters, related to a minimal number of domains and to the relative weight of innovation to duplication, the model reproduces two important aspects: (a) the populations of domain classes (the sets, related to homology classes, containing realizations of the same domain in different proteins) follow common power laws whose cutoff is dictated by genome size, and (b) the number of domain families is universal and markedly sublinear in genome size. An important ingredient of the model is that the innovation probability decreases with genome size. We propose interpreting this as a global constraint given by the cost of expanding an increasingly complex interactome. Finally, we introduce a variant of the model where the choice of a new domain relates to its occurrence in genomic data, and thus accounts for fold specificity. Both models show general quantitative agreement with data from hundreds of genomes, which indicates the coexistence of the well-known specificity of proteomes with robust self-organizing phenomena related to the basic evolutionary "moves" of duplication and innovation.
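    The duplication/innovation dynamics described in this abstract can be illustrated with a minimal sketch. It uses the standard two-parameter Chinese Restaurant Process, which is an assumption here and not necessarily the exact formulation of the paper: the function name `simulate_crp` and the parameters `alpha` and `theta` are hypothetical labels, and domain loss and fold specificity are ignored. A new domain class is created (innovation) with probability (theta + alpha*k)/(n + theta), which decreases as the genome size n grows; otherwise an existing class of size s is duplicated with weight (s - alpha).

```python
import random


def simulate_crp(n_domains, alpha=0.5, theta=1.0, seed=0):
    """Grow a genome of n_domains domains; return the list of class sizes.

    Assumed parameterization (standard two-parameter CRP, not taken from
    the paper): 0 <= alpha < 1 and theta > 0.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[i] = number of domains currently in class i
    for n in range(n_domains):
        k = len(sizes)
        # Innovation probability shrinks as the genome (n) grows.
        p_new = (theta + alpha * k) / (n + theta)
        if rng.random() < p_new:
            sizes.append(1)  # innovation: a new domain class appears
        else:
            # Duplication: pick an existing class, favoring large ones.
            weights = [s - alpha for s in sizes]
            i = rng.choices(range(k), weights=weights)[0]
            sizes[i] += 1
    return sizes


if __name__ == "__main__":
    for n in (500, 4000):
        classes = simulate_crp(n)
        print(n, len(classes), max(classes))
```

    In this sketch the number of classes grows more slowly than the number of domains, which mirrors the sublinear scaling mentioned in the abstract only qualitatively; no fit to genomic data is implied.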

    Evolution signatures in genome network properties

    Genomes may be organized as networks where protein-protein association plays the role of network links. The resulting networks are far from being random and their topological properties are a consequence of the underlying mechanisms for genome evolution. Considering data on protein-protein association networks from the STRING database, we present experimental evidence that the degree distribution is not scale free, presenting an increased probability for high-degree nodes. We also show that the degree distribution approaches a scale-invariant state as the number of genes in the network increases, although real genomes still present finite-size effects. Based on the experimental evidence unveiled by these data analyses, we propose a simulation model for genome evolution, where genes in a network are either acquired de novo using a preferential attachment rule, or duplicated, with a duplication probability that grows linearly with gene degree and decreases with its clustering coefficient. The results show that topological distributions are better described than in previous genome evolution models. This model correctly predicts that, in order to produce protein-protein association networks with the number of links and number of nodes in the observed range, 90% gene duplication and 10% de novo gene acquisition are necessary. If this scenario is true, it implies a universal mechanism for genome evolution.
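    A rough sketch of the kind of growth process this abstract describes is given below. It is a simplified illustration and not the authors' simulation model: the function name `grow_network`, the two-gene seed network, and the rule that a duplicate inherits all links of its parent are assumptions, and the dependence of the duplication probability on the clustering coefficient is omitted. Only the 90%/10% split between duplication and de novo acquisition is taken from the abstract.

```python
import random


def grow_network(n_genes, p_dup=0.9, seed=0):
    """Return an adjacency dict {gene: set(neighbors)} with n_genes nodes."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed network: two linked genes
    while len(adj) < n_genes:
        nodes = list(adj)
        # Degree-proportional choice, used both for picking the gene to
        # duplicate and for preferential attachment of a de novo gene.
        weights = [len(adj[v]) for v in nodes]
        target = rng.choices(nodes, weights=weights)[0]
        new = len(adj)  # gene ids are 0, 1, 2, ...
        if rng.random() < p_dup:
            # Duplication: the copy inherits every link of its parent
            # (assumption: no link loss after duplication).
            adj[new] = set(adj[target])
        else:
            # De novo acquisition: a single preferentially attached link.
            adj[new] = {target}
        for neighbor in adj[new]:
            adj[neighbor].add(new)
    return adj


if __name__ == "__main__":
    net = grow_network(300)
    n_links = sum(len(nbrs) for nbrs in net.values()) // 2
    print(len(net), n_links)
```

    Varying `p_dup` in such a sketch changes how quickly links accumulate relative to nodes, which is the kind of comparison against observed link and node counts that the abstract refers to.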

    Exact reconciliation of undated trees

    Reconciliation methods aim at recovering macroevolutionary events and at localizing them in the species history, by observing discrepancies between gene family trees and species trees. In this article we introduce an Integer Linear Programming (ILP) approach for the NP-hard problem of computing a most parsimonious time-consistent reconciliation of a gene tree with a species tree when dating information on speciations is not available. The ILP formulation, which builds upon the DTL model, returns a most parsimonious reconciliation ranging over all possible datings of the nodes of the species tree. By studying its performance on plausible simulated data we conclude that the ILP approach is significantly faster than a brute-force search through the space of all possible species tree datings. Although the ILP formulation is currently limited to small trees, we believe that it is an important proof of concept which opens the door to the possibility of developing an exact, parsimony-based approach to dating species trees. The software (ILPEACE) is freely available for download.

    A model of large-scale proteome evolution

    The next step in the understanding of genome organization, after the determination of complete sequences, involves proteomics. The proteome includes the whole set of protein-protein interactions, and two recent independent studies have shown that its topology displays a number of surprising features shared by other complex networks, both natural and artificial. In order to understand the origins of this topology and its evolutionary implications, we present a simple model of proteome evolution that is able to reproduce many of the observed statistical regularities reported from the analysis of the yeast proteome. Our results suggest that the observed patterns can be explained by a process of gene duplication and diversification that would evolve proteome networks under a selection pressure favoring robustness against failure of its individual components.

    Graph Theory and Networks in Biology

    In this paper, we present a survey of the use of graph theoretical techniques in Biology. In particular, we discuss recent work on identifying and modelling the structure of bio-molecular networks, as well as the application of centrality measures to interaction networks and research on the hierarchical structure of such networks and network motifs. Work on the link between structural network properties and dynamics is also described, with emphasis on synchronization and disease propagation. Comment: 52 pages, 5 figures, Survey Paper

    The evolution of genetic architectures underlying quantitative traits

    In the classic view introduced by R. A. Fisher, a quantitative trait is encoded by many loci with small, additive effects. Recent advances in QTL mapping have begun to elucidate the genetic architectures underlying vast numbers of phenotypes across diverse taxa, producing observations that sometimes contrast with Fisher's blueprint. Despite these considerable empirical efforts to map the genetic determinants of traits, it remains poorly understood how the genetic architecture of a trait should evolve, or how it depends on the selection pressures on the trait. Here we develop a simple, population-genetic model for the evolution of genetic architectures. Our model predicts that traits under moderate selection should be encoded by many loci with highly variable effects, whereas traits under either weak or strong selection should be encoded by relatively few loci. We compare these theoretical predictions to qualitative trends in the genetics of human traits, and to systematic data on the genetics of gene expression levels in yeast. Our analysis provides an evolutionary explanation for broad empirical patterns in the genetic basis of traits, and it introduces a single framework that unifies the diversity of observed genetic architectures, ranging from Mendelian to Fisherian. Comment: Minor changes in the text; Added supplementary material