Are there laws of genome evolution?
Research in quantitative evolutionary genomics and systems biology has led to the
discovery of several universal regularities connecting genomic and molecular
phenomic variables. These universals include the log-normal distribution of the
evolutionary rates of orthologous genes; the power law-like distributions of
paralogous family size and node degree in various biological networks; the
negative correlation between a gene's sequence evolution rate and expression
level; and differential scaling of functional classes of genes with genome
size. The universals of genome evolution can be accounted for by simple
mathematical models similar to those used in statistical physics, such as the
birth-death-innovation model. These models do not explicitly incorporate
selection; therefore, the observed universal regularities do not appear to be
shaped by selection but rather are emergent properties of gene ensembles.
Although a complete physical theory of evolutionary biology is inconceivable,
the universals of genome evolution might qualify as 'laws of evolutionary
genomics' in the same sense in which 'law' is understood in modern physics. Comment: 17 pages, 2 figures
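The birth-death-innovation model mentioned above can be illustrated with a short simulation. This is a minimal discrete-time sketch, not the published formulation: it assumes that each gene independently duplicates with probability `lam` and is deleted with probability `delta` per step, and that a new one-gene family appears with probability `nu` per step. All parameter names and values are hypothetical.

```python
import random
from collections import Counter


def simulate_bdim(steps, lam, delta, nu, seed=0):
    """Toy discrete-time birth-death-innovation process over gene families.

    Hypothetical parameters (not taken from the paper):
      lam:   per-gene probability of duplication in one step (birth)
      delta: per-gene probability of deletion in one step (death)
      nu:    probability per step that a new one-gene family appears (innovation)
    Returns the sizes of all families that still contain at least one gene.
    """
    rng = random.Random(seed)
    families = [1]  # start from a single one-gene family
    for _ in range(steps):
        updated = []
        for size in families:
            births = sum(rng.random() < lam for _ in range(size))
            deaths = sum(rng.random() < delta for _ in range(size))
            new_size = size + births - deaths  # never negative: deaths <= size
            if new_size > 0:  # families that lose every gene go extinct
                updated.append(new_size)
        if rng.random() < nu:
            updated.append(1)  # innovation: a brand-new family of size 1
        families = updated
    return families


if __name__ == "__main__":
    sizes = simulate_bdim(steps=2000, lam=0.09, delta=0.10, nu=0.5)
    histogram = Counter(sizes)  # family size -> number of families
    for size in sorted(histogram):
        print(size, histogram[size])
```

Keeping `lam` slightly below `delta` stops the total gene count from growing without bound, so innovation and loss can balance. Note that this toy version uses constant per-gene rates, whereas the published models also consider size-dependent rates.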
Universal Features in the Genome-level Evolution of Protein Domains
Protein domains are found in genomes with notable statistical distributions, which bear a high degree of similarity across genomes. Previous work has shown how these distributions can be accounted for by simple models whose main ingredients are the probabilities of duplication, innovation, and loss of domains. However, no one has so far addressed the fact that these distributions follow definite trends that depend only on protein-coding genome size. We present a stochastic duplication/innovation model, belonging to the class of so-called Chinese Restaurant Processes, that is able to explain this feature of the data. Using only two universal parameters, related to a minimal number of domains and to the relative weight of innovation to duplication, the model reproduces two important aspects: (a) the populations of domain classes (the sets, related to homology classes, containing realizations of the same domain in different proteins) follow common power laws whose cutoff is dictated by genome size, and (b) the number of domain families is universal and markedly sublinear in genome size. An important ingredient of the model is that the innovation probability decreases with genome size. We propose that this can be interpreted as a global constraint arising from the cost of expanding an increasingly complex interactome. Finally, we introduce a variant of the model in which the choice of a new domain relates to its occurrence in genomic data, and which thus accounts for fold specificity. Both models are in general quantitative agreement with data from hundreds of genomes, which indicates that the well-known specificity of proteomes coexists with robust self-organizing phenomena related to the basic evolutionary "moves" of duplication and innovation.
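The abstract places its model in the Chinese Restaurant Process family. The sketch below uses the standard two-parameter (Pitman-Yor) Chinese Restaurant Process as a stand-in; the names `theta` and `alpha` and this exact parametrisation are assumptions, and the paper's own formulation may differ. Each new domain is either an innovation (a new class) or a duplication into an existing class chosen in proportion to its population. For `alpha = 0` the innovation probability is exactly `theta / (n + theta)`, which decreases as the genome size `n` grows.

```python
import random


def crp_domain_classes(n_domains, theta, alpha, seed=0):
    """Grow a genome one domain at a time with a two-parameter Chinese Restaurant Process.

    Assumed Pitman-Yor parametrisation (theta > 0, 0 <= alpha < 1):
      innovation (new class):    probability (theta + alpha * K) / (n + theta)
      duplication into class k:  probability (n_k - alpha) / (n + theta)
    where n is the current number of domains and K the current number of classes.
    Returns the class populations n_k.
    """
    rng = random.Random(seed)
    classes = []
    for n in range(n_domains):
        k = len(classes)
        p_new = (theta + alpha * k) / (n + theta)  # equals 1 for the first domain
        if rng.random() < p_new:
            classes.append(1)  # innovation: a domain of a previously unseen class
            continue
        # duplication: pick an existing class with weight n_k - alpha
        r = rng.random() * (n - alpha * k)
        for i, size in enumerate(classes):
            r -= size - alpha
            if r < 0:
                classes[i] += 1
                break
        else:
            classes[-1] += 1  # guard against floating-point round-off
    return classes


if __name__ == "__main__":
    for genome_size in (1_000, 10_000):
        classes = crp_domain_classes(genome_size, theta=5.0, alpha=0.5)
        print(genome_size, "domains ->", len(classes), "classes")
```

Comparing the number of classes at the two genome sizes gives a quick, informal check of sublinear growth; the parameter values here are arbitrary.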
Evolution signatures in genome network properties
Genomes may be organized as networks in which protein-protein associations play the role of network links. The resulting networks are far from random, and their topological properties are a consequence of the underlying mechanisms of genome evolution. Using data on protein-protein association networks from the STRING database, we present empirical evidence that the degree distribution is not scale-free, showing an increased probability of high-degree nodes. We also show that the degree distribution approaches a scale-invariant state as the number of genes in the network increases, although real genomes still show finite-size effects. Based on the evidence from these data analyses, we propose a simulation model of genome evolution in which genes are either acquired de novo through a preferential attachment rule or duplicated, with a duplication probability that grows linearly with gene degree and decreases with the gene's clustering coefficient. The model describes the topological distributions better than previous genome evolution models. It correctly predicts that producing protein-protein association networks with numbers of links and nodes in the observed range requires 90% gene duplication and 10% de novo gene acquisition. If this scenario is correct, it implies a universal mechanism for genome evolution.
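A growth rule of this kind can be sketched with plain adjacency sets. The abstract does not give the exact duplication weight, so `degree * (1 - clustering) + 1e-3` is a hypothetical choice that grows with degree and shrinks with clustering. The link-retention probability `keep` is also an added assumption, not stated in the abstract, because copying every parental link makes the network very dense. Only the 90%/10% split between duplication and de novo acquisition comes from the text.

```python
import random


def clustering(adj, v):
    """Local clustering coefficient of node v (0.0 when degree < 2)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # each link among the neighbours is counted once from each end
    links = sum(len(adj[u] & nbrs) for u in nbrs) / 2
    return 2.0 * links / (k * (k - 1))


def grow_network(n_nodes, p_dup=0.9, m=2, keep=0.3, seed=0):
    """Toy genome-network growth: each step adds one gene by duplication or de novo.

    Hypothetical choices (not from the paper):
      duplication weight = degree * (1 - clustering) + 1e-3
      keep: probability that the copy retains each parental link
      m:    number of links attached to a de novo gene
    """
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # seed network: a triangle
    while len(adj) < n_nodes:
        new = len(adj)
        nodes = list(adj)
        if rng.random() < p_dup:
            weights = [len(adj[v]) * (1.0 - clustering(adj, v)) + 1e-3 for v in nodes]
            parent = rng.choices(nodes, weights=weights, k=1)[0]
            links = {u for u in adj[parent] if rng.random() < keep}
        else:
            # de novo gene: preferential attachment to up to m distinct nodes
            degrees = [len(adj[v]) for v in nodes]
            target = min(m, sum(1 for d in degrees if d > 0))
            links = set()
            while len(links) < target:
                links.add(rng.choices(nodes, weights=degrees, k=1)[0])
        adj[new] = links
        for u in links:
            adj[u].add(new)
    return adj


if __name__ == "__main__":
    net = grow_network(300)
    n_links = sum(len(nbrs) for nbrs in net.values()) // 2
    print(len(net), "genes,", n_links, "links")
```

The triangle seed guarantees at least three nodes with positive degree, so the preferential-attachment draw always has valid targets.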
Exact reconciliation of undated trees
Reconciliation methods aim to recover macroevolutionary events and to
localize them in the species history by observing discrepancies between gene
family trees and species trees. In this article we introduce an Integer Linear
Programming (ILP) approach for the NP-hard problem of computing a most
parsimonious time-consistent reconciliation of a gene tree with a species tree
when dating information on speciations is not available. The ILP formulation,
which builds upon the DTL model, returns a most parsimonious reconciliation
ranging over all possible datings of the nodes of the species tree. By studying
its performance on plausible simulated data, we conclude that the ILP approach
is significantly faster than a brute force search through the space of all
possible species tree datings. Although the ILP formulation is currently
limited to small trees, we believe that it is an important proof of concept
that opens the door to the possibility of developing an exact, parsimony-based
approach to dating species trees. The software (ILPEACE) is freely available
for download.
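The ILP is compared with a brute-force search over all possible datings of the species tree. The sketch below counts the size of that search space under one simplifying assumption: a dating is a total order of the speciation (internal) nodes in which every node comes before its descendants. The nested-tuple tree encoding and the `balanced` helper are hypothetical and unrelated to ILPEACE's input format.

```python
from math import comb


def count_datings(tree):
    """Count total orders of a species tree's internal nodes that respect ancestry.

    `tree` is a nested tuple: a leaf is a string, an internal node a tuple of subtrees.
    Returns (number_of_orderings, number_of_internal_nodes).
    """
    if isinstance(tree, str):
        return 1, 0
    ways, placed = 1, 0
    for child in tree:
        child_ways, child_nodes = count_datings(child)
        # interleave this child's internal nodes with those of earlier siblings
        ways *= child_ways * comb(placed + child_nodes, child_nodes)
        placed += child_nodes
    return ways, placed + 1  # this node precedes all of its descendants


def balanced(depth, name="s"):
    """Hypothetical helper: a perfectly balanced binary tree with 2**depth leaves."""
    if depth == 0:
        return name
    return (balanced(depth - 1, name + "0"), balanced(depth - 1, name + "1"))


if __name__ == "__main__":
    for depth in range(2, 5):
        print(2 ** depth, "leaves:", count_datings(balanced(depth))[0], "datings")
    # 4 leaves: 2, 8 leaves: 80, 16 leaves: 21964800
```

A caterpillar tree admits a single dating, while a balanced tree with only 16 leaves already admits about 22 million, which suggests why enumerating datings one by one scales poorly.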
A model of large-scale proteome evolution
The next step in understanding genome organization, after the
determination of complete sequences, involves proteomics. The proteome includes
the whole set of protein-protein interactions, and two recent independent
studies have shown that its topology displays a number of surprising features
shared by other complex networks, both natural and artificial. In order to
understand the origins of this topology and its evolutionary implications, we
present a simple model of proteome evolution that is able to reproduce many of
the observed statistical regularities reported from the analysis of the yeast
proteome. Our results suggest that the observed patterns can be explained by a
process of gene duplication and diversification that would evolve proteome
networks under a selection pressure favoring robustness against the failure of
their individual components.
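Growth by gene duplication followed by diversification is often expressed as a duplication-divergence rule. The sketch below is one such rule: a randomly chosen gene is copied, the copy loses each inherited link with probability `delta`, and it gains new links to existing genes with probability `alpha / n` each. The names `delta` and `alpha` and these exact rules are assumptions; the published model may differ, and this sketch does not model the selection pressure for robustness.

```python
import random
from collections import Counter


def duplication_divergence(n_nodes, delta, alpha, seed=0):
    """Toy duplication-divergence growth of an undirected interaction network.

    Hypothetical parameters:
      delta: probability that each inherited link of the duplicate is lost
      alpha: expected number of brand-new links given to the duplicate,
             spread uniformly over the existing nodes
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # start from a single interacting pair
    while len(adj) < n_nodes:
        n = len(adj)
        parent = rng.randrange(n)  # duplicate a gene chosen uniformly at random
        # divergence: keep each of the parent's links with probability 1 - delta
        links = {u for u in adj[parent] if rng.random() >= delta}
        # diversification: each existing node links to the copy with probability alpha / n
        links |= {u for u in range(n) if rng.random() < alpha / n}
        adj[n] = links
        for u in links:
            adj[u].add(n)
    return adj


if __name__ == "__main__":
    net = duplication_divergence(2000, delta=0.6, alpha=0.1)
    degree_counts = Counter(len(nbrs) for nbrs in net.values())
    for k in sorted(degree_counts):
        print(k, degree_counts[k])
```

Some genes can end up with no links at all; analyses of interaction data often restrict attention to the largest connected component for this reason.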
Graph Theory and Networks in Biology
In this paper, we present a survey of the use of graph-theoretical techniques
in Biology. In particular, we discuss recent work on identifying and modelling
the structure of bio-molecular networks, as well as the application of
centrality measures to interaction networks and research on the hierarchical
structure of such networks and network motifs. Work on the link between
structural network properties and dynamics is also described, with emphasis on
synchronization and disease propagation. Comment: 52 pages, 5 figures, Survey Paper
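Centrality measures on interaction networks can be illustrated with two common definitions, degree centrality and closeness centrality, computed by breadth-first search on an unweighted, undirected graph. This is a minimal sketch; for disconnected graphs it uses only the nodes reachable from each source, which is one convention among several.

```python
from collections import deque


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(adj[v]) / (n - 1) for v in adj}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path distances to them), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


if __name__ == "__main__":
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}  # one hub and three leaves
    print(degree_centrality(star))
    print(closeness_centrality(star))  # hub 1.0, each leaf 3/5 = 0.6
```

In the star, the hub reaches every node in one step, so it scores highest on both measures.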
The evolution of genetic architectures underlying quantitative traits
In the classic view introduced by R. A. Fisher, a quantitative trait is
encoded by many loci with small, additive effects. Recent advances in QTL
mapping have begun to elucidate the genetic architectures underlying vast
numbers of phenotypes across diverse taxa, producing observations that
sometimes contrast with Fisher's blueprint. Despite these considerable
empirical efforts to map the genetic determinants of traits, it remains poorly
understood how the genetic architecture of a trait should evolve, or how it
depends on the selection pressures on the trait. Here we develop a simple,
population-genetic model for the evolution of genetic architectures. Our model
predicts that traits under moderate selection should be encoded by many loci
with highly variable effects, whereas traits under either weak or strong
selection should be encoded by relatively few loci. We compare these
theoretical predictions to qualitative trends in the genetics of human traits,
and to systematic data on the genetics of gene expression levels in yeast. Our
analysis provides an evolutionary explanation for broad empirical patterns in
the genetic basis of traits, and it introduces a single framework that unifies
the diversity of observed genetic architectures, ranging from Mendelian to
Fisherian. Comment: Minor changes in the text; added supplementary material
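Fisher's picture of many loci with small, additive effects can be made concrete with the textbook formula for additive genetic variance, V_A = sum_i 2 p_i (1 - p_i) a_i^2. This is standard quantitative genetics under stated assumptions, not the population-genetic model developed in the paper. The example shows that a single large-effect locus and many small-effect loci can produce the same additive variance.

```python
def additive_variance(freqs, effects):
    """Additive genetic variance V_A = sum_i 2 * p_i * (1 - p_i) * a_i**2.

    Assumes diploid, biallelic loci with additive allelic effects a_i,
    Hardy-Weinberg proportions and linkage equilibrium.
    """
    return sum(2 * p * (1 - p) * a * a for p, a in zip(freqs, effects))


if __name__ == "__main__":
    mendelian = additive_variance([0.5], [1.0])           # one locus, large effect
    fisherian = additive_variance([0.5] * 100, [0.1] * 100)  # 100 loci, small effects
    print(round(mendelian, 6), round(fisherian, 6))  # 0.5 0.5
```

The two architectures are indistinguishable by V_A alone; they differ in how many loci a mapping study would need to detect, which is the kind of contrast the abstract's model addresses.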