A Unifying Model of Genome Evolution Under Parsimony
We present a data structure called a history graph that offers a practical
basis for the analysis of genome evolution. It conceptually simplifies the
study of parsimonious evolutionary histories by representing both substitutions
and double cut and join (DCJ) rearrangements in the presence of duplications.
The problem of constructing parsimonious history graphs thus subsumes related
maximum parsimony problems in the fields of phylogenetic reconstruction and
genome rearrangement. We show that tractable functions can be used to define
upper and lower bounds on the minimum number of substitutions and DCJ
rearrangements needed to explain any history graph. These bounds become tight
for a special type of unambiguous history graph called an ancestral variation
graph (AVG), which constrains in its combinatorial structure the number of
operations required. We finally demonstrate that for a given history graph,
a finite set of AVGs describes all of its parsimonious interpretations, and this
set can be explored with a few sampling moves.
Comment: 52 pages, 24 figure
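To make the DCJ operation count concrete, here is a minimal Python sketch of the standard DCJ distance for a much simpler setting than the abstract's history graphs: two genomes that each consist of a single circular chromosome with the same genes and no duplications. In that restricted case the distance is the number of genes minus the number of cycles in the adjacency graph. The function names `adjacencies` and `dcj_distance_circular` are illustrative and do not come from the paper.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour in a circular signed genome.

    A gene g has a tail (g, 't') and a head (g, 'h'). A positive gene is read
    tail -> head, a negative gene head -> tail.
    """
    def ends(g):
        # (left, right) extremities in reading order
        if g > 0:
            return (abs(g), 't'), (abs(g), 'h')
        return (abs(g), 'h'), (abs(g), 't')

    partner = {}
    n = len(genome)
    for k in range(n):
        right = ends(genome[k])[1]
        left = ends(genome[(k + 1) % n])[0]
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance_circular(a, b):
    """DCJ distance between two circular genomes with identical gene content.

    Assumes no duplicated genes; returns (number of genes) - (number of cycles).
    """
    pa, pb = adjacencies(a), adjacencies(b)
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            # Follow one adjacency of genome a, then one adjacency of genome b.
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(a) - cycles


same = dcj_distance_circular([1, 2, 3], [1, 2, 3])
one_inversion = dcj_distance_circular([1, 2, 3], [1, -2, 3])
```

Identical genomes give one two-edge cycle per adjacency, so the distance is zero; a single inversion merges two of those cycles into one. Handling duplications, as the history-graph framework does, is outside the scope of this sketch.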
Average-case analysis of perfect sorting by reversals (Journal Version)
Perfect sorting by reversals, a problem originating in computational
genomics, is the process of sorting a signed permutation to either the identity
or to the reversed identity permutation, by a sequence of reversals that do not
break any common interval. Bérard et al. (2007) make use of strong interval
trees to describe an algorithm for sorting signed permutations by reversals.
Combinatorial properties of this family of trees are essential to the algorithm
analysis. Here, we use the expected value of certain tree parameters to prove
that the average run-time of the algorithm is at worst polynomial, and
additionally, for sufficiently long permutations, the sorting algorithm runs in
polynomial time with probability one. Furthermore, our analysis of the subclass
of commuting scenarios yields precise results on the average length of a
reversal, and the average number of reversals.
Comment: A preliminary version of this work appeared in the proceedings of
Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete
Mathematics, Algorithms and Applications, vol. 3(3), 201
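As an illustration of the objects in this abstract, the following Python sketch applies a reversal to a signed permutation, tests whether a segment is a common interval with the identity, and checks the two sorting targets. It is not the algorithm of Bérard et al.; the function names are hypothetical, and the reversed identity is assumed here to be (-n, ..., -1).

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (inclusive) reversed and negated."""
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


def is_common_interval(perm, i, j):
    """True if the absolute values at positions i..j form a run of consecutive integers."""
    values = [abs(x) for x in perm[i:j + 1]]
    return max(values) - min(values) == j - i


def is_sorted_target(perm):
    """True if perm is the identity or (by assumption) the reversed identity -n..-1."""
    n = len(perm)
    identity = list(range(1, n + 1))
    reversed_identity = [-x for x in range(n, 0, -1)]
    return perm == identity or perm == reversed_identity


start = [-2, -1, 3]
after = reverse_segment(start, 0, 1)   # reverse the segment holding -2, -1
```

Here the reversed segment is itself a common interval, so the reversal does not break any common interval; a reversal that only partially overlaps such a segment would.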
Why genes evolve faster on secondary chromosomes in bacteria
In bacterial genomes composed of more than one chromosome, one replicon is typically larger, harbors more essential genes than the others, and is considered primary. The greater variability of secondary chromosomes among related taxa has led to the theory that they serve as an accessory genome for specific niches or conditions. By this rationale, purifying selection should be weaker on genes on secondary chromosomes because of their reduced necessity or usage. To test this hypothesis we selected bacterial genomes composed of multiple chromosomes from two genera, Burkholderia and Vibrio, and quantified the evolutionary rates (dN and dS) of all orthologs within each genus. Both evolutionary rate parameters were faster among orthologs found on secondary chromosomes than those on the primary chromosome. Further, in every bacterial genome with multiple chromosomes that we studied, genes on secondary chromosomes exhibited significantly weaker codon usage bias than those on primary chromosomes. Faster evolution and reduced codon bias could in turn result from global effects of chromosome position, as genes on secondary chromosomes experience reduced dosage and expression due to their delayed replication, or selection on specific gene attributes. These alternatives were evaluated using orthologs common to genomes with multiple chromosomes and genomes with single chromosomes. Analysis of these ortholog sets suggested that inherently fast-evolving genes tend to be sorted to secondary chromosomes when they arise; however, prolonged evolution on a secondary chromosome further accelerated substitution rates. In summary, secondary chromosomes in bacteria are evolutionary test beds where genes are weakly preserved and evolve more rapidly, likely because they are used less frequently.
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
Bacterial microevolution and the Pangenome
The comparison of multiple genome sequences sampled from a bacterial population reveals considerable diversity in both the core and the accessory parts of the pangenome. This diversity can be analysed in terms of microevolutionary events that took place since the genomes shared a common ancestor, especially deletion, duplication, and recombination. We review the basic modelling ingredients used implicitly or explicitly when performing such a pangenome analysis. In particular, we describe a basic neutral phylogenetic framework of bacterial pangenome microevolution, which is not incompatible with evaluating the role of natural selection. We survey the different ways in which pangenome data is summarised in order to be included in microevolutionary models, as well as the main methodological approaches that have been proposed to reconstruct pangenome microevolutionary history.
Quantifying evolutionary constraints on B cell affinity maturation
The antibody repertoire of each individual is continuously updated by the
evolutionary process of B cell receptor mutation and selection. It has recently
become possible to gain detailed information concerning this process through
high-throughput sequencing. Here, we develop modern statistical molecular
evolution methods for the analysis of B cell sequence data, and then apply them
to a very deep short-read data set of B cell receptors. We find that the
substitution process is conserved across individuals but varies significantly
across gene segments. We investigate selection on B cell receptors using a
novel method that side-steps the difficulties encountered by previous work in
differentiating between selection and motif-driven mutation; this is done
through stochastic mapping and empirical Bayes estimators that compare the
evolution of in-frame and out-of-frame rearrangements. We use this new method
to derive a per-residue map of selection, which provides a more nuanced view of
the constraints on framework and variable regions.
Comment: Previously entitled "Substitution and site-specific selection driving
B cell affinity maturation is consistent across individuals"
A complex adaptive systems approach to the kinetic folding of RNA
The kinetic folding of RNA sequences into secondary structures is modeled as
a complex adaptive system, the components of which are possible RNA structural
rearrangements (SRs) and their associated bases and base pairs. RNA bases and
base pairs engage in local stacking interactions that determine the
probabilities (or fitnesses) of possible SRs. Meanwhile, selection operates at
the level of SRs; an autonomous stochastic process periodically (i.e., from one
time step to another) selects a subset of possible SRs for realization based on
the fitnesses of the SRs. Using examples based on selected natural and
synthetic RNAs, the model is shown to qualitatively reproduce characteristic
(nonlinear) RNA folding dynamics such as the attainment by RNAs of alternative
stable states. Possible applications of the model to the analysis of properties
of fitness landscapes, and of the RNA sequence to structure mapping are
discussed.
Comment: 23 pages, 4 figures, 2 tables, to be published in BioSystems (Note:
updated 2 references)
Conditions for the Evolution of Gene Clusters in Bacterial Genomes
Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters.
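The Moran process with selection for clustering and rearrangement by inversion can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the authors' model: genomes are linear lists of unsigned gene labels, a genome counts as clustered only when its pathway genes are strictly contiguous, and the selection coefficient `s`, the inversion probability `mu`, and all function names are hypothetical.

```python
import random


def invert(genome, i, j):
    """Reverse the gene order between positions i and j (inclusive)."""
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def is_clustered(genome, pathway):
    """True if the pathway genes occupy consecutive positions in a linear genome."""
    positions = [k for k, g in enumerate(genome) if g in pathway]
    if not positions:
        return False
    return positions[-1] - positions[0] + 1 == len(positions)


def moran_step(population, pathway, s, mu, rng):
    """One birth-death event: a fitness-weighted parent replaces a random individual."""
    weights = [1.0 + s if is_clustered(g, pathway) else 1.0 for g in population]
    parent = rng.choices(population, weights=weights, k=1)[0]
    child = list(parent)
    if rng.random() < mu:
        # Assumed mutation model: one inversion between two random positions.
        i, j = sorted(rng.sample(range(len(child)), 2))
        child = invert(child, i, j)
    population[rng.randrange(len(population))] = child


rng = random.Random(0)
genes = list(range(10))
pathway = {0, 1, 2}
population = [rng.sample(genes, len(genes)) for _ in range(50)]
for _ in range(2000):
    moran_step(population, pathway, s=0.1, mu=0.05, rng=rng)
fraction_clustered = sum(is_clustered(g, pathway) for g in population) / len(population)
```

Varying the pathway size, the population size, and `s` in this sketch is one way to explore which conditions let clustered arrangements spread; the outcome depends on the random seed and on the assumptions above.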