Distance-Based Genome Rearrangement Phylogeny
Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the tree of life; the combination of gene order data with sequence data also has the potential to provide more robust phylogenetic reconstructions, since each can elucidate evolution at different time scales. Distance corrections greatly improve the accuracy of phylogenies reconstructed from DNA sequences, enabling distance-based methods to approach the accuracy of the more elaborate methods based on parsimony or likelihood at a fraction of the computational cost. This paper focuses on developing distance correction methods for phylogeny reconstruction from whole genomes. The main question we investigate is how to estimate evolutionary histories from whole genomes with equal gene content, and we present a technique, the empirically derived estimator (EDE), that we have developed for this purpose. We study the use of EDE on whole genomes with identical gene content, and we explore the accuracy of phylogenies inferred using EDE with the neighbor joining and minimum evolution methods under a wide range of model conditions. Our study shows that tree reconstruction under these two methods is much more accurate when based on EDE distances than when based on other distances previously suggested for whole genomes.
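Neighbor joining, one of the two tree-building methods the abstract evaluates, only needs a matrix of pairwise distances, so any corrected distance (EDE or otherwise) can be plugged in. The following is a minimal textbook neighbor-joining sketch in Python; it does not reproduce EDE itself, and the function name `neighbor_joining` and the toy matrix are illustrative assumptions, not part of the paper.

```python
def neighbor_joining(labels, dist):
    """Return an unrooted Newick string built by neighbor joining.

    labels: list of taxon names (at least three).
    dist:   symmetric matrix (list of lists) of pairwise distances,
            assumed to be already corrected (e.g. by an estimator such as EDE).
    """
    if len(labels) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(labels)              # Newick text of each current subtree
    d = [list(row) for row in dist]   # working copy, shrinks each round

    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]   # net divergence of each node
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best

        # Branch lengths from the new internal node u to i and j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"

        # Distances from u to every remaining node k, then rebuild the matrix.
        keep = [k for k in range(n) if k not in (i, j)]
        du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]

    # Three subtrees left: attach them to one central node.
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({nodes[0]}:{la:g},{nodes[1]}:{lb:g},{nodes[2]}:{lc:g});"


# Additive toy matrix for the unrooted tree ((A:1,B:2):1,C:3,D:4).
labels = ["A", "B", "C", "D"]
dist = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
print(neighbor_joining(labels, dist))
```

On an additive matrix like this one, neighbor joining recovers the generating tree exactly; on real corrected distances it can return negative branch lengths, which this sketch leaves as-is.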
Maximum likelihood estimates of pairwise rearrangement distances
Accurate estimation of evolutionary distances between taxa is important for
many phylogenetic reconstruction methods. In the case of bacteria, distances
can be estimated using a range of different evolutionary models, from single
nucleotide polymorphisms to large-scale genome rearrangements. For sequence data,
evolution models (such as the Jukes-Cantor model and its associated
metric) have been used to correct pairwise distances. Similar correction
methods for genome rearrangement processes are required to improve inference.
Current attempts at correction fall into three categories: empirical computational
studies, Bayesian/MCMC approaches, and combinatorial approaches. Here we
introduce a maximum likelihood estimator for the inversion distance between a
pair of genomes, using the group-theoretic approach to modelling inversions
introduced recently. This MLE functions as a corrected distance: in particular,
we show that because of the way sequences of inversions interact with each
other, it is quite possible for the minimal distance and the MLE distance to
order the distances of two genomes from a third differently. This has obvious
implications for the use of minimal distance in phylogeny reconstruction. The
work also tackles the above problem while allowing free rotation of the genome.
Generally, a frame of reference is fixed and all computations are made accordingly.
This work incorporates the action of the dihedral group so that distance
estimates are free from any a priori frame of reference. Comment: 21 pages, 7 figures. To appear in the Journal of Theoretical Biology
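The abstract takes the Jukes-Cantor correction for sequence data as the model that rearrangement corrections should emulate. As a point of comparison only (this is not the paper's inversion MLE), a minimal Python sketch of that sequence-level correction is below; the helper names `p_distance` and `jukes_cantor` are illustrative, and the sequences are assumed to be aligned and gap-free.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("the correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGAAC")  # 2 of 10 sites differ
print(p, jukes_cantor(p))
```

The corrected value always exceeds the raw proportion for p > 0, because it accounts for multiple substitutions at the same site; the paper's point is that an analogous correction is needed when the observed quantity is a minimal rearrangement distance.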
On pairwise distances and median score of three genomes under DCJ
In comparative genomics, the rearrangement distance between two genomes
(equal to the minimal number of genome rearrangements required to transform one
into the other) is often used for measuring their evolutionary
remoteness. The generalization of this measure to three genomes is known as the
median score (and the resulting genome is called a median genome). In contrast to
the rearrangement distance between two genomes, which can be computed in linear
time, computing the median score for three genomes is NP-hard. This inspires a
quest for simpler and faster approximations for the median score, the most
natural of which appears to be the halved sum of pairwise distances, which in
fact represents a lower bound for the median score.
In this work, we study the relationship and interplay of pairwise distances
between three genomes and their median score under the model of
Double-Cut-and-Join (DCJ) rearrangements. Most remarkably, we show that while a
rearrangement may change the sum of pairwise distances by at most 2 (and thus
change the lower bound by at most 1), even the most "powerful" rearrangements
in this respect that increase the lower bound by 1 (by moving one genome
farther away from each of the other two genomes), which we call strong, do not
necessarily affect the median score. This observation implies that the two
measures are not as well-correlated as one's intuition may suggest.
We further prove that the median score attains the lower bound exactly on the
triples of genomes that can be obtained from a single genome with strong
rearrangements. While the sum of pairwise distances multiplied by 2/3
is an upper bound for the median score, its tightness remains unclear.
Nonetheless, we show that the difference between the median score and its lower
bound is not bounded by a constant. Comment: Proceedings of the 10th Annual RECOMB Satellite Workshop on
Comparative Genomics (RECOMB-CG), 2012 (to appear)
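The abstract brackets the NP-hard median score between half the sum of the three pairwise DCJ distances and two thirds of that sum. The Python sketch below computes the pairwise distances and both bounds for a toy triple. It assumes, as a simplification not taken from the paper, that every chromosome is circular and every gene occurs exactly once; in that case the DCJ distance is N - C, with N the number of genes and C the number of cycles in the adjacency graph (genomes with linear chromosomes need an extra odd-path term that is omitted here). The upper bound holds because using any one input genome as the median costs the sum of its two distances, and the cheapest of those three choices is at most their average.

```python
from itertools import combinations


def _ends(gene):
    """(left, right) extremities of a signed gene as read along the chromosome."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def _adjacency_partner(genome):
    """Map each gene extremity to the extremity adjacent to it.

    Assumes every chromosome is circular and every gene occurs once.
    """
    partner = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            right = _ends(gene)[1]
            left = _ends(chrom[(i + 1) % len(chrom)])[0]
            partner[right], partner[left] = left, right
    return partner


def dcj_distance(a, b):
    """DCJ distance N - C for circular genomes with identical gene content."""
    pa, pb = _adjacency_partner(a), _adjacency_partner(b)
    if set(pa) != set(pb):
        raise ValueError("genomes must have identical gene content")
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        cur = start
        while cur not in seen:       # walk the cycle, alternating A- and B-edges
            seen.add(cur)
            nxt = pa[cur]
            seen.add(nxt)
            cur = pb[nxt]
    return len(pa) // 2 - cycles     # N genes = half the number of extremities


def median_score_bounds(g1, g2, g3):
    """(lower, upper) = (half, two thirds) of the sum of pairwise distances."""
    total = sum(dcj_distance(x, y) for x, y in combinations((g1, g2, g3), 2))
    return total / 2, 2 * total / 3


# Toy single-chromosome circular genomes (hypothetical).
g1 = [[1, 2, 3]]
g2 = [[1, -2, 3]]
g3 = [[1, 2, -3]]
print(dcj_distance(g1, g2), dcj_distance(g1, g3), dcj_distance(g2, g3))
print(median_score_bounds(g1, g2, g3))
```

Each pairwise distance is computed in a single pass over the extremities, consistent with the linear-time claim; only the median score itself requires a hard search, which this sketch does not attempt.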
Hidden breakpoints in genome alignments
During the course of evolution, an organism's genome can undergo changes that
affect the large-scale structure of the genome. These changes include gene
gain, loss, duplication, chromosome fusion, fission, and rearrangement. When
gene gain and loss occur in addition to other types of rearrangement,
rearrangement breakpoints can exist that are detectable only by comparing
three or more genomes. An arbitrarily large number of these "hidden"
breakpoints can exist among genomes that exhibit no rearrangements in pairwise
comparisons.
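Pairwise comparisons of genomes with different gene content are often made over the shared genes only (an assumption about the pairwise procedure, not a detail from the paper). The Python sketch below counts pairwise breakpoints that way for single linear chromosomes of signed genes, ignoring chromosome ends. In the hypothetical toy triple every pairwise comparison reports zero breakpoints; whether a rearrangement must nonetheless have occurred is the kind of question the paper addresses with a three-genome median, which this sketch does not compute.

```python
def _canonical(x, y):
    """Orientation-independent key for the adjacency x-y of signed genes."""
    return min((x, y), (-y, -x))


def shared_gene_breakpoints(a, b):
    """Breakpoints of linear chromosome a relative to b, over shared genes only.

    Genes present in only one chromosome are dropped before comparing,
    and chromosome ends (telomeres) are ignored.
    """
    shared = {abs(g) for g in a} & {abs(g) for g in b}

    def adjacencies(chrom):
        genes = [g for g in chrom if abs(g) in shared]
        return {_canonical(x, y) for x, y in zip(genes, genes[1:])}

    return len(adjacencies(a) - adjacencies(b))


# Hypothetical toy genomes: each pair shares exactly one gene.
a, b, c = [1, 2], [2, 3], [3, 1]
print([shared_gene_breakpoints(x, y) for x, y in ((a, b), (b, c), (c, a))])

# For contrast, an inversion of gene 2 is visible pairwise.
print(shared_gene_breakpoints([1, 2, 3], [1, -2, 3]))
```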
We present an extension of the multichromosomal breakpoint median problem to
genomes that have undergone gene gain and loss. We then demonstrate that the
median distance among three genomes can be used to calculate a lower bound on
the number of hidden breakpoints present. We provide an implementation of this
calculation including the median distance, along with some practical
improvements to the time complexity of the underlying algorithm.
We apply our approach to measure the abundance of hidden breakpoints in
simulated data sets under a wide range of evolutionary scenarios. We
demonstrate that in simulations the hidden breakpoint counts depend strongly on
relative rates of inversion and gene gain/loss. Finally, we apply current
multiple genome aligners to the simulated genomes, and show that all aligners
introduce a high degree of error in hidden breakpoint counts, and that this
error grows with evolutionary distance in the simulation. Our results suggest
that hidden breakpoint error may be pervasive in genome alignments. Comment: 13 pages, 4 figures
Phylogeny of Prokaryotes and Chloroplasts Revealed by a Simple Composition Approach on All Protein Sequences from Complete Genomes Without Sequence Alignment
The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of phylogenetic analysis using compositional statistics for all protein sequences from complete genomes. This new method is conceptually simpler than, and computationally as fast as, the one proposed by Qi et al. (2004b) and Chu et al. (2004). The same data sets used in Qi et al. (2004b) and Chu et al. (2004) are analyzed using the new method. Our distance-based phylogenetic tree of the 109 prokaryotes and eukaryotes agrees with the biologists' tree of life based on 16S rRNA comparison in the large majority of basic branchings and most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution.
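To make "compositional statistics for all protein sequences" concrete, the Python sketch below turns each proteome into a vector of relative k-peptide frequencies and compares two vectors with a cosine-based distance. This is a generic alignment-free composition distance chosen for illustration; it does not reproduce the paper's statistic or the Qi et al. / Chu et al. method, and the value k = 3, the function names, and the toy proteomes are all assumptions.

```python
import math
from collections import Counter


def kmer_profile(proteins, k=3):
    """Relative frequencies of all length-k peptides across a proteome."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequence is at least k residues long")
    return {word: n / total for word, n in counts.items()}


def composition_distance(p, q):
    """(1 - cosine similarity) / 2: 0 for identical profiles, 0.5 for disjoint ones."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return (1 - dot / (norm_p * norm_q)) / 2


# Hypothetical toy proteomes.
genome_x = ["MKVLAAGIVG", "MSTNPKPQRK"]
genome_y = ["MKVLAAGLVG", "MSTNPKPQRR"]
genome_z = ["WWHHYYCCWW"]
px, py, pz = (kmer_profile(g) for g in (genome_x, genome_y, genome_z))
print(composition_distance(px, py), composition_distance(px, pz))
```

No alignment is needed at any step, and the resulting pairwise distance matrix could be passed to any distance-based tree method.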
Assessing the robustness of parsimonious predictions for gene neighborhoods from reconciled phylogenies
The availability of a large number of assembled genomes opens the way to
study the evolution of syntenic character within a phylogenetic context. The
DeCo algorithm, recently introduced by Bérard et al., allows the computation
of parsimonious evolutionary scenarios for gene adjacencies, from pairs of
reconciled gene trees. Following the approach pioneered by Sturmfels and
Pachter, we describe how to modify the DeCo dynamic programming algorithm to
identify classes of cost schemes that generate similar parsimonious
evolutionary scenarios for gene adjacencies, and to assess how robust the
presence or absence of specific ancestral gene adjacencies is to changes in
the costs of evolutionary events. We apply our method to six thousand
mammalian gene families, and show that computing the robustness to changes to
cost schemes provides new and interesting insights into the evolution of gene
adjacencies and the DeCo model. Comment: Accepted, to appear in ISBRA - 11th International Symposium on
Bioinformatics Research and Applications - 2015, Jun 2015, Norfolk, Virginia,
United States
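The robustness question can be illustrated with a brute-force toy: given a few candidate scenarios, each summarized by its event counts and the ancestral adjacencies it infers, find the cheapest scenario for each cost ratio and record how often a given adjacency survives. The Python sketch below does this over a small grid of ratios. Everything in it is hypothetical: the two event types (adjacency gains and breakages), the scenarios, and the grid. The paper instead modifies the DeCo dynamic program following Sturmfels and Pachter, which avoids enumerating scenarios and sampling cost schemes.

```python
def best_scenario(scenarios, gain_cost, break_cost):
    """Name of the cheapest scenario under a linear cost scheme."""
    def cost(name):
        gains, breaks = scenarios[name]["counts"]
        return gain_cost * gains + break_cost * breaks
    return min(scenarios, key=cost)


def robustness(scenarios, adjacency, ratios):
    """Fraction of break/gain cost ratios whose optimum contains `adjacency`."""
    hits = sum(
        adjacency in scenarios[best_scenario(scenarios, 1.0, r)]["adjacencies"]
        for r in ratios
    )
    return hits / len(ratios)


# Hypothetical candidates: (adjacency gains, adjacency breakages) and the
# ancestral adjacencies each scenario infers.
scenarios = {
    "S1": {"counts": (3, 0), "adjacencies": {"a-b", "b-c"}},
    "S2": {"counts": (1, 1), "adjacencies": {"a-b"}},
    "S3": {"counts": (0, 3), "adjacencies": {"a-b", "c-d"}},
}
ratios = [0.25, 1.0, 4.0]   # break cost relative to a gain cost of 1
print({r: best_scenario(scenarios, 1.0, r) for r in ratios})
print(robustness(scenarios, "a-b", ratios), robustness(scenarios, "b-c", ratios))
```

Here "a-b" is inferred under every sampled cost scheme, whereas "b-c" appears only when breakages are expensive, which is the kind of distinction the robustness analysis is meant to expose.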
Why genes evolve faster on secondary chromosomes in bacteria
In bacterial genomes composed of more than one chromosome, one replicon is typically larger, harbors more essential genes than the others, and is considered primary. The greater variability of secondary chromosomes among related taxa has led to the theory that they serve as an accessory genome for specific niches or conditions. By this rationale, purifying selection should be weaker on genes on secondary chromosomes because of their reduced necessity or usage. To test this hypothesis we selected bacterial genomes composed of multiple chromosomes from two genera, Burkholderia and Vibrio, and quantified the evolutionary rates (dN and dS) of all orthologs within each genus. Both evolutionary rate parameters were faster among orthologs found on secondary chromosomes than among those on the primary chromosome. Further, in every bacterial genome with multiple chromosomes that we studied, genes on secondary chromosomes exhibited significantly weaker codon usage bias than those on primary chromosomes. Faster evolution and reduced codon bias could in turn result either from global effects of chromosome position, since genes on secondary chromosomes experience reduced dosage and expression due to their delayed replication, or from selection on specific gene attributes. These alternatives were evaluated using orthologs common to genomes with multiple chromosomes and genomes with single chromosomes. Analysis of these ortholog sets suggested that inherently fast-evolving genes tend to be sorted to secondary chromosomes when they arise; however, prolonged evolution on a secondary chromosome further accelerated substitution rates. In summary, secondary chromosomes in bacteria are evolutionary test beds where genes are weakly preserved and evolve more rapidly, likely because they are used less frequently.
- …