345 research outputs found

    Parking functions, labeled trees and DCJ sorting scenarios

    In genome rearrangement theory, one of the elusive questions raised in recent years is the enumeration of rearrangement scenarios between two genomes. This problem is related to the uniform generation of rearrangement scenarios, and the derivation of tests of statistical significance of the properties of these scenarios. Here we give an exact formula for the number of double-cut-and-join (DCJ) rearrangement scenarios of co-tailed genomes. We also construct effective bijections between the set of scenarios that sort a cycle and well-studied combinatorial objects such as parking functions and labeled trees. Comment: 12 pages, 3 figures
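
    The abstract above names parking functions and labeled trees but does not state its formula, so the following is only a minimal sketch of the classical counts for those two families, not the paper's result. It checks by brute force that the number of parking functions of length n equals (n+1)^(n-1), which is also Cayley's count of labeled trees on n+1 vertices. The function names are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def is_parking_function(seq):
    """Return True if seq (1-based preferences) is a parking function.

    Classical criterion: after sorting, the i-th smallest entry
    (1-based) must be at most i.
    """
    return all(b <= i + 1 for i, b in enumerate(sorted(seq)))


def count_parking_functions(n):
    """Brute-force count over all n**n preference sequences (small n only)."""
    return sum(
        is_parking_function(s) for s in product(range(1, n + 1), repeat=n)
    )


def cayley_labeled_trees(m):
    """Cayley's formula: number of labeled trees on m >= 1 vertices."""
    return m ** (m - 2) if m >= 2 else 1


# Both families are counted by (n + 1) ** (n - 1).
for n in range(1, 6):
    expected = (n + 1) ** (n - 1)
    assert count_parking_functions(n) == expected
    assert cayley_labeled_trees(n + 1) == expected
```

    The brute-force enumeration grows as n^n and is meant only as a sanity check for small n. The explicit bijections described in the abstract are not reproduced here.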

    Sampling and counting genome rearrangement scenarios

    Even for moderate-size inputs, there is a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore, giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples is then sufficient for statistical inference.

    Sampling solution traces for the problem of sorting permutations by signed reversals

    Background: Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution, while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we can therefore represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions have been developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of large permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is, however, conjectured to be ♯P-complete. Results: We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) constructs traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) is a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements. Conclusions: We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. Such traces are, however, rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces.

    Dynamics of Genome Rearrangement in Bacterial Populations

    Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”, that is, inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    On Distance and Sorting of the Double Cut-and-Join and the Inversion-*indel* Model

    Willing E. On Distance and Sorting of the Double Cut-and-Join and the Inversion-*indel* Model. Bielefeld: Universität Bielefeld; 2018. In comparative genomics, two or more genomes are compared with respect to their degree of relatedness. The aim of this thesis is to study mathematical models that can determine both the evolutionary *distance* and the evolutionary events between two genomes. Besides methods that operate at a low level, e.g. on bases or base pairs, more abstract models that compare genomes at the level of individual genes or even larger segments are also well established. Whereas at the lower level it is single bases that are inserted, deleted or substituted, at the higher level it is, for example, entire genes. At the higher level, one can observe the results of so-called *genome rearrangements*, which are described in a *sorting scenario*. When one genome is compared with another, these can include inversions and translocations, but also insertions or deletions of large regions. A well-known model is the *inversion model*, which determines the relatedness of two genomes solely through inversions. Another is the *double cut-and-join (DCJ)* model, which in addition to inversions allows translocations, chromosome fusions and fissions, as well as the integration and excision of small circular elements. Here the distance is the number of steps in a sorting scenario of minimum length. This dissertation is divided into two parts. The first part deals with randomly drawing a sorting scenario within the DCJ model. Besides some naive approaches, we are mainly interested in drawing each scenario with equal probability, i.e. uniformly. To this end, we not only consider the entire sorting space but also present measures for efficient computation. The presented algorithm is implemented in a software suite and is evaluated with respect to its generation of random scenarios. The second part of the thesis deals with the inversion-*indel* model. This little-studied model allows inversions as well as insertions and deletions (*indels* for short). Its distance is to be expressed in terms of the DCJ distance and the DCJ-*indel* distance, respectively. We extend well-known data structures of the inversion model so that they can represent insertions and deletions. For this we use, among other things, approaches from two other models: the extension of the DCJ model by indels, and the determination of the relationship between the DCJ model and the inversion model. To determine the minimum number of inversions, insertions and deletions, one must take into account that inversions can merge two or more separate regions that are marked for deletion. These can then be deleted in a single step. The same holds for insertions. We first consider instances in which the DCJ-indel distance and the inversion-indel distance are identical. We then move on to computing hard instances, i.e. those that require more steps than the DCJ(-indel) distance. To this end, the different properties of the instances and their effects must be identified. By carefully reducing the solution space, we arrive at a set of base cases, which we can solve by exhaustive enumeration. Overall, the steps taken provide not only the inversion-indel distance in terms of the DCJ-indel distance, but also a way of sorting. The search for an exact solution to the distance and sorting problems in the inversion-indel model long remained unanswered. The main contribution of this thesis is to settle these two questions.

    The inference of gene trees with species trees.

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution

    Mitochondrial genome rearrangements in the Scleractinia/Corallimorpharia complex: implications for coral phylogeny

    Corallimorpharia is a small Order of skeleton-less animals that is closely related to the reef-building corals (Scleractinia) and of fundamental interest in the context of understanding the potential future impacts of climate change on coral reefs. The relationship between the nominal Orders Corallimorpharia and Scleractinia is controversial: the former is either the closest outgroup to the Scleractinia or alternatively is derived from corals via skeleton loss. This latter scenario, the "naked coral" hypothesis, is strongly supported by analyses based on mitochondrial (mt) protein sequences, whereas the former is equally strongly supported by analyses of mt nucleotide sequences. The "naked coral" hypothesis seeks to link skeleton loss in the putative ancestor of corallimorpharians with a period of elevated oceanic CO2 during the Cretaceous, leading to the idea that these skeleton-less animals may be harbingers for the fate of coral reefs under global climate change. In an attempt to better understand their evolutionary relationships, we examined mt genome organization in a representative range (12 species, representing 3 of the 4 extant families) of corallimorpharians and compared these patterns with other Hexacorallia. The most surprising finding was that mt genome organization in Corallimorphus profundus, a deep-water species that is the most scleractinian-like of all corallimorpharians on the basis of morphology, was much more similar to the common scleractinian pattern than to those of other corallimorpharians. This finding is consistent with the idea that C. profundus represents a key position in the coral-corallimorpharian transition.