
    On the PATHGROUPS approach to rapid small phylogeny

    We present a data structure enabling rapid heuristic solution of the ancestral genome reconstruction problem for given phylogenies under genomic rearrangement metrics. The efficiency of the greedy algorithm is due to fast updating of the structure during run time and a simple priority scheme for choosing the next step. Since accuracy deteriorates for sets of highly divergent genomes, we investigate strategies for improving accuracy and expanding the range of data sets where accurate reconstructions can be expected. These include a more refined priority system and a two-step look-ahead, as well as iterative local improvements based on the median version of the problem, incorporating simulated annealing. We apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny.

    Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

    BACKGROUND: Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. RESULTS: We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, to infer the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than, as traditionally assigned, to the fabids. CONCLUSIONS: Gene orders of ancestral eudicot species, involving 10,000 or more genes, can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem.

    Ancestral Gene Synteny Reconstruction Improves Extant Species Scaffolding

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that pursues the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes among 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes and measuring how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.
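The evaluation described in this abstract (break known adjacencies, then count how many a method recovers) can be illustrated with a minimal Python sketch. This is not ARt-DeCo's code: the function name `precision_sensitivity`, the gene labels, and the representation of an adjacency as an unordered pair of gene names are all assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple

Adjacency = FrozenSet[str]  # unordered pair of gene/contig names (illustrative choice)


def adj(a: str, b: str) -> Adjacency:
    """Build an unordered adjacency so that (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


def precision_sensitivity(
    predicted: Set[Adjacency], removed: Set[Adjacency]
) -> Tuple[float, float]:
    """Compare predicted adjacencies against those removed by simulated fragmentation.

    precision   = correctly recovered / all predicted
    sensitivity = correctly recovered / all removed
    """
    recovered = len(predicted & removed)
    precision = recovered / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(removed) if removed else 0.0
    return precision, sensitivity


# Hypothetical data: four adjacencies were broken, three were proposed back.
removed = {adj("g1", "g2"), adj("g3", "g4"), adj("g5", "g6"), adj("g7", "g8")}
predicted = {adj("g2", "g1"), adj("g3", "g4"), adj("g5", "g9")}

precision, sensitivity = precision_sensitivity(predicted, removed)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy case two of the three proposed adjacencies are correct and two of the four broken adjacencies are recovered, which shows how precision can stay high while sensitivity drops when few links are proposed.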

    Phylogenetic assembly of paleogenomes integrating ancient DNA data

    Luhmann N. Phylogenetic assembly of paleogenomes integrating ancient DNA data. Bielefeld: Universität Bielefeld; 2017. In comparative genomics, reconstructing the genomes of ancestral species in a given phylogeny is an important problem in the analysis of genome evolution over time. The diversity of present-day genomes in terms of local mutations and genome rearrangements sheds light on the dynamics of the evolutionary processes that led from a common ancestor to a set of extant genomes. This speciation history is depicted in a phylogenetic tree. Comparative genome reconstruction methods aim to infer genomic features such as an order of markers (e.g. genes) for extinct species at internal nodes of the tree by applying different evolutionary models, relying only on the information available for the extant genomes at the leaves of the phylogenetic tree. Recently, the steady progress in sequencing technologies has led to the emergence of the field of paleogenomics, where the study of ancient DNA (aDNA) found in conserved organic material is moving rapidly towards the sequencing and analysis of complete paleogenomes. Such "genetic time travel" allows direct insight into specific phases of the evolution of specific genomes, rather than insight inferred only implicitly from extant DNA sequences. However, as DNA is naturally degraded over time after the death of an organism and environmental conditions interfere with the conservation of DNA material, an assembly of these paleogenomes is usually fragmented, preventing a detailed analysis of genome rearrangements along the branches of the phylogenetic tree. In this thesis, we aim to combine the study of aDNA and comparative ancestral reconstruction in a phylogenetic framework.
The comparison with extant related genomes can naturally assist in scaffolding a fragmented aDNA assembly, while the aDNA sequencing data can be included as an additional source of information for comparative reconstruction methods to improve the reconstructions of all related genomes in the phylogenetic tree. Our first focus is on integrative methods to reconstruct marker orders globally in a phylogeny under the assumption of parsimony. An underlying rearrangement model can describe the evolutionary operations that occurred along the edges of the tree. However, as much as complex rearrangement scenarios can give insights into underlying biological mechanisms during evolution, from a computational point of view the ancestral reconstruction problem under rearrangement distances is an NP-hard problem. One exception is the Single-Cut-or-Join (SCJ) distance, which uses a marker order-based representation of the involved genomes to model the cut and join of marker adjacencies as evolutionary operations. We build upon this rearrangement model and describe parsimony-based reconstruction methods aiming to minimize the SCJ distance in the tree. In addition, we require the reconstructed solutions to be consistent, such that they represent linear or circular regions of the ancestral genome. Our first polynomial-time method is based on the Sankoff-Rousseau algorithm and directly includes an aDNA assembly graph at one internal node of the tree. We show that including branch lengths in the underlying tree can avoid ambiguity in practice. Our second approach follows a more general strategy and includes the aDNA sequencing data as local weights for adjacencies alongside the SCJ distance in the objective. We describe a fixed-parameter-tractable algorithm that also allows sampling of co-optimal solutions.
Finally, we describe an approach to fill gaps between potentially adjacent markers using aDNA data, in order to reconstruct the complete genome sequence of a paleogenome guided by the related extant genome sequences. In addition, this approach enables us to select the adjacencies that are supported by the sequencing information from sets of conflicting adjacencies. We evaluate our proposed models and algorithms on simulated and biological data. In particular, we integrate two aDNA sequencing data sets for ancient strains of the pathogen Yersinia pestis, which is understood to be the cause of several pandemics in medieval times. We show that the combination of aDNA sequencing reads and a parsimonious reconstruction in the phylogenetic tree substantially reduces the fragmentation of an initial aDNA assembly, and we explore alternative reconstructions to emphasize reliably reconstructed regions of the ancient genomes.
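As a small illustration of the SCJ model mentioned in this abstract, the Python sketch below computes the SCJ distance between two genomes, assuming both contain the same markers exactly once. It relies on the standard property of the model that every adjacency present in only one genome costs one cut or one join, so the distance is the size of the symmetric difference of the two adjacency sets. The function names and the signed-string genome encoding are illustrative assumptions, not taken from the thesis.

```python
from typing import FrozenSet, List, Set, Tuple

Extremity = Tuple[str, str]        # (marker name, "t" for tail or "h" for head)
Adjacency = FrozenSet[Extremity]   # unordered pair of neighbouring extremities


def ends(signed: str) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed marker such as '+2' or '-2'."""
    name = signed.lstrip("+-")
    if signed.startswith("-"):
        return (name, "h"), (name, "t")   # reversed marker is read head -> tail
    return (name, "t"), (name, "h")       # forward marker is read tail -> head


def adjacencies(chromosome: List[str], circular: bool = False) -> Set[Adjacency]:
    """Collect the adjacencies between consecutive markers of one chromosome."""
    pairs = list(zip(chromosome, chromosome[1:]))
    if circular and chromosome:
        pairs.append((chromosome[-1], chromosome[0]))
    return {frozenset((ends(left)[1], ends(right)[0])) for left, right in pairs}


def scj_distance(a: Set[Adjacency], b: Set[Adjacency]) -> int:
    """Cuts needed to remove adjacencies unique to a, plus joins to add those unique to b."""
    return len(a ^ b)


# Hypothetical genomes differing by one inversion of marker 2.
genome_a = adjacencies(["+1", "+2", "+3"])
genome_b = adjacencies(["+1", "-2", "+3"])
print(scj_distance(genome_a, genome_b))
```

Here the inversion breaks two adjacencies and creates two new ones, so it is counted as two cuts and two joins. Because the distance decomposes over individual adjacencies in this way, each adjacency can be treated as a separate presence/absence character on the tree; the consistency requirement named in the abstract is an additional constraint that this sketch does not address.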

    Algorithmes pour la reconstruction de génomes ancestraux

    Ancestral genome inference is an essential step in studying genome evolution. Knowing the genomes of extinct species, one can propose biological mechanisms explaining the divergence between the genomes of extant species. Various methods have been developed to address this problem, falling into two broad categories: distance-based methods and synteny-based methods. Because the state of the art in genomic distances currently permits only a certain repertoire of rearrangements, synteny-based methods are more appropriate in practice for the time being.
We propose a synteny method for ancestral genome reconstruction based on a relaxed definition of gene adjacencies, permitting unequal gene content in extant genomes caused by gene losses and whole genome duplications (WGD). Simulation results demonstrate our method's ability to assemble a solution into fewer contiguous ancestral regions (CARs) than other methods, while maintaining good reliability. Applications to data sets from yeasts and cereal species show results agreeing with other publications, notably the existence of nested chromosome fusions during the evolution of cereals.