
    Ancestral Gene Synteny Reconstruction Improves Extant Species Scaffolding

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that pursues the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well-assembled genomes and measuring how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.

    Computation of Perfect DCJ Rearrangement Scenarios with Linear and Circular Chromosomes

    We study the problem of transforming a multichromosomal genome into another using Double Cut-and-Join (DCJ) operations, which simulate several types of rearrangements, such as reversals, translocations, and block-interchanges. We introduce the notion of a DCJ scenario that does not break families of common intervals (groups of genes co-localized in both genomes). Such scenarios are called perfect, and their properties are well known when the only considered rearrangements are reversals. We show that computing the minimum perfect DCJ rearrangement scenario is NP-hard, and describe an exact algorithm whose exponential running time is bounded in terms of a specific pattern used in the NP-completeness proof. The study of perfect DCJ rearrangement leads to some surprising properties. The DCJ model has often yielded algorithmic problems whose complexities are comparable to those of the reversal-only model. In the perfect rearrangement framework, however, while perfect sorting by reversals is NP-hard if the family of common intervals to be preserved is nested, we show that finding a shortest perfect DCJ scenario can be done in polynomial time in this case. Conversely, while perfect sorting by reversals is tractable when the family of common intervals is weakly separable, we show that the corresponding problem is still NP-hard in the DCJ case. This shows that despite the similarity of the two operations, easy patterns for reversals are hard ones for DCJ, and vice versa.

    Aligning the unalignable: bacteriophage whole genome alignments

    Background: In recent years, many studies have focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plot diagrams give a mostly qualitative assessment of the similarity or dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. Results: In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressiveMauve aligner, which implements a partial order strategy but whose alignments are linearized, shows a greatly improved interactive graphic display, while avoiding misalignments. Conclusions: Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains. A Python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on Bitbucket (https://bitbucket.org/thekswenson/alpha).