5,555 research outputs found

    Whole Genome Phylogenetic Tree Reconstruction Using Colored de Bruijn Graphs

    Full text link
    We present kleuren, a novel assembly-free method to reconstruct phylogenetic trees using the Colored de Bruijn Graph. kleuren works by constructing the Colored de Bruijn Graph and then traversing it, finding bubble structures in the graph that provide phylogenetic signal. The bubbles are then aligned and concatenated to form a supermatrix, from which a phylogenetic tree is inferred. We introduce the algorithms that kleuren uses to accomplish this task, and show its performance on reconstructing the phylogenetic tree of 12 Drosophila species. kleuren reconstructed the established phylogenetic tree accurately, and is a viable tool for phylogenetic tree reconstruction using whole genome sequences. Software package available at: https://github.com/Colelyman/kleurenComment: 6 pages, 3 figures, accepted at BIBE 2017. Minor modifications to the text due to reviewer feedback and fixed typo

    The Collatz conjecture and De Bruijn graphs

    Full text link
    We study variants of the well-known Collatz graph, by considering the action of the 3n+1 function on congruence classes. For moduli equal to powers of 2, these graphs are shown to be isomorphic to binary De Bruijn graphs. Unlike the Collatz graph, these graphs are very structured, and have several interesting properties. We then look at a natural generalization of these finite graphs to the 2-adic integers, and show that the isomorphism between these infinite graphs is exactly the conjugacy map previously studied by Bernstein and Lagarias. Finally, we show that for generalizations of the 3n+1 function, we get similar relations with 2-adic and p-adic De Bruijn graphs.Comment: 9 pages, 8 figure

    Cerulean: A hybrid assembly using high throughput short and long reads

    Full text link
    Genome assembly using high throughput data with short reads, arguably, remains an unresolvable task in repetitive genomes, since when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking regions. The emergence of third generation sequencing (Pacific Biosciences) with long reads enables the opportunity to resolve complicated repeats that could not be resolved by the short read data. However, these long reads have high error rate and it is an uphill task to assemble the genome without using additional high quality short reads. Recently, Koren et al. 2012 proposed an approach to use high quality short reads data to correct these long reads and, thus, make the assembly from long reads possible. However, due to the large size of both dataset (short and long reads), error-correction of these long reads requires excessively high computational resources, even on small bacterial genomes. In this work, instead of error correction of long reads, we first assemble the short reads and later map these long reads on the assembly graph to resolve repeats. Contribution: We present a hybrid assembly approach that is both computationally effective and produces high quality assemblies. Our algorithm first operates with a simplified version of the assembly graph consisting only of long contigs and gradually improves the assembly by adding smaller contigs in each iteration. In contrast to the state-of-the-art long reads error correction technique, which requires high computational resources and long running time on a supercomputer even for bacterial genome datasets, our software can produce comparable assembly using only a standard desktop in a short running time.Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013

    Number of cycles in the graph of 312-avoiding permutations

    Get PDF
    The graph of overlapping permutations is defined in a way analogous to the De Bruijn graph on strings of symbols. That is, for every permutation π=π1π2...πn+1\pi = \pi_{1} \pi_{2} ... \pi_{n+1} there is a directed edge from the standardization of π1π2...πn\pi_{1} \pi_{2} ... \pi_{n} to the standardization of π2π3...πn+1\pi_{2} \pi_{3} ... \pi_{n+1}. We give a formula for the number of cycles of length dd in the subgraph of overlapping 312-avoiding permutations. Using this we also give a refinement of the enumeration of 312-avoiding affine permutations and point out some open problems on this graph, which so far has been little studied.Comment: To appear in the Journal of Combinatorial Theory - Series

    Telescoper: de novo assembly of highly repetitive regions.

    Get PDF
    MotivationWith advances in sequencing technology, it has become faster and cheaper to obtain short-read data from which to assemble genomes. Although there has been considerable progress in the field of genome assembly, producing high-quality de novo assemblies from short-reads remains challenging, primarily because of the complex repeat structures found in the genomes of most higher organisms. The telomeric regions of many genomes are particularly difficult to assemble, though much could be gained from the study of these regions, as their evolution has not been fully characterized and they have been linked to aging.ResultsIn this article, we tackle the problem of assembling highly repetitive regions by developing a novel algorithm that iteratively extends long paths through a series of read-overlap graphs and evaluates them based on a statistical framework. Our algorithm, Telescoper, uses short- and long-insert libraries in an integrated way throughout the assembly process. Results on real and simulated data demonstrate that our approach can effectively resolve much of the complex repeat structures found in the telomeres of yeast genomes, especially when longer long-insert libraries are used.AvailabilityTelescoper is publicly available for download at sourceforge.net/p/[email protected] informationSupplementary data are available at Bioinformatics online

    Perfect Necklaces

    Full text link
    We introduce a variant of de Bruijn words that we call perfect necklaces. Fix a finite alphabet. Recall that a word is a finite sequence of symbols in the alphabet and a circular word, or necklace, is the equivalence class of a word under rotations. For positive integers k and n, we call a necklace (k,n)-perfect if each word of length k occurs exactly n times at positions which are different modulo n for any convention on the starting point. We call a necklace perfect if it is (k,k)-perfect for some k. We prove that every arithmetic sequence with difference coprime with the alphabet size induces a perfect necklace. In particular, the concatenation of all words of the same length in lexicographic order yields a perfect necklace. For each k and n, we give a closed formula for the number of (k,n)-perfect necklaces. Finally, we prove that every infinite periodic sequence whose period coincides with some (k,n)-perfect necklace for any n, passes all statistical tests of size up to k, but not all larger tests. This last theorem motivated this work
    corecore