2,833 research outputs found

    On Greedy Algorithms for Binary de Bruijn Sequences

    Full text link
    We propose a general greedy algorithm for binary de Bruijn sequences, called Generalized Prefer-Opposite (GPO) Algorithm, and its modifications. By identifying specific feedback functions and initial states, we demonstrate that most previously-known greedy algorithms that generate binary de Bruijn sequences are particular cases of our new algorithm

    On Binary de Bruijn Sequences from LFSRs with Arbitrary Characteristic Polynomials

    Full text link
    We propose a construction of de Bruijn sequences by the cycle joining method from linear feedback shift registers (LFSRs) with arbitrary characteristic polynomial f(x)f(x). We study in detail the cycle structure of the set Ω(f(x))\Omega(f(x)) that contains all sequences produced by a specific LFSR on distinct inputs and provide a fast way to find a state of each cycle. This leads to an efficient algorithm to find all conjugate pairs between any two cycles, yielding the adjacency graph. The approach is practical to generate a large class of de Bruijn sequences up to order n≈20n \approx 20. Many previously proposed constructions of de Bruijn sequences are shown to be special cases of our construction

    Extreme Scale De Novo Metagenome Assembly

    Full text link
    Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into the accurate representation of the underlying microbiomes's genomes. State-of-the-art tools require big shared memory machines and cannot handle contemporary metagenome datasets that exceed Terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore can run seamlessly on shared and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads - size 2.6 TBytes.Comment: Accepted to SC1

    Boltzmann samplers for random generation of lambda terms

    Get PDF
    Randomly generating structured objects is important in testing and optimizing functional programs, whereas generating random ′l'l-terms is more specifically needed for testing and optimizing compilers. For that a tool called QuickCheck has been proposed, but in this tool the control of the random generation is left to the programmer. Ten years ago, a method called Boltzmann samplers has been proposed to generate combinatorial structures. In this paper, we show how Boltzmann samplers can be developed to generate lambda-terms, but also other data structures like trees. These samplers rely on a critical value which parameters the main random selector and which is exhibited here with explanations on how it is computed. Haskell programs are proposed to show how samplers are actually implemented

    A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data

    Full text link
    We present a new algorithm for enumerating bubbles with length constraints in directed graphs. This problem arises in transcriptomics, where the question is to identify all alternative splicing events present in a sample of mRNAs sequenced by RNA-seq. This is the first polynomial-delay algorithm for this problem and we show that in practice, it is faster than previous approaches. This enables us to deal with larger instances and therefore to discover novel alternative splicing events, especially long ones, that were previously overseen using existing methods.Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013
    • …
    corecore