15 research outputs found

    Validating Paired-End Read Alignments in Sequence Graphs

    Get PDF
    Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second

    Safely Filling Gaps with Partial Solutions Common to All Solutions

    Get PDF
    Gap filling has emerged as a natural sub-problem of many de novo genome assembly projects. The gap filling problem generally asks for an s-t path in an assembly graph whose length matches the gap length estimate. Several methods have addressed it, but only few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. Our main contribution is a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions. We call these partial solutions safe, following the framework of (Tomescu and Medvedev, RECOMB 2016). We give an efficient safe algorithm running in O(dm) time and space, where d is the gap length estimate and m is the number of edges of the assembly graph. To show the benefits of this method, we implemented this algorithm for the problem of filling gaps in scaffolds. Our experimental results on bacterial and on conservative human assemblies show that, on average, our method can retrieve over 73 percent more safe and correct bases as compared to previous methods, with a similar precision.Peer reviewe

    Diagnosability of labeled Dp\mathfrak{D_p} automata

    Full text link
    In this paper, we formulate a notion of diagnosability for labeled weighted automata over a class of dioids which admit both positive and negative numbers as well as vectors. The weights can represent diverse physical meanings such as time elapsing and position deviations. We also develop an original tool called concurrent composition to verify diagnosability for such automata. These results are fundamentally new compared with the existing ones in the literature.Comment: 28 pages, 7 figure

    LNCS

    Get PDF
    In the analysis of reactive systems a quantitative objective assigns a real value to every trace of the system. The value decision problem for a quantitative objective requires a trace whose value is at least a given threshold, and the exact value decision problem requires a trace whose value is exactly the threshold. We compare the computational complexity of the value and exact value decision problems for classical quantitative objectives, such as sum, discounted sum, energy, and mean-payoff for two standard models of reactive systems, namely, graphs and graph games

    An A* Algorithm for Flight Planning Based on Idealized Vertical Profiles

    Get PDF
    The Flight Planning Problem is to find a minimum fuel trajectory between two airports in a 3D airway network under consideration of the wind. We show that this problem is NP-hard, even in its most basic version. We then present a novel A* heuristic, whose potential function is derived from an idealized vertical profile over the remaining flight distance. This potential is, under rather general assumptions, both admissible and consistent and it can be computed efficiently. The method outperforms the state-of-the-art heuristic on real-life instances

    State-Based Opacity of Real-Time Automata

    Get PDF

    Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Next-generation sequencing technologies allow genomes to be sequenced more quickly and less expensively than ever before. However, as sequencing technology has improved, the difficulty of <it>de novo </it>genome assembly has increased, due in large part to the shorter reads generated by the new technologies. The use of mated sequences (referred to as mate-pairs) is a standard means of disambiguating assemblies to obtain a more complete picture of the genome without resorting to manual finishing. Here, we examine the effectiveness of mate-pair information in resolving repeated sequences in the DNA (a paramount issue to overcome). While it has been empirically accepted that mate-pairs improve assemblies, and a variety of assemblers use mate-pairs in the context of repeat resolution, the effectiveness of mate-pairs in this context has not been systematically evaluated in previous literature.</p> <p>Results</p> <p>We show that, in high-coverage prokaryotic assemblies, libraries of short mate-pairs (about 4-6 times the read-length) more effectively disambiguate repeat regions than the libraries that are commonly constructed in current genome projects. We also demonstrate that the best assemblies can be obtained by 'tuning' mate-pair libraries to accommodate the specific repeat structure of the genome being assembled - information that can be obtained through an initial assembly using unpaired reads. These results are shown across 360 simulations on 'ideal' prokaryotic data as well as assembly of 8 bacterial genomes using SOAPdenovo. The simulation results provide an upper-bound on the potential value of mate-pairs for resolving repeated sequences in real prokaryotic data sets. The assembly results show that our method of tuning mate-pairs exploits fundamental properties of these genomes, leading to better assemblies even when using an off -the-shelf assembler in the presence of base-call errors.</p> <p>Conclusions</p> <p>Our results demonstrate that dramatic improvements in prokaryotic genome assembly quality can be achieved by tuning mate-pair sizes to the actual repeat structure of a genome, suggesting the possible need to change the way sequencing projects are designed. We propose that a two-tiered approach - first generate an assembly of the genome with unpaired reads in order to evaluate the repeat structure of the genome; then generate the mate-pair libraries that provide most information towards the resolution of repeats in the genome being assembled - is not only possible, but likely also more cost-effective as it will significantly reduce downstream manual finishing costs. In future work we intend to address the question of whether this result can be extended to larger eukaryotic genomes, where repeat structure can be quite different.</p
    corecore