
    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Existing FPGA accelerators for short read mapping often sacrifice the complete biological information in sequencing data in favor of a simple hardware design, leading to missed or incorrect alignments. In this work, we propose a runtime-reconfigurable alignment pipeline that considers all information in the sequencing data for biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string-matching techniques commonly used in short read mapping: the FM-index and the Smith-Waterman algorithm with the affine-gap model. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows.
    1. We accelerate exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving accesses across multiple short reads. The FM-index hardware also considers the complete information in the read data to maximize accuracy.
    2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage, while a Smith-Waterman implementation with the affine-gap model is developed on the FPGA for the extension stage. This model improves the efficiency of indel alignment with accuracy comparable to state-of-the-art software.
    3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs, then apply it to optimize one of the proposed FPGA aligners for better alignment performance.
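    As a point of reference for the FM-index stage described above, here is a minimal CPU-side sketch of exact matching via backward search over the BWT. It is not the paper's hardware design: the names (build_fm_index, backward_search) and the naive rank scan are illustrative assumptions, and a real implementation would use a sampled occurrence table over a compressed BWT, which is exactly the memory-access cost the abstract's compression and read-interleaving optimizations target.

        # Minimal FM-index backward search sketch (illustrative; not the paper's design).
        def build_fm_index(text):
            """Build the BWT, the C array, and a naive rank function for `text`."""
            text += "$"                       # unique, lexicographically smallest sentinel
            # Suffix array by direct sorting; slow in general, but fine for a sketch.
            sa = sorted(range(len(text)), key=lambda i: text[i:])
            bwt = "".join(text[i - 1] for i in sa)
            # C[c] = number of characters in the text strictly smaller than c.
            c_arr, total = {}, 0
            for ch in sorted(set(text)):
                c_arr[ch] = total
                total += text.count(ch)
            def occ(ch, i):
                """Number of occurrences of `ch` in bwt[:i] (naive rank query)."""
                return bwt[:i].count(ch)
            return bwt, c_arr, occ

        def backward_search(pattern, c_arr, occ, n):
            """Count exact occurrences of `pattern` in the indexed text."""
            lo, hi = 0, n                     # current suffix-array interval [lo, hi)
            for ch in reversed(pattern):      # process the read right to left
                if ch not in c_arr:
                    return 0
                lo = c_arr[ch] + occ(ch, lo)
                hi = c_arr[ch] + occ(ch, hi)
                if lo >= hi:                  # empty interval: no occurrence
                    return 0
            return hi - lo

        bwt, c_arr, occ = build_fm_index("ACGTACGTACG")
        print(backward_search("ACG", c_arr, occ, len(bwt)))   # prints 3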

    Accelerating dynamic programming

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. By Oren Weimann.
    Dynamic Programming (DP) is a fundamental problem-solving technique that has been widely used for solving a broad range of search and optimization problems. While DP can be invoked when more specialized methods fail, this generality often incurs a cost in efficiency. We explore a unifying toolkit for speeding up DP, and algorithms that use DP as subroutines. Our methods and results can be summarized as follows.
    - Acceleration via Compression. Compression is traditionally used to store data efficiently. We use compression to identify repeats in the DP table that imply redundant computation. Utilizing these repeats requires a new DP, and often different DPs for different compression schemes. We present the first provable speedup of the celebrated Viterbi algorithm (1967), which is used for the decoding and training of Hidden Markov Models (HMMs). Our speedup relies on compressing the HMM's observable sequence.
    - Totally Monotone Matrices. It is well known that a wide variety of DPs can be reduced to the problem of finding row minima in totally monotone matrices. We introduce this scheme in the context of planar graph problems. In particular, we show that planar graph problems such as shortest paths, feasible flow, bipartite perfect matching, and replacement paths can be accelerated by DPs that exploit a total-monotonicity property of the shortest paths.
    - Combining Compression and Total Monotonicity. We introduce a method for accelerating string edit distance computation by combining compression and totally monotone matrices. At the heart of this method are algorithms for computing the edit distance between two straight-line programs. These enable us to exploit the compressibility of strings, even if each string is compressed using a different compression scheme.
    - Partial Tables. In typical DP settings, a table is filled in its entirety, where each cell corresponds to some subproblem. In some cases, by changing the DP, it is possible to compute asymptotically fewer cells of the table. We show that Θ(n³) subproblems are both necessary and sufficient for computing the similarity between two trees. This improves all known solutions and brings the idea of partial tables to its full extent.
    - Fractional Subproblems. In some DPs, the solution to a subproblem is a data structure rather than a single value. The entire data structure of a subproblem is then processed and used to construct the data structure of larger subproblems. We suggest a method for reusing parts of a subproblem's data structure. In some cases, such fractional parts remain unchanged when constructing the data structure of larger subproblems; it is then possible to copy this part of the data structure to the larger subproblem using only a constant number of pointer changes. We show how this idea can be used to find the optimal tree-searching strategy in linear time. This is a generalization of the well-known binary search technique from arrays to trees.
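    For concreteness, the following is a minimal sketch of the textbook Viterbi decoder whose main loop, costing O(n·|S|²) over n observations and state set S, is the target of the acceleration-via-compression result above. It is the standard uncompressed baseline rather than the thesis's algorithm, and the model interface (log-probability tables start, trans, emit) is an assumption made for the example.

        def viterbi(observations, states, start, trans, emit):
            """Most likely state sequence for a non-empty `observations` list under an HMM.
            start[s]    : log P(first state is s)
            trans[s][t] : log P(next state t | current state s)
            emit[s][o]  : log P(observation o | state s)
            """
            # dp[s] = best log-probability of any state path ending in s; back stores choices.
            dp = {s: start[s] + emit[s][observations[0]] for s in states}
            back = []
            for obs in observations[1:]:
                prev, dp, choices = dp, {}, {}
                for t in states:
                    # Best predecessor state for t at this position.
                    best_s = max(states, key=lambda s: prev[s] + trans[s][t])
                    dp[t] = prev[best_s] + trans[best_s][t] + emit[t][obs]
                    choices[t] = best_s
                back.append(choices)
            # Trace the optimal path backwards from the best final state.
            path = [max(states, key=lambda s: dp[s])]
            for choices in reversed(back):
                path.append(choices[path[-1]])
            return list(reversed(path))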

    Small space and streaming pattern matching with k edits

    In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer k, a pattern P of length m, and a text T of length n ≥ m, the task is to find the substrings of T that are within edit distance k of P. Our main result is a streaming algorithm that solves the problem in Õ(k⁵) space and Õ(k⁸) amortised time per character of the text, providing answers that are correct with high probability. (Hereafter, Õ(·) hides a poly(log n) factor.) This answers a decade-old question: since the discovery of a poly(k log n)-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Prior to this work, no poly(k log n)-space algorithm was known even in the simpler semi-streaming model, where T comes as a stream but P is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers n ≥ k. For any string of length at most n, the sketch is of size Õ(k²) and it can be computed with an Õ(k²)-space streaming algorithm. Given the sketches of two strings, in Õ(k³) time we can compute their edit distance or certify that it is larger than k. This result improves upon the Õ(k⁸)-size sketches of Belazzougui and Zhu [FOCS 2016] and the very recent Õ(k³)-size sketches of Jin, Nelson, and Wu [STACS 2021].
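    For context, the offline version of the problem admits the classic O(nm)-time, O(m)-space dynamic program sketched below (the textbook formulation with a free starting position in the text); the function name and interface are illustrative, not from the paper. The point of the streaming result above is to replace this whole-text, Θ(m)-space computation with poly(k log n) space per character.

        # Classic offline DP for pattern matching with k edits (illustrative sketch).
        def occurrences_with_k_edits(pattern, text, k):
            """Return the 1-based text positions where a substring of `text`
            within edit distance `k` of `pattern` ends."""
            m = len(pattern)
            # col[i] = edit distance between pattern[:i] and the closest
            # substring of the text ending at the current position.
            col = list(range(m + 1))             # column for the empty text prefix
            ends = []
            for j, tc in enumerate(text, start=1):
                prev, col = col, [0] * (m + 1)   # col[0] = 0: a match may start anywhere
                for i, pc in enumerate(pattern, start=1):
                    col[i] = min(prev[i - 1] + (pc != tc),   # align pattern[i] with text[j]
                                 prev[i] + 1,                # leave text[j] unmatched
                                 col[i - 1] + 1)             # leave pattern[i] unmatched
                if col[m] <= k:
                    ends.append(j)
            return ends

        # Occurrences of "acgt" within 1 edit; prints [5, 6, 7, 11, 12].
        print(occurrences_with_k_edits("acgt", "xxacgtxxacttxx", 1))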