48 research outputs found

    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

    Full text link
    Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55x - 260.03x, 1.83x - 5.34x, and 27.97x when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x - 59.94x, 1.03x - 1.75x, and 1.03x - 1.95x, respectively, while improving their energy efficiency by 64.24x - 115.46x, 1.75x, 1.96x.Comment: Accepted to ACM TAC

    Implementing and Accelerating HMMER3 Protein Sequence Search on CUDA-Enabled GPU

    Get PDF
    The recent emergence of multi-core CPU and many-core GPU architectures has made parallel computing more accessible. Hundreds of industrial and research applications have been mapped onto GPUs to further utilize the extra computing resource. In bioinformatics, HMMER is a set of widely used applications for sequence analysis based on Hidden Markov Model. One of the tools in HMMER, hmmsearch, and the Smith-Waterman algorithm are two important tools for protein sequence analysis that use dynamic programming. Both tools are particularly well-suited for many-core GPU architecture due to the parallel nature of sequence database searches. After studying the existing research on CUDA acceleration in bioinformatics, this thesis investigated the acceleration of the key Multiple Segment Viterbi algorithm in HMMER version 3. A fully-featured CUDA-enabled protein database search tool cudaHmmsearch was designed, implemented and optimized. We demonstrated a variety of optimization strategies that are useful for general purpose GPU-based applications. Based on our optimization experience in parallel computing, six steps were summarized for optimizing performance using CUDA programming. We made comprehensive tests and analysis for multiple enhancements in our GPU kernels in order to demonstrate the effectiveness of selected approaches. The performance analysis showed that GPUs are able to deal with intensive computations, but are very sensitive to random accesses to the global memory. The results show that our implementation achieved 2.5x speedup over the single-threaded HMMER3 CPU SSE2 implementation on average

    Modern Computational Techniques for the HMMER Sequence Analysis

    Get PDF

    Bioinformatics

    Get PDF
    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics, molecular modeling and intelligent data analysis. Each book section introduces the basic concepts and then explains its application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    Parallelization of dynamic programming recurrences in computational biology

    Get PDF
    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms

    Characterizing and Accelerating Bioinformatics Workloads on Modern Microarchitectures

    Get PDF
    Bioinformatics, the use of computer techniques to analyze biological data, has been a particularly active research field in the last two decades. Advances in this field have contributed to the collection of enormous amounts of data, and the sheer amount of available data has started to overtake the processing capability possible with current computer systems. Clearly, computer architects need to have a better understanding of how bioinformatics applications work and what kind of architectural techniques could be used to accelerate these important scientific workloads on future processors. In this dissertation, we develop a bioinformatic benchmark suite and provide a detailed characterization of these applications in common use today from a computer architect's point of view. We analyze a wide range of detailed execution characteristics including instruction mix, IPC measurements, L1 and L2 cache misses on a real architecture; and proceed to analyze the workloads' memory access characteristics. We then concentrate on accelerating a particularly computationally intensive bioinformatics workload on the novel Cell Broadband Engine multiprocessor architecture. The HMMER workload is used for protein profile searching using hidden Markov models, and most of its execution time is spent running the Viterbi algorithm. We parallelize and partition the HMMER application to implement it on the Cell Broadband Engine. In order to run the Viterbi algorithm on the 256KB local stores of the Cell BE synergistic processing units (SPEs), we present a method to develop a fast SIMD implementation of the Viterbi algorithm that reduces the storage requirements significantly. Our HMMER implementation for the Cell BE architecture, Cell-HMMER, exploits the multiple levels of parallelism inherent in this application, and can run protein profile searches up to 27.98 times faster than a modern dual-core x86 microprocessor
    corecore