4,476 research outputs found

    Protein alignment HW/SW optimizations

    Get PDF
    Biosequence alignment recently received an amazing support from both commodity and dedicated hardware platforms. The limitless requirements of this application motivate the search for improved implementations to boost processing time and capabilities. We propose an unprecedented hardware improvement to the classic Smith-Waterman (S-W) algorithm based on a twofold approach: i) an on-the-fly gap-open/gap-extension selection that reduces the hardware implementation complexity; ii) a pre-selection filter that uses reduced amino-acid alphabets to screen out not-significant sequences and to shorten the S-Witerations on huge reference databases.We demonstrated the improvements w.r.t. a classic approach both from the point of view of algorithm efficiency and of HW performance (FPGA and ASIC post-synthesis analysis)

    FPGA acceleration of sequence analysis tools in bioinformatics

    Full text link
    Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code. Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools

    Accelerated Profile HMM Searches

    Get PDF
    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches

    Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts

    Get PDF
    Genetic sequence alignment has always been a computational challenge in bioinformatics. Depending on the problem size, software-based aligners can take multiple CPU-days to process the sequence data, creating a bottleneck point in bioinformatic analysis flow. Reconfigurable accelerator can achieve high performance for such computation by providing massive parallelism, but at the expense of programming flexibility and thus has not been commensurately used by practitioners. Therefore, this paper aims to provide a thorough survey of the proposed accelerators by giving a qualitative categorization based on their algorithms and speedup. A comprehensive comparison between work is also presented so as to guide selection for biologist, and to provide insight on future research direction for FPGA scientists

    Design and analysis of an accelerated seed generation stage for BLASTP on the Mercury system - Master\u27s Thesis, August 2006

    Get PDF
    NCBI BLASTP is a popular sequence analysis tool used to study the evolutionary relationship between two protein sequences. Protein databases continue to grow exponentially as entire genomes of organisms are sequenced, making sequence analysis a computationally demanding task. For example, a search of the E. coli. k12 proteome against the GenBank Non-Redundant database takes 36 hours on a standard workstation. In this thesis, we look to address the problem by accelerating protein searching using Field Programmable Gate Arrays. We focus our attention on the BLASTP heuristic, building on work done earlier to accelerate DNA searching on the Mercury platform. We analyze the performance characteristics of the BLASTP algorithm and explore the design space of the seed generation stage in detail. We propose a hardware/software architecture and evaluate the performance of the individual stage, and its effect on the overall BLASTP pipeline running on the Mercury system. The seed generation stage is 13x faster than the software equivalent, and the integrated BLASTP pipeline is predicted to yield a speedup of 50x over NCBI BLASTP. Mercury BLASTP also shows a 2.5x speed improvement over the only other BLASTP-like accelerator for FPGAs while consuming far fewer logic resources

    Entropy-scaling search of massive biological data

    Get PDF
    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Get PDF
    Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows. 1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy. 2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software. 3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces

    Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

    Full text link
    The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics
    • …
    corecore