427 research outputs found

    Accelerating Short Read Mapping Using A DSP Based Coprocessor

    Advances in next generation sequencing technologies have allowed short reads to be generated at an increasing rate, shifting the bottleneck of the sequencing process to the short read mapping computations. High costs and extended processing times drive researchers to pursue more efficient solutions, with the overall goal of a short read mapping architecture capable of processing short reads as they are generated. Digital signal processors have shown high performance while maintaining low power consumption in a wide range of applications. This thesis explores a DSP-accelerated exact-match short read mapping algorithm, with the aim of increasing the number of mapped bases per watt-second. The design is implemented, tested, and compared against CPU and alternative coprocessor implementations to analyze the potential benefit of accelerating a memory-bound application.
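
    The energy-efficiency metric named in this abstract, mapped bases per watt-second, can be illustrated with a short sketch. The abstract does not give the thesis's exact definition, so the function below assumes fixed-length reads and an average power figure; the function name, parameters, and example numbers are hypothetical.

```python
def mapped_bases_per_watt_second(mapped_reads, read_length,
                                 avg_power_watts, runtime_seconds):
    """Return mapped bases per watt-second (i.e. per joule of energy).

    Assumption: every mapped read has the same length, so the number of
    mapped bases is mapped_reads * read_length.
    """
    if avg_power_watts <= 0 or runtime_seconds <= 0:
        raise ValueError("power and runtime must be positive")
    energy_watt_seconds = avg_power_watts * runtime_seconds
    return (mapped_reads * read_length) / energy_watt_seconds


# Hypothetical run: one million 100-base reads mapped in 20 s at 50 W.
efficiency = mapped_bases_per_watt_second(1_000_000, 100, 50.0, 20.0)
```

    Under this definition a device can improve the metric either by mapping faster at the same power or by drawing less power for the same runtime.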

    Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts

    Genetic sequence alignment has always been a computational challenge in bioinformatics. Depending on the problem size, software-based aligners can take multiple CPU-days to process the sequence data, creating a bottleneck in the bioinformatics analysis flow. Reconfigurable accelerators can achieve high performance for such computation by providing massive parallelism, but at the expense of programming flexibility, and thus have not been commensurately used by practitioners. This paper therefore aims to provide a thorough survey of the proposed accelerators, giving a qualitative categorization based on their algorithms and speedup. A comprehensive comparison between the works is also presented to guide selection for biologists and to provide insight into future research directions for FPGA scientists.

    Simple scalable nucleotic FPGA based short read aligner for exhaustive search of substitution errors

    With the advent of new and continuously improving technologies, in a couple of years DNA sequencing may become as commonplace as a simple blood test. Sequencing efficiency is growing with a larger exponent than the Moore's law growth of standard processors, hence alignment and further processing of sequenced data is the bottleneck. FPGA (Field Programmable Gate Array) technology may provide an efficient alternative. We propose a simple algorithm for DNA sequence alignment, which can be realized efficiently by nucleotic principal agents of non-Neumann nature. The prototype FPGA implementation runs on a small Terasic DE1-SoC demo board with a Cyclone V chip. We present test results and analyse the theoretical scalability of this system, showing that the execution time is independent of the length of the reference genome sequences. A special advantage of this parallel algorithm is that it performs an exhaustive search, producing all match variants up to a predetermined number of point (mutation) errors.
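
    To make "exhaustive search up to a predetermined number of point errors" concrete, here is a minimal software sketch. It is not the paper's FPGA design: it is a plain sequential scan, assumes substitution errors only (no insertions or deletions), and uses hypothetical function names and example sequences. Unlike the parallel hardware described in the abstract, its running time grows with the reference length.

```python
def count_mismatches(read, window, max_mismatches):
    """Return the number of substitutions between read and window,
    or None as soon as the count exceeds max_mismatches."""
    mismatches = 0
    for a, b in zip(read, window):
        if a != b:
            mismatches += 1
            if mismatches > max_mismatches:
                return None
    return mismatches


def exhaustive_substitution_search(reference, read, max_mismatches):
    """Report every (position, mismatches) pair at which the read aligns
    to the reference with at most max_mismatches substitutions."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        distance = count_mismatches(read, window, max_mismatches)
        if distance is not None:
            hits.append((pos, distance))
    return hits


# Hypothetical example: one read searched with at most one substitution.
hits = exhaustive_substitution_search("ACGTTACGAACGT", "ACGT", 1)
```

    Every candidate position is checked, so no alignment within the error budget is missed; that is the sense in which the search is exhaustive.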

    SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

    Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that needs to be analyzed and interpreted. This has a major impact on the DNA sequence alignment process, which nowadays requires the mapping of billions of small DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and are complex to implement. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which ensures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of performance and scalability. A comparison to other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. This work was supported by Ministerio de Economía y Competitividad (Spain) (http://www.mineco.gob.es) grants TIN2013-41129-P and TIN2014-54565-JIN. There was no additional external funding received for this study.

    FPGA-based acceleration of the RMAP short read mapping tool

    Bioinformatics is a quickly emerging field. Next generation sequencing technologies are producing data at rates of up to several gigabytes per day, making bioinformatics applications increasingly computationally intensive. In order to process this data faster, various techniques have been developed. These techniques involve parallelizing algorithms and/or spreading data across many computing nodes composed of devices such as microprocessors, Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). In this thesis, an FPGA is used to accelerate a bioinformatics application called RMAP, which is used for short-read mapping. The most computationally intensive function in RMAP, the read mapping function, is implemented on the FPGA's reconfigurable hardware fabric. This is a first step in a larger effort to develop a more optimal hardware/software co-design for RMAP. The Convey HC-1 Hybrid Computing System was used as the development platform. The short-read mapping functionality of RMAP was implemented on one of the four Xilinx Virtex 5 FPGAs available in the HC-1 system. The RMAP 2.0 software was rewritten to separate out the read mapping function and facilitate its porting to hardware. The implemented design was evaluated by varying input parameters such as genome size and number of reads. In addition, the hardware design was analyzed to find potential bottlenecks. For the hardware-implemented short-read mapping function of RMAP, the results showed a speedup of ~5x on datasets with a varying number of reads and a fixed reference genome, and ~2x on datasets with a varying genome size and a fixed number of reads.

    Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of marking duplicate reads during data pre-processing on a single-node system by 156-186x compared with de facto standard tools. sam2bam consists of parallel software components that can fully utilize the multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators if available. As a basic feature, sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM. Additional features such as analyzing, filtering, and converting the input data are provided by plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. sam2bam could reduce the runtime for whole-genome sequencing data from about 20 hours to about nine minutes on the same system using up to 711 GB of memory.

    SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

    To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, and GEM, and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcome.