5 research outputs found

FPGA acceleration of reference-based compression for genomic data

Author: Arram J
Kaplan T
Luk W
Pflanzer M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/12/2015

One of the key challenges facing genomics today is efficiently storing the massive amounts of data generated by next-generation sequencing platforms. Reference-based compression is a popular strategy for reducing the size of genomic data, whereby sequence information is encoded as a mapping to a known reference sequence. Determining the mapping is a computationally intensive problem, and is the bottleneck of most reference-based compression tools currently available. This paper presents the first FPGA acceleration of reference-based compression for genomic data. We develop a new mapping algorithm based on the FM-index search operation which includes optimisations targeting the compression ratio and speed. Our hardware design is implemented on a Maxeler MPC-X2000 node comprising 8 Altera Stratix V FPGAs. When evaluated against compression tools currently available, our tool achieves a superior compression ratio, compression time, and energy consumption for both FASTA and FASTQ formats. For example, our tool achieves a 30% higher compression ratio and is 71.9 times faster than the fastqz tool
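The abstract names the FM-index search operation as the basis of the mapping step but gives no detail. As a rough illustration only, the following is a minimal sketch of textbook FM-index backward search in Python, followed by a toy reference-based encoding of a read as a (position, length) pair. It is not the paper's algorithm or its hardware design: the function names (`build_fm_index`, `backward_search`, `encode_read`) are hypothetical, the suffix array is built by naive sorting, and only exact matches are handled.

```python
def build_fm_index(reference):
    """Build a tiny FM-index (suffix array, C table, Occ table) for `reference`."""
    text = reference + "$"  # sentinel, sorts before A/C/G/T in ASCII
    # Naive suffix array construction; adequate only for small examples.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][k] = number of occurrences of ch in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for k, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][k + 1] = occ[ch][k] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def encode_read(read, index):
    """Toy reference-based encoding: (position, length), or None if unmapped."""
    hits = backward_search(read, *index)
    return (hits[0], len(read)) if hits else None


index = build_fm_index("ACGTACGT")
print(backward_search("CGT", *index))  # match positions in the reference
print(encode_read("GTA", index))       # read stored as a mapping, not as bases
```

In this sketch a mapped read costs two integers instead of its full base sequence, which is the intuition behind reference-based compression; a real tool must also handle mismatches, unmapped reads, and quality scores.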

Crossref

Spiral - Imperial College Digital Repository

FPGA Acceleration of Reference-Based Compression for Genomic Data

Author: James Arram
Moritz Pflanzer
Thomas Kaplan
Wayne Luk
Publication date: 23/04/2020

Abstract: One of the key challenges facing genomics today is efficiently storing the massive amounts of data generated by next-generation sequencing platforms. Reference-based compression is a popular strategy for reducing the size of genomic data, whereby sequence information is encoded as a mapping to a known reference sequence. Determining the mapping is a computationally intensive problem, and is the bottleneck of most reference-based compression tools currently available. This paper presents the first FPGA acceleration of reference-based compression for genomic data. We develop a new mapping algorithm based on the FM-index search operation which includes optimisations targeting the compression ratio and speed. Our hardware design is implemented on a Maxeler MPC-X2000 node comprising 8 Altera Stratix V FPGAs. When evaluated against compression tools currently available, our tool achieves a superior compression ratio, compression time, and energy consumption for both FASTA and FASTQ formats. For example, our tool achieves a 30% higher compression ratio and is 71.9 times faster than the fastqz tool

CiteSeerX

Multithreaded FPGA acceleration of DNA sequence mapping

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Multithreaded FPGA Acceleration of DNA Sequence Mapping

Author: Edward B. Fern
Jason Villarreal
Walid A. Najjar

Abstract: In bioinformatics, short read alignment is a computationally intensive operation that involves matching millions of short strings (called reads) against a reference genome. At the time of writing, a representative run requires matching tens of millions of reads, each about 100 symbols long, against a genome that can consist of a few billion characters. Existing short read aligners are expected to report all the occurrences of each read as well as allow users to control the number of allowed mismatches between reads and reference genome. Popular software implementations such as Bowtie [8] or BWA [10] can take many hours or days to execute, making the problem an ideal candidate for hardware acceleration. In this paper, we describe FHAST (FPGA Hardware Accelerated Sequencing-matching Tool), a hardware accelerator that acts as a drop-in replacement for short read alignment software. Our architecture masks memory latency by executing many concurrent hardware threads accessing memory simultaneously and consists of multiple parallel engines to exploit the parallelism available to us on an FPGA. We have implemented and tested FHAST on the Convey HC-1 [9], taking advantage of the large amount of memory bandwidth available to the system and the shared memory image between hardware and software. By comparing the performance of FHAST against Bowtie on the Convey HC-1 we observed up to ~70X improvement in total end-to-end execution time, reducing runs that take several hours to a few minutes. We also favorably compare the rate of growth when expanding FHAST to utilize multiple FPGAs against multiple CPUs in Bowtie. Index Terms: bioinformatics, short read matching, hardware acceleration, FPGA, multithreaded.
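The abstract states two requirements for a short read aligner: report every occurrence of a read, and let the user bound the number of mismatches. As a rough illustration of that contract only, the following is a minimal brute-force sketch in Python. It is not FHAST, Bowtie, or BWA, which use indexed search; the function name `align_read` and its parameters are hypothetical.

```python
def align_read(read, reference, max_mismatches=0):
    """Return (position, mismatches) for every alignment within the mismatch budget.

    Brute-force scan over all windows of the reference; substitutions only,
    no insertions or deletions.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for r, g in zip(read, reference[pos:pos + len(read)]):
            if r != g:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # budget exceeded, abandon this window early
        else:
            # Loop finished without a break: window is within the budget.
            hits.append((pos, mismatches))
    return hits


reference = "ACGTACGTTT"
print(align_read("ACGT", reference))                    # exact occurrences
print(align_read("ACGA", reference, max_mismatches=1))  # one substitution allowed
```

The scan costs time proportional to the reference length times the read length for each read, which hints at why runs with tens of millions of reads against a multi-billion-character genome motivate indexing and hardware acceleration.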

CiteSeerX