FAssem : FPGA based Acceleration of De Novo Genome Assembly
Next-generation sequencing technologies produce large amounts of data at very low cost in the form of short reads of DNA fragments. These fragments overlap extensively, contain many repeats, and may also include sequencing errors. The assembly process merges these reads to reconstruct the original sequences. In recent years many software programs have been developed for this purpose, but all of them take a significant amount of time to execute. Velvet is a commonly used de novo assembly program. We propose a method to reduce the overall assembly time by pre-processing the short-read data on FPGAs and processing the output with Velvet. We show significant speed-ups with slight or no compromise in the quality of the assembled output.
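The idea of merging overlapping reads can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under stated assumptions, not FAssem's FPGA pre-processing and not Velvet's method (Velvet builds a de Bruijn graph of k-mers rather than greedily merging whole reads). The helper names `overlap` and `greedy_assemble` and the minimum-overlap parameter `min_len` are hypothetical. The sketch ignores sequencing errors, reverse complements, and reads contained inside other reads, and a greedy merge can mis-join repeated regions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least `min_len` characters long; otherwise 0."""
    start = 0
    while True:
        # Jump to the next place where b's first min_len characters occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether the rest of a, from here to its end, is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap
    until no overlap of at least `min_len` remains; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        # Append the non-overlapping tail of the second sequence to the first.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATTAGACC", "GACCTA", "CCTACGT"]))  # ['ATTAGACCTACGT']
```

Every candidate pair is re-scored on each merge, which is quadratic in the number of contigs per round. That cost is one illustration of why assembly is slow on large read sets.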
Genomic co-processor for long read assembly
Genomics data is transforming medicine and our understanding of life in fundamental ways; however, its growth is far outpacing Moore's Law. Third-generation sequencing technologies produce reads 100x longer than second-generation technologies and reveal a much broader mutation spectrum of disease and evolution. However, these technologies incur prohibitively high computational costs. To unlock the vast potential of exponentially growing genomics data, domain-specific acceleration is one of the few remaining approaches for continuing to scale compute performance and efficiency, since general-purpose architectures struggle to handle the huge amount of data needed for genome alignment. The aim of this project is to implement a genomic co-processor targeting HPC FPGAs, starting from the Darwin FPGA co-processor. The final objective is to simulate and implement the algorithms described by Darwin on Alveo boards, exploiting High Bandwidth Memory (HBM) to increase performance.
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis
Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures whose states and edges are assigned probabilities. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, uses these probabilities to optimize and compute similarity scores. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs.
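Scoring a sequence against an HMM can be sketched with the forward algorithm, which computes the probability that the model generates the sequence. Baum-Welch runs this forward pass, together with a matching backward pass, in every training iteration to obtain expected counts. This is a minimal sketch assuming a generic discrete HMM and a hypothetical two-state toy model. A real pHMM has match, insert, and silent delete states, which this sketch does not model.

```python
def forward(seq, states, start_p, trans_p, emit_p):
    """Return P(seq | model) for a discrete HMM using the forward algorithm."""
    # alpha[s] = P(first t+1 symbols emitted, hidden state at step t is s)
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    for sym in seq[1:]:
        # Each new value sums over every predecessor state, then emits sym.
        alpha = {
            s: emit_p[s][sym] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model with two emitting states over the alphabet {A, C}.
STATES = ("S1", "S2")
START = {"S1": 1.0, "S2": 0.0}
TRANS = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.5, "S2": 0.5}}
EMIT = {"S1": {"A": 0.5, "C": 0.5}, "S2": {"A": 0.25, "C": 0.75}}

# approximately 0.2625 (= 0.5 * 0.9 * 0.5 + 0.5 * 0.1 * 0.75)
print(forward("AC", STATES, START, TRANS, EMIT))
```

Each step touches every (predecessor, successor) pair, so the work grows with the square of the number of states times the sequence length. This is the cost that hardware acceleration targets.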
We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM tackles the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out negligible computations using a hardware-based filter, and 4) minimizing redundant computations.
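The idea of filtering out negligible computations can be imitated in software with beam-style pruning of the forward pass: states whose forward value is negligible are left out of the sums for the next column. This sketch assumes a pruning rule of dropping states below `threshold` times the column maximum. ApHMM's actual hardware filter criterion is not given in the abstract, and `forward_filtered`, `threshold`, and the toy model are hypothetical.

```python
def forward_filtered(seq, states, start_p, trans_p, emit_p, threshold=0.0):
    """Forward algorithm that leaves out predecessor states whose forward
    value is zero or below `threshold` times the current column maximum.
    Returns (approximate P(seq | model), number of skipped states)."""
    alpha = {s: start_p[s] * emit_p[s][seq[0]] for s in states}
    skipped = 0
    for sym in seq[1:]:
        peak = max(alpha.values())
        # Filter: only states carrying non-negligible mass feed the next column.
        active = [r for r in states if alpha[r] > 0 and alpha[r] >= threshold * peak]
        skipped += len(states) - len(active)
        alpha = {
            s: emit_p[s][sym] * sum(alpha[r] * trans_p[r][s] for r in active)
            for s in states
        }
    return sum(alpha.values()), skipped


# Hypothetical toy model with two emitting states over the alphabet {A, C}.
STATES = ("S1", "S2")
START = {"S1": 1.0, "S2": 0.0}
TRANS = {"S1": {"S1": 0.9, "S2": 0.1}, "S2": {"S1": 0.5, "S2": 0.5}}
EMIT = {"S1": {"A": 0.5, "C": 0.5}, "S2": {"A": 0.25, "C": 0.75}}

exact, _ = forward_filtered("ACCA", STATES, START, TRANS, EMIT, threshold=0.0)
approx, skipped = forward_filtered("ACCA", STATES, START, TRANS, EMIT, threshold=0.2)
print(exact, approx, skipped)
```

With `threshold=0.0`, only states that are exactly zero are skipped, so the result equals the unfiltered forward probability. Larger thresholds trade accuracy for fewer multiply-accumulate operations. Because every pruned term is non-negative, the filtered value never exceeds the exact one.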
ApHMM achieves speedups of 15.55x–260.03x, 1.83x–5.34x, and 27.97x over CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x–59.94x, 1.03x–1.75x, and 1.03x–1.95x, respectively, while improving their energy efficiency by 64.24x–115.46x, 1.75x, and 1.96x, respectively.
Comment: Accepted to ACM TAC
Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts
Genetic sequence alignment has always been a computational challenge in bioinformatics. Depending on the problem size, software-based aligners can take multiple CPU-days to process the sequence data, creating a bottleneck in the bioinformatics analysis flow. Reconfigurable accelerators can achieve high performance for such computations by providing massive parallelism, but at the expense of programming flexibility, and thus they have not been commensurately adopted by practitioners. This paper therefore provides a thorough survey of the proposed accelerators, categorizing them qualitatively by algorithm and speedup. A comprehensive comparison between works is also presented to guide biologists in selecting an accelerator and to provide insight into future research directions for FPGA scientists.
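The parallelism mentioned above is easy to see in Smith-Waterman local alignment, a dynamic-programming algorithm that is frequently accelerated on FPGAs (the abstract itself does not single out a specific algorithm). Every cell on an anti-diagonal (constant i + j) depends only on cells from the two previous anti-diagonals. A systolic array can therefore compute a whole anti-diagonal at once. The sketch below fills the matrix in that wavefront order, sequentially. The linear gap penalty and the scores (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of a and b (linear gap penalty),
    computing the DP matrix one anti-diagonal at a time."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # d = i + j; all cells with the same d are mutually independent, because
    # they read only from diagonals d - 1 and d - 2 (hardware could run them in parallel).
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match / mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best


print(smith_waterman_wavefront("AC", "AGC"))  # 3
```

In a systolic FPGA design, one processing element is typically assigned per query character, and scores stream through the array so that each clock cycle advances the wavefront by one anti-diagonal.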
FPGA acceleration of DNA sequencing analysis and storage
In this work we explore how Field-Programmable Gate Arrays (FPGAs) can be used to alleviate the data processing bottlenecks in DNA sequencing. We focus our efforts on accelerating the FM-index, a data structure used to solve the computationally intensive string matching problems found in DNA sequencing analysis such as short read alignment. The main contributions of this work are:
1) We accelerate the FM-index using FPGAs and develop several novel methods for reducing the memory bottleneck of the search algorithm. These methods include customising the FM-index structure according to the memory architecture of the FPGA platform and minimising the number of memory accesses through both architectural and algorithmic optimisations.
2) We present a new approach for accelerating approximate string matching using the backtracking FM-index. This approach makes use of specialised approximate string matching modules and a run-time reconfigurable architecture in order to achieve both high sensitivity and high performance.
3) We extend the FM-index search algorithm for reference-based compression and accelerate it using FPGAs. This accelerated design is integrated into fastqZip and fastaZip, two new tools that we have developed for the fast and effective compression of sequence data stored in the FASTQ and FASTA formats respectively.
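The FM-index search at the core of contributions 1) and 2) can be sketched in software. This minimal sketch assumes a naive suffix-array construction and a full, unsampled occurrence table, which is exactly the kind of memory-hungry layout that an FPGA design has to restructure. The function names are hypothetical. `fm_count` performs exact backward search, and `fm_approx_count` is a backtracking search that tolerates up to `k` mismatches (substitutions only, no insertions or deletions).

```python
def build_fm_index(text):
    """Build the C array and a full occurrence table for text + '$'.
    Uses naive suffix sorting, so it is suitable only for small examples."""
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is '$' when i == 0
    chars = sorted(set(t))  # '$' sorts before A, C, G, T
    C, total = {}, 0
    for c in chars:
        C[c] = total  # number of characters in t smaller than c
        total += t.count(c)
    # occ[c][k] = occurrences of c in bwt[:k]
    occ = {c: [0] * (len(bwt) + 1) for c in chars}
    for k, ch in enumerate(bwt):
        for c in chars:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return C, occ, len(t)


def fm_count(pattern, C, occ, n):
    """Exact backward search: number of occurrences of pattern in the text."""
    lo, hi = 0, n  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


def fm_approx_count(pattern, k, C, occ, n):
    """Backtracking backward search allowing up to k substitutions."""
    alphabet = [c for c in C if c != "$"]

    def search(pos, lo, hi, mismatches):
        if lo >= hi:
            return 0
        if pos < 0:
            return hi - lo
        total = 0
        for c in alphabet:  # branch on every base, charging 1 for a mismatch
            cost = 0 if c == pattern[pos] else 1
            if mismatches + cost <= k:
                total += search(pos - 1, C[c] + occ[c][lo], C[c] + occ[c][hi],
                                mismatches + cost)
        return total

    return search(len(pattern) - 1, 0, n, 0)


C, occ, n = build_fm_index("ACGTACGTAC")
print(fm_count("AC", C, occ, n), fm_approx_count("ACC", 1, C, occ, n))  # 3 2
```

Each backward-search step needs two occurrence-table lookups at data-dependent addresses. The backtracking variant multiplies those lookups by the number of branches explored, which is why memory layout and access count dominate performance.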
We implement our designs on the Maxeler Max4 Platform and show that they outperform state-of-the-art DNA sequencing analysis software. For instance, our hardware-accelerated compression tool for FASTQ data achieves a higher compression ratio than the best-performing tool, fastqz, whilst its average compression and decompression speeds are 25 and 43 times faster, respectively.