18 research outputs found

    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Get PDF
    Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows. 1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy. 2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software. 3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces

    An FPGA-based Convolution IP Core for Deep Neural Networks Acceleration

    Get PDF
    The development of machine learning has made a revolution in various applications such as object detection, image/video recognition, and semantic segmentation. Neural networks, a class of machine learning, play a crucial role in this process because of their remarkable improvement over traditional algorithms. However, neural networks are now going deeper and cost a significant amount of computation operations. Therefore they usually work ineffectively in edge devices that have limited resources and low performance. In this paper, we research a solution to accelerate the neural network inference phase using FPGA-based platforms. We analyze neural network models, their mathematical operations, and the inference phase in various platforms. We also profile the characteristics that affect the performance of neural network inference. Based on the analysis, we propose an architecture to accelerate the convolution operation used in most neural networks and takes up most of the computations in networks in terms of parallelism, data reuse, and memory management. We conduct different experiments to validate the FPGA-based convolution core architecture as well as to compare performance. Experimental results show that the core is platform-independent. The core outperforms a quad-core ARM processor functioning at 1.2 GHz and a 6-core Intel CPU with speed-ups of up to 15.69× and 2.78×, respectivel

    Hardware / Software System for Portable and Low-Cost Genome Assembly

    Full text link
    “The enjoyment of the highest attainable standard of health is one of the fundamental rights of every human being without distinction of race, religion, political belief, economic or social condition” [56]. Genomics (the study of the entire DNA) provides such a standard of health for people with rare diseases and helps control the spread of pandemics. Still, millions of human beings are unable to access genomics due to its cost, and portability. In genomics, DNA sequencers digitise DNA information, and computers analyse the digitised information. We have desktop and thumb-sized DNA sequencers, that digitise the DNA data rapidly. But computations necessary for the analysis of this data are inevitably performed on high-performance computers (HPCs) and cloud computers. These computations not only require powerful computers but also necessitate high-speed networks since the data generated are in the hundreds of gigabytes. Relying on HPCs and high-speed networks, deny the benefits that can be reaped by genomics for the masses who live in remote areas and in poorer nations. Developing a low-cost and portable genomics computation platform would provide personalised treatment based on an individual’s DNA and identify the source of the fast-spreading epidemics in remote areas and areas without HPC or network infrastructure. But developing a low-cost and portable genome analysing computing platform is a challenging task. This thesis develops novel computer architecture solutions to assemble the whole human DNA and COVID-19 virus RNA on a low-cost and portable platform. The first phase of the solution describes a ring-pipelined processor architecture for a key genome assembly algorithm. The human genome is partitioned to fit into the small memory footprint of embedded processors. These techniques allow an entire human genome to be assembled using highly portable and low-cost embedded processor cores. These processor cores can be housed within a single chip. Each processor was only 0.08 mm 2 and consumed just 37.5 mW. It has only 2 GB memory, 32-bit instruction width, and a clock with a 1 GHz frequency. The second phase of the solution describes how application-specific instruction-set processors can be sped up to execute a key genome assembly algorithm. A fully automated design system is presented, which improves the performance of large applications (such as genome assembly algorithm) and generates application-specific instructions for a commercial processor design tool (Xtensa). The tool enhances the base processor, which was used in the ring pipeline processor architecture. Thus, the alignment algorithms execute 2.1 times faster with only 11% additional hardware. The energy-delay product was reduced by 7.3× compared to the base processor. This tool is the only one of its type which can handle applications which are large. The third phase of the solution designs a portable low-cost genome assembly computer (PGA). PGA enhances the ring pipeline architecture with the customised processor found in phase two and with improved inter-processor communication. The results show that the COVID-19 virus RNA can be assembled in under 10 minutes and the whole human genome can be assembled in 11 days on a portable platform (HPC take around two days) for 30× coverage. PGA has an area footprint of just 5.68 mm 2 in a 28 nm technology node and is far smaller than a high-performance computer processor chip. The PGA consumes only 4W of power, which is lower than the power requirement of a high-performance processor chip. The manufacturing cost of the PGA also would be much cheaper than the high-performance system cost, when produced in volume. The developed solution can be powered by a USB port of a laptop. This thesis is the first of its type to show the design of a single-chip solution to be able to process a complex genomic problem. This thesis contributes to attaining one of the fundamental rights of every human being wherever they may live

    ALFALFA : fast and accurate mapping of long next generation sequencing reads

    Get PDF

    FPGA acceleration of DNA sequencing analysis and storage

    No full text
    In this work we explore how Field-Programmable Gate Arrays (FPGAs) can be used to alleviate the data processing bottlenecks in DNA sequencing. We focus our efforts on accelerating the FM-index, a data structure used to solve the computationally intensive string matching problems found in DNA sequencing analysis such as short read alignment. The main contributions of this work are: 1) We accelerate the FM-index using FPGAs and develop several novel methods for reducing the memory bottleneck of the search algorithm. These methods include customising the FM-index structure according to the memory architecture of the FPGA platform and minimising the number of memory accesses through both architectural and algorithmic optimisations. 2) We present a new approach for accelerating approximate string matching using the backtracking FM-index. This approach makes use of specialised approximate string matching modules and a run-time reconfigurable architecture in order to achieve both high sensitivity and high performance. 3) We extend the FM-index search algorithm for reference-based compression and accelerate it using FPGAs. This accelerated design is integrated into fastqZip and fastaZip, two new tools that we have developed for the fast and effective compression of sequence data stored in the FASTQ and FASTA formats respectively. We implement our designs on the Maxeler Max4 Platform and show that they are able to outperform state-of-the-art DNA sequencing analysis software. For instance, our hardware-accelerated compression tool for FASTQ data is able to achieve a higher compression ratio than the best performing tool, fastqz, whilst the average compression and decompression speeds are 25 and 43 times faster respectively.Open Acces

    Genome sequence alignment in processing-In-memory architectures

    Get PDF
    Finalmente, también realizamos un estudio experimental de varias arquitecturas con diferentes tecnologías de memoria (DDR y HBM) y núcleos de procesamiento de distintos tipos, explotando, en algunos casos, procesamiento en la memoria (PIM). La aplicación de referencia es Bowtie2, una aplicación completa para el alineamiento de secuencias en el genoma. La implementación y evaluación de estas arquitecturas se realiza utilizando un simulador arquitectural basado en gem5.La combinación de la aparición de un cuello de botella en el acceso a los datos y la creciente importancia de las aplicaciones de procesamiento intensivo de datos, muy limitadas por el sistema de memoria, crea un importante problema que debe ser abordado. Por ello, en esta tesis nos proponemos afrontar este problema e intentar reducir su efecto en la medida de lo posible. El principal objetivo de esta tesis es el diseño de nuevas soluciones arquitecturales y algorítmicas para superar el problema del cuello de botella conocido como memory-wall y mejorar el rendimiento de aplicaciones con gran uso de memoria que no son capaces de beneficiarse lo suficiente de las jerarquías de memoria actuales. Además, creemos que es esencial centrarse en la eficiencia energética, un factor cuya importancia crece cada día y uno de los factores más limitantes en la computación de alto rendimiento. Las principales contribuciones de esta tesis son: Primero, analizamos el comportamiento de aplicaciones con accesos de memoria aleatorios, que no aprovechan correctamente las nuevas arquitecturas de memoria con jerarquías cache profundas. Específicamente, analizamos la estructura de datos FM-index y un algoritmo de búsqueda de secuencias basado en esa estructura, ampliamente usado en el alineamiento de secuencias en el genoma. Después de este análisis y de obtener un conocimiento más detallado del cuello de botella de la memoria, proponemos una nueva versión de FM-index que permite reducir el consumo de ancho de banda de memoria, de forma que mejora significativamente el rendimiento computacional. Posteriormente, proponemos una nueva arquitectura energéticamente eficiente, basada en un cubo de memoria en 3D (3D-Stacked) al que añadimos unos núcleos sencillos de bajo consumo en su capa lógica. Esta arquitectura permite la ejecución cerca de los datos (near-data-processing
    corecore