51 research outputs found

    Smith-Waterman Protein Search with OpenCL on an FPGA

    Get PDF
    The well-known Smith-Waterman (SW) algorithm is a high-sensitivity method for local alignments. Unfortunately, SW is expensive in terms of both execution time and memory usage, which makes it impractical in many scenarios. Previous research has shown that massively parallel architectures such as GPUs and FPGAs are able to mitigate the computational problems and achieve impressive speedups. In this paper we explore SW acceleration on an FPGA with OpenCL. We efficiently exploit data and thread-level parallelism on an Altera Stratix V FPGA, obtaining up to 39 GCUPS with less than 25 watt of power consumption.Facultad de Informátic

    Smith-Waterman Protein Search with OpenCL on an FPGA

    Get PDF
    The well-known Smith-Waterman (SW) algorithm is a high-sensitivity method for local alignments. Unfortunately, SW is expensive in terms of both execution time and memory usage, which makes it impractical in many scenarios. Previous research has shown that massively parallel architectures such as GPUs and FPGAs are able to mitigate the computational problems and achieve impressive speedups. In this paper we explore SW acceleration on an FPGA with OpenCL. We efficiently exploit data and thread-level parallelism on an Altera Stratix V FPGA, obtaining up to 39 GCUPS with less than 25 watt of power consumption.Facultad de Informátic

    OSWALD: OpenCL Smith–Waterman on Altera’s FPGA for Large Protein Databases

    Get PDF
    The well-known Smith–Waterman algorithm is a high-sensitivity method for local sequence alignment. Unfortunately, the Smith–Waterman algorithm has quadratic time complexity, which makes it computationally demanding for large protein databases. In this paper, we present OSWALD, a portable, fully functional and general implementation to accelerate Smith–Waterman database searches in heterogeneous platforms based on Altera’s FPGA. OSWALD exploits OpenMP multithreading and SIMD computing through SSE and AVX2 extensions on the host while taking advantage of pipeline and vectorial parallelism by way of OpenCL on the FPGAs. Performance evaluations on two different heterogeneous architectures with real amino acid datasets show that OSWALD is competitive in comparison with other top-performing Smith–Waterman implementations, attaining up to 442 GCUPS peak with the best GCUPS/watts ratio.First published June 30, 2016. Article available in: Vol. 32, Issue 3, 2018.Facultad de Informátic

    OSWALD: OpenCL Smith–Waterman on Altera’s FPGA for Large Protein Databases

    Get PDF
    The well-known Smith–Waterman algorithm is a high-sensitivity method for local sequence alignment. Unfortunately, the Smith–Waterman algorithm has quadratic time complexity, which makes it computationally demanding for large protein databases. In this paper, we present OSWALD, a portable, fully functional and general implementation to accelerate Smith–Waterman database searches in heterogeneous platforms based on Altera’s FPGA. OSWALD exploits OpenMP multithreading and SIMD computing through SSE and AVX2 extensions on the host while taking advantage of pipeline and vectorial parallelism by way of OpenCL on the FPGAs. Performance evaluations on two different heterogeneous architectures with real amino acid datasets show that OSWALD is competitive in comparison with other top-performing Smith–Waterman implementations, attaining up to 442 GCUPS peak with the best GCUPS/watts ratio.First published June 30, 2016. Article available in: Vol. 32, Issue 3, 2018.Facultad de Informátic

    State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms

    Get PDF
    Searching biological sequence database is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding and the situation gets worse due to the exponential growth of biological data in the last years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches in a wide variety of hardware platforms. We give a survey of the state-of-the-art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse temporal evolution, contributions, limitations and experimental work and the results of each implementation. Additionally, as energy efficiency is becoming more important every day, we also survey performance/power consumption works. Finally, we give our view on the future of Smith–Waterman protein searches considering next generations of hardware architectures and its upcoming technologies.Instituto de Investigación en InformáticaUniversidad Complutense de Madri

    High performance reconfigurable architectures for biological sequence alignment

    Get PDF
    Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignments. It is a fundamental tool in molecular biology in searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on the quality aspects of life such as facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and drug engineering. With the vast growth of genomic data, searching for a sequence homology over huge databases (often measured in gigabytes) is unable to produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the human genome project (HGP), supercomputers and other parallel architectures such as the special purpose Very Large Scale Integration (VLSI) chip, Graphic Processing Unit (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-off between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low level programming model as compared with off-the-shelf microprocessors such as standard microprocessors and GPUs. Due to the aforementioned limitations, the need has arisen for optimized FPGA core implementations which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely-used sequence alignment algorithms; the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are firstly that the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted into the hardware architectures. Here, when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bound PE configuration time and the parallel PE configuration approach irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) takes out advantages of the area and lithography technology of any FPGA resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up improvements, comparisons were made on performance of the designed cores against their corresponding software and the reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x as compared to the SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core achieved speed-up of 103x and 8.3x against the HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. In terms of comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the gapped BLAST with the two-hit method, the designed core achieved 11x speed-up after taking out advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performances; it was noted that, the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation with advantages of smaller area footprint as well as represent economic ‘green’ solution compared to the other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort

    Challenges and Solutions to Next-Generation Single-Photon Imagers

    Get PDF
    Detecting and counting single photons is useful in an increasingly large number of applications. Most applications require large formats, approaching and even far exceeding 1 megapixel. In this thesis, we look at the challenges of massively parallel photon-counting cameras from all performance angles. The thesis deals with a number of performance issues that emerge when the number of pixels exceeds about 1/4 of megapixels, proposing characterization techniques and solutions to mitigate performance degradation and non-uniformity. Two cameras were created to validate the proposed techniques. The first camera, SwissSPAD, comprises an array of 512 x 128 SPAD pixels, each with a one-bit memory and a gating mechanism to achieve 5ns high precision time windows with high uniformity across the array. With a massively parallel readout of over 10 Gigabit/s and positioning of the integration time window accurate to the pico-second range, fluorescence lifetime imaging and fluorescence correlation spectroscopy imaging achieve a speedup of several orders of magnitude while ensuring high precision in the measurements. Other possible applications include wide-field time-of-flight imaging and the generation of quantum random numbers at highest bit-rates. Lately super-resolution microscopy techniques have also used SwissSPAD. The second camera, LinoSPAD, takes the concepts of SwissSPAD one step further by moving even more 'intelligence' to the FPGA and reducing the sensor complexity to the bare minimum. This allows focusing the optimization of the sensor on the most important metrics of photon efficiency and fill factor. As such, the sensor consists of one line of SPADs that have a direct connection each to the FPGA where complex photon processing algorithms can be implemented. As a demonstration of the capabilities of current lowcost FPGAs we implemented an array of time-to-digital converters that can handle up to 8.5 billion photons per second, measuring each one of them and accounting them in high precision histograms. Using simple laser diodes and a circuit to generate light pulses in the picosecond range, we demonstrate a ubiquitous 3D time-of-flight sensor. The thesis intends to be a first step towards achieving the world's first megapixel SPAD camera, which, we believe, is in grasp thanks to the architectural and circuital techniques proposed in this thesis. In addition, we believe that the applications proposed in this thesis offer a wide variety of uses of the sensors presented in this thesis and in future ones to come
    • …
    corecore