268 research outputs found

    Fast and compact evolvable systolic arrays on dynamically reconfigurable FPGAs

    Get PDF
    Evolvable hardware may be considered as the result of a design methodology that employs an evolutionary algorithm to find an optimal solution to a given problem in the form of a digital circuit. Evolutionary algorithms typically require testing thousands of candidate solutions, taking long time to complete. It would be desirable to reduce this time to a few seconds for applications that require a fast adaptation to a problem. Also, it is important to consider architectures that may operate at high clock speeds in order to reach very speed-demanding situations. This paper presents an implementation on an FPGA of an evolvable hardware image filter based on a systolic array architecture that uses dynamic partial reconfiguration in order to change between different candidate solutions. The neighbor to neighbor connections of the array offer improved performance versus other approaches, like Cartesian Genetic Programming derived circuits. Time savings due to faster evaluation compensate the slower reconfiguration time compared with virtual reconfiguration approaches, but, at any rate, reconfiguration time has been improved also by reducing the elements to reconfigure to just the LUT contents of the configurable blocks. The techniques presented in this paper lead to circuits that may operate at up to 500 MHz (in a Virtex-5), filtering 500 megapixels per second, the processing element size of the array is reduced to 2 CLBs, and over 80000 evaluations per second in a multiplearray structure in an FPGA permit to obtain good quality filters in around 3 seconds of evolution time

    A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment

    Get PDF

    A scalable evolvable hardware processing array

    Get PDF
    Evolvable hardware (EH) is an interesting alternative to conventional digital circuit design, since autonomous generation of solutions for a given task permits self-adaptivity of the system to changing environments, and they present inherent fault tolerance when evolution is intrinsically performed. Systems based on FPGAs that use Dynamic and Partial Reconfiguration (DPR) for evolving the circuit are an example. Also, thanks to DPR, these systems can be provided with scalability, a feature that allows a system to change the number of allocated resources at run-time in order to vary some feature, such as performance. The combination of both aspects leads to scalable evolvable hardware (SEH), which changes in size as an extra degree of freedom when trying to achieve the optimal solution by means of evolution. The main contributions of this paper are an architecture of a scalable and evolvable hardware processing array system, some preliminary evolution strategies which take scalability into consideration, and to show in the experimental results the benefits of combined evolution and scalability. A digital image filtering application is used as use case

    Interleaving in Systolic-Arrays: a Throughput Breakthrough

    Get PDF
    In past years the most common way to improve computers performance was to increase the clock frequency. In recent years this approach suffered the limits of technology scaling, therefore computers architectures are shifting toward the direction of parallel computing to further improve circuits performance. Not only GPU based architectures are spreading in consideration, but also Systolic Arrays are particularly suited for certain classes of algorithms. An important point in favor of Systolic Arrays is that, due to the regularity of their circuit layout, they are appealing when applied to many emerging and very promising technologies, like Quantum-dot Cellular Automata and nanoarrays based on Silicon NanoWire or on Carbon nanotube Field Effect Transistors. In this work we present a systematic method to improve Systolic Arrays performance exploiting Pipelining and Input Data Interleaving. We tackle the problem from a theoretical point of view first, and then we apply it to both CMOS technology and emerging technologies. On CMOS we demonstrate that it is possible to vastly improve the overall throughput of the circuit. By applying this technique to emerging technologies we show that it is possible to overcome some of their limitations greatly improving the throughput, making a considerable step forward toward the post-CMOS era

    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Get PDF
    Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows. 1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy. 2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software. 3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces

    Hexarray: A Novel Self-Reconfigurable Hardware System

    Get PDF
    Evolvable hardware (EHW) is a powerful autonomous system for adapting and finding solutions within a changing environment. EHW consists of two main components: a reconfigurable hardware core and an evolutionary algorithm. The majority of prior research focuses on improving either the reconfigurable hardware or the evolutionary algorithm in place, but not both. Thus, current implementations suffer from being application oriented and having slow reconfiguration times, low efficiencies, and less routing flexibility. In this work, a novel evolvable hardware platform is proposed that combines a novel reconfigurable hardware core and a novel evolutionary algorithm. The proposed reconfigurable hardware core is a systolic array, which is called HexArray. HexArray was constructed using processing elements with a redesigned architecture, called HexCells, which provide routing flexibility and support for hybrid reconfiguration schemes. The improved evolutionary algorithm is a genome-aware genetic algorithm (GAGA) that accelerates evolution. Guided by a fitness function the GAGA utilizes context-aware genetic operators to evolve solutions. The operators are genome-aware constrained (GAC) selection, genome-aware mutation (GAM), and genome-aware crossover (GAX). The GAC selection operator improves parallelism and reduces the redundant evaluations. The GAM operator restricts the mutation to the part of the genome that affects the selected output. The GAX operator cascades, interleaves, or parallel-recombines genomes at the cell level to generate better genomes. These operators improve evolution while not limiting the algorithm from exploring all areas of a solution space. The system was implemented on a SoC that includes a programmable logic (i.e., field-programmable gate array) to realize the HexArray and a processing system to execute the GAGA. A computationally intensive application that evolves adaptive filters for image processing was chosen as a case study and used to conduct a set of experiments to prove the developed system robustness. Through an iterative process using the genetic operators and a fitness function, the EHW system configures and adapts itself to evolve fitter solutions. In a relatively short time (e.g., seconds), HexArray is able to evolve autonomously to the desired filter. By exploiting the routing flexibility in the HexArray architecture, the EHW has a simple yet effective mechanism to detect and tolerate faulty cells, which improves system reliability. Finally, a mechanism that accelerates the evolution process by hiding the reconfiguration time in an “evolve-while-reconfigure” process is presented. In this process, the GAGA utilizes the array routing flexibility to bypass cells that are being configured and evaluates several genomes in parallel

    High performance reconfigurable architectures for biological sequence alignment

    Get PDF
    Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignments. It is a fundamental tool in molecular biology in searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on the quality aspects of life such as facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and drug engineering. With the vast growth of genomic data, searching for a sequence homology over huge databases (often measured in gigabytes) is unable to produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the human genome project (HGP), supercomputers and other parallel architectures such as the special purpose Very Large Scale Integration (VLSI) chip, Graphic Processing Unit (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there are always trade-off between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low level programming model as compared with off-the-shelf microprocessors such as standard microprocessors and GPUs. Due to the aforementioned limitations, the need has arisen for optimized FPGA core implementations which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely-used sequence alignment algorithms; the Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The three novel aspects of this research are firstly that the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted into the hardware architectures. Here, when the alignment matrix computation task is overlapped with the PE configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bound PE configuration time and the parallel PE configuration approach irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) takes out advantages of the area and lithography technology of any FPGA resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up improvements, comparisons were made on performance of the designed cores against their corresponding software and the reported FPGA implementations. In the case of comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x as compared to the SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core achieved speed-up of 103x and 8.3x against the HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. In terms of comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the gapped BLAST with the two-hit method, the designed core achieved 11x speed-up after taking out advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performances; it was noted that, the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation with advantages of smaller area footprint as well as represent economic ‘green’ solution compared to the other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort
    • 

    corecore