1,590 research outputs found

    A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment

    Get PDF

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconïŹgurable computing has grown from a concept to a mature ïŹeld of science and technology. The cornerstone of this evolution is the ïŹeld programmable gate array, a building block enabling the conïŹguration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural reïŹnements

    The use of field-programmable gate arrays for the hardware acceleration of design automation tasks

    Get PDF
    This paper investigates the possibility of using Field-Programmable Gate Arrays (Fr’GAS) as reconfigurable co-processors for workstations to produce moderate speedups for most tasks in the design process, resulting in a worthwhile overall design process speedup at low cost and allowing algorithm upgrades with no hardware modification. The use of FPGAS as hardware accelerators is reviewed and then achievable speedups are predicted for logic simulation and VLSI design rule checking tasks for various FPGA co-processor arrangements

    Aggregation of Descriptive Regularization Methods with Hardware/Software Co-Design for Remote Sensing Imaging

    Get PDF
    This study consider the problem of high-resolution imaging of the remote sensing (RS) environment formalized in terms of a nonlinear ill- posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the wavefield scattered from an extended remotely sensed scene (referred to as the scene image). However, the remote sensing techniques for reconstructive imaging in many RS application areas are relatively unacceptable for being implemented in a (near) real time implementation. In this work, we address a new aggregated descriptive-regularization (DR) method and the Hardware/Software (HW/SW) co-design for the SSP reconstruction from the uncertain speckle-corrupted measurement data in a computationally efficient parallel fashion that meets the (near) real time image processing requirements. The hardware design is performed via efficient systolic arrays (SAs). Finally, the efficiency both in resolution enhancement and in computational complexity reduction metrics of the aggregated descriptive-regularized and the HW/SW co-design method is presented via numerical simulations and by the performance analysis of the implementation based on a Xilinx Field Programmable Gate Array (FPGA) XC4VSX35-10ff668.Universidad de GuadalajaraUniversidad AutĂłnoma de YucatĂĄnInstituto TecnolĂłgico de MĂ©rid

    Throughput-optimal systolic arrays from recurrence equations

    Get PDF
    Many compute-bound software kernels have seen order-of-magnitude speedups on special-purpose accelerators built on specialized architectures such as field-programmable gate arrays (FPGAs). These architectures are particularly good at implementing dynamic programming algorithms that can be expressed as systems of recurrence equations, which in turn can be realized as systolic array designs. To efficiently find good realizations of an algorithm for a given hardware platform, we pursue software tools that can search the space of possible parallel array designs to optimize various design criteria. Most existing design tools in this area produce a design that is latency-space optimal. However, we instead wish to target applications that operate on a large collection of small inputs, e.g. a database of biological sequences. For such applications, overall throughput rather than latency per input is the most important measure of performance. In this work, we introduce a new procedure to optimize throughput of a systolic array subject to resource constraints, in this case the area and bandwidth constraints of an FPGA device. We show that the throughput of an array is dependent on the maximum number of lattice points executed by any processor in the array, which to a close approximation is determined solely by the array’s projection vector. We describe a bounded search process to find throughput-optimal projection vectors and a tool to perform automated design space exploration, discovering a range of array designs that are optimal for inputs of different sizes. We apply our techniques to the Nussinov RNA folding algorithm to generate multiple mappings of this algorithm into systolic arrays. By combining our library of designs with run-time reconfiguration of an FPGA device to dynamically switch among them, we predict significant speedup over a single, latency-space optimal array
    • 

    corecore