71 research outputs found

    High performance computing with FPGAs

    Field-programmable gate arrays (FPGAs) represent an army of logical units that can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges in finding the right processing paradigm, one that takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper, the use of an FPGA as a multiprocessor on a chip is first compared with its use as a highly functional coprocessor, and the programming tools for hardware/software codesign are discussed. Next, a number of techniques are presented to maximize parallelism and optimize data locality in nested loops. These include unimodular transformations, data-locality-improving loop transformations and the use of smart buffers. Finally, the use of these techniques is demonstrated on a number of examples. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.
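
    As a rough illustration of what a unimodular loop transformation does, the sketch below reorders the iteration points of a rectangular two-level loop nest by an integer matrix with determinant +1 or -1. The helper names (`det2`, `transformed_order`) and the two example matrices are assumptions made for this example and are not taken from the paper; the sketch also does not check whether a transformation is legal with respect to data dependences.

```python
from itertools import product

# Example 2x2 integer matrices, chosen for illustration only.
INTERCHANGE = [[0, 1], [1, 0]]  # swaps the roles of the i and j loops
SKEW = [[1, 1], [0, 1]]         # outer loop walks anti-diagonals (wavefront)


def det2(t):
    """Determinant of a 2x2 matrix given as nested lists."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


def transformed_order(n, m, t):
    """Visit the points (i, j) of an n-by-m loop nest in the order given
    by lexicographically increasing transformed vectors t * (i, j).

    A unimodular t (|det| == 1) maps integer points one-to-one onto
    integer points, so every iteration still runs exactly once.
    """
    if abs(det2(t)) != 1:
        raise ValueError("matrix is not unimodular")
    points = product(range(n), range(m))
    return sorted(
        points,
        key=lambda p: (
            t[0][0] * p[0] + t[0][1] * p[1],
            t[1][0] * p[0] + t[1][1] * p[1],
        ),
    )


if __name__ == "__main__":
    # Interchange turns a row-by-row traversal into column-by-column.
    print(transformed_order(2, 3, INTERCHANGE))
```

    Interchange changes which array dimension is traversed in the inner loop, which is one way such a transformation can affect data locality; the skew matrix groups iterations along anti-diagonals.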

    An empirical evaluation of High-Level Synthesis languages and tools for database acceleration

    High-Level Synthesis (HLS) languages and tools are emerging as the most promising technique to make FPGAs more accessible to software developers. Nevertheless, picking the most suitable HLS for a certain class of algorithms depends on requirements such as area and throughput, as well as on programmer experience. In this paper, we explore the different trade-offs present when using a representative set of HLS tools in the context of Database Management Systems (DBMS) acceleration. More specifically, we conduct an empirical analysis of four representative frameworks (Bluespec SystemVerilog, Altera OpenCL, LegUp and Chisel) that we utilize to accelerate commonly used database algorithms such as sorting, the median operator, and hash joins. Through our implementation experience and empirical results for database acceleration, we conclude that the selection of the most suitable HLS depends on a set of orthogonal characteristics, which we highlight for each HLS framework. Peer reviewed. Postprint (author's final draft).

    A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment


    A framework for automatically generating optimized digital designs from C-language loops

    Reconfigurable computing has the potential for providing significant performance increases to a number of computing applications. However, realizing these benefits requires digital design experience and knowledge of hardware description languages (HDLs). While a number of tools have focused on translation of high-level languages (HLLs) to HDLs, the tools do not always create optimized digital designs that are competitive with hand-coded solutions. This work describes an automatic optimization in the C-to-HDL transformation that reorganizes operations between pipeline stages in order to reduce critical path lengths. The effects of this optimization are examined on the MD5, SHA-1, and Smith-Waterman algorithms. Results show that the optimization results in performance gains of 13%-37% and that the automatically-generated implementations perform comparably to hand-coded implementations

    Run-time reconfigurable acceleration for genetic programming fitness evaluation in trading strategies

    Genetic programming can be used to identify complex patterns in financial markets which may lead to more advanced trading strategies. However, the computationally intensive nature of genetic programming makes it difficult to apply to real world problems, particularly in real-time constrained scenarios. In this work we propose the use of Field Programmable Gate Array technology to accelerate the fitness evaluation step, one of the most computationally demanding operations in genetic programming. We propose to develop a fully-pipelined, mixed precision design using run-time reconfiguration to accelerate fitness evaluation. We show that run-time reconfiguration can reduce resource consumption by a factor of 2 compared to previous solutions on certain configurations. The proposed design is up to 22 times faster than an optimised, multithreaded software implementation while achieving comparable financial returns

    Intelligent systems engineering with reconfigurable computing

    Intelligent computing systems comprising microprocessor cores, memory and reconfigurable user-programmable logic represent a promising technology that is well suited to applications such as digital signal and image processing, cryptography and encryption. These applications frequently employ recursive algorithms, which are particularly appropriate when the underlying problem is defined in recursive terms and is difficult to reformulate as an iterative procedure. It is known, however, that hardware description languages (such as VHDL) and system-level specification languages (such as Handel-C), which are usually employed to specify the required functionality of reconfigurable systems, do not provide direct support for recursion. In this paper, a method is proposed that allows recursive algorithms to be easily described in Handel-C and implemented in an FPGA (field-programmable gate array). The recursive search algorithm for the knapsack problem is considered as an example. Applications in Artificial Intelligence - Knowledge Engineering. Red de Universidades con Carreras en Informática (RedUNCI).
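
    The abstract does not say how the proposed method works. One common way to run a recursive search in a language without recursion is to replace the call stack with an explicit stack of pending states, and the sketch below shows that idea for the 0/1 knapsack problem. It is an assumption chosen for illustration, not the paper's method, and the function names are hypothetical.

```python
def knapsack_recursive(weights, values, capacity, i=0):
    """Reference version: plain recursive search over take/skip choices."""
    if i == len(weights):
        return 0
    best = knapsack_recursive(weights, values, capacity, i + 1)  # skip item i
    if weights[i] <= capacity:  # take item i if it fits
        taken = values[i] + knapsack_recursive(
            weights, values, capacity - weights[i], i + 1
        )
        best = max(best, taken)
    return best


def knapsack_stack(weights, values, capacity):
    """Same search with an explicit stack instead of recursive calls.

    Each stack entry is (next item index, remaining capacity, value so far),
    which is the state a recursive call would otherwise keep in its frame.
    """
    best = 0
    stack = [(0, capacity, 0)]
    while stack:
        i, cap, val = stack.pop()
        if i == len(weights):
            best = max(best, val)  # reached a leaf of the search tree
            continue
        stack.append((i + 1, cap, val))  # branch: skip item i
        if weights[i] <= cap:
            stack.append((i + 1, cap - weights[i], val + values[i]))  # take it
    return best


if __name__ == "__main__":
    print(knapsack_stack([2, 3, 4, 5], [3, 4, 5, 6], 5))
```

    In hardware, the explicit stack would typically map to a bounded memory, so the maximum search depth (here, the number of items) has to be known in advance.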

    Hardware acceleration of the trace transform for vision applications

    Computer vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images, as pixels and bits, serve no purpose of their own. It is only by interpreting the data and extracting higher-level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility that suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required for a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.
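
    For readers unfamiliar with the filters named above, the following is a plain software reference for a windowed median and a weighted median over a one-dimensional signal. It only illustrates what the filters compute; the thesis's hardware architectures are not described in the abstract, and the function names and the edge handling (only fully covered window positions are returned) are assumptions made for this sketch.

```python
def sliding_median(samples, window):
    """Median of each full window of odd length `window` over `samples`.

    Only positions where the window fits entirely are returned, so the
    result has len(samples) - window + 1 entries (or none if too short).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        sorted(samples[c - half : c + half + 1])[half]
        for c in range(half, len(samples) - half)
    ]


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total.

    Weights are assumed to be non-negative numbers, one per value.
    """
    total = sum(weights)
    running = 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if 2 * running >= total:
            return value
    raise ValueError("weighted_median requires at least one value")


if __name__ == "__main__":
    print(sliding_median([1, 9, 2, 8, 3], 3))
    print(weighted_median([1, 2, 3], [1, 1, 5]))
```

    Sorting every window is the simplest formulation and costs more as the window grows, which hints at why large windows call for dedicated architectures.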

    Automated Generating of Processing Elements for FPGA

    Some information processing applications, such as computer network monitoring, need to continuously process rapidly incoming data. As the speed of the incoming data increases, it is desirable to perform the processing in hardware. This work proposes a configuration system that generates a VHDL specification of a hardware data processing circuit based on a user-provided definition of data and computation operations. The system focuses on network traffic monitoring in multi-gigabit computer networks.