17,940 research outputs found

    High performance computing with FPGAs

    Field-programmable gate arrays represent an army of logical units which can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges in finding the right processing paradigm, one which takes into account the natural constraints of FPGAs: clock frequency, memory footprint and communication bandwidth. In this paper, the use of FPGAs as a multiprocessor on a chip is first compared with their use as a highly functional coprocessor, and the programming tools for hardware/software codesign are discussed. Next, a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops. These include unimodular transformations, data-locality-improving loop transformations and the use of smart buffers. Finally, the use of these techniques is demonstrated on a number of examples. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.
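    To make the loop-transformation discussion concrete, the following sketch (plain C, not tied to any particular HLS tool; the matrix size N and tile size T are illustrative assumptions) shows how tiling a matrix-multiplication loop nest improves data locality, so that each working tile can be kept in on-chip memory or a smart buffer instead of being re-fetched from external memory.

```c
#include <stddef.h>

#define N 512          /* illustrative matrix dimension */
#define T 32           /* illustrative tile size, chosen to fit on-chip memory */

/* Naive loop nest: poor temporal locality; every iteration of the inner
 * loop streams a full column of B from external memory. */
void matmul_naive(const float A[N][N], const float B[N][N], float C[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < N; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = acc;
        }
}

/* Tiled (blocked) loop nest: each T x T tile of A, B and C is reused many
 * times while it is resident in fast local memory, reducing the external
 * bandwidth an FPGA implementation would need. */
void matmul_tiled(const float A[N][N], const float B[N][N], float C[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            C[i][j] = 0.0f;

    for (size_t ii = 0; ii < N; ii += T)
        for (size_t kk = 0; kk < N; kk += T)
            for (size_t jj = 0; jj < N; jj += T)
                for (size_t i = ii; i < ii + T; i++)
                    for (size_t k = kk; k < kk + T; k++)
                        for (size_t j = jj; j < jj + T; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

    The same nest could subsequently be unrolled or pipelined across the tile loops; the point here is only the locality-improving reordering of iterations.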

    ParaFPGA : parallel computing with flexible hardware

    ParaFPGA 2009 is a Mini-Symposium on parallel computing with field programmable gate arrays (FPGAs), held in conjunction with the ParCo conference on parallel computing. FPGAs make it possible to map an algorithm directly onto the hardware, optimize the architecture for parallel execution, and dynamically reconfigure the system between different phases of the computation. Compared to, e.g., Cell processors, GPGPUs (general-purpose GPUs) and other high-performance devices, FPGAs are considered flexible hardware in the sense that the building blocks of one or more FPGAs can be interconnected freely to build a highly parallel system. In this Mini-Symposium the following topics are addressed: clustering FPGAs, evolvable hardware using FPGAs, and fast dynamic reconfiguration.

    Performance evaluation of FPGA implementations of high-speed addition algorithms

    Driven by the excellent properties of FPGAs and the need for high-performance and flexible computing machines, interest in FPGA-based computing machines has increased dramatically. Fixed-point adders are essential building blocks of any computing system. In this work, various high-speed addition algorithms are implemented in FPGA devices and their performance is evaluated, with the objective of finding and developing the most appropriate addition algorithms for implementation in FPGAs and laying the groundwork for evaluating and constructing FPGA-based computing machines. The results demonstrate that the performance of adders built with the FPGA's dedicated carry logic combined with some other addition algorithms can be greatly improved, especially for larger adders.
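    As a reference for one of the algorithm families typically evaluated in such studies, the sketch below is a software model (in C; the 16-bit width and the test values are illustrative assumptions) of the carry-lookahead generate/propagate recurrence that high-speed adders, and an FPGA's dedicated carry chains, evaluate in parallel in hardware.

```c
#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Software model of a 16-bit carry-lookahead adder.  On an FPGA the carries
 * would be produced in parallel by the lookahead (or dedicated carry-chain)
 * logic; here the same generate/propagate recurrence is evaluated bit by bit
 * so the result can be checked against ordinary integer addition. */
static uint16_t cla_add16(uint16_t a, uint16_t b, int cin, int *cout)
{
    uint16_t g = a & b;       /* generate:  g_i = a_i AND b_i  */
    uint16_t p = a ^ b;       /* propagate: p_i = a_i XOR b_i  */
    int c = cin & 1;          /* c_0 = carry-in                */
    uint16_t sum = 0;

    for (int i = 0; i < 16; i++) {
        sum |= (uint16_t)((((p >> i) & 1) ^ c) << i);
        /* c_{i+1} = g_i OR (p_i AND c_i) -- the lookahead recurrence */
        c = ((g >> i) & 1) | (((p >> i) & 1) & c);
    }
    *cout = c;
    return sum;
}

int main(void)
{
    int cout;
    uint16_t s = cla_add16(0xFFFF, 0x0001, 0, &cout);
    assert(s == 0x0000 && cout == 1);   /* wraps around with carry-out set */
    printf("sum = 0x%04X, carry-out = %d\n", s, cout);
    return 0;
}
```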

    High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

    This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general-purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
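    For readers unfamiliar with the kernel being accelerated, the following minimal C sketch computes the Smith-Waterman local-alignment score with a linear gap penalty; the scoring parameters (match +2, mismatch -1, gap -2) and the linear rather than affine gap model are simplifying assumptions, not the exact configuration used in the study.

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Minimal Smith-Waterman local-alignment score with a linear gap penalty.
 * Only two rows of the dynamic-programming matrix are kept, which mirrors
 * the streaming order hardware implementations exploit. */
static int sw_score(const char *a, const char *b)
{
    const int MATCH = 2, MISMATCH = -1, GAP = -2;   /* illustrative scores */
    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof(int));
    int *curr = calloc(m + 1, sizeof(int));
    int best = 0;

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int up   = prev[j] + GAP;
            int left = curr[j - 1] + GAP;
            int h = diag;
            if (up > h)   h = up;
            if (left > h) h = left;
            if (h < 0)    h = 0;          /* local alignment: never negative */
            curr[j] = h;
            if (h > best) best = h;
        }
        int *tmp = prev; prev = curr; curr = tmp;
        memset(curr, 0, (m + 1) * sizeof(int));
    }
    free(prev);
    free(curr);
    return best;
}

int main(void)
{
    printf("score = %d\n", sw_score("ACACACTA", "AGCACACA"));
    return 0;
}
```

    The anti-dependence pattern of the recurrence is what allows FPGAs and GPUs to compute all cells on an anti-diagonal in parallel, which is the source of the speed-ups compared in the study.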

    ParaFPGA 2013: Harnessing Programs, Power and Performance in Parallel FPGA applications

    Future computing systems will require dedicated accelerators to achieve high performance. The mini-symposium ParaFPGA explores parallel computing with FPGAs as an interesting avenue to reduce the gap between the architecture and the application. Topics discussed are the power of functional and dataflow languages, the performance of high-level synthesis tools, the automatic creation of hardware multi-cores using C-slow retiming, dynamic power management to control energy consumption, real-time reconfiguration of streaming image processing filters, and memory-optimized event image segmentation.

    ParaFPGA 2011 : high performance computing with multiple FPGAs : design, methodology and applications

    ParaFPGA 2011 marks the third mini-symposium devoted to the methodology, design and implementation of parallel applications using FPGAs. The focus of the contributions is mainly on organizing parallel applications across multiple FPGAs. This includes experiences from building a supercomputer with FPGAs, automatic and dedicated balancing of different tasks on heterogeneous FPGA constellations, and designing optimal interconnects between collaborating FPGAs.

    High Performance Reconfigurable Computing for Linear Algebra: Design and Performance Analysis

    Field Programmable Gate Arrays (FPGAs) enable powerful performance acceleration for scientific computations because of their intrinsic parallelism, pipelining capability, and flexible architecture. This dissertation explores the computational power of FPGAs for an important scientific application: linear algebra. First, optimized linear algebra subroutines are presented, based on enhancements to both algorithms and hardware architectures. Compared to microprocessors, these routines achieve significant speedup. Second, computing with mixed-precision data on FPGAs is proposed for higher performance. Experimental analysis shows that mixed-precision algorithms on FPGAs can achieve the high performance of using lower-precision data while keeping higher-precision accuracy when solving systems of linear equations. Third, an execution time model is built for reconfigurable computers (RC), which plays an important role in performance analysis and optimal resource utilization of FPGAs. The accuracy and efficiency of parallel computing performance models often depend on mean-maximum computations. Despite significant prior work, there have been no adequate mathematical tools for this important calculation. This work presents an Effective Mean Maximum Approximation method which is more general, accurate, and efficient than previous methods. Together, these research results help address how to make linear algebra applications perform better on high performance reconfigurable computing architectures.
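    The mixed-precision idea can be illustrated with the classic iterative-refinement pattern: solve in low precision, then compute residuals and accumulate corrections in high precision. The C sketch below (a 3x3 diagonally dominant system solved by unpivoted Gaussian elimination, all values illustrative) is a minimal illustration of that pattern, not the dissertation's actual FPGA implementation.

```c
#include <stdio.h>
#include <math.h>

#define N 3   /* illustrative system size */

/* Solve A*x = b in single precision by Gaussian elimination without
 * pivoting.  The test matrix below is diagonally dominant, so skipping
 * pivoting is safe for this illustration. */
static void solve_single(float A[N][N], const float b[N], float x[N])
{
    float M[N][N], v[N];
    for (int i = 0; i < N; i++) {
        v[i] = b[i];
        for (int j = 0; j < N; j++) M[i][j] = A[i][j];
    }
    for (int k = 0; k < N; k++) {              /* forward elimination */
        for (int i = k + 1; i < N; i++) {
            float f = M[i][k] / M[k][k];
            for (int j = k; j < N; j++) M[i][j] -= f * M[k][j];
            v[i] -= f * v[k];
        }
    }
    for (int i = N - 1; i >= 0; i--) {         /* back substitution */
        float s = v[i];
        for (int j = i + 1; j < N; j++) s -= M[i][j] * x[j];
        x[i] = s / M[i][i];
    }
}

int main(void)
{
    /* Double-precision "truth" data; the low-precision solver only ever
     * sees a float copy of it. */
    double A[N][N] = {{4, 1, 0}, {1, 5, 2}, {0, 2, 6}};
    double b[N]    = {1, 2, 3};
    double x[N]    = {0, 0, 0};

    float Af[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) Af[i][j] = (float)A[i][j];

    for (int it = 0; it < 5; it++) {
        /* Residual r = b - A*x, accumulated in double precision. */
        double r[N], rnorm = 0.0;
        for (int i = 0; i < N; i++) {
            r[i] = b[i];
            for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
            rnorm += r[i] * r[i];
        }
        /* Correction d solved in cheap single precision. */
        float rf[N], df[N];
        for (int i = 0; i < N; i++) rf[i] = (float)r[i];
        solve_single(Af, rf, df);
        /* Update the solution in double precision. */
        for (int i = 0; i < N; i++) x[i] += (double)df[i];
        printf("iter %d  residual norm %.3e\n", it, sqrt(rnorm));
    }
    return 0;
}
```

    The low-precision solve does the bulk of the arithmetic and maps well onto narrow, fast FPGA datapaths, while the high-precision residual step restores the accuracy of a fully double-precision solution.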

    A Study of Reconfigurable Accelerators for Cloud Computing

    Due to the exponential increase in network traffic in data centers, thousands of servers interconnected with high-bandwidth switches are required. Field Programmable Gate Arrays (FPGAs) in the cloud ecosystem offer high performance and energy efficiency, making them active resources that are easy to program and reconfigure. This paper looks at FPGAs as reconfigurable accelerators for cloud computing and presents the main hardware accelerators that have been proposed for widely used cloud computing applications such as MapReduce, Spark, Memcached, and databases.