1,970 research outputs found

    High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

    This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance per watt criterion and perform better than all other platforms on the performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on the performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
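
For readers unfamiliar with the algorithm named in this abstract, the following is a minimal software sketch of Smith-Waterman local alignment scoring. It is not the implementation evaluated in the paper: the function name `smith_waterman_score`, the scoring values (match 2, mismatch -1) and the linear gap penalty are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme (an assumption, not taken from the paper):
    a linear gap penalty and fixed match/mismatch scores.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal can be computed independently, which is the property that parallel hardware implementations of this algorithm typically exploit.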

    ParaFPGA15 : exploring threads and trends in programmable hardware

    The symposium ParaFPGA focuses on parallel techniques using FPGAs as accelerators in high performance computing. The green computing aspects of low power consumption at high performance were somewhat tempered by long design cycles and hard programmability issues. However, in recent years FPGAs have become new contenders as versatile compute accelerators because of a growing market interest, extended application domains and maturing high-level synthesis tools. The keynote paper highlights the historical and modern approaches to high-level FPGA and the contributions cover applications such as NP-complete satisfiability problems and convex hull image processing as well as performance evaluation, partial reconfiguration and systematic design exploration.

    Weighing up the new kid on the block: Impressions of using Vitis for HPC software development

    The use of reconfigurable computing, and FPGAs in particular, has strong potential in the field of High Performance Computing (HPC). However, the traditionally high barrier to entry when it comes to programming this technology has, until now, precluded widespread adoption. To popularise reconfigurable computing with communities such as HPC, Xilinx have recently released the first version of Vitis, a platform aimed at making the programming of FPGAs much more a question of software development rather than hardware design. However, a key question is how well this technology fulfils the aim, and whether the tooling is mature enough that software developers using FPGAs to accelerate their codes is now a more realistic proposition, or whether it simply increases the convenience for existing experts. To examine this question we use the Himeno benchmark as a vehicle for exploring the Vitis platform for building, executing and optimising HPC codes, describing the different steps and potential pitfalls of the technology. The outcome of this exploration is a demonstration that, whilst Vitis is an excellent step forwards and significantly lowers the barrier to entry in developing codes for FPGAs, it is not a silver bullet and an underlying understanding of dataflow style algorithmic design and appreciation of the architecture is still key to obtaining good performance on reconfigurable architectures.
    Comment: Pre-print of Weighing up the new kid on the block: Impressions of using Vitis for HPC software development, paper in 30th International Conference on Field Programmable Logic and Application

    It's all about data movement: Optimising FPGA data access to boost performance

    The use of reconfigurable computing, and FPGAs in particular, to accelerate computational kernels has the potential to be of great benefit to scientific codes and the HPC community in general. However, whilst recent advances in FPGA tooling have made the physical act of programming reconfigurable architectures much more accessible, in order to gain good performance the entire algorithm must be rethought and recast in a dataflow style. Reducing the cost of data movement for all computing devices is critically important, and in this paper we explore the most appropriate techniques for FPGAs. We do this by describing the optimisation of an existing FPGA implementation of an atmospheric model's advection scheme. By taking an FPGA code that was over four times slower than running on the CPU, mainly due to data movement overhead, we describe the profiling and optimisation strategies adopted to significantly reduce the runtime and bring the performance of our FPGA kernels to a much more practical level for real-world use. The result of this work is a set of techniques, steps, and lessons learnt that we have found significantly improve the performance of FPGA based HPC codes and that others can adopt in their own codes to achieve similar results.
    Comment: Preprint of article in 2019 IEEE/ACM International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC

    Application specific dataflow machine construction for programming FPGAs via Lucent

    Field Programmable Gate Arrays (FPGAs) have the potential to accelerate specific HPC codes. However, even with the advent of High Level Synthesis (HLS), which enables FPGA programmers to write code in C or C++, programming such devices still requires considerable expertise. Much of this is due to the fact that these architectures are founded on dataflow rather than the Von Neumann abstraction of CPUs or GPUs. Thus programming FPGAs via imperative languages is not optimal and can result in very significant performance differences between the first and final versions of algorithms on dataflow architectures, with the steps in between often not obvious and requiring considerable expertise. In this position paper we argue that languages built upon dataflow principles should be exploited to enable fast by construction codes for FPGAs, and this is akin to the programmer adopting the abstraction of developing a bespoke dataflow machine specialised for their application. It is our belief that much can be learnt from the generation of dataflow languages that gained popularity in the 1970s and 1980s around programming general purpose dataflow machines, and we introduce Lucent, which is a modern derivative of Lucid, and used as a vehicle to explore this hypothesis. The idea behind Lucent is to provide high programmer productivity and performance for FPGAs by giving developers the most suitable language level abstractions. The focus of Lucent is very much to support the acceleration of HPC kernels, rather than the embedded electronics and circuit level, and we provide a brief overview of the language driven by examples.
    Comment: Accepted at the LATTE (Languages, Tools, and Techniques for Accelerator Design) ASPLOS worksho

    Proxy Circuits for Fault-Tolerant Primitive Interfacing in Reconfigurable Devices Targeting Extreme Environments

    Continuous interface access to device-level primitives in reconfigurable devices in extreme environments is key to reliable operation. However, it is possible for a primitive's interface controller, which is static, to be rendered non-operational by permanent damage in the controller's circuitry. In order to mitigate this, this paper proposes the use of relocatable proxy circuits to provide remote interfacing capability to primitives from anywhere on a reconfigurable device. A demonstration with a device register read controller shows that an improvement in fault-tolerance can be achieved.

    Personal area technologies for internetworked services

    Accelerating advection for atmospheric modelling on Xilinx and Intel FPGAs

    Reconfigurable architectures, such as FPGAs, enable the execution of code at the electronics level, avoiding the assumptions imposed by the general purpose black-box micro-architectures of CPUs and GPUs. Such tailored execution can result in increased performance and power efficiency, and as the HPC community moves towards exascale an important question is the role such hardware technologies can play in future supercomputers. In this paper we explore the porting of the PW advection kernel, an important code component used in a variety of atmospheric simulations and accounting for around 40% of the runtime of the popular Met Office NERC Cloud model (MONC). Building upon previous work which ported this kernel to an older generation of Xilinx FPGA, we target latest generation Xilinx Alveo U280 and Intel Stratix 10 FPGAs. Exploring the development of a dataflow design which is performance portable between vendors, we then describe implementation differences between the tool chains and compare kernel performance between FPGA hardware. This is followed by a more general performance comparison, scaling up the number of kernels on the Xilinx Alveo and Intel Stratix 10, against a 24 core Xeon Platinum Cascade Lake CPU and NVIDIA Tesla V100 GPU. When overlapping the transfer of data to and from the boards with compute, the FPGA solutions considerably outperform the CPU and, whilst falling short of the GPU in terms of performance, demonstrate power usage benefits, with the Alveo being especially power efficient. The result of this work is a comparison and set of design techniques that apply both to this specific atmospheric advection kernel on Xilinx and Intel FPGAs, and that are also of interest more widely when looking to accelerate HPC codes on a variety of reconfigurable architectures.
    Comment: Preprint of article in the IEEE Cluster FPGA for HPC Workshop 2021 (HPC FPGA 2021