4,625 research outputs found

    A Reconfigurable Vector Instruction Processor for Accelerating a Convection Parametrization Model on FPGAs

    Full text link
    High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive alternative to GPGPUs for use as co-processors, but they are still far from being mainstream due to a number of challenges faced when using FPGA-based platforms. Our research aims to make FPGA-based high performance computing more accessible to the scientific community. In this work we present the results of investigating the acceleration of a particular atmospheric model, Flexpart, on FPGAs. We focus on accelerating the most computationally intensive kernel from this model. The key contribution of our work is the architectural exploration we undertook to arrive at a solution that best exploits the parallelism available in the legacy code, and is also convenient to program, so that eventually the compilation of high-level legacy code to our architecture can be fully automated. We present the three different types of architecture, comparing their resource utilization and performance, and propose that an architecture where there are a number of computational cores, each built along the lines of a vector instruction processor, works best in this particular scenario, and is a promising candidate for a generic FPGA-based platform for scientific computation. We also present the results of experiments done with various configuration parameters of the proposed architecture, to show its utility in adapting to a range of scientific applications.Comment: This is an extended pre-print version of work that was presented at the international symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART2014), Sendai, Japan, June 911, 201

    A general framework for efficient FPGA implementation of matrix product

    Get PDF
    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

    A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    Get PDF
    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphic Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the one on an Intel Sandybridge 8-core CPU by a factor of 3.4

    Direct NN-body code on low-power embedded ARM GPUs

    Full text link
    This work arises on the environment of the ExaNeSt project aiming at design and development of an exascale ready supercomputer with low energy consumption profile but able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely-packed low-power 64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are heterogeneous architecture where computing power is supplied both by CPUs and GPUs, and are emerging as a possible low-power and low-cost alternative to clusters based on traditional CPUs. A state-of-the-art direct NN-body code suitable for astrophysical simulations has been re-engineered in order to exploit SoC heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests show that embedded GPUs can be effectively used to accelerate real-life scientific calculations, and that are promising also because of their energy efficiency, which is a crucial design in future exascale platforms.Comment: 16 pages, 7 figures, 1 table, accepted for publication in the Computing Conference 2019 proceeding
    • …
    corecore