2,064 research outputs found

    A general framework for efficient FPGA implementation of matrix product

    Get PDF
    Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconļ¬gurable Array (CGRA) architectures accelerate the same inner loops that beneļ¬t from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efļ¬ciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ļ¬‚exibility, performance, and power-efļ¬ciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ļ¬ne-tuning of source code

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Get PDF
    Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented

    Programmable photonics : an opportunity for an accessible large-volume PIC ecosystem

    Get PDF
    We look at the opportunities presented by the new concepts of generic programmable photonic integrated circuits (PIC) to deploy photonics on a larger scale. Programmable PICs consist of waveguide meshes of tunable couplers and phase shifters that can be reconfigured in software to define diverse functions and arbitrary connectivity between the input and output ports. Off-the-shelf programmable PICs can dramatically shorten the development time and deployment costs of new photonic products, as they bypass the design-fabrication cycle of a custom PIC. These chips, which actually consist of an entire technology stack of photonics, electronics packaging and software, can potentially be manufactured cheaper and in larger volumes than application-specific PICs. We look into the technology requirements of these generic programmable PICs and discuss the economy of scale. Finally, we make a qualitative analysis of the possible application spaces where generic programmable PICs can play an enabling role, especially to companies who do not have an in-depth background in PIC technology

    Efficient Molecular Dynamics Simulation on Reconfigurable Models with MultiGrid Method

    Get PDF
    In the field of biology, MD simulations are continuously used to investigate biological studies. A Molecular Dynamics (MD) system is defined by the position and momentum of particles and their interactions. The dynamics of a system can be evaluated by an N-body problem and the simulation is continued until the energy reaches equilibrium. Thus, solving the dynamics numerically and evaluating the interaction is computationally expensive even for a small number of particles in the system. We are focusing on long-ranged interactions, since the calculation time is O(N^2) for an N particle system. In this dissertation, we are proposing two research directions for the MD simulation. First, we design a new variation of Multigrid (MG) algorithm called Multi-level charge assignment (MCA) that requires O(N) time for accurate and efficient calculation of the electrostatic forces. We apply MCA and back interpolation based on the structure of molecules to enhance the accuracy of the simulation. Our second research utilizes reconfigurable models to achieve fast calculation time. We have been working on exploiting two reconfigurable models. We design FPGA-based MD simulator implementing MCA method for Xilinx Virtex-IV. It performs about 10 to 100 times faster than software implementation depending on the simulation accuracy desired. We also design fast and scalable Reconfigurable mesh (R-Mesh) algorithms for MD simulations. This work demonstrates that the large scale biological studies can be simulated in close to real time. The R-Mesh algorithms we design highlight the feasibility of these models to evaluate potentials with faster calculation times. Specifically, we develop R-Mesh algorithms for both Direct method and Multigrid method. The Direct method evaluates exact potentials and forces, but requires O(N^2) calculation time for evaluating electrostatic forces on a general purpose processor. The MG method adopts an interpolation technique to reduce calculation time to O(N) for a given accuracy. However, our R-Mesh algorithms require only O(N) or O(logN) time complexity for the Direct method on N linear R-Mesh and NĀ”ĀæN R-Mesh, respectively and O(r)+O(logM) time complexity for the Multigrid method on an XĀ”ĀæYĀ”ĀæZ R-Mesh. r is N/M and M = XĀ”ĀæYĀ”ĀæZ is the number of finest grid points

    Network on chip architecture for multi-agent systems in FPGA

    Get PDF
    A system of interacting agents is, by definition, very demanding in terms of computational resources. Although multi-agent systems have been used to solve complex problems in many areas, it is usually very difficult to perform large-scale simulations in their targeted serial computing platforms. Reconfigurable hardware, in particular Field Programmable Gate Arrays (FPGA) devices, have been successfully used in High Performance Computing applications due to their inherent flexibility, data parallelism and algorithm acceleration capabilities. Indeed, reconfigurable hardware seems to be the next logical step in the agency paradigm, but only a few attempts have been successful in implementing multi-agent systems in these platforms. This paper discusses the problem of inter-agent communications in Field Programmable Gate Arrays. It proposes a Network-on-Chip in a hierarchical star topology to enable agentsā€™ transactions through message broadcasting using the Open Core Protocol, as an interface between hardware modules. A customizable router microarchitecture is described and a multi-agent system is created to simulate and analyse message exchanges in a generic heavy traffic load agent-based application. Experiments have shown a throughput of 1.6Gbps per port at 100 MHz without packet loss and seamless scalability characteristics
    • ā€¦
    corecore