
    PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems

    We have developed PGPG (Pipeline Generator for Programmable GRAPE), software that generates the low-level design of the pipeline processor and the communication software for FPGA-based computing engines (FBCEs). An FBCE typically consists of one or more FPGA (Field-Programmable Gate Array) chips and local memory. Here, "Field-Programmable" means that the logic implemented on the chip can be rewritten after the hardware is completed, so a single FBCE can be used to calculate various functions, for example as a pipeline processor for gravity, SPH interaction, or image processing. The main problem with FBCEs is that the user needs to develop the detailed hardware design for the processor to be implemented on the FPGA chips. In addition, she or he has to write the control logic for the processor, the communication and data conversion library on the host processor, and the application program that uses the developed processor. These tasks require detailed knowledge of hardware design, a hardware description language such as VHDL, the operating system, and the application, and the amount of human work is huge: even a relatively simple design would require one person-year or more. The PGPG software generates all necessary design descriptions, except for the application software itself, from a high-level description of the pipeline processor in the PGPG language. The PGPG language is a simple language specialized for describing pipeline processors, so designing a pipeline processor in it is much easier than with the traditional approach. For real applications such as the pipeline for gravitational interaction, the pipeline processor generated by PGPG achieved performance similar to that of hand-written code. In this paper we present a detailed description of PGPG version 1.0. Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2

    Accelerating statistical texture analysis with an FPGA-DSP hybrid architecture

    Nowadays, most image processing systems are implemented using either MMX-optimized software libraries or, when time constraints are strict, expensive high-performance DSP-based boards. In this paper we present a texture analysis co-processor concept that permits the efficient hardware implementation of statistical feature extraction, and uses hardware-software codesign to achieve high-performance, low-cost solutions. We propose a hybrid architecture based on FPGA chips for massive data processing and a digital signal processor (DSP) for floating-point computations. In our preliminary trials with test images, we achieved sufficient performance improvements to handle a wide range of real-time applications.

    Multi-task Implementation for Image Reconstruction of an AER Communication

    Address-Event-Representation (AER) is a communication protocol for transferring spikes between bio-inspired chips. Such systems may consist of a hierarchical structure of several chips that transmit spikes to one another in real time while performing some processing. Several AER tools exist to help in developing and testing AER-based systems. These tools require a computer to process the event information, reaching very high bandwidth at the AER communication level. We propose to use an embedded platform based on a multi-task operating system to handle both the AER communication and the AER processing without a laptop or desktop computer. We have connected and programmed a Gumstix computer to process Address-Event information and measured its performance relative to previous AER tool solutions. In this paper, we present and study the performance of a new approach to a frame-grabber AER tool based on a multi-task environment, consisting of an Intel XScale processor running an embedded GNU/Linux system. Ministerio de Ciencia e Innovación TEC2006-11730-C03-0

    PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations

    We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable multi-purpose computer for many-body simulations. The main difference between PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field Programmable Gate Array) chips as the processing elements, while the latter rely on a hardwired pipeline processor specialized for gravitational interactions. Since the logic implemented in FPGA chips can be reconfigured, we can use PROGRAPE-1 to calculate not only gravitational interactions but also other forms of interaction, such as the van der Waals force and hydrodynamical interactions in SPH calculations. PROGRAPE-1 comprises two Altera EPF10K100 FPGA chips, each of which contains nominally 100,000 gates. To evaluate the programmability and performance of PROGRAPE-1, we implemented a pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline fitted into a single FPGA chip, which operated at a 16 MHz clock. Thus, for gravitational interaction, PROGRAPE-1 provided a speed of 0.96 Gflops-equivalent. PROGRAPE will prove useful for a wide range of particle-based simulations in which the calculation cost of interactions other than gravity is high, such as the evaluation of SPH interactions. Comment: 20 pages with 9 figures; submitted to PAS
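    The speed quoted in the PROGRAPE-1 abstract can be related to its clock rate with a short back-of-the-envelope calculation. The sketch below is only illustrative: it assumes that both FPGA chips each run one pipeline and that each pipeline completes one interaction per clock cycle, neither of which is stated explicitly in the abstract. Under those assumptions it derives the number of floating-point operations credited to each interaction; the function names are hypothetical.

```python
# Figures quoted in the PROGRAPE-1 abstract.
CLOCK_HZ = 16e6          # pipeline clock: 16 MHz
QUOTED_FLOPS = 0.96e9    # quoted speed: 0.96 Gflops-equivalent

# Assumptions (not stated in the abstract): one pipeline on each of the
# two FPGA chips, and one interaction per pipeline per clock cycle.
N_PIPELINES = 2


def interactions_per_second(clock_hz: float, n_pipelines: int) -> float:
    """Pairwise interactions evaluated per second across all pipelines."""
    return clock_hz * n_pipelines


def implied_ops_per_interaction(quoted_flops: float, clock_hz: float,
                                n_pipelines: int) -> float:
    """Operation count per interaction implied by the quoted speed."""
    return quoted_flops / interactions_per_second(clock_hz, n_pipelines)


if __name__ == "__main__":
    rate = interactions_per_second(CLOCK_HZ, N_PIPELINES)
    ops = implied_ops_per_interaction(QUOTED_FLOPS, CLOCK_HZ, N_PIPELINES)
    print(f"interactions per second: {rate:.3g}")
    print(f"implied operations per interaction: {ops:.1f}")
```

    If only one pipeline were active, the same quoted speed would imply twice as many operations per interaction, so the result depends directly on the stated assumptions.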

    GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation

    In this paper, we describe the architecture and performance of the GRAPE-6 system, a massively-parallel special-purpose computer for astrophysical N-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed in 1995 and achieved a theoretical peak speed of 1.08 Tflops. As was the case with GRAPE-4, the primary application of GRAPE-6 is the simulation of collisional systems, though it can also be used for collisionless systems. The main differences between GRAPE-4 and GRAPE-6 are that (a) the processor chip of GRAPE-6 integrates 6 force-calculation pipelines, compared to the single pipeline of GRAPE-4 (which needed 3 clock cycles to calculate one interaction), (b) the clock speed is increased from 32 to 90 MHz, and (c) the total number of processor chips is increased from 1728 to 2048. These improvements resulted in a peak speed of 64 Tflops. We also discuss the design of the successor of GRAPE-6. Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.
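    The three improvements listed in the GRAPE-6 abstract can be combined into a rough throughput comparison. The sketch below is a consistency check on the quoted figures, not the authors' own performance model. It assumes that each GRAPE-6 pipeline completes one interaction per clock cycle, which the abstract suggests only by contrast with GRAPE-4; the function name is hypothetical.

```python
def interaction_rate(chips: int, pipelines_per_chip: int, clock_hz: float,
                     cycles_per_interaction: int = 1) -> float:
    """Pairwise interactions evaluated per second by the whole machine."""
    return chips * pipelines_per_chip * clock_hz / cycles_per_interaction


# GRAPE-4: 1728 chips, 1 pipeline per chip, 32 MHz, 3 cycles per interaction.
grape4_rate = interaction_rate(1728, 1, 32e6, cycles_per_interaction=3)

# GRAPE-6: 2048 chips, 6 pipelines per chip, 90 MHz.
# Assumption: one interaction per pipeline per cycle.
grape6_rate = interaction_rate(2048, 6, 90e6)

# Peak speeds quoted in the abstract, in flops.
GRAPE4_PEAK = 1.08e12
GRAPE6_PEAK = 64e12

if __name__ == "__main__":
    print(f"GRAPE-4 interactions/s: {grape4_rate:.4g}")
    print(f"GRAPE-6 interactions/s: {grape6_rate:.4g}")
    print(f"interaction-rate ratio: {grape6_rate / grape4_rate:.1f}")
    print(f"quoted peak-speed ratio: {GRAPE6_PEAK / GRAPE4_PEAK:.1f}")
    # Operations credited to one interaction, as implied by the quoted peaks.
    print(f"GRAPE-4 implied ops/interaction: {GRAPE4_PEAK / grape4_rate:.1f}")
    print(f"GRAPE-6 implied ops/interaction: {GRAPE6_PEAK / grape6_rate:.1f}")
```

    Under these assumptions the interaction rate grows by a factor of about 60, close to the ratio of the two quoted peak speeds, which suggests that both figures credit a similar number of operations to each interaction.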

    Performance evaluation of multi-core multi-cluster architecture

    A multi-core cluster is a cluster composed of a number of nodes, where each node has a number of processors, each with more than one core on a single chip. Cluster nodes are connected via an interconnection network. Multi-core processors can achieve higher performance without driving up power consumption and heat, which are the main concerns with single-core processors. A general problem in the network arises from the fact that multiple messages can be in transit at the same time on the same network links. In this paper, the communication latencies of a multi-core multi-cluster architecture are investigated using simulation experiments and measurements under various working conditions.