38,654 research outputs found
PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems
We have developed PGPG (Pipeline Generator for Programmable GRAPE), a
software which generates the low-level design of the pipeline processor and
communication software for FPGA-based computing engines (FBCEs). An FBCE
typically consists of one or multiple FPGA (Field-Programmable Gate Array)
chips and local memory. Here, the term "Field-Programmable" means that one can
rewrite the logic implemented to the chip after the hardware is completed, and
therefore a single FBCE can be used for calculation of various functions, for
example pipeline processors for gravity, SPH interaction, or image processing.
The main problem with FBCEs is that the user need to develop the detailed
hardware design for the processor to be implemented to FPGA chips. In addition,
she or he has to write the control logic for the processor, communication and
data conversion library on the host processor, and application program which
uses the developed processor. These require detailed knowledge of hardware
design, a hardware description language such as VHDL, the operating system and
the application, and amount of human work is huge. A relatively simple design
would require 1 person-year or more. The PGPG software generates all necessary
design descriptions, except for the application software itself, from a
high-level design description of the pipeline processor in the PGPG language.
The PGPG language is a simple language, specialized to the description of
pipeline processors. Thus, the design of pipeline processor in PGPG language is
much easier than the traditional design. For real applications such as the
pipeline for gravitational interaction, the pipeline processor generated by
PGPG achieved the performance similar to that of hand-written code. In this
paper we present a detailed description of PGPG version 1.0.Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2
Accelerating statistical texture analysis with an FPGA-DSP hybrid architecture
Nowadays, most image processing systems are implemented using either MMX-optimized software libraries or, when time requirements are limited, expensive high performance DSP-based boards. In this paper we present a texture analysis co-processor concept that permits the efficient hardware implementation of statistical feature extraction, and hardware-software codesign to achieve high-performance low-cost solutions. We propose a hybrid architecture based on FPGA chips, for massive data processing, and digital signal processor (DSP) for floating-point computations. In our preliminary trials with test images, we achieved sufficient performance improvements to handle a wide range of real-time applications
Multi-task Implementation for Image Reconstruction of an AER Communication
Address-Event-Representation (AER) is a communication protocol
for transferring spikes between bio-inspired chips. Such systems may consist of
a hierarchical structure with several chips that transmit spikes among them in
real time, while performing some processing. There exist several AER tools to
help in developing and testing AER based systems. These tools require the use
of a computer to allow the processing of the event information, reaching very
high bandwidth at the AER communication level. We propose to use an
embedded platform based on multi-task operating system to allow both, the
AER communication and the AER processing without a laptop or a computer.
We have connected and programmed a Gumstix computer to process Address-
Event information and measured the performance referred to the previous AER
tools solutions. In this paper, we present and study the performance of a new
philosophy of a frame-grabber AER tool based on a multi-task environment,
composed by the Intel XScale processor governed by an embedded GNU/Linux
system.Ministerio de Ciencia e Innovación TEC2006-11730-C03-0
PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations
We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable
multi-purpose computer for many-body simulations. The main difference between
PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field
Programmable Gate Array) chips as the processing elements, while the latter
rely on the hardwired pipeline processor specialized to gravitational
interactions. Since the logic implemented in FPGA chips can be reconfigured, we
can use PROGRAPE-1 to calculate not only gravitational interactions but also
other forms of interactions such as van der Waals force, hydrodynamical
interactions in SPH calculation and so on. PROGRAPE-1 comprises two Altera
EPF10K100 FPGA chips, each of which contains nominally 100,000 gates. To
evaluate the programmability and performance of PROGRAPE-1, we implemented a
pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline
fitted into a single FPGA chip, which operated at 16 MHz clock. Thus, for
gravitational interaction, PROGRAPE-1 provided the speed of 0.96
Gflops-equivalent. PROGRAPE will prove to be useful for wide-range of
particle-based simulations in which the calculation cost of interactions other
than gravity is high, such as the evaluation of SPH interactions.Comment: 20 pages with 9 figures; submitted to PAS
GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation
In this paper, we describe the architecture and performance of the GRAPE-6
system, a massively-parallel special-purpose computer for astrophysical
-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed
in 1995 and achieved the theoretical peak speed of 1.08 Tflops. As was the case
with GRAPE-4, the primary application of GRAPE-6 is simulation of collisional
systems, though it can be used for collisionless systems. The main differences
between GRAPE-4 and GRAPE-6 are (a) The processor chip of GRAPE-6 integrates 6
force-calculation pipelines, compared to one pipeline of GRAPE-4 (which needed
3 clock cycles to calculate one interaction), (b) the clock speed is increased
from 32 to 90 MHz, and (c) the total number of processor chips is increased
from 1728 to 2048. These improvements resulted in the peak speed of 64 Tflops.
We also discuss the design of the successor of GRAPE-6.Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.
Performance evaluation of multi-core multi-cluster architecture
A multi-core cluster is a cluster composed of numbers of nodes where each node has a number of processors, each with more than one core within each single chip. Cluster nodes are connected via an interconnection network. Multi-cored processors are able to achieve higher performance without driving up power consumption and heat, which is the main concern in a single-core processor. A general problem in the network arises from the fact that multiple messages can be in transit at the same time on the same network links. This paper considers the communication latencies of a multi-core multi-cluster architecture will be investigated using simulation experiments and measurements under various working conditions
- …