17,938 research outputs found
GRAPE-5: A Special-Purpose Computer for N-body Simulation
We have developed a special-purpose computer for gravitational many-body
simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of
eight custom pipeline chips (G5 chip and GRAPE chip). The difference between
GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80
MHz, while the GRAPE chip had one at 20 MHz. Thus, the calculation speed of the
G5 chip and that of GRAPE-5 board are 8 times faster than that of GRAPE chip
and GRAPE-3 board. (2) The GRAPE-5 board adopted PCI bus as the interface to
the host computer instead of VME of GRAPE-3, resulting in the communication
speed one order of magnitude faster. (3) In addition to the pure 1/r potential,
the G5 chip can calculate forces with arbitrary cutoff functions, so that it
can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on
GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5
board, one timestep of 128k-body simulation with direct summation algorithm
takes 14 seconds. With Barnes-Hut tree algorithm (theta = 0.75), one timestep
of 10^6-body simulation can be done in 16 seconds.Comment: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to
Publications of the Astronomical Society of Japa
PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems
We have developed PGPG (Pipeline Generator for Programmable GRAPE), a
software which generates the low-level design of the pipeline processor and
communication software for FPGA-based computing engines (FBCEs). An FBCE
typically consists of one or multiple FPGA (Field-Programmable Gate Array)
chips and local memory. Here, the term "Field-Programmable" means that one can
rewrite the logic implemented to the chip after the hardware is completed, and
therefore a single FBCE can be used for calculation of various functions, for
example pipeline processors for gravity, SPH interaction, or image processing.
The main problem with FBCEs is that the user need to develop the detailed
hardware design for the processor to be implemented to FPGA chips. In addition,
she or he has to write the control logic for the processor, communication and
data conversion library on the host processor, and application program which
uses the developed processor. These require detailed knowledge of hardware
design, a hardware description language such as VHDL, the operating system and
the application, and amount of human work is huge. A relatively simple design
would require 1 person-year or more. The PGPG software generates all necessary
design descriptions, except for the application software itself, from a
high-level design description of the pipeline processor in the PGPG language.
The PGPG language is a simple language, specialized to the description of
pipeline processors. Thus, the design of pipeline processor in PGPG language is
much easier than the traditional design. For real applications such as the
pipeline for gravitational interaction, the pipeline processor generated by
PGPG achieved the performance similar to that of hand-written code. In this
paper we present a detailed description of PGPG version 1.0.Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2
MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor
This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented
An area-efficient 2-D convolution implementation on FPGA for space applications
The 2-D Convolution is an algorithm widely used in image and video processing. Although its computation is simple, its implementation requires a high computational power and an intensive use of memory. Field Programmable Gate Arrays (FPGA) architectures were proposed to accelerate calculations of 2-D Convolution and the use of buffers implemented on FPGAs are used to avoid direct memory access. In this paper we present an implementation of the 2-D Convolution algorithm on a FPGA architecture designed to support this operation in space applications. This proposed solution dramatically decreases the area needed keeping good performance, making it appropriate for embedded systems in critical space application
The Computational Complexity of Generating Random Fractals
In this paper we examine a number of models that generate random fractals.
The models are studied using the tools of computational complexity theory from
the perspective of parallel computation. Diffusion limited aggregation and
several widely used algorithms for equilibrating the Ising model are shown to
be highly sequential; it is unlikely they can be simulated efficiently in
parallel. This is in contrast to Mandelbrot percolation that can be simulated
in constant parallel time. Our research helps shed light on the intrinsic
complexity of these models relative to each other and to different growth
processes that have been recently studied using complexity theory. In addition,
the results may serve as a guide to simulation physics.Comment: 28 pages, LATEX, 8 Postscript figures available from
[email protected]
A versatile sensor interface for programmable vision systems-on-chip
This paper describes an optical sensor interface designed for a programmable mixed-signal vision chip. This chip has been designed and manufactured in a standard 0.35μm n-well CMOS technology with one poly layer and five metal layers. It contains a digital shell for control and data interchange, and a central array of 128 × 128 identical cells, each cell corresponding to a pixel. Die size is 11.885 × 12.230mm2 and cell size is 75.7μm × 73.3μm. Each cell contains 198 transistors dedicated to functions like processing, storage, and sensing. The system is oriented to real-time, single-chip image acquisition and processing. Since each pixel performs the basic functions of sensing, processing and storage, data transferences are fully parallel (image-wide). The programmability of the processing functions enables the realization of complex image processing functions based on the sequential application of simpler operations. This paper provides a general overview of the system architecture and functionality, with special emphasis on the optical interface.European Commission IST-1999-19007Office of Naval Research (USA) N00014021088
Wildcard dimensions, coding theory and fault-tolerant meshes and hypercubes
Hypercubes, meshes and tori are well known interconnection networks for parallel computers. The sets of edges in those graphs can be partitioned to dimensions. It is well known that the hypercube can be extended by adding a wildcard dimension resulting in a folded hypercube that has better fault-tolerant and communication capabilities. First we prove that the folded hypercube is optimal in the sense that only a single wildcard dimension can be added to the hypercube. We then investigate the idea of adding wildcard dimensions to d-dimensional meshes and tori. Using techniques from error correcting codes we construct d-dimensional meshes and tori with wildcard dimensions. Finally, we show how these constructions can be used to tolerate edge and node faults in mesh and torus networks
- …