PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.
Comment: Submitted to Parallel Computing, Elsevie
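The core idea of RTCG described above, building GPU source text from ordinary run-time values in a scripting language, can be sketched without a GPU. The snippet below is a minimal illustration, not code from the article: the kernel name `scale`, the template, and the `generate_kernel` helper are all hypothetical. It only produces CUDA C source as a string; in PyCUDA such a string would typically be handed to `pycuda.compiler.SourceModule` for compilation, a step omitted here so the example runs anywhere.

```python
from string import Template

# Hypothetical CUDA C kernel template; names and layout are illustrative only.
KERNEL_TEMPLATE = Template("""
__global__ void scale(float *dest, const float *src)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * $unroll;
$body
}
""")


def generate_kernel(unroll, factor):
    """Return CUDA C source with the loop unrolled and the factor baked in."""
    # Each unrolled statement hard-codes both the offset and the constant,
    # so the compiler sees literals instead of run-time parameters.
    lines = [
        f"    dest[base + {i}] = {float(factor)!r}f * src[base + {i}];"
        for i in range(unroll)
    ]
    return KERNEL_TEMPLATE.substitute(unroll=unroll, body="\n".join(lines))


# Values known only at run time decide the shape of the generated code.
source = generate_kernel(unroll=4, factor=2.5)
print(source)
```

Because the unroll count and the constant are decided in Python, different problem sizes or tuning choices yield different specialized kernels from the same template.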
The Advanced Compton Telescope
The Advanced Compton Telescope (ACT), the next major step in gamma-ray astronomy, will probe the fires where chemical elements are formed by enabling high-resolution spectroscopy of nuclear emission from supernova explosions. During the past two years, our collaboration has been undertaking a NASA mission concept study for ACT. This study was designed to (1) transform the key scientific objectives into specific instrument requirements, (2) identify the most promising technologies to meet those requirements, and (3) design a viable mission concept for this instrument. We present the results of this study, including scientific goals and expected performance, mission design, and technology recommendations.
Architecture and Design of Medical Processor Units for Medical Networks
This paper introduces analogical and deductive methodologies for the design of
medical processor units (MPUs). From the study of evolution of numerous earlier
processors, we derive the basis for the architecture of MPUs. These specialized
processors perform unique medical functions encoded as medical operational
codes (mopcs). From a pragmatic perspective, MPUs function very close to CPUs.
Both processors have unique operation codes that command the hardware to
perform a distinct chain of subprocesses upon operands and generate a specific
result unique to the opcode and the operand(s). In medical environments, the MPU
decodes the mopcs, executes a series of medical sub-processes, and sends out
secondary commands to the medical machine. Whereas operands in a typical
computer system are numerical and logical entities, the operands in a medical
machine are objects such as patients, blood samples, tissues, operating
rooms, medical staff, medical bills, patient payments, etc. We follow the
functional overlap between the two processes and evolve the design of medical
computer systems and networks.
Comment: 17 page
GRAPE-5: A Special-Purpose Computer for N-body Simulation
We have developed a special-purpose computer for gravitational many-body
simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of
eight custom pipeline chips (the G5 chip and the GRAPE chip, respectively). The
differences between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two
pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the
calculation speeds of the G5 chip and the GRAPE-5 board are 8 times those of the
GRAPE chip and the GRAPE-3 board. (2) The GRAPE-5 board uses a PCI bus as the
interface to the host computer instead of the VME bus of GRAPE-3, making
communication about one order of magnitude faster. (3) In addition to the pure 1/r potential,
the G5 chip can calculate forces with arbitrary cutoff functions, so that it
can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on
GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5
board, one timestep of a 128k-body simulation with the direct summation algorithm
takes 14 seconds. With the Barnes-Hut tree algorithm (theta = 0.75), one timestep
of a 10^6-body simulation can be done in 16 seconds.
Comment: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to
Publications of the Astronomical Society of Japa
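The speed figures quoted in the GRAPE-5 abstract can be cross-checked with a short calculation. This is a back-of-the-envelope sketch under stated assumptions, not a result from the paper: it assumes each pipeline evaluates one pairwise interaction per clock cycle, reads "128k" as 128 × 1024 bodies, counts all N × N pairs for direct summation, and ignores host and communication overhead.

```python
# Figures taken from the abstract.
CHIPS_PER_BOARD = 8
G5_PIPELINES, G5_CLOCK_HZ = 2, 80e6          # G5 chip: two pipelines at 80 MHz
GRAPE3_PIPELINES, GRAPE3_CLOCK_HZ = 1, 20e6  # GRAPE chip: one pipeline at 20 MHz

# Per-chip speed ratio: (2 * 80 MHz) / (1 * 20 MHz).
chip_speedup = (G5_PIPELINES * G5_CLOCK_HZ) / (GRAPE3_PIPELINES * GRAPE3_CLOCK_HZ)

# ASSUMPTION: one pairwise interaction per pipeline per clock cycle.
board_rate = CHIPS_PER_BOARD * G5_PIPELINES * G5_CLOCK_HZ  # interactions per second

# ASSUMPTION: "128k" means 128 * 1024 bodies; direct summation needs N * N pairs.
n_bodies = 128 * 1024
interactions_per_step = n_bodies * n_bodies

# Idealized time per timestep, with no host or bus overhead.
estimated_seconds = interactions_per_step / board_rate

print(f"chip speedup over GRAPE-3: {chip_speedup:.0f}x")
print(f"peak board rate: {board_rate:.2e} interactions/s")
print(f"estimated direct-summation timestep: {estimated_seconds:.1f} s")
```

Under these assumptions the estimate comes out at roughly 13.4 seconds, in line with the quoted 14 seconds per timestep once some overhead is allowed for.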