11,843 research outputs found

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bell
Bryan Catanzaro
Buck
Chandler
Dalcín
Eich
Feldman
Flanagan
Frigo
Group
Hestenes
Hesthaven
Kennedy
Klöckner
Lam
Langtangen
Lindholm
McCarthy
McCool
Nicolas Pinto
Oliphant
Owens
Paul Ivanov
Pinto
Prud’homme
Reynders
Seiler
Stein
Valiant
van Hateren
Veldhuizen
Wang
Whaley
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier
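
The idea of run-time code generation described in this abstract can be sketched without a GPU. The following is a CPU-only analogue, not the PyCUDA or PyOpenCL API: a script builds source text specialized for a parameter (here an unroll factor), compiles it at run time, and calls the result. With a GPU toolkit, the generated string would instead be GPU kernel code handed to the device compiler. The names `generate_axpy_source` and `build_kernel` are hypothetical and chosen only for illustration.

```python
# CPU-only analogue of run-time code generation (RTCG); no GPU is required.
# generate_axpy_source and build_kernel are hypothetical names, not part of
# PyCUDA or PyOpenCL.


def generate_axpy_source(unroll: int) -> str:
    """Emit Python source for y[i] += a * x[i], unrolled `unroll` times."""
    lines = [
        "def axpy(a, x, y):",
        "    n = len(x)",
        f"    main = n - n % {unroll}",
        f"    for i in range(0, main, {unroll}):",
    ]
    # The unrolled body is produced by ordinary string manipulation.
    for k in range(unroll):
        lines.append(f"        y[i + {k}] += a * x[i + {k}]")
    # A remainder loop handles lengths that are not a multiple of `unroll`.
    lines += [
        "    for i in range(main, n):",
        "        y[i] += a * x[i]",
        "    return y",
    ]
    return "\n".join(lines)


def build_kernel(unroll: int):
    """Compile the generated source at run time and return the function."""
    source = generate_axpy_source(unroll)
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["axpy"]


axpy4 = build_kernel(4)
result = axpy4(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [0.0] * 5)
print(result)
```

Because the specialization happens in the scripting layer, trying a different unroll factor is a one-argument change rather than a rebuild of a static program.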

arXiv.org e-Print Archive

Crossref

The Advanced Compton Telescope

Author: Aprile Elena
Baring Matthew
Beacom John
Bildsten Lars
Bloser Peter F.
Boggs S E
Dermer Charles
Gehrels Neil
Harris M
Hartman Dieter H
Hernanz Margarita
Hoover A
Kippen R M
Klimenk Alexei
Kocevski Dan
Kurfess J
Leising Marc
McConnell Mark L
Milne Peter
Novikova E I
Oberlack U
Phlips B F
Polsen Mark
Ryan James M.
Smith David
Starrfield Sumner
Sturner Steven
Tournear Derek
Weidenspointer G
Wulf Eric
Wunderer Cornelia B
Zoglauer A
Zych Allen
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 13/06/2006

The Advanced Compton Telescope (ACT), the next major step in gamma-ray astronomy, will probe the fires where chemical elements are formed by enabling high-resolution spectroscopy of nuclear emission from supernova explosions. During the past two years, our collaboration has been undertaking a NASA mission concept study for ACT. This study was designed to (1) transform the key scientific objectives into specific instrument requirements, (2) identify the most promising technologies to meet those requirements, and (3) design a viable mission concept for this instrument. We present the results of this study, including scientific goals and expected performance, mission design, and technology recommendations.

UNH Scholars' Repository

Architecture and Design of Medical Processor Units for Medical Networks

Author: Ahamed Syed V.
Rahman Syed Shawon M.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 13/04/2011

This paper introduces analogical and deductive methodologies for the design of medical processor units (MPUs). From the study of the evolution of numerous earlier processors, we derive the basis for the architecture of MPUs. These specialized processors perform unique medical functions encoded as medical operational codes (mopcs). From a pragmatic perspective, MPUs function very much like CPUs. Both processors have unique operation codes that command the hardware to perform a distinct chain of subprocesses upon operands and generate a specific result unique to the opcode and the operand(s). In medical environments, the MPU decodes the mopcs, executes a series of medical sub-processes, and sends out secondary commands to the medical machine. Whereas operands in a typical computer system are numerical and logical entities, the operands in a medical machine are objects such as patients, blood samples, tissues, operating rooms, medical staff, medical bills, patient payments, etc. We follow the functional overlap between the two processes and evolve the design of medical computer systems and networks. Comment: 17 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

GRAPE-5: A Special-Purpose Computer for N-body Simulation

Author: Fukushige Toshiyuki
Kawai Atsushi
Makino Junichiro
Taiji Makoto
Publication venue: 'Oxford University Press (OUP)'
Publication date: 07/09/1999

We have developed a special-purpose computer for gravitational many-body simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of eight custom pipeline chips (G5 chip and GRAPE chip). The differences between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the G5 chip and the GRAPE-5 board are each 8 times faster than the GRAPE chip and the GRAPE-3 board, respectively. (2) The GRAPE-5 board adopted the PCI bus as the interface to the host computer instead of the VME bus of GRAPE-3, resulting in a communication speed one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep of a 128k-body simulation with the direct summation algorithm takes 14 seconds. With the Barnes-Hut tree algorithm (theta = 0.75), one timestep of a 10^6-body simulation can be done in 16 seconds. Comment: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to Publications of the Astronomical Society of Japan
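
The speed figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the abstract does not state: that chip throughput scales as the number of pipelines times the clock rate, and that "128k" means 128 × 1024 bodies with direct summation evaluating N(N-1) pairwise forces per timestep. The function name `pipeline_throughput` is hypothetical.

```python
def pipeline_throughput(pipelines: int, clock_mhz: float) -> float:
    """Relative chip throughput, assumed proportional to pipelines x clock."""
    return pipelines * clock_mhz


# Figures from the abstract: G5 chip has 2 pipelines at 80 MHz,
# the GRAPE chip had 1 pipeline at 20 MHz.
g5_chip = pipeline_throughput(2, 80.0)
grape_chip = pipeline_throughput(1, 20.0)
chip_speedup = g5_chip / grape_chip

# Both boards carry eight chips, so the board-level ratio is the same.
board_speedup = (8 * g5_chip) / (8 * grape_chip)

# Assumption: "128k" = 128 * 1024 bodies; direct summation evaluates
# every ordered pair once, i.e. N * (N - 1) force calculations per step.
n_bodies = 128 * 1024
interactions_per_step = n_bodies * (n_bodies - 1)

# The abstract reports 14 seconds per timestep on one board.
interactions_per_second = interactions_per_step / 14.0

print(chip_speedup, board_speedup)
print(f"{interactions_per_second:.3e} pairwise forces per second")
```

Under these assumptions the ratio reproduces the factor of 8 stated in the abstract, and the direct-summation timing corresponds to roughly a billion pairwise force evaluations per second on one board.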

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server