Search CORE

3,268 research outputs found

Benchmarking hypercube hardware and software

Author: Grunwald Dirk C.
Reed Daniel A.
Publication venue
Publication date
Field of study

It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns

NASA Technical Reports Server

A sparse octree gravitational N-body code that runs entirely on the GPU processor

Author: Barnes
Barnes
Barnes
Belleman
Billeter
Buck
Burtscher
de Berg
Dehnen
Dubinski
Evghenii Gaburov
Fukushige
Gaburov
Gaburov
Hamada
Hamada
Harfst
Hut
Jeroen Bédorf
Knuth
Lauterbach
Makino
Makino
McMillan
Nyland
Plummer
Portegies Zwart
Portegies Zwart
Raman
Salmon
Satish
Simon Portegies Zwart
Springel
Warren
Yokota
Publication venue: 'Elsevier BV'
Publication date: 01/04/2012
Field of study

We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU.(The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum

arXiv.org e-Print Archive

Crossref

Leiden University Scholary Publications

Formal change impact analyses for emulated control software

Author: A.C. Shaw
C. J. Fidge
C.D. Locke
D. Culpin
D. Kalinsky
D. Scholefield
D.B. Stewart
E. Börger
E.W. Dijkstra
I.J. Hayes
J. Hooman
K. Lermer
L. Beus-Dukic
L.A. Leventhal
P. Gust
S. Stepney
T.P. Baker
U
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

Processor emulators are a software tool for allowing legacy computer programs to be executed on a modern processor. In the past emulators have been used in trivial applications such as maintenance of video games. Now, however, processor emulation is being applied to safety-critical control systems, including military avionics. These applications demand utmost guarantees of correctness, but no verification techniques exist for proving that an emulated system preserves the original system’s functional and timing properties. Here we show how this can be done by combining concepts previously used for reasoning about real-time program compilation, coupled with an understanding of the new and old software architectures. In particular, we show how both the old and new systems can be given a common semantics, thus allowing their behaviours to be compared directly

Crossref

Queensland University of Technology ePrints Archive

University of Queensland eSpace

FPGA-Based Bandwidth Selection for Kernel Density Estimation Using High Level Synthesis Approach

Author: Gramacki Artur
Gramacki Jarosław
Sawerwain Marek
Publication venue
Publication date: 29/09/2015
Field of study

FPGA technology can offer significantly hi\-gher performance at much lower power consumption than is available from CPUs and GPUs in many computational problems. Unfortunately, programming for FPGA (using ha\-rdware description languages, HDL) is a difficult and not-trivial task and is not intuitive for C/C++/Java programmers. To bring the gap between programming effectiveness and difficulty the High Level Synthesis (HLS) approach is promoting by main FPGA vendors. Nowadays, time-intensive calculations are mainly performed on GPU/CPU architectures, but can also be successfully performed using HLS approach. In the paper we implement a bandwidth selection algorithm for kernel density estimation (KDE) using HLS and show techniques which were used to optimize the final FPGA implementation. We are also going to show that FPGA speedups, comparing to highly optimized CPU and GPU implementations, are quite substantial. Moreover, power consumption for FPGA devices is usually much less than typical power consumption of the present CPUs and GPUs.Comment: 23 pages, 6 figures, extended version of initial pape

arXiv.org e-Print Archive

Biblioteka Nauki - repozytorium artykuÅÃ³w