3,268 research outputs found

    Benchmarking hypercube hardware and software

    Get PDF
    It was long a truism in computer systems design that balanced systems achieve the best performance. Message passing parallel processors are no different. To quantify the balance of a hypercube design, an experimental methodology was developed and the associated suite of benchmarks was applied to several existing hypercubes. The benchmark suite includes tests of both processor speed in the absence of internode communication and message transmission speed as a function of communication patterns

    A sparse octree gravitational N-body code that runs entirely on the GPU processor

    Get PDF
    We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU.(The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.Comment: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single colum

    Formal change impact analyses for emulated control software

    Get PDF
    Processor emulators are a software tool for allowing legacy computer programs to be executed on a modern processor. In the past emulators have been used in trivial applications such as maintenance of video games. Now, however, processor emulation is being applied to safety-critical control systems, including military avionics. These applications demand utmost guarantees of correctness, but no verification techniques exist for proving that an emulated system preserves the original system’s functional and timing properties. Here we show how this can be done by combining concepts previously used for reasoning about real-time program compilation, coupled with an understanding of the new and old software architectures. In particular, we show how both the old and new systems can be given a common semantics, thus allowing their behaviours to be compared directly

    FPGA-Based Bandwidth Selection for Kernel Density Estimation Using High Level Synthesis Approach

    Full text link
    FPGA technology can offer significantly hi\-gher performance at much lower power consumption than is available from CPUs and GPUs in many computational problems. Unfortunately, programming for FPGA (using ha\-rdware description languages, HDL) is a difficult and not-trivial task and is not intuitive for C/C++/Java programmers. To bring the gap between programming effectiveness and difficulty the High Level Synthesis (HLS) approach is promoting by main FPGA vendors. Nowadays, time-intensive calculations are mainly performed on GPU/CPU architectures, but can also be successfully performed using HLS approach. In the paper we implement a bandwidth selection algorithm for kernel density estimation (KDE) using HLS and show techniques which were used to optimize the final FPGA implementation. We are also going to show that FPGA speedups, comparing to highly optimized CPU and GPU implementations, are quite substantial. Moreover, power consumption for FPGA devices is usually much less than typical power consumption of the present CPUs and GPUs.Comment: 23 pages, 6 figures, extended version of initial pape
    corecore