7,338 research outputs found

    Accelerated test execution using GPUs

    Get PDF
    As product life-cycles become shorter and the scale and complexity of systems increase, accelerating the execution of large test suites gains importance. Existing research has primarily focussed on techniques that reduce the size of the test suite. By contrast, we propose a technique that accelerates test execution, allowing test suites to run in a fraction of the original time, by parallel execution with a Graphics Processing Unit (GPU). Program testing, which is in essence execution of the same program with multiple sets of test data, naturally exhibits the kind of data parallelism that can be exploited with GPUs. Our approach simultaneously executes the program with one test case per GPU thread. GPUs have severe limitations, and we discuss these in the context of our approach and define the scope of our applications. We observe speed-ups up to a factor of 27 compared to single-core execution on conventional CPUs with embedded systems benchmark programs

    Accelerated Finite State Machine Test Execution Using GPUs

    Get PDF

    Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

    Full text link
    Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single-floating point precision may result in quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial implementation for the CPU, and a single GPU was found to compare with a parallelised MD simulation using 64 distributed cores.Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package licensed under the GPL, see http://research.colberg.org/projects/halm

    CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

    Get PDF
    Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration of OpenMP code from CPUs to a GPU. However, these implementations currently lose one of OpenMP’s best features, its flexibility: typical OpenMP applications can run on any number of CPUs. GPU implementations do not transparently employ multiple GPUs on a node or a mix of GPUs and CPUs. To address these shortcomings, we present CoreTSAR, our runtime library for dynamically scheduling tasks across heterogeneous resources, and propose straightforward extensions that incorporate this functionality into Accelerated OpenMP. We show that our approach can provide nearly linear speedup to four GPUs over only using CPUs or one GPU while increasing the overall flexibility of Accelerated OpenMP
    • 

    corecore