
    A pilgrimage to gravity on GPUs

    In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is now so popular that almost all papers on high-precision N-body simulations use GPU-accelerated methods. With GPU hardware becoming more advanced and being used for more sophisticated algorithms such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics.
    Comment: To appear in: European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures.

    High Performance Direct Gravitational N-body Simulations on Graphics Processing Units

    We present the results of gravitational direct N-body simulations using the commercial graphics processing units (GPUs) NVIDIA Quadro FX1400 and GeForce 8800GTX, and compare the results with GRAPE-6Af special-purpose hardware. The force evaluation of the N-body problem was implemented in Cg, using the GPU directly to speed up the calculations. The integration of the equations of motion, running on the host computer, was implemented in C using the 4th-order predictor-corrector Hermite integrator with block time steps. We find that for a large number of particles (N ≳ 10^4) modern graphics processing units offer an attractive low-cost alternative to GRAPE special-purpose hardware. A modern GPU continues to give a relatively flat scaling with the number of particles, comparable to that of the GRAPE. Using the same time-step criterion, the total energy of the N-body system was conserved to better than one part in 10^6 on the GPU, which is only about an order of magnitude worse than obtained with GRAPE. For N ≳ 10^6 the GeForce 8800GTX was about 20 times faster than the host computer. Though still about an order of magnitude slower than GRAPE, modern GPUs outperform GRAPE in their low cost, long mean time between failures and much larger onboard memory; the GRAPE-6Af holds at most 256k particles, whereas the GeForce 8800GTX can hold 9 million particles in memory.
    Comment: Submitted to New Astronomy.
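
    For orientation, the following is a minimal sketch of the kind of all-pairs force evaluation such codes offload to the GPU, written in CUDA rather than the Cg the paper used; the kernel name, the one-thread-per-particle layout and the softening parameter eps2 are illustrative assumptions, not the authors' implementation.

        // Minimal all-pairs gravitational acceleration kernel (illustrative).
        // One thread accumulates the acceleration on one particle from all
        // others; pos[i] holds (x, y, z, mass), eps2 is the squared softening.
        __global__ void allPairsAcc(const float4 *pos, float3 *acc,
                                    int n, float eps2)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;

            float4 pi = pos[i];
            float3 ai = make_float3(0.0f, 0.0f, 0.0f);

            for (int j = 0; j < n; ++j) {
                float4 pj = pos[j];
                float dx = pj.x - pi.x;
                float dy = pj.y - pi.y;
                float dz = pj.z - pi.z;
                // eps2 > 0 removes the r = 0 singularity and makes the
                // i == j term vanish without a branch.
                float r2 = dx * dx + dy * dy + dz * dz + eps2;
                float inv_r = rsqrtf(r2);
                float mj_inv_r3 = pj.w * inv_r * inv_r * inv_r;   // m_j / r^3
                ai.x += mj_inv_r3 * dx;
                ai.y += mj_inv_r3 * dy;
                ai.z += mj_inv_r3 * dz;
            }
            acc[i] = ai;
        }

    Production kernels tile this inner loop through shared memory to reuse particle data, but the arithmetic is exactly this O(N^2) pairwise sum, which is what maps so well to the GPU.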

    Harvesting graphics power for MD simulations

    We discuss an implementation of molecular dynamics (MD) simulations on a graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms, suitable for short-ranged and long-ranged interactions respectively, and for a congruential shift random number generator are presented. The performance of the GPUs is compared to their main-processor counterparts. We achieve speedups of up to 80-, 40- and 150-fold, respectively. With the newest generation of GPUs one can run standard MD simulations at 10^7 flops/$.
    Comment: 12 pages, 5 figures. Submitted to Mol. Sim.
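
    The mention of a congruential shift generator is worth a sketch: per-thread generators with a few words of state are what make stochastic moves cheap on a GPU, since no global RNG state needs to be shared. The generator below, a linear congruential step followed by an xorshift scramble on a single 32-bit state word, is a generic illustration of this class, not the authors' exact generator.

        // Illustrative per-thread RNG: a congruential step followed by a
        // shift (xorshift) scramble on one 32-bit state word per thread.
        __device__ unsigned int rng_next(unsigned int &s)
        {
            s = 1664525u * s + 1013904223u;   // linear congruential step
            s ^= s << 13;                     // xorshift scramble
            s ^= s >> 17;
            s ^= s << 5;
            return s;
        }

        // Uniform float in [0, 1): the top 24 bits convert to float exactly,
        // so the result never rounds up to 1.0.
        __device__ float uniform01(unsigned int &s)
        {
            return (rng_next(s) >> 8) * (1.0f / 16777216.0f);
        }

    Each thread keeps its own state word (seeded, for instance, from its global thread index), so threads draw random numbers without any synchronization.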

    Scalable Interactive Volume Rendering Using Off-the-shelf Components

    This paper describes an application of a second-generation implementation of the Sepia architecture (Sepia-2) to interactive volumetric visualization of large rectilinear scalar fields. By employing pipelined associative blending operators in a sort-last configuration, a demonstration system with 8 rendering computers sustains 24 to 28 frames per second while interactively rendering large data volumes (1024x256x256 voxels, and 512x512x512 voxels). We believe interactive performance at these frame rates and data sizes is unprecedented. We also believe these results can be extended to other types of structured and unstructured grids and a variety of GL rendering techniques including surface rendering and shadow mapping. We show how to extend our single-stage crossbar demonstration system to multi-stage networks in order to support much larger data sizes and higher image resolutions. This requires solving a dynamic mapping problem for a class of blending operators that includes Porter-Duff compositing operators.
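
    As background on the blending step: what makes sort-last compositing amenable to a pipelined network is that the Porter-Duff "over" operator is associative on premultiplied-alpha colors, so partial images can be merged in stages, in any bracketing, as long as front-to-back order is preserved. Below is a minimal sketch of "over" in that formulation, written in CUDA only for consistency with the other sketches here (Sepia-2 implements the blending in hardware).

        // Porter-Duff "over" on premultiplied-alpha RGBA (illustrative).
        // Associativity, over(a, over(b, c)) == over(over(a, b), c), is what
        // allows a multi-stage compositing network to merge partial images.
        struct RGBA { float r, g, b, a; };

        __host__ __device__ RGBA over(RGBA front, RGBA back)
        {
            float k = 1.0f - front.a;   // transmittance of the front fragment
            RGBA out;
            out.r = front.r + k * back.r;
            out.g = front.g + k * back.g;
            out.b = front.b + k * back.b;
            out.a = front.a + k * back.a;
            return out;
        }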

    A Framework for Megascale Agent Based Model Simulations on Graphics Processing Units

    Agent-based modeling is a technique for modeling dynamic systems from the bottom up. Individual elements of the system are represented computationally as agents. The system-level behaviors emerge from the micro-level interactions of the agents. Contemporary state-of-the-art agent-based modeling toolkits are essentially discrete-event simulators designed to execute serially on the Central Processing Unit (CPU). They simulate Agent-Based Models (ABMs) by executing agent actions one at a time. In addition to imposing an unnatural execution order, these toolkits have limited scalability. In this article, we investigate data-parallel computer architectures such as Graphics Processing Units (GPUs) to simulate large-scale ABMs. We have developed a series of efficient, data-parallel algorithms for handling environment updates, various agent interactions, agent death and replication, and gathering statistics. We present three fundamental innovations that provide unprecedented scalability. The first is a novel stochastic memory allocator which enables parallel agent replication in O(1) average time. The second is a technique for resolving precedence constraints for agent actions in parallel. The third is a method that uses specialized graphics hardware to gather and process statistical measures. These techniques have been implemented on a modern-day GPU, resulting in a substantial performance increase. We believe that our system is the first ever completely GPU-based agent simulation framework. Although GPUs are the focus of our current implementations, our techniques can easily be adapted to other data-parallel architectures. We have benchmarked our framework against contemporary toolkits using two popular ABMs, namely SugarScape and StupidModel.
    Keywords: GPGPU, Agent-Based Modeling, Data-Parallel Algorithms, Stochastic Simulations
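
    To make the replication problem concrete: the naive data-parallel baseline lets every reproducing agent claim a slot in a preallocated pool with one atomic increment, as sketched below; the paper's stochastic allocator achieves O(1) average time precisely by avoiding the contention this single shared counter creates. The struct layout and kernel here are illustrative assumptions, not the paper's code.

        // Baseline parallel agent replication (illustrative): each thread
        // whose agent reproduces claims one pool slot via an atomic ticket.
        struct Agent { float x, y; float energy; int alive; };

        __global__ void replicate(Agent *pool, int poolSize,
                                  const Agent *parents,
                                  const int *willReplicate,
                                  int nParents, int *nextFree)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= nParents || !willReplicate[i]) return;

            int slot = atomicAdd(nextFree, 1);   // claim a unique slot
            if (slot >= poolSize) return;        // pool exhausted: drop child

            Agent child = parents[i];            // child inherits parent state
            child.energy *= 0.5f;                // e.g. split energy on birth
            child.alive = 1;
            pool[slot] = child;
        }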

    High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA

    We present the results of gravitational direct N-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the N-body problem is implemented in "Compute Unified Device Architecture" (CUDA) using the GPU to speed up the calculations. We tested the implementation on three different N-body codes: two direct N-body integration codes, using the 4th-order predictor-corrector Hermite integrator with block time steps, and one Barnes-Hut treecode, which uses a 2nd-order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for N > 512 particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening, and for very small integration time steps, the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special-purpose hardware. Using the same time-step criterion, the total energy of the N-body system was conserved to better than one part in 10^6 on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For N ≳ 10^5 the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af.
    Comment: Accepted for publication in New Astronomy.
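
    For reference, the 4th-order Hermite predictor-corrector used by the two direct codes takes its standard form (Makino & Aarseth 1992): each particle is predicted forward using the acceleration a and its time derivative (the jerk), then corrected with the values recomputed at the predicted state,

        \begin{aligned}
        \mathbf{x}_p &= \mathbf{x}_0 + \mathbf{v}_0\,\Delta t + \tfrac{1}{2}\mathbf{a}_0\,\Delta t^2 + \tfrac{1}{6}\dot{\mathbf{a}}_0\,\Delta t^3, \\
        \mathbf{v}_p &= \mathbf{v}_0 + \mathbf{a}_0\,\Delta t + \tfrac{1}{2}\dot{\mathbf{a}}_0\,\Delta t^2, \\
        \mathbf{v}_1 &= \mathbf{v}_0 + \tfrac{1}{2}(\mathbf{a}_0 + \mathbf{a}_1)\,\Delta t + \tfrac{1}{12}(\dot{\mathbf{a}}_0 - \dot{\mathbf{a}}_1)\,\Delta t^2, \\
        \mathbf{x}_1 &= \mathbf{x}_0 + \tfrac{1}{2}(\mathbf{v}_0 + \mathbf{v}_1)\,\Delta t + \tfrac{1}{12}(\mathbf{a}_0 - \mathbf{a}_1)\,\Delta t^2.
        \end{aligned}

    Only the O(N^2) evaluation of a and its derivative is offloaded to the GPU; the cheap predictor and corrector sums stay on the host, matching the division of labor described in the abstract.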

    BrainFrame: A node-level heterogeneous accelerator platform for neuron simulations

    Objective: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit a single acceleration (or homogeneous) platform to effectively address the complete array of modeling requirements. Approach: In this paper we propose and build BrainFrame, a heterogeneous acceleration platform incorporating three distinct acceleration technologies: a Dataflow Engine, a Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different instances of a state-of-the-art neuron model, modeling the Inferior Olivary Nucleus using a biophysically meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity circumstances that can drastically change application workload characteristics. Main results: The combined use of the three HPC technologies demonstrated that BrainFrame is better able to cope with the modeling diversity encountered. Our performance analysis clearly shows that the model instance directly affects performance and that all three technologies are required to cope with all the model use cases.
    Comment: 16 pages, 18 figures, 5 tables.
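
    For readers outside the field: an extended Hodgkin-Huxley model builds on the classic HH current-balance equation, in which each compartment's membrane potential V evolves under voltage-gated ionic currents,

        C_m \frac{dV}{dt} = -\bar{g}_{\mathrm{Na}}\, m^3 h \,(V - E_{\mathrm{Na}})
                            - \bar{g}_{\mathrm{K}}\, n^4 \,(V - E_{\mathrm{K}})
                            - \bar{g}_L \,(V - E_L) + I_{\mathrm{ext}},

    with each gating variable obeying a kinetic equation of the form dm/dt = \alpha_m(V)(1 - m) - \beta_m(V)\,m. Extended models of this kind add further currents and coupling terms of the same general shape (stated here only to convey the structure, not the specific model's equations), so every compartment integrates several such ODEs per time step and the coupling between cells is why workload depends so strongly on network connectivity.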

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computation platforms has recently increased, and it is therefore necessary to optimize application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specific hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code's efficiency is addressed through three key activities: optimization, parallelization and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-Averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
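
    To make the acceleration target concrete: the hot loops of an edge-based unstructured solver sweep over mesh edges, compute a numerical flux from the two adjacent cell states, and scatter the result into both cells' residuals. The sketch below shows that generic pattern on a GPU; the toy scalar flux and data layout are illustrative assumptions, not TAU internals. The scattered, conflicting updates are precisely what makes unstructured CFD harder to accelerate than the dense kernels in the N-body papers above.

        // Generic edge-based residual accumulation on an unstructured mesh
        // (illustrative, not the DLR TAU implementation). One thread per edge;
        // edges share cells, so the scatter must be atomic (or edge-colored).
        __global__ void edgeFluxes(const int2 *edges, const float *u,
                                   float *residual, int nEdges)
        {
            int e = blockIdx.x * blockDim.x + threadIdx.x;
            if (e >= nEdges) return;

            int i = edges[e].x;   // left cell of this edge
            int j = edges[e].y;   // right cell of this edge

            // Toy scalar flux: central average plus first-order dissipation
            // (a real solver evaluates a Roe/Riemann flux on the state vector).
            float flux = 0.5f * (u[i] + u[j]) - 0.5f * (u[j] - u[i]);

            atomicAdd(&residual[i], -flux);   // flux leaves cell i ...
            atomicAdd(&residual[j],  flux);   // ... and enters cell j
        }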