Search CORE

241,713 research outputs found

Molecular Dynamics Simulation of Macromolecules Using Graphics Processing Unit

Author: Ge W
Ge Wei
Li Jinghai
Ren Ying
Xu Ji
Yang Xiaozhen
Yu Xiang
Publication venue
Publication date: 01/01/2010
Field of study

Molecular dynamics (MD) simulation is a powerful computational tool to study the behavior of macromolecular systems. But many simulations of this field are limited in spatial or temporal scale by the available computational resource. In recent years, graphics processing unit (GPU) provides unprecedented computational power for scientific applications. Many MD algorithms suit with the multithread nature of GPU. In this paper, MD algorithms for macromolecular systems that run entirely on GPU are presented. Compared to the MD simulation with free software GROMACS on a single CPU core, our codes achieve about 10 times speed-up on a single GPU. For validation, we have performed MD simulations of polymer crystallization on GPU, and the results observed perfectly agree with computations on CPU. Therefore, our single GPU codes have already provided an inexpensive alternative for macromolecular simulations on traditional CPU clusters and they can also be used as a basis to develop parallel GPU programs to further speedup the computations.Comment: 21 pages, 16 figure

arXiv.org e-Print Archive

Institutional Repository of Institute of Process Engineering, CAS (IPE-IR）

Architecture-Aware Optimization on a 1600-core Graphics Processor

Author: Daga Mayank
Feng Wu-chun
Scogland Thomas R.W.
Publication venue
Publication date: 01/07/2011
Field of study

The graphics processing unit (GPU) continues to make significant strides as an accelerator in commodity cluster computing for high-performance computing (HPC). For example, three of the top five fastest supercomputers in the world, as ranked by the TOP500, employ GPUs as accelerators. Despite this increasing interest in GPUs, however, optimizing the performance of a GPU-accelerated compute node requires deep technical knowledge of the underlying architecture. Although significant literature exists on how to optimize GPU performance on the more mature NVIDIA CUDA architecture, the converse is true for OpenCL on the AMD GPU. Consequently, we present and evaluate architecture-aware optimizations for the AMD GPU. The most prominent optimizations include (i) explicit use of registers, (ii) use of vector types, (iii) removal of branches, and (iv) use of image memory for global data. We demonstrate the efficacy of our AMD GPU optimizations by applying each optimization in isolation as well as in concert to a large-scale, molecular modeling application called GEM. Via these AMD-specific GPU optimizations, the AMD Radeon HD 5870 GPU delivers 65% better performance than with the wellknown NVIDIA-specific optimizations

Computer Science Technical Reports @Virginia Tech

Three Dimensional Pseudo-Spectral Compressible Magnetohydrodynamic GPU Code for Astrophysical Plasma Simulation

Author: Ganesh Rajaraman
Maurya Udaya
Mukherjee Rupak
Saini Vinod
Sharma Bharatkumar
Vydyanathan Nagavijayalakshmi
Publication venue
Publication date: 30/10/2018
Field of study

This paper presents the benchmarking and scaling studies of a GPU accelerated three dimensional compressible magnetohydrodynamic code. The code is developed keeping an eye to explain the large and intermediate scale magnetic field generation is cosmos as well as in nuclear fusion reactors in the light of the theory given by Eugene Newman Parker. The spatial derivatives of the code are pseudo-spectral method based and the time solvers are explicit. GPU acceleration is achieved with minimal code changes through OpenACC parallelization and use of NVIDIA CUDA Fast Fourier Transform library (cuFFT). NVIDIAs unified memory is leveraged to enable over-subscription of the GPU device memory for seamless out-of-core processing of large grids. Our experimental results indicate that the GPU accelerated code is able to achieve upto two orders of magnitude speedup over a corresponding OpenMP parallel, FFTW library based code, on a NVIDIA Tesla P100 GPU. For large grids that require out-of-core processing on the GPU, we see a 7x speedup over the OpenMP, FFTW based code, on the Tesla P100 GPU. We also present performance analysis of the GPU accelerated code on different GPU architectures - Kepler, Pascal and Volta

arXiv.org e-Print Archive

Crossref

A Survey of Techniques for Improving Security of GPUs

Author: Abhinaya S. B.
Ali Irfan
Mittal Sparsh
Reddy Manish
Publication venue
Publication date: 01/01/2018
Field of study

Graphics processing unit (GPU), although a powerful performance-booster, also has many security vulnerabilities. Due to these, the GPU can act as a safe-haven for stealthy malware and the weakest `link' in the security `chain'. In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. More than informing users and researchers about GPU security techniques, this survey aims to increase their awareness about GPU security vulnerabilities and potential countermeasures

arXiv.org e-Print Archive

Crossref

Research Archive of Indian Institute of Technology Hyderabad

Gunrock: GPU Graph Analytics

Author: Davidson Andrew
Liu Weitang
Osama Muhammad
Owens John D.
Pan Yuechao
Riffel Andy T.
Wang Leyuan
Wang Yangzihao
Wu Yuduo
Yang Carl
Yuan Chenshan
Publication venue
Publication date: 04/01/2017
Field of study

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

arXiv.org e-Print Archive

eScholarship - University of California

The Francis Crick Institute