5 research outputs found

    Acceleration of Large-Scale Electronic Structure Simulations with Heterogeneous Parallel Computing

    Large-scale electronic structure simulations coupled to an empirical modeling approach are critical, as they offer a robust way to predict quantum phenomena in realistically sized nanoscale structures that are hard to handle with density functional theory. For tight-binding (TB) simulations of electronic structures, which normally involve multimillion-atom systems for direct comparison to experimentally realizable nanoscale materials and devices, we show that graphics processing unit (GPU) devices reduce computing costs in both time and energy consumption. After a short introduction to the major numerical method adopted for TB electronic structure simulations, this work presents a detailed description of the strategies used to drive performance enhancement with GPU devices against traditional clusters of multicore processors. While this work uses only TB electronic structure simulations for benchmark tests, it can also serve as a practical guideline for enhancing the performance of numerical operations that involve large-scale sparse matrices.
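The abstract's core numerical operation is a product of a large sparse matrix (the TB Hamiltonian) with a state vector. As a minimal illustration of what such an operation looks like in compressed sparse row (CSR) form, here is a sketch of a CPU baseline; the tridiagonal example matrix is purely illustrative, not taken from the paper:

```python
import numpy as np

def csr_spmv(data, indices, indptr, x):
    """Sparse matrix-vector product y = A @ x with A stored in CSR form.

    data    : nonzero values, row by row
    indices : column index of each nonzero
    indptr  : indptr[i]..indptr[i+1] delimits row i's nonzeros
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        y[i] = data[start:end] @ x[indices[start:end]]
    return y

# Toy 3x3 sparse matrix (a tridiagonal "Hamiltonian-like" example):
# [[ 2, -1,  0],
#  [-1,  2, -1],
#  [ 0, -1,  2]]
data = np.array([2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
indices = np.array([0, 1, 0, 1, 2, 1, 2])
indptr = np.array([0, 2, 5, 7])
x = np.ones(3)
y = csr_spmv(data, indices, indptr, x)  # -> [1.0, 0.0, 1.0]
```

On a GPU, the per-row loop is what gets parallelized across threads; the papers below discuss how the storage format and memory-access pattern determine how well that parallelization performs.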

    Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform

    In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-to-data ratio, so its performance is mainly bound by memory bandwidth. We propose optimizations of ELLPACK-based SpMV from two aspects: (1) enhanced performance for the dense vector by reducing cache misses, and (2) reduced matrix data access through index compression. Matrix bandwidth reduction techniques enable both the cache usage enhancement and the index compression. For GPUs with better cache support, we propose a differentiated memory access scheme to avoid contamination of the caches by matrix data. Performance evaluation shows that the combined speedups of the proposed optimizations are 16% (single precision) and 12.6% (double precision) on the GT-200 GPU, and 19% (single precision) and 15% (double precision) on the GF-100 GPU.
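The ELLPACK format the abstract builds on stores an n-row sparse matrix as two dense n x K arrays (values and column indices, where K is the maximum nonzeros per row), padding shorter rows with zeros so every row has uniform length, which maps well onto GPU threads. A minimal CPU sketch of ELLPACK SpMV, with an illustrative toy matrix not taken from the paper:

```python
import numpy as np

def ellpack_spmv(val, col, x):
    """SpMV y = A @ x with A in ELLPACK form.

    val, col : (n_rows, max_nnz_per_row) dense arrays; padded slots
               carry val == 0.0, so their column index is harmless.
    Each row contributes sum_k val[i, k] * x[col[i, k]].
    """
    return (val * x[col]).sum(axis=1)

# Toy 3x3 sparse matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
val = np.array([[4.0, 1.0],
                [2.0, 0.0],   # row 1 has one nonzero; second slot is padding
                [3.0, 5.0]])
col = np.array([[0, 2],
                [1, 0],       # padded slot's column index is arbitrary
                [0, 2]])
x = np.array([1.0, 2.0, 3.0])
y = ellpack_spmv(val, col, x)  # -> [7.0, 4.0, 18.0]
```

The paper's index-compression idea exploits the fact that after matrix bandwidth reduction the entries of `col` cluster near the diagonal, so they can be stored with fewer bits, cutting the memory traffic that dominates SpMV runtime.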

    MASSIVELY PARALLEL OIL RESERVOIR SIMULATION FOR HISTORY MATCHING
