711 research outputs found
A software-based self test of CUDA Fermi GPUs
Nowadays, Graphics Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data processing and financial computation. In these fields, highly efficient test methodologies are mandatory. One of the most effective ways to detect and localize hardware faults in GPUs is a Software-Based Self-Test (SBST) methodology. In this paper, a fully comprehensive SBST and fault localization methodology for GPUs is presented. This novel approach exploits different custom test strategies for each component inside the GPU architecture. Such strategies guarantee both permanent fault detection and accurate fault localization.
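As a rough illustration of the general self-test idea summarized above (not the paper's actual method), the following Python sketch runs fixed test patterns through several modelled compute units, folds the results into a signature, and flags every unit whose signature differs from the fault-free one. The unit names (`sm0`, `sm1`, `sm2`), the adder model, the test patterns, and the stuck-at-0 fault are all assumptions introduced for this example.

```python
from typing import Callable, Dict, List

MASK32 = 0xFFFFFFFF
Alu = Callable[[int, int], int]


def healthy_alu(a: int, b: int) -> int:
    """Model of a fault-free 32-bit adder (hypothetical unit under test)."""
    return (a + b) & MASK32


def stuck_at_zero_alu(bit: int) -> Alu:
    """Model of an adder whose output bit `bit` is permanently stuck at 0."""
    def alu(a: int, b: int) -> int:
        return healthy_alu(a, b) & ~(1 << bit) & MASK32
    return alu


# Assumed test patterns; a real test program would be derived from the hardware.
TEST_PATTERNS = [
    (0x00000000, 0xFFFFFFFF),
    (0xAAAAAAAA, 0x55555555),
    (0x0000FFFF, 0xFFFF0000),
    (1, 1),
]


def signature(alu: Alu) -> int:
    """Fold the results of all test patterns into one 32-bit signature."""
    sig = 0
    for a, b in TEST_PATTERNS:
        sig = ((sig << 1) | (sig >> 31)) & MASK32  # rotate left by one bit
        sig ^= alu(a, b)
    return sig


# Golden signature, computed once on the fault-free model.
GOLDEN = signature(healthy_alu)


def self_test(units: Dict[str, Alu]) -> List[str]:
    """Return the names of units whose signature differs from the golden one."""
    return [name for name, alu in units.items() if signature(alu) != GOLDEN]


if __name__ == "__main__":
    units = {"sm0": healthy_alu, "sm1": stuck_at_zero_alu(3), "sm2": healthy_alu}
    print("faulty units:", self_test(units))
```

Comparing per-unit signatures against a golden value gives both detection (any mismatch) and coarse localization (which unit mismatched). A compacted signature can in principle mask a fault through aliasing, so the pattern set matters.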
Increasing the robustness of CUDA Fermi GPU-based systems
Nowadays, Graphics Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data processing and image processing. In these fields, the capability to work properly even in the presence of faults is mandatory. This paper presents an innovative approach that combines a Software-Based Self-Test & Diagnosis (SBSTD) methodology with a fault mitigation strategy to increase the robustness of a CUDA Fermi GPU-based system.
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 201
An improved fault mitigation strategy for CUDA Fermi GPUs
High computational power is a predominant requirement in many applications. In this field, Graphics Processing Units (GPUs) are increasingly adopted. Low prices and high parallelism make GPUs attractive, even in safety-critical applications. Nonetheless, new methodologies must be studied and developed to increase the dependability of GPUs. This paper presents an improved fault mitigation strategy against permanent faults for CUDA Fermi GPUs. The proposed approach exploits the reverse engineering of the block scheduling policy in CUDA Fermi GPUs in order to minimize the fault mitigation timing overhead. The graceful performance degradation achieved by the proposed technique outperforms multithreaded CPU implementations and other fault mitigation strategies for CUDA GPUs, even in the presence of multiple permanent faults.
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well
as in the heterogeneity, of computing resources. The pressure to maintain
reasonable levels of performance and portability forces application developers
to leave the traditional programming paradigms and explore alternative
solutions. PaStiX is a parallel sparse direct solver, based on a dynamic
scheduler for modern hierarchical manycore architectures. In this paper, we
study the benefits and limits of replacing the highly specialized internal
scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and
StarPU. The task graph of the factorization step is made available to the two
runtimes, providing them the opportunity to process and optimize its traversal
in order to maximize the algorithm efficiency for the targeted hardware
platform. A comparative study of the performance of the PaStiX solver on top of
its native internal scheduler, PaRSEC, and StarPU frameworks, on different
execution environments, is performed. The analysis highlights that these
generic task-based runtimes achieve comparable results to the
application-optimized embedded scheduler on homogeneous platforms. Furthermore,
they are able to significantly speed up the solver on heterogeneous
environments by taking advantage of the accelerators while hiding the
complexity of their efficient manipulation from the programmer.
Comment: Heterogeneity in Computing Workshop (2014
GPU accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian Movement
Pedestrian movement, although ubiquitous and well-studied, is still not that
well understood due to the complicating nature of the embedded social dynamics.
Interest among researchers in simulating pedestrian movement and interactions
has grown significantly in part due to increased computational and
visualization capabilities afforded by high power computing. Different
approaches have been adopted to simulate pedestrian movement under various
circumstances and interactions. In the present work, bi-directional crowd
movement is simulated where equal numbers of individuals try to reach the
opposite sides of an environment. Two movement methods are considered. First, a
Least Effort Model (LEM) is investigated where agents try to take an optimal
path with as few changes from their intended path as possible. Following
this, a modified form of Ant Colony Optimization (ACO) is proposed, where
individuals are guided by a goal of reaching the other side in a least effort
mode as well as a pheromone trail left by predecessors. The basic idea is to
increase agent interaction, thereby more closely reflecting a real world
scenario. The methodology utilizes Graphics Processing Units (GPUs) for general
purpose computing using the CUDA platform. Because of the inherent parallel
properties associated with pedestrian movement such as proximate interactions
of individuals on a 2D grid, GPUs are well suited. The main feature of the
implementation undertaken here is that the parallelism is data driven. The data
driven implementation leads to a speedup of up to 18x compared to its sequential
counterpart running on a single-threaded CPU. The number of pedestrians
considered in the model ranged from 2K to 100K representing numbers typical of
mass gathering events. A detailed discussion addresses implementation
challenges faced and averted
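To make the pheromone-guided movement rule described above more concrete, here is a minimal Python sketch of a single agent's step choice on a 2D grid. It is not the authors' model: the scoring rule, the `STRAIGHT_BONUS` weight, and the `choose_move` function are assumptions for this example. The agent looks at the three cells in the next column along its walking direction and picks the free cell with the highest score, where the straight-ahead cell gets a fixed least-effort bonus and every cell adds its pheromone level.

```python
from typing import List, Tuple

Grid = List[List[int]]      # 0 = empty cell, 1 = occupied cell
Field = List[List[float]]   # pheromone level per cell

STRAIGHT_BONUS = 1.0  # assumed weight favouring the least-effort (straight) step


def choose_move(grid: Grid, pheromone: Field,
                pos: Tuple[int, int], direction: int) -> Tuple[int, int]:
    """Pick the next cell for an agent at pos = (row, col).

    direction is +1 (walking right) or -1 (walking left).
    The agent stays in place when no candidate cell is free.
    """
    row, col = pos
    new_col = col + direction
    if not 0 <= new_col < len(grid[0]):
        return pos  # already at the far edge of the environment

    best, best_score = pos, float("-inf")
    for d_row in (0, -1, 1):  # straight ahead is checked first, so it wins ties
        r = row + d_row
        if 0 <= r < len(grid) and grid[r][new_col] == 0:
            score = pheromone[r][new_col]
            if d_row == 0:
                score += STRAIGHT_BONUS
            if score > best_score:
                best, best_score = (r, new_col), score
    return best


if __name__ == "__main__":
    grid = [[0] * 3 for _ in range(3)]
    pheromone = [[0.0] * 3 for _ in range(3)]
    print(choose_move(grid, pheromone, (1, 0), +1))  # no trail: goes straight
    pheromone[0][1] = 2.0                            # a strong trail diagonally ahead
    print(choose_move(grid, pheromone, (1, 0), +1))  # trail outweighs the bonus
```

Because each decision only reads the agent's immediate neighbourhood, the same rule could be evaluated for many agents at once, which is the kind of data-driven parallelism the abstract refers to. Resolving two agents that pick the same target cell is deliberately left out of this sketch.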
Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model
We show that efficient simulations of the Kardar-Parisi-Zhang interface
growth in 2 + 1 dimensions and of the 3-dimensional Kinetic Monte Carlo of
thermally activated diffusion can be realized both on GPUs and modern CPUs. In
this article we present results of different implementations on GPUs using CUDA
and OpenCL and also on CPUs using OpenCL and MPI. We investigate the runtime
and scaling behavior on different architectures to find optimal solutions for
solving current simulation problems in the field of statistical physics and
materials science.
Comment: 14 pages, 8 figures, to be published in a forthcoming EPJST special
issue on "Computer simulations on GPU