711 research outputs found

    A software-based self test of CUDA Fermi GPUs

    Get PDF
    Nowadays, Graphical Processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data elaboration and financial computation. In these fields, high efficient test methodologies are mandatory. One of the most effective ways to detect and localize hardware faults in GPUs is a Software-Based-Self-Test methodology (SBST). In this paper a fully comprehensive SBST and fault localization methodology for GPUs is presented. This novel approach exploits different custom test strategies for each component inside the GPU architecture. Such strategies guarantee both permanent fault detection and accurate fault localization

    Increasing the robustness of CUDA Fermi GPU-based systems

    Get PDF
    Nowadays Graphical processing Units (GPUs) have become increasingly popular due to their high computational power and low prices. This makes them particularly suitable for high-performance computing applications, like data elaboration and image processing. In these fields, the capability of properly work even in presence of faults is mandatory. This paper presents an innovative approach, that combines a Software Based Self Test & Diagnosis (SBSTD) methodology with a fault mitigation strategy, to increase the robustness of a CUDA Fermi GPU-based system

    Computational Physics on Graphics Processing Units

    Full text link
    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up various different algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using the density functional theory, wave function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201

    An improved fault mitigation strategy for CUDA Fermi GPUs

    Get PDF
    High computation is a predominant requirement in many applications. In this field, Graphic Processing Units (GPUs) are more and more adopted. Low prices and high parallelism let GPUs be attractive, even in safety critical applications. Nonetheless, new methodologies must be studied and developed to increase the dependability of GPUs. This paper presents an improved fault mitigation strategy against permanent faults for CUDA Fermi GPUs. The proposed approach exploits the reverse engineering of the block scheduling policy in CUDA Fermi GPUs in order to minimize the fault mitigation timing overhead. The graceful performance degradation achieved by the proposed technique outperforms multithreaded CPU implementations and other fault mitigation strategies for CUDA GPU, even in presence of multiple permanent faults

    Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

    Get PDF
    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The tasks graph of the factorization step is made available to the two runtimes, providing them the opportunity to process and optimize its traversal in order to maximize the algorithm efficiency for the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its native internal scheduler, PaRSEC, and StarPU frameworks, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve comparable results to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer.Comment: Heterogeneity in Computing Workshop (2014

    GPU accelerated Nature Inspired Methods for Modelling Large Scale Bi-Directional Pedestrian Movement

    Full text link
    Pedestrian movement, although ubiquitous and well-studied, is still not that well understood due to the complicating nature of the embedded social dynamics. Interest among researchers in simulating pedestrian movement and interactions has grown significantly in part due to increased computational and visualization capabilities afforded by high power computing. Different approaches have been adopted to simulate pedestrian movement under various circumstances and interactions. In the present work, bi-directional crowd movement is simulated where an equal numbers of individuals try to reach the opposite sides of an environment. Two movement methods are considered. First a Least Effort Model (LEM) is investigated where agents try to take an optimal path with as minimal changes from their intended path as possible. Following this, a modified form of Ant Colony Optimization (ACO) is proposed, where individuals are guided by a goal of reaching the other side in a least effort mode as well as a pheromone trail left by predecessors. The basic idea is to increase agent interaction, thereby more closely reflecting a real world scenario. The methodology utilizes Graphics Processing Units (GPUs) for general purpose computing using the CUDA platform. Because of the inherent parallel properties associated with pedestrian movement such as proximate interactions of individuals on a 2D grid, GPUs are well suited. The main feature of the implementation undertaken here is that the parallelism is data driven. The data driven implementation leads to a speedup up to 18x compared to its sequential counterpart running on a single threaded CPU. The numbers of pedestrians considered in the model ranged from 2K to 100K representing numbers typical of mass gathering events. A detailed discussion addresses implementation challenges faced and averted

    Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model

    Full text link
    We show that efficient simulations of the Kardar-Parisi-Zhang interface growth in 2 + 1 dimensions and of the 3-dimensional Kinetic Monte Carlo of thermally activated diffusion can be realized both on GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using OpenCL and MPI. We investigate the runtime and scaling behavior on different architectures to find optimal solutions for solving current simulation problems in the field of statistical physics and materials science.Comment: 14 pages, 8 figures, to be published in a forthcoming EPJST special issue on "Computer simulations on GPU
    • …
    corecore