3 research outputs found

    Power-aware Performance Tuning of GPU Applications Through Microbenchmarking

    Tuning GPU applications is a very challenging task, as any source-code optimization can significantly impact the performance, power, and energy consumption of the GPU device. Such an impact also depends on the GPU on which the application is run. This paper presents a suite of microbenchmarks that provides the actual characteristics of specific GPU device components (e.g., arithmetic instruction units, memories, etc.) in terms of throughput, power, and energy consumption. It shows how the suite can be combined with standard profiler information to efficiently drive application tuning by considering the three design constraints (power, performance, energy consumption) and the characteristics of the target GPU device.
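
    As a hedged illustration of what such a microbenchmark can look like, the following CUDA sketch stresses the FMA units with a dependent arithmetic chain, times the kernel with CUDA events, and samples board power through NVML so that throughput, power, and a rough energy estimate can be reported together. The kernel shape, iteration count, and the NVML-based power read are assumptions chosen for illustration, not the paper's actual suite.

    // Minimal arithmetic-throughput/power microbenchmark sketch (illustrative only).
    // Build: nvcc fma_bench.cu -lnvidia-ml
    #include <cstdio>
    #include <cuda_runtime.h>
    #include <nvml.h>

    __global__ void fma_chain(float *out, int iters) {
        float a = threadIdx.x * 0.001f + 1.0f;
        float b = 1.000001f;
        for (int i = 0; i < iters; ++i)
            a = fmaf(a, b, 0.5f);               // dependent FMAs keep the arithmetic pipeline busy
        out[blockIdx.x * blockDim.x + threadIdx.x] = a;
    }

    int main() {
        const int blocks = 1024, threads = 256, iters = 1 << 16;
        float *d_out;
        cudaMalloc(&d_out, blocks * threads * sizeof(float));

        nvmlInit();
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        fma_chain<<<blocks, threads>>>(d_out, iters);   // warm-up launch
        cudaDeviceSynchronize();

        cudaEventRecord(start);
        fma_chain<<<blocks, threads>>>(d_out, iters);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        unsigned int mw = 0;
        nvmlDeviceGetPowerUsage(dev, &mw);              // instantaneous board power in milliwatts

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double flops  = 2.0 * (double)iters * blocks * threads;   // 1 FMA = 2 FLOPs
        double gflops = flops / (ms * 1e6);
        double joules = (mw / 1000.0) * (ms / 1000.0);            // rough energy estimate

        printf("time %.2f ms, %.1f GFLOP/s, ~%.1f W, ~%.2f J\n",
               ms, gflops, mw / 1000.0, joules);

        cudaFree(d_out);
        nvmlShutdown();
        return 0;
    }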

    Traçage et profilage de systèmes hétérogènes (Tracing and Profiling of Heterogeneous Systems)

    RÉSUMÉ (translated from French): Heterogeneous systems are increasingly present in all computers, as many tasks require specialized coprocessors. These coprocessors have enabled very large performance gains that led to scientific breakthroughs, notably deep learning, which only re-emerged with the arrival of general-purpose programming of graphics processors. These coprocessors are also increasingly complex. The collaboration and cohabitation of these chips within a single system lead to behaviours that cannot be predicted through static analysis. Moreover, the use of parallel systems with thousands of execution threads, together with specialized programming models, makes such systems very difficult to understand. These comprehension problems not only make programming slower and more costly, but also prevent the diagnosis of performance problems.

    ABSTRACT: Heterogeneous systems are becoming increasingly relevant and important with the emergence of powerful specialized coprocessors. Because of the nature of certain problems, such as graphics display, deep learning, and physics simulation, these devices have become a necessity. The power derived from their highly parallel or very specialized architectures is essential to meet the demands of these problems. Because these use cases are common on everyday devices such as cellphones and computers, highly parallel coprocessors are added to these devices, where they collaborate with standard CPUs. The cooperation between these different coprocessors makes the system very difficult to analyze and understand. The highly parallel workloads and specialized programming models make programming applications very difficult, and troubleshooting performance issues is even more complex. Since these systems communicate through many layers, the abstractions hide many performance defects.
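
    As a hedged sketch of the measurement problem described above: on a heterogeneous system, host-side timing only sees the cost of enqueueing asynchronous GPU work, so a tracer must also collect device-side events and correlate them with host timestamps. The kernel, its workload, and the timing scheme below are illustrative assumptions, not the thesis's actual tooling.

    // Why host-only timing misleads on heterogeneous systems (illustrative only).
    #include <chrono>
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void busy_kernel(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = data[i];
            for (int k = 0; k < 10000; ++k) v = v * 1.0000001f + 0.5f;
            data[i] = v;
        }
    }

    int main() {
        const int n = 1 << 20;
        float *d;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));

        cudaEvent_t gpu_start, gpu_stop;
        cudaEventCreate(&gpu_start);
        cudaEventCreate(&gpu_stop);

        auto cpu_t0 = std::chrono::steady_clock::now();
        cudaEventRecord(gpu_start);                       // device-side timestamp
        busy_kernel<<<(n + 255) / 256, 256>>>(d, n);      // asynchronous launch
        cudaEventRecord(gpu_stop);
        auto cpu_t1 = std::chrono::steady_clock::now();   // returns long before the GPU finishes

        cudaEventSynchronize(gpu_stop);
        float gpu_ms = 0.0f;
        cudaEventElapsedTime(&gpu_ms, gpu_start, gpu_stop);
        double cpu_ms = std::chrono::duration<double, std::milli>(cpu_t1 - cpu_t0).count();

        // The two numbers typically differ widely: the host only pays the cost of
        // enqueueing work, which is why a tracer must gather events from both sides
        // and correlate their clocks to reconstruct the real timeline.
        printf("host-visible launch cost: %.3f ms, actual GPU execution: %.3f ms\n",
               cpu_ms, gpu_ms);

        cudaFree(d);
        return 0;
    }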

    High-Performance and Power-Aware Graph Processing on GPUs

    Graphs are a common representation in many problem domains, including engineering, finance, medicine, and scientific applications. Different problems map to very large graphs, often involving millions of vertices. Even though very efficient sequential implementations of graph algorithms exist, they become impractical when applied to such very large real-world graphs. On the other hand, graphics processing units (GPUs) have become widespread architectures, as they provide massive parallelism at low cost. Parallel execution on GPUs may achieve speedups of up to three orders of magnitude with respect to the sequential counterparts. Nevertheless, accelerating efficient and optimized sequential algorithms and porting (i.e., parallelizing) their implementations to such many-core architectures is a very challenging task. The task is made even harder since energy and power consumption are becoming constraints in addition to, or in some cases as an alternative to, performance. This work aims at developing a platform that provides (i) a library of parallel, efficient, and tunable implementations of the most important graph algorithms for GPUs, and (ii) an advanced profiling model to analyze both the performance and the power consumption of the algorithm implementations. The platform's goal is twofold: through the library, it aims at saving development effort in the parallelization task through a primitive-based approach; through the profiling framework, it aims at customizing such primitives by considering both the architectural details and the target efficiency metrics (i.e., performance or power).
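
    As a hedged sketch of the primitive-based approach mentioned above, the following CUDA code implements one level of a level-synchronous BFS over a graph stored in CSR form, the kind of reusable building block such a library typically exposes. The array names, the frontier encoding, and the host-side driver are illustrative assumptions, not the platform's actual API.

    // One level-synchronous BFS step over a CSR graph (illustrative only).
    #include <cuda_runtime.h>

    __global__ void bfs_level(const int *row_offsets, const int *col_indices,
                              int *distances, int level, int num_vertices,
                              int *frontier_not_empty) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= num_vertices || distances[v] != level) return;   // only current frontier
        for (int e = row_offsets[v]; e < row_offsets[v + 1]; ++e) {
            int u = col_indices[e];
            if (distances[u] == -1) {                // unvisited neighbour
                distances[u] = level + 1;            // benign race: every writer stores level + 1
                *frontier_not_empty = 1;
            }
        }
    }

    // Host-side driver: iterate levels until no new vertex joins the frontier.
    void bfs(const int *d_row_offsets, const int *d_col_indices,
             int *d_distances, int num_vertices, int source) {
        int level = 0;
        cudaMemset(d_distances, 0xFF, num_vertices * sizeof(int));  // all distances = -1
        cudaMemcpy(d_distances + source, &level, sizeof(int), cudaMemcpyHostToDevice);

        int *d_flag, h_flag = 1;
        cudaMalloc(&d_flag, sizeof(int));
        while (h_flag) {
            cudaMemset(d_flag, 0, sizeof(int));
            int threads = 256, blocks = (num_vertices + threads - 1) / threads;
            bfs_level<<<blocks, threads>>>(d_row_offsets, d_col_indices,
                                           d_distances, level, num_vertices, d_flag);
            cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
            ++level;
        }
        cudaFree(d_flag);
    }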