Searching for the highest performance for simulation codes and scientific visualization
This thesis aims to demonstrate that algorithms and programming in a high-performance computing (HPC) context cannot be designed without taking into account the hardware architecture of supercomputers, since that architecture changes dramatically over time. After setting out a few definitions relating to scientific codes and parallelism, we show that analyzing the different generations of supercomputers used at CEA over the past 30 years reveals a number of points of attention and best practices for code developers. Based on several experiments, we show how to target performance suited to supercomputers and how to try to achieve portable performance, and possibly extreme performance, in the world of massive parallelism, with or without GPUs. We explain that software and hardware dedicated to the graphical post-processing of simulation results follow the same parallelism principles as large scientific codes, which requires mastering a global view of the simulation chain. Finally, we describe the trends and constraints that will shape the design of future exascale-class supercomputers. These evolutions will, once again, affect the development of the next generations of scientific codes.
Computing element evolution towards Exascale and its impact on legacy simulation codes
In light of the current race towards Exascale, this article highlights the main features of the forthcoming computing elements that will be at the core of the next generations of supercomputers. The market analysis underlying this work shows that computers are facing a major evolution in terms of architecture. As a consequence, it is important to understand the impact of these evolutions on legacy codes and programming methods. The problems of dissipated power and memory access are discussed and lead to a vision of what an exascale system should be. To survive, programming languages have had to respond to hardware evolutions, either by evolving or through the creation of new ones. From these elements, we explain why vectorization, multithreading, data-locality awareness and hybrid programming will be key to reaching Exascale, implying that it is time to start rewriting codes.
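The data-locality and vectorization argument can be illustrated with a minimal C sketch. It is not taken from the article. Storing point coordinates as a structure of arrays (SoA) rather than an array of structures (AoS) makes each field contiguous in memory, so a loop over a single field becomes a unit-stride stream that compilers can vectorize more readily. The type names, function names and the fixed capacity `NPTS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NPTS 1024  /* hypothetical fixed capacity of the SoA container */

/* Array of structures (AoS): the x, y, z of one point are adjacent, so a loop
   that reads only x strides through memory 3 doubles at a time. */
typedef struct { double x, y, z; } PointAoS;

/* Structure of arrays (SoA): all x values are contiguous, so a loop over x
   is a unit-stride stream that compilers can readily vectorize. */
typedef struct { double x[NPTS], y[NPTS], z[NPTS]; } PointsSoA;

/* Shift every x coordinate by dx using the AoS layout (strided access). */
void shift_x_aos(PointAoS *p, size_t n, double dx) {
    for (size_t i = 0; i < n; ++i)
        p[i].x += dx;
}

/* Same operation on the SoA layout (contiguous access). */
void shift_x_soa(PointsSoA *p, size_t n, double dx) {
    assert(n <= NPTS);  /* the SoA container has a fixed capacity */
    for (size_t i = 0; i < n; ++i)
        p->x[i] += dx;
}
```

Both functions compute the same result. Only the memory access pattern differs, and that pattern is what matters to vector units and caches.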
Correction to “Low-frequency variability in the Southern Ocean region in a simplified coupled model”
A Simple Guideline for Code Optimizations on Modern Architectures with OpenACC and CUDA
Learn a simple strategy guideline for optimizing application runtime. The strategy is based on four steps and is illustrated on a two-dimensional Discontinuous Galerkin solver for computational fluid dynamics on structured meshes. Starting from a sequential CPU code, we guide the audience through the different steps that allowed us to speed up the code on a GPU by a factor of about 149 over the original runtime (measured on a K20Xm). The same optimization strategy, applied to the CPU code, yields a speedup of about 35 over the original runtime (measured on an E5-1650v3 processor). Finally, different hardware architectures (Xeon CPUs, GPUs, KNL) are benchmarked with the native CUDA implementation and with one based on OpenACC.
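The abstract does not detail the four steps. What follows is therefore only a hedged illustration of the directive-based style that it compares with CUDA: a C loop annotated with a standard OpenACC `parallel loop` directive and `copyin`/`copy` data clauses. A compiler built without OpenACC support ignores the pragma and runs the loop sequentially on the host. This is one reason directive-based code can keep a single source for CPU and GPU. The AXPY kernel and the name `axpy_acc` are hypothetical and do not come from the solver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y = a*x + y over n elements.
   With OpenACC enabled (e.g. gcc -fopenacc), the directive offloads the loop,
   copying x to the device and y to and from it; without OpenACC support the
   pragma is ignored and the loop runs sequentially on the CPU. */
void axpy_acc(size_t n, double a, const double *restrict x, double *restrict y) {
    assert(n == 0 || (x != NULL && y != NULL));  /* precondition on inputs */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The data clauses make host-device transfers explicit. Reducing these transfers is typically an important part of GPU optimization. Whether the study's own steps follow this pattern is not stated in the abstract.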
Towards a Unified CPU–GPU code hybridization: A GPU Based Optimization Strategy Efficient on Other Modern Architectures
In this paper, we suggest a different methodology that shortens code-optimization development time while yielding a unified code with good performance on different target devices. In this study, experiments are illustrated on a Discontinuous Galerkin code applied to computational fluid dynamics. Tests are performed on CPUs, Xeon Phi (KNL) and GPUs. The performance comparison confirms that, for this kind of scientific application, the GPU optimization guideline also leads to efficient versions on CPU and Xeon Phi. Based on these results, we finally suggest a methodology for arriving at an efficient hybridized CPU–GPU implementation.