Search CORE

4 research outputs found

Iterative Compilation and Performance Prediction for Numerical Applications

Author: Fursin Grigori G
Publication venue: University of Edinburgh. College of Science and Engineering. School of Informatics.
Publication date: 01/07/2004
Field of study

Institute for Computing Systems ArchitectureAs the current rate of improvement in processor performance far exceeds the rate of memory performance, memory latency is the dominant overhead in many performance critical applications. In many cases, automatic compiler-based approaches to improving memory performance are limited and programmers frequently resort to manual optimisation techniques. However, this process is tedious and time-consuming. Furthermore, a diverse range of a rapidly evolving hardware makes the optimisation process even more complex. It is often hard to predict the potential benefits from different optimisations and there are no simple criteria to stop optimisations i.e. when optimal memory performance has been achieved or sufficiently approached. This thesis presents a platform independent optimisation approach for numerical applications based on iterative feedback-directed program restructuring using a new reasonably fast and accurate performance prediction technique for guiding optimisations. New strategies for searching the optimisation space, by means of profiling to find the best possible program variant, have been developed. These strategies have been evaluated using a range of kernels and programs on different platforms and operating systems. A significant performance improvement has been achieved using new approaches when compared to the state-of-the-art native static and platform-specific feedback directed compilers

Edinburgh Research Archive

Simulations parallèles de Monte Carlo appliquées à la Physique des Hautes Energies pour plates-formes manycore et multicore : mise au point, optimisation, reproductibilité

Author: Schweitzer Pierre
Publication venue: HAL CCSD
Publication date: 19/10/2015
Field of study

During this thesis, we focused on High Performance Computing, specifically on Monte Carlo simulations applied to High Energy Physics. We worked on simulations dedicated to the propagation of particles through matter. Monte Carlo simulations require significant CPU time and memory footprint.Our first Monte Carlo simulation was taking more time to simulate the physical phenomenon than the said phenomenon required to happen in the experimental conditions. It raised a real performance issue. The minimal technical aim of the thesis was to have a simulation requiring as much time as the real observed phenomenon. Our maximal target was to have a much faster simulation. Indeed, these simulations are critical to asses our correct understanding of what is observed during experimentation. The more we have simulated statistics samples, the better are our results. This initial state of our simulation was allowing numerous perspectives regarding optimisation, and high performance computing. Furthermore, in our case, increasing the performance of the simulation was pointless if it was at the cost of losing results reproducibility. The numerical reproducibility of the simulation was then an aspect we had to take into account. In this manuscript, after a state of the art about profiling, optimisation and reproducibility, we proposed several strategies to gain more performance in our simulations. In each case, all the proposed optimisations followed a profiling step. One never optimises without having profiled first. Then, we looked at the design of a parallel profiler using aspect-oriented programming for our specific needs. Finally, we took a new look at the issues raised by our Monte Carlo simulations: instead of optimising existing simulations, we proposed methods for developing a new simulation from scratch, having in mind it is for High Performance Computing and it has to be statistically sound, reproducible and scalable. In all our proposals, we looked at both multicore and manycore architectures from Intel to benchmark the performance on server-oriented architecture and High Performance Computing oriented architecture.Through the implementation of our proposals, we were able to optimise one of the Monte Carlo simulations, permitting us to achieve a 400X speedup, once optimised and parallelised on a computing node with 32 physical cores. We were also able to implement a profiler with aspects, able to deal with the parallelism of its computer and of the application it profiles. Moreover, because it relies on aspects, it is portable and not tied to any specific architecture. Finally, we implemented the simulation designed to be reproducible, scalable and to have statistically sound results. We observed that these goals could be achieved, whatever the target architecture for execution. This enabled us to assess our method for validating the numerical reproducibility of a simulation.Lors de cette thèse, nous nous sommes focalisés sur le calcul à haute performance, dans le domaine très précis des simulations de Monte Carlo appliquées à la physique des hautes énergies, et plus particulièrement, aux simulations pour la propagation de particules dans un milieu. Les simulations de Monte Carlo sont des simulations particulièrement consommatrices en ressources, temps de calcul, capacité mémoire.Dans le cas précis sur lequel nous nous sommes penchés, la première simulation de Monte Carlo existante prenait plus de temps à simuler le phénomène physique que le phénomène lui-même n’en prenait pour se dérouler dans les conditions expérimentales. Cela posait donc un sévère problème de performance. L’objectif technique minimal était d’avoir une simulation prenant autant de temps que le phénomène réel observé, l’objectif maximal était d’avoir une simulation bien plus rapide. En effet, ces simulations sont importantes pour vérifier la bonne compréhension de ce qui est observé dans les conditions expérimentales. Plus nous disposons d’échantillons statistiques simulés, meilleurs sont les résultats. Cet état initial des simulations ouvrait donc de nombreuses perspectives d’un point de vue optimisation et calcul à haute performance. Par ailleurs, dans notre cas, le gain de performance étant proprement inutile s’il n’est pas accompagné d’une reproductibilité des résultats, la reproductibilité numérique de la simulation est de ce fait un aspect que nous devons prendre en compte.C’est ainsi que dans le cadre de cette thèse, après un état de l’art sur le profilage, l’optimisation et la reproductibilité, nous avons proposé plusieurs stratégies visant à obtenir plus de performances pour nos simulations. Dans tous les cas, les optimisations proposées étaient précédées d’un profilage. On n’optimise jamais sans avoir profilé. Par la suite, nous nous intéressés à la création d’un profileur parallèle en programmation orientée aspect pour nos besoins très spécifiques, enfin, nous avons considéré la problématique de nos simulations sous un angle nouveau : plutôt que d’optimiser une simulation existante, nous avons proposé des méthodes permettant d’en créer une nouvelle, très spécifique à notre domaine, qui soit d’emblée reproductible, statistiquement correcte et qui puisse passer à l’échelle. Dans toutes les propositions, de façon transverse, nous nous sommes intéressés aux architectures multicore et manycore d’Intel pour évaluer les performances à travers une architecture orientée serveur et une architecture orientée calcul à haute performance.Ainsi, grâce à la mise en application de nos propositions, nous avons pu optimiser une des simulations de Monte Carlo, nous permettant d’obtenir un gain de performance de l’ordre de 400X, une fois optimisée et parallélisée sur un nœud de calcul avec 32 cœurs physiques. De même, nous avons pu proposer l’implémentation d’un profileur, programmé à l’aide d’aspects et capable de gérer le parallélisme à la fois de la machine sur laquelle il est exécuté mais aussi de l’application qu’il profile. De plus, parce qu’il emploi les aspects, il est portable et n’est pas fixé à une architecture matérielle en particulier. Enfin, nous avons implémenté la simulation prévue pour être reproductible, performante et ayant des résultats statistiquement viables. Nous avons pu constater que ces objectifs étaient atteints quelle que soit l’architecture cible pour l’exécution. Cela nous a permis de valider notamment notre méthode de vérification de la reproductibilité numérique d’une simulation

Thèses en Ligne

HAL-IN2P3

HAL Clermont Université

Recommended from our members

Visualizing the Impact of the Cache on Program Execution

Author: Beyls Kristof
D'Hollander Erik H.
Yu Yijun
Publication venue
Publication date: 01/01/2004
Field of study

Cache behavior of a program has an ever-growing strong impact on its execution time. Characterizing the behavior by visible patterns is considered a way to pinpoint the bottleneck against performance. This paper presents a framework of visualization for trace distributions to extract the useful cache behavior patterns. We focus on cache misses, reuse distances, temporal or spatial localities, etc. The histograms of these distribution patterns measure the behavior in quantity, revealing effective program optimizations. The performance bottlenecks are exposed as hot spots highlighted in the source code, showing the exact locations to apply suitable optimizations. The impact of the source-level program optimizations, again, can be verified by the visualization tool

Open Research Online (The Open University)

Visualizing the impact of the cache on program execution

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

Crossref