15 research outputs found
Phase-TA: Periodicity Detection and Characterization for HPC Applications
The world of High-Performance Computing (HPC) currently stands on the edge of the exascale. Supercomputers are growing ever more powerful, requiring power-efficient components and ever smarter tool suites to operate them. One of the key features of those frameworks will be their ability to monitor and predict the behavior of executed applications, in order to optimize resource utilization and abide by operating constraints, notably on power consumption. In this context, this article presents Phase-TA, an offline tool that detects and characterizes the inherent periodicities of iterative HPC applications, with no prior knowledge of the latter. To do so, it analyzes the evolution of several performance counters at the scale of the compute node, and infers patterns representing the identified periodicities. As a result, Phase-TA offers a non-intrusive means of gaining insight into the processor use associated with an application, and paves the way to predicting its behavior. Phase-TA was tested on a panel of three applications and benchmarks from the supercomputing field: HPCG, NEMO, and OpenFOAM. For all of them, periodicities accounting on average for 78% of their execution time were detected and represented by accurate patterns. Furthermore, it was demonstrated that there is no need to analyze the whole profile of an application to precisely characterize its periodic behaviors: an extract of that profile is enough for Phase-TA to infer representative patterns on the fly, opening the way to energy-efficiency optimization through Dynamic Voltage-Frequency Scaling (DVFS).
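Phase-TA's pattern-inference algorithm is not reproduced in this listing; as a loose illustration of the underlying idea only, the sketch below detects a dominant periodicity in a performance-counter trace by autocorrelation (function and parameter names are hypothetical, not Phase-TA's):

```python
import numpy as np

def dominant_period(samples, min_lag=10):
    """Estimate the dominant period (in samples) of a counter trace.

    samples: per-interval values of one performance counter (e.g. IPC
    sampled every 100 ms). min_lag skips the trivial decay of the
    autocorrelation near lag zero.
    """
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()                                   # remove the DC component
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]                                      # normalize: acf[0] == 1
    lag = min_lag + int(np.argmax(acf[min_lag:]))
    return lag, acf[lag]

# Example: a noisy signal with a 50-sample period is recovered.
rng = np.random.default_rng(0)
t = np.arange(2000)
trace = np.sin(2 * np.pi * t / 50) + 0.3 * rng.standard_normal(t.size)
period, strength = dominant_period(trace)
print(f"estimated period: {period} samples (autocorrelation {strength:.2f})")
```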
Stock price reaction to profit warnings: The role of time-varying betas
This study investigates the role of time-varying betas, event-induced variance and conditional heteroskedasticity in the estimation of abnormal returns around important news announcements. Our analysis is based on the stock price reaction to profit warnings issued by a sample of firms listed on the Hong Kong Stock Exchange. The standard event study methodology indicates the presence of price reversal patterns following both positive and negative warnings. However, incorporating time-varying betas, event-induced variance and conditional heteroskedasticity in the modelling process results in post-negative-warning price patterns that are consistent with the predictions of the efficient market hypothesis. These adjustments also cause the statistical significance of some post-positive-warning cumulative abnormal returns to disappear and their magnitude to drop to an extent that minor transaction costs would eliminate the profitability of the contrarian strategy.
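For context, the quantities at stake are the standard event-study abnormal and cumulative abnormal returns; under the market model with a time-varying beta they take the form below (textbook notation, assumed rather than taken from the paper):

```latex
% Abnormal return of firm i at time t under the market model,
% with the constant beta replaced by a conditional (time-varying) one:
\[
AR_{it} = R_{it} - \left(\hat{\alpha}_i + \hat{\beta}_{it}\, R_{mt}\right),
\qquad
CAR_i(t_1, t_2) = \sum_{t=t_1}^{t_2} AR_{it}
\]
% R_{it}: firm return; R_{mt}: market return; \hat{\beta}_{it} may come
% from a GARCH-type or rolling estimate of the conditional beta.
```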
An Empirical Study of the Performance of OpenMP Applications on Multi-core Platforms
Current architectures of multicore machines are becoming increasingly complex due to hierarchical designs. Consequently, achieving better performance stability, reproducibility and predictability requires a deep understanding of the interactions between multi-threaded applications and the underlying hardware. In this thesis, we study two important aspects of the performance of multi-threaded applications. We show that performance stability is an important criterion to consider in the process of performance evaluation, and that thread placement is an effective technique for both stabilizing and improving program performance. We first study the variability of program execution times, defining a rigorous performance evaluation protocol, and analyzing the reasons for this variability and its implications for performance measurement. Then, we study the relation between inter-thread data sharing and thread placement strategies on hierarchical machines. We consider various strategies in which the same placement is applied for the whole execution of the program; some of them rely on the characteristics of the application, while others do not. We also present thread placement strategies that allow thread migration, in order to exploit data sharing during different program phases.
Improving Power Efficiency Through Fine-Grain Performance Monitoring in HPC Clusters
Nowadays, power and energy consumption are of paramount importance. Moreover, reaching the exascale target will not be possible in the short term without major breakthroughs in software and hardware technologies to meet power consumption constraints. In this context, this paper discusses the design and implementation of a system-wide tool to monitor, analyze and control power and energy consumption in HPC clusters. We developed a lightweight tool that relies on fine-grain sampling of two CPU performance metrics: instruction throughput (IPC) and last-level cache bandwidth. Thanks to the information these metrics provide about hardware resource activity, and using DVFS to control the power/performance trade-off, we show that it is possible to achieve up to 16% energy savings at the cost of less than 3% performance degradation on real HPC applications.
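The paper's tool itself is not published here; the sketch below only shows the general shape of such a control loop using standard Linux interfaces (perf and the cpufreq sysfs). The IPC threshold is invented, the output parsing is simplified, and running it requires root and the userspace cpufreq governor:

```python
import pathlib
import re
import subprocess

LOW_IPC = 0.6  # invented threshold: below it we treat the node as memory-bound
CPUFREQ = pathlib.Path("/sys/devices/system/cpu/cpu0/cpufreq")

def sample_ipc(interval=1):
    """Sample system-wide IPC with `perf stat` over `interval` seconds."""
    out = subprocess.run(
        ["perf", "stat", "-a", "-e", "instructions,cycles",
         "sleep", str(interval)],
        capture_output=True, text=True).stderr
    def count(event):
        return int(re.search(rf"([\d,]+)\s+{event}", out)
                   .group(1).replace(",", ""))
    return count("instructions") / count("cycles")

def set_frequency(khz):
    """Pin the core frequency (requires the `userspace` governor)."""
    (CPUFREQ / "scaling_setspeed").write_text(str(khz))

fmin = int((CPUFREQ / "scaling_min_freq").read_text())
fmax = int((CPUFREQ / "scaling_max_freq").read_text())
while True:
    # Memory-bound phases lose little performance at a lower frequency,
    # so lowering it there trades negligible slowdown for energy savings.
    set_frequency(fmin if sample_ipc() < LOW_IPC else fmax)
```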
An Empirical Study of the Performance of OpenMP Applications on Multi-core Platforms
In this thesis, we study two important aspects of the performance of multi-threaded applications. We show that performance stability is an important metric in the process of performance evaluation, and that thread placement is an effective technique for both stabilizing and improving program performance. We first study the variability of program execution times, analyzing the reasons for this variability and its implications for performance measurement. Then, we study the relation between inter-thread data sharing and thread placement strategies on hierarchical machines. We consider various strategies in which the same placement is applied for the whole execution of the program; some of them rely on the characteristics of the application, while others do not. We also present thread placement strategies that allow thread migration.
Study of Variations of Native Program Execution Times on Multi-Core Architectures
Program performance optimization, feedback-directed iterative compilation and auto-tuning systems all assume a fixed estimate of execution time for a given input to the program. In practice, however, we observe non-negligible variations in program performance on hardware platforms. While these variations are insignificant for sequential applications, we show that parallel native OpenMP programs have much less performance stability. This article does not try to quantify or qualify the factors influencing the variations of program execution times, which we leave for future work. It demonstrates three observations: 1) the performance variations of sequential applications are insignificant; 2) OpenMP program execution times on multi-core platforms show important variations; 3) the distribution of execution times is not Gaussian in almost all cases. We conclude with a discussion of why taking the minimal or mean execution time within a sample of experiments is not the best estimate of program performance.
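A minimal sketch of the measurement protocol this implies, assuming a hypothetical ./my_openmp_app binary: time repeated runs, test the sample for normality, and compare the candidate performance estimates:

```python
import statistics
import subprocess
import time
from scipy import stats

def time_runs(cmd, n=35):
    """Wall-clock `cmd` n times and return the sample of execution times."""
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        times.append(time.perf_counter() - t0)
    return times

times = time_runs(["./my_openmp_app"])  # hypothetical binary
print(f"min={min(times):.3f}s  mean={statistics.mean(times):.3f}s  "
      f"median={statistics.median(times):.3f}s")
# Shapiro-Wilk: a small p-value rejects the hypothesis that the sample
# is drawn from a Gaussian distribution.
w, p = stats.shapiro(times)
print(f"Shapiro-Wilk p-value: {p:.4f}")
```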
Analysing the Variability of OpenMP Programs Performances on Multicore Architectures
In [8], we demonstrated that, contrary to sequential applications, parallel OpenMP applications suffer from severe performance instability: running the same parallel OpenMP application with the same input data multiple times may exhibit a high variability of execution times. In this article, we continue our research effort to analyze the reasons for this performance variability. With the architectural complexity of new state-of-the-art hardware designs comes a need to better understand the interactions between the operating system layers, the applications and the underlying hardware platforms. The ability to characterize and quantify those interactions can be useful for performance evaluation and analysis, compiler optimization and operating system job scheduling, allowing better performance stability, reproducibility and predictability. Understanding performance instability on current multicore architectures is made even more complicated by the variety of factors and sources influencing application performance. This article focuses on the effects of thread binding, co-running processes, L2 cache sharing, the automatic hardware prefetcher and memory page sizes.
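As an illustration of one of these factors, the sketch below measures how a co-running process inflates execution times and their variability (the ./my_openmp_app binary is hypothetical, and stress-ng is just one possible co-runner):

```python
import statistics
import subprocess
import time

def run_once(cmd):
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - t0

def sample(cmd, n=10, corunner=None):
    """Time `cmd` n times, optionally next to a competing process."""
    times = []
    for _ in range(n):
        noise = subprocess.Popen(corunner) if corunner else None
        try:
            times.append(run_once(cmd))
        finally:
            if noise:
                noise.kill()
                noise.wait()
    return times

app = ["./my_openmp_app"]  # hypothetical binary
alone = sample(app)
shared = sample(app, corunner=["stress-ng", "--cpu", "4"])
for name, t in (("alone", alone), ("co-run", shared)):
    cv = statistics.stdev(t) / statistics.mean(t)  # coefficient of variation
    print(f"{name}: mean={statistics.mean(t):.3f}s  CV={cv:.1%}")
```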
Performance evaluation and analysis of thread pinning strategies on multi-core platforms: Case study of SPEC OMP applications on intel architectures
With the introduction of multi-core processors, thread affinity has quickly emerged as one of the most important factors for accelerating program execution times. This article presents a complete experimental study of the performance of various thread pinning strategies. We investigate four application-independent thread pinning strategies and five application-sensitive ones based on cache sharing. We performed an extensive performance evaluation on three different multi-core machines reflecting three typical uses: a workstation, a server and a high-performance machine. Overall, we show that fixing thread affinities (whatever the tested strategy) is a better choice for improving program performance on HPC ccNUMA machines than OS-based thread placement. This means that the current Linux scheduling strategy is not necessarily the best choice in terms of performance on ccNUMA machines, even if it is a good choice in terms of core usage ratio and load balancing. On the smaller Core2 and Nehalem machines, we show that the speedups obtained from thread pinning over OS-based scheduling are not satisfactory, but the performance stability is much better.
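The paper's nine strategies predate the standard OpenMP affinity controls; with a modern runtime, the two classic application-independent placements (compact and scatter) can be approximated with the OpenMP 4.0 environment variables, as sketched below (binary name hypothetical):

```python
import os
import subprocess

# Three placements: the OS scheduler baseline, plus the two classic
# application-independent pinnings expressed with standard OpenMP 4.0
# environment variables (values are illustrative).
strategies = {
    "os-scheduler": {},
    "compact": {"OMP_PROC_BIND": "close",  "OMP_PLACES": "cores"},
    "scatter": {"OMP_PROC_BIND": "spread", "OMP_PLACES": "cores"},
}

for name, env in strategies.items():
    subprocess.run(["./my_openmp_app"],  # hypothetical binary
                   env={**os.environ, **env}, check=True)
    print(f"ran with strategy: {name}")
```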
Measuring and Analysing the Variations of Program Execution Times on Multicore Platforms: Case Study
The recent growth in the number of processing units in today's multicore processor architectures enables multiple threads to execute simultaneously, achieving better performance by exploiting thread-level parallelism. With the architectural complexity of these new state-of-the-art designs comes a need to better understand the interactions between the operating system layers, the applications and the underlying hardware platforms. The ability to characterize and quantify those interactions can be useful for performance evaluation and analysis, compiler optimization and operating system job scheduling, allowing better performance stability, reproducibility and predictability. In our study, we treat performance instability as variations in program execution times. While these variations are statistically insignificant for large sequential applications, we observe that parallel native OpenMP programs have much less performance stability. Understanding performance instability on current multicore architectures is made even more complicated by the variety of factors and sources influencing application performance.