161 research outputs found

    Static Validation of Barriers and Worksharing Constructs in OpenMP Applications

    Get PDF
    International audienceThe OpenMP specification requires that all threads in a team execute the same sequence of worksharing and barrier regions. An improper use of such directive may lead to deadlocks. In this paper we propose a static analysis to ensure this property is verified. The well-defined semantic of OpenMP programs makes compiler analysis more effective. We propose a new compile-time method to identify in OpenMP codes the potential improper uses of barriers and work-sharing constructs, and the execution paths that are responsible for these issues. We implemented our method in a GCC compiler plugin and show the small im-pact of our analysis on performance for NAS-OMP benchmarks and a test case for a production industrial code

    Static Local Concurrency Errors Detection in MPI-RMA Programs

    Get PDF
    International audienceCommunications are a critical part of HPC simulations, and one of the main focuses of application developers when scaling on supercomputers. While classical message passing (also called two-sided communications) is the dominant communication paradigm, one-sided communications are often praised to be efficient to overlap communications with computations, but challenging to program. Their usage is then generally abstracted through languages and memory abstractions to ease programming (e.g. PGAS). Therefore, little work has been done to help programmers use intermediate runtime layers, such as MPI-RMA, that is often reserved to expert programmers. Indeed, programming with MPI-RMA presents several challenges that require handling the asynchronous nature of one-sided communications to ensure the proper semantics of the program while ensuring its memory consistency. To help programmers detect memory errors such as race conditions as early as possible, this paper proposes a new static analysis of MPI-RMA codes that shows to the programmer the errors that can be detected at compile time. The detection is based on a novel local concurrency errors detection algorithm that tracks accesses through BFS searches on the Control Flow Graphs of a program. We show on several tests and an MPI-RMA variant of the GUPS benchmark that the static analysis allows to detect such errors on user codes. The error codes are integrated in the MPI Bugs Initiative opensource test suite

    Analyse statique/dynamique pour la validation et l'amélioration des applications parallèles multi-modèles

    Get PDF
    Supercomputing plays an important role in several innovative fields, speeding up prototyping or validating scientific theories. However, supercomputers are evolving rapidly with now millions of processing units, posing the questions of their programmability. Despite the emergence of more widespread and functional parallel programming models, developing correct and effective parallel applications still remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions. However programming model evolutions stress the requirement for a convenient validation tool able to handle hybrid applications. Indeed as current scientific applications mainly rely on the Message Passing Interface (MPI) parallel programming model, new hardwares designed for Exascale with higher node-level parallelism clearly advocate for an MPI+X solutions with X a thread-based model such as OpenMP. But integrating two different programming models inside the same application can be error-prone leading to complex bugs - mostly detected unfortunately at runtime. In an MPI+X program not only the correctness of MPI should be ensured but also its interactions with the multi-threaded model, for example identical MPI collective operations cannot be performed by multiple nonsynchronized threads. This thesis aims at developing a combination of static and dynamic analysis to enable an early verification of hybrid HPC applications. The first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Thanks to this analysis, the code is selectively instrumented, displaying an error and synchronously interrupting all processes if the actual scheduling leads to a deadlock situation.L’utilisation du parallélisme des architectures actuelles dans le domaine du calcul hautes performances, oblige à recourir à différents langages parallèles. Ainsi, l’utilisation conjointe de MPI pour le parallélisme gros grain, à mémoire distribuée et OpenMP pour du parallélisme de thread, fait partie des pratiques de développement d’applications pour supercalculateurs. Des erreurs, liées à l’utilisation conjointe de ces langages de parallélisme, sont actuellement difficiles à détecter et cela limite l’écriture de codes, permettant des interactions plus poussées entre ces niveaux de parallélisme. Des outils ont été proposés afin de palier ce problème. Cependant, ces outils sont généralement focalisés sur un type de modèle et permettent une vérification dite statique (à la compilation) ou dynamique (à l’exécution). Pourtant une combinaison statique/- dynamique donnerait des informations plus pertinentes. En effet, le compilateur est en mesure de donner des informations relatives au comportement général du code, indépendamment du jeu d’entrée. C’est par exemple le cas des problèmes liés aux communications collectives du modèle MPI. Cette thèse a pour objectif de développer des analyses statiques/dynamiques permettant la vérification d’une application parallèle mélangeant plusieurs modèles de programmation, afin de diriger les développeurs vers un code parallèle multi-modèles correct et performant. La vérification se fait en deux étapes. Premièrement, de potentielles erreurs sont détectées lors de la phase de compilation. Ensuite, un test au runtime est ajouté pour savoir si le problème va réellement se produire. Grâce à ces analyses combinées, nous renvoyons des messages précis aux utilisateurs et évitons les situations de blocage

    Towards A Quasi High Level Compiler Comparative and Attributive Model for OpenMP Programs

    Get PDF
    In order to understand the behavior of OpenMP programs, special tools and adaptive techniques are needed for performance analysis. However, these tools provide low level profile information at the assembly and functions boundaries via instrumentation at the binary or code level, which are very hard to interpret. Hence, this thesis proposes a new model for OpenMP enabled compilers that assesses the performance differences in well defined formulations by dividing OpenMP program conditions into four distinct states which account for all the possible cases that an OpenMP program can take. An improved version of the standard performance metrics is proposed: speedup, overhead and efficiency based on the model categorization that is state\u27s aware. Moreover, an algorithmic approach to find patterns between OpenMP compilers is proposed which is verified along with the model formulations experimentally. Finally, the thesis reveals the mathematical model behind the optimum performance for any OpenMP program

    A toolchain to verify the parallelization of OmpSs-2 applications

    Get PDF
    Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies the compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad-hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications.This project is supported by the European Union’s Horizon 2021 research and innovation programme under grant agreement No 754304 (DEEP-EST), by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871669 (AMPERE) and the Project HPCEUROPA3 (INFRAIA-2016-1-730897), by the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by the Generalitat de Catalunya (2017-SGR-1481).Peer ReviewedPostprint (author's final draft

    Rapid Parallelization by Collaboration

    Get PDF
    The widespread adoption of Chip Multiprocessors has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity in hardware architectures and software environments, however, continues to pose difficulties in the effective use of parallelism thus delaying a quick and smooth transition to the concurrency era. In this document, we describe the research being conducted at the Computer Science Department at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers considering parallelizing their code. The advice proffered to the programmer is based on the wisdom collected from programmers who have already parallelized some code. The utility of COMPASS rests, not only on its ability to collect the wisdom unintrusively but also on its ability to automatically seek, find and synthesize this wisdom into advice that is tailored to the code the user is considering parallelizing and to the environment in which the optimized program will execute in. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization -- widely and on diverse hardware and software. By leveraging the "Wisdom of Crowds" model which has been conjunctured to scale exponentially and which has successfully worked for Wikis, COMPASS aims to enable rapid parallelization of code and thus continue to extend the benefits for Moore's law scaling to science and society

    Investigating tools and techniques for improving software performance on multiprocessor computer systems

    Get PDF
    The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principle conclusion was that the problem analysis and parallel design are an important factor in the selection of the parallel programming model and tools, but that models with higher levels of abstractions, such as OpenMP and Threading Building Blocks, are favoured
    • …
    corecore