81 research outputs found

    MPI Thread-Level Checking for MPI+OpenMP Applications

    Get PDF
    International audienceMPI is the most widely used parallel programming model. But the reducing amount of memory per compute core tends to push MPI to be mixed with shared-memory approaches like OpenMP. In such cases, the interoperability of those two models is challenging. The MPI 2.0 standard defines the so-called thread level to indicate how MPI will interact with threads. But even if hybrid programs are more common, there is still a lack in debugging tools and more precisely in thread level compliance. To fill this gap, we propose a static analysis to verify the thread-level required by an application. This work extends PARCOACH, a GCC plugin focused on the detection of MPI collective errors in MPI and MPI+OpenMP programs. We validated our analysis on computational benchmarks and applications and measured a low overhead

    Analyse statique/dynamique pour la validation et l'amélioration des applications parallèles multi-modèles

    Get PDF
    Supercomputing plays an important role in several innovative fields, speeding up prototyping or validating scientific theories. However, supercomputers are evolving rapidly with now millions of processing units, posing the questions of their programmability. Despite the emergence of more widespread and functional parallel programming models, developing correct and effective parallel applications still remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions. However programming model evolutions stress the requirement for a convenient validation tool able to handle hybrid applications. Indeed as current scientific applications mainly rely on the Message Passing Interface (MPI) parallel programming model, new hardwares designed for Exascale with higher node-level parallelism clearly advocate for an MPI+X solutions with X a thread-based model such as OpenMP. But integrating two different programming models inside the same application can be error-prone leading to complex bugs - mostly detected unfortunately at runtime. In an MPI+X program not only the correctness of MPI should be ensured but also its interactions with the multi-threaded model, for example identical MPI collective operations cannot be performed by multiple nonsynchronized threads. This thesis aims at developing a combination of static and dynamic analysis to enable an early verification of hybrid HPC applications. The first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Thanks to this analysis, the code is selectively instrumented, displaying an error and synchronously interrupting all processes if the actual scheduling leads to a deadlock situation.L’utilisation du parallélisme des architectures actuelles dans le domaine du calcul hautes performances, oblige à recourir à différents langages parallèles. Ainsi, l’utilisation conjointe de MPI pour le parallélisme gros grain, à mémoire distribuée et OpenMP pour du parallélisme de thread, fait partie des pratiques de développement d’applications pour supercalculateurs. Des erreurs, liées à l’utilisation conjointe de ces langages de parallélisme, sont actuellement difficiles à détecter et cela limite l’écriture de codes, permettant des interactions plus poussées entre ces niveaux de parallélisme. Des outils ont été proposés afin de palier ce problème. Cependant, ces outils sont généralement focalisés sur un type de modèle et permettent une vérification dite statique (à la compilation) ou dynamique (à l’exécution). Pourtant une combinaison statique/- dynamique donnerait des informations plus pertinentes. En effet, le compilateur est en mesure de donner des informations relatives au comportement général du code, indépendamment du jeu d’entrée. C’est par exemple le cas des problèmes liés aux communications collectives du modèle MPI. Cette thèse a pour objectif de développer des analyses statiques/dynamiques permettant la vérification d’une application parallèle mélangeant plusieurs modèles de programmation, afin de diriger les développeurs vers un code parallèle multi-modèles correct et performant. La vérification se fait en deux étapes. Premièrement, de potentielles erreurs sont détectées lors de la phase de compilation. Ensuite, un test au runtime est ajouté pour savoir si le problème va réellement se produire. Grâce à ces analyses combinées, nous renvoyons des messages précis aux utilisateurs et évitons les situations de blocage

    Master of Science

    Get PDF
    thesisConcurrent programs are extremely important for efficiently programming future HPC systems. Large scientific programs may employ multiple processes or threads to run on HPC systems for days. Reliability is an essential requirement of existing concurrent programs. Therefore, verification of concurrent programs becomes increasingly important. Today we have two significant challenges in developing concurrent program verification tools: The first is scalability. Since new types of concurrent programs keep being created, verification tools need to scale to handle all these new types of programs. The second is providing formal coverage guarantee. Dynamic verification tools always face a huge schedule space. Both these capabilities must exist for testing programs that follow multiple concurrency models. Most current dynamic verification tools can only explore either thread level or process level schedules. Consequently, they fail to verify hybrid programs. Exploring mixed process and thread level schedules is not an ideal solution because the state space will grow exponentially in both levels. It is hard to systematically traverse these mixed schedules. Therefore, our approach is to determinize all concurrent APIs except one API whose schedules will then be explored. To improve search efficiency, we proposed a random-walk based heuristic algorithm. We observed many concurrent programs and concluded some common structures of them. Based on the existence of these structures, we can make dynamic verification tools focusing on specific regions and bypassing regions of less interest. We propose a random sampling of executions in the regions of less interest

    LLOV: A Fast Static Data-Race Checker for OpenMP Programs

    Get PDF
    In the era of Exascale computing, writing efficient parallel programs is indispensable and at the same time, writing sound parallel programs is highly difficult. While parallel programming is easier with frameworks such as OpenMP, the possibility of data races in these programs still persists. In this paper, we propose a fast, lightweight, language agnostic, and static data race checker for OpenMP programs based on the LLVM compiler framework. We compare our tool with other state-of-the-art data race checkers on a variety of well-established benchmarks. We show that the precision, accuracy, and the F1 score of our tool is comparable to other checkers while being orders of magnitude faster. To the best of our knowledge, this work is the only tool among the state-of-the-art data race checkers that can verify a FORTRAN program to be data race free

    LLOV: A Fast Static Data-Race Checker for OpenMP Programs

    Full text link
    In the era of Exascale computing, writing efficient parallel programs is indispensable and at the same time, writing sound parallel programs is very difficult. Specifying parallelism with frameworks such as OpenMP is relatively easy, but data races in these programs are an important source of bugs. In this paper, we propose LLOV, a fast, lightweight, language agnostic, and static data race checker for OpenMP programs based on the LLVM compiler framework. We compare LLOV with other state-of-the-art data race checkers on a variety of well-established benchmarks. We show that the precision, accuracy, and the F1 score of LLOV is comparable to other checkers while being orders of magnitude faster. To the best of our knowledge, LLOV is the only tool among the state-of-the-art data race checkers that can verify a C/C++ or FORTRAN program to be data race free.Comment: Accepted in ACM TACO, August 202

    Performance Optimization Strategies for Transactional Memory Applications

    Get PDF
    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications

    Feedback Driven Annotation and Refactoring of Parallel Programs

    Get PDF

    Proceedings, MSVSCC 2018

    Get PDF
    Proceedings of the 12th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 19, 2018 at VMASC in Suffolk, Virginia. 155 pp

    OSCAR. A Noise Injection Framework for Testing Concurrent Software

    Get PDF
    “Moore’s Law” is a well-known observable phenomenon in computer science that describes a visible yearly pattern in processor’s die increase. Even though it has held true for the last 57 years, thermal limitations on how much a processor’s core frequencies can be increased, have led to physical limitations to their performance scaling. The industry has since then shifted towards multicore architectures, which offer much better and scalable performance, while in turn forcing programmers to adopt the concurrent programming paradigm when designing new software, if they wish to make use of this added performance. The use of this paradigm comes with the unfortunate downside of the sudden appearance of a plethora of additional errors in their programs, stemming directly from their (poor) use of concurrency techniques. Furthermore, these concurrent programs themselves are notoriously hard to design and to verify their correctness, with researchers continuously developing new, more effective and effi- cient methods of doing so. Noise injection, the theme of this dissertation, is one such method. It relies on the “probe effect” — the observable shift in the behaviour of concurrent programs upon the introduction of noise into their routines. The abandonment of ConTest, a popular proprietary and closed-source noise injection framework, for testing concurrent software written using the Java programming language, has left a void in the availability of noise injection frameworks for this programming language. To mitigate this void, this dissertation proposes OSCAR — a novel open-source noise injection framework for the Java programming language, relying on static bytecode instrumentation for injecting noise. OSCAR will provide a free and well-documented noise injection tool for research, pedagogical and industry usage. Additionally, we propose a novel taxonomy for categorizing new and existing noise injection heuristics, together with a new method for generating and analysing concurrent software traces, based on string comparison metrics. After noising programs from the IBM Concurrent Benchmark with different heuristics, we observed that OSCAR is highly effective in increasing the coverage of the interleaving space, and that the different heuristics provide diverse trade-offs on the cost and benefit (time/coverage) of the noise injection process.Resumo A “Lei de Moore” é um fenómeno, bem conhecido na área das ciências da computação, que descreve um padrão evidente no aumento anual da densidade de transístores num processador. Mesmo mantendo-se válido nos últimos 57 anos, o aumento do desempenho dos processadores continua garrotado pelas limitações térmicas inerentes `a subida da sua frequência de funciona- mento. Desde então, a industria transitou para arquiteturas multi núcleo, com significativamente melhor e mais escalável desempenho, mas obrigando os programadores a adotar o paradigma de programação concorrente ao desenhar os seus novos programas, para poderem aproveitar o desempenho adicional que advém do seu uso. O uso deste paradigma, no entanto, traz consigo, por consequência, a introdução de uma panóplia de novos erros nos programas, decorrentes diretamente da utilização (inadequada) de técnicas de programação concorrente. Adicionalmente, estes programas concorrentes são conhecidos por serem consideravelmente mais difíceis de desenhar e de validar, quanto ao seu correto funcionamento, incentivando investi- gadores ao desenvolvimento de novos métodos mais eficientes e eficazes de o fazerem. A injeção de ruído, o tema principal desta dissertação, é um destes métodos. Esta baseia-se no “efeito sonda” (do inglês “probe effect”) — caracterizado por uma mudança de comportamento observável em programas concorrentes, ao terem ruído introduzido nas suas rotinas. Com o abandono do Con- Test, uma framework popular, proprietária e de código fechado, de análise dinâmica de programas concorrentes através de injecção de ruído, escritos com recurso `a linguagem de programação Java, viu-se surgir um vazio na oferta de framework de injeção de ruído, para esta mesma linguagem. Para mitigar este vazio, esta dissertação propõe o OSCAR — uma nova framework de injeção de ruído, de código-aberto, para a linguagem de programação Java, que utiliza manipulação estática de bytecode para realizar a introdução de ruído. O OSCAR pretende oferecer uma ferramenta livre e bem documentada de injeção de ruído para fins de investigação, pedagógicos ou até para a indústria. Adicionalmente, a dissertação propõe uma nova taxonomia para categorizar os dife- rentes tipos de heurísticas de injecção de ruídos novos e existentes, juntamente com um método para gerar e analisar traces de programas concorrentes, com base em métricas de comparação de strings. Após inserir ruído em programas do IBM Concurrent Benchmark, com diversas heurísticas, ob- servámos que o OSCAR consegue aumentar significativamente a dimensão da cobertura do espaço de estados de programas concorrentes. Adicionalmente, verificou-se que diferentes heurísticas produzem um leque variado de prós e contras, especialmente em termos de eficácia versus eficiência
    corecore