Search CORE

161 research outputs found

Static Validation of Barriers and Worksharing Constructs in OpenMP Applications

Author: P. Petersen
V. Basupalli
Y. Meng
Y. Zhang
Y.-J. Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

International audienceThe OpenMP specification requires that all threads in a team execute the same sequence of worksharing and barrier regions. An improper use of such directive may lead to deadlocks. In this paper we propose a static analysis to ensure this property is verified. The well-defined semantic of OpenMP programs makes compiler analysis more effective. We propose a new compile-time method to identify in OpenMP codes the potential improper uses of barriers and work-sharing constructs, and the execution paths that are responsible for these issues. We implemented our method in a GCC compiler plugin and show the small im-pact of our analysis on performance for NAS-OMP benchmarks and a test case for a production industrial code

Crossref

INRIA a CCSD electronic archive server

HAL-CEA

Recommended from our members

Symbolic Analysis of Concurrency Errors in OpenMP Programs

Author: Liao C.
Ma H.
Quinlan D.
Wang L.
Yang Z.
Publication venue: Lawrence Livermore National Laboratory
Publication date: 07/11/2012
Field of study

UNT Digital Library

Static Local Concurrency Errors Detection in MPI-RMA Programs

Author: Aitkaci Tassadit Célia
Barthou Denis
Saillard Emmanuelle
Sergent Marc
Publication venue: HAL CCSD
Publication date: 18/11/2022
Field of study

International audienceCommunications are a critical part of HPC simulations, and one of the main focuses of application developers when scaling on supercomputers. While classical message passing (also called two-sided communications) is the dominant communication paradigm, one-sided communications are often praised to be efficient to overlap communications with computations, but challenging to program. Their usage is then generally abstracted through languages and memory abstractions to ease programming (e.g. PGAS). Therefore, little work has been done to help programmers use intermediate runtime layers, such as MPI-RMA, that is often reserved to expert programmers. Indeed, programming with MPI-RMA presents several challenges that require handling the asynchronous nature of one-sided communications to ensure the proper semantics of the program while ensuring its memory consistency. To help programmers detect memory errors such as race conditions as early as possible, this paper proposes a new static analysis of MPI-RMA codes that shows to the programmer the errors that can be detected at compile time. The detection is based on a novel local concurrency errors detection algorithm that tracks accesses through BFS searches on the Control Flow Graphs of a program. We show on several tests and an MPI-RMA variant of the GUPS benchmark that the static analysis allows to detect such errors on user codes. The error codes are integrated in the MPI Bugs Initiative opensource test suite

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Analyse statique/dynamique pour la validation et l'amélioration des applications parallèles multi-modèles

Author: Saillard Emmanuelle
Publication venue: HAL CCSD
Publication date: 24/09/2015
Field of study

Supercomputing plays an important role in several innovative fields, speeding up prototyping or validating scientific theories. However, supercomputers are evolving rapidly with now millions of processing units, posing the questions of their programmability. Despite the emergence of more widespread and functional parallel programming models, developing correct and effective parallel applications still remains a complex task. Although debugging solutions have emerged to address this issue, they often come with restrictions. However programming model evolutions stress the requirement for a convenient validation tool able to handle hybrid applications. Indeed as current scientific applications mainly rely on the Message Passing Interface (MPI) parallel programming model, new hardwares designed for Exascale with higher node-level parallelism clearly advocate for an MPI+X solutions with X a thread-based model such as OpenMP. But integrating two different programming models inside the same application can be error-prone leading to complex bugs - mostly detected unfortunately at runtime. In an MPI+X program not only the correctness of MPI should be ensured but also its interactions with the multi-threaded model, for example identical MPI collective operations cannot be performed by multiple nonsynchronized threads. This thesis aims at developing a combination of static and dynamic analysis to enable an early verification of hybrid HPC applications. The first pass statically verifies the thread level required by an MPI+OpenMP application and outlines execution paths leading to potential deadlocks. Thanks to this analysis, the code is selectively instrumented, displaying an error and synchronously interrupting all processes if the actual scheduling leads to a deadlock situation.L’utilisation du parallélisme des architectures actuelles dans le domaine du calcul hautes performances, oblige à recourir à différents langages parallèles. Ainsi, l’utilisation conjointe de MPI pour le parallélisme gros grain, à mémoire distribuée et OpenMP pour du parallélisme de thread, fait partie des pratiques de développement d’applications pour supercalculateurs. Des erreurs, liées à l’utilisation conjointe de ces langages de parallélisme, sont actuellement difficiles à détecter et cela limite l’écriture de codes, permettant des interactions plus poussées entre ces niveaux de parallélisme. Des outils ont été proposés afin de palier ce problème. Cependant, ces outils sont généralement focalisés sur un type de modèle et permettent une vérification dite statique (à la compilation) ou dynamique (à l’exécution). Pourtant une combinaison statique/- dynamique donnerait des informations plus pertinentes. En effet, le compilateur est en mesure de donner des informations relatives au comportement général du code, indépendamment du jeu d’entrée. C’est par exemple le cas des problèmes liés aux communications collectives du modèle MPI. Cette thèse a pour objectif de développer des analyses statiques/dynamiques permettant la vérification d’une application parallèle mélangeant plusieurs modèles de programmation, afin de diriger les développeurs vers un code parallèle multi-modèles correct et performant. La vérification se fait en deux étapes. Premièrement, de potentielles erreurs sont détectées lors de la phase de compilation. Ensuite, un test au runtime est ajouté pour savoir si le problème va réellement se produire. Grâce à ces analyses combinées, nous renvoyons des messages précis aux utilisateurs et évitons les situations de blocage

Thèses en Ligne

Theses.fr

Towards A Quasi High Level Compiler Comparative and Attributive Model for OpenMP Programs

Author: Mokbel Mohammed F.
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2010
Field of study

In order to understand the behavior of OpenMP programs, special tools and adaptive techniques are needed for performance analysis. However, these tools provide low level profile information at the assembly and functions boundaries via instrumentation at the binary or code level, which are very hard to interpret. Hence, this thesis proposes a new model for OpenMP enabled compilers that assesses the performance differences in well defined formulations by dividing OpenMP program conditions into four distinct states which account for all the possible cases that an OpenMP program can take. An improved version of the standard performance metrics is proposed: speedup, overhead and efficiency based on the model categorization that is state\u27s aware. Moreover, an algorithmic approach to find patterns between OpenMP compilers is proposed which is verified along with the model formulations experimentally. Finally, the thesis reveals the mathematical model behind the optimum performance for any OpenMP program

Scholarship at UWindsor

Recommended from our members

Stable Multithreading: A New Paradigm for Reliable and Secure Threads

Author: Cui Heming
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2015
Field of study

Multi threaded programs have become pervasive and critical due to the rise of the multi core hardware and the accelerating computational demand. Unfortunately, despite decades of research and engineering effort, these programs remain notoriously difficult to get right, and they are plagued with harmful concurrency bugs that can cause wrong outputs, program crashes, security breaches, and so on. Our research reveals that a root cause of this difficulty is that multithreaded programs have too many possible thread interleavings (or schedules) at runtime. Even given only a single input, a program may run into a great number of schedules, depending on factors such as hardware timing and OS scheduling. Considering all inputs, the number of schedules is even much greater. It is extremely challenging to understand, test, analyze, or verify this huge number of schedules for a multi threaded program and make sure that all these schedules are free of concurrency bugs. Thus, multi threaded programs are extremely difficult to get right. To reduce the number of possible schedules for all inputs, we looked into the relation between inputs and schedules of real-world programs, and made an exciting discovery: many programs need only a small set of schedules to efficiently process a wide range of inputs! Leveraging this discovery, we have proposed a new idea called Stable Multithreading (or StableMT) that reuses each schedule on a wide range of inputs, greatly reducing the number of possible schedules for all inputs. By addressing the root cause that makes multithreading difficult to get right, StableMT makes understanding, testing, analyzing, and verification of multithreaded programs much easier. To realize StableMT, we have built three StableMT systems, TERN, PEREGRINE, and PARROT, with each addressing a distinct research challenge. Evaluation on a wide range of 108 popular multithreaded programs with our latest StableMT system, PARROT, shows that StableMT is simple, fast, and deployable. All PARROT's source code, entire benchmarks, and raw evaluation results are available at http://github.com/columbia/smt-mc. To encourage deployment, we have applied StableMT to improve several reliability techniques, including: (1) making reproducing real world concurrency bugs much easier; (2) greatly improving the precision of static program analysis, leading to the detection of several new harmful data races in heavily tested programs; and (3) greatly increasing the coverage of model checking, a systematic testing technique, by many orders of magnitudes. StableMT has attracted the research community's interests, and some techniques and ideas in our StableMT systems have been leveraged by other researchers to compute a small set of schedules to cover all or most inputs for multi threaded programs

Columbia University Academic Commons

A toolchain to verify the parallelization of OmpSs-2 applications

Author: Ayguadé Parra Eduard
Beltran Querol Vicenç
Economo Simone
Royuela Alcázar Sara
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Programming models for task-based parallelization based on compile-time directives are very effective at uncovering the parallelism available in HPC applications. Despite that, the process of correctly annotating complex applications is error-prone and may hinder the general adoption of these models. In this paper, we target the OmpSs-2 programming model and present a novel toolchain able to detect parallelization errors coming from non-compliant OmpSs-2 applications. Our toolchain verifies the compliance with the OmpSs-2 programming model using local task analysis to deal with each task separately, and structural induction to extend the analysis to the whole program. To improve the effectiveness of our tools, we also introduce some ad-hoc verification annotations, which can be used manually or automatically to disable the analysis of specific code regions. Experiments run on a sample of representative kernels and applications show that our toolchain can be successfully used to verify the parallelization of complex real-world applications.This project is supported by the European Union’s Horizon 2021 research and innovation programme under grant agreement No 754304 (DEEP-EST), by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871669 (AMPERE) and the Project HPCEUROPA3 (INFRAIA-2016-1-730897), by the Ministry of Economy of Spain through the Severo Ochoa Center of Excellence Program (SEV-2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), and by the Generalitat de Catalunya (2017-SGR-1481).Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Rapid Parallelization by Collaboration

Author: Kaiser Gail E.
Sethumadhavan Lakshminarasimhan
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009
Field of study

The widespread adoption of Chip Multiprocessors has renewed the emphasis on the use of parallelism to improve performance. The present and growing diversity in hardware architectures and software environments, however, continues to pose difficulties in the effective use of parallelism thus delaying a quick and smooth transition to the concurrency era. In this document, we describe the research being conducted at the Computer Science Department at Columbia University on a system called COMPASS that aims to simplify this transition by providing advice to programmers considering parallelizing their code. The advice proffered to the programmer is based on the wisdom collected from programmers who have already parallelized some code. The utility of COMPASS rests, not only on its ability to collect the wisdom unintrusively but also on its ability to automatically seek, find and synthesize this wisdom into advice that is tailored to the code the user is considering parallelizing and to the environment in which the optimized program will execute in. COMPASS provides a platform and an extensible framework for sharing human expertise about code parallelization -- widely and on diverse hardware and software. By leveraging the "Wisdom of Crowds" model which has been conjunctured to scale exponentially and which has successfully worked for Wikis, COMPASS aims to enable rapid parallelization of code and thus continue to extend the benefits for Moore's law scaling to science and society

CiteSeerX

Columbia University Academic Commons

Investigating tools and techniques for improving software performance on multiprocessor computer systems

Author: Tristram Waide Barrington
Publication venue: Faculty of Science, Computer Science
Publication date
Field of study

The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB). Additionally, the performance of the GNU C/C++ and Intel C/C++ compilers was examined. The results revealed that the choice of parallel programming model or library is dependent on the type of problem being solved and that there is no overall best choice for all classes of problem. However, the results also indicate that parallel programming models with higher levels of abstraction require less programming effort and provide similar performance compared to explicit threading models. The principle conclusion was that the problem analysis and parallel design are an important factor in the selection of the parallel programming model and tools, but that models with higher levels of abstractions, such as OpenMP and Threading Building Blocks, are favoured

South East Academic Libraries System (SEALS)