
    Localizing Defects in Multithreaded Programs by Mining Dynamic Call Graphs

    Writing multithreaded software for multicore computers confronts many developers with the difficulty of finding parallel programming errors. In the past, most parallel debugging techniques have concentrated on finding race conditions caused by incorrect usage of synchronization constructs. A widely unexplored issue, however, is that incorrect usage of non-parallel programming constructs may also cause wrong parallel application behavior. This paper presents a novel defect-localization technique for multithreaded shared-memory programs that is based on analyzing execution anomalies. Compared to race detectors, which report only wrong synchronization, this method detects a wider range of defects affecting parallel execution. It works on a condensed representation of the call graphs of multithreaded applications and employs data-mining techniques to locate a method containing a defect. Our results from controlled application experiments show that the approach finds race conditions as well as other programming errors leading to incorrect parallel program behavior. On average, it reduced the amount of code to be inspected in our benchmark to just 7.1% of all methods.
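
    For readers unfamiliar with the general idea, the following is a minimal Python sketch, not the paper's actual algorithm: it merely illustrates how per-method call frequencies mined from passing and failing runs can be compared to rank suspicious methods. All method names and traces are made up for illustration.

        # Toy call-graph mining for defect localization (illustrative only).
        # The real technique works on condensed call graphs of multithreaded
        # programs and uses proper data-mining algorithms.
        from collections import Counter
        from statistics import mean

        def call_counts(trace):
            """Count how often each method appears in one execution trace."""
            return Counter(trace)

        def rank_suspicious_methods(passing_traces, failing_traces):
            passing = [call_counts(t) for t in passing_traces]
            failing = [call_counts(t) for t in failing_traces]
            methods = set().union(*passing, *failing)
            scores = {}
            for m in methods:
                avg_pass = mean(c.get(m, 0) for c in passing)
                avg_fail = mean(c.get(m, 0) for c in failing)
                # Larger difference in call behavior => more suspicious method.
                scores[m] = abs(avg_fail - avg_pass)
            return sorted(scores, key=scores.get, reverse=True)

        # Hypothetical traces: "update_balance" behaves differently in failing runs.
        ok  = [["init", "lock", "update_balance", "unlock"]] * 3
        bad = [["init", "update_balance", "update_balance", "update_balance", "unlock"]] * 3
        print(rank_suspicious_methods(ok, bad))  # most suspicious method first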

    Performance comparison of parallel programming environments for implementing AIAC algorithms

    AIAC algorithms (Asynchronous Iterations, Asynchronous Communications) are a particular class of parallel iterative algorithms. Their asynchronous nature makes them more efficient than their synchronous counterparts in numerous cases, as has already been shown in previous work. The first goal of this article is to compare several parallel programming environments in order to see whether one of them is best suited to efficiently implement AIAC algorithms. The main criterion for this comparison is the performance achieved in a global grid-computing context for two classical scientific problems. Nevertheless, we also take into account two secondary criteria: ease of programming and ease of deployment. The second goal of this study is to extract from this comparison the important features that a parallel programming environment must have in order to be suited for the implementation of AIAC algorithms.
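
    For readers new to the concept, here is a hedged Python sketch of the asynchronous-iteration idea itself, not one of the article's benchmarks or environments: two workers repeatedly update their own component of a fixed point using whatever value of the other component is currently visible, with no barrier between iterations.

        # Toy AIAC-style iteration: asynchronous updates, no global synchronization.
        import threading

        # Fixed point of x0 = 0.5*x1 + 1 and x1 = 0.5*x0 + 1 is x0 = x1 = 2.
        x = [0.0, 0.0]          # shared state, read and written without barriers
        ITERATIONS = 200

        def worker(i, j):
            for _ in range(ITERATIONS):
                # Uses the *currently visible* value of x[j], which may be stale:
                # that is precisely what makes the iteration asynchronous.
                x[i] = 0.5 * x[j] + 1.0

        threads = [threading.Thread(target=worker, args=(0, 1)),
                   threading.Thread(target=worker, args=(1, 0))]
        for t in threads: t.start()
        for t in threads: t.join()
        print(x)   # converges to approximately [2.0, 2.0] despite the lack of barriers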

    Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism in Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach, in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++, which is based on a concurrent aggregate parallel model.
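
    As a purely illustrative aside (written in Python rather than HPF, and not taken from the paper), the kind of rule an HPF DISTRIBUTE directive expresses is simply a mapping from global array indices to owning processors, for example BLOCK or CYCLIC:

        # What a data-distribution directive expresses: which processor owns
        # which global index, so computation can be placed with the data
        # ("owner computes").
        def block_owner(i, n, p):
            """Owner of element i under a BLOCK distribution of n elements over p procs."""
            block = -(-n // p)          # ceiling division: elements per processor
            return i // block

        def cyclic_owner(i, p):
            """Owner of element i under a CYCLIC distribution over p procs."""
            return i % p

        n, p = 10, 4
        print([block_owner(i, n, p) for i in range(n)])   # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]
        print([cyclic_owner(i, p) for i in range(n)])     # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]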

    Realtime system control by means of path expressions

    A high-level, algebraic programming method for the online control of actions in a real-time, parallel processing environment is described. The method is based on the interaction of path expressions. On the basis of a set of path expressions, the evocation of actions can be controlled in real time in a fully automated way. It is shown how intelligent system behavior can be obtained by combining rules given as path expressions, each of which specifies some partial behavior to which the system must comply. The control system operates as a rule-based action planning system that works online in an asynchronous environment. The development of the prototype system PBOS (Path Based Operating System) has demonstrated that path expression specification integrates naturally with a real-time, multitasking control system.
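
    To make the idea concrete, here is an illustrative Python sketch (not PBOS, and with purely hypothetical action names): a single path expression such as "open ; (read | write)* ; close" can be enforced at run time as a small state machine that admits or blocks each evoked action.

        # One path-expression rule enforced as a state machine over evoked actions.
        class PathRule:
            def __init__(self):
                self.state = "start"

            def allow(self, action):
                """Return True and advance if 'action' is admissible in the current state."""
                if self.state == "start" and action == "open":
                    self.state = "opened"
                    return True
                if self.state == "opened" and action in ("read", "write"):
                    return True
                if self.state == "opened" and action == "close":
                    self.state = "closed"
                    return True
                return False

        rule = PathRule()
        for a in ["open", "read", "write", "close", "read"]:
            print(a, "->", "allowed" if rule.allow(a) else "blocked")
        # The final "read" is blocked: the path expression has already completed.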

    Safe Parallelism: Compiler Analysis Techniques for Ada and OpenMP

    There is a growing need to support parallel computation in Ada to cope with the performance requirements of the most advanced functionalities of safety-critical systems. In that regard, the use of parallel programming models is paramount to exploit the benefits of parallelism. Recent works motivate the use of OpenMP, a de facto standard in high-performance computing for programming shared-memory architectures. These works address two important aspects of introducing OpenMP into Ada: the compatibility of the OpenMP syntax with the Ada language, and the interoperability of the OpenMP and Ada runtimes, demonstrating that OpenMP complements and supports the structured parallelism approach of the tasklet model. This paper addresses a third fundamental aspect: functional safety from a compiler perspective. In particular, it focuses on race conditions and considers the fine-grained and unstructured capabilities of OpenMP. To that end, this paper presents a new compiler analysis technique that (1) identifies potential race conditions in parallel Ada programs based on OpenMP, Ada tasks, or both, and (2) provides solutions for the detected races. This work was supported by the Spanish Ministry of Science and Innovation under contract TIN2015-65316-P, and by the FCT (Portuguese Foundation for Science and Technology) within the CISTER Research Unit (CEC/04234).
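
    The paper itself targets Ada and OpenMP; the following Python sketch only illustrates the generic pattern such a race analysis looks for: an unsynchronized read-modify-write on shared state, where one thread's update can be lost.

        # Data-race pattern (illustrative, not Ada/OpenMP): two threads perform an
        # unprotected read-modify-write on shared state.
        import threading

        counter = 0
        ITERS = 100_000

        def unsafe_increment():
            global counter
            for _ in range(ITERS):
                tmp = counter        # read shared state
                counter = tmp + 1    # write back: a concurrent update made in
                                     # between is silently lost -> race condition

        t1 = threading.Thread(target=unsafe_increment)
        t2 = threading.Thread(target=unsafe_increment)
        t1.start(); t2.start(); t1.join(); t2.join()

        # Often prints less than 200000; guarding the update with a lock (or, in the
        # paper's setting, a protected object in Ada or an OpenMP atomic) removes the race.
        print(counter)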

    Using GPI-2 for Distributed Memory Paralleliziation of the Caffe Toolbox to Speed up Deep Neural Network Training

    Deep Neural Networks (DNNs) are currently of great interest in research and application. Training these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed-memory communication pattern. To achieve good scalability we emphasize the overlap of computation and communication and prefer fine-granular synchronization patterns over global barriers. To implement these communication patterns we rely on the Global address space Programming Interface version 2 (GPI-2) communication library, which provides a lightweight set of asynchronous one-sided communication primitives supplemented by non-blocking, fine-granular data synchronization mechanisms. We call our parallel version of Caffe CaffeGPI. First benchmarks demonstrate better scaling behavior compared with other extensions, e.g., Intel Caffe. Even within a single symmetric multiprocessing machine with four graphics processing units, CaffeGPI scales better than the standard Caffe toolbox. These first results demonstrate that the use of standard High Performance Computing (HPC) hardware is a valid cost-saving approach to train large DNNs. I/O is another bottleneck when working with DNNs in a standard parallel HPC setting, which we will consider in more detail in a forthcoming paper.
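
    GPI-2 itself is accessed from C/C++; the following is only a hedged Python sketch of the overlap principle described above, with threads standing in for asynchronous one-sided transfers: each layer's gradient exchange is started as soon as backpropagation has produced it, and the program waits per gradient instead of at one global barrier. All layer names and timings are made up.

        # Overlapping computation with communication, with fine-grained waits
        # instead of a global barrier (illustrative sketch only).
        from concurrent.futures import ThreadPoolExecutor
        import time

        def compute_gradient(layer):
            time.sleep(0.05)                 # pretend backprop work for this layer
            return f"grad[{layer}]"

        def exchange(gradient):
            time.sleep(0.05)                 # pretend asynchronous network transfer
            return f"averaged {gradient}"

        layers = ["fc2", "fc1", "conv2", "conv1"]   # backprop visits the last layer first

        with ThreadPoolExecutor() as pool:
            pending = []
            for layer in layers:
                grad = compute_gradient(layer)               # compute current layer
                pending.append(pool.submit(exchange, grad))  # overlap its exchange
            # Wait per gradient, right before it would be needed for the weight update:
            for fut in pending:
                print(fut.result())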