3,940 research outputs found

    Deductive Verification of Parallel Programs Using Why3

    The Message Passing Interface (MPI) specification defines a portable message-passing API used to program parallel computers. MPI programs present a number of correctness challenges: sent and expected values in communications may not match, resulting in incorrect computations and possibly crashes; and programs may deadlock, wasting resources. Existing tools are not completely satisfactory: model checking does not scale with the number of processes, and testing techniques waste resources and are highly dependent on the quality of the test set. As an alternative, we present a prototype for a type-based approach to programming and verifying MPI-like programs against protocols. Protocols are written in a dependent type language designed to capture the most common primitives in MPI, incorporating, in addition, a form of primitive recursion and collective choice. Protocols are then translated into Why3, a deductive software verification tool. Source code, in turn, is written in WhyML, the language of the Why3 platform, and checked against the protocol. Programs that pass verification are guaranteed to be communication safe and free from deadlocks. We verified several parallel programs from textbooks using our approach and report on the outcome.
    Comment: In Proceedings ICE 2015, arXiv:1508.0459
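    As a concrete illustration (a minimal sketch, not taken from the paper), the C fragment below shows the kind of communication error such protocol-based verification is meant to rule out: two processes that both issue a blocking send before their receive, producing a circular wait once the messages are too large for eager delivery.

        /* Minimal sketch (not from the paper): a classic MPI deadlock.
         * Both ranks block in MPI_Send waiting for a matching receive. */
        #include <mpi.h>

        #define N (1 << 20)
        static double buf[N];              /* large enough to force rendezvous */

        int main(int argc, char **argv) {
            int rank, peer;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            peer = 1 - rank;               /* assumes exactly two processes */

            /* Both ranks send first, then receive: a circular wait. */
            MPI_Send(buf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

            MPI_Finalize();
            return 0;
        }

    Reordering the calls on one rank (or switching to MPI_Sendrecv) removes the circular wait; a protocol describing the intended exchange makes such mismatches detectable at verification time.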

    POSH: Paris OpenSHMEM: A High-Performance OpenSHMEM Implementation for Shared Memory Systems

    In this paper we present the design and implementation of POSH, an open-source implementation of the OpenSHMEM standard. We present a model for its communications and prove some properties of the memory model defined in the OpenSHMEM specification. We also present performance measurements of the communication library provided by POSH and compare them with an existing one-sided communication library. POSH can be downloaded from http://www.lipn.fr/~coti/POSH.
    Comment: This is an extended version (featuring the full proofs) of a paper accepted at ICCS'1
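    For readers unfamiliar with OpenSHMEM's one-sided model, the sketch below (written against the generic OpenSHMEM C API, not anything POSH-specific) shows the basic pattern such a library implements: each processing element (PE) writes directly into a symmetric variable on a neighbour with a put, then synchronizes with a barrier.

        /* Minimal OpenSHMEM sketch (generic 1.2-style API as an assumption,
         * not POSH-specific): a one-sided ring exchange. */
        #include <shmem.h>
        #include <stdio.h>

        static int from_left;              /* symmetric: exists on every PE */

        int main(void) {
            shmem_init();
            int me    = shmem_my_pe();
            int npes  = shmem_n_pes();
            int right = (me + 1) % npes;

            /* One-sided write into the right neighbour's copy. */
            shmem_int_put(&from_left, &me, 1, right);
            shmem_barrier_all();           /* completes and orders the puts */

            printf("PE %d received %d from its left neighbour\n", me, from_left);
            shmem_finalize();
            return 0;
        }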

    Exascale Deep Learning for Climate Analytics

    We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and a parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s, respectively.
    Comment: 12 pages, 5 tables, 4 figures, Supercomputing Conference, November 11-16, 2018, Dallas, TX, US
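    For scale, simple division of the quoted aggregate numbers (our arithmetic, not figures reported in the paper) gives the sustained per-GPU rates behind the headline throughputs:

        21.0 PF/s  /  5,300 P100 GPUs  ≈  3.96 TF/s sustained per GPU
        325.8 PF/s / 27,360 V100 GPUs  ≈  11.9 TF/s sustained per GPU (single precision)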

    Teaching Parallel Programming Using Java

    This paper presents an overview of the "Applied Parallel Computing" course taught to final-year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principal programming language. The course was divided into three sections. The first section covered parallel programming techniques for shared memory systems, including multicore and Symmetric Multi-Processor (SMP) systems; in this section, Java threads were taught as a viable programming API for such systems. The second section was dedicated to parallel programming tools for distributed memory systems, including clusters and networks of computers; we used MPJ Express, a Java MPI library, for the programming assignments and lab work in this section. The third and final section covered advanced topics, including the MapReduce programming model using Hadoop and General-Purpose Computing on Graphics Processing Units (GPGPU).
    Comment: 8 pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel Programming

    Quantum to classical transition via fuzzy measurements on high gain spontaneous parametric down-conversion

    We consider high-gain spontaneous parametric down-conversion in a non-collinear geometry as a paradigmatic scenario for investigating the quantum-to-classical transition by increasing the pump power, that is, the average number of generated photons. The possibility of observing quantum correlations in such a macroscopic quantum system through dichotomic measurements is analyzed by addressing two different measurement schemes, based on different dichotomization processes. More specifically, we investigate the persistence of non-locality in an n/2-spin singlet state of increasing size by studying how the form of the correlations changes as n increases, both in the ideal case and in the presence of losses. We observe a fast decrease in the amount of Bell inequality violation with increasing system size. This theoretical analysis is supported by the experimental observation of macro-macro correlations with an average number of photons of about 10^3. Our results highlight the extreme practical difficulty of observing non-locality by performing such dichotomic fuzzy measurements.
    Comment: 15 pages, 18 figures

    PARCOACH: Combining static and dynamic validation of MPI collective communications

    Nowadays most scientific applications are parallelized using MPI communications. Collective MPI communications have to be executed in the same order, and the same number of times, by all processes in their communicator; otherwise the program does not conform to the standard, and a deadlock or other undefined behavior can occur. As soon as the control flow involving these collective operations becomes more complex, in particular when it includes conditionals on process ranks, ensuring the correctness of such code is error-prone. In this paper we propose a static analysis to detect when such a situation occurs, combined with a code transformation that prevents deadlocks. We focus on blocking MPI collective operations in SPMD applications, assuming MPI calls are not nested in multithreaded regions. We show on several benchmarks the small impact of our techniques on performance and the ease of their integration into the development process.
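    The sketch below (a minimal illustration, not taken from the paper) shows the control-flow pattern this kind of analysis targets: a conditional on the process rank makes rank 0 call one more collective than the other ranks, so the barriers no longer match up and the run deadlocks.

        /* Minimal sketch (not from the paper): a rank-dependent branch
         * causes a collective mismatch. Rank 0 calls MPI_Barrier twice,
         * every other rank calls it once, and the program deadlocks. */
        #include <mpi.h>

        int main(int argc, char **argv) {
            int rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                MPI_Barrier(MPI_COMM_WORLD);   /* only rank 0 reaches this */
            }
            MPI_Barrier(MPI_COMM_WORLD);       /* reached by all ranks */

            MPI_Finalize();
            return 0;
        }

    A static analysis can flag the conditional as rank-dependent and warn that the collective sequence may diverge across processes; moving the collective out of the branch, or adding a matching call on the other ranks, restores a consistent sequence.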