3,940 research outputs found
Deductive Verification of Parallel Programs Using Why3
The Message Passing Interface specification (MPI) defines a portable
message-passing API used to program parallel computers. MPI programs manifest a
number of challenges on what concerns correctness: sent and expected values in
communications may not match, resulting in incorrect computations possibly
leading to crashes; and programs may deadlock resulting in wasted resources.
Existing tools are not completely satisfactory: model-checking does not scale
with the number of processes; testing techniques wastes resources and are
highly dependent on the quality of the test set.
As an alternative, we present a prototype for a type-based approach to
programming and verifying MPI like programs against protocols. Protocols are
written in a dependent type language designed so as to capture the most common
primitives in MPI, incorporating, in addition, a form of primitive recursion
and collective choice. Protocols are then translated into Why3, a deductive
software verification tool. Source code, in turn, is written in WhyML, the
language of the Why3 platform, and checked against the protocol. Programs that
pass verification are guaranteed to be communication safe and free from
deadlocks.
We verified several parallel programs from textbooks using our approach, and
report on the outcome.Comment: In Proceedings ICE 2015, arXiv:1508.0459
POSH: Paris OpenSHMEM: A High-Performance OpenSHMEM Implementation for Shared Memory Systems
In this paper we present the design and implementation of POSH, an
Open-Source implementation of the OpenSHMEM standard. We present a model for
its communications, and prove some properties on the memory model defined in
the OpenSHMEM specification. We present some performance measurements of the
communication library featured by POSH and compare them with an existing
one-sided communication library. POSH can be downloaded from
\url{http://www.lipn.fr/~coti/POSH}. % 9 - 67Comment: This is an extended version (featuring the full proofs) of a paper
accepted at ICCS'1
Exascale Deep Learning for Climate Analytics
We extract pixel-level masks of extreme weather patterns using variants of
Tiramisu and DeepLabv3+ neural networks. We describe improvements to the
software frameworks, input pipeline, and the network training algorithms
necessary to efficiently scale deep learning on the Piz Daint and Summit
systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained
throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up
to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel
efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor
Cores, a half-precision version of the DeepLabv3+ network achieves a peak and
sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.Comment: 12 pages, 5 tables, 4, figures, Super Computing Conference November
11-16, 2018, Dallas, TX, US
Teaching Parallel Programming Using Java
This paper presents an overview of the "Applied Parallel Computing" course
taught to final year Software Engineering undergraduate students in Spring 2014
at NUST, Pakistan. The main objective of the course was to introduce practical
parallel programming tools and techniques for shared and distributed memory
concurrent systems. A unique aspect of the course was that Java was used as the
principle programming language. The course was divided into three sections. The
first section covered parallel programming techniques for shared memory systems
that include multicore and Symmetric Multi-Processor (SMP) systems. In this
section, Java threads was taught as a viable programming API for such systems.
The second section was dedicated to parallel programming tools meant for
distributed memory systems including clusters and network of computers. We used
MPJ Express-a Java MPI library-for conducting programming assignments and lab
work for this section. The third and the final section covered advanced topics
including the MapReduce programming model using Hadoop and the General Purpose
Computing on Graphics Processing Units (GPGPU).Comment: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel
Programmin
Quantum to classical transition via fuzzy measurements on high gain spontaneous parametric down-conversion
We consider the high gain spontaneous parametric down-conversion in a non
collinear geometry as a paradigmatic scenario to investigate the
quantum-to-classical transition by increasing the pump power, that is, the
average number of generated photons. The possibility of observing quantum
correlations in such macroscopic quantum system through dichotomic measurement
will be analyzed by addressing two different measurement schemes, based on
different dichotomization processes. More specifically, we will investigate the
persistence of non-locality in an increasing size n/2-spin singlet state by
studying the change in the correlations form as increases, both in the
ideal case and in presence of losses. We observe a fast decrease in the amount
of Bell's inequality violation for increasing system size. This theoretical
analysis is supported by the experimental observation of macro-macro
correlations with an average number of photons of about 10^3. Our results
enlighten the practical extreme difficulty of observing non-locality by
performing such a dichotomic fuzzy measurement.Comment: 15 pages, 18 figure
PARCOACH: Combining static and dynamic validation of MPI collective communications
International audienceNowadays most scientific applications are parallelized based on MPI communications. Collective MPI communications have to be executed in the same order by all processes in their communicator and the same number of times, otherwise it is not conforming to the standard and a deadlock or other undefined behavior can occur. As soon as the control-flow involving these collective operations becomes more complex, in particular including conditionals on process ranks, ensuring the correction of such code is error-prone. We propose in this paper a static analysis to detect when such situation occurs, combined with a code transformation that prevents from deadlocking. We focus on blocking MPI collective operations in SPMD applications, assuming MPI calls are not nested in multithreaded regions. We show on several benchmarks the small impact on performance and the ease of integration of our techniques in the development process
- …