PETRI NET BASED MODELING OF PARALLEL PROGRAMS EXECUTING ON DISTRIBUTED MEMORY MULTIPROCESSOR SYSTEMS
The development of parallel programs following the paradigm of communicating sequential
processes to be executed on distributed memory multiprocessor systems is addressed.
The key issue in programming parallel machines today is to provide computerized tools
supporting the development of efficient parallel software, i.e. software effectively
harnessing the power of parallel processing systems. The critical situations where a parallel
programmer needs help are in expressing a parallel algorithm in a programming language,
in getting a parallel program to work and in tuning it to get optimum performance (for
example speedup).
We show that the Petri net formalism is highly suitable as a performance modeling
technique for asynchronous parallel systems, by introducing a model that accounts for the
influences of the parallel program, the parallel architecture and the mapping on overall
system performance. PRM-net (Program-Resource-Mapping) models combine a Petri net model of the
multiple flows of control in a parallel program, a Petri net model of the parallel hardware
and the process-to-processor mapping information into a single integrated performance
model. Automated analysis of PRM-net models addresses correctness and performance
of parallel programs mapped to parallel hardware. Questions about the correctness of
parallel programs can be answered by investigating behavioural properties of Petri net
programs like liveness, reachability, boundedness, mutual exclusion etc. Performance
of parallel programs can be usefully considered only in connection with a dedicated target
hardware. For this reason it is essential to integrate multiprocessor hardware characteristics
into the specification of a parallel program. The integration is done by assigning the
concurrent processes to physical processing devices and communication patterns among
parallel processes to communication media connecting processing elements, yielding an
integrated, Petri net based performance model. Evaluation of the integrated model applies
simulation and Markovian analysis to derive expressions characterising the performance of
the program being developed.
Synthesis and decomposition rules for hierarchical models naturally give rise to the
use of PRM-net models for graphical, performance oriented parallel programming, supporting
top-down (stepwise refinement) as well as bottom-up development approaches. The
graphical representation of Petri net programs visualizes phenomena like parallelism,
synchronisation, communication, sequential and alternative execution. Modularity of
program blocks aids reusability; prototyping is promoted by automated code generation on
the basis of high level program specifications.
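The behavioural properties named in this abstract (reachability, boundedness, mutual exclusion, absence of deadlock) can be illustrated on a toy net. The following is a minimal sketch and not a PRM-net: the places, transitions and initial marking are hypothetical, chosen to model two processes competing for one lock, and the exhaustive search terminates only because this net is bounded.

```python
from collections import deque

# Hypothetical toy net: two processes competing for a single lock place.
# A marking is a tuple of token counts, ordered as in PLACES.
PLACES = ("idle1", "crit1", "idle2", "crit2", "lock")

# Each transition: (name, tokens consumed per place, tokens produced per place).
TRANSITIONS = [
    ("enter1", {"idle1": 1, "lock": 1}, {"crit1": 1}),
    ("leave1", {"crit1": 1}, {"idle1": 1, "lock": 1}),
    ("enter2", {"idle2": 1, "lock": 1}, {"crit2": 1}),
    ("leave2", {"crit2": 1}, {"idle2": 1, "lock": 1}),
]


def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[PLACES.index(p)] >= n for p, n in pre.items())


def fire(marking, pre, post):
    """Return the marking obtained by consuming `pre` and producing `post`."""
    m = list(marking)
    for p, n in pre.items():
        m[PLACES.index(p)] -= n
    for p, n in post.items():
        m[PLACES.index(p)] += n
    return tuple(m)


def reachable(initial):
    """Breadth-first enumeration of the reachability set (bounded nets only)."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for _, pre, post in TRANSITIONS:
            if enabled(m, pre):
                nxt = fire(m, pre, post)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


initial = (1, 0, 1, 0, 1)  # both processes idle, lock available
markings = reachable(initial)
crit1, crit2 = PLACES.index("crit1"), PLACES.index("crit2")

# Mutual exclusion: never a token in both critical sections at once.
mutual_exclusion = all(m[crit1] + m[crit2] <= 1 for m in markings)
# Deadlock freedom: every reachable marking enables at least one transition.
deadlock_free = all(
    any(enabled(m, pre) for _, pre, _ in TRANSITIONS) for m in markings
)
# Boundedness: the largest token count seen in any place.
bound = max(max(m) for m in markings)
```

Enumerating markings explicitly is only practical for small nets; it is shown here because it makes each property a one-line check over the reachability set.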
A communication-ordered task graph allocation algorithm
Technical report. The inherently asynchronous nature of the data flow computation model allows the exploitation of maximum parallelism in program execution. While this computational model holds great promise, several problems must be solved in order to achieve a high degree of program performance. The allocation and scheduling of programs on MIMD distributed memory parallel hardware is necessary for the implementation of efficient parallel systems. Finding optimal solutions requires that maximum parallelism be achieved consistent with resource limits while minimizing communication costs, and has been proven to be in the class of NP-complete problems. This paper addresses the problem of static allocation of tasks to distributed memory MIMD systems where simultaneous computation and communication is a factor. It discusses similarities and differences between several recent heuristic allocation approaches and identifies common problems inherent in these approaches. It then presents a new algorithm scheme and heuristics that resolve the identified problems and show significant performance benefits.
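To make the static allocation problem concrete, here is a minimal sketch of a generic earliest-finish-time greedy heuristic. It is not the algorithm proposed in the report; the function name `allocate`, the example task graph and all costs are hypothetical. The sketch assumes that transfers do not occupy a processor, so computation and communication can overlap.

```python
def allocate(tasks, deps, comm, num_procs):
    """Greedy static allocation of a task graph (illustrative only).

    tasks: {name: compute_cost}, listed in topological order.
    deps:  {name: [predecessor names]}.
    comm:  {(pred, succ): transfer_cost}, paid only across processors.
    Returns (placement, makespan).
    """
    proc_free = [0.0] * num_procs  # time at which each processor becomes idle
    placement, finish = {}, {}
    for task, cost in tasks.items():
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after
            # that predecessor finishes plus the transfer cost.
            ready = max(
                (
                    finish[d] + (0 if placement[d] == p else comm[(d, task)])
                    for d in deps.get(task, [])
                ),
                default=0.0,
            )
            end = max(ready, proc_free[p]) + cost
            if best is None or end < best[0]:
                best = (end, p)
        finish[task], placement[task] = best
        proc_free[best[1]] = best[0]
    return placement, max(finish.values())


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
tasks = {"a": 2, "b": 3, "c": 3, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
comm = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}

placement, makespan = allocate(tasks, deps, comm, num_procs=2)
```

A greedy choice like this is cheap but can be far from optimal, which is consistent with the NP-completeness noted in the abstract.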
libcppa - Designing an Actor Semantic for C++11
Parallel hardware makes concurrency mandatory for efficient program
execution. However, writing concurrent software is both challenging and
error-prone. C++11 provides standard facilities for multiprogramming, such as
atomic operations with acquire/release semantics and RAII mutex locking, but
these primitives remain too low-level. Using them both correctly and
efficiently still requires expert knowledge and hand-crafting. The actor model
replaces implicit communication by sharing with an explicit message passing
mechanism. It applies to concurrency as well as distribution, and a lightweight
actor model implementation that schedules all actors in a properly
pre-dimensioned thread pool can outperform equivalent thread-based
applications. However, the actor model has not yet entered the domain of native
programming languages, apart from vendor-specific island solutions. With the
open source library libcppa, we want to combine the ability to build reliable
and distributed systems provided by the actor model with the performance and
resource-efficiency of C++11. Comment: 10 page
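The idea of replacing shared state with explicit message passing can be sketched in a few lines. The following is not libcppa and does not reflect its API; the `Actor` class and the `counter` handler are hypothetical. It is written in Python and uses one thread per actor for brevity, whereas the abstract describes scheduling actors in a thread pool.

```python
import queue
import threading

_STOP = object()  # sentinel message that ends an actor's loop


class Actor:
    """Hypothetical minimal actor: private state, a mailbox, one worker thread."""

    def __init__(self, handler, state):
        self._mailbox = queue.Queue()
        self._handler = handler
        self._state = state
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """Asynchronously enqueue a message; never touches the actor's state."""
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        # Messages are handled one at a time, so the state needs no lock.
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self._state = self._handler(self._state, message)


def counter(state, message):
    """Handler: ("add", n) updates the count, ("get", reply_queue) reports it."""
    kind, payload = message
    if kind == "add":
        return state + payload
    if kind == "get":
        payload.put(state)
    return state


def send_increments(actor, n):
    for _ in range(n):
        actor.send(("add", 1))


actor = Actor(counter, 0)
senders = [
    threading.Thread(target=send_increments, args=(actor, 1000)) for _ in range(4)
]
for t in senders:
    t.start()
for t in senders:
    t.join()

reply = queue.Queue()
actor.send(("get", reply))
total = reply.get(timeout=5)
actor.stop()
```

Four threads update the counter concurrently without any mutex in user code, because only the actor's own thread ever reads or writes its state.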
Hybrid performance modeling and prediction of large-scale computing systems
Performance is a key feature of large-scale computing systems. However, the performance achieved when a given program is executed is significantly lower than the maximal theoretical performance of the large-scale computing system. Model-based performance evaluation may be used to support performance-oriented program development for large-scale computing systems. In this paper we present a hybrid approach for performance modeling and prediction of parallel and distributed computing systems, which combines mathematical modeling and discrete-event simulation. We use mathematical modeling to develop parameterized performance models for components of the system. Thereafter, we use discrete-event simulation to describe the structure of the system and the interaction among its components. As a result, we obtain a high-level performance model, which combines the evaluation speed of mathematical models with the structure awareness and fidelity of the simulation model. We empirically evaluate our approach with a real-world material science program that comprises more than 15,000 lines of code. Peer Reviewed. Postprint (published version
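The combination described here, closed-form models for individual components with a discrete-event simulation for their interaction, can be sketched as follows. This is not the paper's model; the formulas, the function names and the scenario (workers that compute and then send results over one shared link) are assumptions made for illustration.

```python
import heapq


def compute_time(flops, flops_per_sec):
    """Parameterized analytical model of a compute phase (assumed form)."""
    return flops / flops_per_sec


def transfer_time(nbytes, latency, bandwidth):
    """Parameterized analytical model of a message transfer (assumed form)."""
    return latency + nbytes / bandwidth


def simulate(workers, nbytes, latency, bandwidth):
    """Event-driven simulation of workers contending for one shared link.

    workers: list of (flops, flops_per_sec). Each worker computes, then sends
    its result over a link that carries one transfer at a time.
    Returns the time at which the last transfer completes.
    """
    # Event queue of (time the worker is ready to send, worker index).
    events = [(compute_time(f, r), i) for i, (f, r) in enumerate(workers)]
    heapq.heapify(events)
    link_free = 0.0
    while events:
        ready, _ = heapq.heappop(events)
        start = max(ready, link_free)  # wait if the link is still busy
        link_free = start + transfer_time(nbytes, latency, bandwidth)
    return link_free


# Hypothetical parameters: three workers, 1 GB results, 1 GB/s link.
workers = [(4e9, 1e9), (2e9, 1e9), (2e9, 1e9)]
finish = simulate(workers, nbytes=1e9, latency=0.5, bandwidth=1e9)
```

The analytical functions keep each evaluation cheap, while the event loop captures an effect that the formulas alone miss: the two faster workers finish computing at the same time and one must wait for the link.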
Verification of MPI programs using Spin
Technical report. Verification of distributed systems is a complex yet important process. Concurrent systems are vulnerable to problems such as deadlock, starvation, and race conditions. Parallel programs written using the MPI (Message Passing Interface) Standard are no exception. Spin can be used to formally verify a parallel program if it is given an accurate model written in Spin's process meta language (Promela). In this paper, we describe a generalized framework for verification of MPI-based parallel programs using the Spin model checker. Only select MPI calls are covered, but this framework could potentially be extended to include all of the MPI Standard. Our reduced MPI implementation (written in Promela) is designed to follow the MPI Standard as well as allow for the flexibility provided in certain aspects (like buffering). We also present a few examples to illustrate the use of our MPI implementation in Promela.
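The effect of buffering on deadlock, which this abstract singles out, can be illustrated with a small explicit-state search. The sketch below is neither Promela nor the report's framework; it is a Python stand-in for the kind of exhaustive interleaving exploration a model checker performs. The program encoding and the function names are hypothetical, and only blocking point-to-point sends and receives are modelled.

```python
def successors(programs, capacity, state):
    """All states reachable in one step from `state`.

    state = (program counters, sorted tuple of in-flight message counts).
    capacity = messages a channel can buffer; 0 forces sends to rendezvous.
    """
    pcs, chans = state
    chans = dict(chans)
    out = []
    for rank, prog in enumerate(programs):
        if pcs[rank] == len(prog):
            continue  # this rank has finished
        op, peer = prog[pcs[rank]]
        new = list(pcs)
        new[rank] += 1
        if op == "send":
            if capacity == 0:
                # Unbuffered: the peer must currently be at a matching receive.
                peer_prog, peer_pc = programs[peer], pcs[peer]
                if peer_pc < len(peer_prog) and peer_prog[peer_pc] == ("recv", rank):
                    new[peer] += 1
                    out.append((tuple(new), tuple(sorted(chans.items()))))
            elif chans.get((rank, peer), 0) < capacity:
                c = dict(chans)
                c[(rank, peer)] = c.get((rank, peer), 0) + 1
                out.append((tuple(new), tuple(sorted(c.items()))))
        elif op == "recv" and chans.get((peer, rank), 0) > 0:
            c = dict(chans)
            c[(peer, rank)] -= 1
            if c[(peer, rank)] == 0:
                del c[(peer, rank)]
            out.append((tuple(new), tuple(sorted(c.items()))))
    return out


def find_deadlock(programs, capacity):
    """Return the program counters of a deadlocked state, or None."""
    done = tuple(len(p) for p in programs)
    start = (tuple(0 for _ in programs), ())
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        nxt = successors(programs, capacity, state)
        if not nxt and state[0] != done:
            return state[0]  # stuck before every rank finished
        for s in nxt:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return None


# Hypothetical two-rank program: both ranks send first, then receive.
head_to_head = [
    [("send", 1), ("recv", 1)],
    [("send", 0), ("recv", 0)],
]
unbuffered = find_deadlock(head_to_head, capacity=0)
buffered = find_deadlock(head_to_head, capacity=1)
```

With no buffering both ranks block in their sends at the very first step, while a single buffer slot per channel lets every interleaving run to completion.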