Search CORE

25,228 research outputs found

PETRI NET BASED MODELING OF PARALLEL PROGRAMS EXECUTING ON DISTRIBUTED MEMORY MULTIPROCESSOR SYSTEMS

Author: Ferscha A.
Haring G.
Publication venue: Periodica Polytechnica Electrical Engineering (Archives)
Publication date: 01/01/1991
Field of study

The development of parallel programs following the paradigm of communicating sequen- tial processes to be executed on distributed memory multiprocessor systems is addressed. The key issue in programming parallel machines today is to provide computerized tools supporting the development of efficient parallel software, i.e. software effectively har- nessing the power of parallel processing systems. The critical situations where a parallel programmer needs help is in expressing a parallel algorithm in a programming language, in getting a parallel program to work and in tuning it to get optimum performance (for example speedup). . We show that the Petri net formalism is higly suitable as a performance modeling technique for asynchronous parallel systems, by introducing a model taking care of the parallel program, parallel architecture and mapping influences on overall system perfor- mance. PRM -net (Program-Resource- Mapping) models comprise a Petri net model of the multiple flows of control in a parallel program, a Petri net model of the parallel hardware and the process-to-processor mapping information into a single integrated performance model. Automated analysis of PRM-net models addresses correctness and performance of parallel programs mapped to parallel hardware. Questions upon the correctness of parallel programs can be answered by investigating behavioural properties of Petri net programs like liveness, reachability, boundedness, mutualy exclusiveness etc. Peformance of parallel programs is usefully considered only in concern with a dedicated target hard- ware. For this reason it is essential to integrate multiprocessor hardware characteristics into the specification of a parallel program. The integration is done by assigning the concurrent processes to physical processing devices and communication patterns among parallel processes to communication media connecting processing elements yielding an in- tegrated, Petri net based performance model. Evaluation of the integrated model applies simulation and markovian analysis to derive expressions characterising the peformance of the program being developed. Synthesis and decomposition rules for hierarchical models naturally give raise to use PRM-net models for graphical, performance oriented parallel programming, support- ing top-down (stepwise refinement) as well as bottom-up development approaches. The graphical representation of Petri net programs visualizes phenomena like parallelism, syn- chronisation, communication, sequential and alternative execution. Modularity of pro- gram blocks aids reusability, prototyping is promoted by automated code generation on the basis of high level program specifications

Periodica Polytechnica (Budapest University of Technology and Economics)

Performance Debugging and Tuning using an Instruction-Set Simulator

Author: Magnusson Peter S.
Montelius Johan
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/1997
Field of study

Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic-allowing bugs to be recreated or measurements repeated. Though often viewed as being too slow for use as a general programming tool, in the last several years their performance has improved considerably. We describe SIMICS, an instruction set simulator of SPARC-based multiprocessors developed at SICS, in its rôle as a general programming tool. We discuss some of the benefits of using a tool such as SIMICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases, where we've used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over a magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

MaSiF: Machine learning guided auto-tuning of parallel skeletons

Author: Cole Murray
Collins Alexander
Fensch Christian
Leather Hugh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

Heriot Watt Pure

A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections

Author: Gijsbers Bert
Grelck Clemens
Shafarenko Alex
Tveretina Olga
Zaichenkov Pavel
Publication venue
Publication date: 01/01/2014
Field of study

We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-Net with Intel's Concurrent Collections (CnC). As a coordination language S-Net achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous data flow streaming network. We investigate the merits of S-Net and CnC with the help of a relevant and non-trivial linear algebra problem: tiled Cholesky decomposition. We describe two alternative S-Net implementations of tiled Cholesky factorization and compare them with two CnC implementations, one with explicit performance tuning and one without, that have previously been used to illustrate Intel CnC. Our experiments on a 48-core machine demonstrate that S-Net manages to outperform CnC on this problem.Comment: 9 pages, 8 figures, 1 table, accepted for PLC 2014 worksho

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

International Migration, Integration and Social Cohesion online publications