
    VAMPIR: Visualization and Analysis of MPI Resources

    Performance analysis is most often based on detailed knowledge of program behavior. One way to obtain this information is tracing. Based on the research tool PARvis, the visualization environment VAMPIR was developed at KFA; it now supports the message-passing standard MPI. VAMPIR translates a given trace file into a variety of graphical views, e.g., state diagrams, activity charts, time-line displays, and statistics. Moreover, it supports an animation mode that can help to locate performance bottlenecks, and it provides flexible filter operations to reduce the amount of information displayed. The most interesting part of VAMPIR is the powerful zooming feature, which makes it possible to identify problems at any level of detail.
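
    As a concrete illustration of the kind of event stream such a trace-based tool consumes, the sketch below (a hypothetical two-rank MPI program, not taken from the paper) generates the send/receive events and compute phases that would appear as states and messages on a time-line display once the program is linked against a tracing library.

```c
/* Minimal two-rank MPI program whose events (compute phases, MPI_Send,
 * MPI_Recv) would show up as states and message lines in a trace
 * time-line view. Illustrative only; the trace itself is produced by a
 * tracing library, not by anything in this code. Run with >= 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int step = 0; step < 10; ++step) {
        double buf = (double)step;
        if (rank == 0) {
            usleep(1000 * (step + 1));   /* "compute" state of growing length */
            MPI_Send(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* rank 1's growing wait shows up as a widening MPI_Recv state */
            MPI_Recv(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```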

    Implementation-Oblivious Transparent Checkpoint-Restart for MPI

    This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads for the major available MPI implementations: "develop once, run everywhere". The new platform enables application developers to compile their application against any of the available standards-compliant MPI implementations and to test each MPI implementation for performance or other features.
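
    To make "transparent" concrete, the sketch below (a hypothetical example, not from the paper) is an ordinary iterative MPI program with no checkpointing logic of its own; under a platform of this kind it could in principle be interrupted and resumed mid-run unchanged, and "develop once, run everywhere" then amounts to compiling this same source against each implementation's MPI compiler wrapper.

```c
/* Ordinary iterative MPI code with no checkpoint/restart code of its own.
 * A transparent checkpoint-restart layer must capture and restore the
 * state held in each rank's memory and in-flight communication, so the
 * application itself stays unmodified. Hypothetical illustration only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank + 1.0, global = 0.0;
    for (long iter = 0; iter < 100000; ++iter) {
        /* long-running computation whose state lives entirely in the ranks */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        local = global / 2.0;
    }

    if (rank == 0)
        printf("final value %g\n", global);
    MPI_Finalize();
    return 0;
}
```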

    High Performance Air Quality Simulation in the European CrossGrid Project

    This paper focuses on one of the applications involved in the CrossGrid project, the STEM-II air pollution model used to simulate the environment of the As Pontes Power Plant in A Coruna (Spain). The CrossGrid project provides a Grid environment oriented towards computation- and data-intensive applications that need interaction with an external user. The air pollution model requires interaction with an expert in order to make decisions about modifications to the industrial process so as to fulfil European standards on emissions and air quality. The benefits of using different CrossGrid components for running the application on a Grid infrastructure are shown in this paper, and some preliminary results on the CrossGrid testbed are presented.

    Implementation of MPICH on top of MP_Lite

    The goal of this thesis is to develop a new Channel Interface device for the MPICH implementation of the MPI (Message Passing Interface) standard using MP_Lite. MP_Lite is a lightweight message-passing library that is not a full MPI implementation but offers high performance. MPICH (Message Passing Interface CHameleon) is a full implementation of the MPI standard that uses the p4 library as the underlying communication device for TCP/IP networks. By integrating MP_Lite as a Channel Interface device in MPICH, a parallel programmer can use the full MPI implementation of MPICH while benefiting from the high bandwidth offered by MP_Lite. There are several layers in the MPICH library where a new device can be tied in. The Channel Interface is the lowest layer and requires only a few functions to add a new device. By attaching MP_Lite to MPICH at this lowest level, the Channel Interface, almost all of the performance of the MP_Lite library can be delivered to applications using MPICH. MP_Lite can be implemented as either a blocking or a non-blocking Channel Interface device. Performance was measured on two separate test clusters, a PC and an Alpha mini-cluster, both with Gigabit Ethernet connections. The PC cluster has two 1.8 GHz Pentium 4 PCs, and the Alpha cluster has two 500 MHz Compaq DS20 workstations. Different network interface cards, such as Netgear, TrendNet and SysKonnect Gigabit Ethernet cards, were used for the measurements. Both the blocking and non-blocking MPICH-MP_Lite Channel Interface devices perform close to raw TCP, whereas a performance loss of 25-30% is seen in the MPICH-p4 Channel Interface device for larger messages. The superior performance of the MPICH-MP_Lite device compared to the MPICH-p4 device can easily be seen on the SysKonnect cards using jumbo frames. The throughput curve also improves considerably when the Eager/Rendezvous threshold is increased.
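
    Bandwidth comparisons of this kind, and the effect of the Eager/Rendezvous threshold, are typically measured with a ping-pong benchmark. The sketch below is an assumed minimal version, not the thesis's actual benchmark code; it reports throughput over a range of message sizes, and a knee in the resulting curve marks where the eager protocol gives way to rendezvous.

```c
/* Simple ping-pong throughput test between ranks 0 and 1.
 * Sketch only; run with >= 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 100;
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (long bytes = 1; bytes <= (1L << 22); bytes *= 2) {
        char *buf = malloc(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; ++i) {
            if (rank == 0) {
                MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8ld bytes  %8.2f MB/s\n", bytes,
                   2.0 * bytes * reps / t / 1e6); /* round trip moves 2*bytes */
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```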

    Scaling soft matter physics to thousands of graphics processing units in parallel

    We describe a multi-graphics processing unit (GPU) implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology in augmenting the original central processing unit (CPU) version with GPU functionality in a maintainable fashion. We present several optimisations that maximise performance on the GPU architecture through tuning for the GPU memory hierarchy. We describe how we implement particles within the fluid in such a way as to avoid a major divergence of the CPU and GPU codebases, whilst minimising data transfer at each time step. We detail our halo-exchange communication phase for the code, which exploits overlapping to allow efficient parallel scaling to many GPUs. We present results showing that the application demonstrates excellent scaling to at least 8192 GPUs in parallel, the largest system tested at the time of writing. The GPU version (on NVIDIA K20X GPUs) is around 3.5-5 times faster than the CPU version (on fully utilised AMD Opteron 6274 16-core CPUs), comparing equal numbers of CPUs and GPUs.
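
    The halo-exchange overlap mentioned above generally follows a standard pattern: post non-blocking exchanges of boundary data, update the interior sites that need no remote data, then complete the exchange and update the boundary. The 1-D MPI sketch below illustrates that pattern only; it is not Ludwig code and omits the GPU-host transfer staging the paper describes.

```c
/* Schematic 1-D halo-exchange/compute overlap with periodic neighbours. */
#include <mpi.h>
#include <stdio.h>

#define N 256  /* local sites per rank, including 2 halo cells (assumed size) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    double f[N], fnew[N];
    for (int i = 0; i < N; ++i) f[i] = rank;

    MPI_Request req[4];

    /* 1. post halo exchange of the outermost interior sites */
    MPI_Irecv(&f[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&f[N - 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&f[N - 2], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&f[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[3]);

    /* 2. overlap: interior sites depend only on local data */
    for (int i = 2; i < N - 2; ++i)
        fnew[i] = 0.5 * (f[i - 1] + f[i + 1]);

    /* 3. complete the exchange, then update the two boundary sites */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    fnew[1]     = 0.5 * (f[0] + f[2]);
    fnew[N - 2] = 0.5 * (f[N - 3] + f[N - 1]);

    if (rank == 0) printf("step done, f[1] = %g\n", fnew[1]);
    MPI_Finalize();
    return 0;
}
```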

    Extending the Range of C-XSC: Some Tools and Applications for the use in Parallel and other Environments
