2,696 research outputs found

DART-MPI: An MPI-based Implementation of a PGAS Runtime System

Author: Fürlinger Karl
Glass Colin W.
Gracia José
Idrees Kamran
Mhedheb Yousri
Tao Jie
Zhou Huan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

A Partitioned Global Address Space (PGAS) approach treats a distributed system as if its memory were shared at a global level. Given such a global view of memory, the user may program applications much as on shared-memory systems. This greatly simplifies the development of parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment that implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of the one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3. Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14)

arXiv.org e-Print Archive

Crossref

A Time-Triggered Constraint-Based Calculus for Avionic Systems

Author: Beji Sofiene
Gherbi Abdelouahed
Hamadou Sardaouna
Mullins John
Publication date: 11/10/2014

The Integrated Modular Avionics (IMA) architecture and the Time-Triggered Ethernet (TTEthernet) network have emerged as the key components of a typical architecture model for recent civil aircraft. We propose a real-time constraint-based calculus targeted at the analysis of such avionic embedded systems. We show our framework at work on the modeling of both the IMA architecture and the TTEthernet network, illustrating their behavior with the well-known Flight Management System (FMS).

arXiv.org e-Print Archive

Crossref

PolyPublie

Tractable Data-Flow Analysis for Distributed Systems

Author: CHEUNG SC
KRAMER J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1994

Automated behavior analysis is a valuable technique in the development and maintenance of distributed systems. In this paper, we present a tractable dataflow analysis technique for the detection of unreachable states and actions in distributed systems. The technique follows an approximate approach described by Reif and Smolka, but delivers a more accurate result in assessing unreachable states and actions. The higher accuracy is achieved by the use of two concepts: action dependency and history sets. Although the technique does not exhaustively detect all possible errors, it detects nontrivial errors with a worst-case complexity quadratic in the system size. It can be automated and applied to systems with arbitrary loops and nondeterministic structures. The technique thus provides practical and tractable behavior analysis for preliminary designs of distributed systems. This makes it an ideal candidate for an interactive checker in software development tools. The technique is illustrated with case studies of a pump control system and an erroneous distributed program. Results from a prototype implementation are presented.

Spiral - Imperial College Digital Repository

Hong Kong University of Science and Technology Institutional Repository

Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

Author: Lumia Ronald
Michaloski John L.
Wheatley Thomas E.

Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is the necessity of a microcomputer system in order to provide sufficient processing capability. The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low-level control synchronizing primitives to higher-level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on the current robot control systems is explored.

NASA Technical Reports Server

Performance Analysis and Modelling of Concurrent Multi-access Data Structures

Author: Atalar Aras
Rukundo Adones
Tsigas Philippas
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2022

The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, which leads to thread serialisation and hinders parallelism. Aiming to address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting. In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns, shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare-and-Swap and Fetch-and-Add. We model the concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search, iii) access-point acquisition, iv) access-point data acquisition and v) access-point data operation. We model the acquisition of an access-point as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system. We evaluate our model on a set of concurrent data structure designs including a counter, a stack and a FIFO queue. The evaluation is carried out on two state-of-the-art multi-core processors: Intel Xeon Phi CPU 7290 with 72 physical cores and Intel Xeon E5-2695 with 14 physical cores. Our model is able to predict the throughput performance of the given concurrent data structures with 80% to 100% accuracy on both architectures.
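To give a feel for what modelling access-point acquisition as a closed queuing network can look like, here is a minimal sketch using exact Mean Value Analysis (MVA), a standard textbook technique for closed networks. It is not the model from the paper, whose equations are not given in this abstract. The function name `mva_throughput`, the mapping of side-work to a think time, and the use of single-server stations (rather than the parallel servers the abstract mentions) are all assumptions made for illustration.

```python
def mva_throughput(num_threads, demands, think_time=0.0):
    """Exact MVA for a single-class closed queuing network.

    num_threads -- customers circulating in the network (here: threads)
    demands     -- service demand per visit at each single-server station
                   (hypothetical stand-in for access-point service times)
    think_time  -- time spent outside the stations (hypothetical stand-in
                   for side-work)
    Returns the predicted throughput in operations per time unit.
    """
    queue_len = [0.0] * len(demands)  # mean queue lengths with 0 customers
    throughput = 0.0
    for n in range(1, num_threads + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one customer fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue_len)]
        throughput = n / (think_time + sum(residence))
        # Little's law applied per station.
        queue_len = [throughput * r for r in residence]
    return throughput


if __name__ == "__main__":
    # One contended access-point with unit service time, unit side-work.
    for threads in (1, 2, 4, 8):
        print(threads, mva_throughput(threads, [1.0], think_time=1.0))
```

In this sketch the predicted throughput saturates at the reciprocal of the largest service demand as threads are added, which is the serialisation effect the abstract attributes to contended access-points.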

Chalmers Research

Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads

Author: Ayguadé Parra Eduard
Casas Guix Marc
Chasapis Dimitrios
Ferrer Ibáñez Roger
Labarta Mancho Jesús José
Martorell Bofill Xavier
Moreto Planas Miquel
Valero Cortés Mateo
Vidal Ortiz Raul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

OpenMP has for many years been the most widely used programming model for shared memory architectures. Periodically, new features are proposed and some of them are finally selected for inclusion in the OpenMP standard. The OmpSs programming model developed at the Barcelona Supercomputing Center (BSC) aims to be an OpenMP forerunner that handles the main OpenMP constructs plus some extra features not included in the OpenMP standard. In this paper we show the usefulness of three OmpSs features not currently handled by OpenMP 4.0 by deploying them over three applications of the PARSEC benchmark suite and showing the performance benefits. This paper also shows performance trade-offs between the OmpSs/OpenMP tasking and loop parallelism constructs and shows how a hybrid implementation that combines both approaches is sometimes the best option. This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Grant Agreement number 321253, by the Spanish Ministry of Science and Innovation under grant TIN2012-34557 and by the HiPEAC Network of Excellence. It has also been supported by the Severo Ochoa Program awarded by the Spanish Government (grant SEV-2011-00067). M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP_B 00243). Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Creating portable and efficient packet processing applications

Author: A Korobeynikov
AV Aho
B Wun
CW Fraser
D Bernstein
EA Lee
EJ Johnson
Fulvio Risso
G Memik
J Carlstrom
J Wagner
JA Fisher
JL Hennessy
L Ciminiera
L George
M Baldi
MK Chen
N Shah
Olivier Morandi
P Briggs
Paolo Veglia
Pierluigi Rolando
R Cytron
R Ennals
R Morris
Silvio Valenti
SS Muchnick
T Lindholm
Z Budimlic
Publication venue: Springer
Publication date: 01/01/2011

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino