2,696 research outputs found

    DART-MPI: An MPI-based Implementation of a PGAS Runtime System

    Full text link
    A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14

    A Time-Triggered Constraint-Based Calculus for Avionic Systems

    Full text link
    The Integrated Modular Avionics (IMA) architec- ture and the Time-Triggered Ethernet (TTEthernet) network have emerged as the key components of a typical architecture model for recent civil aircrafts. We propose a real-time constraint-based calculus targeted at the analysis of such concepts of avionic embedded systems. We show our framework at work on the modelisation of both the (IMA) architecture and the TTEthernet network, illustrating their behavior by the well-known Flight Management System (FMS)

    TRACTABLE DATA-FLOW ANALYSIS FOR DISTRIBUTED SYSTEMS

    No full text
    Automated behavior analysis is a valuable technique in the development and maintainence of distributed systems. In this paper, we present a tractable dataflow analysis technique for the detection of unreachable states and actions in distributed systems. The technique follows an approximate approach described by Reif and Smolka, but delivers a more accurate result in assessing unreachable states and actions. The higher accuracy is achieved by the use of two concepts: action dependency and history sets. Although the technique, does not exhaustively detect all possible errors, it detects nontrivial errors with a worst-case complexity quadratic to the system size. It can be automated and applied to systems with arbitrary loops and nondeterministic structures. The technique thus provides practical and tractable behavior analysis for preliminary designs of distributed systems. This makes it an ideal candidate for an interactive checker in software development tools. The technique is illustrated with case studies of a pump control system and an erroneous distributed program. Results from a prototype implementation are presented

    Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    Get PDF
    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is the necessity of a microcomputer system in order to provide sufficient processing capability.The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low level control synchronizing primitives to higher level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on the current robot control systems is explored

    Performance Analysis and Modelling of Concurrent Multi-access Data Structures

    Get PDF
    The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, leading to thread serialisation, hindering parallelism. Aiming to address this challenge, significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures especially in a shared memory setting. In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns: shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare and Swap, and, Fetch and Add. We model the concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search iii) access-point acquisition, iv) access-point data acquisition and v) access-point data operation. We model the acquisition of an access-point, as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system. We evaluate our model on a set of concurrent data structure designs including a counter, a stack and a FIFO queue. The evaluation is carried out on two state of the art multi-core processors: Intel Xeon Phi CPU 7290 with 72 physical cores and Intel Xeon E5-2695 with 14 physical cores. Our model is able to predict the throughput performance of the given concurrent data structures with 80% to 100% accuracy on both architectures

    Evaluating the impact of OpenMP 4.0 extensions on relevant parallel workloads

    Get PDF
    OpenMP has been for many years the most widely used programming model for shared memory architectures. Periodically, new features are proposed and some of them are finally selected for inclusion in the OpenMP standard. The OmpSs programming model developed at the Barcelona Supercomputing Center (BSC) aims to be an OpenMP forerunner that handles the main OpenMP constructs plus some extra features not included in the OpenMP standard. In this paper we show the usefulness of three OmpSs features not currently handled by OpenMP 4.0 by deploying them over three applications of the PARSEC benchmark suite and showing the performance benefits. This paper also shows performance trade-offs between the OmpSs/OpenMP tasking and loop parallelism constructs and shows how a hybrid implementation that combines both approaches is sometimes the best option.This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Grant Agreement number 321253, by the Spanish Ministry of Science and Innovation under grant TIN2012-34557 and by the HiPEAC Network of Excellence. It has been also supported by the Severo Ochoa Program awarded by the Spanish Government (grant SEV-2011-00067) M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI- 2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co- fund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP_B 00243).Peer ReviewedPostprint (author's final draft
    corecore