Employing MPI Collectives for Timing Analysis on Embedded Multi-Cores
Static WCET analysis of parallel programs running on shared-memory multi-cores suffers from high pessimism. Distributed-memory platforms, in which cores communicate via messages, may instead be a solution for many-core systems. The Message Passing Interface (MPI) is a standard for communication on such platforms. We show how its concept of collective operations can be employed for timing analysis. The idea is that the worst-case execution time (WCET) of a parallel program can be estimated by adding the WCET estimates of the sequential program parts to the WCET estimates of the communication parts. To this end, we first analyse the two MPI operations MPI_Allreduce and MPI_Sendrecv. Building on these results, we perform a timing analysis of the conjugate gradient (CG) benchmark from the NAS parallel benchmark suite.
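A minimal sketch of this compositional idea, assuming an illustrative CG-style iteration (buffer sizes, neighbour topology and names are not taken from the paper): local computation and the two analysed MPI operations alternate, so a per-iteration WCET bound can be obtained by summing the bounds of the four parts.

    #include <mpi.h>

    #define N_LOCAL 256   /* illustrative per-core vector length */

    /* WCET(iteration) <= WCET(compute 1) + WCET(MPI_Sendrecv)
     *                    + WCET(MPI_Allreduce) + WCET(compute 2) */
    void cg_iteration(double p[N_LOCAL], double halo[N_LOCAL],
                      int left, int right, MPI_Comm comm)
    {
        /* Sequential part 1: purely local computation, analysable with
         * single-core WCET techniques. */
        double local_dot = 0.0;
        for (int i = 0; i < N_LOCAL; i++)
            local_dot += p[i] * p[i];

        /* Communication part 1: exchange boundary data with neighbours. */
        MPI_Sendrecv(p, N_LOCAL, MPI_DOUBLE, right, 0,
                     halo, N_LOCAL, MPI_DOUBLE, left, 0,
                     comm, MPI_STATUS_IGNORE);

        /* Communication part 2: global reduction of the dot product. */
        double global_dot = 0.0;
        MPI_Allreduce(&local_dot, &global_dot, 1, MPI_DOUBLE, MPI_SUM, comm);

        /* Sequential part 2: update the local vector with the global value. */
        for (int i = 0; i < N_LOCAL; i++)
            p[i] += global_dot * halo[i];
    }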
Hardware extensions for a timing-predictable many-core processor
The requirements for today's embedded hard real-time systems are high: they should deliver high performance, be energy-efficient, and always react in time. This leads to the use of processors with several cores. However, when the cores are connected via a shared memory, static timing analysis suffers from high pessimism. We see distributed-memory many-core processors, in which cores communicate via messages, as a solution. One such processor is the Reduced Complexity Many-Core (RC/MC) architecture, which was developed with the goal of high timing predictability.
In this thesis, we present an approach to estimate the Worst-Case Execution Time (WCET) of programs running on this platform. Furthermore, we extend the RC/MC to improve its timing predictability and its worst-case performance. Our first step is the introduction of ready synchronization, which avoids buffer overflows. Second, we design hardware support for broadcasts and multicasts. Third, the RC/MC is extended with hardware-supported barriers.
Each of these techniques is evaluated for its impact. We carry out timing analyses of the hardware operations for broadcasts/multicasts and barriers and compare them with their variants without hardware support. Finally, we present three case studies in which we analyze benchmarks taken from the NAS parallel benchmark suite to evaluate the worst-case performance of our extensions in the context of real use cases.
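For illustration only, the following sketch shows a generic dissemination barrier built from plain point-to-point messages (expressed with MPI calls for concreteness; it is not the RC/MC programming interface). Its cost grows with ceil(log2(size)) message rounds per barrier, which is the kind of software overhead that hardware-supported barriers aim to remove.

    #include <mpi.h>

    /* Generic software barrier without hardware support: in round k each
     * core exchanges a token with the core 2^k positions away, so after
     * ceil(log2(size)) rounds every core has transitively heard from all
     * others. Names and structure are illustrative. */
    static void software_barrier(MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        for (int dist = 1; dist < size; dist <<= 1) {
            int dest = (rank + dist) % size;
            int src  = (rank - dist + size) % size;
            char out = 0, in = 0;
            MPI_Sendrecv(&out, 1, MPI_CHAR, dest, 0,
                         &in, 1, MPI_CHAR, src, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }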
User manual for the optimization and WCET analysis of software with timing analyzable algorithmic skeletons
We recently presented a parallelization approach based on parallel design patterns and leading to structured parallelism. The approach is applicable to the parallelization of sequential code parts of embedded hard real-time software. To reduce the required effort, tool support is necessary. In this context, we present software for the model-based, multi-objective optimization of a software model with a high degree of parallelism. In addition, we introduce the timing analyzable algorithmic skeletons (TAS) for the fast implementation of the optimized software model. To support static WCET analysis with the OTAWA toolset, we developed a compact XML format to describe software with TAS instances. Such a model can then easily be translated into the OTAWA XML format representing parallel flow facts. All software described in this technical report is available under an open source license.
WCTT bounds for MPI primitives in the PaterNoster NoC
This paper applies several variants of application-independent time-division multiplexing to MPI primitives and investigates their applicability for different scopes of communication. These scopes are characterized by the size of the network-on-chip, the number of participating nodes, and the message size sent to each receiver or received from each sender, respectively. The evaluation shows that none of the observed variants features the lowest worst-case traversal time in all situations. Instead, multiple schedule variants each perform best in a different scope of communication parameters.
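As a purely generic illustration of how these parameters interact (this is not one of the schedules analysed in the paper): under a TDM schedule with a period of P slots in which a sender owns exactly one slot, a message of m flits crossing h hops with a per-hop latency of d slots is bounded by

    \[
      \mathrm{WCTT} \;\le\; \underbrace{m \cdot P}_{\text{injection under TDM}}
      \;+\; \underbrace{h \cdot d}_{\text{traversal of the last flit}}
    \]

Since the period P typically grows with the number of participating nodes or the network size, short periods favour small messages and few nodes, whereas for long messages or large networks the traversal term and the slot allocation dominate, which is consistent with no single variant performing best in every scope.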