39 research outputs found

    Real-time analysis of MPI programs for NoC-based many-cores using time division multiplexing

    Worst-case execution time (WCET) analysis is crucial for designing hard real-time systems. While the WCET of tasks in a single-core system can be upper-bounded in isolation, tasks in a many-core system are subject to shared-memory interference, which leads to highly overestimated WCET bounds. Nevertheless, many-core-based massively parallel applications will enter the area of real-time systems in the years ahead. Explicit message passing and a clear separation of computation and communication facilitate WCET analysis for those programs. A standard programming model for message-based communication is the Message Passing Interface (MPI). It provides an application-independent interface for different standard communication operations (e.g. broadcast, gather, ...). To do so, it uses efficient communication patterns with deterministic behaviour. By applying these known structures, we aim to provide a WCET analysis for communication that is reusable across applications as long as the communication is executed on the same underlying platform. Hence, the analysis must be performed only once per hardware platform and can be reused afterwards by adapting only a few parameters, such as the number of nodes participating in the communication. Typically, the processing elements of many-core platforms are connected via a Network-on-Chip (NoC) and apply techniques such as time-division multiplexing (TDM) to provide guaranteed services for the network. Hence, the hardware and the technique applied for guaranteed service need to facilitate this reusability of the analysis as well. In this work we review different general-purpose TDM schedules that enable a WCET approximation independent of the placement of tasks on the processing elements of a many-core that uses a NoC with torus topology.
    Furthermore, we provide two new schedules that perform similarly to the state-of-the-art schedules but additionally serve situations in which the presented state-of-the-art schedules perform poorly. Based on these schedules, a procedure for the WCET analysis of the communication patterns used in MPI is proposed. Finally, we show how to apply the results of the analysis to calculate the WCET upper bound for a complete MPI program. Detailed insights into the performance of the applied TDM schedules are provided by comparing the schedules to each other in terms of timing. Additionally, we discuss the timing of the general-purpose schedules compared to a state-of-the-art application-specific TDM schedule in order to relate the two types of schedules. We apply the proposed procedure to several standard types of communication provided in MPI and compare different patterns that are used to implement a specific communication. Our evaluation investigates the building blocks of the communications’ timing bounds and shows the tremendous impact of choosing the appropriate communication pattern. Finally, a case study demonstrates the application of the presented procedure to a complete MPI program. With the method proposed in this work, it is possible to perform a reusable WCET analysis for the communication in a NoC that is independent of the placement of tasks on the chip. Moreover, as the applied schedules are not optimized for a specific application but can be used for all applications in the same way, there are only marginal changes in the timing of the communication when the software is adapted or updated. Thus, there is no need to perform the timing analysis from scratch in such cases.

    Employing MPI Collectives for Timing Analysis on Embedded Multi-Cores

    Static WCET analysis of parallel programs running on shared-memory multicores suffers from high pessimism. Distributed-memory platforms that communicate via messages may instead be one solution for manycore systems. The Message Passing Interface (MPI) is a standard for communication on these platforms. We show how its concept of collective operations can be employed for timing analysis. The idea is that the worst-case execution time (WCET) of a parallel program may be estimated by adding the WCET estimates of sequential program parts to the WCET estimates of communication parts. Therefore, we first analyse the two MPI operations MPI_Allreduce and MPI_Sendrecv. Employing these results, we perform a timing analysis of the conjugate gradient (CG) benchmark from the NAS parallel benchmark suite.
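
The additive idea in this abstract can be illustrated with a minimal sketch. It assumes a program is modelled as a flat list of phases, each with its own WCET estimate and a repetition bound. The `Phase` class, the helper functions, the phase name `local_matvec`, and all cycle counts are hypothetical and are not taken from the paper; only the operation names MPI_Allreduce and MPI_Sendrecv come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One program part with an already-known WCET estimate (hypothetical model)."""
    name: str
    kind: str          # "compute" for sequential parts, "comm" for MPI operations
    wcet_cycles: int   # WCET estimate of a single execution, in cycles
    repetitions: int = 1  # upper bound on how often the phase executes


def program_wcet(phases):
    """Sum the per-phase WCET estimates, weighted by their repetition bounds."""
    return sum(p.wcet_cycles * p.repetitions for p in phases)


def wcet_by_kind(phases):
    """Split the total into its computation and communication shares."""
    totals = {"compute": 0, "comm": 0}
    for p in phases:
        totals[p.kind] += p.wcet_cycles * p.repetitions
    return totals


# Illustrative numbers only: one loop with a bound of 25 iterations.
phases = [
    Phase("local_matvec", "compute", 12_000, repetitions=25),
    Phase("MPI_Sendrecv", "comm", 800, repetitions=25),
    Phase("MPI_Allreduce", "comm", 1_500, repetitions=25),
]

total = program_wcet(phases)
shares = wcet_by_kind(phases)
print(total, shares)
```

The sketch only shows the composition step. Obtaining the per-phase estimates themselves (static analysis of the sequential code and of each MPI operation on a given platform) is the hard part and is outside its scope.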

    An Inducible and Reversible Mouse Genetic Rescue System

    Inducible and reversible regulation of gene expression is a powerful approach for uncovering gene function. We have established a general method to efficiently produce reversible and inducible gene knockout and rescue in mice. In this system, which we named iKO, the target gene can be turned on and off at will by treating the mice with doxycycline. This method combines two genetically modified mouse lines: a) a KO line with a tetracycline-dependent transactivator replacing the endogenous target gene, and b) a line with a tetracycline-inducible cDNA of the target gene inserted into a tightly regulated (TIGRE) genomic locus, which provides for low basal expression and high inducibility. Such a locus occurs infrequently in the genome, and we have developed a method to easily introduce genes into the TIGRE site of mouse embryonic stem (ES) cells by recombinase-mediated insertion. Both KO and TIGRE lines have been engineered for high-throughput, large-scale and cost-effective production of iKO mice. As a proof of concept, we have created iKO mice for the apolipoprotein E (ApoE) gene, which allows for sensitive and quantitative phenotypic analyses. The results demonstrated reversible switching of ApoE transcription, plasma cholesterol levels, and atherosclerosis progression and regression. The iKO system shows stringent regulation and is a versatile genetic system that can easily incorporate other techniques and adapt to a wide range of applications.

    PIMP my many-core: pipeline-integrated message passing


    User manual for the optimization and WCET analysis of software with timing analyzable algorithmic skeletons

    We recently presented a parallelization approach that is based on parallel design patterns and leads to structured parallelism. The approach is applicable to the parallelization of sequential code parts of embedded hard real-time software. To reduce the work effort, it is necessary to rely on tool support. In this context, we present software for the model-based, multi-objective optimization of a software model with a high degree of parallelism. In addition, we introduce the timing analyzable algorithmic skeletons (TAS) for the fast implementation of the optimized software model. To support static WCET analysis with the OTAWA toolset, we developed a compact XML format to describe software with TAS instances. Such a model can then easily be translated into the OTAWA XML format representing parallel flow-facts. All software described in this technical report is available under an open source license.