3 research outputs found

    Automatic implementation of TTEthernet-based time-triggered avionics applications

    The design of safety-critical embedded systems such as those used in avionics still involves largely manual phases. But in avionics the definition of standard interfaces embodied in standards such as ARINC 653 or TTEthernet should allow the definition of fully automatic code generation flows that reduce costs while improving the quality of the generated code, much like compilers have done when replacing manual assembly coding. In this paper, we briefly present such a fully automatic implementation tool, called Lopht, for ARINC 653-based time-triggered systems, and then explain how it is currently being extended to include support for TTEthernet networks.

    Analyse d’interférences mémoires sur les clusters de calcul du pluri-coeurs Kalray MPPA3

    The Kalray MPPA3 Coolidge many-core processor is one of the few off-the-shelf high-performance processors amenable to full-fledged static (non-probabilistic) timing analysis. And yet, even on this processor, providing tight execution time upper bounds may prove difficult. In this paper, we consider the sub-problem of bounding the timing overhead due to memory access interferences inside one MPPA3 shared memory compute cluster. This includes interferences between computing cores and interferences between the instruction and data accesses of a given core. We start with a detailed analysis of the MPPA3 compute cluster, with emphasis on three key components: the Prefetch Buffer (PFB), which performs speculative instruction loads; the fixed-priority (FP) arbiter between instruction and data accesses of a core, whose worst-case behavior is highly dependent on interferences from other cores; and the SAP (bursty Round Robin) arbiters guarding access to memory banks. We provide a full-fledged interference analysis covering both levels. This analysis is rooted in a novel modeling of memory access patterns, which describes their worst-case and best-case burstiness, a key factor influencing the MPPA3 arbitration. We evaluate our interference model on multiple applications, ranging from real-life avionics code specified in SCADE to linear algebra code representative of "digital twin" or "machine learning" applications. We also suggest methods for reducing execution time and improving analysis precision by means of code generation.

    Reconciling performance and predictability on a many-core through off-line mapping

    We start from a general-purpose many-core architecture designed for average-case performance and ease of use. In particular, its distributed shared memory programming model allows the use of a code generation flow based on the (unmodified) gcc compiler chain. We modify this architecture and extend the code generation flow to allow the construction of efficient hard real-time systems starting from dependent task specifications. We rely on a static (off-line) real-time scheduling paradigm well-adapted to embedded control and signal processing applications with regular control structure. We modify the architecture (and in particular the on-chip network) to allow the implementation of static schedules with very high (clock cycle) temporal precision. On the software side, we define application mapping rules ensuring that the timing precision provided by the hardware is not lost. These mapping rules include requirements on the allocation of data variables to specific RAM banks and on the use of locks to ensure the absence of contentions during access to shared resources. Applications complying with these rules can be written manually or automatically obtained using a new mapping tool that takes all the allocation and scheduling decisions. Compilation of the resulting C code is still done using the (unmodified) gcc compiler chain. The resulting platform provides good performance, and at the same time provides very high timing precision, as shown by two case studies (an embedded controller and an implementation of the FFT). We conclude our paper with a presentation of some ongoing work on the subject: a case study (an implementation of the H.264 decoder) meant to test the limitations of our method.