9 research outputs found

    Automatic WCET Analysis of Real-Time Parallel Applications

    Tomorrow’s real-time embedded systems will be built upon multicore architectures. This raises two challenges. First, shared resources must be arbitrated in such a way that the WCET of independent threads running concurrently can be computed: in this paper, we assume that time-predictable multicore architectures are available. The second challenge is to develop software that achieves a high level of performance without impairing timing predictability. We investigate parallel software based on the POSIX threads standard and show how the WCET of a parallel program can be analysed. We report experimental results obtained for typical parallel programs with an extended version of the OTAWA toolset.

    Contention in multicore hardware shared resources: Understanding of the state of the art

    The real-time systems community has over the years devoted considerable attention to the impact on execution timing that arises from contention on access to hardware shared resources. The relevance of this problem has been accentuated with the arrival of multicore processors. From the state of the art on the subject, there appears to be considerable diversity in the understanding of the problem and in the approaches to solving it. This fragmentation makes it difficult for any reader to form a coherent picture of the problem and solution space. This paper draws a tentative taxonomy in which each known approach to the problem can be categorised based on its specific goals and assumptions.

    Timing Analysis of Parallel Software Using Abstract Execution

    A major trend in computer architecture is multi-core processors. To fully exploit this type of parallel processor chip, the programs running on it will have to be parallel as well. This means that even hard real-time embedded systems will be parallel. It is therefore of utmost importance that methods to analyze the timing properties of parallel real-time systems are developed. This paper presents an algorithm that is founded on abstract interpretation and derives safe approximations of the execution times of parallel programs. The algorithm is formulated and proven correct for a simple parallel language with parallel threads, shared memory and synchronization via locks.
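    The core idea — abstracting concrete execution times into intervals and composing them safely across threads — can be illustrated with a deliberately simplified sketch. This is not the paper's algorithm: it only shows how per-thread time intervals and one shared lock can yield a safe (if pessimistic) completion bound, assuming each thread may, in the worst case, wait for every other thread's critical section.

    ```python
    # Illustrative sketch, NOT the paper's abstract-execution algorithm.
    # Each thread is abstracted as (work_interval, critical_interval),
    # where intervals are (lo, hi) bounds on execution time, and all
    # critical sections are guarded by a single shared lock.

    def wcet_bound(threads):
        """Safe upper bound on joint completion time of the threads.

        Worst case assumed here: a thread finishes its own work and,
        before/around its critical section, every other critical
        section executes serially ahead of it.
        """
        total_crit = sum(crit[1] for _, crit in threads)
        longest_work = max(work[1] for work, _ in threads)
        return longest_work + total_crit

    # Two threads: 10-20 units of plain work plus 2-5 (resp. 1-4)
    # units inside the lock.
    print(wcet_bound([((10, 20), (2, 5)), ((8, 15), (1, 4))]))  # 29
    ```

    A real abstract-execution analysis tracks program points and lock states far more precisely; the point here is only that interval abstractions compose into a provably safe bound.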

    Memory Interference Analysis on the Compute Clusters of the Kalray MPPA3 Many-Core Processor

    The Kalray MPPA3 Coolidge many-core processor is one of the few off-the-shelf high-performance processors amenable to full-fledged static timing analysis. Yet even on this processor, providing tight execution-time upper bounds may prove difficult. In this paper, we consider the sub-problem of bounding the timing overhead due to memory access interference inside one MPPA3 shared-memory compute cluster. This includes interference between computing cores and interference between the instruction and data accesses of a given core. We start with a detailed analysis of the MPPA3 compute cluster, with emphasis on three key components: the Prefetch Buffer (PFB), which performs speculative instruction loads; the fixed-priority (FP) arbiter between the instruction and data accesses of a core, whose worst-case behavior is highly dependent on interference from other cores; and the SAP (bursty round-robin) arbiters guarding access to the memory banks. We provide a full-fledged interference analysis covering both arbitration levels. This analysis is rooted in a novel model of memory access patterns, which describes their worst-case and best-case burstiness, a key factor influencing MPPA3 arbitration. We evaluate our interference model on multiple applications, ranging from real-life avionics code specified in SCADE to linear algebra code representative of digital-twin and machine-learning workloads. We also suggest methods for reducing execution time and improving analysis precision by means of code generation.
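    The effect of round-robin-style arbitration on worst-case memory latency can be sketched in a few lines. This is a textbook simplification, not the paper's SAP model: it assumes a fixed service time per access and that, in the worst case, a request loses arbitration just as every other requestor begins a full burst.

    ```python
    # Simplified worst-case latency bound under (bursty) round-robin
    # arbitration. NOT the MPPA3 SAP analysis: slot_cycles and the
    # burst model are illustrative assumptions.

    def rr_access_bound(n_requestors, slot_cycles, burst_len=1):
        """Upper bound, in cycles, on one memory access.

        Worst case: the access arrives just after losing arbitration,
        so every other requestor serves a full burst first, then the
        access itself is served.
        """
        wait = (n_requestors - 1) * burst_len * slot_cycles
        return wait + slot_cycles

    print(rr_access_bound(4, 10))               # 40: 3 rivals + own slot
    print(rr_access_bound(2, 5, burst_len=3))   # 20: one rival burst of 3
    ```

    The paper's contribution is precisely that such a naive bound is very pessimistic: modeling best- and worst-case burstiness of each core's access pattern tightens it considerably.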

    Multi-core devices for safety-critical systems: a survey

    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple potential system-level benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge, and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview, at different device abstraction levels (nanoscale, component, and device), of selected key research contributions that support compliance with these fundamental safety requirements. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, the Basque Government under grant KK-2019-00035, and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under a Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).

    A time-predictable many-core processor design for critical real-time embedded systems

    Critical Real-Time Embedded Systems (CRTES) are in charge of controlling fundamental parts of embedded systems, e.g. energy-harvesting solar panels in satellites, steering and braking in cars, or flight management systems in airplanes. To do so, CRTES require strong evidence of correct functional and timing behavior. The former guarantees that the system operates correctly in response to its inputs; the latter ensures that its operations are performed within a predefined time budget. CRTES are also expected to support a growing number of increasingly complex functions. Examples include the incorporation of "smarter" Advanced Driver Assistance System (ADAS) functionality in modern cars or advanced collision avoidance systems in Unmanned Aerial Vehicles (UAVs). All these new features, implemented in software, lead to an exponential growth in both performance requirements and software development complexity. Furthermore, there is a strong need to integrate multiple functions into the same computing platform to reduce the number of processing units, mass and space requirements, etc. Overall, there is a clear need to increase the computing power of current CRTES in order to support new sophisticated and complex functionality, and to integrate multiple systems into a single platform.

    The use of multi- and many-core processor architectures is increasingly seen in the CRTES industry as the solution to cope with the performance demands and cost constraints of future CRTES. Many-cores supply higher performance by exploiting the parallelism of applications while providing better performance per watt, as the cores are kept simpler than complex single-core processors. Moreover, their parallelization capabilities allow multiple functions to be scheduled on the same processor, maximizing hardware utilization. However, the use of multi- and many-cores in CRTES also brings a number of challenges related to providing evidence about the correct operation of the system, especially in the timing domain. Hence, despite the advantages of many-cores and the fact that they are nowadays a reality in the embedded domain (e.g. Kalray MPPA, Freescale NXP P4080, TI Keystone II), their use in CRTES still requires finding efficient ways of providing reliable evidence about the correct operation of the system.

    This thesis investigates the use of many-core processors in CRTES as a means to satisfy the performance demands of future complex applications while providing the necessary timing guarantees. To do so, it advances the state of the art towards the exploitation of the parallel capabilities of many-cores in CRTES in two computing domains. In the hardware domain, the thesis proposes new many-core designs that enable deriving reliable and tight timing guarantees. In the software domain, it presents efficient scheduling and timing analysis techniques to exploit the parallelization capabilities of many-core architectures and to derive tight and trustworthy Worst-Case Execution Time (WCET) estimates for CRTES.

    Scheduling techniques to improve the worst-case execution time of real-time parallel applications on heterogeneous platforms

    The key to providing high-performance and energy-efficient execution for hard real-time applications is the time-predictable and efficient usage of heterogeneous multiprocessors. However, schedulability analysis of parallel applications executed on unrelated heterogeneous multiprocessors is challenging and has not been investigated adequately by earlier works. The unrelated model is suitable to represent many of the multiprocessor platforms available today because a task (i.e., sequential code) may exhibit a different worst-case execution time (WCET) on each type of processor of an unrelated heterogeneous multiprocessor platform. A parallel application can be realistically modeled as a directed acyclic graph (DAG), where the nodes are sequential tasks and the edges are dependencies among the tasks. This thesis considers a sporadic DAG model, which is used broadly to analyze and verify the real-time requirements of parallel applications. A global work-conserving scheduler can efficiently utilize an unrelated platform by executing the tasks of a DAG on different processor types. However, it is challenging to compute an upper bound on the worst-case schedule length of the DAG, called the makespan, which is used to verify whether the deadline of the DAG is met. There are two main challenges. First, because of the heterogeneity of the processors, the WCET of each task of the DAG depends on which processor the task executes on at runtime. Second, timing anomalies are the main obstacle to computing the makespan even in the simpler case where all the processors are of the same type, i.e., homogeneous multiprocessors. To that end, this thesis addresses the following problem: how can we schedule multiple sporadic DAGs on unrelated multiprocessors such that all the DAGs meet their deadlines?

    Initially, the thesis focuses on homogeneous multiprocessors, a special case of unrelated multiprocessors, to understand and tackle the main challenge of timing anomalies. A novel timing-anomaly-free scheduler is proposed which can be used to compute the makespan of a DAG simply by simulating the execution of the tasks under this scheduler. A set of representative task-based parallel OpenMP applications from the BOTS benchmark suite is modeled as DAGs to investigate the timing behavior of real-world applications, and a simulation framework is developed to evaluate the proposed method. The thesis then targets unrelated multiprocessors and proposes a global scheduler to execute the tasks of a single DAG on an unrelated multiprocessor platform. Based on the proposed scheduler, methods to compute the makespan of a single DAG are introduced. Representative parallel applications from the BOTS benchmark suite are modeled as DAGs that execute on unrelated multiprocessors, and synthetic DAGs are generated to examine additional structures of parallel applications and various platform capabilities. A simulation framework that simulates the execution of the tasks of a DAG on an unrelated multiprocessor platform is introduced to assess the effectiveness of the proposed makespan computations. Finally, based on the makespan computation of a single DAG, this thesis presents the design and schedulability analysis of global and federated scheduling of sporadic DAGs that execute on unrelated multiprocessors.
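    For the homogeneous special case mentioned above, the classic Graham bound gives the simplest makespan upper bound for work-conserving scheduling of a DAG on m identical processors: makespan <= L + (W - L)/m, where W is the total work and L the critical-path length. The sketch below computes this bound; it is the textbook result, not the thesis's anomaly-free scheduler, and it assumes identical processors and WCETs given per task.

    ```python
    # Classic Graham makespan bound for a DAG of sequential tasks on
    # m identical processors under any work-conserving scheduler.
    # Illustrative only: the thesis's contribution goes beyond this
    # (unrelated processors, timing anomalies).
    from collections import defaultdict, deque

    def makespan_bound(wcet, edges, m):
        """wcet: {task: WCET}, edges: [(pred, succ)], m: #processors."""
        succ = defaultdict(list)
        indeg = {v: 0 for v in wcet}
        for u, v in edges:
            succ[u].append(v)
            indeg[v] += 1
        # Longest (critical) path via topological traversal.
        dist = {v: wcet[v] for v in wcet}
        ready = deque(v for v in wcet if indeg[v] == 0)
        while ready:
            u = ready.popleft()
            for v in succ[u]:
                dist[v] = max(dist[v], dist[u] + wcet[v])
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        W = sum(wcet.values())          # total work
        L = max(dist.values())          # critical-path length
        return L + (W - L) / m

    # Fork-join DAG: source -> three parallel tasks -> sink, on 2 cores.
    wcet = {"s": 1, "a": 4, "b": 4, "c": 4, "t": 1}
    edges = [("s", "a"), ("s", "b"), ("s", "c"),
             ("a", "t"), ("b", "t"), ("c", "t")]
    print(makespan_bound(wcet, edges, 2))  # 10.0
    ```

    On unrelated multiprocessors no single WCET per task exists, which is exactly why the thesis must design schedulers whose makespan can still be bounded.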

    A time-predictable parallel programming model for real-time systems

    The recent technological advancements and market trends are causing an interesting phenomenon: the convergence of the high-performance and the embedded computing domains. Critical real-time embedded systems are increasingly concerned with providing higher performance to implement advanced functionalities in a predictable way. OpenMP, the de-facto parallel programming model for shared-memory architectures in the high-performance computing domain, is gaining attention for use in embedded platforms. The reason is that OpenMP is a mature language that makes it possible to efficiently exploit the huge computational capabilities of parallel embedded architectures. Moreover, OpenMP can express parallelism on top of the technologies currently used in embedded designs (e.g., C/C++ applications). At a lower level, OpenMP provides a powerful task-centric model that can define very sophisticated types of regular and irregular parallelism. While OpenMP provides relevant features for embedded systems, both the programming interface and the execution model are completely agnostic to the timing requirements of real-time systems.

    This thesis evaluates the use of OpenMP to develop future critical real-time embedded systems. The first contribution analyzes the OpenMP specification from a timing perspective; it proposes new features to be incorporated in the OpenMP standard and a set of guidelines to implement critical real-time systems with OpenMP. The second contribution develops new methods to analyze and predict the timing behavior of parallel applications, so that the notion of parallelism can be safely incorporated into critical real-time systems. Finally, the proposed techniques are evaluated with both synthetic applications and real use cases parallelized with OpenMP. With these contributions, this thesis pushes the limits of the use of task-based parallel programming models in general, and OpenMP in particular, in critical real-time embedded domains.