393 research outputs found

    Contention-aware performance monitoring counter support for real-time MPSoCs

    Get PDF
    Tasks running in MPSoCs experience contention delays when accessing MPSoC’s shared resources, complicating task timing analysis and deriving execution time bounds. Understanding the Actual Contention Delay (ACD) each task suffers due to other corunning tasks, and the particular hardware shared resources in which contention occurs, is of prominent importance to increase confidence on derived execution time bounds of tasks. And, whenever those bounds are violated, ACD provides information on the reasons for overruns. Unfortunately, existing MPSoC designs considered in real-time domains offer limited hardware support to measure tasks’ ACD losing all these potential benefits. In this paper we propose the Contention Cycle Stack (CCS), a mechanism that extends performance monitoring counters to track specific events that allow estimating the ACD that each task suffers from every contending task on every hardware shared resource. We build the CCS using a set of specialized low-overhead Performance Monitoring Counters for the Cobham Gaisler GR740 (NGMP) MPSoC – used in the space domain – for which we show CCS’s benefits.The research leading to these results has received funding from the European Space Agency under contracts 4000109680, 4000110157 and NPI 4000102880, and the Ministry of Science and Technology of Spain under contract TIN-2015-65316-P. Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    On the tailoring of CAST-32A certification guidance to real COTS multicore architectures

    Get PDF
    The use of Commercial Off-The-Shelf (COTS) multicores in real-time industry is on the rise due to multicores' potential performance increase and energy reduction. Yet, the unpredictable impact on timing of contention in shared hardware resources challenges certification. Furthermore, most safety certification standards target single-core architectures and do not provide explicit guidance for multicore processors. Recently, however, CAST-32A has been presented providing guidance for software planning, development and verification in multicores. In this paper, from a theoretical level, we provide a detailed review of CAST-32A objectives and the difficulty of reaching them under current COTS multicore design trends; at experimental level, we assess the difficulties of the application of CAST-32A to a real multicore processor, the NXP P4080.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal grant RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    WCET Optimizations and Architectural Support for Hard Real-Time Systems

    Get PDF
    As time predictability is critical to hard real-time systems, it is not only necessary to accurately estimate the worst-case execution time (WCET) of the real-time tasks but also desirable to improve either the WCET of the tasks or time predictability of the system, because the real-time tasks with lower WCETs are easy to schedule and more likely to meat their deadlines. As a real-time system is an integration of software and hardware, the optimization can be achieved through two ways: software optimization and time-predictable architectural support. In terms of software optimization, we fi rst propose a loop-based instruction prefetching approach to further improve the WCET comparing with simple prefetching techniques such as Next-N-Line prefetching which can enhance both the average-case performance and the worst-case performance. Our prefetching approach can exploit the program controlow information to intelligently prefetch instructions that are most likely needed. Second, as inter-thread interferences in shared caches can signi cantly a ect the WCET of real-time tasks running on multicore processors, we study three multicore-aware code positioning methods to reduce the inter-core L2 cache interferences between co-running real-time threads. One strategy focuses on decreasing the longest WCET among the co-running threads, and two other methods aim at achieving fairness in terms of the amount or percentage of WCET reduction among co-running threads. In the aspect of time-predictable architectural support, we introduce the concept of architectural time predictability (ATP) to separate timing uncertainty concerns caused by hardware from software, which greatly facilitates the advancement of time-predictable processor design. We also propose a metric called Architectural Time-predictability Factor (ATF) to measure architectural time predictability quantitatively. Furthermore, while cache memories can generally improve average-case performance, they are harmful to time predictability and thus are not desirable for hard real-time and safety-critical systems. In contrast, Scratch-Pad Memories (SPMs) are time predictable, but they may lead to inferior performance. Guided by ATF, we propose and evaluate a variety of hybrid on-chip memory architectures to combine both caches and SPMs intelligently to achieve good time predictability and high performance. Detailed implementation and experimental results discussion are presented in this dissertation

    Computing Safe Contention Bounds for Multicore Resources with Round-Robin and FIFO Arbitration

    Get PDF
    Numerous researchers have studied the contention that arises among tasks running in parallel on a multicore processor. Most of those studies seek to derive a tight and sound upper-bound for the worst-case delay with which a processor resource may serve an incoming request, when its access is arbitrated using time-predictable policies such as round-robin or FIFO. We call this value upper-bound delay ( ubd ). Deriving trustworthy ubd statically is possible when sufficient public information exists on the timing latency incurred on access to the resource of interest. Unfortunately however, that is rarely granted for commercial-of-the-shelf (COTS) processors. Therefore, the users resort to measurement observations on the target processor and thus compute a “measured” ubdm . However, using ubdm to compute worst-case execution time values for programs running on COTS multicore processors requires qualification on the soundness of the result. In this paper, we present a measurement-based methodology to derive a ubdm under round-robin (RoRo) and first-in-first-out (FIFO) arbitration, which accurately approximates ubd from above, without needing latency information from the hardware provider. Experimental results, obtained on multiple processor configurations, demonstrate the robustness of the proposed methodology.The research leading to this work has received funding from: the European Union’s Horizon 2020 research and innovation programme under grant agreement No 644080(SAFURE); the European Space Agency under Contract 789.2013 and NPI Contract 40001102880; and COST Action IC1202, Timing Analysis On Code-Level (TACLe). This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. The authors would like to thanks Paul Caheny for his help with the proofreading of this document.Peer ReviewedPostprint (author's final draft

    An Overview of Approaches Towards the Timing Analysability of Parallel Architecture

    Get PDF
    In order to meet performance/low energy/integration requirements, parallel architectures (multithreaded cores and multi-cores) are more and more considered in the design of embedded systems running critical software. The objective is to run several applications concurrently. When applications have strict real-time constraints, two questions arise: a) how can the worst-case execution time (WCET) of each application be computed while concurrent applications might interfere? b)~how can the tasks be scheduled so that they are guarantee to meet their deadlines? The second question has received much attention for several years~cite{CFHS04,DaBu11}. Proposed schemes generally assume that the first question has been solved, and in addition that they do not impact the WCETs. In effect, the first question is far from been answered even if several approaches have been proposed in the literature. In this paper, we present an overview of these approaches from the point of view of static WCET analysis techniques

    A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

    Get PDF
    © ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for increased computing power. However, multicore processors have shared resources that affect the predictability of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many research works have proposed different techniques to deal with caches in multicore processors in the context of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and would be very useful for the real-time community. In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014. We categorize the main research works and provide a detailed comparison in terms of similarities and differences. We also identify key challenges and discuss future research directions.King Saud University NSER

    Smart hardware designs for probabilistically-analyzable processor architectures

    Get PDF
    Future Critical Real-Time Embedded Systems (CRTES), like those is planes, cars or trains, require more and more guaranteed performance in order to satisfy the increasing performance demands of advanced complex software features. While increased performance can be achieved by deploying processor techniques currently used in High-Performance Computing (HPC) and mainstream domains, their use challenges the software timing analysis, a necessary step in CRTES' verification and validation. Cache memories are known to have high impact in performance, and in fact, current CRTES include multicores usually with several levels of cache. In this line, this Thesis aims at increasing the guaranteed performance of CRTES by using techniques for caches building upon time randomization and providing probabilistic guarantees of tasks' execution time. In this Thesis, we first focus on on improving cache placement and replacement to improve guaranteed performance. For placement, different existing policies are explored in a multi-level cache setup, and a solution is reached in which different of those policies are combined. For cache replacement, we analyze a pathological scenario that no cache policy so far accounts and propose several policies that fix this pathological scenario. For shared caches in multicore we observe that contention is mainly caused by private writes that go through to the shared cache, yet using a pure write-back policy also has its drawbacks. We propose a hybrid approach to mitigate this contention. Building on this solution, the next contribution tackles a problem caused by the need of some reliability mechanisms in CRTES. Implementing reliability close to the processor's core has a significant impact in performance. A look-ahead error detection solution is proposed to greatly mitigate the performance impact. The next contribution proposes the first hardware prefetcher for CRTES with arbitrary cache hierarchies. Given its speculative nature, prefetchers that have a guaranteed positive impact on performance are difficult to design. We present a framework that provides execution time guarantees and obtains a performance benefit. Finally, we focus on the impact of timing anomalies in CRTES with caches. For the first time, a definition and taxonomy of timing anomalies is given for Measurement-Based Timing Analysis. Then, we focus on a specific timing anomaly that can happen with caches and provide a solution to account for it in the execution time estimates.Los Sistemas Empotrados de Tiempo-Real Crítico (SETRC), como los de los aviones, coches o trenes, requieren más y más rendimiento garantizado para satisfacer la demanda al alza de rendimiento para funciones complejas y avanzadas de software. Aunque el incremento en rendimiento puede ser adquirido utilizando técnicas de arquitectura de procesadores actualmente utilizadas en la Computación de Altas Prestaciones (CAP) i en los dominios convencionales, este uso presenta retos para el análisis del tiempo de software, un paso necesario en la verificación y validación de SETRC. Las memorias caches son conocidas por su gran impacto en rendimiento y, de hecho, los actuales SETRC incluyen multicores normalmente con diversos niveles de cache. En esta línea, esta Tesis tiene como objetivo mejorar el rendimiento garantizado de los SETRC utilizando técnicas para caches y utilizando métodos como la randomización del tiempo y proveyendo garantías probabilísticas de tiempo de ejecución de las tareas. En esta Tesis, primero nos centramos en mejorar la colocación y el reemplazo de caches para mejorar el rendimiento garantizado. Para la colocación, diferentes políticas son exploradas en un sistema cache multi-nivel, y se llega a una solución donde diversas de estas políticas son combinadas. Para el reemplazo, analizamos un escenario patológico que ninguna política actual tiene en cuenta, y proponemos varias políticas que solucionan este escenario patológico. Para caches compartidas en multicores, observamos que la contención es causada principalmente por escrituras privadas que van a través de la cache compartida, pero usar una política de escritura retardada pura también tiene sus consecuencias. Proponemos un enfoque híbrido para mitigar la contención. Sobre esta solución, la siguiente contribución ataca un problema causado por la necesidad de mecanismos de fiabilidad en SETRC. Implementar fiabilidad cerca del núcleo del procesador tiene un impacto significativo en rendimiento. Una solución basada en anticipación se propone para mitigar el impacto en rendimiento. La siguiente contribución propone el primer prefetcher hardware para SETRC con una jerarquía de caches arbitraria. Por primera vez, se da una definición y taxonomía de anomalías temporales para Análisis Temporal Basado en Medidas. Después, nos centramos en una anomalía temporal concreta que puede pasar con caches y ofrecemos una solución que la tiene en cuenta en las estimaciones del tiempo de ejecución.Postprint (published version

    On the Determinism of Multi-core Processors

    Get PDF
    Hard real time systems are evolving in order to respond to the increasing demand in complex functionalities while taking advantage of newer hardware. Software development for safety critical systems has to comply with strict requirements that will facilitate the certification process. During this process, each part of the system is evaluated, requiring a certain level of assurance in order to provide confidence in the product. In particular there must be a level of confidence that the system behaves deterministically that may be based on functionality, resources and time. The success of system verification depends greatly on the capacity to determine its exact behavior. Nonetheless, hardware evolved in order to maximize the average computation power throughput with little to no regard to the deterministic aspect. Therefore modern architectural features of processors, like pipelines, cache memories and co-processors, make it hard to verify that all the needed properties are respected. The multi-core is furthermore difficult to analyze as the architecture employs mechanisms that compromise strong spatial and temporal partitioning when using shared resources without rigorous access control like shared caches or shared input/outputs. In this paper we identify and analyze the main sources of nondeterminism of the multi-cores with regard to the timing estimation. Precise determination of the worst case execution time is a challenging task even in single-core architectures. The problems are accentuated in the multi-core context mainly due to the resource sharing that can lead to highly complex interactions or to nondeterminism. Most of the units that generate behaviors that are hard to take into account can be deactivated, but it is not always easy to predict the impact on the performance. Nevertheless some of the features cannot be disabled (such as the out of order execution or some nondeterministic crossbar access policies) which leads to the invalidation of the respective platform for applications with high criticality level. We will address the problematic units, propose configuration or architecture guidelines and estimate their impact on the performance and determinism of the system

    Improving time predictability of shared hardware resources in real-time multicore systems : emphasis on the space domain

    Get PDF
    Critical Real-Time Embedded Systems (CRTES) follow a verification and validation process on the timing and functional correctness. This process includes the timing analysis that provides Worst-Case Execution Time (WCET) estimates to provide evidence that the execution time of the system, or parts of it, remain within the deadlines. A key design principle for CRTES is the incremental qualification, whereby each software component can be subject to verification and validation independently of any other component, with obvious benefits for cost. At timing level, this requires time composability, such that the timing behavior of a function is not affected by other functions. CRTES are experiencing an unprecedented growth with rising performance demands that have motivated the use of multicore architectures. Multicores can provide the performance required and bring the potential of integrating several software functions onto the same hardware. However, multicore contention in the access to shared hardware resources creates a dependence of the execution time of a task with the rest of the tasks running simultaneously. This dependence threatens time predictability and jeopardizes time composability. In this thesis we analyze and propose hardware solutions to be applied on current multicore designs for CRTES to improve time predictability and time composability, focusing on the on-chip bus and the memory controller. At hardware level, we propose new bus and memory controller designs that control and mitigate contention between different cores and allow to have time composability by design, also in the context of mixed-criticality systems. At analysis level, we propose contention prediction models that factor the impact of contenders and don¿t need modifications to the hardware. We also propose a set of Performance Monitoring Counters (PMC) that provide evidence about the contention. We give an special emphasis on the Space domain focusing on the Cobham Gaisler NGMP multicore processor, which is currently assessed by the European Space Agency for its future missions.Los Sistemas Críticos Empotrados de Tiempo Real (CRTES) siguen un proceso de verificación y validación para su correctitud funcional y temporal. Este proceso incluye el análisis temporal que proporciona estimaciones de el peor caso del tiempo de ejecución (WCET) para dar evidencia de que el tiempo de ejecución del sistema, o partes de él, permanecen dentro de los límites temporales. Un principio de diseño clave para los CRTES es la cualificación incremental, por la que cada componente de software puede ser verificado y validado independientemente del resto de componentes, con beneficios obvios para el coste. A nivel temporal, esto requiere composabilidad temporal, por la que el comportamiento temporal de una función no se ve afectado por otras funciones. CRTES están experimentando un crecimiento sin precedentes con crecientes demandas de rendimiento que han motivado el uso the arquitecturas multi-núcleo (multicore). Los procesadores multi-núcleo pueden proporcionar el rendimiento requerido y tienen el potencial de integrar varias funcionalidades software en el mismo hardware. A pesar de ello, la interferencia entre los diferentes núcleos que aparece en los recursos compartidos de os procesadores multi núcleo crea una dependencia del tiempo de ejecución de una tarea con el resto de tareas ejecutándose simultáneamente en el procesador. Esta dependencia amenaza la predictabilidad temporal y compromete la composabilidad temporal. En esta tésis analizamos y proponemos soluciones hardware para ser aplicadas en los diseños multi núcleo actuales para CRTES que mejoran la predictabilidad y composabilidad temporal, centrándose en el bus y el controlador de memoria internos al chip. A nivel de hardware, proponemos nuevos diseños de buses y controladores de memoria que controlan y mitigan la interferencia entre los diferentes núcleos y permiten tener composabilidad temporal por diseño, también en el contexto de sistemas de criticalidad mixta. A nivel de análisis, proponemos modelos de predicción de la interferencia que factorizan el impacto de los núcleos y no necesitan modificaciones hardware. También proponemos un conjunto de Contadores de Control del Rendimiento (PMC) que proporcionoan evidencia de la interferencia. En esta tésis, damós especial importancia al dominio espacial, centrándonos en el procesador mutli núcleo Cobham Gaisler NGMP, que está siendo actualmente evaluado por la Agencia Espacial Europea para sus futuras misiones

    Towards WCET Analysis of Multicore Architectures Using UPPAAL

    Get PDF
    To take full advantage of the increasingly used shared-memory multicore architectures, software algorithms will need to be parallelized over multiple threads. This means that threads will have to share resources (e.g. some level of cache) and communicate and synchronize with each other. There already exist software libraries (e.g. OpenMP) used to explicitly parallelize available sequential C/C++ and Fortran code, which means that parallel code could be easily obtained. To be able to use parallel software running on multicore architectures in embedded systems with hard real-time constraints, new WCET (Worst-Case Execution Time) analysis methods and tools must be developed. This paper investigates a method based on model-checking a system of timed automata using the UPPAAL tool box. It is found that it is possible to perform WCET analysis on (small) parallel systems using UPPAAL. We also show how to model thread synchronization using spinlock-like primitives
    corecore