30 research outputs found

    pTNoC: Probabilistically time-analyzable tree-based NoC for mixed-criticality systems

    Get PDF
    The use of networks-on-chip (NoC) in real-time safety-critical multicore systems challenges deriving tight worst-case execution time (WCET) estimates. This is due to the complexities in tightly upper-bounding the contention in the access to the NoC among running tasks. Probabilistic Timing Analysis (PTA) is a powerful approach to derive WCET estimates on relatively complex processors. However, so far it has only been tested on small multicores comprising an on-chip bus as communication means, which intrinsically does not scale to high core counts. In this paper we propose pTNoC, a new tree-based NoC design compatible with PTA requirements and delivering scalability towards medium/large core counts. pTNoC provides tight WCET estimates by means of asymmetric bandwidth guarantees for mixed-criticality systems with negligible impact on average performance. Finally, our implementation results show the reduced area and power costs of the pTNoC.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Mladen Slijepcevic is funded by the Obra Social Fundación la Caixa under grant Doctorado “la Caixa” - Severo Ochoa. Carles Hern´andez is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    A Novel Method for Online Detection of Faults Affecting Execution-Time in Multicore-Based Systems

    Get PDF
    This article proposes a bounded interference method, based on statistical evaluations, for online detection and tolerance of any fault capable of causing a deadline miss. The proposed method requires data that can be gathered during the profiling and worst-case execution time (WCET) analysis phase. This article describes the method, its application, and then it presents an avionic mixed-criticality use case for experimental evaluation, considering both dual-core and quad-core platforms. Results show that faults that can cause a timing violation are correctly identified while other faults that do not introduce a significant temporal interference can be tolerated to avoid high recovery overheads

    Multi-core devices for safety-critical systems: a survey

    Get PDF
    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple system-level potential benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support the compliance with these fundamental safety requirements.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).Peer ReviewedPostprint (author's final draft

    On the interplay of computation and memory regulation in multicore real-time systems

    Full text link
    The ever-increasing demand for high performance in the time-critical embedded domain has pushed the adoption of powerful yet unpredictable heterogeneous Systems-on-a-Chip. The shared memory subsystem, which is known to be a major source of unpredictability, has been extensively studied, and many mitigation techniques have been proposed. Among them, performance-counter-based regulation techniques have seen widespread adoption. However, the problem of combining performance-based regulation with time-domain isolation has not received enough attention. In this article, we discuss our current work-in-progress on SHCReg (Software Hardware Co-design Regulator). First, we assess the limitations and benefits of combined CPU and memory budgeting. Next, we outline a full-stack hardware/software codesign architecture that aims at improving the interplay between CPU and memory isolation for mixed-criticality tasks running on the same core.National Science FoundationAccepted manuscrip

    Modeling RTL Fault Models Behavior to Increase the Confidence on TSIM-based Fault Injection

    Get PDF
    Future high-performance safety-relevant applications require microcontrollers delivering higher performance than the existing certified ones. However, means for assessing their dependability are needed so that they can be certified against safety critical certification standars (e.g ISO26262). Dependability assessment analyses performed at high level of abstraction inject single faults to investigate the effects these have in the system. In this work we show that single faults do not comprise the whole picture, due to fault multiplicities and reactivations. Later we prove that, by injecting complex fault models that consider multiplicities and reactivations in higher levels of abstraction, results are substantially different, thus indicating that a change in the methodology is needed.The research leading to these results has received funding from the Ministry of Science and Technology of Spain under contract TIN2015-65316-P and the HiPEAC Network of Excellence. Carles Hern´andez is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Postprint (author's final draft

    Ordonnancement des systèmes avec différents niveaux de criticité

    Get PDF
    Real-time safety-critical systems must complete their tasks within a given time limit. Failure to successfully perform their operations, or missing a deadline, can have severe consequences such as destruction of property and/or loss of life. Examples of such systems include automotive systems, drones and avionics among others. Safety guarantees must be provided before these systems can be deemed usable. This is usually done through certification performed by a certification authority.Safety evaluation and certification are complicated and costly even for smaller systems.One answer to these difficulties is the isolation of the critical functionality. Executing tasks of different criticalities on separate platforms prevents non-critical tasks from interfering with critical ones, provides a higher guaranty of safety and simplifies the certification process limiting it to only the critical functions. But this separation, in turn, introduces undesirable results portrayed by an inefficient resource utilization, an increase in the cost, weight, size and energy consumption which can put a system in a competitive disadvantage.To overcome the drawbacks of isolation, Mixed Criticality (MC) systems can be used. These systems allow functionalities with different criticalities to execute on the same platform. In 2007, Vestal proposed a model to represent MC-systems where tasks have multiple Worst Case Execution Times (WCETs), one for each criticality level. In addition, correctness conditions for scheduling policies were formally defined, allowing lower criticality jobs to miss deadlines or be even dropped in cases of failure or emergency situations.The introduction of multiple WCETs and different conditions for correctness increased the difficulty of the scheduling problem for MC-systems. Conventional scheduling policies and schedulability tests proved inadequate and the need for new algorithms arose. Since then, a lot of work has been done in this field.In this thesis, we contribute to the study of schedulability in MC-systems. The workload of a system is represented as a set of jobs that can describe the execution over the hyper-period of tasks or over a duration in time. This model allows us to study the viability of simulation-based correctness tests in MC-systems. We show that simulation tests can still be used in mixed-criticality systems, but in this case, the schedulability of the worst case scenario is no longer sufficient to guarantee the schedulability of the system even for the fixed priority scheduling case. We show that scheduling policies are not predictable in general, and define the concept of weak-predictability for MC-systems. We prove that a specific class of fixed priority policies are weakly predictable and propose two simulation-based correctness tests that work for weakly-predictable policies.We also demonstrate that contrary to what was believed, testing for correctness can not be done only through a linear number of preemptions.The majority of the related work focuses on systems of two criticality levels due to the difficulty of the problem. But for automotive and airborne systems, industrial standards define four or five criticality levels, which motivated us to propose a scheduling algorithm that schedules mixed-criticality systems with theoretically any number of criticality levels. We show experimentally that it has higher success rates compared to the state of the art.We illustrate how our scheduling algorithm, or any algorithm that generates a single time-triggered table for each criticality mode, can be used as a recovery strategy to ensure the safety of the system in case of certain failures.Finally, we propose a high level concurrency language and a model for designing an MC-system with coarse grained multi-core interference.Les systèmes temps-réel critiques doivent exécuter leurs tâches dans les délais impartis. En cas de défaillance, des événements peuvent avoir des catastrophes économiques. Des classifications des défaillances par rapport aux niveaux des risques encourus ont été établies, en particulier dans les domaines des transports aéronautique et automobile. Des niveaux de criticité sont attribués aux différentes fonctions des systèmes suivant les risques encourus lors d'une défaillance et des probabilités d'apparition de celles-ci. Ces différents niveaux de criticité influencent les choix d'architecture logicielle et matérielle ainsi que le type de composants utilisés pour sa réalisation. Les systèmes temps-réels modernes ont tendance à intégrer sur une même plateforme de calcul plusieurs applications avec différents niveaux de criticité. Cette intégration est nécessaire pour des systèmes modernes comme par exemple les drones (UAV) afin de réduire le coût, le poids et la consommation d'énergie. Malheureusement, elle conduit à des difficultés importantes lors de leurs conceptions. En plus, ces systèmes doivent être certifiés en prenant en compte ces différents niveaux de criticités.Il est bien connu que le problème d'ordonnancement des systèmes avec différents niveaux de criticités représente un des plus grand défi dans le domaine de systèmes temps-réel. Les techniques traditionnelles proposent comme solution l’isolation complète entre les niveaux de criticité ou bien une certification globale au plus haut niveau. Malheureusement, une telle solution conduit à une mauvaise des ressources et à la perte de l’avantage de cette intégration. En 2007, Vestal a proposé un modèle pour représenter les systèmes avec différents niveaux de criticité dont les tâches ont plusieurs temps d’exécution, un pour chaque niveau de criticité. En outre, les conditions de validité des stratégies d’ordonnancement ont été définies de manière formelle, permettant ainsi aux tâches les moins critiques d’échapper aux délais, voire d’être abandonnées en cas de défaillance ou de situation d’urgence.Les politiques de planification conventionnelles et les tests d’ordonnoncement se sont révélés inadéquats.Dans cette thèse, nous contribuons à l’étude de l’ordonnancement dans les systèmes avec différents niveaux de criticité. La surcharge d'un système est représentée sous la forme d'un ensemble de tâches pouvant décrire l'exécution sur l'hyper-période de tâches ou sur une durée donnée. Ce modèle nous permet d’étudier la viabilité des tests de correction basés sur la simulation pour les systèmes avec différents niveaux de criticité. Nous montrons que les tests de simulation peuvent toujours être utilisés pour ces systèmes, et la possibilité de l’ordonnancement du pire des scénarios ne suffit plus, même pour le cas de l’ordonnancement avec priorité fixe. Nous montrons que les politiques d'ordonnancement ne sont généralement pas prévisibles. Nous définissons le concept de faible prévisibilité pour les systèmes avec différents niveaux de criticité et nous montrons ensuite qu'une classe spécifique de stratégies à priorité fixe sont faiblement prévisibles. Nous proposons deux tests de correction basés sur la simulation qui fonctionnent pour des stratégies faiblement prévisibles.Nous montrons également que, contrairement à ce que l’on croyait, le contrôle de l’exactitude ne peut se faire que par l’intermédiaire d’un nombre linéaire de préemptions.La majorité des travaux reliés à notre domaine portent sur des systèmes à deux niveaux de criticité en raison de la difficulté du problème. Mais pour les systèmes automobiles et aériens, les normes industrielles définissent quatre ou cinq niveaux de criticité, ce qui nous a motivés à proposer un algorithme de planification qui planifie les systèmes à criticité mixte avec théoriquement un nombre quelconque de niveaux de criticité. Nous montrons expérimentalement que le taux de réussite est supérieur à celui de l’état de la technique

    Mixed-Criticality Systems on Commercial-Off-the-Shelf Multi-Processor Systems-on-Chip

    Get PDF
    Avionics and space industries are struggling with the adoption of technologies like multi-processor system-on-chips (MPSoCs) due to strict safety requirements. This thesis propose a new reference architecture for MPSoC-based mixed-criticality systems (MCS) - i.e., systems integrating applications with different level of criticality - which are a common use case for aforementioned industries. This thesis proposes a system architecture capable of granting partitioning - which is, for short, the property of fault containment. It is based on the detection of spatial and temporal interference, and has been named the online detection of interference (ODIn) architecture. Spatial partitioning requires that an application is not able to corrupt resources used by a different application. In the architecture proposed in this thesis, spatial partitioning is implemented using type-1 hypervisors, which allow definition of resource partitions. An application running in a partition can only access resources granted to that partition, therefore it cannot corrupt resources used by applications running in other partitions. Temporal partitioning requires that an application is not able to unexpectedly change the execution time of other applications. In the proposed architecture, temporal partitioning has been solved using a bounded interference approach, composed of an offline analysis phase and an online safety net. The offline phase is based on a statistical profiling of a metric sensitive to temporal interference’s, performed in nominal conditions, which allows definition of a set of three thresholds: 1. the detection threshold TD; 2. the warning threshold TW ; 3. the α threshold. Two rules of detection are defined using such thresholds: Alarm rule When the value of the metric is above TD. Warning rule When the value of the metric is in the warning region [TW ;TD] for more than α consecutive times. ODIn’s online safety-net exploits performance counters, available in many MPSoC architectures; such counters are configured at bootstrap to monitor the selected metric(s), and to raise an interrupt request (IRQ) in case the metric value goes above TD, implementing the alarm rule. The warning rule is implemented in a software detection module, which reads the value of performance counters when the monitored task yields control to the scheduler and reset them if there is no detection. ODIn also uses two additional detection mechanisms: 1. a control flow check technique, based on compile-time defined block signatures, is implemented through a set of watchdog processors, each monitoring one partition. 2. a timeout is implemented through a system watchdog timer (SWDT), which is able to send an external signal when the timeout is violated. The recovery actions implemented in ODIn are: • graceful degradation, to react to IRQs of WDPs monitoring non-critical applications or to warning rule violations; it temporarily stops non-critical applications to grant resources to the critical application; • hard recovery, to react to the SWDT, to the WDP of the critical application, or to alarm rule violations; it causes a switch to a hot stand-by spare computer. Experimental validation of ODIn was performed on two hardware platforms: the ZedBoard - dual-core - and the Inventami board - quad-core. A space benchmark and an avionic benchmark were implemented on both platforms, composed by different modules as showed in Table 1 Each version of the final application was evaluated through fault injection (FI) campaigns, performed using a specifically designed FI system. There were three types of FI campaigns: 1. HW FI, to emulate single event effects; 2. SW FI, to emulate bugs in non-critical applications; 3. artificial bug FI, to emulate a bug in non-critical applications introducing unexpected interference on the critical application. Experimental results show that ODIn is resilient to all considered types of faul

    Enabling Software Technologies for Critical COTS-based Spacecraft Systems

    Get PDF
    In this position article, we motivate the necessity to introduce three software methods in spacecraft computing platforms in order to enable to use COTS components: SIHFT, mixed-criticality, and probabilistic timing analysis. We investigate the benefits and the drawbacks of these techniques, especially in terms of safety, by also analyzing the standards to identify the current limitations that do not allow such techniques to be used. Finally, we recap current and future works, highlighting possible changes to standards

    Real-time systems on multicore platforms: managing hardware resources for predictable execution

    Full text link
    Shared hardware resources in commodity multicore processors are subject to contention from co-running threads. The resultant interference can lead to highly-variable performance for individual applications. This is particularly problematic for real-time applications, which require predictable timing guarantees. It also leads to a pessimistic estimate of the Worst Case Execution Time (WCET) for every real-time application. More CPU time needs to be reserved, thus less applications can enter the system. As the average execution time is usually far less than the WCET, a significant amount of reserved CPU resource would be wasted. Previous works have attempted partitioning the shared resources, amongst either CPUs or processes, to improve performance isolation. However, they have not proven to be both efficient and effective. In this thesis, we propose several mechanisms and frameworks that manage the shared caches and memory buses on multicore platforms. Firstly, we introduce a multicore real-time scheduling framework with the foreground/background scheduling model. Combining real-time load balancing with background scheduling, CPU utilization is greatly improved. Besides, a memory bus management mechanism is implemented on top of the background scheduling, making sure bus contention is under control while utilizing unused CPU cycles. Also, cache partitioning is thoroughly studied in this thesis, with a cache-aware load balancing algorithm and a dynamic cache partitioning framework proposed. Lastly, we describe a system architecture to integrate the above solutions all together. It tackles one of the toughest problems in OS innovation, legacy support, by converting existing OSes into libraries in a virtualized environment. Thus, within a single multicore platform, we benefit from the fine-grained resource control of a real-time OS and the richness of functionality of a general-purpose OS
    corecore