11 research outputs found

    Timing Predictability in Future Multi-Core Avionics Systems

    Near-optimal scheduling and decision-making models for reactive and proactive fault tolerance mechanisms

    As High Performance Computing (HPC) systems increase in size to meet the demand for computational power, the chance of failure increases dramatically, resulting in potentially large amounts of lost computing time. Fault Tolerance (FT) mechanisms aim to mitigate the impact of failures on running applications. However, the overhead of FT mechanisms increases in proportion to the HPC system's size. The challenge is therefore to handle the expensive overhead of FT mechanisms while minimizing the computing time lost to failures. In this dissertation, a near-optimal scheduling model is built to determine when to invoke a hybrid checkpoint mechanism, by means of stochastic processes and calculus of variations. The obtained schedule minimizes the time wasted by the checkpoint mechanism and by failures. Checkpoint/restart mechanisms generally save application state periodically and reload the saved state when a failure occurs. Furthermore, to handle various FT mechanisms, an adaptive decision-making model has been developed to determine the best FT strategy to invoke at each decision point. The best mechanism at each decision point is selected from the FT mechanisms under consideration so as to globally minimize the total wasted time of an application execution, by means of a dynamic programming approach. In addition, the model adapts to changes in the failure rate over time.
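
The trade-off between checkpoint overhead and lost computing time described above can be illustrated with Young's classical first-order approximation for a periodic checkpoint interval. This is a swapped-in textbook formula, not the dissertation's stochastic model, and the function and parameter names below are illustrative assumptions. It assumes failures arrive at a constant rate 1/MTBF and that the checkpoint cost is small relative to the MTBF.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M).

    checkpoint_cost (C) and mtbf (M) must use the same time unit.
    Assumption: constant failure rate and C much smaller than M.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time wasted for a given checkpoint interval.

    C / T is the checkpointing overhead; T / (2 * M) is the expected rework,
    since a failure loses on average half an interval of computation.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 2.0, 100.0  # hypothetical values, in minutes
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(f"T={interval:6.1f}  waste={waste_fraction(interval, cost, mtbf):.3f}")
```

Checkpointing more often than the computed interval pays more overhead, and checkpointing less often pays more rework, which is the balance a scheduling model of this kind has to strike.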

    Safety and security of cyber-physical systems

    The number of embedded controllers in charge of physical systems has rapidly increased over the past years. Embedded controllers are present in every aspect of our lives, from our homes to our vehicles and factories. The complexity of these systems is also greater than ever. These systems are expected to deliver many features and high performance without trading off robustness and assurance. As systems increase in complexity, however, the cost of formally verifying their correctness and eliminating security vulnerabilities can quickly explode. On top of unintentional bugs and problems, malicious attacks on cyber-physical systems (CPS) can also lead to adverse outcomes on physical plants. Some recent attacks on CPS have focused on causing physical damage to the plants or the environment. Such intruders make their way into the system using cyber exploits but then initiate actions that can destabilize and even damage the underlying (physical) systems. Given this reality and the reliability standards of the industry, there is a need to embrace new CPS design paradigms where faults and security vulnerabilities are the norm rather than an anomaly. Such imperfections must be assumed to exist in every system and component unless it is formally verified and scanned. Faults and vulnerabilities should be safely handled, and the CPS must be able to recover from them at run-time. Our goal in this work is to introduce and investigate a few designs compatible with this paradigm. The architectures and techniques proposed in this dissertation do not rely on testing and complete system verification. Instead, they enforce safety at the highest level of the system and extend guaranteed safety from a few certified components to the entire system. These solutions are carefully curated to utilize unverified components and provide guaranteed performance.

    Theoretical Analysis of Real-Time Scheduling on Resources with Performance Degradation and Periodic Rejuvenation

    In 1973, Liu and Layland published their seminal paper on schedulability analysis of real-time systems for both EDF and RM schedulers. In this work, they provide schedulability conditions and utilization bounds for the EDF and RM scheduling algorithms. In the following four decades, scheduling algorithms, utilization bounds and schedulability analyses for real-time tasks have been studied intensively. Most of that research relies on a strong assumption that the performance of a computing resource does not change during its lifetime. Unfortunately, for many long-standing real-time systems, such as data acquisition systems (DAQ), deep-space exploration programs and SCADA systems for power, water and other national infrastructures, the performance of computational resources degrades notably after a long and continuous execution period. To overcome the performance degradation in long-standing systems, countermeasures, which are also called system rejuvenation approaches in the literature, were introduced and studied in depth in the last two decades. Rejuvenation approaches recover system performance when invoked and hence benefit most long-standing applications. However, for applications with real-time requirements, the system downtime caused by the rejuvenation process, along with the decreasing performance during the system's available time, makes the existing real-time scheduling theories difficult to apply directly. To address this problem, this thesis studies the schedulability of a real-time task set running on long-standing computing systems that suffer performance degradation and use a rejuvenation mechanism to recover. Our first study focuses on a simpler resource model, i.e., the periodic resource model, which only considers periodic rejuvenations. 
We introduce a method, Periodic Resource Integration, to combine multiple periodic resources into a single equivalent periodic resource and provide the schedulability analysis for real-time tasks based on the combined periodic resource. By integrating multiple periodic resources into one, existing real-time scheduling research on a single periodic resource can be directly applied to multiple periodic resources. In our second study, we extend the periodic resource model to a new resource model, the P2-resource model, to characterize resources with both performance degradation and periodic rejuvenation. We formally define the P2-resource and analyze the schedulability of real-time task sets on a P2-resource. In particular, we first analyze the resource supply status of a given P2-resource and provide its supply bound and linear supply bound functions. We then develop the schedulability conditions for a task set running on a P2-resource with the EDF or RM scheduling algorithm. We further derive utilization bounds for both the EDF and RM scheduling algorithms for schedulability test purposes. With the P2-resource model and the schedulability analysis on a single P2-resource, we further extend our work to multiple P2-resources. In this research, we 1) analyze the schedulability of a real-time task set on multiple P2-resources under a fixed-priority scheduling algorithm, 2) introduce the GP-RM-P2 algorithm and 3) provide the utilization bound for this algorithm. Simulation results show that in most cases, the sufficient bounds we provide are tight. As rejuvenation technology keeps advancing, many systems are now able to perform rejuvenations in different system layers. To accommodate this advance, we study the schedulability conditions of a real-time task set on a single P2-resource with both cold and warm rejuvenations. 
We introduce a new resource model, the P2-resource with dual-level rejuvenation, i.e., the P2D-resource, to accommodate this new feature. We first study the supply bound and the linear supply bound of a given P2D-resource. We then study the sufficient utilization bounds for both the RM and EDF scheduling algorithms.
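
For orientation, the classical Liu and Layland utilization tests that this thesis generalizes can be sketched as follows. This is a minimal illustration of the 1973 baseline only, assuming independent, preemptive periodic tasks with deadlines equal to periods on one processor of constant speed. It does not model degradation, rejuvenation, or the P2-resource, and the function names are illustrative.

```python
def rm_utilization_bound(n: int) -> float:
    """Liu and Layland's sufficient RM bound for n tasks: n * (2^(1/n) - 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * (2 ** (1 / n) - 1)


def total_utilization(tasks) -> float:
    """Sum of C_i / T_i over (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_schedulable_sufficient(tasks) -> bool:
    """Sufficient (not necessary) RM test: U <= n * (2^(1/n) - 1)."""
    tasks = list(tasks)
    return total_utilization(tasks) <= rm_utilization_bound(len(tasks))


def edf_schedulable(tasks) -> bool:
    """EDF test for implicit-deadline periodic tasks: U <= 1."""
    return total_utilization(list(tasks)) <= 1.0


if __name__ == "__main__":
    task_set = [(1, 4), (2, 6), (1, 8)]  # hypothetical (C_i, T_i) pairs
    print("U =", round(total_utilization(task_set), 4))
    print("RM bound =", round(rm_utilization_bound(len(task_set)), 4))
    print("RM (sufficient):", rm_schedulable_sufficient(task_set))
    print("EDF:", edf_schedulable(task_set))
```

A task set that fails the RM test here is not necessarily unschedulable under RM, because the bound is only sufficient. Both tests assume the full, constant processor capacity is always available, which is the assumption the abstract identifies as too strong for long-standing systems.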

    THEORETICAL ANALYSIS OF REAL-TIME SCHEDULING ON RESOURCES WITH PERFORMANCE DEGRADATION AND PERIODIC REJUVENATION

    In 1973, Liu and Layland [81] published their seminal paper on schedulability analysis of real-time systems for both EDF and RM schedulers. In this work, they provide schedulability conditions and utilization bounds for the EDF and RM scheduling algorithms. In the following four decades, scheduling algorithms, utilization bounds and schedulability analyses for real-time tasks have been studied intensively. Most of that research relies on a strong assumption that the performance of a computing resource does not change during its lifetime. Unfortunately, for many long-standing real-time systems, such as data acquisition systems (DAQ) [74, 99], deep-space exploration programs [120, 119] and SCADA systems for power, water and other national infrastructures [121, 26], the performance of computational resources degrades notably after a long and continuous execution period [61]. To overcome the performance degradation in long-standing systems, countermeasures, which are also called system rejuvenation approaches in the literature [123, 61, 126], were introduced and studied in depth in the last two decades. Rejuvenation approaches recover system performance when invoked and hence benefit most long-standing applications [30, 102, 11, 12, 39]. However, for applications with real-time requirements, the system downtime caused by the rejuvenation process, along with the decreasing performance during the system's available time, makes the existing real-time scheduling theories difficult to apply directly. To address this problem, this thesis studies the schedulability of a real-time task set running on long-standing computing systems that suffer performance degradation and use a rejuvenation mechanism to recover. Our first study focuses on a simpler resource model, i.e., the periodic resource model, which only considers periodic rejuvenations. 
We introduce a method, Periodic Resource Integration, to combine multiple periodic resources into a single equivalent periodic resource and provide the schedulability analysis for real-time tasks based on the combined periodic resource. By integrating multiple periodic resources into one, existing real-time scheduling research on a single periodic resource can be directly applied to multiple periodic resources. In our second study, we extend the periodic resource model to a new resource model, the P2-resource model, to characterize resources with both performance degradation and periodic rejuvenation. We formally define the P2-resource and analyze the schedulability of real-time task sets on a P2-resource. In particular, we first analyze the resource supply status of a given P2-resource and provide its supply bound and linear supply bound functions. We then develop the schedulability conditions for a task set running on a P2-resource with the EDF or RM scheduling algorithm. We further derive utilization bounds for both the EDF and RM scheduling algorithms for schedulability test purposes. With the P2-resource model and the schedulability analysis on a single P2-resource, we further extend our work to multiple P2-resources. In this research, we 1) analyze the schedulability of a real-time task set on multiple P2-resources under a fixed-priority scheduling algorithm, 2) introduce the GP-RM-P2 algorithm and 3) provide the utilization bound for this algorithm. Simulation results show that in most cases, the sufficient bounds we provide are tight. As rejuvenation technology keeps advancing, many systems are now able to perform rejuvenations in different system layers. To accommodate this advance, we study the schedulability conditions of a real-time task set on a single P2-resource with both cold and warm rejuvenations. 
We introduce a new resource model, the P2-resource with dual-level rejuvenation, i.e., the P2D-resource, to accommodate this new feature. We first study the supply bound and the linear supply bound of a given P2D-resource. We then study the sufficient utilization bounds for both the RM and EDF scheduling algorithms. Ph.D. in Computer Science, July 201

    Service-based Fault Tolerance for Cyber-Physical Systems: A Systems Engineering Approach

    Cyber-physical systems (CPSs) comprise networked computing units that monitor and control physical processes in feedback loops. CPSs have the potential to change the ways people and computers interact with the physical world by enabling new ways to control and optimize systems through improved connectivity and computing capabilities. Compared to classical control theory, these systems involve greater unpredictability, which may affect the stability and dynamics of the physical subsystems. Further uncertainty is introduced by the dynamic and open computing environments with rapidly changing connections and system configurations. However, due to interactions with the physical world, the dependable operation and tolerance of failures in both cyber and physical components are essential requirements for these systems. The problem of achieving dependable operations for open and networked control systems is approached using a systems engineering process to gain an understanding of the problem domain, since fault tolerance cannot be solved only as a software problem due to the nature of CPSs, which includes close coordination among hardware, software and physical objects. The research methodology consists of developing a concept design, implementing prototypes, and empirically testing the prototypes. Even though modularity has been acknowledged as a key element of fault tolerance, the fault tolerance of highly modular service-oriented architectures (SOAs) has been sparsely researched, especially in distributed real-time systems. This thesis proposes and implements an approach based on using a loosely coupled real-time SOA to implement fault tolerance for a teleoperation system. Based on empirical experiments, modularity on a service level can be used to support fault tolerance (i.e., the isolation and recovery of faults). 
Fault recovery can be achieved for certain categories of faults (i.e., non-deterministic and aging-related) based on loose coupling and diverse operation modes. The proposed architecture also supports the straightforward integration of fault tolerance patterns, such as FAIL-SAFE, HEARTBEAT, ESCALATION and SERVICE MANAGER, which are used in the prototype systems to support dependability requirements. For service failures, systems rely on fail-safe behaviours, diverse modes of operation and fault escalation to backup services. Instead of using time-bounded reconfiguration, services operate on a best-effort basis, providing resilience for the system. This enables, for example, on-the-fly service changes, smooth recoveries from service failures and adaptations to new computing environments, which are essential requirements for CPSs. The results are combined into a systems engineering approach to dependability, which includes an analysis of the role of safety-critical requirements for control system software architecture design, architectural design, a dependability-case development approach for CPSs and domain-specific fault taxonomies, which support dependability case development and system reliability analyses. Other contributions of this work include three new patterns for fault tolerance in CPSs: DATA-CENTRIC ARCHITECTURE, LET IT CRASH and SERVICE MANAGER. These are presented together with a pattern language that shows how they relate to other patterns available for the domain.