187 research outputs found

    Software-based methods for Operating system dependability

    Get PDF
    Guaranteeing correct system behaviour in modern computer systems has become essential, in particular for safety-critical computer-based systems. However all modern systems are susceptible to transient faults that can disrupt the intended operation and function of such systems. In order to evaluate the sensitivity of such systems, different methods have been developed, and among them Fault Injection is considered a valid approach widely adopted. This document presents a fault injection tool, called Kernel-based Fault-Injection Tool Open-source (KITO), to analyze the effects of faults in memory elements containing kernel data structures belonging to a Unix-based Operating System and, in particular, elements involved in resources synchronization. This tool was evaluated in different stages of its development with different experimental analyses by performing Faults Injections in the Operating System, while the system was subject to stress from benchmark programs that use different elements of the Linux kernel. The results showed that KITO was capable of generating faults in different elements of the operating systems with limited intrusiveness, and that the data structures belonging to synchronization aspects of the kernel are susceptible to an appreciable set of possible errors ranging from performance degradation to complete system failure, thus preventing benchmark applications to perform their task. Finally, aiming at overcoming the vulnerabilities discovered with KITO, a couple of solutions have been proposed consisting in the implementation of hardening techniques in the source code of the Linux kernel, such as Triple Modular Redundancy and Error Detection And Correction codes. An experimental fault injection analysis has been conducted to evaluate the effectiveness of the proposed solutions. Results have shown that it is possible to successfully detect and correct the noxious effects generated by single faults in the system with a limited performance overhead in kernel data structures of the Linux kernel

    Toward the hardening of real-time operating systems

    Get PDF
    Safety and Mission-critical systems are evolving daily, requiring increasing levels of complexity in their design. While bare-metal single CPU systems were dedicated to such systems in the past, nowadays, multicore CPUs, GPUs, and other accelerators require more complex software management, with the need for an operating system controlling everything. The presence of the operating system opens more challenges to securing the final system’s full dependability. This paper analyses the hardening scenarios based on the evidence gathered by selective fault injection analysis of Real-Time Operating systems. While solutions might be delivered in different fashions, the emphasis on the paper is on the right approach to spot the sensitive part of the Operating system, saving the design from massive overheads

    Operating System Support for Redundant Multithreading

    Get PDF
    Failing hardware is a fact and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers proposed various solutions to this issue with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs. ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication. I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB). I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors

    Special session: Operating systems under test: An overview of the significance of the operating system in the resiliency of the computing continuum

    Get PDF
    The computing continuum's actual trend is facing a growth in terms of devices with any degree of computational capability. Those devices may or may not include a full-stack, including the Operating System layer and the Application layer, or just facing pure bare-metal solutions. In either case, the reliability of the full system stack has to be guaranteed. It is crucial to provide data regarding the impact of faults at all system stack levels and potential hardening solutions to design highly resilient systems. While most of the work usually concentrates on the application reliability, the special session aims to provide a deep comprehension of the impact on the reliability of an embedded system when faults in the hardware substrate of the system stack surface at the Operating System layer. For this reason, we will cover a comparison from an application perspective when hardware faults happen in bare metal vs. real-time OS vs. general-purpose OS. Then we will go deeper within a FreeRTOS to evaluate the contribution of all parts of the OS. Eventually, the Special Session will propose some hardening techniques at the Operating System level by exploiting the scheduling capabilities

    ReSP: A Nonintrusive Transaction-Level Reflective MPSoC Simulation Platform for Design Space Exploration

    Full text link

    Contract Testing for Reliable Embedded Systems

    Get PDF
    Embedded systems comprise diverse technologies complicating their design. By creating virtual prototypes of the target system, Electronic System Level Design, the early analysis of a system composed by electronics and software is possible. However, the concrete interaction between hardware modules and between hardware and software is left for late development stages and real prototype making. Generally, interaction between components is assumed to be correct. However, it has to be assumed on development implicitly because interaction between components is not considered in the functionality design. While single components are mostly thoroughly tested and guarantee certain reliability levels, their interaction is based on often underspecified interfaces. Although component usage is mostly specified, operational constraints are often left out. Finally, not only the interaction between components but also with the environment and the user are not ensured. Generally, only functional integration tests are executed and corner-cases are left out, leaving uncovered faults that only manifest as failures later when their cost is higher. Therefore, this work aims at component interaction through specification of interfaces, test generation and real-time test execution. The specification is based on the design-by-contract approach of software that specifies semantics of component interaction in addition to the syntactical definition through functions. In the first part of this work, a specification for the interaction between hardware modules is given. With the automatic real-time test execution, fulfillment of specified preconditions for correct component operation can be checked. In component-based design, the component is trusted and thus, its functionality is assumed to be correct when certain postconditions are specified. In a correct component assembly, component postconditions fulfill preconditions of other components resulting in an operational system. The specification of preconditions follows the definition of environmental properties, acceptable input sequences for interfacing pins, as well as acceptable signal parameters, such as voltage levels, slope times, delays and glitches. Postconditions are defined by the description of a functionality accompanying constraints, such as timing. These parameters are automatically determined on operation by a testing circuit. Parameters that violate the specification are signaled by the testing circuit and failure is detected. The chosen parameters can give hint of the reason for the failure being an evidence of a circuit fault. In the example of an Inter-Integrated Circuit (I2C) communication system, we define contracts and show comparisons between contract violation, fault categorization and failure occurrence under signal fault injection. To complete this work, support for fault analysis on the electronic system level design is given. For this, the data transfers between the high-level models used in the design are augmented with the defined contract parameters. With a specific interface, digital faults are generated for transactions with violating signal parameters that can be tracked by the system. This way, recovery mechanisms for synchronous communication are proposed and tested. In the second part, the interaction between hardware and software is tackled providing special methods for developing device drivers. For this, we do not only specify the interface between hardware and software but also map the hardware control elements to software, partially generating the software interface for a device. This is necessary because drivers handle devices with internal control elements like registers, data streams and interrupts that cannot be represented on software. This systematic composition of drivers facilitates the development of a device interface called the device mechanism. It is the lowest layer of a two-layer architecture for driver development. The device mechanism carries out the access to the device exporting a pure software interface. This interface is based on the device implementation being, thus, fully specified. Further data processing required for compliance with the operating system or application is carried out in the driver policy, the layer on top of it. With the definition of a software layer for device control, contracts specifying constraints of this interface are proposed. These contracts are based on implementation constraints of the device and on its dynamic behavior. Therefore, an extended finite state machine models the dynamic behavior of the device. Based on it, functions of the device mechanism can be augmented with preconditions on the state or on state machine variables. These conditions are then checked on runtime. After execution of a function, its postconditions are ensured, such as timing. This guarantees that different driver policies, operating systems or firmwares, use this same device mechanism fulfilling its constraints. On the example of a Philips webcam, we develop the complete driver for Linux based on our architecture, creating contracts for its device mechanism. Following the systematic composition and the contract approach, driver bugs are avoided that otherwise violate allowed values for device data and execution orders of device protocols

    Advanced techniques for multi-variant execution

    Get PDF

    Evaluating Reliability against SEE of Embedded Systems: A Comparison of RTOS and Bare-metal Approaches

    Get PDF
    Embedded processors are widely used in critical applications such as space missions, where reliability is mandatory for the success of missions. Due to the increasing application complexity, the number of systems using Real-Time Operating Systems (RTOSs) is quickly growing to manage the execution of multiple applications and meet timing constraints. However, whether operating systems or bare-metal applications provide higher reliability is still being determined. We present a comprehensive reliability analysis of software applications running on a device with bare-metal and FreeRTOS against the same faults based on fault models derived from a proton test. Additionally, the FreeRTOS system has been evaluated with a set of software applications dedicated to evaluating specific RTOS functions, providing an additional evaluation for operations crucial for a real-time operating system

    Timing analysis of an embedded architecture for a real-time power line communications network

    Get PDF
    Tese de mestrado. Engenharia Electrotécnica e de Computadores (Área de especialização de Telecomunicações). Faculdade de Engenharia. Universidade do Porto, Instituto Superior de Engenharia. Instituto Politécnico do Porto.. 200
    corecore