36 research outputs found

    Applying Hypervisor-Based Fault Tolerance Techniques to Safety-Critical Embedded Systems

    Get PDF
    This document details the work conducted through the development of this thesis, and it is structured as follows: • Chapter 1, Introduction, has briefly presented the motivation, objectives, and contributions of this thesis. • Chapter 2, Fundamentals, exposes a series of concepts that are necessary to correctly understand the information presented in the rest of the thesis, such as the concepts of virtualization, hypervisors, or software-based fault tolerance. In addition, this chapter includes an exhaustive review and comparison between the different hypervisors used in scientific studies dealing with safety-critical systems, and a brief review of some works that try to improve fault tolerance in the hypervisor itself, an area of research that is outside the scope of this work, but that complements the mechanism presented and could be established as a line of future work. • Chapter 3, Problem Statement and Related Work, explains the main reasons why the concept of Hypervisor-Based Fault Tolerance was born and reviews the main articles and research papers on the subject. This review includes both papers related to safety-critical embedded systems (such as the research carried out in this thesis) and papers related to cloud servers and cluster computing that, although not directly applicable to embedded systems, may raise useful concepts that make our solution more complete or allow us to establish future lines of work. • Chapter 4, Proposed Solution, begins with a brief comparison of the work presented in Chapter 3 to establish the requirements that our solution must meet in order to be as complete and innovative as possible. It then sets out the architecture of the proposed solution and explains in detail the two main elements of the solution: the Voter and the Health Monitoring partition. • Chapter 5, Prototype, explains in detail the prototyping of the proposed solution, including the choice of the hypervisor, the processing board, and the critical functionality to be redundant. With respect to the voter, it includes prototypes for both the software version (the voter is implemented in a virtual machine) and the hardware version (the voter is implemented as IP cores on the FPGA). • Chapter 6, Evaluation, includes the evaluation of the prototype developed in Chapter 5. As a preliminary step and given that there is no evidence in this regard, an exercise is carried out to measure the overhead involved in using the XtratuM hypervisor versus not using it. Subsequently, qualitative tests are carried out to check that Health Monitoring is working as expected and a fault injection campaign is carried out to check the error detection and correction rate of our solution. Finally, a comparison is made between the performance of the hardware and software versions of Voter. • Chapter 7, Conclusions and Future Work, is dedicated to collect the conclusions obtained and the contributions made during the research (in the form of articles in journals, conferences and contributions to projects and proposals in the industry). In addition, it establishes some lines of future work that could complete and extend the research carried out during this doctoral thesis.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Katzalin Olcoz Herrero.- Secretario: Félix García Carballeira.- Vocal: Santiago Rodríguez de la Fuent

    Run-time management for future MPSoC platforms

    Get PDF
    In recent years, we are witnessing the dawning of the Multi-Processor Systemon- Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications, while reducing overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform will partly depend on its production volume. In turn, this means that ??exible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires ??exibility, but should also combine a high performance with a low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior. The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with an amount of services or experiences. So from an application viewpoint, MPSoCs are designed to ef??ciently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a signi??cant amount of memory. To keep up with ever evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities is also an important factor. Hence scalability, ??exibility, real-time behavior, a high performance, a low power consumption and, ??nally, programmability are key components in realizing the success of MPSoC platforms. The run-time manager is logically located between the application layer en the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time and based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to ??exibility, scalability and to low power operation. As it has an arbiter function between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services. First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a de??nition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion on how these trade-offs affect the key MPSoC requirements. This design space de??nition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Consequently, we introduce a fast and ef??cient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores denoted as hierarchical con??guration. Hierarchical con??guration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm. Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment supported mechanism that enables moving tasks between an ISP and ??ne-grained recon??gurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors. We propose a reactive on-chip communication management mechanism. We show that by exploiting an injection rate control mechanism it is possible to provide a communication management system capable of providing a soft (reactive) QoS in a NoC. We introduce a novel, platform independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a the realistic, reproducible and ??exible run-time manager testbench with respect to applications with multiple quality levels and implementation tradev offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The the PSFF tool is, to the best of our knowledge, the ??rst tool that generates multiple realistic application operating points either based on pro??ling information of a real-life application or based on a designer-controlled random generator. Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate for real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems

    GPU devices for safety-critical systems: a survey

    Get PDF
    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution.This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020- 045931-I).Peer ReviewedPostprint (author's final draft

    Multi-core devices for safety-critical systems: a survey

    Get PDF
    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple system-level potential benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support the compliance with these fundamental safety requirements.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).Peer ReviewedPostprint (author's final draft

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    Cache-based Timing Side-channels in Partitioning Hypervisors

    Get PDF
    Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresIn recent years, the automotive industry has seen a technology complexity increase to comply with computing innovations such as autonomous driving, connectivity and mobility. As such, the need to reduce this complexity without compromising the intended metrics is imperative. The advent of hypervisors in the automotive domain presents a solution to reduce the complexity of the systems by enabling software portability and isolation between virtual machines (VMs). Although virtualization creates the illusion of strict isolation and exclusive resource access, the convergence of critical and non-critical systems into shared chips presents a security problem. This shared hardware has microarchitectural features that can be exploited through their temporal behavior, creating sensitive data leakage channels between co-located VMs. In mixed-criticality systems, the exploitation of these channels can lead to safety issues on systems with real-time constraints compromising the whole system. The implemented side-channel attacks demonstrated well-defined channels, across two real-time partitioning hypervisors in mixed-criticality systems, that enable the inference of a co-located VM’s cache activity. Furthermore, these channels have proven to be mitigated using cache coloring as a countermeasure, thus increasing the determinism of the system in detriment of average performance. From a safety perspective, this dissertation emphasizes the need to weigh the tradeoffs of the trending architectural features that target performance over predictability and determinism.Nos últimos anos, a indústria automotiva tem sido objeto de um crescendo na sua complexidade tecnológica de maneira a manter-se a par das mais recentes inovações de computação. Sendo assim, a necessidade de reduzir a complexidade sem comprometer as métricas pretendidas é imperativa. O advento dos hipervisores na indústria automotiva apresenta uma solução para a redução da complexidade dos sistemas, possiblitando a portabilidade do software e o isolamento entrevirtual vachines (VMs). Embora a virtualização crie a ilusão de isolamento e acesso exclusivo a recursos, a convergência de sistemas críticos e não-críticos em chips partilhados representa um problema de segurança. O hardware partilhado tem características microarquiteturais que podem ser exploradas através do seu comportamento temporal, criando canais de fuga de informação crítica entre VMs adjacentes. Em sistemas de criticalidade mista, a exploração destes canais pode comprometer sistemas com limitações de tempo real. Os ataques side-channel implementados revelam canais bem definidos que possibilitam a inferência da atividade de cache de VMs situadas no mesmo processador. Além disso, esses canais provaram serem passíveis de ser mitigados usando cache coloring como estratégia de mitigação, aumentando assim o determinismo do sistema em detrimento da sua performance. De uma perspetiva da segurança, esta dissertação enfatiza a necessidade de pesar os tradeoffs das tendências arquiteturais que priorizam a performance e secundarizam o determinismo e previsibilidade do sistema

    ReZone: disarming TrustZone with TEE privilege reduction

    Get PDF
    In TrustZone-assisted TEEs, the trusted OS has unrestricted access to both secure and normal world memory. Unfortunately, this architectural limitation has opened an aisle of exploration for attackers, which have demonstrated how to leverage a chain of exploits to hijack the trusted OS and gain full control of the system, targeting (i) the rich execution environment (REE), (ii) all trusted applications (TAs), and (iii) the secure monitor. In this paper, we propose REZONE. The main novelty behind REZONE design relies on leveraging TrustZone-agnostic hardware primitives available on commercially off-the-shelf (COTS) platforms to restrict the privileges of the trusted OS. With REZONE, a monolithic TEE is restructured and partitioned into multiple sandboxed domains named zones, which have only access to private resources. We have fully implemented REZONE for the i.MX 8MQuad EVK and integrated it with Android OS and OP-TEE. We extensively evaluated REZONE using microbenchmarks and real-world applications. REZONE can sustain popular applications like DRM-protected video encoding with acceptable performance overheads. We have surveyed 80 CVE vulnerability reports and estimate that REZONE could mitigate 86.84% of them.We thank our shepherd Aastha Mehta and the anonymous reviewers for their comments and suggestions. This work was supported by national funds through Centro ALGORITMI / Universidade do Minho, Instituto Superior Técnico / Universidade de Lisboa, and FCT under project UIDB/50021/2020 and UIDB/00319/2020. David Cerdeira was supported by FCT grant SFRH/BD/146231/2019

    Fault-tolerant satellite computing with modern semiconductors

    Get PDF
    Miniaturized satellites enable a variety space missions which were in the past infeasible, impractical or uneconomical with traditionally-designed heavier spacecraft. Especially CubeSats can be launched and manufactured rapidly at low cost from commercial components, even in academic environments. However, due to their low reliability and brief lifetime, they are usually not considered suitable for life- and safety-critical services, complex multi-phased solar-system-exploration missions, and missions with a longer duration. Commercial electronics are key to satellite miniaturization, but also responsible for their low reliability: Until 2019, there existed no reliable or fault-tolerant computer architectures suitable for very small satellites. To overcome this deficit, a novel on-board-computer architecture is described in this thesis.Robustness is assured without resorting to radiation hardening, but through software measures implemented within a robust-by-design multiprocessor-system-on-chip. This fault-tolerant architecture is component-wise simple and can dynamically adapt to changing performance requirements throughout a mission. It can support graceful aging by exploiting FPGA-reconfiguration and mixed-criticality.  Experimentally, we achieve 1.94W power consumption at 300Mhz with a Xilinx Kintex Ultrascale+ proof-of-concept, which is well within the powerbudget range of current 2U CubeSats. To our knowledge, this is the first COTS-based, reproducible on-board-computer architecture that can offer strong fault coverage even for small CubeSats.European Space AgencyComputer Systems, Imagery and Medi
    corecore