99 research outputs found

    Applying Hypervisor-Based Fault Tolerance Techniques to Safety-Critical Embedded Systems

    Get PDF
    This document details the work conducted through the development of this thesis, and it is structured as follows: • Chapter 1, Introduction, has briefly presented the motivation, objectives, and contributions of this thesis. • Chapter 2, Fundamentals, exposes a series of concepts that are necessary to correctly understand the information presented in the rest of the thesis, such as the concepts of virtualization, hypervisors, or software-based fault tolerance. In addition, this chapter includes an exhaustive review and comparison between the different hypervisors used in scientific studies dealing with safety-critical systems, and a brief review of some works that try to improve fault tolerance in the hypervisor itself, an area of research that is outside the scope of this work, but that complements the mechanism presented and could be established as a line of future work. • Chapter 3, Problem Statement and Related Work, explains the main reasons why the concept of Hypervisor-Based Fault Tolerance was born and reviews the main articles and research papers on the subject. This review includes both papers related to safety-critical embedded systems (such as the research carried out in this thesis) and papers related to cloud servers and cluster computing that, although not directly applicable to embedded systems, may raise useful concepts that make our solution more complete or allow us to establish future lines of work. • Chapter 4, Proposed Solution, begins with a brief comparison of the work presented in Chapter 3 to establish the requirements that our solution must meet in order to be as complete and innovative as possible. It then sets out the architecture of the proposed solution and explains in detail the two main elements of the solution: the Voter and the Health Monitoring partition. • Chapter 5, Prototype, explains in detail the prototyping of the proposed solution, including the choice of the hypervisor, the processing board, and the critical functionality to be redundant. With respect to the voter, it includes prototypes for both the software version (the voter is implemented in a virtual machine) and the hardware version (the voter is implemented as IP cores on the FPGA). • Chapter 6, Evaluation, includes the evaluation of the prototype developed in Chapter 5. As a preliminary step and given that there is no evidence in this regard, an exercise is carried out to measure the overhead involved in using the XtratuM hypervisor versus not using it. Subsequently, qualitative tests are carried out to check that Health Monitoring is working as expected and a fault injection campaign is carried out to check the error detection and correction rate of our solution. Finally, a comparison is made between the performance of the hardware and software versions of Voter. • Chapter 7, Conclusions and Future Work, is dedicated to collect the conclusions obtained and the contributions made during the research (in the form of articles in journals, conferences and contributions to projects and proposals in the industry). In addition, it establishes some lines of future work that could complete and extend the research carried out during this doctoral thesis.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Katzalin Olcoz Herrero.- Secretario: Félix García Carballeira.- Vocal: Santiago Rodríguez de la Fuent

    Multi-core devices for safety-critical systems: a survey

    Get PDF
    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple system-level potential benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support the compliance with these fundamental safety requirements.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).Peer ReviewedPostprint (author's final draft

    Development and certification of mixed-criticality embedded systems based on probabilistic timing analysis

    Get PDF
    An increasing variety of emerging systems relentlessly replaces or augments the functionality of mechanical subsystems with embedded electronics. For quantity, complexity, and use, the safety of such subsystems is an increasingly important matter. Accordingly, those systems are subject to safety certification to demonstrate system's safety by rigorous development processes and hardware/software constraints. The massive augment in embedded processors' complexity renders the arduous certification task significantly harder to achieve. The focus of this thesis is to address the certification challenges in multicore architectures: despite their potential to integrate several applications on a single platform, their inherent complexity imperils their timing predictability and certification. Recently, the Measurement-Based Probabilistic Timing Analysis (MBPTA) technique emerged as an alternative to deal with hardware/software complexity. The innovation that MBPTA brings about is, however, a major step from current certification procedures and standards. The particular contributions of this Thesis include: (i) the definition of certification arguments for mixed-criticality integration upon multicore processors. In particular we propose a set of safety mechanisms and procedures as required to comply with functional safety standards. For timing predictability, (ii) we present a quantitative approach to assess the likelihood of execution-time exceedance events with respect to the risk reduction requirements on safety standards. To this end, we build upon the MBPTA approach and we present the design of a safety-related source of randomization (SoR), that plays a key role in the platform-level randomization needed by MBPTA. And (iii) we evaluate current certification guidance with respect to emerging high performance design trends like caches. Overall, this Thesis pushes the certification limits in the use of multicore and MBPTA technology in Critical Real-Time Embedded Systems (CRTES) and paves the way towards their adoption in industry.Una creciente variedad de sistemas emergentes reemplazan o aumentan la funcionalidad de subsistemas mecánicos con componentes electrónicos embebidos. El aumento en la cantidad y complejidad de dichos subsistemas electrónicos así como su cometido, hacen de su seguridad una cuestión de creciente importancia. Tanto es así que la comercialización de estos sistemas críticos está sujeta a rigurosos procesos de certificación donde se garantiza la seguridad del sistema mediante estrictas restricciones en el proceso de desarrollo y diseño de su hardware y software. Esta tesis trata de abordar los nuevos retos y dificultades dadas por la introducción de procesadores multi-núcleo en dichos sistemas críticos: aunque su mayor rendimiento despierta el interés de la industria para integrar múltiples aplicaciones en una sola plataforma, suponen una mayor complejidad. Su arquitectura desafía su análisis temporal mediante los métodos tradicionales y, asimismo, su certificación es cada vez más compleja y costosa. Con el fin de lidiar con estas limitaciones, recientemente se ha desarrollado una novedosa técnica de análisis temporal probabilístico basado en medidas (MBPTA). La innovación de esta técnica, sin embargo, supone un gran cambio cultural respecto a los estándares y procedimientos tradicionales de certificación. En esta línea, las contribuciones de esta tesis están agrupadas en tres ejes principales: (i) definición de argumentos de seguridad para la certificación de aplicaciones de criticidad-mixta sobre plataformas multi-núcleo. Se definen, en particular, mecanismos de seguridad, técnicas de diagnóstico y reacción de faltas acorde con el estándar IEC 61508 sobre una arquitectura multi-núcleo de referencia. Respecto al análisis temporal, (ii) presentamos la cuantificación de la probabilidad de exceder un límite temporal y su relación con los requisitos de reducción de riesgos derivados de los estándares de seguridad funcional. Con este fin, nos basamos en la técnica MBPTA y presentamos el diseño de una fuente de números aleatorios segura; un componente clave para conseguir las propiedades aleatorias requeridas por MBPTA a nivel de plataforma. Por último, (iii) extrapolamos las guías actuales para la certificación de arquitecturas multi-núcleo a una solución comercial de 8 núcleos y las evaluamos con respecto a las tendencias emergentes de diseño de alto rendimiento (caches). Con estas contribuciones, esta tesis trata de abordar los retos que el uso de procesadores multi-núcleo y MBPTA implican en el proceso de certificación de sistemas críticos de tiempo real y facilita, de esta forma, su adopción por la industria.Postprint (published version

    GPU devices for safety-critical systems: a survey

    Get PDF
    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution.This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020- 045931-I).Peer ReviewedPostprint (author's final draft

    Lifecycle Management of Automotive Safety-Critical Over the Air Updates: A Systems Approach

    Get PDF
    With the increasing importance of Over The Air (OTA) updates in the automotive field, maintaining safety standards becomes more challenging as frequent incremental changes of embedded software are regularly integrated into a wide range of vehicle variants. This necessitates new processes and methodologies with a holistic view on the backend, where the updates are developed and released

    Towards a centralized multicore automotive system

    Get PDF
    Today’s automotive systems are inundated with embedded electronics to host chassis, powertrain, infotainment, advanced driver assistance systems, and other modern vehicle functions. As many as 100 embedded microcontrollers execute hundreds of millions of lines of code in a single vehicle. To control the increasing complexity in vehicle electronics and services, automakers are planning to consolidate different on-board automotive functions as software tasks on centralized multicore hardware platforms. However, these vehicle software services have different and contrasting timing, safety, and security requirements. Existing vehicle operating systems are ill-equipped to provide all the required service guarantees on a single machine. A centralized automotive system aims to tackle this by assigning software tasks to multiple criticality domains or levels according to their consequences of failures, or international safety standards like ISO 26262. This research investigates several emerging challenges in time-critical systems for a centralized multicore automotive platform and proposes a novel vehicle operating system framework to address them. This thesis first introduces an integrated vehicle management system (VMS), called DriveOS™, for a PC-class multicore hardware platform. Its separation kernel design enables temporal and spatial isolation among critical and non-critical vehicle services in different domains on the same machine. Time- and safety-critical vehicle functions are implemented in a sandboxed Real-time Operating System (OS) domain, and non-critical software is developed in a sandboxed general-purpose OS (e.g., Linux, Android) domain. To leverage the advantages of model-driven vehicle function development, DriveOS provides a multi-domain application framework in Simulink. This thesis also presents a real-time task pipeline scheduling algorithm in multiprocessors for communication between connected vehicle services with end-to-end guarantees. The benefits and performance of the overall automotive system framework are demonstrated with hardware-in-the-loop testing using real-world applications, car datasets and simulated benchmarks, and with an early-stage deployment in a production-grade luxury electric vehicle

    Secure and safe virtualization-based framework for embedded systems development

    Get PDF
    Tese de Doutoramento - Programa Doutoral em Engenharia Electrónica e de Computadores (PDEEC)The Internet of Things (IoT) is here. Billions of smart, connected devices are proliferating at rapid pace in our key infrastructures, generating, processing and exchanging vast amounts of security-critical and privacy-sensitive data. This strong connectivity of IoT environments demands for a holistic, end-to-end security approach, addressing security and privacy risks across different abstraction levels: device, communications, cloud, and lifecycle managment. Security at the device level is being misconstrued as the addition of features in a late stage of the system development. Several software-based approaches such as microkernels, and virtualization have been used, but it is proven, per se, they fail in providing the desired security level. As a step towards the correct operation of these devices, it is imperative to extend them with new security-oriented technologies which guarantee security from the outset. This thesis aims to conceive and design a novel security and safety architecture for virtualized systems by 1) evaluating which technologies are key enablers for scalable and secure virtualization, 2) designing and implementing a fully-featured virtualization environment providing hardware isolation 3) investigating which "hard entities" can extend virtualization to guarantee the security requirements dictated by confidentiality, integrity, and availability, and 4) simplifying system configurability and integration through a design ecosystem supported by a domain-specific language. The developed artefacts demonstrate: 1) why ARM TrustZone is nowadays a reference technology for security, 2) how TrustZone can be adequately exploited for virtualization in different use-cases, 3) why the secure boot process, trusted execution environment and other hardware trust anchors are essential to establish and guarantee a complete root and chain of trust, and 4) how a domain-specific language enables easy design, integration and customization of a secure virtualized system assisted by the above mentioned building blocks.Vivemos na era da Internet das Coisas (IoT). Biliões de dispositivos inteligentes começam a proliferar nas nossas infraestruturas chave, levando ao processamento de avolumadas quantidades de dados privados e sensíveis. Esta forte conectividade inerente ao conceito IoT necessita de uma abordagem holística, em que os riscos de privacidade e segurança são abordados nas diferentes camadas de abstração: dispositivo, comunicações, nuvem e ciclo de vida. A segurança ao nível dos dispositivos tem sido erradamente assegurada pela inclusão de funcionalidades numa fase tardia do desenvolvimento. Têm sido utilizadas diversas abordagens de software, incluindo a virtualização, mas está provado que estas não conseguem garantir o nível de segurança desejado. De forma a garantir a correta operação dos dispositivos, é fundamental complementar os mesmos com novas tecnologias que promovem a segurança desde os primeiros estágios de desenvolvimento. Esta tese propõe, assim, o desenvolvimento de uma solução arquitetural inovadora para sistemas virtualizados seguros, contemplando 1) a avaliação de tecnologias chave que promovam tal realização, 2) a implementação de uma solução de virtualização garantindo isolamento por hardware, 3) a identificação de componentes que integrados permitirão complementar a virtualização para garantir os requisitos de segurança, e 4) a simplificação do processo de configuração e integração da solução através de um ecossistema suportado por uma linguagem de domínio específico. Os artefactos desenvolvidos demonstram: 1) o porquê da tecnologia ARM TrustZone ser uma tecnologia de referência para a segurança, 2) a efetividade desta tecnologia quando utilizada em diferentes domínios, 3) o porquê do processo seguro de inicialização, juntamente com um ambiente de execução seguro e outros componentes de hardware, serem essenciais para estabelecer uma cadeia de confiança, e 4) a viabilidade em utilizar uma linguagem de um domínio específico para configurar e integrar um ambiente virtualizado suportado pelos artefactos supramencionados

    Leveraging virtualization technologies for resource partitioning in mixed criticality systems

    Get PDF
    Multi- and many-core processors are becoming increasingly popular in embedded systems. Many of these processors now feature hardware virtualization capabilities, such as the ARM Cortex A15, and x86 processors with Intel VT-x or AMD-V support. Hardware virtualization offers opportunities to partition physical resources, including processor cores, memory and I/O devices amongst guest virtual machines. Mixed criticality systems and services can then co-exist on the same platform in separate virtual machines. However, traditional virtual machine systems are too expensive because of the costs of trapping into hypervisors to multiplex and manage machine physical resources on behalf of separate guests. For example, hypervisors are needed to schedule separate VMs on physical processor cores. Additionally, traditional hypervisors have memory footprints that are often too large for many embedded computing systems. This dissertation presents the design of the Quest-V separation kernel, which partitions services of different criticality levels across separate virtual machines, or sandboxes. Each sandbox encapsulates a subset of machine physical resources that it manages without requiring intervention of a hypervisor. In Quest-V, a hypervisor is not needed for normal operation, except to bootstrap the system and establish communication channels between sandboxes. This approach not only reduces the memory footprint of the most privileged protection domain, it removes it from the control path during normal system operation, thereby heightening security

    Cache-based Timing Side-channels in Partitioning Hypervisors

    Get PDF
    Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresIn recent years, the automotive industry has seen a technology complexity increase to comply with computing innovations such as autonomous driving, connectivity and mobility. As such, the need to reduce this complexity without compromising the intended metrics is imperative. The advent of hypervisors in the automotive domain presents a solution to reduce the complexity of the systems by enabling software portability and isolation between virtual machines (VMs). Although virtualization creates the illusion of strict isolation and exclusive resource access, the convergence of critical and non-critical systems into shared chips presents a security problem. This shared hardware has microarchitectural features that can be exploited through their temporal behavior, creating sensitive data leakage channels between co-located VMs. In mixed-criticality systems, the exploitation of these channels can lead to safety issues on systems with real-time constraints compromising the whole system. The implemented side-channel attacks demonstrated well-defined channels, across two real-time partitioning hypervisors in mixed-criticality systems, that enable the inference of a co-located VM’s cache activity. Furthermore, these channels have proven to be mitigated using cache coloring as a countermeasure, thus increasing the determinism of the system in detriment of average performance. From a safety perspective, this dissertation emphasizes the need to weigh the tradeoffs of the trending architectural features that target performance over predictability and determinism.Nos últimos anos, a indústria automotiva tem sido objeto de um crescendo na sua complexidade tecnológica de maneira a manter-se a par das mais recentes inovações de computação. Sendo assim, a necessidade de reduzir a complexidade sem comprometer as métricas pretendidas é imperativa. O advento dos hipervisores na indústria automotiva apresenta uma solução para a redução da complexidade dos sistemas, possiblitando a portabilidade do software e o isolamento entrevirtual vachines (VMs). Embora a virtualização crie a ilusão de isolamento e acesso exclusivo a recursos, a convergência de sistemas críticos e não-críticos em chips partilhados representa um problema de segurança. O hardware partilhado tem características microarquiteturais que podem ser exploradas através do seu comportamento temporal, criando canais de fuga de informação crítica entre VMs adjacentes. Em sistemas de criticalidade mista, a exploração destes canais pode comprometer sistemas com limitações de tempo real. Os ataques side-channel implementados revelam canais bem definidos que possibilitam a inferência da atividade de cache de VMs situadas no mesmo processador. Além disso, esses canais provaram serem passíveis de ser mitigados usando cache coloring como estratégia de mitigação, aumentando assim o determinismo do sistema em detrimento da sua performance. De uma perspetiva da segurança, esta dissertação enfatiza a necessidade de pesar os tradeoffs das tendências arquiteturais que priorizam a performance e secundarizam o determinismo e previsibilidade do sistema
    • …
    corecore