59 research outputs found

    Applying Hypervisor-Based Fault Tolerance Techniques to Safety-Critical Embedded Systems

    Get PDF
    This document details the work conducted through the development of this thesis, and it is structured as follows: • Chapter 1, Introduction, has briefly presented the motivation, objectives, and contributions of this thesis. • Chapter 2, Fundamentals, exposes a series of concepts that are necessary to correctly understand the information presented in the rest of the thesis, such as the concepts of virtualization, hypervisors, or software-based fault tolerance. In addition, this chapter includes an exhaustive review and comparison between the different hypervisors used in scientific studies dealing with safety-critical systems, and a brief review of some works that try to improve fault tolerance in the hypervisor itself, an area of research that is outside the scope of this work, but that complements the mechanism presented and could be established as a line of future work. • Chapter 3, Problem Statement and Related Work, explains the main reasons why the concept of Hypervisor-Based Fault Tolerance was born and reviews the main articles and research papers on the subject. This review includes both papers related to safety-critical embedded systems (such as the research carried out in this thesis) and papers related to cloud servers and cluster computing that, although not directly applicable to embedded systems, may raise useful concepts that make our solution more complete or allow us to establish future lines of work. • Chapter 4, Proposed Solution, begins with a brief comparison of the work presented in Chapter 3 to establish the requirements that our solution must meet in order to be as complete and innovative as possible. It then sets out the architecture of the proposed solution and explains in detail the two main elements of the solution: the Voter and the Health Monitoring partition. • Chapter 5, Prototype, explains in detail the prototyping of the proposed solution, including the choice of the hypervisor, the processing board, and the critical functionality to be redundant. With respect to the voter, it includes prototypes for both the software version (the voter is implemented in a virtual machine) and the hardware version (the voter is implemented as IP cores on the FPGA). • Chapter 6, Evaluation, includes the evaluation of the prototype developed in Chapter 5. As a preliminary step and given that there is no evidence in this regard, an exercise is carried out to measure the overhead involved in using the XtratuM hypervisor versus not using it. Subsequently, qualitative tests are carried out to check that Health Monitoring is working as expected and a fault injection campaign is carried out to check the error detection and correction rate of our solution. Finally, a comparison is made between the performance of the hardware and software versions of Voter. • Chapter 7, Conclusions and Future Work, is dedicated to collect the conclusions obtained and the contributions made during the research (in the form of articles in journals, conferences and contributions to projects and proposals in the industry). In addition, it establishes some lines of future work that could complete and extend the research carried out during this doctoral thesis.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Katzalin Olcoz Herrero.- Secretario: Félix García Carballeira.- Vocal: Santiago Rodríguez de la Fuent

    Simulador para arquitecturas multiprocesador utilizadas en el sector espacial para apoyar el desarrollo de mecanismos software de tolerancia a fallos

    Get PDF
    La colonización de la Luna y Marte o la minería espacial son ideas que la humanidad ha ido desarrollando durante décadas pero que hoy en día parecen estar al alcance de la mano gracias al avance tecnológico, la nueva carrera espacial en la que China es cada vez más relevante, y al aumento de interés por parte de gobiernos, agencias y empresas. Parte de estos avances tecnológicos radican en el uso de hardware y software espacial cada vez más potente y versátil. En cualquier misión espacial, el software es un componente fundamental que permite configurar de distintas maneras al sistema y manejar las situaciones excepcionales que se puedan dar. Además, en el entorno espacial tanto el hardware como el software necesitan satisfacer unos requisitos de tolerancia a fallos para evitar en mayor medida los errores producidos por la radiación. La verificación de estos requisitos en sistemas críticos consume cada vez una mayor cantidad de recursos dedicados al desarrollo de los sistemas, sobre todo en sistemas multinúcleo. En este contexto es necesario el uso de nuevas herramientas para el desarrollo y verificación tempranos del software empotrado. Entre estas herramientas se puede incluir la propuesta de este trabajo de tesis, que afronta el problema mediante el uso de técnicas de simulación e inyección de fallos. Dadas las restricciones temporales en el desarrollo de los sistemas embarcados y los estrictos requisitos de robustez del software espacial, es necesario realizar esta verificación en etapas muy tempranas del desarrollo. En un caso ideal, estas tareas serian desempeñadas de forma paralela al desarrollo hardware, permitiendo anticipar discrepancias y problemas en la especificación del sistema. Cabe resaltar que la verificación de los mecanismos de tolerancia a fallos software puede ser difícil o imposible de realizar en el hardware, dado que las técnicas de inyección de fallos hardware limitan la reproducibilidad de escenarios concretos asi como que los escenarios sean observables y con inyección de fallos controlable y transparente. En esta tesis se describe la investigación, desarrollo y uso de la plataforma virtual ``LeonViP-MC'', con capacidad de inyección de fallos y que utiliza traducción dinámica binaria (Dynamic Binary Translation) mediante LLVM y corrutinas para la simulación de sistemas multinúcleo. Gracias a esta plataforma se puede ejecutar el mismo binario a ejecutar en el hardware real, pero en un entorno controlado y determinista. Esto ha permitido la realización de distintas campanas de inyección de fallos que no serían viables de otra manera. Su utilización ha permitido demostrar la fiabilidad de las técnicas de tolerancia a fallos implementadas tanto en el software de arranque de la Unidad de Control del Instrumento (ICU) del Detector de Partículas Energéticas (EPD) embarcado en Solar Orbiter asi como de una aplicación basada en un canal de comunicaciones ARINC 653 que forma parte del desarrollo de un futuro hipervisor para el sistema multinúcleo GR740

    Proceedings of the 19th Sound and Music Computing Conference

    Get PDF
    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Lock-V: a heterogeneous fault tolerance architecture based on Arm and RISC-V

    Get PDF
    This article presents Lock-V, a heterogeneous fault tolerance architecture that explores a dual-core lockstep (DCLS) technique to mitigate single event upset (SEU) and common-mode failure (CMF) problems. The Lock-V was deployed in two versions, Lock-VA and Lock-VM by applying design diversity in two processor architectures at the instruction set architecture (ISA)-level. Lock-VA features an Arm Cortex-A9 with a RISC-V RV64GC, while Lock-VM includes an Arm Cortex-M3 along with a RISC-V RV32IMA processor. The solution explores fieldprogrammable gate array (FPGA) technology to deploy softcore versions of the RISC-V processors, and dedicated accelerators for performing error detection and triggering the software rollback system used for error recovery. To test Lock-V in both versions, a fault-injection mechanism was implemented to cause bit-flips in the processor registers, a common problem usually present in heavy radiation environments.This work has been supported by FCT - Fundação para a Ciência e a Tecnologia within the R&D Units Project Scope: UIDB/00319/2020

    Smart Wireless Sensor Networks

    Get PDF
    The recent development of communication and sensor technology results in the growth of a new attractive and challenging area - wireless sensor networks (WSNs). A wireless sensor network which consists of a large number of sensor nodes is deployed in environmental fields to serve various applications. Facilitated with the ability of wireless communication and intelligent computation, these nodes become smart sensors which do not only perceive ambient physical parameters but also be able to process information, cooperate with each other and self-organize into the network. These new features assist the sensor nodes as well as the network to operate more efficiently in terms of both data acquisition and energy consumption. Special purposes of the applications require design and operation of WSNs different from conventional networks such as the internet. The network design must take into account of the objectives of specific applications. The nature of deployed environment must be considered. The limited of sensor nodes� resources such as memory, computational ability, communication bandwidth and energy source are the challenges in network design. A smart wireless sensor network must be able to deal with these constraints as well as to guarantee the connectivity, coverage, reliability and security of network's operation for a maximized lifetime. This book discusses various aspects of designing such smart wireless sensor networks. Main topics includes: design methodologies, network protocols and algorithms, quality of service management, coverage optimization, time synchronization and security techniques for sensor networks
    corecore