813 research outputs found

    Reduced Precision DWC: an Efficient Hardening Strategy for Mixed-Precision Architectures

    Get PDF
    Duplication with Comparison (DWC) is an effective software-level solution to improve the reliability of computing devices. However, it introduces performance and energy consumption overheads that could be unsuitable for high-performance computing or real-time safety-critical applications. In this work, we present Reduced-Precision Duplication with Comparison (RP-DWC) as a means to lower the overhead of DWC by executing the redundant copy in reduced precision. RP-DWC is particularly suitable for modern mixed-precision architectures, such as NVIDIA GPUs, that feature dedicated functional units for computing with programmable accuracy. We discuss the benefits and challenges associated with RP-DWC and show that the intrinsic difference between the mixed-precision copies allows for detecting most, but not all, errors. However, as the undetected faults are the ones that fall into the difference between precisions, they are the ones that produce a much smaller impact on the application output and, thus, might be tolerated. We investigate RP-DWC impact into fault detection, performance, and energy consumption on Volta GPUs. Through fault injection and beam experiment, using three microbenchmarks and four real applications, we show that RP-DWC achieves an excellent coverage (up to 86%) with minimal overheads (as low as 0.1% time and 24% energy consumption overhead

    Design Space Exploration for MPSoC Architectures

    Get PDF
    Multiprocessor system-on-chip (MPSoC) designs utilize the available technology and communication architectures to meet the requirements of the upcoming applications. In MPSoC, the communication platform is both the key enabler, as well as the key differentiator for realizing efficient MPSoCs. It provides product differentiation to meet a diverse, multi-dimensional set of design constraints, including performance, power, energy, reconfigurability, scalability, cost, reliability and time-to-market. The communication resources of a single interconnection platform cannot be fully utilized by all kind of applications, such as the availability of higher communication bandwidth for computation but not data intensive applications is often unfeasible in the practical implementation. This thesis aims to perform the architecture-level design space exploration towards efficient and scalable resource utilization for MPSoC communication architecture. In order to meet the performance requirements within the design constraints, careful selection of MPSoC communication platform, resource aware partitioning and mapping of the application play important role. To enhance the utilization of communication resources, variety of techniques such as resource sharing, multicast to avoid re-transmission of identical data, and adaptive routing can be used. For implementation, these techniques should be customized according to the platform architecture. To address the resource utilization of MPSoC communication platforms, variety of architectures with different design parameters and performance levels, namely Segmented bus (SegBus), Network-on-Chip (NoC) and Three-Dimensional NoC (3D-NoC), are selected. Average packet latency and power consumption are the evaluation parameters for the proposed techniques. In conventional computing architectures, fault on a component makes the connected fault-free components inoperative. Resource sharing approach can utilize the fault-free components to retain the system performance by reducing the impact of faults. Design space exploration also guides to narrow down the selection of MPSoC architecture, which can meet the performance requirements with design constraints.Siirretty Doriast

    Hierarchical Agent-based Adaptation for Self-Aware Embedded Computing Systems

    Get PDF
    Siirretty Doriast

    Applying Hypervisor-Based Fault Tolerance Techniques to Safety-Critical Embedded Systems

    Get PDF
    This document details the work conducted through the development of this thesis, and it is structured as follows: • Chapter 1, Introduction, has briefly presented the motivation, objectives, and contributions of this thesis. • Chapter 2, Fundamentals, exposes a series of concepts that are necessary to correctly understand the information presented in the rest of the thesis, such as the concepts of virtualization, hypervisors, or software-based fault tolerance. In addition, this chapter includes an exhaustive review and comparison between the different hypervisors used in scientific studies dealing with safety-critical systems, and a brief review of some works that try to improve fault tolerance in the hypervisor itself, an area of research that is outside the scope of this work, but that complements the mechanism presented and could be established as a line of future work. • Chapter 3, Problem Statement and Related Work, explains the main reasons why the concept of Hypervisor-Based Fault Tolerance was born and reviews the main articles and research papers on the subject. This review includes both papers related to safety-critical embedded systems (such as the research carried out in this thesis) and papers related to cloud servers and cluster computing that, although not directly applicable to embedded systems, may raise useful concepts that make our solution more complete or allow us to establish future lines of work. • Chapter 4, Proposed Solution, begins with a brief comparison of the work presented in Chapter 3 to establish the requirements that our solution must meet in order to be as complete and innovative as possible. It then sets out the architecture of the proposed solution and explains in detail the two main elements of the solution: the Voter and the Health Monitoring partition. • Chapter 5, Prototype, explains in detail the prototyping of the proposed solution, including the choice of the hypervisor, the processing board, and the critical functionality to be redundant. With respect to the voter, it includes prototypes for both the software version (the voter is implemented in a virtual machine) and the hardware version (the voter is implemented as IP cores on the FPGA). • Chapter 6, Evaluation, includes the evaluation of the prototype developed in Chapter 5. As a preliminary step and given that there is no evidence in this regard, an exercise is carried out to measure the overhead involved in using the XtratuM hypervisor versus not using it. Subsequently, qualitative tests are carried out to check that Health Monitoring is working as expected and a fault injection campaign is carried out to check the error detection and correction rate of our solution. Finally, a comparison is made between the performance of the hardware and software versions of Voter. • Chapter 7, Conclusions and Future Work, is dedicated to collect the conclusions obtained and the contributions made during the research (in the form of articles in journals, conferences and contributions to projects and proposals in the industry). In addition, it establishes some lines of future work that could complete and extend the research carried out during this doctoral thesis.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Katzalin Olcoz Herrero.- Secretario: Félix García Carballeira.- Vocal: Santiago Rodríguez de la Fuent

    GRU-based denoising autoencoder for detection and clustering of unknown single and concurrent faults during system integration testing of automotive software systems

    Get PDF
    Recently, remarkable successes have been achieved in the quality assurance of automotive software systems (ASSs) through the utilization of real-time hardware-in-the-loop (HIL) simulation. Based on the HIL platform, safe, flexible and reliable realistic simulation during the system development process can be enabled. However, notwithstanding the test automation capability, large amounts of recordings data are generated as a result of HIL test executions. Expert knowledge-based approaches to analyze the generated recordings, with the aim of detecting and identifying the faults, are costly in terms of time, effort and difficulty. Therefore, in this study, a novel deep learning-based methodology is proposed so that the faults of automotive sensor signals can be efficiently and automatically detected and identified without human intervention. Concretely, a hybrid GRU-based denoising autoencoder (GRU-based DAE) model with the k-means algorithm is developed for the fault-detection and clustering problem in sequential data. By doing so, based on the real-time historical data, not only individual faults but also unknown simultaneous faults under noisy conditions can be accurately detected and clustered. The applicability and advantages of the proposed method for the HIL testing process are demonstrated by two automotive case studies. To be specific, a high-fidelity gasoline engine and vehicle dynamic system along with an entire vehicle model are considered to verify the performance of the proposed model. The superiority of the proposed architecture compared to other autoencoder variants is presented in the results in terms of reconstruction error under several noise levels. The validation results indicate that the proposed model can perform high detection and clustering accuracy of unknown faults compared to stand-alone techniques

    DRAM Bender: An Extensible and Versatile FPGA-based Infrastructure to Easily Test State-of-the-art DRAM Chips

    Full text link
    To understand and improve DRAM performance, reliability, security and energy efficiency, prior works study characteristics of commodity DRAM chips. Unfortunately, state-of-the-art open source infrastructures capable of conducting such studies are obsolete, poorly supported, or difficult to use, or their inflexibility limit the types of studies they can conduct. We propose DRAM Bender, a new FPGA-based infrastructure that enables experimental studies on state-of-the-art DRAM chips. DRAM Bender offers three key features at the same time. First, DRAM Bender enables directly interfacing with a DRAM chip through its low-level interface. This allows users to issue DRAM commands in arbitrary order and with finer-grained time intervals compared to other open source infrastructures. Second, DRAM Bender exposes easy-to-use C++ and Python programming interfaces, allowing users to quickly and easily develop different types of DRAM experiments. Third, DRAM Bender is easily extensible. The modular design of DRAM Bender allows extending it to (i) support existing and emerging DRAM interfaces, and (ii) run on new commercial or custom FPGA boards with little effort. To demonstrate that DRAM Bender is a versatile infrastructure, we conduct three case studies, two of which lead to new observations about the DRAM RowHammer vulnerability. In particular, we show that data patterns supported by DRAM Bender uncovers a larger set of bit-flips on a victim row compared to the data patterns commonly used by prior work. We demonstrate the extensibility of DRAM Bender by implementing it on five different FPGAs with DDR4 and DDR3 support. DRAM Bender is freely and openly available at https://github.com/CMU-SAFARI/DRAM-Bender.Comment: To appear in TCAD 202

    Reliability and Security Assessment of Modern Embedded Devices

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Sensei : enforcing secure coding guidelines in the integrated development environment

    Get PDF
    We discuss the potential benefits, requirements, and implementation challenges of a security-by-design approach in which an integrated development environment (IDE) plugin assists software developers to write code that complies with secure coding guidelines. We discuss how such a plugin can enable a company's policy-setting security experts and developers to pass their knowledge on to each other more efficiently, and to let developers more effectively put that knowledge into practice. This is achieved by letting the team members develop customized rule sets that formalize coding guidelines and by letting the plugin check the compliance of code being written to those rule sets in real time, similar to an as-you-type spell checker. Upon detected violations, the plugin suggests options to quickly fix them and offers additional information for the developer. We share our experience with proof-of-concept designs and implementations rolled out in multiple companies, and present some future research and development directions

    Resilient and Trustworthy Dynamic Data-driven Application Systems (DDDAS) Services for Crisis Management Environments

    Get PDF
    Future crisis management systems needresilient and trustworthy infrastructures to quickly develop reliable applications and processes, andensure end-to-end security, trust, and privacy. Due to the multiplicity and diversity of involved actors, volumes of data, and heterogeneity of shared information;crisis management systems tend to be highly vulnerable and subjectto unforeseen incidents. As a result, the dependability of crisis management systems can be at risk. This paper presents a cloud-based resilient and trustworthy infrastructure (known as rDaaS) to quickly develop secure crisis management systems. The rDaaS integrates the Dynamic Data-Driven Application Systems (DDDAS) paradigm into a service-oriented architecture over cloud technology and provides a set of resilient DDDAS-As-A Service (rDaaS) components to build secure and trusted adaptable crisis processes. The rDaaS also ensures resilience and security by obfuscating the execution environment and applying Behavior Software Encryption and Moving Technique Defense. A simulation environment for a nuclear plant crisis management case study is illustrated to build resilient and trusted crisis response processes
    • …
    corecore