    SafeDE: A low-cost hardware solution to enforce diverse redundancy in multicores

    Failure risk must be extremely low in high-integrity systems, such as those in cars, satellites and aircraft. Hence, safety measures must be deployed to prevent a single fault from leading to a failure. Redundancy has often been used to address this concern, but it has proven insufficient when a single fault can cause the same error in all redundant elements, which defeats the purpose of redundancy for error detection. To avoid this scenario, diversity is implemented along with redundancy, with lockstep execution being the most popular diverse redundancy solution for computing cores. However, classic lockstep solutions have non-negligible limitations whether implemented in hardware (e.g., half of the cores can only be used for redundant execution and are not even visible at user level) or in software (e.g., the software loop that enforces staggering is long and costs performance). This paper tackles these limitations by providing an extended analysis and evaluation of SafeDE, a Diversity Enforcement hardware module that combines the short diversity-enforcement loop of hardware solutions with the non-intrusiveness of software solutions. Hence, cores can either operate efficiently in lockstep mode or run independent tasks. In this paper, we present SafeDE and its rationale, its application to N-modular systems, its hardware and software integration, and an evaluation of its performance, its area efficiency, and its behavior in the presence of faults. This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme under Grant 871467, and in part by the Spanish Ministry of Science and Innovation under Grant PID2019-107255GB-C21/AEI/10.13039/501100011033. Peer reviewed. Postprint (author's final draft).
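    To make the staggering idea concrete, here is a minimal cycle-level sketch of a diversity-enforcement monitor for two cores. The assumptions are ours, not SafeDE's: staggering is measured as the difference in retired instructions, each core retires at most one instruction per cycle, both cores start together, and only the trail core is ever stalled. The functions `trail_may_advance` and `simulate` are hypothetical names, not SafeDE's hardware interface.

```python
from typing import Optional, Tuple


def trail_may_advance(head_retired: int, trail_retired: int,
                      head_finished: bool, min_stagger: int) -> bool:
    """Monitor decision for one cycle: may the trail core retire an instruction?

    The trail core is stalled whenever advancing would bring it closer than
    `min_stagger` retired instructions to the head core. Once the head core
    has finished, the trail core runs freely.
    """
    if head_finished:
        return True
    return head_retired - trail_retired > min_stagger


def simulate(total_instr: int, min_stagger: int) -> Tuple[Optional[int], int]:
    """Both cores start together; each retires at most one instruction per cycle.

    Returns (smallest staggering observed while both cores were running,
    cycles until the trail core finishes).
    """
    head = trail = cycles = 0
    min_gap: Optional[int] = None
    while trail < total_instr:
        if head < total_instr:
            head += 1                       # the head core is never stalled
        if trail_may_advance(head, trail, head >= total_instr, min_stagger):
            trail += 1
            if head < total_instr:          # both still running: record staggering
                gap = head - trail
                min_gap = gap if min_gap is None else min(min_gap, gap)
        cycles += 1
    return min_gap, cycles
```

    In this toy model, `simulate(100, 10)` returns `(10, 110)`: once the trail core starts, staggering never drops below 10 instructions, and the trail core finishes 10 cycles after the head core. Because the decision is a single comparison per cycle, it fits the short hardware enforcement loop the abstract contrasts with software monitors.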

    A Survey of Recent Developments in Testability, Safety and Security of RISC-V Processors

    With the continued success of the open RISC-V architecture, practical deployment of RISC-V processors necessitates an in-depth consideration of their testability, safety and security aspects. This survey provides an overview of recent developments in this quickly evolving field. We start by discussing the application of state-of-the-art functional and system-level test solutions to RISC-V processors. Then, we discuss the use of RISC-V processors for safety-related applications; to this end, we outline the essential techniques needed to achieve safety in both the functional and the timing domain, and review recent processor designs with safety features. Finally, we survey the different aspects of security in RISC-V implementations and discuss the relationship between cryptographic protocols and primitives on the one hand, and the RISC-V processor architecture and hardware implementation on the other. We also comment on the role of a RISC-V processor in system security and on its resilience against side-channel attacks.

    ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance

    Providing reliability is becoming a challenge for chip manufacturers, who are simultaneously trying to improve miniaturization, performance and energy efficiency. This leads to very large voltage and frequency margins, designed to avoid errors even in the worst case, along with significant hardware expenditure on eliminating voltage spikes and other forms of transient error, causing considerable inefficiency in power consumption and performance. We turn traditional ideas about reliability and performance around by exploring the use of error resilience for power and performance gains. ParaMedic is a recent architecture that provides reliability with low overheads via automatic hardware error recovery. It works by splitting checking across many small cores in a heterogeneous multicore system with hardware logging support. However, its design assumes that errors are exceptional. We transform ParaMedic into ParaDox, which achieves high performance in both error-intensive and scarce-error scenarios, thus allowing correct execution even when undervolted and overclocked. Evaluation within error-intensive simulation environments confirms the error resilience of ParaDox and its low recovery cost. We estimate that, compared to a non-resilient system with margins, ParaDox can reduce the energy-delay product by 15% through undervolting, while completely recovering from any induced errors.
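    The recovery principle (run a segment, check it against a logged starting point, roll back on mismatch) can be illustrated with a heavily simplified, serial sketch. It assumes deterministic segments, an error-free checker, and rollback of only the failing segment; in the described design, checking is spread in parallel over many small cores, which this sketch does not model. `execute_checked` and the `step` function are hypothetical names.

```python
from typing import Callable, Set, Tuple

# Hypothetical step function: deterministic work done in one checked segment,
# taking (start_state, segment_index) and returning the segment's end state.
Step = Callable[[int, int], int]


def execute_checked(step: Step, initial_state: int, n_segments: int,
                    faults: Set[Tuple[int, int]]) -> Tuple[int, int]:
    """Serial sketch of log-based checking with rollback.

    `faults` holds (segment, attempt) pairs at which the main core's result
    is corrupted (e.g., by an undervolting-induced timing error). The
    checker is assumed to be error-free. Returns (final_state, rollbacks).
    """
    state, rollbacks, seg, attempt = initial_state, 0, 0, 0
    while seg < n_segments:
        checkpoint = state                  # logged start state of the segment
        result = step(checkpoint, seg)      # main core executes the segment
        if (seg, attempt) in faults:
            result ^= 1                     # inject a single-bit error
        expected = step(checkpoint, seg)    # checker re-executes from the log
        if result != expected:
            rollbacks += 1                  # discard the segment and retry it
            attempt += 1
            continue                        # state still equals checkpoint
        state, seg, attempt = result, seg + 1, 0
    return state, rollbacks
```

    For example, with `step = lambda s, i: s * 3 + i`, starting state 1 and four segments, the final state is 99 whether or not faults are injected; injecting faults at `{(1, 0), (1, 1), (3, 0)}` only adds three rollbacks. This is the trade-off the abstract describes: errors cost re-execution time instead of causing failures, so voltage margins can shrink.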

    DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files

    The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage below its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltages, the reliability of the circuit is compromised. This work aims to tolerate permanent faults caused by process variations in large GPU register files operating below the safe supply voltage limit. To do so, this paper proposes DC-Patch, a microarchitectural patching technique that exploits the inherent data redundancy of applications to compress registers at run time, with neither compiler assistance nor instruction set modifications. Instead of disabling an entire faulty register file entry, DC-Patch uses the reliable cells within a faulty entry to store compressed register values. Experimental results show that, even with more than a third of the register entries faulty, DC-Patch ensures reliable operation of the register file and reduces energy consumption by 47% with respect to a conventional register file working at nominal supply voltage. The energy savings are 21% compared to a voltage noise smoothing scheme operating at the safe supply voltage limit. These benefits come at less than 2% and 6% impact on system performance and area, respectively.
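    The core idea, storing a compressed value only in the cells of an entry that are known to work, can be sketched for a single 32-bit value. The compression scheme here (drop redundant sign-extension bits) is our illustrative assumption, not necessarily the one DC-Patch uses; the fault map is assumed to be known (e.g., from a post-manufacturing test), and the compressed width is assumed to be kept as separate metadata. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

WORD_BITS = 32  # assumed register width per entry (one 32-bit value)


def min_signed_width(value: int) -> int:
    """Smallest two's-complement width that can hold `value`."""
    for width in range(1, WORD_BITS + 1):
        if -(1 << (width - 1)) <= value < (1 << (width - 1)):
            return width
    raise ValueError("value does not fit in a 32-bit register")


def reliable_cells(fault_mask: int) -> List[int]:
    """Bit positions whose cells are NOT marked faulty in `fault_mask`."""
    return [b for b in range(WORD_BITS) if not (fault_mask >> b) & 1]


def write_patched(value: int, fault_mask: int) -> Optional[Tuple[int, int]]:
    """Compress `value` and scatter it into the reliable cells of one entry.

    Returns (stored_bits, width), or None if the compressed value needs
    more bits than there are reliable cells.
    """
    width = min_signed_width(value)
    cells = reliable_cells(fault_mask)
    if width > len(cells):
        return None
    payload = value & ((1 << width) - 1)  # low `width` bits, two's complement
    stored = 0
    for i in range(width):
        if (payload >> i) & 1:
            stored |= 1 << cells[i]
    return stored, width


def read_patched(stored: int, width: int, fault_mask: int) -> int:
    """Gather bits from the reliable cells and sign-extend them."""
    cells = reliable_cells(fault_mask)
    payload = 0
    for i in range(width):
        if (stored >> cells[i]) & 1:
            payload |= 1 << i
    if payload & (1 << (width - 1)):
        payload -= 1 << width
    return payload
```

    With fault mask `0xF0F0F0F0` (16 faulty cells), small values such as 1234 or -300 round-trip correctly and never touch a faulty cell, while a value like `1 << 20` needs 22 bits and does not fit, so a real design would need a fallback for it.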

    Design of a diversity enforcement module for safety critical processing systems

    Safety-critical systems must adhere to specific functional safety standards that describe the development process for those systems. One key requirement is the ability to prevent a single fault from causing a system failure, in other words, to avoid Common Cause Failures (CCFs). Redundancy is a common solution against CCFs. However, some specific CCFs may affect redundant components identically (e.g., voltage droops, clock interferences), potentially leading to identical errors that go unnoticed and cause a failure. Diversity is often deployed along with redundancy to avoid these CCFs too. For computing elements (e.g., cores), this is usually realized with some form of lockstep execution, where two identical cores execute the same software with a time shift between them (i.e., staggering). Therefore, the two cores are in different states at any point in time, and faults affecting both cores lead to different errors, which can be detected by comparing their outputs. Unfortunately, existing solutions have non-negligible costs: (i) hardware-only solutions hide half of the cores from the user, halving platform performance even for non-critical tasks; conversely, (ii) software-only solutions are much more flexible but require a third core to run the lockstep monitor, and require large staggering, which significantly impacts performance for short programs. This thesis devises a new solution that aims to combine the advantages of existing solutions. Our proposal, a hardware diversity-enforcement module called SafeDE, is an efficient hardware realization of the software monitor. Therefore, it does not hide any core from the end user, it does not require a third core for monitoring, and it allows operating with tiny staggering (e.g., a few tens of cycles instead of the hundreds of thousands required by the software-only solution).
    We implement and integrate SafeDE in a space multicore prototype on an FPGA and validate that it achieves its requirements with negligible hardware cost. Moreover, this work has already led to two peer-reviewed articles in specialized conferences and journals.
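    Why staggering turns an undetectable common-cause fault into a detectable one can be shown with a toy model. Assumptions (ours, for illustration only): each core executes one instruction per cycle, the program is an accumulator update `acc = 3*acc + c` on 16 bits, and the common-cause fault flips bit 0 of whatever result each core produces at the faulty cycle. The names `run_core` and `staggered_lockstep` are hypothetical.

```python
from typing import List, Optional, Tuple

MASK = 0xFFFF  # 16-bit accumulator, chosen for illustration


def run_core(program: List[int], fault_step: Optional[int] = None) -> int:
    """Execute acc = 3*acc + c per instruction; optionally flip bit 0 at one step."""
    acc = 1
    for step, c in enumerate(program):
        acc = (acc * 3 + c) & MASK
        if step == fault_step:
            acc ^= 1  # the common-cause fault corrupts this step's result
    return acc


def staggered_lockstep(program: List[int], stagger: int,
                       fault_cycle: int) -> Tuple[int, int, bool]:
    """Both cores run `program`; the trail core lags by `stagger` steps.

    A common-cause fault at `fault_cycle` hits whatever step each core is
    executing at that cycle. Returns (head_out, trail_out, detected).
    """
    def step_at(s: int) -> Optional[int]:
        return s if 0 <= s < len(program) else None

    head_out = run_core(program, step_at(fault_cycle))
    trail_out = run_core(program, step_at(fault_cycle - stagger))
    return head_out, trail_out, head_out != trail_out
```

    With `program = list(range(1, 21))` and a fault at cycle 7, zero staggering produces two identical wrong outputs, so output comparison misses the error; a staggering of 5 makes the fault hit different instructions in each core, and the outputs differ.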

    ์ตœ์‹  ECU๋ณด๋“œ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์†Œํ”„ํŠธ์—๋Ÿฌ๋“ค์„ ์‹ค์‹œ๊ฐ„ ๋ณต๊ตฌํ•˜๋Š” ๊ธฐ๋ฒ•

    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ์ด์ฐฝ๊ฑด.This dissertation presents the fault-tolerant real-time scheduling using dynamic mode switch support of modern ECU hardware. This dissertation first describes the optimal capacity of the Periodic Resource which contains harmonic periodic task set using the exact time supply function.We show that the optimal capacity can be represented as sum of the each individual utilization of the task in the harmonic periodic task set for both normal state(i.e. no faults) and faulty state. Then, this dissertation proposes non-critical task overlapping technique by only using the idle time intervals of the Periodic Resource in order to overlap the non-critical tasks which ensures no additional capacity increase. Finally, this dissertation proposes the basic form of the Periodic Resources in order to efficiently use the dynamic mode switch support. Next, we also proposes the bin-packing heuristic algorithm that considers both making sub-taskset as a one Periodic Resource and Periodic Resource wide bin-packing which has the pseudo-polynomial time complexity. Experimental results show that the proposed algorithm performs better than the traditional partitioned fixed-priority scheduling approach and partitioned mixed-criticality scheduling approach. Also, the achievement is made up to 18% in terms of the total needed cores compared to traditional partitioned fixed-priority approach for making the given input task set schedulable.๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํšจ์œจ์ ์ธ ์žฌ๊ตฌ์„ฑ๊ฐ€๋Šฅ ์‹œ์Šคํ…œ ์‚ฌ์šฉ์„ ์œ„ํ•œ ๊ณ„์ธต๊ธฐ๋ฐ˜ ์‹ค์‹œ๊ฐ„ ๊ฒฐํ•จ ๊ฐ๋‚ด ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ๋Š” ์ฃผ๊ธฐ ์ž์› ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ, ์ตœ์  ์ฃผ๊ธฐ ์ž์› ์„œ๋ฒ„์˜ ์šฉ๋Ÿ‰์„ ์ฃผ๊ธฐ ์ž์› ๋ชจ๋ธ์ด ๊ฐ€์ง€๋Š” ์‹ค์‹œ๊ฐ„ ์ฃผ๊ธฐ ํƒœ์Šคํฌ ์…‹์˜ ์œ ํ‹ธ๋ผ์ด์ œ์ด์…˜์˜ ํ•ฉ์œผ๋กœ ์ œ์‹œํ•œ๋‹ค. 
๋ณธ ๋…ผ๋ฌธ์€ ํ•ด๋‹น ์ตœ์  ์„œ๋ฒ„ ์šฉ๋Ÿ‰์„ ์‹œ์Šคํ…œ์ด ์ •์ƒ ๋™์ž‘ํ• ๋•Œ์™€ ์˜ค๋™์ž‘ ํ• ๋•Œ ๋ชจ๋‘์— ๋Œ€ํ•ด์„œ ์ œ์‹œํ•œ๋‹ค. ๋‹ค์Œ์œผ๋กœ, ๋น„์ค‘์š” ํƒœ์Šคํฌ ์…‹๋“ค์„ ์ค‘์š” ์ฃผ๊ธฐ ์ž์› ์„œ๋ฒ„์˜ ์—ฌ๋ถ„ ๊ณต๋ฐฑ ์‹œ๊ฐ„์„ ํ™œ์šฉํ•ด ์„œ๋ฒ„ ์šฉ๋Ÿ‰์˜ ์ฆ๊ฐ€ ์—†์ด ๋น„์ค‘์š” ํƒœ์Šคํฌ๋ฅผ ์ค‘์š” ์ฃผ๊ธฐ ์ž์› ์„œ๋ฒ„์— ํ• ๋‹นํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•œ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ๋ณธ ๋…ผ๋ฌธ์€ ์ฃผ๊ธฐ ์ž์› ์„œ๋ฒ„ ๋‹จ์œ„์˜ ํŒŒํ‹ฐ์…˜ ๊ธฐ๋ฒ•๊ณผ ์ฃผ๊ธฐ ํƒœ์Šคํฌ๋ฅผ ํ•˜๋‚˜์˜ ์ฃผ๊ธฐ ์ž์› ์„œ๋ฒ„๋กœ ๋งŒ๋“œ๋Š” ๋นˆํŒจํ‚น ํœด๋ฆฌ์Šคํ‹ฑ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์‹œํ•œ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ, ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ธฐ์กด์— ์‚ฌ์šฉ๋˜์—ˆ๋˜ ํŒŒํ‹ฐ์…˜ ๊ธฐ๋ฐ˜ ์šฐ์„ ์ˆœ์œ„ ์Šค์ผ€์ค„๋ง ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ํŒŒํ‹ฐ์…˜ ๊ธฐ๋ฐ˜ ์šฐ์„ ์ˆœ์œ„ ํ˜ผ์žก ์ค‘์š”๋„ ์•Œ๊ณ ๋ฆฌ์ฆ˜๋ณด๋‹ค ๋” ์ž‘์€ ์ˆ˜์˜ ์ฝ”์–ด์˜ ๊ฐœ์ˆ˜๋ฅผ ๋„์ถœ ํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ์‹คํ—˜๊ฒฐ๊ณผ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ, ๋ณธ ์—ฐ๊ตฌ์—์„œ ์ œ์•ˆํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์žฌ๊ตฌ์„ฑ๊ฐ€๋Šฅ ์‹œ์Šคํ…œ์— ํ™œ์šฉํ•œ๋‹ค๋ฉด ๊ธฐ์กด ๋ฐฉ๋ฒ• ๋Œ€๋น„ ์ตœ๋Œ€ 18%์˜ ์ฝ”์–ด์ ˆ๊ฐํšจ๊ณผ๋ฅผ ๊ธฐ๋Œ€ํ• ์ˆ˜ ์žˆ๋‹ค.1 Introduction 1 1.1 Motivation and Objective 1 1.2 Approach 2 1.3 Organization 6 2 System Model 7 3 Schedulability Analysis 10 3.1 Background 10 3.2 Optimal Capacity Analysis During Normal State 14 3.3 Optimal Capacity Analysis During Fault State 16 3.4 Periodic Resource Wide Schedulability Test 20 3.5 Non-Critical Task Overlapping 24 4 Proposed Approach 26 4.1 Minimum Harmonic Partitions of the Task Set 26 4.2 Proposed Heuristic Algorithm 28 4.2.1 Choosing Detection method 28 4.2.2 Packing Minimum Harmonic Partitions 29 4.2.3 Packing Free Tasks 30 4.2.4 Packing Non-Critical Tasks 31 4.3 Algorithm Description 32 5 Evaluation 35 5.1 Experimental Setup 35 5.2 Simulation Results 36 5.2.1 Free Task Bin-Packing 38 5.2.2 Minimum Harmonic Partitions Bin-Packing 40 5.2.3 Effect of Non-Critical Task Overlapping 43 5.2.4 Effect of State-Wise Computation 45 6 Related Works 46 6.1 Hierarchical Fault-Tolerant Real-Time Scheduling 46 6.2 Error Detection 
Method 46 7 Conclusion 48 References 50Maste
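    The normal-state capacity result and the bin-packing step in the thesis abstract can be sketched as follows. This is a minimal sketch under our own assumptions: tasks are (period, WCET) pairs with implicit deadlines, each harmonic group becomes one periodic resource whose capacity is the sum of its tasks' utilizations (as the abstract states for the normal state), and resources sharing a core are assumed to be EDF-scheduled, so a total capacity of at most 1 is used as the admission test. First-fit decreasing is a generic stand-in, not the thesis's heuristic; fault-state capacity, detection-method choice, and non-critical overlapping are not modeled. All names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Task = Tuple[int, int]  # (period, wcet) in integer time units


def is_harmonic(tasks: List[Task]) -> bool:
    """Every period divides the next larger one."""
    periods = sorted(p for p, _ in tasks)
    return all(b % a == 0 for a, b in zip(periods, periods[1:]))


def capacity(tasks: List[Task]) -> Fraction:
    """Normal-state capacity of a harmonic group: sum of task utilizations."""
    return sum((Fraction(c, p) for p, c in tasks), Fraction(0))


def first_fit_decreasing(groups: List[List[Task]]) -> List[List[List[Task]]]:
    """Pack harmonic groups onto cores so each core's total capacity is <= 1."""
    for g in groups:
        if not is_harmonic(g):
            raise ValueError("every group must be harmonic")
    cores: List[List[List[Task]]] = []
    loads: List[Fraction] = []
    # Largest capacity first; sorted() keeps ties in input order.
    for g in sorted(groups, key=capacity, reverse=True):
        cap = capacity(g)
        for i, load in enumerate(loads):
            if load + cap <= 1:
                cores[i].append(g)
                loads[i] += cap
                break
        else:  # no existing core has room: open a new one
            cores.append([g])
            loads.append(cap)
    return cores
```

    Exact `Fraction` arithmetic avoids floating-point rounding in the `<= 1` admission test. For example, groups with capacities 0.4, 0.5, 0.5 and 1/7 fit on two cores: the two 0.5 groups fill the first core exactly, and the remaining two share the second.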

    Self-Test Mechanisms for Automotive Multi-Processor System-on-Chips

    The abstract is provided in the attachment.