Search CORE

8 research outputs found

SafeDE: A low-cost hardware solution to enforce diverse redundancy in multicores

Author: Abella Ferrer Jaume
Alcaide Portet Sergi
Bas Jalón Francisco
Benedicte Illescas Pedro
Cabo Pitarch Guillem
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2022

Failure risk must be tiny in high-integrity systems, such as those in cars, satellites and aircraft. Hence, safety measures must be deployed to prevent a single fault from leading to a failure. Redundancy has often been used to address this concern, but it has proven insufficient if a single fault can cause the same error in all redundant elements, which defeats the purpose of redundancy for error detection. To avoid this scenario, diversity is implemented along with redundancy, with lockstep execution being the most popular diverse redundancy solution for computing cores. However, classic lockstep solutions have non-negligible limitations if implemented in hardware (e.g., half of the cores can only be used for redundant execution and are not even visible at user level), or in software (e.g., the software loop to enforce staggering is long and costs performance). This paper tackles the limitations of classic lockstep solutions by providing an extended analysis and evaluation of SafeDE, a Diversity Enforcement hardware module combining the short loop to enforce diversity of hardware solutions and the nonintrusiveness of software solutions. Hence, cores can operate in lockstep mode efficiently or run independent tasks. In this paper, we present SafeDE and its rationale, its application to N-modular systems, its hardware and software integration, and an evaluation showing its performance and area efficiency, and its behavior in the presence of faults. This work was supported in part by the European Union’s Horizon 2020 Research and Innovation Programme under Grant 871467, and in part by the Spanish Ministry of Science and Innovation under Grant PID2019-107255GB-C21/AEI/10.13039/501100011033. Peer Reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

A Survey of Recent Developments in Testability, Safety and Security of RISC-V Processors

Author: Abolfazl Sajadi
Bernd Becker
Carles Hernandez
Denis Schwachhofer
Ilia Polian
Ilya Tuzov
Jens Anders
Mahnaz Namazi Rizi
Mathias Sauer
Matteo Sonza Reorda
Nele Mentens
Nikolaos Deligiannis
Nourhan Elhamawy
Nuša Zidaric
Pablo Andreu
Riccardo Cantoro
Stefan Wagner
Steffen Becker
Tobias Faller
Todor Stefanov
Publication venue: IEEE
Publication date: 01/01/2023

With the continued success of the open RISC-V architecture, practical deployment of RISC-V processors necessitates an in-depth consideration of their testability, safety and security aspects. This survey provides an overview of recent developments in this quickly evolving field. We start by discussing the application of state-of-the-art functional and system-level test solutions to RISC-V processors. Then, we discuss the use of RISC-V processors for safety-related applications; to this end, we outline the essential techniques necessary to obtain safety both in the functional and in the timing domain and review recent processor designs with safety features. Finally, we survey the different aspects of security with respect to RISC-V implementations and discuss the relationship between cryptographic protocols and primitives on the one hand and the RISC-V processor architecture and hardware implementation on the other. We also comment on the role of a RISC-V processor for system security and its resilience against side-channel attacks.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Leiden University Scholarly Publications

ParaDox: Eliminating Voltage Margins via Heterogeneous Fault Tolerance.

Author: Ainsworth Sam
Jones Timothy M
Mycroft Alan
Zoubritzky Lionel
Publication venue: 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
Publication date: 22/04/2021

Providing reliability is becoming a challenge for chip manufacturers, faced with simultaneously trying to improve miniaturization, performance and energy efficiency. This leads to very large margins on voltage and frequency, designed to avoid errors even in the worst case, along with significant hardware expenditure on eliminating voltage spikes and other forms of transient error, causing considerable inefficiency in power consumption and performance. We flip traditional ideas about reliability and performance around, by exploring the use of error resilience for power and performance gains. ParaMedic is a recent architecture that provides a solution for reliability with low overheads via automatic hardware error recovery. It works by splitting up checking onto many small cores in a heterogeneous multicore system with hardware logging support. However, its design is based on the idea that errors are exceptional. We transform ParaMedic into ParaDox, which shows high performance in both error-intensive and scarce-error scenarios, thus allowing correct execution even when undervolted and overclocked. Evaluation within error-intensive simulation environments confirms the error resilience of ParaDox and the low associated recovery cost. We estimate that compared to a non-resilient system with margins, ParaDox can reduce energy-delay product by 15% through undervolting, while completely recovering from any induced errors.

Edinburgh Research Explorer

Apollo (Cambridge)

DC-Patch: A Microarchitectural Fault Patching Technique for GPU Register Files

Author: Gran-Tejero R.
Suarez-Gracia D.
Valero A.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltages, the reliability of the circuit is compromised. This work aims to tolerate permanent faults from process variations in large GPU register files operating below the safe supply voltage limit. To do so, this paper proposes a microarchitectural patching technique, DC-Patch, exploiting the inherent data redundancy of applications to compress registers at run-time with neither compiler assistance nor instruction set modifications. Instead of disabling an entire faulty register file entry, DC-Patch leverages the reliable cells within a faulty entry to store compressed register values. Experimental results show that, with more than a third of faulty register entries, DC-Patch ensures a reliable operation of the register file and reduces the energy consumption by 47% with respect to a conventional register file working at nominal supply voltage. The energy savings are 21% compared to a voltage noise smoothing scheme operating at the safe supply voltage limit. These benefits are obtained with less than 2% and 6% impact on the system performance and area, respectively.

Repositorio Universidad de Zaragoza

Design of a diversity enforcement module for safety critical processing systems

Author: Bas Jalón Francisco
Publication venue: Universitat Politècnica de Catalunya
Publication date: 02/07/2022

Safety-critical systems must adhere to specific functional safety standards describing the development process for those systems. One key requirement is the ability to prevent a single fault from causing a system failure, or in other words, to avoid Common Cause Failures (CCFs). Redundancy is a usual solution against CCFs. However, some specific CCFs may affect redundant components identically (e.g., voltage droops, clock interferences), hence potentially leading to identical errors that may go unnoticed and cause a failure. Diversity is often deployed along with redundancy to also avoid those CCFs. In the particular case of computing elements (e.g., cores), this is usually realized with some form of lockstep execution where two identical cores execute the same software, but with some time shift between them (aka staggering). Therefore, the two cores have different states at any point in time, and faults affecting both cores lead to different errors, which can be detected by comparing the outputs. Unfortunately, existing solutions have some non-negligible costs: (i) hardware-only solutions hide half of the cores, making them non-user visible, hence halving platform performance even for non-critical tasks. Conversely, (ii) software-only solutions are much more flexible but impose the use of a third core to run the lockstep monitor, and require large staggering, which has a significant impact on performance for short programs. This thesis devises a new solution aiming at combining the advantages of existing solutions. Our proposal, a hardware diversity-enforcement module (referred to as SafeDE), is an efficient hardware realization of the software monitor. Therefore, it does not hide any core from the end user, it does not require a third core for monitoring purposes, and it allows operating with tiny staggering (e.g., a few tens of cycles instead of the hundreds of thousands required for the software-only solution).
We implement and integrate SafeDE in a space multicore prototype in an FPGA and validate that it effectively achieves its requirements with negligible hardware costs. Moreover, this work has already led to the publication of two peer-reviewed articles in specialized conferences and journals.

UPCommons. Portal del coneixement obert de la UPC

A Technique for Real-Time Recovery from Soft Errors Using Modern ECU Boards (최신 ECU보드를 활용하여 소프트에러들을 실시간 복구하는 기법)

Author: 정재환
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2020

Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2020. 이창건.

This dissertation presents fault-tolerant real-time scheduling using the dynamic mode switch support of modern ECU hardware. It first describes the optimal capacity of a Periodic Resource that contains a harmonic periodic task set, using the exact time supply function. We show that the optimal capacity can be represented as the sum of the individual utilizations of the tasks in the harmonic periodic task set, for both the normal state (i.e., no faults) and the faulty state. Then, this dissertation proposes a non-critical task overlapping technique that uses only the idle time intervals of the Periodic Resource to overlap the non-critical tasks, which ensures no additional capacity increase. Finally, this dissertation proposes the basic form of the Periodic Resources in order to use the dynamic mode switch support efficiently. We also propose a bin-packing heuristic algorithm with pseudo-polynomial time complexity that considers both forming a sub-task set into one Periodic Resource and Periodic Resource-wide bin-packing. Experimental results show that the proposed algorithm performs better than the traditional partitioned fixed-priority scheduling approach and the partitioned mixed-criticality scheduling approach. Compared with the traditional partitioned fixed-priority approach, it reduces the total number of cores needed to make the given input task set schedulable by up to 18%.

This thesis proposes a hierarchical fault-tolerant real-time scheduling technique for the efficient use of reconfigurable systems. Based on the periodic resource model, it gives the optimal capacity of a periodic resource server as the sum of the utilizations of the real-time periodic task set held by the periodic resource model. The optimal server capacity is given both for when the system operates normally and for when it malfunctions. Next, the thesis presents a method for assigning non-critical tasks to a critical periodic resource server, using the server's spare idle time, without increasing the server capacity. Finally, it presents a partitioning technique at the granularity of periodic resource servers and a bin-packing heuristic algorithm that forms periodic tasks into a single periodic resource server. Experimental results show that the proposed algorithm yields a smaller number of cores than the previously used partition-based priority scheduling algorithm and partition-based mixed-criticality priority algorithm. Based on these results, applying the proposed algorithm to reconfigurable systems is expected to save up to 18% of cores compared with existing methods.

Contents:
1 Introduction: 1.1 Motivation and Objective; 1.2 Approach; 1.3 Organization
2 System Model
3 Schedulability Analysis: 3.1 Background; 3.2 Optimal Capacity Analysis During Normal State; 3.3 Optimal Capacity Analysis During Fault State; 3.4 Periodic Resource Wide Schedulability Test; 3.5 Non-Critical Task Overlapping
4 Proposed Approach: 4.1 Minimum Harmonic Partitions of the Task Set; 4.2 Proposed Heuristic Algorithm (4.2.1 Choosing Detection method; 4.2.2 Packing Minimum Harmonic Partitions; 4.2.3 Packing Free Tasks; 4.2.4 Packing Non-Critical Tasks); 4.3 Algorithm Description
5 Evaluation: 5.1 Experimental Setup; 5.2 Simulation Results (5.2.1 Free Task Bin-Packing; 5.2.2 Minimum Harmonic Partitions Bin-Packing; 5.2.3 Effect of Non-Critical Task Overlapping; 5.2.4 Effect of State-Wise Computation)
6 Related Works: 6.1 Hierarchical Fault-Tolerant Real-Time Scheduling; 6.2 Error Detection Method
7 Conclusion
References

SNU Open Repository and Archive

The Arm Triple Core Lock-Step (TCLS) Processor

Author: Amort T.
Atmel Corp.
Balaji Venu
Berg M.
DeCoursey R.
Degalahal V.
Doyle R.
Emre Ozer
Ghahroodi M. M.
Ginosar R.
Gizopoulos D.
Gregoire Gimenez
Hans-Ulrich Zurek
Hargrove M. J.
Hillman R.
Hjorth M.
Iturbe X.
Jean-Luc Poupat
Johnston A. H.
Kanekal S.
Koebel F.
Koopman P.
Kuschel T.
Poupat J. L.
Resch S.
Rudolph D.
Tech Infineon
Venu B.
Wilson C.
Xabier Iturbe
Publication venue: Association for Computing Machinery (ACM)

Crossref

Self-Test Mechanisms for Automotive Multi-Processor System-on-Chips

Author: FLORIDIA ANDREA
Publication venue: country:Italy
Publication date: 23/09/2021

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)