Search CORE

1,601 research outputs found

Low latency reconfiguration mechanism for fine-grained processor internal functional units

Author: Nolte Jörg
Segabinazzi Ferreira Raphael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 27/11/2019
Field of study

The strive for performance, low power consumption, and less chip area have been diminishing the reliability and the time to fault occurrences due to wear out of electronic devices. Recent research has shown that functional units within processors usually execute a different amount of operations when running programs. Therefore, these units present different individual wear out during their lifetime. Most existent schemes for re-configuration of processors due to fault detection and other processor parameters are done at the level of cores which is a costly way to achieve redundancy. This paper presents a low latency (approximately 1 clock cycle) software controlled mechanism to reconfigure units within processor cores according to predefined parameters. Such reconfiguration capability delivers features like wear out balance of processor functional units, configuration of units according to the criticality of tasks running on an operating system and configurations to gain in performance (e.g. parallel execution) when possible. The focus of this paper is to show the implemented low latency reconfiguration mechanism and highlight its possible main features

Crossref

Digitales Repositorium der BTU Cottbus – Senftenberg

Fault-tolerant computer study

Author: Avizienis A. A.
Ercegovac M. D.
Rennels D. A.
Publication venue
Publication date
Field of study

A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed

NASA Technical Reports Server

Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions

Author: FEDERICO REGHENZANI
WILLIAM FORNACIARI
ZHISHAN GUO
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2023
Field of study

Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of the hardware components, their increasing complexity, and the necessity to reduce the costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve the system dependability to hardware faults without resorting to custom hardware solutions. However, these come at the expense of making the satisfaction of the timing constraints of the applications/activities harder from a scheduling standpoint. This paper surveys the current state of the art of fault tolerance approaches when used in the context real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions among software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions with the final aim to spur the future research activities

Archivio istituzionale della ricerca - Politecnico di Milano

Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

Author: D'Angelo Gabriele
Ferretti Stefano
Marzolla Moreno
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Urbino

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

An Approach to Cost-Efficient Fault Tolerance in Inherently Redundant Fail-Operational Systems

Author: Becker Jürgen
Dörr Tobias
Friederich Patrick
Leitner Arnd
Sandmann Timo
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 01/01/2020
Field of study

Crossref

KITopen

EnSuRe: Energy & Accuracy Aware Fault-tolerant Scheduling on Real-time Heterogeneous Systems

Author: Adetomi Adewale
Arslan Tughrul
Ehsan Shoaib
Kasap Server
McDonald-Maier Klaus
Saha Sangeet
Zhai Xiaojun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/06/2021
Field of study

This paper proposes an energy efficient real-time scheduling strategy called EnSuRe, which (i) executes real-time tasks on low power consuming primary processors to enhance the system accuracy by maintaining the deadline and (ii) provides reliability against a fixed number of transient faults by selectively executing backup tasks on high power consuming backup processor. Simulation results reveal that EnSuRe consumes nearly 25% less energy, compared to existing techniques, while satisfying the fault tolerance requirements. EnSuRe is also able to achieve 75% system accuracy with 50% system utilisation. Further, the obtained simulation outcomes are validated on benchmark tasks via a fault injection framework on Xilinx ZYNQ APSoC heterogeneous dual core platform

University of Essex Research Repository

Southampton (e-Prints Soton)

Coventry University Pure Portal

Performance Evaluation of Scheduling Algorithms for Real Time Cloud Computing Systems

Author: Krishna Varri Murali
Publication venue
Publication date: 01/05/2015
Field of study

Cloud computing shares data and oers services transparently among its users. With the increase in number of users of cloud the tasks to be scheduled increases. The performance of cloud depends on the task scheduling algorithms used in the scheduling components or brokering components. Scheduling of tasks on cloud computing systems is one of the research problem, Where the matching of machines and completion time of the tasks are considered. Tasks matching of machines problem is that, assume number of active hosts are Y, number of VMs in each host are Z. Maximum number of possible Virtual Machines(VMs) to schedule a single task is (y*z). If we need to schedule X tasks, number of possibilities are (y *z)^x. So scheduling of tasks is NP Hard problem. NP Hard means this scheduling of tasks on VMs not having polynomial time complexity, but it may have algorithm for verifying solution. Fault-tolerance becomes an important key to establish dependability in cloud computing system. In task scheduling, if task not completed in it's deadline ,then it is one type of fault in scheduling of tasks. In this thesis this type of faults are taken and try to overcome it. In this thesis we present a non-preemptive scheduling algorithm, By inserting the ideal time for postponing the task by ensuring the other task will completes its execution with in the deadline. In simulation the proposed algorithm maximizes the prot of 25%, throughput of 25% and minimizes the penalty of 20% over EDF

ethesis@nitr