726 research outputs found
Microprocessor fault-tolerance via on-the-fly partial reconfiguration
This paper presents a novel approach to exploit FPGA dynamic partial reconfiguration to improve the fault tolerance of complex microprocessor-based systems, with no need to statically reserve area to host redundant components. The proposed method not only improves the survivability of the system by allowing the online replacement of defective key parts of the processor, but also provides performance graceful degradation by executing in software the tasks that were executed in hardware before a fault and the subsequent reconfiguration happened. The advantage of the proposed approach is that thanks to a hardware hypervisor, the CPU is totally unaware of the reconfiguration happening in real-time, and there's no dependency on the CPU to perform it. As proof of concept a design using this idea has been developed, using the LEON3 open-source processor, synthesized on a Virtex 4 FPG
Advanced flight control system study
The architecture, requirements, and system elements of an ultrareliable, advanced flight control system are described. The basic criteria are functional reliability of 10 to the minus 10 power/hour of flight and only 6 month scheduled maintenance. A distributed system architecture is described, including a multiplexed communication system, reliable bus controller, the use of skewed sensor arrays, and actuator interfaces. Test bed and flight evaluation program are proposed
Intelligent redundant actuation system requirements and preliminary system design
Several redundant actuation system configurations were designed and demonstrated to satisfy the stringent operational requirements of advanced flight control systems. However, this has been accomplished largely through brute force hardware redundancy, resulting in significantly increased computational requirements on the flight control computers which perform the failure analysis and reconfiguration management. Modern technology now provides powerful, low-cost microprocessors which are effective in performing failure isolation and configuration management at the local actuator level. One such concept, called an Intelligent Redundant Actuation System (IRAS), significantly reduces the flight control computer requirements and performs the local tasks more comprehensively than previously feasible. The requirements and preliminary design of an experimental laboratory system capable of demonstrating the concept and sufficiently flexible to explore a variety of configurations are discussed
Using embedded hardware monitor cores in critical computer systems
The integration of FPGA devices in many different architectures and services
makes monitoring and real time detection of errors an important concern in FPGA
system design. A monitor is a tool, or a set of tools, that facilitate analytic
measurements in observing a given system. The goal of these observations is
usually the performance analysis and optimisation, or the surveillance of the system.
However, System-on-Chip (SoC) based designs leave few points to attach external
tools such as logic analysers. Thus, an embedded error detection core that allows
observation of critical system nodes (such as processor cores and buses) should
enforce the operation of the FPGA-based system, in order to prevent system
failures. The core should not interfere with system performance and must ensure
timely detection of errors.
This thesis is an investigation onto how a robust hardware-monitoring module
can be efficiently integrated in a target PCI board (with FPGA-based application processing
features) which is part of a critical computing system. [Continues.
EuFRATE: European FPGA Radiation-hardened Architecture for Telecommunications
The EuFRATE project aims to research, develop and test radiation-hardening methods for telecommunication
payloads deployed for Geostationary-Earth Orbit (GEO) using Commercial-Off-The-Shelf Field Programmable Gate Arrays
(FPGAs). This project is conducted by Argotec Group (Italy) with the collaboration of two partners: Politecnico di Torino
(Italy) and Technische Universit¨at Dresden (Germany). The idea of the project focuses on high-performance telecommunication
algorithms and the design and implementation strategies for connecting an FPGA device into a robust and efficient cluster
of multi-FPGA systems. The radiation-hardening techniques currently under development are addressing both device and
cluster levels, with redundant datapaths on multiple devices, comparing the results and isolating fatal errors. This paper
introduces the current state of the project’s hardware design description, the composition of the FPGA cluster node, the
proposed cluster topology, and the radiation hardening techniques. Intermediate stage experimental results of the FPGA
communication layer performance and fault detection techniques are presented. Finally, a wide summary of the project’s impact
on the scientific community is provided
Autonomous spacecraft maintenance study group
A plan to incorporate autonomous spacecraft maintenance (ASM) capabilities into Air Force spacecraft by 1989 is outlined. It includes the successful operation of the spacecraft without ground operator intervention for extended periods of time. Mechanisms, along with a fault tolerant data processing system (including a nonvolatile backup memory) and an autonomous navigation capability, are needed to replace the routine servicing that is presently performed by the ground system. The state of the art fault handling capabilities of various spacecraft and computers are described, and a set conceptual design requirements needed to achieve ASM is established. Implementations for near term technology development needed for an ASM proof of concept demonstration by 1985, and a research agenda addressing long range academic research for an advanced ASM system for 1990s are established
Fault Tolerant Electronic System Design
Due to technology scaling, which means reduced transistor size, higher density, lower voltage and more aggressive clock frequency, VLSI devices may become more sensitive against soft errors. Especially for those devices used in safety- and mission-critical applications, dependability and reliability are becoming increasingly important constraints during the development of system on/around them. Other phenomena (e.g., aging and wear-out effects) also have negative impacts on reliability of modern circuits. Recent researches show that even at sea level, radiation particles can still induce soft errors in electronic systems.
On one hand, processor-based system are commonly used in a wide variety of applications, including safety-critical and high availability missions, e.g., in the automotive, biomedical and aerospace domains. In these fields, an error may produce catastrophic consequences. Thus, dependability is a primary target that must be achieved taking into account tight constraints in terms of cost, performance, power and time to market. With standards and regulations (e.g., ISO-26262, DO-254, IEC-61508) clearly specify the targets to be achieved and the methods to prove their achievement, techniques working at system level are particularly attracting.
On the other hand, Field Programmable Gate Array (FPGA) devices are becoming more and more attractive, also in safety- and mission-critical applications due to the high performance, low power consumption and the flexibility for reconfiguration they provide. Two types of FPGAs are commonly used, based on their configuration memory cell technology, i.e., SRAM-based and Flash-based FPGA. For SRAM-based FPGAs, the SRAM cells of the configuration memory highly susceptible to radiation induced effects which can leads to system failure; and for Flash-based FPGAs, even though their non-volatile configuration memory cells are almost immune to Single Event Upsets induced by energetic particles, the floating gate switches and the logic cells in the configuration tiles can still suffer from Single Event Effects when hit by an highly charged particle. So analysis and mitigation techniques for Single Event Effects on FPGAs are becoming increasingly important in the design flow especially when reliability is one of the main requirements
Fault-tolerant fpga for mission-critical applications.
One of the devices that play a great role in electronic circuits design, specifically safety-critical design applications, is Field programmable Gate Arrays (FPGAs). This is because of its high performance, re-configurability and low development cost. FPGAs are used in many applications such as data processing, networks, automotive, space and industrial applications. Negative impacts on the reliability of such applications result from moving to smaller feature sizes in the latest FPGA architectures. This increases the need for fault-tolerant techniques to improve reliability and extend system lifetime of FPGA-based applications. In this thesis, two fault-tolerant techniques for FPGA-based applications are proposed with a built-in fault detection region. A low cost fault detection scheme is proposed for detecting faults using the fault detection region used in both schemes. The fault detection scheme primarily detects open faults in the programmable interconnect resources in the FPGAs. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can be detected. For fault recovery, each scheme has its own fault recovery approach. The first approach uses a spare module and a 2-to-1 multiplexer to recover from any fault detected. On the other hand, the second approach recovers from any fault detected using the property of Partial Reconfiguration (PR) in the FPGAs. It relies on identifying a Partially Reconfigurable block (P_b) in the FPGA that is used in the recovery process after the first faulty module is identified in the system. This technique uses only one location to recover from faults in any of the FPGA’s modules and the FPGA interconnects. Simulation results show that both techniques can detect and recover from open faults. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can also be detected. Finally, both techniques require low area overhead
NASA Formal Methods Workshop, 1990
The workshop brought together researchers involved in the NASA formal methods research effort for detailed technical interchange and provided a mechanism for interaction with representatives from the FAA and the aerospace industry. The workshop also included speakers from industry to debrief the formal methods researchers on the current state of practice in flight critical system design, verification, and certification. The goals were: define and characterize the verification problem for ultra-reliable life critical flight control systems and the current state of practice in industry today; determine the proper role of formal methods in addressing these problems, and assess the state of the art and recent progress toward applying formal methods to this area
Trading-off reliability and performance in FPGA-based reconfigurable heterogeneous systems
Recent years have witnessed the rapid growth of heterogeneous systems, composed of CPUs and hardware accelerators, to face up the constant increase of computational performance demand of digital systems. In this scenario, FPGAs offer the possibility to implement high performance reconfigurable accelerators, able to speed up the intrinsically parallel portions of applications. The study of reconfigurable heterogeneous systems is still maturing and, while some contributions about performance and power consumption are available, in literature there are few works addressing reliability. This paper analyzes reconfigurable heterogeneous systems in presence of permanent faults occurring in the FPGA. In this context, a reconfigurable heterogeneous architecture, including a Run Time Manager responsible for the communication of software tasks and the FPGA, the scheduling and the placement of hardware tasks, is presented. In addition, the paper introduces a reconfigurable heterogeneous system simulator for the proposed architecture. This simulator is able to evaluate during the design phase the degradation of the system performance due to permanent faults and allows to explore the design space dimensions efficiently
- …