289 research outputs found

Microprocessor fault-tolerance via on-the-fly partial reconfiguration

Author: Di Carlo Stefano
Miele Andrea
Prinetto Paolo Ernesto
Trapanese Antonio
Publication venue: IEEE Computer Society
Publication date: 01/01/2010

This paper presents a novel approach that exploits FPGA dynamic partial reconfiguration to improve the fault tolerance of complex microprocessor-based systems, with no need to statically reserve area for redundant components. The proposed method not only improves the survivability of the system by allowing defective key parts of the processor to be replaced online, but also provides graceful performance degradation by executing in software the tasks that ran in hardware before the fault and the subsequent reconfiguration. Thanks to a hardware hypervisor, the CPU is entirely unaware of the reconfiguration taking place in real time, and the reconfiguration does not depend on the CPU. As a proof of concept, a design based on this idea has been developed using the LEON3 open-source processor, synthesized on a Virtex 4 FPGA

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A Framework for implementing radiation-tolerant circuits on reconfigurable FPGAs

Author: Gustavo R. Alves
José M. Ferreira
Luís F. Lemos
Manuel G. Gericota
Publication date: 01/01/2006

The outstanding versatility of SRAM-based FPGAs makes them the preferred choice for implementing complex customizable circuits. To increase the amount of logic available, manufacturers are using nanometric technologies to boost logic density and reduce prices. However, the use of nanometric scales also makes FPGAs particularly vulnerable to radiation-induced faults, especially because of the growing number of configuration memory cells needed to define their functionality. This paper describes a framework for implementing circuits immune to radiation-induced faults, based on a customized Triple Modular Redundancy (TMR) infrastructure and on a detection-and-fix controller. This controller is responsible for detecting data inconsistencies, locating the faulty module and restoring the original configuration, without affecting the normal operation of the mission logic. A short survey of the most recent published data on the impact of radiation-induced faults in FPGAs is presented to support the assumptions underlying the proposed framework. A detailed explanation of the controller functionality is also provided, followed by an experimental case study
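
The detect, locate and restore sequence described in this abstract can be illustrated with a minimal software sketch. It is not the authors' controller: the function names `tmr_vote` and `detect_and_fix`, and the idea of modelling each module's configuration as a list entry that is overwritten with a golden copy, are assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def tmr_vote(outputs: Sequence[int]) -> Tuple[int, Optional[int]]:
    """Majority-vote the outputs of three redundant modules.

    Returns (voted_value, index_of_disagreeing_module), where the index
    is None when all three modules agree. Raises ValueError when no two
    modules agree, because a single voter cannot mask that case.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three module outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count == 3:
        return value, None
    if count == 2:
        # Exactly one module disagrees with the majority: locate it.
        faulty = next(i for i, out in enumerate(outputs) if out != value)
        return value, faulty
    raise ValueError("no majority: all three outputs differ")


def detect_and_fix(
    outputs: Sequence[int], golden_config: str, live_configs: List[str]
) -> Tuple[int, Optional[int]]:
    """Hypothetical controller step: vote, locate, then restore.

    Restoring is modelled by overwriting the faulty module's entry in
    live_configs with the golden configuration. The voted value is
    returned either way, so the consumer of the output is not disturbed.
    """
    voted, faulty = tmr_vote(outputs)
    if faulty is not None:
        live_configs[faulty] = golden_config
    return voted, faulty
```

For example, `detect_and_fix([1, 0, 1], "golden", configs)` returns the voted value `1` together with index `1`, and rewrites `configs[1]` with the golden copy.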

Repositório Científico do Instituto Politécnico do Porto

Repositório Aberto da Universidade do Porto

Analyse und Erweiterung eines fehler-toleranten NoC für SRAM-basierte FPGAs in Weltraumapplikationen (Analysis and Extension of a Fault-Tolerant NoC for SRAM-Based FPGAs in Space Applications)

Author: Bubenhagen Frank
Publication date: 01/01/2019

Data Processing Units for scientific space missions need to process ever higher volumes of data and perform ever more complex calculations. However, the performance of available space-qualified general-purpose processors is only in the lower three-digit megahertz range, which is already insufficient for some applications. As an alternative, suitable processing steps can be implemented in hardware on a space-qualified SRAM-based FPGA. Such devices, however, are susceptible to space radiation. At the Institute for Communication and Network Engineering, a fault-tolerant, network-based communication architecture was developed that enables processing chains to be built from different processing modules within suitable SRAM-based FPGAs and also allows individual processing modules to be exchanged at runtime. The communication architecture and its protocol are meant to isolate modules with no or only partial SEU mitigation when they are affected by radiation-induced faults, so that errors do not propagate into the rest of the System-on-Chip. In the context of an ESA study, this communication architecture was extended with further components and implemented on a representative hardware platform. Based on the experience gained during the study, this work analyses the actual fault-tolerance characteristics as well as the weak points of this initial implementation. At appropriate locations, the communication architecture was extended with mechanisms for fault detection and fault differentiation as well as with a hardware-based monitoring solution. Both these measures and the extension of the hardware platform with selective fault-injection capabilities, which emulate radiation-induced faults within critical areas of a processing module without SEU mitigation, are used to evaluate the effects of radiation-induced faults within the communication architecture.
Based on the gathered results, further measures for the fast detection and isolation of faulty nodes are developed, selectively implemented and verified. In particular, the ability of the communication architecture to isolate network nodes without SEU mitigation was significantly improved

Digitale Bibliothek Braunschweig

EuFRATE: European FPGA Radiation-hardened Architecture for Telecommunications

Author: Azimi Sarah
Bozzoli Ludovica
Catanese Antonio
Charaf Najdet
Fazzoletto Emilio
Goehringer Diana
Kalms Lester
King Stephen
La Greca Salvatore Gabriele
Merodio Cordinachs David
Pertuz Sergio
Rizzieri Daniele
Scarpa Eugenio
Sterpone Luca
Wulf Cornelia
Publication venue: IEEE
Publication date: 01/01/2023

The EuFRATE project aims to research, develop and test radiation-hardening methods for telecommunication payloads deployed in Geostationary Earth Orbit (GEO) using Commercial-Off-The-Shelf Field Programmable Gate Arrays (FPGAs). The project is conducted by Argotec Group (Italy) in collaboration with two partners: Politecnico di Torino (Italy) and Technische Universität Dresden (Germany). The project focuses on high-performance telecommunication algorithms and on the design and implementation strategies for connecting an FPGA device into a robust and efficient cluster of multi-FPGA systems. The radiation-hardening techniques currently under development address both the device and the cluster level, with redundant datapaths on multiple devices, comparing the results and isolating fatal errors. This paper introduces the current state of the project's hardware design description, the composition of the FPGA cluster node, the proposed cluster topology, and the radiation-hardening techniques. Intermediate experimental results on the performance of the FPGA communication layer and on the fault detection techniques are presented. Finally, a broad summary of the project's impact on the scientific community is provided

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Radiation Induced Fault Detection, Diagnosis, and Characterization of Field Programmable Gate Arrays

Author: Getz Thomas B.
Publication venue: AFIT Scholar
Publication date: 11/03/2011

The development of Field Programmable Gate Arrays (FPGAs) has been a great achievement in the world of micro-electronics. One of these devices can be programmed to do just about anything, replacing the need for thousands of individual specialized devices. Despite their great versatility, FPGAs are still extremely vulnerable to radiation from cosmic waves in space and from adversaries on the ground. Extensive research has been conducted to examine how radiation disrupts different types of FPGAs. The results show, unfortunately, that newer FPGAs with smaller technology are even more susceptible to radiation damage than older ones. This research incorporates and enhances current methods of radiation detection. The design consists of 15 sensor networks that each have 29 sensors. The sensors are simple inverters, but they can detect flipped bits and delay errors caused by radiation. Analyzers process the output of each sensor to determine whether the value agrees with what is expected. This information is fed to a reporter that creates an easy-to-read output describing which network the fault is in, what type of fault is present, how many faults are in the network, how long they have been there, and the percent slowdown if it is a delay issue. Each network reports any fault data to the computer screen in real time. This design does need some improvement, but once those improvements are made and tested, this system can be incorporated with FPGA reconfiguration methods that automatically place application logic away from failing areas of the FPGA. This system has great potential to become a great tool in fault mitigation

AFIT Scholar (Air Force Institute of Technology)

Novel hardware verification methods for FPGAs

Author: Kourfali Alexandra
Publication date: 01/01/2019

Ghent University Academic Bibliography

Fault Tolerant Nanosatellite Computing on a Budget

Author: Cheng Yu-Min
Chou Pai
Fuchs Christian M.
Furano Gianluca
Holst Stefan
Liou Jing-Jia
Lu Shyue-Kung
Magistrati Giorgio
Marinis Kostas
Murillo Nadia M.
Plaat Aske
Tavoularis Antonios
Wen Xiaoqing
Publication venue: DigitalCommons@USU
Publication date: 05/08/2019

In this contribution, we present a CubeSat-compatible on-board computer (OBC) architecture that offers strong fault tolerance to enable the use of such spacecraft in critical and long-term missions. We describe in detail the design of our OBC's breadboard setup, and document its composition from the component level all the way down to the software level. Fault tolerance in this OBC is achieved without resorting to radiation hardening, purely through intelligent software. The OBC ages gracefully, and makes use of FPGA reconfiguration and mixed criticality. It can dynamically adapt to changing performance requirements throughout a space mission. We developed a proof of concept with several Xilinx Ultrascale and Ultrascale+ FPGAs. With the smallest Kintex Ultrascale+ KU3P device, we achieve 1.94 W total power consumption at 300 MHz, well within the power budget range of current 2U CubeSats. To our knowledge, this is the first scalable, COTS-based and widely reproducible OBC solution that can offer strong fault coverage even for small CubeSats. To reproduce this OBC architecture, no custom-written, proprietary, or protected IP is needed, and the required design tools are available free of charge to academics. All COTS components required to construct this architecture can be purchased on the open market, and are affordable even for academic and scientific CubeSat developers

DigitalCommons@USU

Robust configurable system design with built-in self-healing

Author: Gustavo Alves
José Ferreira
Manuel Gericota
Publication date: 01/01/2005

The new generations of SRAM-based FPGA (Field Programmable Gate Array) devices, built on nanometre technology, are the preferred choice for the implementation of reconfigurable computing platforms. However, their vulnerability to hard and soft errors is a major weakness for robust system design based on FPGAs. In this paper, a novel Built-In Self-Healing (BISH) methodology, based on modular redundancy and on self-reconfiguration, is proposed. A soft microprocessor core implemented in the FPGA is responsible for the management and execution of all the BISH procedures. Fault detection and diagnosis are followed by repair actions that take advantage of the self-configuration features. Meanwhile, modular redundancy ensures that the system still works correctly. This approach leads to a robust system design able to ensure high reliability, availability and data integrity

Repositório Científico do Instituto Politécnico do Porto

Repositório Aberto da Universidade do Porto

An Adaptive Modular Redundancy Technique to Self-regulate Availability, Area, and Energy Consumption in Mission-critical Applications

Author: Al-Haddad Rawad N.
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2011

As reconfigurable devices' capacities and the complexity of applications that use them increase, the need for self-reliance of deployed systems becomes increasingly prominent. A Sustainable Modular Adaptive Redundancy Technique (SMART) composed of a dual-layered organic system is proposed, analyzed, implemented, and experimentally evaluated. SMART relies upon a variety of self-regulating properties to control availability, energy consumption, and area used, in dynamically changing environments that require a high degree of adaptation. The hardware layer is implemented on a Xilinx Virtex-4 Field Programmable Gate Array (FPGA) to provide self-repair using a novel approach called a Reconfigurable Adaptive Redundancy System (RARS). The software layer supervises the organic activities within the FPGA and extends the self-healing capabilities through application-independent, intrinsic, evolutionary repair techniques that leverage the benefits of dynamic Partial Reconfiguration (PR). A SMART prototype is evaluated using a Sobel edge detection application. This prototype is shown to remain sustainable under stressful transient and permanent fault injection while still reducing energy consumption and area requirements. An Organic Genetic Algorithm (OGA) technique is shown to be capable of consistently repairing hard faults while maintaining correct edge detector outputs, by exploiting spatial redundancy in the reconfigurable hardware. A Monte Carlo driven Continuous-Time Markov Chain (CTMC) simulation is conducted to compare SMART's availability to industry-standard Triple Modular Redundancy (TMR) techniques. Based on nine use cases, parameterized with realistic fault and repair rates acquired from publicly available sources, the results indicate that availability is significantly enhanced by the adoption of fast repair techniques targeting aging-related hard faults.
Under harsh environments, SMART is shown to improve system availability from 36.02% with lengthy repair techniques to 98.84% with fast ones. This value increases to five nines (99.9998%) under relatively more favorable conditions. Lastly, SMART is compared to twenty-eight standard TMR benchmarks that are generated by the widely accepted BL-TMR tools. Results show that in seven out of nine use cases, SMART is the recommended technique, with power savings ranging from 22% to 29%, and area savings ranging from 17% to 24%, while still maintaining the same level of availability
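
The link between repair speed and availability reported in this abstract can be illustrated with a much simpler model than the one used in the thesis. The sketch below assumes a two-state (up/down) continuous-time Markov chain with a constant fault rate and a constant repair rate, for which the steady-state availability is repair_rate / (fault_rate + repair_rate). The function names and the rates in the usage note are illustrative assumptions, not values from the thesis.

```python
import random


def steady_state_availability(fault_rate: float, repair_rate: float) -> float:
    """Closed-form availability of a two-state up/down CTMC."""
    if fault_rate < 0 or repair_rate <= 0:
        raise ValueError("fault_rate must be >= 0 and repair_rate > 0")
    return repair_rate / (fault_rate + repair_rate)


def simulate_availability(
    fault_rate: float, repair_rate: float, cycles: int = 20000, seed: int = 0
) -> float:
    """Monte Carlo estimate of availability for the same two-state model.

    Each cycle draws an exponentially distributed time to failure and an
    exponentially distributed time to repair. The estimate is the
    fraction of total simulated time spent in the up state.
    """
    if fault_rate <= 0 or repair_rate <= 0 or cycles <= 0:
        raise ValueError("rates and cycles must be positive")
    rng = random.Random(seed)
    up_time = 0.0
    down_time = 0.0
    for _ in range(cycles):
        up_time += rng.expovariate(fault_rate)
        down_time += rng.expovariate(repair_rate)
    return up_time / (up_time + down_time)
```

With an assumed fault rate of 1 per unit time, raising the repair rate from 4 to 99 moves the closed-form availability from 0.8 to 0.99, which shows qualitatively why faster repair improves availability. The simulated estimate should approach the closed-form value as `cycles` grows.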

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)