39 research outputs found

    Fault tolerant methods for reliability in FPGAs

    Full text link

    Reliable and Fault-Resilient Schemes for Efficient Radix-4 Complex Division

    Get PDF
    Complex division is commonly used in various applications in signal processing and control theory including astronomy and nonlinear RF measurements. Nevertheless, unless reliability and assurance are embedded into the architectures of such structures, the suboptimal (and thus erroneous) results could undermine the objectives of such applications. As such, in this thesis, we present schemes to provide complex number division architectures based on (Sweeney, Robertson, and Tocher) SRT-division with fault diagnosis mechanisms. Different fault resilient architectures are proposed in this thesis which can be tailored based on the eventual objectives of the designs in terms of area and time requirements, among which we pinpoint carefully the schemes based on recomputing with shifted operands (RESO) to be able to detect both natural and malicious faults and with proper modification achieve high throughputs. The design also implements a minimized look up table approach which favors in error detection based designs and provides high fault coverage with relatively-low overhead. Additionally, to benchmark the effectiveness of the proposed schemes, extensive fault diagnosis assessments are performed for the proposed designs through fault simulations and FPGA implementations; the design is implemented on Xilinx Spartan-VI and Xilinx Virtex-VI FPGA families

    Fault-Resilient Lightweight Cryptographic Block Ciphers for Secure Embedded Systems

    Get PDF
    The development of extremely-constrained environments having sensitive nodes such as RFID tags and nano-sensors necessitates the use of lightweight block ciphers. Indeed, lightweight block ciphers are essential for providing low-cost confidentiality to such applications. Nevertheless, providing the required security properties does not guarantee their reliability and hardware assurance when the architectures are prone to natural and malicious faults. In this thesis, considering false-alarm resistivity, error detection schemes for the lightweight block ciphers are proposed with the case study of XTEA (eXtended TEA). We note that lightweight block ciphers might be better suited for low-resource environments compared to the Advanced Encryption Standard, providing low complexity and power consumption. To the best of the author\u27s knowledge, there has been no error detection scheme presented in the literature for the XTEA to date. Three different error detection approaches are presented and according to our fault-injection simulations for benchmarking the effectiveness of the proposed schemes, high error coverage is derived. Finally, field-programmable gate array (FPGA) implementations of these proposed error detection structures are presented to assess their efficiency and overhead. The proposed error detection architectures are capable of increasing the reliability of the implementations of this lightweight block cipher. The schemes presented can also be applied to lightweight hash functions with similar structures, making the presented schemes suitable for providing reliability to their lightweight security-constrained hardware implementations

    Fault-Tolerant FPGA-Based Systems

    Get PDF
    This paper presents a new approach to on-line fault tolerance via reconfiguration for the systems mapped onto field programmable gate arrays (FPGAs). The fault detection, based on self-checking technique, is introduced at application level; therefore our approach can detect the faults of configurable logic blocks (CLBs) and routing interconnections in the FPGAs concurrently with the normal system work. A grid of tiles is projected on the FPGA structure and a certain number of spare CLBs is reserved inside every tile. The number of spare CLBs per tile, which will be used as a backup upon detecting any faulty CLB, is estimated in accordance with the probability of failure. After locating the faulty CLBs, the faulty tile will be reconfigured with avoiding the faulty CLBs. Our proposed approach uses a combination of hardware and software redundancy. We assume that a module external to the FPGA controls automatically the reconfiguration process in addition to the diagnosis process (DIRC); typically this is an embedded microprocessor having some storage for the various tile configurations. We have implemented our approach using Xilinx Virtex FPGA. The DIRC code is written in JBits software tools. In response to a component failure this approach capitalizes on the unique reconfiguration capabilities of FPGAs and replaces the affected tile with a functionally equivalent one that does not rely on the faulty component. Unlike fixed structure fault-tolerance techniques for ASICs and microprocessors, this approach allows a single physical component to provide redundant backup for several types of components

    Low-overhead fault-tolerant logic for field-programmable gate arrays

    Get PDF
    While allowing for the fabrication of increasingly complex and efficient circuitry, transistor shrinkage and count-per-device expansion have major downsides: chiefly increased variation, degradation and fault susceptibility. For this reason, design-time consideration of faults will have to be given to increasing numbers of electronic systems in the future to ensure yields, reliabilities and lifetimes remain acceptably high. Many mathematical operators commonly accelerated in hardware are suited to modification resulting in datapath error detection and correction capabilities with far lower area, performance and/or power consumption overheads than those incurred through the utilisation of more established, general-purpose fault tolerance methods such as modular redundancy. Field-programmable gate arrays are uniquely placed to allow further area savings to be made thanks to their dynamic reconfigurability. The majority of the technical work presented within this thesis is based upon a benchmark hardware accelerator---a matrix multiplier---that underwent several evolutions in order to detect and correct faults manifesting along its datapath at runtime. In the first instance, fault detectability in excess of 99% was achieved in return for 7.87% additional area and 45.5% extra latency. In the second, the ability to correct errors caused by those faults was added at the cost of 4.20% more area, while 50.7% of this---and 46.2% of the previously incurred latency overhead---was removed through the introduction of partial reconfiguration in the third. The fourth demonstrates further reductions in both area and performance overheads---of 16.7% and 8.27%, respectively---through systematic data width reduction by allowing errors of less than ±0.5% of the maximum output value to propagate.Open Acces

    Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment

    Get PDF
    A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect\u27s role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Characterisation and mitigation of long-term degradation effects in programmable logic

    No full text
    Reliability has always been an issue in silicon device engineering, but until now it has been managed by the carefully tuned fabrication process. In the future the underlying physical limitations of silicon-based electronics, plus the practical challenges of manufacturing with such complexity at such a small scale, will lead to a crunch point where transistor-level reliability must be forfeited to continue achieving better productivity. Field-programmable gate arrays (FPGAs) are built on state-of-the-art silicon processes, but it has been recognised for some time that their distinctive characteristics put them in a favourable position over application-specific integrated circuits in the face of the reliability challenge. The literature shows how a regular structure, interchangeable resources and an ability to reconfigure can all be exploited to detect, locate, and overcome degradation and keep an FPGA application running. To fully exploit these characteristics, a better understanding is needed of the behavioural changes that are seen in the resources that make up an FPGA under ageing. Modelling is an attractive approach to this and in this thesis the causes and effects are explored of three important degradation mechanisms. All are shown to have an adverse affect on FPGA operation, but their characteristics show novel opportunities for ageing mitigation. Any modelling exercise is built on assumptions and so an empirical method is developed for investigating ageing on hardware with an accelerated-life test. Here, experiments show that timing degradation due to negative-bias temperature instability is the dominant process in the technology considered. Building on simulated and experimental results, this work also demonstrates a variety of methods for increasing the lifetime of FPGA lookup tables. The pre-emptive measure of wear-levelling is investigated in particular detail, and it is shown by experiment how di fferent reconfiguration algorithms can result in a significant reduction to the rate of degradation
    corecore