502 research outputs found

    An On-line BIST RAM Architecture with Self Repair Capabilities

    Get PDF
    The emerging field of self-repair computing is expected to have a major impact on deployable systems for space missions and defense applications, where high reliability, availability, and serviceability are needed. In this context, RAM (random access memories) are among the most critical components. This paper proposes a built-in self-repair (BISR) approach for RAM cores. The proposed design, introducing minimal and technology-dependent overheads, can detect and repair a wide range of memory faults including: stuck-at, coupling, and address faults. The test and repair capabilities are used on-line, and are completely transparent to the external user, who can use the memory without any change in the memory-access protocol. Using a fault-injection environment that can emulate the occurrence of faults inside the module, the effectiveness of the proposed architecture in terms of both fault detection and repairing capability was verified. Memories of various sizes have been considered to evaluate the area-overhead introduced by this proposed architectur

    A Review paper on the Memory Built-In Self-Repair with Redundancy Logic

    Full text link
    The Present review paper expresses the word oriented memory test methodology for Built-In Self-Repair (BISR). To replace the defect words few logics are introduced. These logics are memory BIST logic and Wrapper logic. Whenever a test is carries on, the defected words are pointed out by its address only and these addresses are called failing address. The failing addresses are stored in the fuse box. Using fuse box it avoids the classic redundancy concept, where the RAMS has spare rows and columns. After the detection of faulty address, they are stored in redundancy logic. During test and redundancy configuration, the fuse box is connected to a scan registernbsp by this processnbsp inputnbsp and output data can be evaluated

    Interoperability between a dynamic reliability modeling and a Systems Engineering process – Principles and Case Study

    Get PDF
    International audienceIndustrial systems are often large, and complex, in terms of structure, dynamic interactions between subsystems and components, dynamic operational environment, ageing, etc. The dynamic reliability approach is a convenient framework to model the behavior of such systems. However, there is a price to pay, e.g. in terms of amount of data, size of state graphs, volume of reliability calculations, and combination of various engineering activities. A sound Systems Engineering process, benefiting from the improvement of most recent tools, may be a fruitful approach to decrease these difficulties. Although feasibility demonstrations have been done for conventional, static, approaches of dependability, interoperability between dynamic reliability modeling and Systems Engineering has not the same maturity level. The article explains how, on the basis of Systems Engineering (SE) process definitions, a Meta-model defines a framework for integrating the safety into SE processes. It supports a "hub automaton", that is the key element for interoperability with the tools and activities required for a dynamic reliability assessment. The case study is the dynamic assessment of availability of a feed-water control system in a power plant steam generator, presented in previous articles

    Memory built-in self-repair and correction for improving yield: a review

    Get PDF
    Nanometer memories are highly prone to defects due to dense structure, necessitating memory built-in self-repair as a must-have feature to improve yield. Today’s system-on-chips contain memories occupying an area as high as 90% of the chip area. Shrinking technology uses stricter design rules for memories, making them more prone to manufacturing defects. Further, using 3D-stacked memories makes the system vulnerable to newer defects such as those coming from through-silicon-vias (TSV) and micro bumps. The increased memory size is also resulting in an increase in soft errors during system operation. Multiple memory repair techniques based on redundancy and correction codes have been presented to recover from such defects and prevent system failures. This paper reviews recently published memory repair methodologies, including various built-in self-repair (BISR) architectures, repair analysis algorithms, in-system repair, and soft repair handling using error correcting codes (ECC). It provides a classification of these techniques based on method and usage. Finally, it reviews evaluation methods used to determine the effectiveness of the repair algorithms. The paper aims to present a survey of these methodologies and prepare a platform for developing repair methods for upcoming-generation memories

    Fault Tree Analysis: a survey of the state-of-the-art in modeling, analysis and tools

    Get PDF
    Fault tree analysis (FTA) is a very prominent method to analyze the risks related to safety and economically critical assets, like power plants, airplanes, data centers and web shops. FTA methods comprise of a wide variety of modelling and analysis techniques, supported by a wide range of software tools. This paper surveys over 150 papers on fault tree analysis, providing an in-depth overview of the state-of-the-art in FTA. Concretely, we review standard fault trees, as well as extensions such as dynamic FT, repairable FT, and extended FT. For these models, we review both qualitative analysis methods, like cut sets and common cause failures, and quantitative techniques, including a wide variety of stochastic methods to compute failure probabilities. Numerous examples illustrate the various approaches, and tables present a quick overview of results

    Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment

    Get PDF
    A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect\u27s role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control

    Low-overhead fault-tolerant logic for field-programmable gate arrays

    Get PDF
    While allowing for the fabrication of increasingly complex and efficient circuitry, transistor shrinkage and count-per-device expansion have major downsides: chiefly increased variation, degradation and fault susceptibility. For this reason, design-time consideration of faults will have to be given to increasing numbers of electronic systems in the future to ensure yields, reliabilities and lifetimes remain acceptably high. Many mathematical operators commonly accelerated in hardware are suited to modification resulting in datapath error detection and correction capabilities with far lower area, performance and/or power consumption overheads than those incurred through the utilisation of more established, general-purpose fault tolerance methods such as modular redundancy. Field-programmable gate arrays are uniquely placed to allow further area savings to be made thanks to their dynamic reconfigurability. The majority of the technical work presented within this thesis is based upon a benchmark hardware accelerator---a matrix multiplier---that underwent several evolutions in order to detect and correct faults manifesting along its datapath at runtime. In the first instance, fault detectability in excess of 99% was achieved in return for 7.87% additional area and 45.5% extra latency. In the second, the ability to correct errors caused by those faults was added at the cost of 4.20% more area, while 50.7% of this---and 46.2% of the previously incurred latency overhead---was removed through the introduction of partial reconfiguration in the third. The fourth demonstrates further reductions in both area and performance overheads---of 16.7% and 8.27%, respectively---through systematic data width reduction by allowing errors of less than ±0.5% of the maximum output value to propagate.Open Acces

    Supporting group maintenance through prognostics-enhanced dynamic dependability prediction

    Get PDF
    Condition-based maintenance strategies adapt maintenance planning through the integration of online condition monitoring of assets. The accuracy and cost-effectiveness of these strategies can be improved by integrating prognostics predictions and grouping maintenance actions respectively. In complex industrial systems, however, effective condition-based maintenance is intricate. Such systems are comprised of repairable assets which can fail in different ways, with various effects, and typically governed by dynamics which include time-dependent and conditional events. In this context, system reliability prediction is complex and effective maintenance planning is virtually impossible prior to system deployment and hard even in the case of condition-based maintenance. Addressing these issues, this paper presents an online system maintenance method that takes into account the system dynamics. The method employs an online predictive diagnosis algorithm to distinguish between critical and non-critical assets. A prognostics-updated method for predicting the system health is then employed to yield well-informed, more accurate, condition-based suggestions for the maintenance of critical assets and for the group-based reactive repair of non-critical assets. The cost-effectiveness of the approach is discussed in a case study from the power industry

    Addressing Complexity and Intelligence in Systems Dependability Evaluation

    Get PDF
    Engineering and computing systems are increasingly complex, intelligent, and open adaptive. When it comes to the dependability evaluation of such systems, there are certain challenges posed by the characteristics of “complexity” and “intelligence”. The first aspect of complexity is the dependability modelling of large systems with many interconnected components and dynamic behaviours such as Priority, Sequencing and Repairs. To address this, the thesis proposes a novel hierarchical solution to dynamic fault tree analysis using Semi-Markov Processes. A second aspect of complexity is the environmental conditions that may impact dependability and their modelling. For instance, weather and logistics can influence maintenance actions and hence dependability of an offshore wind farm. The thesis proposes a semi-Markov-based maintenance model called “Butterfly Maintenance Model (BMM)” to model this complexity and accommodate it in dependability evaluation. A third aspect of complexity is the open nature of system of systems like swarms of drones which makes complete design-time dependability analysis infeasible. To address this aspect, the thesis proposes a dynamic dependability evaluation method using Fault Trees and Markov-Models at runtime.The challenge of “intelligence” arises because Machine Learning (ML) components do not exhibit programmed behaviour; their behaviour is learned from data. However, in traditional dependability analysis, systems are assumed to be programmed or designed. When a system has learned from data, then a distributional shift of operational data from training data may cause ML to behave incorrectly, e.g., misclassify objects. To address this, a new approach called SafeML is developed that uses statistical distance measures for monitoring the performance of ML against such distributional shifts. The thesis develops the proposed models, and evaluates them on case studies, highlighting improvements to the state-of-the-art, limitations and future work

    Strategies for Optimising DRAM Repair

    Get PDF
    Dynamic Random Access Memories (DRAM) are large complex devices, prone to defects during manufacture. Yield is improved by the provision of redundant structures used to repair these defects. This redundancy is often implemented by the provision of excess memory capacity and programmable address logic allowing the replacement of faulty cells within the memory array. As the memory capacity of DRAM devices has increased, so has the complexity of their redundant structures, introducing increasingly complex restrictions and interdependencies upon the use of this redundant capacity. Currently redundancy analysis algorithms solving the problem of optimally allocating this redundant capacity must be manually customised for each new device. Compromises made to reduce the complexity, and human error, reduce the efficacy of these algorithms. This thesis develops a methodology for automating the customisation of these redundancy analysis algorithms. Included are: a modelling language describing the redundant structures (including the restrictions and interdependencies placed upon their use), algorithms manipulating this model to generate redundancy analysis algorithms, and methods for translating those algorithms into executable code. Finally these concepts are used to develop a prototype software tool capable of generating redundancy analysis algorithms customised for a specified device
    • …
    corecore