1,750 research outputs found

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Reducing Library Characterization Time for Cell-aware Test while Maintaining Test Quality

    Get PDF
    Cell-aware test (CAT) explicitly targets faults caused by defects inside library cells to improve test quality, compared with conventional automatic test pattern generation (ATPG) approaches, which target faults only at the boundaries of library cells. The CAT methodology consists of two stages. Stage 1, based on dedicated analog simulation, library characterization per cell identifies which cell-level test pattern detects which cell-internal defect; this detection information is encoded in a defect detection matrix (DDM). In Stage 2, with the DDMs as inputs, cell-aware ATPG generates chip-level test patterns per circuit design that is build up of interconnected instances of library cells. This paper focuses on Stage 1, library characterization, as both test quality and cost are determined by the set of cell-internal defects identified and simulated in the CAT tool flow. With the aim to achieve the best test quality, we first propose an approach to identify a comprehensive set, referred to as full set, of potential open- and short-defect locations based on cell layout. However, the full set of defects can be large even for a single cell, making the time cost of the defect simulation in Stage 1 unaffordable. Subsequently, to reduce the simulation time, we collapse the full set to a compact set of defects which serves as input of the defect simulation. The full set is stored for the diagnosis and failure analysis. With inspecting the simulation results, we propose a method to verify the test quality based on the compact set of defects and, if necessary, to compensate the test quality to the same level as that based on the full set of defects. For 351 combinational library cells in Cadence’s GPDK045 45nm library, we simulate only 5.4% defects from the full set to achieve the same test quality based on the full set of defects. In total, the simulation time, via linear extrapolation per cell, would be reduced by 96.4% compared with the time based on the full set of defects

    Investigation into yield and reliability enhancement of TSV-based three-dimensional integration circuits

    No full text
    Three dimensional integrated circuits (3D ICs) have been acknowledged as a promising technology to overcome the interconnect delay bottleneck brought by continuous CMOS scaling. Recent research shows that through-silicon-vias (TSVs), which act as vertical links between layers, pose yield and reliability challenges for 3D design. This thesis presents three original contributions.The first contribution presents a grouping-based technique to improve the yield of 3D ICs under manufacturing TSV defects, where regular and redundant TSVs are partitioned into groups. In each group, signals can select good TSVs using rerouting multiplexers avoiding defective TSVs. Grouping ratio (regular to redundant TSVs in one group) has an impact on yield and hardware overhead. Mathematical probabilistic models are presented for yield analysis under the influence of independent and clustering defect distributions. Simulation results using MATLAB show that for a given number of TSVs and TSV failure rate, careful selection of grouping ratio results in achieving 100% yield at minimal hardware cost (number of multiplexers and redundant TSVs) in comparison to a design that does not exploit TSV grouping ratios. The second contribution presents an efficient online fault tolerance technique based on redundant TSVs, to detect TSV manufacturing defects and address thermal-induced reliability issue. The proposed technique accounts for both fault detection and recovery in the presence of three TSV defects: voids, delamination between TSV and landing pad, and TSV short-to-substrate. Simulations using HSPICE and ModelSim are carried out to validate fault detection and recovery. Results show that regular and redundant TSVs can be divided into groups to minimise area overhead without affecting the fault tolerance capability of the technique. Synthesis results using 130-nm design library show that 100% repair capability can be achieved with low area overhead (4% for the best case). The last contribution proposes a technique with joint consideration of temperature mitigation and fault tolerance without introducing additional redundant TSVs. This is achieved by reusing spare TSVs that are frequently deployed for improving yield and reliability in 3D ICs. The proposed technique consists of two steps: TSV determination step, which is for achieving optimal partition between regular and spare TSVs into groups; The second step is TSV placement, where temperature mitigation is targeted while optimizing total wirelength and routing difference. Simulation results show that using the proposed technique, 100% repair capability is achieved across all (five) benchmarks with an average temperature reduction of 75.2? (34.1%) (best case is 99.8? (58.5%)), while increasing wirelength by a small amount

    Sustainable Fault-handling Of Reconfigurable Logic Using Throughput-driven Assessment

    Get PDF
    A sustainable Evolvable Hardware (EH) system is developed for SRAM-based reconfigurable Field Programmable Gate Arrays (FPGAs) using outlier detection and group testing-based assessment principles. The fault diagnosis methods presented herein leverage throughput-driven, relative fitness assessment to maintain resource viability autonomously. Group testing-based techniques are developed for adaptive input-driven fault isolation in FPGAs, without the need for exhaustive testing or coding-based evaluation. The techniques maintain the device operational, and when possible generate validated outputs throughout the repair process. Adaptive fault isolation methods based on discrepancy-enabled pair-wise comparisons are developed. By observing the discrepancy characteristics of multiple Concurrent Error Detection (CED) configurations, a method for robust detection of faults is developed based on pairwise parallel evaluation using Discrepancy Mirror logic. The results from the analytical FPGA model are demonstrated via a self-healing, self-organizing evolvable hardware system. Reconfigurability of the SRAM-based FPGA is leveraged to identify logic resource faults which are successively excluded by group testing using alternate device configurations. This simplifies the system architect\u27s role to definition of functionality using a high-level Hardware Description Language (HDL) and system-level performance versus availability operating point. System availability, throughput, and mean time to isolate faults are monitored and maintained using an Observer-Controller model. Results are demonstrated using a Data Encryption Standard (DES) core that occupies approximately 305 FPGA slices on a Xilinx Virtex-II Pro FPGA. With a single simulated stuck-at-fault, the system identifies a completely validated replacement configuration within three to five positive tests. The approach demonstrates a readily-implemented yet robust organic hardware application framework featuring a high degree of autonomous self-control

    Investigation into voltage and process variation-aware manufacturing test

    No full text
    Increasing integration and complexity in IC design provides challenges for manufacturing testing. This thesis studies how process and supply voltage variation influence defect behaviour to determine the impact on manufacturing test cost and quality. The focus is on logic testing of static CMOS designs with respect to two important defect types in deep submicron CMOS: resistive bridges and full opens. The first part of the thesis addresses testing for resistive bridge defects in designs with multiple supply voltage settings. To enable analysis, a fault simulator is developed using a supply voltage-aware model for bridge defect behaviour. The analysis shows that for high defect coverage it is necessary to perform test for more than one supply voltage setting, due to supply voltage-dependent behaviour. A low-cost and effective test method is presented consisting of multi-voltage test generation that achieves high defect coverage and test set size reduction without compromise to defect coverage. Experiments on synthesised benchmarks with realistic bridge locations validate the proposed method.The second part focuses on the behaviour of full open defects under supply voltage variation. The aim is to determine the appropriate value of supply voltage to use when testing. Two models are considered for the behaviour of full open defects with and without gate tunnelling leakage influence. Analysis of the supply voltage-dependent behaviour of full open defects is performed to determine if it is required to test using more than one supply voltage to detect all full open defects. Experiments on synthesised benchmarks using an extended version of the fault simulator tool mentioned above, measure the quantitative impact of supply voltage variation on defect coverage.The final part studies the impact of process variation on the behaviour of bridge defects. Detailed analysis using synthesised ISCAS benchmarks and realistic bridge model shows that process variation leads to additional faults. If process variation is not considered in test generation, the test will fail to detect some of these faults, which leads to test escapes. A novel metric to quantify the impact of process variation on test quality is employed in the development of a new test generation tool, which achieves high bridge defect coverage. The method achieves a user-specified test quality with test sets which are smaller than test sets generated without consideration of process variation

    Développement des techniques de test et de diagnostic pour les FPGA hiérarchique de type mesh

    Get PDF
    The evolution trend of shrinking feature size and increasing complexity in modern electronics is being slowed down due to physical limits that generate numerous imperfections and defects during fabrication steps or projected life time of the chip. Field Programmable Gate Arrays (FPGAs) are used in complex digital systems mainly due to their reconfigurability and shorter time-to-market. To maintain a high reliability of such systems, FPGAs should be tested thoroughly for defects. FPGA architecture optimization for area saving and better signal routability is an ongoing process which directly impacts the overall FPGA testability, hence the reliability. This thesis presents a complete strategy for test and diagnosis of manufacturing defects in mesh-based FPGAs containing a novel multilevel interconnects topology which promises to provide better area and routability. Efficiency of the proposed test schemes is analyzed in terms of test cost, respective fault coverage and diagnostic resolution.L’évolution tendant à réduire la taille et augmenter la complexité des circuits électroniques modernes, est en train de ralentir du fait des limitations technologiques, qui génèrent beaucoup de d’imperfections et de defaults durant la fabrication ou la durée de vie de la puce. Les FPGAs sont utilisés dans les systèmes numériques complexes, essentiellement parce qu’ils sont reconfigurables et rapide à commercialiser. Pour garder une grande fiabilité de tels systèmes, les FPGAs doivent être testés minutieusement pour les defaults. L’optimisation de l’architecture des FPGAs pour l’économie de surface et une meilleure routabilité est un processus continue qui impacte directement la testabilité globale et de ce fait, la fiabilité. Cette thèse présente une stratégie complète pour le test et le diagnostique des defaults de fabrication des “mesh-based FPGA” contenant une nouvelle topologie d’interconnections à plusieurs niveaux, ce qui promet d’apporter une meilleure routabilité. Efficacité des schémas proposes est analysée en termes de temps de test, couverture de faute et résolution de diagnostique

    Optimization of Cell-Aware Test

    Get PDF

    Optimization of Cell-Aware Test

    Get PDF
    • …
    corecore