478 research outputs found

    Design of CMOS PSCD circuits and checkers for stuck-at and stuck-on faults

    Get PDF
    [[abstract]]We present in this paper an approach to designing partially strongly code-disjoint (PSCD) CMOS circuits and checkers, considering transistor stuck-on faults in addition to gate-level stuck-at faults. Our design-for-testability (DFT) technique requires only a small number of extra transistors for monitoring abnormal static currents, coupled with a simple clocking scheme, to detect the stuck-on faults concurrently. The DFT circuitry not only can detect the faults in the functional circuit but also can detect or tolerate faults in itself, making it a good candidate for checker design. Switch and circuit level simulations were performed on a sample circuit, and a sample 4-out-of-8 code checker chip using the proposed technique has been designed, fabricated, and tested, showing the correctness of the method. Performance penalty is reduced by a novel BiCMOS checker circuit.[[fileno]]2030108010057[[department]]電機工程學

    TFI-FTS: An efficient transient fault injection and fault-tolerant system for asynchronous circuits on FPGA platform

    Get PDF
    Designing VLSI digital circuits is challenging tasks because of testing the circuits concerning design time. The reliability and productivity of digital integrated circuits are primarily affected by the defects in the manufacturing process or systems. If the defects are more in the systems, which leads the fault in the systems. The fault tolerant systems are necessary to overcome the faults in the VLSI digital circuits. In this research article, an asynchronous circuits based an effective transient fault injection (TFI) and fault tolerant system (FTS) are modelled. The TFI system generates the faults based on BMA based LFSR with faulty logic insertion and one hot encoded register. The BMA based LFSR reduces the hardware complexity with less power consumption on-chip than standard LFSR method. The FTS uses triple mode redundancy (TMR) based majority voter logic (MVL) to tolerant the faults for asynchronous circuits. The benchmarked 74X-series circuits are considered as an asynchronous circuit for TMR logic. The TFI-FTS module is modeled using Verilog-HDL on Xilinx-ISE and synthesized on hardware platform. The Performance parameters are tabulated for TFI-FTS based asynchronous circuits. The performance of TFI-FTS Module is analyzed with 100% fault coverage. The fault coverage is validated using functional simulation of each asynchronous circuit with fault injection in TFI-FTS Module

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Reliable Hardware Architectures of CORDIC Algorithm with Fixed Angle of Rotations

    Get PDF
    Fixed-angle rotation operation of vectors is widely used in signal processing, graphics, and robotics. Various optimized coordinate rotation digital computer (CORDIC) designs have been proposed for uniform rotation of vectors through known and specified angles. Nevertheless, in the presence of faults, such hardware architectures are potentially vulnerable. In this thesis, we propose efficient error detection schemes for two fixed-angle rotation designs, i.e., the Interleaved Scaling and Cascaded Single-rotation CORDIC. To the best of our knowledge, this work is the first in providing reliable architectures for these variants of CORDIC. The former is suitable for low-area applications and, hence, we propose recomputing with encoded operands schemes which add negligible area overhead to the designs. Moreover, the proposed error detection schemes for the latter variant are optimized for efficient applications which hamper the performance of the architectures negligibly. We present three variants of recomputing with encoded operands to detect both transient and permanent faults, coupled with signature-based schemes. The overheads of the proposed designs are assessed through Xilinx FPGA implementations and their effectiveness is benchmarked through error simulations. The results give confidence for the proposed efficient architectures which can be tailored based on the reliability requirements and the overhead to be tolerated

    Design of Soft Error Robust High Speed 64-bit Logarithmic Adder

    Get PDF
    Continuous scaling of the transistor size and reduction of the operating voltage have led to a significant performance improvement of integrated circuits. However, the vulnerability of the scaled circuits to transient data upsets or soft errors, which are caused by alpha particles and cosmic neutrons, has emerged as a major reliability concern. In this thesis, we have investigated the effects of soft errors in combinational circuits and proposed soft error detection techniques for high speed adders. In particular, we have proposed an area-efficient 64-bit soft error robust logarithmic adder (SRA). The adder employs the carry merge Sklansky adder architecture in which carries are generated every 4 bits. Since the particle-induced transient, which is often referred to as a single event transient (SET) typically lasts for 100~200 ps, the adder uses time redundancy by sampling the sum outputs twice. The sampling instances have been set at 110 ps apart. In contrast to the traditional time redundancy, which requires two clock cycles to generate a given output, the SRA generates an output in a single clock cycle. The sampled sum outputs are compared using a 64-bit XOR tree to detect any possible error. An energy efficient 4-input transmission gate based XOR logic is implemented to reduce the delay and the power in this case. The pseudo-static logic (PSL), which has the ability to recover from a particle induced transient, is used in the adder implementation. In comparison with the space redundant approach which requires hardware duplication for error detection, the SRA is 50% more area efficient. The proposed SRA is simulated for different operands with errors inserted at different nodes at the inputs, the carry merge tree, and the sum generation circuit. The simulation vectors are carefully chosen such that the SET is not masked by error masking mechanisms, which are inherently present in combinational circuits. Simulation results show that the proposed SRA is capable of detecting 77% of the errors. The undetected errors primarily result when the SET causes an even number of errors and when errors occur outside the sampling window

    Synthesis Of Self-resetting Stage Logic Pipelines

    Get PDF
    As designers began to pack multi-million transistors onto a single chip, their reliance on a global clocking signal to orchestrate the operations of the chip has started to face almost insurmountable difficulties. As a result, designers started to explore clockless circuits to avoid the global clocking problem. Recently, self-resetting circuits implemented in dynamic logic families have been proposed as viable clockless alternatives. While these circuits can produce excellent performances, they display serious limitations in terms of area cost and power consumption. A middle-of-the-road alternative, which can provide a good performance and avoid the limitations seen in dynamic self-resetting circuits, would be to implement self-resetting behavior in static circuits. This alternative has been introduced recently as Self-Resetting Stage Logic and used to propose three types of clockless pipelines. Experimental studies show that these pipelines have the potential to produce high throughputs with a minimum area overhead if a suitable synthesis methodology is available. This thesis proposes a novel synthesis methodology to design and verify clockless pipelines implemented in SRSL by taking advantage of the maturity of current CAD tools. This methodology formulates the synthesis problem as a combinatorial analytical problem for which a run-time efficient exact solution is difficult to derive. Consequently, a two-phase algorithm is proposed to synthesize these pipelines from gate netlists subject to user-specified constraints. The first phase is a heuristic based on the as-soon-as-possible scheduling strategy in which each gate of the netlist is assigned to a single pipeline stage without violating the period constraint of each pipeline stage. On the other hand, the second phase consists of a heuristic, based on the Kernighan-Lin partitioning strategy, to minimize the number of nets crossing each pair of adjacent pipeline stages. The objective of this optimization is to reduce the number of latches separating pipeline stages since these latches tend to occupy large areas. Experiments conducted on a prototype of the synthesis algorithm reveal that these self-resetting stage logic pipelines can easily reach throughputs higher than 1 GHz. Furthermore, these experiments reveal that the area overhead needed to implement the self-resetting circuitry of these pipelines can be easily amortized over the area of the logic embedded in the pipeline stages. In the overall, the synthesis methods developed for SRSL produce low area overhead pipelines for wide and deep gate netlists while it tends to produce high throughput pipelines for wide and shallow gate netlists. This shows that these pipelines are mostly suitable for coarse-grain datapaths

    The Automatic Synthesis of Fault Tolerant and Fault Secure VLSI Systems

    Get PDF
    This thesis investigates the design of fault tolerant and fault secure (FTFS) systems within the framework of silicon compilation. Automatic design modification is used to introduce FTFS characteristics into a design. A taxonomy of FTFS techniques is introduced and is used to identify a number of features which an "automatic design for FTFS" system should exhibit. A silicon compilation system, Chip Churn 2 (CC2), has been implemented and has been used to demonstrate the feasibility of automatic design of FTFS systems. The CC2 system provides a design language, simulation facilities and a back-end able to produce CMOS VLSI designs. A number of FTFS design methods have been implemented within the CC2 environment; these methods range from triple modular redundancy to concurrent parity code checking. The FTFS design methods can be applied automatically to general designs in order to realise them as FTFS systems. A number of example designs are presented; these are used to illustrate the FTFS modification techniques which have been implemented. Area results for CMOS devices are presented; this allows the modification methods to be compared. A number of problems arising from the methods are highlighted and some solutions suggested

    Automatic test pattern generation for asynchronous circuits

    Get PDF
    The testability of integrated circuits becomes worse with transistor dimensions reaching nanometer scales. Testing, the process of ensuring that circuits are fabricated without defects, becomes inevitably part of the design process; a technique called design for test (DFT). Asynchronous circuits have a number of desirable properties making them suitable for the challenges posed by modern technologies, but are severely limited by the unavailability of EDA tools for DFT and automatic test-pattern generation (ATPG). This thesis is motivated towards developing test generation methodologies for asynchronous circuits. In total four methods were developed which are aimed at two different fault models: stuck-at faults at the basic logic gate level and transistor-level faults. The methods were evaluated using a set of benchmark circuits and compared favorably to previously published work. First, ABALLAST is a partial-scan DFT method adapting the well-known BALLAST technique for asynchronous circuits where balanced structures are used to guide the selection of the state-holding elements that will be scanned. The test inputs are automatically provided by a novel test pattern generator, which uses time frame unrolling to deal with the remaining, non-scanned sequential C-elements. The second method, called AGLOB, uses algorithms from strongly-connected components in graph graph theory as a method for finding the optimal position of breaking the loops in the asynchronous circuit and adding scan registers. The corresponding ATPG method converts cyclic circuits into acyclic for which standard tools can provide test patterns. These patterns are then automatically converted for use in the original cyclic circuits. The third method, ASCP, employs a new cycle enumeration method to find the loops present in a circuit. Enumerated cycles are then processed using an efficient set covering heuristic to select the scan elements for the circuit to be tested.Applying these methods to the benchmark circuits shows an improvement in fault coverage compared to previous work, which, for some circuits, was substantial. As no single method consistently outperforms the others in all benchmarks, they are all valuable as a designer’s suite of tools for testing. Moreover, since they are all scan-based, they are compatible and thus can be simultaneously used in different parts of a larger circuit. In the final method, ATRANTE, the main motivation of developing ATPG is supplemented by transistor level test generation. It is developed for asynchronous circuits designed using a State Transition Graph (STG) as their specification. The transistor-level circuit faults are efficiently mapped onto faults that modify the original STG. For each potential STG fault, the ATPG tool provides a sequence of test vectors that expose the difference in behavior to the output ports. The fault coverage obtained was 52-72 % higher than the coverage obtained using the gate level tests. Overall, four different design for test (DFT) methods for automatic test pattern generation (ATPG) for asynchronous circuits at both gate and transistor level were introduced in this thesis. A circuit extraction method for representing the asynchronous circuits at a higher level of abstraction was also implemented. Developing new methods for the test generation of asynchronous circuits in this thesis facilitates the test generation for asynchronous designs using the CAD tools available for testing the synchronous designs. Lessons learned and the research questions raised due to this work will impact the future work to probe the possibilities of developing robust CAD tools for testing the future asynchronous designs

    Null convention logic circuits for asynchronous computer architecture

    Get PDF
    For most of its history, computer architecture has been able to benefit from a rapid scaling in semiconductor technology, resulting in continuous improvements to CPU design. During that period, synchronous logic has dominated because of its inherent ease of design and abundant tools. However, with the scaling of semiconductor processes into deep sub-micron and then to nano-scale dimensions, computer architecture is hitting a number of roadblocks such as high power and increased process variability. Asynchronous techniques can potentially offer many advantages compared to conventional synchronous design, including average case vs. worse case performance, robustness in the face of process and operating point variability and the ready availability of high performance, fine grained pipeline architectures. Of the many alternative approaches to asynchronous design, Null Convention Logic (NCL) has the advantage that its quasi delay-insensitive behavior makes it relatively easy to set up complex circuits without the need for exhaustive timing analysis. This thesis examines the characteristics of an NCL based asynchronous RISC-V CPU and analyses the problems with applying NCL to CPU design. While a number of university and industry groups have previously developed small 8-bit microprocessor architectures using NCL techniques, it is still unclear whether these offer any real advantages over conventional synchronous design. A key objective of this work has been to analyse the impact of larger word widths and more complex architectures on NCL CPU implementations. The research commenced by re-evaluating existing techniques for implementing NCL on programmable devices such as FPGAs. The little work that has been undertaken previously on FPGA implementations of asynchronous logic has been inconclusive and seems to indicate that asynchronous systems cannot be easily implemented in these devices. However, most of this work related to an alternative technique called bundled data, which is not well suited to FPGA implementation because of the difficulty in controlling and matching delays in a 'bundle' of signals. On the other hand, this thesis clearly shows that such applications are not only possible with NCL, but there are some distinct advantages in being able to prototype complex asynchronous systems in a field-programmable technology such as the FPGA. A large part of the value of NCL derives from its architectural level behavior, inherent pipelining, and optimization opportunities such as the merging of register and combina- tional logic functions. In this work, a number of NCL multiplier architectures have been analyzed to reveal the performance trade-offs between various non-pipelined, 1D and 2D organizations. Two-dimensional pipelining can easily be applied to regular architectures such as array multipliers in a way that is both high performance and area-efficient. It was found that the performance of 2D pipelining for small networks such as multipliers is around 260% faster than the equivalent non-pipelined design. However, the design uses 265% more transistors so the methodology is mainly of benefit where performance is strongly favored over area. A pipelined 32bit x 32bit signed Baugh-Wooley multiplier with Wallace-Tree Carry Save Adders (CSA), which is representative of a real design used for CPUs and DSPs, was used to further explore this concept as it is faster and has fewer pipeline stages compared to the normal array multiplier using Ripple-Carry adders (RCA). It was found that 1D pipelining with ripple-carry chains is an efficient implementation option but becomes less so for larger multipliers, due to the completion logic for which the delay time depends largely on the number of bits involved in the completion network. The average-case performance of ripple-carry adders was explored using random input vectors and it was observed that it offers little advantage on the smaller multiplier blocks, but this particular timing characteristic of asynchronous design styles be- comes increasingly more important as word size grows. Finally, this research has resulted in the development of the first 32-Bit asynchronous RISC-V CPU core. Called the Redback RISC, the architecture is a structure of pipeline rings composed of computational oscillations linked with flow completeness relationships. It has been written using NELL, a commercial description/synthesis tool that outputs standard Verilog. The Redback has been analysed and compared to two approximately equivalent industry standard 32-Bit synchronous RISC-V cores (PicoRV32 and Rocket) that are already fabricated and used in industry. While the NCL implementation is larger than both commercial cores it has similar performance and lower power compared to the PicoRV32. The implementation results were also compared against an existing NCL design tool flow (UNCLE), which showed how much the results of these implementation strategies differ. The Redback RISC has achieved similar level of throughput and 43% better power and 34% better energy compared to one of the synchronous cores with the same benchmark test and test condition such as input sup- ply voltage. However, it was shown that area is the biggest drawback for NCL CPU design. The core is roughly 2.5× larger than synchronous designs. On the other hand its area is still 2.9× smaller than previous designs using UNCLE tools. The area penalty is largely due to the unavoidable translation into a dual-rail topology when using the standard NCL cell library

    NASA Space Engineering Research Center Symposium on VLSI Design

    Get PDF
    The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next generation advances that will serve as a basis for future VLSI design. Questions of reliability in the space environment along with new directions in CAD and design are addressed by the featured speakers
    corecore