Challenge of a multiple-valued technology in recent deep-submicron VLSI
Paper included in a Grant-in-Aid (KAKENHI) research report (Project No. 13558026, Grant-in-Aid for Scientific Research (B)(2), FY2001-2004; Principal Investigator: Hanyu, Takahiro; "Development and Application of Transfer-Bottleneck-Free Multiple-Valued Logic-in-Memory VLSI")
Empirical timing analysis of CPUs and delay fault tolerant design using partial redundancy
The operating clock frequency is determined by the longest signal propagation
delay, setup/hold time, and timing margin. These are becoming less predictable with
the increasing design complexity and process miniaturization. The difficult challenge
is then to ensure that a device operating at its clock frequency is error-free with
quantifiable assurance. Effort at device-level engineering will not suffice for these
circuits exhibiting wide process variation and heightened sensitivities to operating
condition stress. Logic-level redress of this issue is a necessity and we propose a
design-level remedy for this timing-uncertainty problem.
The aim of the design and analysis approaches presented in this dissertation is to
provide a framework, SABRE, in which an increased operating clock frequency can be
achieved. The approach combines analytical modeling, experimental analysis,
hardware/time-redundancy design, and exception handling and recovery techniques.
Our proposed design replicates only a necessary part of the original circuit to avoid
high hardware overhead as in triple-modular-redundancy (TMR). The timing-critical
combinational circuit is path-wise partitioned into two sections. The combinational
circuits associated with long paths are laid out without any intrusion except for the
fan-out connections from the first section of the circuit to a replicated second section
of the combinational circuit. Thus only the second section of the circuit is replicated.
The signals fanning out from the first section are latched, so the paths into the
replicated section are far shorter than the paths spanning the entire combinational
circuit. The replicated circuit is timed at a subsequent clock cycle to guarantee
relaxed timing paths. This ensures that the likelihood of mistiming due to stress or
process variation is eliminated. During the subsequent clock cycle, the outputs of the
two logically identical, yet time-interleaved, circuits are compared to detect faults.
When a fault is detected, a retry signal is triggered and a dynamic frequency step-down
takes place before a pipeline flush and retry are issued. The significant timing
overhead associated with the retry is offset by the rarity of timing-violation events.
Simulation results on ISCAS benchmark circuits show that a 10% clock-frequency gain is
possible with 10 to 20% hardware overhead for the replicated timing-critical circuit.
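As a rough behavioural sketch of the compare-and-retry flow described above, the following Python fragment models the two path-wise partitions as pure functions and re-evaluates the replicated second section one cycle later; all names (section1, section2, the injected error rate) are hypothetical stand-ins for illustration, not the SABRE implementation itself.

```python
import random

def run_pipeline(inputs, section1, section2, error_rate=1e-3):
    """Behavioural sketch (hypothetical names) of a compare-and-retry flow:
    section1/section2 stand in for the two path-wise partitions of the
    timing-critical combinational circuit; outputs are assumed to be ints."""
    freq_scale = 1.0                     # current clock-frequency scaling factor
    results = []
    for data in inputs:
        mid = section1(data)             # long paths: laid out unmodified, then latched
        fast = section2(mid)             # primary copy, aggressively timed
        if random.random() < error_rate:
            fast ^= 1                    # model an occasional mistimed (wrong) output
        safe = section2(mid)             # replica evaluated one cycle later, relaxed timing
        if fast != safe:                 # concurrent error detection by comparison
            freq_scale *= 0.9            # dynamic frequency step-down
            fast = safe                  # pipe flush + retry yields the checked result
        results.append(fast)
    return results, freq_scale

# Toy usage with placeholder circuit functions
out, scale = run_pipeline(range(1000), section1=lambda x: x & 0xFF, section2=lambda x: x * 3)
```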
Concurrent error detection in 2-D separable linear transform
As process technology continues to scale to smaller geometries and lower supply voltages, the reliability of the resulting semiconductors becomes a greater concern. The effects of deep-submicron noise, soft errors, variation, and aging degradation pose challenges to the functional correctness of VLSI systems and place roadblocks in the way of further scaling. At the same time, as computing moves toward mobile platforms, energy efficiency has become one of the most important design metrics for digital systems. Reliability and energy efficiency, however, are conflicting design requirements. Adding a voltage guard band is the most common method of mitigating the reliability impact in such cases, while low-power design techniques such as voltage over-scaling (VOS) reduce power further by scaling the supply voltage down to just before the point at which data-dependent timing errors start to appear. Concurrent error detection offers a way to tackle reliability and energy efficiency in a unified manner.

Fault tolerance can be deployed at different levels of the design hierarchy; given its low overhead, algorithm-level error detection is an attractive approach. In this work, a generic error detection algorithm based on weighted checksum codes is proposed for generic 2-D separable linear transforms. The technique encodes the input array at the level of the 2-D linear transformation, and the algorithms are designed to operate on encoded data and produce encoded output data. The proposed error detection technique is a system-level method and can therefore be used in existing hardware or software 2-D linear transformation architectures with low overhead. A mathematical proof of the algorithm is provided within the scope of this dissertation. The checksum weighting vectors for several common transforms are derived as examples, and the error detection cost and algorithm effectiveness are analyzed.

In traditional fault-tolerance studies, errors are usually evaluated at the Boolean level. Many DSP applications, such as the 2-D linear transformations used in multimedia compression systems, do not require exactly correct results, only that the quality of the output remain within an acceptable range. By extending this property and defining errors at the functional level, a generic quality-aware error detection scheme for 2-D separable linear transforms is proposed. As an example, the quality-aware error detection technique is deployed on a low-power wavelet lifting transform architecture for JPEG2000: a low-cost signal-to-noise-ratio (SNR) aware detection logic based on the proposed scheme is integrated into the discrete wavelet lifting transform architecture. This detection logic checks whether the image quality degradation caused by voltage-over-scaling-induced timing errors is acceptable and determines the optimal voltage set point for the operating conditions at run time. This quality-based error detection approach differs significantly from traditional error detection schemes, which look for exact data equivalence. A simulation result for one design shows that the supply voltage can be scaled down to 75% of the nominal voltage at the typical process corner without significant image quality degradation, which translates to 9.15 mW power consumption (a 44% power saving).
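To make the weighted-checksum idea concrete, the following NumPy sketch checks a generic separable 2-D transform Y = A X Bᵀ through a cheap side computation of the weighted output checksum; the weight vector and encoding here are illustrative assumptions and may differ from the exact scheme derived in the dissertation.

```python
import numpy as np

def checked_transform_2d(A, X, B, w=None, tol=1e-6):
    """Illustrative weighted-checksum check for a separable 2-D transform
    Y = A @ X @ B.T. By linearity, w^T Y == ((w^T A) X) B^T, and the
    right-hand side costs only O(n^2) to compute."""
    n = A.shape[0]
    if w is None:
        w = np.arange(1, n + 1, dtype=float)    # assumed weighting vector
    Y = A @ X @ B.T                             # main (possibly faulty) computation
    ref = ((w @ A) @ X) @ B.T                   # cheap reference checksum
    ok = np.allclose(w @ Y, ref, atol=tol)      # mismatch flags an error in Y
    return Y, ok

# Example: orthonormal 8x8 DCT-II matrix used for both dimensions
n = 8
k = np.arange(n).reshape(-1, 1)                 # frequency index (rows)
m = np.arange(n).reshape(1, -1)                 # sample index (columns)
A = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
A[0, :] /= np.sqrt(2)
X = np.random.rand(n, n)
Y, ok = checked_transform_2d(A, X, A)
print(ok)   # True when no error has corrupted Y
```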
Modelling and Test Generation for Crosstalk Faults in DSM Chips
In the era of deep-submicron (DSM) technology, many System-on-Chip (SoC) applications require components operating at high clock speeds. With shrinking feature sizes and ever-increasing clock frequencies, DSM technology has brought the well-known problem of signal integrity (SI), especially in the interconnect layout. The increasing aspect ratios of metal wires, together with the growing ratio of coupling capacitance to substrate capacitance, result in electrical coupling between interconnects, which leads to crosstalk problems. This thesis first presents a model of the crosstalk behaviour between an aggressor and a victim that considers the distributed RLGC parameters of the interconnect as well as the coupling capacitance and mutual conductance between the two nets. The model also includes linear RC models of the CMOS drivers and receivers. The behaviour of crosstalk in the presence of under-etching defects has been studied and modelled by distributing and approximating the defect behaviour along the nets. The model has then been extended to the case in which one victim is influenced by several aggressors, assuming all aggressors have a similar (worst-case) effect on the victim. In all the above cases, simulation experiments were carried out and compared with the well-known circuit simulation tool PSPICE; the proposed crosstalk model is shown to be faster, with results within a 10% error margin of the latter tool. Because of its accuracy and speed, the model is useful to both SoC designers and test engineers for analysing crosstalk behaviour.

Each manufactured device needs to be tested thoroughly to ensure its functionality before delivery, and test patterns must also be generated for the corresponding crosstalk faults. In this thesis, the well-known PODEM algorithm for stuck-at faults is extended to generate test patterns for crosstalk faults between a single aggressor and a single victim. To apply the modified PODEM to crosstalk faults, the transition behaviour is divided into two logic parts, before the transition and after the transition. After the test patterns required for the before-transition and after-transition states are found individually, the generated logic vectors are appended to create transition test patterns for crosstalk faults. The developed algorithm is applied to several ISCAS 85 benchmark circuits, and the fault coverage is found to be excellent for most of them. Incorporating the proposed algorithm into ATPG tools will improve testing efficiency by generating test patterns for crosstalk faults in addition to the conventional stuck-at faults.

For crosstalk faults on a single victim due to multiple aggressors, however, the modified PODEM algorithm proves time-consuming. The ability of genetic algorithms (GAs) to search for the required combination of several input factors in an optimization problem motivated their application to test pattern generation, since generating a test pattern is likewise a matter of finding the required vector among many possible input transitions. The GA is first applied to generating test patterns for stuck-at faults and its results are compared with the PODEM algorithm. As the fault coverage is close to that of the deterministic PODEM algorithm, the GA developed for stuck-at faults is extended to find test patterns for crosstalk faults between a single aggressor and a single victim. The elitist GA is also applied to several ISCAS 85 benchmark circuits and later extended to generate test patterns for worst-case crosstalk faults. The elitist GA developed in this thesis proves particularly useful for generating test patterns for crosstalk faults, especially multiple-aggressor, single-victim faults.
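As a generic illustration of how an elitist GA can search for a two-vector (before/after-transition) test, the sketch below evolves pairs of input vectors against a user-supplied fitness function; the fitness function stands in for the thesis's crosstalk fault simulator, and every name and parameter here is a hypothetical placeholder rather than the thesis's actual algorithm.

```python
import random

def elitist_ga(n_inputs, fitness, pop_size=32, generations=100, elite=2, p_mut=0.02):
    """Evolve a two-vector test <v1, v2> (before-transition, after-transition).
    'fitness' is a user-supplied scorer, e.g. a crosstalk fault simulator
    returning a value to maximise."""
    def rand_vec():
        return [random.randint(0, 1) for _ in range(n_inputs)]
    pop = [(rand_vec(), rand_vec()) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:elite]                        # elitism: keep best pairs
        while len(next_pop) < pop_size:
            p1, p2 = random.sample(scored[:pop_size // 2], 2)
            cut = random.randrange(1, n_inputs)          # single-point crossover
            child = ([*p1[0][:cut], *p2[0][cut:]],
                     [*p1[1][:cut], *p2[1][cut:]])
            child = tuple([b ^ (random.random() < p_mut) for b in vec]
                          for vec in child)              # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy usage: maximise the number of inputs that toggle between the two vectors
best = elitist_ga(8, fitness=lambda ind: sum(a != b for a, b in zip(*ind)))
```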
Can deep-sub-micron device noise be used as the basis for probabilistic neural computation?
This thesis explores the potential of probabilistic neural architectures for computation with future
nanoscale Metal-Oxide-Semiconductor Field Effect Transistors (MOSFETs). In particular,
the performance of a Continuous Restricted Boltzmann Machine (CRBM) implemented
with generated noise of Random Telegraph Signal (RTS) and 1/f form has been studied with
reference to the 'typical' Gaussian implementation. In this study, a time-domain RTS-based
noise analysis capability has been developed for future nanoscale MOSFETs to represent
the effect of nanoscale MOSFET noise on circuit implementation, in particular on the
synaptic analogue multiplier that is subsequently used to implement the stochastic behaviour
of the CRBM. The results of this thesis indicate little degradation in performance from that
of the typical Gaussian CRBM. Through simulation experiments, the CRBM with nanoscale
MOSFET noise shows the ability to reconstruct training data, although it takes longer to
converge to equilibrium. The results do not prove that nanoscale MOSFET noise can be
exploited in all contexts and with all data for probabilistic computation. However, they
indicate, for the first time, that nanoscale MOSFET noise has the potential to be used in
hardware implementations of probabilistic neural computation. This thesis thus introduces
a methodology for a form of technology downstreaming and highlights the potential of
probabilistic architectures for computation with future nanoscale MOSFETs.
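For readers unfamiliar with RTS noise, the short Python sketch below generates a two-level random telegraph signal with exponentially distributed dwell times; the time constants and amplitude are arbitrary placeholders rather than values taken from the thesis.

```python
import numpy as np

def rts_noise(n_samples, dt, tau_capture, tau_emission, amplitude=1.0, rng=None):
    """Generate a two-state Random Telegraph Signal: the trap alternates between
    'captured' (offset = amplitude) and 'emitted' (offset = 0), with dwell times
    drawn from exponentials of mean tau_capture / tau_emission. Placeholder values."""
    rng = rng or np.random.default_rng()
    out = np.empty(n_samples)
    state, i = 0, 0
    while i < n_samples:
        tau = tau_capture if state else tau_emission
        dwell = max(1, int(rng.exponential(tau) / dt))   # samples spent in this state
        out[i:i + dwell] = amplitude * state
        i += dwell
        state ^= 1                                       # toggle trap state
    return out

# e.g. 1 ms of RTS sampled at 1 MHz with 10 us / 30 us mean dwell times
trace = rts_noise(1_000_000, dt=1e-6, tau_capture=1e-5, tau_emission=3e-5)
```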
AI/ML Algorithms and Applications in VLSI Design and Technology
An evident challenge ahead for the integrated circuit (IC) industry in the
nanometer regime is the investigation and development of methods that can
reduce the design complexity ensuing from growing process variations and
curtail the turnaround time of chip manufacturing. Conventional methodologies
employed for such tasks are largely manual and are thus time-consuming and
resource-intensive. In contrast, the unique learning strategies of artificial
intelligence (AI) provide numerous exciting automated approaches for handling
complex and data-intensive tasks in very-large-scale integration (VLSI) design
and testing. Employing AI and machine learning (ML) algorithms in VLSI design
and manufacturing reduces the time and effort for understanding and processing
the data within and across different abstraction levels via automated learning
algorithms. This, in turn, improves IC yield and reduces the manufacturing
turnaround time. This paper thoroughly reviews the AI/ML automated approaches
introduced in the past for VLSI design and manufacturing. Moreover, we
discuss the scope of future AI/ML applications at various abstraction
levels to revolutionize the field of VLSI design, aiming for high-speed, highly
intelligent, and efficient implementations.
Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs
The incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures, which provide the infrastructure necessary for the exchange of data and control signals while meeting stringent physical and design constraints. The use of vast amounts of on-chip communication will be central to future designs, in which variability is an inherent characteristic. For this reason, this thesis investigates the performance and variability tolerance of typical on-chip communication structures. Understanding the relationship between variability and communication is paramount if designers are to devise new methods and techniques for building performance- and power-efficient communication circuits in the face of the challenges presented by deep sub-micron (DSM) technologies.
The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurately estimating the probability density functions of different circuit parameters is proposed, and its significance for pipelined circuits is highlighted. Power and area are among the most important design metrics for any integrated circuit (IC) design; this thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology is proposed for the simultaneous optimization of performance, area, power, and delay variability for a repeater-inserted interconnect. Similarly, for multi-bit parallel links, bandwidth-driven optimizations have been performed. Power- and area-efficient semi-serial links, which are less vulnerable to delay variations than the corresponding fully parallel links, are introduced. Furthermore, due to technology scaling, the coupling noise between link lines has become an important issue. With ever-decreasing supply voltages and the corresponding reduction in noise margins, severe challenges arise in performing timing verification in the presence of variability. For this reason, an accurate model for crosstalk noise in an interconnect as a function of time and skew is introduced in this work. This model can be used to identify the skew condition that gives the maximum delay noise, and also for efficient design verification.
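As a toy illustration of statistical link-delay estimation under device variability, the sketch below Monte-Carlo samples per-stage delays of a repeater-inserted interconnect; the Gaussian model and all numbers are invented for illustration and are not results from this thesis.

```python
import numpy as np

def link_delay_samples(n_stages, mean_stage_delay_ps, sigma_pct,
                       n_trials=100_000, rng=None):
    """Monte-Carlo sketch of end-to-end delay for a repeater-inserted interconnect.
    Each repeater stage receives an independent Gaussian delay perturbation standing
    in for random-dopant-fluctuation variability; parameters are illustrative only."""
    rng = rng or np.random.default_rng()
    stage = rng.normal(mean_stage_delay_ps,
                       sigma_pct * mean_stage_delay_ps,
                       size=(n_trials, n_stages))
    return stage.sum(axis=1)              # independent stage delays add up

delays = link_delay_samples(n_stages=8, mean_stage_delay_ps=45.0, sigma_pct=0.08)
print(f"mean = {delays.mean():.1f} ps, 3-sigma = {delays.mean() + 3 * delays.std():.1f} ps")
# The relative spread of the total shrinks roughly as 1/sqrt(n_stages), which is
# why per-stage variability matters most for short links with few repeaters.
```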