Efficient test and debug techniques are indispensable for performance characterization of large complex integrated circuits in deep-submicron and nanometer technologies. Performance characterization of such chips requires on-chip hardware and efficient debug schemes in order to reduce time to market and ensure shipping of chips with lower defect levels. In this paper we present an on-chip scheme for delay fault detection and performance characterization. The proposed technique allows for accurate measurement of delays of speed paths for speed binning and facilitates a systematic and efficient test and debug scheme for delay faults. The area overhead associated with the proposed technique is very low.
INTRODUCTION
Modern VLSI chips have stringent timing requirements driven by shrinking feature sizes. This has made delay faults a significant problem and has introduced the need to detect delay faults on the order of a few picoseconds in the logic paths of a digital integrated circuit. Delay faults are a category of faults that cause an otherwise functional chip to fail at the specified clock speed, and the objective of delay fault testing and debug is to detect timing defects and ensure that the design meets the desired performance specifications. Although timing verification and functional simulation during the design cycle ensure that a chip meets its performance specifications, these techniques are applied to a model of an Integrated Circuit (IC) and not to actual silicon. Hence they cannot detect delay faults caused by factors such as distributed delay variations, crosstalk-induced delay, logic errors, and excessive voltage drop and swing on the supply nets. Additionally, process variations can have a significant influence on a chip's failure to meet specified performance. Process parameter variations can result in distributed delay faults, in which minor delays on multiple gates in a given path accumulate and cause the path to fail its performance specification. Adding enough detail to IC models to incorporate these factors would make the computational cost of timing verification prohibitive.
Diminishing feature sizes limit the observability of chips, making test and debug more difficult, especially for timing violations. In addition, testing for delay defects using Automated Testing Equipment (ATE) for GHz-range processors is very expensive, and most testers in test facilities still run at a few hundred MHz. This makes on-chip circuitry an ideal choice for performance characterization and for delay fault testing and debug. The use of on-chip test circuitry also allows at-speed testing, which is essential for accurate detection of timing violations.
A commonly used technique for speed binning and detecting delay defects in chips is to increase the clock frequency until the chip fails. There are several disadvantages associated with this method. Primarily, at GHz-range frequencies, con[...] phase shifted with each other.
However, this method requires ratioed capacitors and fails if the transitions of the two clocks are skewed. Also, this sampling circuit can detect delay faults only if the delayed transition on a path occurs after the sampling instant. Franco and McCluskey [3] propose a DFT technique to detect delay faults using transient switching currents in CMOS inverters. However, this scheme has a low noise margin, which hinders its fault detection capability, and it has a high switching power overhead since it is a dynamic circuit. In [4], a DFT technique based on capacitor voltage levels is proposed; this scheme requires determining a threshold voltage, which can be difficult to implement. Delay measurement schemes based on digitizing short time intervals include a shift-register/fast-counter-based Time-to-Digital Converter (TDC) [7], an oscillator-based TDC [5], and CMOS tapped delay line configurations [1, 6]. Analog methods based on voltage ramp generation have also been proposed [8], where the voltage on a capacitor is proportional to the time difference between two rising edges. The major advantage of delay line configurations is that they can be implemented in a standard digital CMOS process, which results in low power dissipation, high integration levels, and good noise margins. However, the minimum achievable resolution of a TDC implemented using a single delay line is limited by the minimum gate delay of the technology in which it is implemented. This limitation can be overcome by using a balanced delay line.
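To illustrate why a single delay line limits resolution to one gate delay, the following is a minimal behavioral sketch of an idealized tapped-delay-line TDC. It is a generic model assumed for illustration, not the specific designs of [1, 6]: the start edge ripples through `n_taps` identical gates and the stop edge samples every tap at once, producing a thermometer code. The function name and the picosecond values are hypothetical.

```python
from typing import List


def tapped_delay_line_tdc(interval_ps: int, gate_delay_ps: int, n_taps: int) -> List[int]:
    """Thermometer code from an idealized single tapped delay line (no mismatch, no metastability)."""
    # Tap k has been reached by the start edge if k gate delays fit inside the
    # interval between start and stop; the stop edge samples all taps simultaneously.
    return [1 if k * gate_delay_ps <= interval_ps else 0 for k in range(1, n_taps + 1)]


code = tapped_delay_line_tdc(interval_ps=450, gate_delay_ps=100, n_taps=8)
ones = sum(code)
# The interval is only known to lie in [ones * gate_delay, (ones + 1) * gate_delay),
# so the quantization step is one full gate delay.
print(code, ones * 100, (ones + 1) * 100)  # [1, 1, 1, 1, 0, 0, 0, 0] 400 500
```

Any two intervals that fall within the same gate delay produce the same code, which is the limitation the balanced (Vernier) arrangement is meant to remove.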
The rest of the paper is organized as follows. In Section 2, we introduce a balanced delay line scheme that can test a path for delay defects for all possible transitions. Section 3 describes a systematic scheme for delay fault testing and debug. Results are presented in Section 4, and conclusions in Section 5.
MODIFIED VERNIER DELAY LINE
In this section we propose a Modified Vernier Delay Line (MVDL) that can be used to characterize critical path delays. The block diagram of the MVDL is shown in Figure 1. It consists of two delay buffer chains, with the delay of each buffer in the lower chain ($t_{buf}$) greater than the delay of each buffer in the upper chain ($t_{buf,low}$). The earlier-arriving signal is fed to input x of the lower buffer chain, and the later-arriving signal is fed to input y of the upper buffer chain. As x and y propagate through their respective chains, the time difference between the two signals is reduced at every stage by the difference between the individual buffer delays of the two chains. This difference is the resolution of the MVDL, i.e.,
$$R = t_{buf} - t_{buf,low}$$
In theory, any time difference between the two signals can be measured by making the resolution as fine as possible. In practice, however, the minimum resolution is limited by factors such as transistor mismatch, delay mismatch due to loading, and the length of the delay line.
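A minimal behavioral model can make the catch-up mechanism concrete. The sketch below assumes ideal buffers (no mismatch, no setup or hold time), a RESET delay line, and rising edges on both x and y. The function names and the 197 ps / 100 ps buffer delays are hypothetical, chosen only so that the resolution equals the 97 ps used in the Results section. Stage k latches '1' exactly when D + k·t_buf,low ≤ k·t_buf, i.e. when D ≤ kR, so the first stage that latches '1' bounds the path delay D to the interval ((k-1)R, kR].

```python
from typing import List, Optional, Tuple


def mvdl_capture(path_delay_ps: int, t_buf_ps: int, t_buf_low_ps: int, n_stages: int) -> List[int]:
    """Bits latched by an idealized, RESET n-stage MVDL for a rising input and output."""
    if t_buf_ps <= t_buf_low_ps:
        raise ValueError("lower-chain buffers (x) must be slower than upper-chain buffers (y)")
    bits = []
    for k in range(1, n_stages + 1):
        clock_edge = k * t_buf_ps                     # x: early signal after k slow buffers
        data_edge = path_delay_ps + k * t_buf_low_ps  # y: late signal after k fast buffers
        # The flop at stage k latches '1' once y has caught up with x.
        bits.append(1 if data_edge <= clock_edge else 0)
    return bits


def decode_delay(bits: List[int], resolution_ps: int) -> Optional[Tuple[int, int]]:
    """Bounds (lo, hi] on the path delay from the first stage that latched '1'."""
    if 1 not in bits:
        return None  # the delay exceeds n_stages * resolution
    k = bits.index(1) + 1  # 1-based stage index
    return ((k - 1) * resolution_ps, k * resolution_ps)


bits = mvdl_capture(450, t_buf_ps=197, t_buf_low_ps=100, n_stages=8)
print(bits)                           # [0, 0, 0, 0, 1, 1, 1, 1]
print(decode_delay(bits, 197 - 100))  # (388, 485)
```

The resolution depends only on the difference of the two buffer delays, which is why it can be made smaller than a single gate delay.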
In delay fault testing, the worst-case delay of a path may be caused by different types of transitions. The MVDL handles all possible transitions on a path by using flops with set-reset capability. When the input and output of the Path Under Test (PUT) are both rising, the MVDL is RESET (i.e., all the flops are made to store a '0'), and the input and output of the PUT are then fed into the x and y inputs of the MVDL, respectively. When both are falling, the flops are RESET and the input and output of the PUT are inverted and fed into x and y, respectively. In both cases, the position of the first flop that stores a '1' is proportional to the delay of the path.
The above cases hold only for non-complementing paths, i.e., paths in which an input transition in one direction causes the output to transition in the same direction. Two additional cases are possible for complementing paths, where a transition on the input causes the output to transition in the opposite direction. For a complementing path whose input is rising and output is falling, we first SET the delay line, i.e., store 1s in all the flops, and then feed the input and output unchanged into x and y of the MVDL, respectively. Since all the flops are initially set to '1', the delay of the path is indicated by the first flop that latches a '0'. The second case for complementing paths is a falling input and a rising output. Since the input is used to clock the flops in the MVDL, it must be fed in as a rising edge. We therefore SET the flops and complement both the input and output before feeding them into x and y of the MVDL, respectively. Again, all the flops initially store a '1', so the path delay is indicated by the first flop that latches a '0'. Implementing these schemes requires one multiplexer each to select the complemented or non-complemented input and output. This does not affect the accuracy of the delay measurement, since the MVDL is symmetric and depends only on the difference between the two delays; inaccuracies can be eliminated by matching the upper and lower delay lines. Table 1 lists the possible transitions and the proposed scheme for handling each.
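The four transition cases can be captured as a small configuration table. This is a sketch under stated assumptions: the edge labels `"rise"`/`"fall"` and the names `MvdlSetup`, `SETUPS`, `edge_after`, and `first_indicator_stage` are hypothetical, not from the paper. The loop at the end checks two invariants implied by the text: x, which clocks the flops, always sees a rising edge, and the indicator value is the level y settles to, which is always the opposite of the initial SET/RESET value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MvdlSetup:
    init_value: int      # 0 = RESET all flops, 1 = SET all flops
    invert_input: bool   # feed the complemented PUT input to x
    invert_output: bool  # feed the complemented PUT output to y
    indicator: int       # value whose first occurrence marks the path delay


# (input edge, output edge) -> setup, following the four cases described above.
SETUPS: Dict[Tuple[str, str], MvdlSetup] = {
    ("rise", "rise"): MvdlSetup(init_value=0, invert_input=False, invert_output=False, indicator=1),
    ("fall", "fall"): MvdlSetup(init_value=0, invert_input=True, invert_output=True, indicator=1),
    ("rise", "fall"): MvdlSetup(init_value=1, invert_input=False, invert_output=False, indicator=0),
    ("fall", "rise"): MvdlSetup(init_value=1, invert_input=True, invert_output=True, indicator=0),
}


def edge_after(edge: str, invert: bool) -> str:
    """Edge direction seen at the MVDL after the optional inverting multiplexer."""
    if not invert:
        return edge
    return "fall" if edge == "rise" else "rise"


def first_indicator_stage(bits: List[int], setup: MvdlSetup) -> Optional[int]:
    """1-based index of the first flop holding the indicator value, or None if none does."""
    for k, bit in enumerate(bits, start=1):
        if bit == setup.indicator:
            return k
    return None


for (in_edge, out_edge), s in SETUPS.items():
    assert edge_after(in_edge, s.invert_input) == "rise"
    y_final = 1 if edge_after(out_edge, s.invert_output) == "rise" else 0
    assert s.indicator == y_final == 1 - s.init_value
```

Keeping the four cases in one table mirrors the two multiplexers and the SET/RESET control as plain data, so the decode step is the same for every transition type.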
Reading out the values stored in the delay line is a crucial task, and most proposed solutions involve substantial hardware overhead, such as a separate asynchronous read-out architecture using registers [1]. We have resolved this issue in the MVDL with a readout scheme that has minimal hardware and pin overhead. As shown in Figure 1 [...] Similarly, the x-line input of each stage is multiplexed with a clock signal and fed into the clock input of that stage's flop. Once the delay measurement of a path under test is complete, we switch to readout mode by asserting the mode signal, which activates both these multiplexers. The values in the flops can then be shifted out serially using shiftclk to obtain the delay of the path under test.
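The serial readout can be modeled as a shift register clocked by shiftclk. This is a minimal sketch that rests on a hypothetical assumption about the part of the description missing above: that in readout mode each flop's data input is taken from the previous flop's output and the serial output is taken from the last stage. The function name and the `scan_in` parameter are ours.

```python
from typing import List, Tuple


def shift_out(flops: List[int], scan_in: int = 0) -> Tuple[List[int], List[int]]:
    """Serially read an MVDL configured as a shift register (mode signal asserted).

    Hypothetical wiring: each flop's data input is muxed to its predecessor's output,
    and the serial output is the last stage. Returns the serial stream and the flop
    contents after the final shiftclk pulse.
    """
    state = list(flops)
    stream = []
    for _ in range(len(state)):         # one shiftclk pulse per stage
        stream.append(state[-1])        # sample the serial output before the edge
        state = [scan_in] + state[:-1]  # on the edge, each flop takes its predecessor's value
    return stream, state


latched = [0, 0, 0, 0, 1, 1, 1, 1]     # e.g. values captured during measurement
stream, _ = shift_out(latched)
assert stream[::-1] == latched         # the stream arrives last stage first
```

Under this wiring an n-stage MVDL needs exactly n shiftclk pulses and a single output pin for readout.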
MVDL FOR DELAY MEASUREMENT AND DEBUG
The proposed debug scheme using the Modified Vernier Delay Line (MVDL) is shown in Figure 2. A single MVDL is used per chip to test all modules. Critical paths whose delay exceeds a designer-specified threshold are susceptible to failure and need to be tested for delay faults. We use the path delay fault model; if the number of paths is exponential, the segment delay fault model can be used instead. The start and end points of critical paths from different modules on the same chip whose delays exceed the specified threshold are multiplexed into the MVDL delay measurement unit. Measuring the delay of multiple paths/modules requires a structured debug technique. In [10], a debug interface using IEEE 1149.1 is proposed. We extend this to test and debug multiple modules within the same chip using a single MVDL module. The proposed methodology is outlined below.
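Selecting which paths to multiplex into the single MVDL can be sketched as a threshold filter over static timing results. This is an illustrative sketch only: the path names, delays, worst-first ordering, and binary select encoding are hypothetical choices, not details given in the paper.

```python
from typing import Dict, Tuple


def select_critical_paths(path_delays_ps: Dict[str, int], threshold_ps: int) -> Tuple[Dict[str, int], int]:
    """Pick paths whose STA delay exceeds the threshold and assign MVDL mux select codes.

    Paths are ordered worst-first so that the slowest paths are tested first.
    Returns {path name: select code} and the number of select bits the mux needs.
    """
    critical = sorted(
        (name for name, delay in path_delays_ps.items() if delay > threshold_ps),
        key=lambda name: path_delays_ps[name],
        reverse=True,
    )
    codes = {name: code for code, name in enumerate(critical)}
    # ceil(log2(n)) select bits for n inputs, with at least one bit if any path is selected.
    select_bits = max(1, (len(critical) - 1).bit_length()) if critical else 0
    return codes, select_bits


paths = {"mod_a/p0": 900, "mod_b/p3": 1200, "mod_a/p7": 700, "mod_c/p1": 1100}
print(select_critical_paths(paths, threshold_ps=800))
# ({'mod_b/p3': 0, 'mod_c/p1': 1, 'mod_a/p0': 2}, 2)
```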
• Reset the Circuit Under Test (CUT) to a known state.
• Select the particular critical path to be tested from the multiplexed set and feed it into the MVDL for testing.
• SET/RESET the flops in the MVDL according to the path type: SET the flops if the path is complementing, else RESET them.
• Activate the MVDL in the delay measurement mode by de-asserting the mode and SET/RESET signals.
• Apply the appropriate worst-case patterns to the path and allow values to be latched into the flops.
• Assert the mode signal to scan out the values stored in the MVDL.
• Count the number of indicator values, i.e., 1s for non-complementing paths/modules and 0s for complementing paths.
• Calculate the path delay from this information.
• Select the next critical path and repeat the above steps.
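The steps above can be sketched as a loop over the selected paths. This is a minimal sketch assuming an ideal MVDL, in which every stage after the catch-up point also holds the indicator value, so a count c of indicator values in an n-stage line places the first indicator at stage n - c + 1. The `scan_out` callback is a hypothetical stand-in for the hardware steps (reset the CUT, select the path, SET/RESET, apply patterns, scan out); it is not an interface defined in the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

Bounds = Optional[Tuple[int, int]]


def delay_from_count(count: int, n_stages: int, resolution_ps: int) -> Bounds:
    """Convert the number of indicator values into delay bounds (lo, hi]."""
    if count == 0:
        return None                   # delay exceeds the measurable range
    k = n_stages - count + 1          # 1-based stage of the first indicator
    return ((k - 1) * resolution_ps, k * resolution_ps)


def debug_paths(
    paths: Dict[str, bool],                     # path name -> is it complementing?
    scan_out: Callable[[str, int], List[int]],  # hypothetical hardware hook
    n_stages: int,
    resolution_ps: int,
) -> Dict[str, Bounds]:
    results: Dict[str, Bounds] = {}
    for name, complementing in paths.items():
        init = 1 if complementing else 0        # SET for complementing paths, else RESET
        indicator = 1 - init                    # 0s for complementing, 1s otherwise
        bits = scan_out(name, init)             # measure, then shift out the latched bits
        results[name] = delay_from_count(bits.count(indicator), n_stages, resolution_ps)
    return results


fake = {"p1": [0, 0, 0, 0, 1, 1, 1, 1], "p2": [1, 1, 1, 0, 0, 0, 0, 0]}
print(debug_paths({"p1": False, "p2": True}, lambda name, init: fake[name], 8, 97))
# {'p1': (388, 485), 'p2': (291, 388)}
```

Counting indicator values rather than locating the first one gives the same answer for an ideal line, and it matches the counting step in the procedure.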
RESULTS
The Modified Vernier Delay Line (MVDL) was designed in 0.18 µm CMOS technology. Wallace and Dadda multipliers of different sizes were used as test circuits. We used a commercial static timing analysis tool [9] to extract the top critical path of each circuit as well as its delay. Each critical path was then sensitized for worst-case propagation delay, and the range of its delay value was measured using the MVDL. The input and output of each critical path were connected to the x and y inputs of the MVDL, respectively. The resolution of the MVDL was set to 97 ps for all measurements. Table 2 compares the critical path delays measured with the proposed technique against those obtained from the commercial timing analysis tool [9] for the different test circuits.
The area overhead of the MVDL module is minimal. A 12-20 stage MVDL is sufficient to detect a delay fault and indicate its size with a resolution of 10% of the cycle time, for worst-case delay fault sizes ranging from 20% to 100% of the cycle time. The area overhead of such an MVDL module for a processor with a die area of approximately 200 mm² is 0.0029% to 0.0047%.
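The 12-20 stage figure is consistent with one simple reading, which we state as an assumption rather than the authors' derivation: the measured path nominally takes about one cycle time, so a delay fault of 20% to 100% of the cycle time gives a total delay of 1.2 to 2.0 cycles, and spanning that at 10%-of-cycle resolution takes 1.2/0.1 = 12 to 2.0/0.1 = 20 stages. The sketch below reproduces this arithmetic and converts the quoted area percentages into absolute area; the function names are hypothetical.

```python
import math
from fractions import Fraction


def stages_needed(fault_frac: Fraction, resolution_frac: Fraction,
                  nominal_frac: Fraction = Fraction(1)) -> int:
    """Stages needed to span nominal delay + fault, all as fractions of the cycle time.

    nominal_frac = 1 (path delay of about one cycle) is an assumption, not from the text.
    """
    return math.ceil((nominal_frac + fault_frac) / resolution_frac)


def overhead_um2(die_area_mm2: int, percent: str) -> Fraction:
    """Absolute area in square micrometres implied by a percentage of die area."""
    return Fraction(die_area_mm2) * Fraction(percent) / 100 * 10**6  # 1 mm^2 = 1e6 um^2


res = Fraction(1, 10)  # 10% of the cycle time
print(stages_needed(Fraction(1, 5), res), stages_needed(Fraction(1), res))  # 12 20
print(overhead_um2(200, "0.0029"), overhead_um2(200, "0.0047"))             # 5800 9400
```

Exact fractions are used so that the stage counts are not perturbed by floating-point rounding at the boundaries.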
CONCLUSION
In this paper we have presented a novel scheme to detect and debug delay faults using on-chip capture of delays in the range of a few hundred picoseconds. The scheme also has an efficient readout technique with minimal pin and area overhead. Path delays and delay faults smaller than 100 picoseconds can be measured using this scheme, thus reducing the time-to-market for high-performance chips. Future work will focus on the optimal on-chip placement of MVDL modules to obtain maximum delay fault coverage.
