As an at-speed solution to board-level 
Introduction
For boards and systems composed of today's high-speed, state-of-the-art devices, timing-related defects have become major obstacles to correct normal operation. To address such defects and ultimately guarantee a satisfactory level of quality in high-speed products, some form of at-speed testing or delay testing has become imperative.
To test the interconnects of heavily loaded boards, the IEEE 1149.1 boundary-scan standard test infrastructure [1] has been widely accepted across the industry as a cost-effective solution that needs few or no physical probes [1] [2] [3] [4] [5] [6]. Unfortunately, test methods based on the standard boundary-scan structure fail to detect timing-related defects in high-speed boards. This drawback results from the fact that the IEEE 1149.1 boundary-scan test architecture is by its nature designed with an interval of at least 2.5 test clock cycles (2.5 TCKs) between the update of the test stimulus and the capture of the response. This characteristic causes most timing-related defects to escape the standard interconnect test.
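To make the size of this gap concrete, the following sketch compares the standard update-to-capture interval with one system clock period. The clock frequencies (a 10 MHz TCK and a 50 MHz system clock), the function names and the example net delay are illustrative assumptions, not values taken from the standard or from a particular board.

```python
def update_capture_interval_ns(tck_mhz: float, cycles: float = 2.5) -> float:
    """Minimum Update-Capture interval of a standard interconnect test, in ns."""
    return cycles * 1000.0 / tck_mhz


def system_period_ns(sclk_mhz: float) -> float:
    """One system clock period, in ns."""
    return 1000.0 / sclk_mhz


def escapes_standard_test(net_delay_ns: float, tck_mhz: float, sclk_mhz: float) -> bool:
    """True if a net is too slow for the system clock but still settles
    before the standard capture, so the defect goes unnoticed."""
    too_slow = net_delay_ns > system_period_ns(sclk_mhz)
    settles = net_delay_ns <= update_capture_interval_ns(tck_mhz)
    return too_slow and settles


if __name__ == "__main__":
    # Assumed example: 10 MHz TCK, 50 MHz system clock, 35 ns net delay.
    print(update_capture_interval_ns(10.0))
    print(system_period_ns(50.0))
    print(escapes_standard_test(35.0, 10.0, 50.0))
```

Under these assumed numbers, a net whose delay exceeds the 20 ns system clock period but stays below the 250 ns standard interval would pass the standard test while failing in normal operation.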
Several recent studies have focused on overcoming this drawback of the IEEE 1149.1 architecture by making the update and the capture achievable within one system clock cycle [3] [4] [5] [6].
However, most previous work paid little attention to the fact that, in most board designs, interchip signals across the interconnections are synchronized to several different system clocks of varied frequencies in order to optimize overall system performance. This situation is shown in Figure 1, where the group of interconnects including Net1 and Net2 lies in a different system clock domain from the one including Net3 and Net4. Since both at-speed testing and delay fault testing need to observe the response to stimuli applied exactly one system clock cycle earlier, a special control mechanism is needed to indicate the period of one system clock cycle. In [3], the relative timing interval between two 'board-wide' TAP signals (TCK, TMS) is exploited to provide this control to enhanced boundary-scan cells. With that approach, however, delivering appropriate delay tests to the entire interconnect network is not feasible without repeating similar test processes for each system clock domain of nets, using different relative intervals that reflect the different system clock periods. That is, Net1 and Net2 cannot be tested at-speed simultaneously with Net3 and Net4 in Figure 1. By building the control circuits within each component [4] [5] and designing them to handle the system clocks available within each component, multiple system clocks on a board can be dealt with at the same time. In [4], however, only the interconnects between components featuring the new at-speed control mechanisms can be tested at-speed, while the remaining interconnections are tested at standard TCK rates, which significantly reduces the range of applications. The enhanced boundary-scan design architecture in [5] is likewise useful only when the test clock has the same frequency and phase as the system clocks.
In this paper, we propose a new at-speed solution for interconnect testing in a boundary-scan environment with a wide range of applicability. With the new at-speed design architecture incorporated only in the components at the receiving sides, interconnects can be tested at-speed while the ATE controlling the test provides only board-wide TAP signals at the standard TCK rate, which reduces the burden on the test equipment. The architecture also makes it possible to measure the propagation delay of an interconnect by following the test algorithm described here.
The following section gives a detailed description of the basic design architecture. Section 3 explains the test algorithms for at-speed testing, delay measurement and standard interconnect testing. Section 4 reviews the benefits and costs of the proposed approach. The conclusion and future work follow.
Basic Design Architecture
Our new at-speed boundary-scan design architecture is based on the combined use of the modified input boundary cells described in the next subsection and a new user-defined boundary-scan control register. Because it is built within each component, the new boundary-scan register can control the cells in its component independently of the way the cells in other components are being controlled. This feature can facilitate fully parallel at-speed testing and delay testing on every net, provided that only one system clock is assumed for each component.
Early Capture: review from [3]
The enhanced boundary-scan test structure proposed in [3] facilitates effective delay fault testing by allowing the capture flip-flops (CAPs) to capture data appearing at chip input pins at an arbitrary time after the update event. The capture operation in the CAP is in fact still triggered 2.5 TCKs or later after the update in the output boundary cell, just as in standard interconnect testing (the Update-Capture interval is at least 2.5 TCKs). However, an additional level-sensitive latch (Early Capture Latch: ECL) inserted between the chip input pin and the input multiplexer (MUX1) passes the signal from the net straight through only while the 'Early Capture' control signal is high. It latches the value at the input pin when Early Capture goes low and holds that value until it is captured at the capture instant. Effective delay fault testing and at-speed testing can be performed by driving the ECL control input with Early Capture in such a way that the ECL is transparent for only one system clock cycle after the update instant (the Update-EarlyCapture interval is 1 SCLK). Our at-speed boundary-scan architecture uses this modified boundary cell design without alteration.
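The latch behaviour reviewed above can be sketched as a small behavioural model. This is only an illustration under stated assumptions: waveforms are given as lists of samples taken at equal time steps, the latch output is assumed to start at 0, and the function name `early_capture_latch` is hypothetical rather than taken from [3].

```python
from typing import List


def early_capture_latch(data_in: List[int], early_capture: List[int],
                        initial: int = 0) -> List[int]:
    """Behavioural model of a level-sensitive latch.

    While early_capture is 1 the latch is transparent and follows data_in;
    once early_capture is 0 it holds the last value it passed.
    """
    if len(data_in) != len(early_capture):
        raise ValueError("waveforms must have the same number of samples")
    q = initial  # assumed power-up value of the latch output
    output = []
    for d, ec in zip(data_in, early_capture):
        if ec:
            q = d  # transparent phase
        output.append(q)  # hold phase keeps the previous q
    return output


if __name__ == "__main__":
    data = [0, 0, 1, 1, 0, 0]           # pin rises, then falls again
    ec = [1, 1, 1, 1, 0, 0]             # Early Capture falls before the pin does
    print(early_capture_latch(data, ec))
```

In this example the pin has already returned to 0 by the time a late capture would occur, but the latch still presents the 1 that was present when Early Capture fell, which is the value a CAP would then capture.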
Early Capture Control Register
The main concern of this paper, described in this section, is the design of the on-chip control circuitry whose task is to control accurately the relative timing between the update in the output cells and the falling edge of Early Capture (the Update-EarlyCapture interval). Figure 3 depicts a sampling register with a delay line that delivers enabling signals to each flip-flop stage successively at closely spaced intervals, each equal to the propagation delay through two inverters. A clock signal (SysCLK) with half the frequency of the system clock (SCLK) is provided as a common data input to every flip-flop stage. With this configuration, the continuous timing behavior of SysCLK can be sampled at the CAP stages and stored as a discrete bit representation. If the propagation of the enabling signal is designed to be initiated by the rising edge of SysCLK, successive 1's are expected to be sampled in the flip-flop stages in the first part of the register, followed by 0's in the following part. The number of leading stages whose contents are 1 is therefore directly related to one period of the system clock (SCLK). The circuit configuration shown in Figure 4 is responsible for converting the discrete representation stored in the register into a continuous-time signal (Early Capture) capable of indicating the duration of one system clock period. The outputs of the flip-flop stages are tied together to form a wired-AND output signal. Suppose the outputs of all stages are initialized to 1, so that the wired-AND output is 1. If a triggering signal is now applied, it ripples through the delay line stages and successively updates the output of each flip-flop with the sampled clock data at the same intervals as in Figure 3. Updates in the first part of the register, where the input contents are 1's, do not change the value of the wired-AND output. At the update instant of the first flip-flop stage whose input value is 0, the wired-AND output changes to 0.
Ideally, the interval between the application of the triggering signal and the change of the wired-AND output signal is identical to one system clock (SCLK) period.
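The sampling and wired-AND behaviour described above can be illustrated with a short behavioural sketch. It rests on several assumptions that are not stated in the text: times are integer picoseconds, stage k is enabled (k + 1) unit delays after the rising edge of SysCLK, SysCLK has a 50% duty cycle, and all delays are ideal. The function names are hypothetical.

```python
from typing import List, Optional


def sample_sysclk(sclk_period_ps: int, unit_delay_ps: int, n_stages: int) -> List[int]:
    """Sample SysCLK (half the SCLK frequency) at successive delay-line taps.

    SysCLK is high for one SCLK period after its rising edge and low for
    the next one, so the leading 1's encode one SCLK period.
    """
    sysclk_period_ps = 2 * sclk_period_ps
    bits = []
    for k in range(n_stages):
        t = (k + 1) * unit_delay_ps  # assumed enable time of stage k
        is_high = (t % sysclk_period_ps) < sclk_period_ps
        bits.append(1 if is_high else 0)
    return bits


def early_capture_width_ps(bits: List[int], unit_delay_ps: int) -> Optional[int]:
    """Time from the trigger until the wired-AND output falls to 0.

    The output falls when the first stage holding 0 is updated. If every
    stage holds 1, the output never falls and None is returned.
    """
    for k, bit in enumerate(bits):
        if bit == 0:
            return (k + 1) * unit_delay_ps
    return None


if __name__ == "__main__":
    # Assumed example: 20 ns SCLK period, 0.4 ns unit delay, 60 stages.
    pattern = sample_sysclk(20000, 400, 60)
    print(pattern.count(1), early_capture_width_ps(pattern, 400))
```

With these idealized assumptions the reconstructed Early Capture width equals the SCLK period; in a real circuit the width would be quantized to the unit delay and shifted by the path delays discussed later in this section.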
In our new test architecture, the two separate functions described above are integrated into a user-defined boundary-scan register called the ECCR ('Early Capture Control Register'), in which each stage is composed of a capture flip-flop (CAP) and an update flip-flop (UPD), just like the standard boundary data register. However, to provide the delayed enabling signals to each flip-flop stage as shown in the previous figures, the standard register is augmented with two delay lines and with multiplexers on its data and control inputs. Figure 5 shows an ECCR configured in a simple boundary-scan component.
In this figure, the CAPs in the ECCR are connected to form a shift register between TDI and TDO when selected, as in the standard boundary data register. With this feature, the contents of the sampling register may be obtained either by the sampling operation or by shifting appropriate bit patterns through the scan path, which provides the capability to set up an arbitrary Update-EarlyCapture interval. The control signals of the CAP stages are enabled only while a specific user-defined instruction, called ECCR_SETUP, is in effect. Those of the UPD stages are independent of the current instruction in the instruction register (IR) and stay enabled throughout the test. These control signals and their Boolean equivalent representations are also shown in Figure 5. Though not shown in Figure 5, it is assumed that the UPD outputs are designed to be initialized to 1 during the Exit1-DR state, causing Early Capture to return to logical 1.
Any difference between the delays of the two delay lines in Figure 5 could adversely affect the reliability of the test result. Such a difference may result, for example, from process variation. However, process variation may be regarded as negligible because the ECCR can be implemented within a small region of the silicon.
The unwanted delay in the path from the delayed UpdateDR control signal through the UPD flip-flop and the wired-AND output net to the ECL control input is another potential source of degradation in the quality of the ECCR-based test. If this delay is as large as the delay through several inverters, it can be compensated by using the programmability of the sampling points, which is described in Section 3. If it is smaller but still critical, a more expensive solution such as those in [7] should be sought.
ECCR_SETUP Instruction
To set up the contents of an ECCR dynamically, we propose to add a new user-defined instruction, called ECCR_SETUP, to the IEEE 1149.1 instruction set. ECCR_SETUP selects the ECCR as the target register connecting TDI and TDO during the Shift-DR state. When this instruction is in effect, the propagation of the delayed enabling signal for the CAPs of the ECCR is triggered by the rising edge of ClockECCR during the Capture-DR state. This is also shown in Figure 5.
Test Algorithm
By using the test structure described above, at-speed testing and delay measurements on the interconnecting nets can be performed in addition to standard boundary-scan interconnect testing. For these purposes, the following steps are used.
Step A. Load SAMPLE/PRELOAD and shift safe patterns into the boundary registers through the scan path.
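Step B. Load ECCR_SETUP into IR. In step Ba, sample the system clock into the ECCR during the Capture-DR state. In step Bb, shift a bit pattern into the ECCR through the scan path.
Step C. Load EXTEST into IR and follow the same procedure as for the standard EXTEST.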
Step A is required to prevent potential conflicts when the test mode is entered. Depending on the goal to be achieved in Step C, different procedures are taken during step B.
For at-speed testing, step Ba needs to be performed, but step Bb does not. This makes the Update-EarlyCapture interval in each component equal to one period of the system clock (Update-EarlyCapture = 1 SCLK period).
By using the capability to set up the ECCR with an arbitrary bit pattern through the shifting operation in step Bb, the propagation delay of interconnects can be measured accurately. For example, if step Ba shows that one period of the system clock associated with a group of interconnects corresponds to 10 successive 1's in the first part of the ECCR, then a bit pattern with 9 successive 1's followed by 0's causes the ECLs in the input boundary cells to sample the interconnect data signals one unit delay earlier than in the at-speed testing case (Update-EarlyCapture = 1 SCLK period minus the propagation delay through two inverters). Steps B and C may be repeated with one more or one fewer 1 in the ECCR each time until the propagation delay is determined by observing a change in the captured data value. This delay measuring capability can be applied even to interconnects connected to non-clocked component pins. In that case, TCK, for example, can be sampled and used as the reference signal instead of a system clock, since its period is already known.
Shifting a pattern of all 1's into each ECCR results in the standard interconnect test. In this case, the Update-Capture interval is 2.5 TCKs.
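The repeated use of steps B and C for delay measurement can be sketched as a simple sweep. This is a sketch under stated assumptions rather than the procedure of this paper in code form: the callback `captures_correctly` stands in for one pass of steps Bb and C on real hardware, the window width for a pattern with n leading 1's is assumed to be (n + 1) unit delays, only the downward sweep is shown, and `make_simulated_net` is a hypothetical stand-in for a net with a known delay.

```python
from typing import Callable, List, Optional, Tuple


def eccr_pattern(num_ones: int, n_stages: int) -> List[int]:
    """Bit pattern shifted into the ECCR: leading 1's followed by 0's."""
    return [1] * num_ones + [0] * (n_stages - num_ones)


def window_ps(num_ones: int, unit_delay_ps: int) -> int:
    """Assumed Update-EarlyCapture interval for a given number of 1's."""
    return (num_ones + 1) * unit_delay_ps


def measure_delay(captures_correctly: Callable[[List[int]], bool],
                  start_ones: int, n_stages: int,
                  unit_delay_ps: int) -> Optional[Tuple[int, int]]:
    """Shorten the window one unit delay at a time until the capture fails.

    Returns (lower, upper) bounds on the net delay in ps, or None if the
    net already fails with the starting pattern.
    """
    if not captures_correctly(eccr_pattern(start_ones, n_stages)):
        return None
    for ones in range(start_ones - 1, -1, -1):
        if not captures_correctly(eccr_pattern(ones, n_stages)):
            return (window_ps(ones, unit_delay_ps),
                    window_ps(ones + 1, unit_delay_ps))
    return (0, window_ps(0, unit_delay_ps))


def make_simulated_net(net_delay_ps: int, unit_delay_ps: int) -> Callable[[List[int]], bool]:
    """Hypothetical net model: capture succeeds if the signal arrives in time."""
    def captures_correctly(pattern: List[int]) -> bool:
        if 0 not in pattern:
            return True  # all 1's: standard interconnect test
        return net_delay_ps <= window_ps(pattern.index(0), unit_delay_ps)
    return captures_correctly


if __name__ == "__main__":
    net = make_simulated_net(net_delay_ps=3300, unit_delay_ps=400)
    print(measure_delay(net, start_ones=10, n_stages=16, unit_delay_ps=400))
```

The returned pair brackets the delay to within one unit delay, which reflects the measurement resolution implied by the two-inverter delay element.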
For standard or at-speed interconnect testing, step B (ECCR_SETUP) needs to be performed only once, while measuring the delay requires performing step B and step C repeatedly with one test pattern. Figure 6 depicts the timing of the test signals during at-speed interconnect testing. All of these signals are defined in Figure 5. A chip input pin is driven by DATA_IN in such a way that our new boundary-scan architecture manifests its at-speed testing capability. The data input from an interconnect rises to 1 and falls to 0 around time t2 (t2 - t1 = 1 SysCLK). As shown in the figure, the sampled value at time t3 is 1, not 0. Thus we can be assured that our test architecture can perform the update and the capture within one system clock period.
Figure 6. Timing of signals during at-speed interconnect testing
Benefits and Costs
Building the at-speed control circuitry within each component leads to a significant reduction of the overall testing time. If we assume that a board contains 5 domains of interconnects controlled by different clock speeds, at-speed testing or delay testing can be completed 5 times faster than when the at-speed control is provided in the form of board-wide control signal(s). In the case shown in Figure 1, all interconnects can be tested in parallel. This assumption overlooks the fact that interconnections may be divided into several domains even within a component for effective signal communication with other components. Even so, the on-chip control circuitry can considerably reduce the overall test time compared with a board-wide control mechanism such as that in [3]. Moreover, since only the receiving sides of interconnects are required to incorporate the enhancements proposed here, at-speed testing can be achieved on a wide range of interconnects. The capability of measuring the propagation delay across interconnects is expected to be useful for many different purposes. Finally, complete compatibility with the IEEE 1149.1 standard architecture allows our new test architecture to be used in board designs without any compatibility concern.
Though the basic configuration of the ECCR shown in Figure 5 is simple and intuitive, its area overhead may be regarded as too costly. In particular, since two serially connected inverters were selected as the unit delay element in the ECCR delay lines, a large number of stages is required to sample the full waveform of one system clock cycle. If the inverters are implemented with a propagation delay of 0.2 ns and the system clock runs at 50 MHz, at least 50 stages of unit delay elements are needed. This amounts to 100 inverters and 50 flip-flop stages. An alternative configuration is shown in Figure 7. This configuration reduces the waste of flip-flop stages and inverters in the first part of the register by replacing the inverters in this part with elements of larger delay and removing the CAPs and UPDs associated with those delay element stages. This is justified by the observation that these flip-flop stages would always sample only logical 1's during the Capture-DR state and would not change the wired-AND output signal during the sequential updates in the Update-DR state.
Finally, in the board or system design phase, designers must take care to make the skews of the system clocks and the test clock (TCK) to neighboring chips identical. For this purpose, designers may route those clocks along the same path.
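The stage count quoted above for the basic configuration can be checked with a few lines of arithmetic. The sketch below uses integer picoseconds to avoid rounding issues; the function name `eccr_basic_cost` and its parameters are illustrative assumptions.

```python
from typing import Tuple


def eccr_basic_cost(sclk_period_ps: int, inverter_delay_ps: int,
                    inverters_per_stage: int = 2) -> Tuple[int, int]:
    """Return (stages, inverters) needed to span one system clock period.

    Each stage contributes one unit delay made of `inverters_per_stage`
    serially connected inverters.
    """
    unit_delay_ps = inverters_per_stage * inverter_delay_ps
    stages = -(-sclk_period_ps // unit_delay_ps)  # ceiling division
    return stages, stages * inverters_per_stage


if __name__ == "__main__":
    # 50 MHz system clock (20 ns period) and 0.2 ns inverters, as in the text.
    print(eccr_basic_cost(20000, 200))
```

Running the example reproduces the 50 stages and 100 inverters stated in the text for one delay line of the basic configuration.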
Conclusion and Future Work
We have proposed an at-speed boundary-scan interconnect test solution. The proposed test circuit design and test algorithms can serve as a means to overcome the problem of timing-related defects on today's high-speed boards that use multiple system clocks and offer limited physical accessibility. Owing to its complete conformity with the IEEE 1149.1 standard test structure, existing boundary-scan software and test equipment can still be used.
Though the at-speed boundary-scan design in this paper can control the boundary-scan cells in each component independently of the way those in other components are being controlled, it does not provide an effective method to control simultaneously the cells within one component that are associated with different system clock frequency domains. Our ongoing research is aimed at providing a solution to this problem.
