Abstract: Asynchronous interfaces, a popular way of dealing with timing closure problems in deep submicron SoCs, pose a serious problem for on-line testing. Their behavior is specified not as a traditional clocked automaton, but as an asynchronous protocol whose timing depends on the clocks of the communicating blocks and on wire and gate delays. The problem is exacerbated by the use of non-encoded bundled data buses, whose transitions cannot be reliably detected, as opposed to very expensive self-timed data encodings. This paper presents a further development of our low-cost checkers for asynchronous handshake protocols, which now support bundled data and are interfaced to the scan chains to assist diagnosis and debugging. We discuss a further reduction of the area requirements for the delay elements, the possibility of defining separate delays for each protocol phase, the collection of test results, and the checking of timing relationships between the handshake and data signals. An application of the data transition detectors to the checking of synchronous SDR/DDR protocols is shown.
I. Introduction
The ongoing evolution of the semiconductor industry allows the integration of a complete system on a single chip. Due to emerging issues, including process variations, clock skew, problems with the distribution of the clock signal, and increased electromagnetic interference (EMI) caused by high-frequency signal switching, designers have to consider new design techniques.
Purely asynchronous, globally-asynchronous locally-synchronous (GALS) and desynchronization design techniques seem to be a promising option to overcome these issues [1]. The basic concept of data transfer in such systems is handshaking. Handshaking systems are more robust against process variations, offer improved modularity [2] and are suitable for security applications such as smart cards.
Due to these properties much research is being carried out in the field of asynchronous circuit development. This includes circuit development itself as well as the development of tools such as Haste [3] that automate the design flow. The complementary research field is verification and test. Several techniques have been proposed that structurally test asynchronous circuits using adjusted scan techniques [4], [5], [6], [7], [8]. BIST techniques have been addressed in [9], [10], [11]. Apart from these well-established test methods there are some design-for-testability techniques that directly address special properties of asynchronous circuits, e.g., asynchronous handshake protocols. Since a fault (constant, transient, parametric) within a handshake protocol can potentially lead to either loss of data or a deadlock at the circuit or system level, appropriate protocol-level on-line testing and diagnosis techniques are needed. However, due to their asynchronous nature these protocols are difficult to test using standard ATE-based off-line test approaches.
The currently known approaches [12], [13], [14] monitor the handshake signals used in bundled-data protocols and ignore the data transitions. A special bundled-data timing constraint, which is crucial for data integrity, is considered separately from the protocol. While the reasons behind this situation are understandable, as protocols are frequently modelled with causality models that have no notion of time, it is unsatisfactory from an engineering point of view.
In this paper we propose a new idea of detecting data transitions on a bus without using any high-overhead data encodings known as self-timed monotonic codes (dual-rail, m-of-n, 1-hot, etc.). The absence of such encodings prevents us from detecting every data transition reliably. For example, it is clear that a data transition $x_i \to x_i$ (into the same value) will not produce any switching events on a bundled-data bus. However, under normal operation, the data will change on the bus sooner or later, thus performing a transition which will be checked for its timing. From a formal point of view, this introduces non-determinism into the causality model of the protocol as shown in Fig. 1(a), where the data transition d* cannot be reliably detected. Our solution is shown in Fig. 1(b), where the correct data transition d* is ignored (excluded) and the erroneous data transition d* (taking place in the process of reading data) is detected.
The contribution of this paper includes three main aspects: further improvement of our previous designs by interfacing them to scan, modification of the delay lines, and inclusion of bundled-data transitions in the protocol checking technique. A side product of this work is a checker for data transitions on single-data-rate (SDR) and double-data-rate (DDR) buses. This paper only introduces the circuit solutions and shows simulation results. Future work will include the application of the proposed techniques to large designs.
II. Related work
Many approaches to testing asynchronous channels consider the synchronous-asynchronous interfaces. In [15] O. Petre and H.G. Kerkhoff presented a full scan approach in order to structurally test interfaces for 2-phase bundled data protocols. They introduced a method to remodel the interface in order to use commercial ATPG tools for pattern generation. A similar approach was presented by A. Efthymiou et al. in [16] . In contrast to the previously mentioned approach they used a partial scan technique to keep the overhead for a scan as low as possible. Another approach provided by D. Scheit and H.T. Vierhaus in [17] can be applied to detect and even correct faults in data signals of a channel. This is achieved by applying common error detection and correction (EDAC) techniques which recover the data in case a fault within the data word is detected.
In order to evaluate the handshake protocol during its normal operation, one may make use of on-line test techniques. Fig. 2 shows the basic scheme of how protocol checkers can be integrated into the system.
The checker receives the handshake signals ($req_i$, $ack_i$) of the channel $i$ to be checked and delivers an indication signal in case of a detected fault (see $f_1$, $f_2$ in Fig. 2). A first checker scheme of this kind was presented in [12]. This scheme was further improved, resulting in the checker design provided in [13]. Both checkers were designed to test bundled-data protocols for the following types of faults:
• Stuck-at faults: One of the handshake signals is permanently low or high.
• Premature transition: A handshake signal transition that occurs too close to a preceding transition, i.e. the time between the two transitions is less than a minimum permitted delay $d_{min}$.
• Order violation: Occurs when the monitored handshake signal transitions do not correspond to the order of the transitions of the protocol used.
For each of the protocol phases the checkers of [12] contain a separate delay element that forms a time interval in which signal transitions are not allowed. The major drawback of the reference designs is their complexity. In [14] we have shown that the overhead for an asynchronous handshake protocol checker (AHPC) can be reduced by avoiding separate delay elements, at the cost of sacrificing the possibility to define separate delays for each of the protocol phases. Fig. 3 shows the schematic of the previous checker design. It can be seen that the checker mainly consists of a transition-detection unit (TDU), an AND gate, a comparator, one delay element, a 2-bit feedback shift register (FSR) and one D flip-flop (DFF).
The checker works as follows: The FSR generates a repeated 2-bit sequence → 01 → 11 → 10 → 00 → ... that represents the expected values of the handshake signals. For each transition of the handshake signals the TDU generates a pulse, which is delayed and fed to the FSR. Thus, the delayed pulse causes the FSR to update its state after the minimum permitted delay $d_{min}$. If another handshake signal transition is detected by the TDU before that, the newly generated pulse causes the DFF to sample the comparison result while the FSR has not yet been updated. Consequently, the sampled comparison result indicates a mismatch between the expected and actual handshake signal values. In this way premature transitions and, similarly, order violations are detected. If such a fault is detected, the AHPC enters a deadlocked state, since the AND gate, which works as a clock gate, inhibits the propagation of pulses coming from the TDU. Consequently, the result of the on-line test is kept until the checker has been reset.
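The mechanism described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit of Fig. 3: the class name `HandshakeChecker`, the event-based interface and the reading of the 2-bit sequence as (ack, req) pairs of a 4-phase protocol starting from 00 are assumptions made for illustration.

```python
class HandshakeChecker:
    """Behavioural sketch of an FSR-based handshake checker (hypothetical model)."""

    # Expected values after each transition, read as (ack, req).
    # The bit order is an assumption; the sequence 01, 11, 10, 00 is from the text.
    SEQUENCE = [(0, 1), (1, 1), (1, 0), (0, 0)]

    def __init__(self, d_min):
        self.d_min = d_min
        self.reset()

    def reset(self):
        self.index = 0              # current FSR state (position in SEQUENCE)
        self.update_time = None     # time at which the delayed pulse reaches the FSR
        self.error = False          # value held by the DFF

    def transition(self, t, ack, req):
        """Report a handshake transition at time t with the new signal values."""
        if self.error:
            return True             # clock gate blocks further pulses until reset
        # The delayed pulse of the previous transition arrives first, if d_min has elapsed.
        if self.update_time is not None and t >= self.update_time:
            self.index = (self.index + 1) % len(self.SEQUENCE)
            self.update_time = None
        # The DFF samples the comparator: expected (FSR) versus actual values.
        if (ack, req) != self.SEQUENCE[self.index]:
            self.error = True
        else:
            self.update_time = t + self.d_min
        return self.error
```

In this model a premature transition is flagged because the FSR still holds the previous expectation, and an order violation is flagged because the updated expectation does not match the observed values, so both fault types collapse into the same comparison.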
III. Checker improvements
In this section we present improvements of the previous checker design. An important contribution is checking for violations of the bundled-data timing constraint, which may result in hold time violations in the interface registers of the receiving block. Our new implementation is based on our previous AHPC design provided in [14].
A. Introduction of an improved delay unit
In the previous checker implementation the delay elements used to realize the minimum permitted delays were implemented via inverters and buffer cells that have symmetrical rising and falling edge timings. Thus, the resulting delay is close to the model of a perfect transport delay. In order to use enhanced buffer cells with larger delays (and a better delay/area ratio) one has to consider the problem that short signal pulses, such as the pulses generated by the TDU, may fade out while propagating through those enhanced delay cells. This means one has to assume an inertial delay for these delay cells. To solve this problem we modify the delay element as shown in Fig. 4(a). This modified delay unit (MDU) consists of two logic gates forming a feedback loop and a standard delay element that delays a signal by the time $d$. The two logic gates act as a kind of set-reset latch, where the set signal is the input $i$ and the reset signal is the output signal of the standard delay element.
The MDU works as follows: Initially, after the reset has been applied, all internal signals are low. An incoming pulse at the input $i$ forces the feedback loop in front of the standard delay element to stay high. Simultaneously, the rising edge propagates through the standard delay element. When the rising edge reaches the output of the delay element it resets the value of the feedback loop to logical low. As a consequence, the logical low value, i.e. the falling transition, propagates through the standard delay element, which takes the time $d$ as well. Thus, the incoming pulse is delayed by $d$ and stretched to a width of $d$, as shown by the analog simulation given in Fig. 4(b).
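The following discrete-time sketch reproduces this behaviour. It is a simplified model, assuming an ideal transport delay of `d` time steps and a latch in which reset takes priority over set; the function name `simulate_mdu` and the sampled-waveform representation are illustrative choices, not part of the original design.

```python
from collections import deque


def simulate_mdu(input_wave, d):
    """Return the output samples of the delay element for a sampled input waveform.

    input_wave is a list of 0/1 samples, d is the delay in time steps.
    """
    line = deque([0] * d, maxlen=d)   # standard delay element, initially low
    latch = 0                         # feedback loop in front of the delay element
    output_wave = []
    for i in input_wave:
        out = line[0]                 # value that entered the line d steps ago
        if out:
            latch = 0                 # delayed rising edge resets the feedback loop
        elif i:
            latch = 1                 # incoming pulse sets the feedback loop
        line.append(latch)            # oldest sample drops out automatically
        output_wave.append(out)
    return output_wave


# A one-sample input pulse and a delay of four steps.
print(simulate_mdu([1] + [0] * 11, 4))
# prints [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

The short input pulse appears at the output after four steps and is four steps wide, which is the delay-and-stretch effect described above.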
Using this modified delay unit the minimum delay between two signal transitions can approximately be doubled without doubling the number of delay buffers in the following way. The output of the modified delay unit is combined with the output of the comparator via an OR gate as illustrated in Fig. 5. This leads to the following behavior. If a premature transition of a handshake signal occurs in the interval $d_o^{+}$, the circuit works in the same way as before. If the transition occurs in the interval $d_o^{-}$, the output of the modified delay unit and consequently the output of gate $OR_1$ are logical high. Thus, the DFF will capture this value, which indicates the premature transition fault. The minimum delay $d_{min}$ between two signal transitions therefore covers both intervals and is approximately twice the delay $d$ of the standard delay element.
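As a hedged reading of this relation, assume that $d_o^{+}$ denotes the interval until the output of the modified delay unit rises and $d_o^{-}$ the interval during which that output stays high (this interpretation is an assumption based on the description of the MDU). Under this assumption the detection window can be sketched as:

```latex
d_{min} = d_o^{+} + d_o^{-} \approx 2d
```

Here $d$ is the delay of the standard delay element inside the MDU; the approximation ignores the gate delays of the feedback loop.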
B. Integrating scan capabilities
Another improvement of the checker design is the introduction of a scan path in order to ease the collection of the test results during off-line tests. Making the checker scannable also offers the possibility to configure the checker before performing on-line tests. Therefore, the same checker architecture can be used for different asynchronous handshake protocols, e.g., protocols that start with different handshake values. The scan insertion is relatively easy to achieve. Since the checker contains an FSR, it is not necessary to replace all standard flip-flops (FFs) by their scannable counterparts. Instead, only the first FF of the FSR and the DFF that samples the output of the comparator are made scannable. To perform the shift process an external clock signal tck is required. As shown in Fig. 6, four gates are introduced that act as multiplexers for the clock signal. $G_1$ provides the FSR with either the delayed pulse generated by the TDU or the external clock signal. Similarly, $G_2$ provides the DFF with the external clock signal or the pulse from the TDU. In order to prevent the checker from malfunctioning due to setup and hold time violations caused by premature sampling of the comparator output, the clock pulse coming from the TDU is delayed by an additional buffer element.
C. Configurable delay implementation
One major drawback of the checker proposed in [14] compared with the AHPC of Shang et al. is that the phases of the handshake protocol have to be symmetrical, i.e. all phases are required to have the same minimum permitted delay. But most asynchronous protocols have asymmetrical protocol phases. Thus, the definition of separate minimum delays for each of the protocol phases is mandatory. In order to realize this for our checker solution, we propose the insertion of a configurable delay line as shown in Fig. 7. The configuration of the delay line is controlled by the FSR. Thus, depending on the current value of the FSR the multiplexer selects the delay for the current protocol phase. The problem that needs to be solved in this circuit configuration is the following: Let us assume that initially $d_1$ is selected. An incoming pulse propagates through $d_1$ and the multiplexer and causes the FSR to update its state. Thus, a longer path, e.g., $d_2$, is selected. Now it might happen that the same pulse has not yet propagated through $d_2$. Consequently, the same incoming pulse can cause multiple glitches at the output of the multiplexer. This is illustrated by the second waveform of Fig. 8. In order to prevent this behavior we introduce a set-reset latch logic similar to the one proposed in the modified delay unit scheme. With this additional latch an incoming pulse results in exactly one rising edge at the output $clk_{fsr}$, as shown in the third waveform of Fig. 8.
However, in this simple configurable delay unit (CDU) inverter chains are necessary to prevent the incoming pulse from fading out while propagating through the delay. For this reason this circuitry is infeasible for larger delays. As a solution we suggest the scheme given in Fig. 9. The basic idea realized by this circuitry is to convert the level-based events, i.e. pulses, coming from the TDU into transition-based events.
After the transition has propagated through the configurable delay, it is converted back into a level-based event, i.e. a pulse. The conversion from a level-based to a transition-based event is performed by the FF in front of the delay line, whereas the back-end conversion is achieved using a transition detector. The additional C-element is used to prevent multiple signal switches when changing from a short to a long delay path, similar to the set-reset latch in the alternative approach. The resulting output of this extended CDU is given in the bottommost waveform of Fig. 8.
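The multiple-clocking hazard of the configurable delay line can be reproduced with a small behavioural model. The sketch below is an abstraction, not the circuit of Fig. 7 or Fig. 9: it assumes that the input pulse is narrower than the differences between the delays, and the `one_shot` flag merely stands in for the effect of the set-reset latch or C-element (exactly one FSR clock edge per input pulse). The function name `fsr_clock_edges` is hypothetical.

```python
def fsr_clock_edges(pulse_time, delays, start_phase, one_shot):
    """Return the times at which ONE input pulse clocks the FSR.

    delays[k] is the delay selected while the FSR is in phase k.
    Every tap of the delay line sees the same pulse, shifted by its own delay.
    """
    phase = start_phase
    armed = True      # abstract one-shot guard, set by the incoming pulse
    edges = []
    # Visit the taps in the order in which the pulse arrives at them.
    for tap in sorted(range(len(delays)), key=lambda k: delays[k]):
        if tap != phase:
            continue  # the multiplexer currently selects another tap
        if one_shot and not armed:
            continue  # guard suppresses further edges of the same pulse
        edges.append(pulse_time + delays[tap])
        armed = False
        phase = (phase + 1) % len(delays)  # FSR advances, multiplexer reselects
    return edges


# Delays of 2, 4, 6 and 8 time units for the four protocol phases.
print(fsr_clock_edges(0, [2, 4, 6, 8], 0, one_shot=False))  # prints [2, 4, 6, 8]
print(fsr_clock_edges(0, [2, 4, 6, 8], 0, one_shot=True))   # prints [2]
```

Without the guard, a single pulse chases the multiplexer selection from the short to the longer taps and clocks the FSR repeatedly; switching from a long to a short delay does not show this effect in the model, because the pulse has already passed the shorter tap.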
D. Checking bundled data timing
Besides the previously mentioned architectural improvements, our new AHPC is able to check for bundled-data timing violations, i.e. whether the data signals of a channel switch in the wrong protocol phase or at the wrong time.
Such a violation can potentially appear in a receiver-initiated pull protocol, shown in Fig. 10 on the receiver side of the bus. First, the request is asserted by the receiver to initiate the data transfer. After an arbitrary delay the sender drives the data and sets the acknowledgment. When the acknowledgment reaches the receiver, the data is allowed to change for the bundled-data time $t_{bd}$. After this period the data has to be quiescent for at least the time $t_q$, otherwise a bundled-data timing violation occurs that may result in setup and hold time violations in the registers of the receiving interface. In order to integrate the bundled-data timing check, one has to know in which phase of the protocol the data changes and how long these changes take to settle at the input of the receiver, i.e. the bundled-data time $t_{bd}$. Eqn. 1 shows how to obtain $t_{bd}$.
Depending on $t_d$ and $t_a$, $t_{bd}$ can be positive or negative. In order to avoid using any delay elements in the read interface, a designer may choose to set up the CAD tool constraints or modify the remote interface such that the inequality of Eqn. 2 holds.
In this case the phase $(r, a) = (1, 1)$ can be used to select the quiescent interval. If the inequality given in Eqn. 2 does not hold, additional delay elements will be required to correctly enable the check process.
We propose the following two possibilities to test for such violations: The first way is to integrate one transition detector for each data bit. The outputs of these detectors are combined via OR gates (similar to the TDU implementation shown in Fig. 3). The second possibility is to combine all data bits using an XOR tree as shown in Fig. 11. A per-bit transition detector has the advantage of detecting all timing violations, but comes with a huge overhead, e.g., one may need $n$ XOR2 gates, $n$ buffer cells and $2^{n-1}-1$ OR2 gates to obtain the final transition indication signal of $n$ data signals. Additionally, the buffer cells must have a sufficiently large delay to prevent the generated pulse from fading out while propagating through the OR tree. In contrast, an XOR tree will only indicate a transition when the parity of the data signals changes. However, unless a special constant-parity encoding is used, there will be reasonably frequent detectable data transitions (whose probability is application specific), facilitating the checking of timing violations and hazards. Due to the drastically reduced overhead ($2^{n-1}$ XOR2 gates and one buffer cell) we prefer this method for our bundled-data timing check scheme.
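The difference in coverage between the two detector options can be made concrete with a short sketch. The helper names below are hypothetical, and the uniform-random data model used in `detectable_fraction` is an assumption for illustration only, since the actual probability is application specific.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import xor


def per_bit_detects(old, new):
    """OR of per-bit transition detectors: any changed bit is seen."""
    return any(a != b for a, b in zip(old, new))


def xor_tree_detects(old, new):
    """Single detector behind an XOR tree: only a parity change is seen."""
    return reduce(xor, old) != reduce(xor, new)


def detectable_fraction(n):
    """Fraction of all changes between n-bit words that alter the parity."""
    words = list(product((0, 1), repeat=n))
    changes = [(a, b) for a in words for b in words if a != b]
    seen = sum(xor_tree_detects(a, b) for a, b in changes)
    return Fraction(seen, len(changes))


print(per_bit_detects((0, 0, 0, 0), (1, 1, 0, 0)))   # prints True
print(xor_tree_detects((0, 0, 0, 0), (1, 1, 0, 0)))  # prints False
print(detectable_fraction(4))                        # prints 8/15
```

A change that flips an even number of bits is invisible to the XOR tree, yet under the uniform assumption slightly more than half of all word changes remain detectable.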
As shown in Fig. 11, the root of the XOR tree is connected to the AND2 gate through a transition detector. The second input of this gate is an enable signal derived from the state of the FSR. This enable signal prevents the checker from indicating an error while the data is allowed to change or when the designer does not care whether it changes. In the considered pull-protocol example the state of the FSR in which the data needs to be quiescent is $(req_{fsr}, ack_{fsr}) = (0, 1)$, since the FSR predicts the handshake signal values of the next rather than of the current phase. The enable signal needs to be delayed by the time $t_{bd}$, since the data is allowed to change in this interval. This is achieved by the additional buffer cell $b$ between the gates $g_1$ and $g_2$. The resulting extended TDU replaces the previous implementation, making the checker capable of detecting bundled-data timing violations.
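The timing rule of the pull protocol can be sketched as a simple window check. This is a timing-level illustration, assuming the quiescent interval starts $t_{bd}$ after the acknowledgment arrives and lasts for $t_q$; in the circuit the end of the window is determined by the FSR state rather than by a fixed $t_q$. The function name and the open window boundaries are illustrative choices.

```python
def bundled_data_violations(ack_time, data_transitions, t_bd, t_q):
    """Return the data transition times that fall into the quiescent interval.

    The data may still change until ack_time + t_bd (t_bd may be negative)
    and must then stay stable for t_q.
    """
    window_start = ack_time + t_bd
    window_end = window_start + t_q
    return [t for t in data_transitions if window_start < t < window_end]


# Acknowledgment at t = 10, t_bd = 2, t_q = 5: the data must be stable in (12, 17).
print(bundled_data_violations(10, [11, 13, 18], t_bd=2, t_q=5))  # prints [13]
# Negative t_bd: the data has to settle before the acknowledgment arrives.
print(bundled_data_violations(10, [6, 8], t_bd=-3, t_q=5))       # prints [8]
```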
The provided scheme of data transition checking is not restricted to handshake protocols. Another application of this test method is the checking of SDR and DDR buses. Similar to the handshake signals of a bundled-data handshake protocol, the bus clock represents the reference against which timing constraints need to be fulfilled in order to function correctly. For this application the checker can be simplified, resulting in the scheme illustrated in Fig. 12(a). The scheme shows the implementation for an SDR bus. One can recognize three delay elements: $d_1$ is used to define the time interval in which the data is required to be quiescent; $d_2$ and $d_3$ are used to delay the check enable signal and the pulse generated by data switching activity so that they appear in the interval to be checked. Therefore, $d_2$ and $d_3$ have to be carefully balanced against each other. The scheme can easily be adapted to DDR buses with symmetrical clock waveforms by replacing the NOR gate by an XOR gate as shown in Fig. 12(b).
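Why an XOR gate suits DDR buses can be seen from a sampled-waveform sketch of the clock edge detection. The wiring is an assumption, since Fig. 12 is not reproduced here: the single-edge detector is modelled as a NOR of the inverted clock and its delayed copy (a pulse after each rising edge), and the double-edge detector as an XOR of the clock and its delayed copy. The function names are hypothetical.

```python
def delayed(wave, k):
    """Shift a sampled 0/1 waveform by k samples (signal is low before t = 0)."""
    return [0] * k + wave[:len(wave) - k]


def edge_pulses(clock, k, both_edges):
    """Return the check-enable pulses derived from the clock edges."""
    prev = delayed(clock, k)
    if both_edges:
        # XOR: the clock differs from its delayed copy after every edge.
        return [c ^ p for c, p in zip(clock, prev)]
    # Assumed NOR wiring: NOR(inverted clock, delayed clock), rising edges only.
    return [int(not ((1 - c) or p)) for c, p in zip(clock, prev)]


clock = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
print(sum(edge_pulses(clock, 1, both_edges=False)))  # prints 2
print(sum(edge_pulses(clock, 1, both_edges=True)))   # prints 4
```

With the XOR variant every clock edge opens a check interval, which matches the two data transfers per clock period of a DDR bus.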
IV. Experimental Results
In order to estimate the area and power requirements, all of the checker implementations provided here and the one of Shang et al. provided in [13] were realized in IHP 0.13 μm technology using the Synopsys Design Compiler for synthesis. In contrast to the implementations in [14], we implemented the AHPC of Shang et al. with only two delay elements. Additionally, we have used enhanced buffer cells with larger delays instead of inverters. Both measures result in a drastic reduction of the area requirements compared with the results we presented in [14]. Since our previous AHPC design requires inverter chains due to the inertial delay behavior of the enhanced buffer cells, its area and power requirements are worse than those of the AHPC of Shang et al., as illustrated in Fig. 13. The diagram shows the overhead for the different AHPC implementations with one delay element that realizes $d_{min} \in \{1\,\text{ns}, 2\,\text{ns}, 4\,\text{ns}, 8\,\text{ns}, 16\,\text{ns}\}$. Except for the previous AHPC design with and without scan, each delay element was implemented using enhanced buffer cells. The resulting area requirements are shown in Fig. 13(a). It can be seen that, depending on the delay used and the type of checker, our new implementations require 44.5% to 70.4% of the area required by the checker of Shang et al. Additionally, one can see that the scan insertion has a negligible constant overhead (approx. 37.6 μm²). The area and power consumption results of the AHPC with configurable delays are given in Table I. The checkers are designed to have different delays of 2 ns, 4 ns, 6 ns and 8 ns for each protocol phase. It can be seen that our checker scheme with the extended CDU has the smallest area requirement while having a slightly higher power consumption than the checker implementation of Shang et al.
Due to the considerable number of inverter cells and, therefore, of switching cells in case of handshake activity, our checker with the simple CDU has the largest power consumption and area requirement. Table II shows the post-synthesis results of the extended TDU shown in Fig. 11, with and without the bundled-data timing check, for 4-, 8-, 16- and 32-bit data buses.
V. Conclusions
In this paper we have proposed several improvements of the checker design we provided in [14]. This includes a methodology to minimize the overhead of an AHPC in case one minimum delay for all protocol phases is sufficient. To this end, we introduced a special delay element that is capable of stretching a short pulse to the duration of a defined delay. We have shown how the collection of test results can be eased during off-line mode by introducing a scan architecture. The scan information can be used in diagnosis and functional testing. Furthermore, it solves the problem of off-line testing of the previous asynchronous checker circuits. Additionally, we introduced a configurable delay unit in order to define different delays for each protocol phase. Finally, we have provided a methodology to test timing dependencies between handshake signals and data transitions. An application of the data transition detectors to the checking of synchronous SDR/DDR protocols has been presented. Future work will include applications of the proposed techniques to large designs.
