Abstract-An automatic test pattern generation approach to detect delay defects in a circuit consisting of current mode threshold logic gates is introduced. Each generated pattern should excite the maximum propagation delay at the fault site. Manufactured weights may vary, and maximum delay is ensured by applying an appropriately generated set of patterns per fault. Experimental results show the efficiency of the proposed methods.
of the TN which is pipelined, does not have any feedback loops, and has unit combinational depth at each pipeline stage.
The main objective of this paper is to implement an automatic test pattern generation (ATPG) method where each generated pattern excites the maximum possible delay on the CMTL gate at the fault site and is more likely to excite an error at the output driven by the faulty gate. Given the pipelined structure of the TN, the transition fault (TF) model is ideal for detecting delay defects. This pattern-sensitive approach is likely to detect a TF, if one exists. The manufactured weights of a TLG may vary. It is shown that the maximum delay at the faulty TLG is excited even if the test pattern set is generated using the designed weights. To our knowledge, existing ATPG methods for TNs focus only on logic defects using the stuck-at fault model [12], [13].
The main contributions of this paper are summarized as follows. First, it is shown how to generate a test set per TF to excite the maximum possible delay at the gate and propagate the latched error to an observable point of the pipelined TN. An important component of the ATPG is to identify the group of patterns that excite the maximum rising or falling transition delay at the selected gate. Second, it is shown how to generate a set of test patterns for each TF to ensure the maximum delay in the presence of weight deviations. This is important because physical defects and process variation during the manufacturing of a TLG may result in manufactured weights that differ from the designed (ideal) weight values. Finally, a test set compaction scheme is introduced which focuses on generating a high quality test pattern that detects more than one TF in order to reduce the test data volume and the test application time.
This paper is structured as follows. Section II presents preliminaries. Section III presents the ATPG method considering the designed weights. Section IV presents a method to generate a test set per TF in order to cope with weight deviations. Section V presents the test set compaction technique. Section VI presents experimental results. Section VII concludes this paper.
II. PRELIMINARIES
A threshold logic (TL) gate is a weight-dependent majority gate. It has n input variables. Each input variable x i is associated with a weight value w i . The output of the TL gate is associated with a weight value referred to as the threshold w 0 . Let the weight summation Σ w i x i (taken over i = 1, ..., n) of a particular pattern or minterm (fully specified product term) p l be denoted as W l . The CMTL gate representation for the two input AND gate with weight values (w 1 , w 2 : w 0 ) = (2, 2 : 3) is shown in Fig. 1 [3]. The unit weight value corresponds to the minimum gate length of the process technology. The CMTL gate consists of the differential part and the sensor part. All the transistors in the differential part are connected in parallel. The differential part is further divided into the threshold part and the inputs part. The pMOS transistor in the threshold part is always active. The number of active pMOS transistors in the inputs part depends on the applied input pattern. The sensor part has three pMOS transistors P 1 , P 2 , and P 3 , and four nMOS transistors N 1 , N 2 , N 3 , and N 4 . The nodes M 1 and M 2 connect the differential part and the sensor part.
0278-0070 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The operation of the CMTL gate is divided into two phases, the equalization phase (CLK is high) and the evaluation phase (CLK is low). Due to the design architecture, both output nodes F and FB are at the same initial value (≈30% of VDD) at the beginning of the equalization phase. Then a small voltage difference is developed across the output nodes, which mainly depends on the state of the differential part under the applied pattern p l . In the evaluation phase, the sensor part boosts the initial voltage difference to a logic state at the output nodes. The node F rises from 30% VDD to VDD for logic one, and drops from 30% VDD to 0 V for logic zero. The node FB changes in the opposite direction with respect to the node F (see [3] for more details).
The propagation delay (d) of the CMTL gate for the applied pattern p l is modeled as in [3] by (2). Equation (2) also indicates that the pattern p l which minimizes |w 0 − W l | generates the maximum transition delay at the gate. The transition delay value of the CMTL gate decreases linearly as |w 0 − W l | increases. If w 0 is equal to the W l value of the applied pattern p l , then an unstable value is produced at the output. Hence, the weight assignment for the CMTL gate is chosen such that the W l value for any input pattern p l is not equal to its w 0 value. Therefore, the weight configuration of a CMTL gate always needs to satisfy W l ≠ w 0 for any pattern p l .
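As a small worked illustration of the quantity |w 0 − W l |, consider the two input AND gate configuration (w 1 , w 2 : w 0 ) = (2, 2 : 3) quoted above. Only the distances from the threshold are computed here; no delay values are implied, since those depend on the model of [3].

```latex
\begin{array}{c|c|c|c}
(x_1, x_2) & W_l = 2x_1 + 2x_2 & |w_0 - W_l| & \text{output} \\ \hline
(0,0) & 0 & 3 & 0 \\
(0,1) & 2 & 1 & 0 \\
(1,0) & 2 & 1 & 0 \\
(1,1) & 4 & 1 & 1
\end{array}
```

Under the stated trend, patterns 01, 10, and 11 sit closest to the threshold and would therefore excite the largest delay, while pattern 00 would excite the smallest. No pattern gives W l = w 0 , so the configuration is valid.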
The TF model is widely used for testing delay defects at a gate. It requires that a transition is generated at the output of each gate. There are two TFs associated with each gate: 1) slow-to-rise (STR) fault and 2) slow-to-fall (STF) fault. A TF should propagate to an observable point through any path [14] .
It is noted that CMTL gates are clocked. At the beginning of the clock cycle the output of CMTL gate is set to an initial value irrespective of the previous clock cycle value. Hence, the initialization vector of the TF model is not required for testing delay defects of the CMTL gate. For each STR fault at gate G, the test pattern is generated considering that the ATPG requires setting logic one at the selected gate and propagating its effect to an observable point. Likewise, a pattern for STF fault is generated.
From the above and (2), the minterms which evaluate the function to one (also called onset minterms), and, in addition, produce minimum weight summation are responsible for the maximum rising transition delay. The offset minterms (i.e., minterms that set the function to zero) which produce maximum weight summation are responsible for the maximum falling transition delay of the CMTL gate.
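The selection rule above can be sketched by brute force for small gates. The following is a minimal illustration, not the method of this paper (which avoids enumeration); the function name `delay_groups` and the string encoding of patterns are assumptions made for this example.

```python
from itertools import product

def delay_groups(weights, threshold, fault):
    """Group minterms by weight sum, highest excited delay first.

    fault == "STR": onset minterms are considered (rising transition).
    fault == "STF": offset minterms are considered (falling transition).
    Returns a list of (weight_sum, [patterns]) pairs.
    """
    groups = {}
    for bits in product((0, 1), repeat=len(weights)):
        total = sum(w * b for w, b in zip(weights, bits))
        is_onset = total > threshold  # W_l never equals w_0 in a valid gate
        if is_onset == (fault == "STR"):
            groups.setdefault(total, []).append("".join(map(str, bits)))
    # A smaller |w_0 - W_l| means a larger delay, so sort by that distance.
    return sorted(groups.items(), key=lambda item: abs(threshold - item[0]))

# Two input AND gate with (w1, w2 : w0) = (2, 2 : 3).
print(delay_groups([2, 2], 3, "STR"))
print(delay_groups([2, 2], 3, "STF"))
```

For an STR fault the first returned group holds the onset minterms with the minimum weight sum, and for an STF fault it holds the offset minterms with the maximum weight sum.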
The following provides an overview of a TN implemented with CMTL gates. CMTL gates are clocked, and circuits that are implemented using CMTL gates are pipelined TNs without feedback loops. Synchronization CMTL buffers are inserted at the pipeline stages so that each CMTL gate receives its inputs at the appropriate clock cycle. Methods as in [7] synthesize TNs so that the total number of TLGs is minimized. Fig. 2 shows a combinational circuit with six complex combinational components whose functionalities are listed explicitly. Each combinational component is a TLG, implemented by a CMTL gate. Fig. 3 shows a four stage pipelined TN of CMTL gates representing the circuit of Fig. 2. The weight assignment for the inputs and the threshold at each CMTL gate is also shown. At each pipeline stage, CMTL buffers (represented by rectangles) ensure that each TLG receives its inputs at the appropriate clock cycle in a synchronized manner.
In this paper, functions are represented using binary decision diagrams (BDDs). An n input Boolean function f represented in a BDD is a directed acyclic graph where the Shannon decomposition is carried out in each node. Each vertex has two outgoing edges. A pointer to each vertex represents a distinct function. One outgoing edge simplifies the function by setting the variable to true and the other outgoing edge by setting it to false. A BDD does not contain vertices whose outgoing edges point to isomorphic subgraphs or to the same node. BDDs are canonical forms suitable for representing functions very compactly [15] .
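The BDD properties listed above (Shannon decomposition at each node, no node with identical children, shared isomorphic subgraphs) can be sketched for a threshold function as follows. This is a minimal illustration with a fixed variable order; `build_robdd`, `evaluate`, and the node numbering are assumptions of this example and do not describe the BDD package used in the paper.

```python
from functools import lru_cache

def build_robdd(weights, threshold):
    """Build a reduced ordered BDD for f = [sum(w_i * x_i) >= threshold].

    Returns (root, unique) where ids 0 and 1 are the constant nodes and
    unique maps (variable index, low child, high child) to a node id.
    """
    unique = {}

    def mk(var, low, high):
        if low == high:  # both edges reach the same function: no node needed
            return low
        # Reuse an existing node for an identical (var, low, high) triple.
        return unique.setdefault((var, low, high), len(unique) + 2)

    @lru_cache(maxsize=None)
    def build(i, partial):
        # Shannon decomposition on variable i given the weight sum so far.
        if i == len(weights):
            return 1 if partial >= threshold else 0
        low = build(i + 1, partial)                # variable set to false
        high = build(i + 1, partial + weights[i])  # variable set to true
        return mk(i, low, high)

    return build(0, 0), unique

def evaluate(root, unique, bits):
    """Follow one path from the root to a constant node."""
    children = {node: key for key, node in unique.items()}
    node = root
    while node > 1:
        var, low, high = children[node]
        node = high if bits[var] else low
    return node

root, table = build_robdd([6, 4, 4, 2, 2], 11)
print(len(table))  # number of internal (non-constant) nodes
```

For the weights (6, 4, 4, 2, 2 : 11) used later for gate G 3 , this construction yields nine internal nodes, which agrees with the nine labeled nodes A to I described for Fig. 4.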
III. ATPG BASED ON THE TRANSITION FAULT MODEL
The proposed ATPG method is called delay defect on TL gates (DTL). This is a function-based test generation approach that uses BDDs and the conditions listed in Section II to excite maximum STR or STF delay at the fault site and propagate the transition to an observable point. Each TLG has been assigned weight values and is stored in a BDD.
Procedure GROUP returns all the minterms of a TLG, expressed in terms of the TLG inputs, that excite the maximum delay at that TLG [16]. The set of minterms returned by GROUP is stored as a BDD function. Another procedure of DTL, called MAP, rewrites the function returned by GROUP so that the set of minterms of the embedded TLG is expressed in terms of the primary inputs of the TN.
Algorithm 1 is a nonenumerative BDD traversal method that identifies all the fully specified input assignments (minterms) of the given TLG function which produce the minimum active weight summation for STR fault or the maximum active weight summation for STF fault [16] .
The operation of GROUP is recursive. For simplicity of exposition, GROUP is explained for an STR fault. Similar procedures apply for an STF fault. Let f denote the function at a BDD node. Let the field M f be the set of all minterms of f which produce the minimum active weight summation value W lf (m l ∈ M f ). For an STF fault, M f holds the set of minterms for which W lf denotes the maximum active weight summation value. Let M f t denote the simplified function of the outgoing edge with the variable set to true, and let M f e denote the simplified function of the outgoing edge with the variable set to false. The BDD functions at the two outgoing edges have the minimum active weight sums denoted by W lf t and W lf e , respectively. An input variable which is not present in a BDD path is a do not care variable of the path. Algorithm GROUP intends to find the fully specified product terms. Hence, in GROUP, initially M f = {x̄ 1 x̄ 2 ... x̄ n } (all variables complemented), with W lf = ∞ for an STR fault and W lf = −∞ for an STF fault, for each BDD node. The do not care variables on the path of each BDD node are considered and included in M f depending upon their polarity. In order to find the set of input patterns which produce the minimum active weight summation value for an STR fault, the W ... The conditions in lines 7 and 8 choose the W lf value based upon the fault, i.e., the minimum summation for an STR fault and the maximum summation for an STF fault [16].
GROUP is illustrated for the STR fault on gate G 3 at the circuit in Fig. 3 with the help of Fig. 4 [16] . In this figure, the pointer to the root node (labeled I) represents the functionality of G 3 in terms of its five input variables x 1 , x 2 , x 3 , x 6 , and x 7 . BDD node I corresponds to variable x 1 , nodes E and H correspond to variable x 2 , nodes C, D, and G correspond to variable x 3 , nodes B and F correspond to variable x 6 , and node A corresponds to variable x 7 . The weight value of the input variables (x 1 , x 2 , x 3 , x 6 , and x 7 ) is (6, 4, 4, 2, and 2), and the threshold value w 0 is 11. This example shows how to find the onset minterms that produce the minimum active weight summation which excites the maximum rising transition delay at gate G 3 . Fig. 4 lists for each function f at a BDD node the M f and W lf values.
The BDD is traversed in reverse topological order [16]. In GROUP, initially M f = {00000} and W lf = ∞ for all BDD nodes. The node traversal starts at the top pointer node and traverses the then path until it reaches the constant 0 (false) or the constant 1 (true) [16].
Then the else path pointer nodes out of the top pointer node are visited. All the pointer nodes are visited and updated in a similar manner. In our example, the top pointer node I results in M I = {11010, 11001, 10110, 10101, 01111} and W lI = 12 [16] . The set of patterns responsible for maximum rising transition delay of gate G 3 with respect to the local gate inputs obtained by procedure GROUP is shown on the right-hand side of Table I. In the case of an STF fault, the same process is applied except that the maximum active weight summation is considered (line 8). The offset minterms of the given TLG function with highest active weight summation result in the maximum propagation delay for a falling transition. Hence, the complement of the given gate functionality G 3 is used by GROUP.
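The recursive idea behind GROUP for an STR fault can be sketched without a BDD package by applying the Shannon decomposition in a fixed variable order and caching the result per subfunction. This is a simplified stand-in for Algorithm 1, not a transcription of it: the name `group_str` and the use of the remaining required weight as a cache key are assumptions of this example.

```python
from functools import lru_cache

def group_str(weights, threshold):
    """Return (minimum active weight sum, onset minterms achieving it)."""
    n = len(weights)
    inf = float("inf")

    @lru_cache(maxsize=None)
    def visit(i, need):
        # need: weight still required from variables i..n-1 to reach the onset.
        if i == n:
            return (0, ("",)) if need <= 0 else (inf, ())
        w_then, m_then = visit(i + 1, need - weights[i])  # variable true
        w_else, m_else = visit(i + 1, need)               # variable false
        w_then += weights[i]  # the then edge activates weight w_i
        best = min(w_then, w_else)
        if best == inf:
            return inf, ()    # no onset minterm below this node
        minterms = []
        if w_then == best:
            minterms += ["1" + m for m in m_then]
        if w_else == best:
            minterms += ["0" + m for m in m_else]
        return best, tuple(minterms)

    return visit(0, threshold)

# Gate G3: weights of (x1, x2, x3, x6, x7) = (6, 4, 4, 2, 2), threshold 11.
print(group_str([6, 4, 4, 2, 2], 11))
```

For gate G 3 this returns the weight sum 12 together with the five minterms 11010, 11001, 10110, 10101, and 01111, matching the M I and W lI values stated above. An STF variant would apply the same recursion to the complemented function and keep the maximum instead of the minimum.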
In order to represent the functionality of the embedded TLG in terms of primary inputs, the local input variables (embedded gate inputs) are replaced by primary input variables through function substitution. In particular, each gate input is replaced by its function. This procedure is referred to as MAP. In our example, the five minterms in terms of embedded gate inputs are mapped into seven patterns expressed in terms of primary inputs and shown on the left-hand side of Table I.
The input to algorithm DTL is a transition (STR or STF) fault F at gate G in a TN. The output of the DTL is a random test vector from a collection of patterns, called the vector set S. Set S consists of all patterns that excite the fault at gate G with maximum possible delay and propagate the transition to an observable point of the TN.
The overview of algorithm DTL is presented in Algorithm 2. First, procedure PROPAGATION (line 1) generates the set of patterns P which ensure that the latched error propagates to an observable point. Then procedure GROUP (line 5) generates the set of input patterns set_M i that produce the maximum possible transition delay at the gate G. Subsequently, procedure MAP (line 6) expresses these patterns in terms of primary inputs (D). Essentially, this procedure generates input patterns that bring each binary pattern in set_M i to the inputs of gate G. Then the intersection of D and P forms set S (line 7), out of which a test vector is selected.
Algorithm DTL is explained considering the STR fault on gate G 3 in the circuit in Fig. 3. All patterns that propagate the latched error are kept as a function P and are computed by procedure PROPAGATION. Procedure PROPAGATION propagates the error to an observable point in the TN. It is implemented using a Boolean difference operation between two functions, the fault free function and the faulty function, respectively. For the STR fault at gate G 3 in Fig. 3, the propagation vector set is obtained by propagating SA0 at G 3 . The obtained input patterns that excite and propagate the error are listed on the left-hand side of Table II. The objective of the vector set S is to sensitize the given transition (STR or STF) fault of the selected gate G with the maximum possible delay and propagate the transition to an observable point. For an STR fault, each iteration of procedure GROUP (line 5) produces all the minterms of the function f that excite the maximum delay. The resulting set_M i is expressed in terms of primary inputs by MAP (line 6). The final sensitization vector set S (intersection of D and P) is shown on the right-hand side of Table II. If D ∩ P is the empty set, then no pattern that excites the highest delay at the gate propagates to an observation point. In this case, algorithm DTL tries to excite the fault with patterns that excite the next highest possible delay at the gate. This is done by removing set_M i from function f (line 9), and then the reduced function is used by GROUP to find the group of patterns with the next highest delay. This is repeated until a nonempty S is generated or function f becomes null (line 4). That way, any test vector in S guarantees that a transition (STR or STF) fault at the gate G is sensitized with the maximum possible delay and that the potential error propagates to an observable point.
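The loop described above can be sketched with plain Python sets standing in for BDD functions. This is only an outline under stated assumptions: the delay groups, the mapping table that plays the role of MAP, and the propagation set P are all passed in as precomputed inputs, and every pattern string in the usage example is a hypothetical placeholder rather than data from Tables I and II.

```python
def dtl(groups, map_to_primary, propagation_set):
    """Return (group index, S) for the first group with a nonempty D & P.

    groups: list of sets of local gate patterns, highest delay first
            (set_M 1, set_M 2, ...), as GROUP would produce them in turn.
    map_to_primary: dict from a local pattern to the set of primary input
            patterns that justify it (stand-in for procedure MAP).
    propagation_set: set P of primary input patterns that propagate the
            latched error (stand-in for procedure PROPAGATION).
    """
    for index, set_m in enumerate(groups, start=1):
        d = set()
        for pattern in set_m:
            d |= map_to_primary.get(pattern, set())
        s = d & propagation_set
        if s:  # some maximum-delay pattern of this group also propagates
            return index, s
    return None, set()  # f became null: no pattern excites and propagates

# Hypothetical toy data (not taken from the paper).
groups = [{"11", "10"}, {"01"}]
mapping = {"11": {"0011"}, "10": {"0101"}, "01": {"1000", "1001"}}
P = {"1001", "1111"}
print(dtl(groups, mapping, P))
```

In this toy run the highest delay group has no pattern in P, so the sketch falls back to the second group, mirroring the removal of set_M i from f and the repeated call to GROUP.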
The systematic removal of the maximum delay patterns group set_M i (1 ≤ i ≤ n) maintains the unate property of the given TL function f . This is a necessary condition for a function to be a TL function. If set_M i (1 ≤ i ≤ n) is removed arbitrarily from the function f then f becomes binate [1] . Thus, algorithm GROUP always produces the group of patterns so that all patterns in the group exhibit the same maximum delay. Each pattern in the set set_M 1 excites the highest propagation delay at the fault site.
IV. ATPG TO COPE WITH WEIGHT DEVIATIONS
Let us assume that each weight w i (0 ≤ i ≤ n) during manufacturing has an absolute deviation of δ i .
This section presents algorithm EDTL, an enhancement of DTL, which generates a set of patterns for each fault such that one of them is guaranteed to excite the maximum delay under any weight deviations.
Let a be the set of onset minterms, and b be the set of offset minterms of the given TLG function. Let τ be the maximum allowable shift in the weight summation W l of any minterm p l due to the deviation in the manufactured weights which does not affect the TLG functionality. From [1] , we have that
where
Let us assume for simplicity that all weight deviations have the same value δ. Then the maximum value of δ for the given weight configuration of the TLG is [1]
$$\delta = \frac{\tau}{n + 1}. \tag{4}$$
It is shown below that although the manufactured weight values may deviate from the designed values, the patterns that excite the maximum delay are the patterns that belong to the set_M i that was generated by procedure GROUP when considering the designed weights. (This is the set of patterns expressed in terms of local gate inputs. See line 5 of Algorithm 2.) Patterns (minterms) belong to different groups when considering their designed weights. Consider the patterns of two separate groups set_M i and set_M j of a TLG so that i is less than j. Under weight deviations, the patterns of any such group may be further partitioned into subgroups.
Consider the gate G 1 shown in Fig. 3. For the given weight configuration (w 1 , w 2 , w 3 , w 4 : w 0 ) = (4, 4, 2, 2 : 5), we get τ = 1 and δ = 0.2. For the eleven onset minterms of the TLG function, we get four delay patterns groups. The set_M 1 ...
Theorem 1: For any n input gate G of an implemented TN, no pattern in set_M i will excite a lesser or equal delay at G than a pattern in set_M j , for any i < j.
Proof: Theorem 1 is shown considering only the onset minterms. Similar arguments hold for the offset minterms. For simplicity in explanation, assume that n is even.
From (4), we have that the maximum allowable shift in the weight summation for any minterm due to weight deviation is τ = (n + 1) * δ. Consider minterms p x and p y in any two groups set_M i and set_M j so that i is less than j.
Let g ij = |W x − W y | for any two minterms p x and p y in set_M i and set_M j . Due to the weight deviations during manufacturing of a TLG, the new value g′ ij will be in the range
$$g_{ij} - n\delta \le g'_{ij} \le g_{ij} + n\delta. \tag{5}$$
Case 1: Assume that there is no variable that is active in both minterms p x and p y . Let n x and n y be the number of active variables in p x and p y , respectively. In order for the theorem to be violated, the weight summation of minterms in set_M i must increase, and that of minterms in set_M j must decrease. Therefore, W x is increased by n x * δ and W y is decreased by n y * δ. However, n x + n y is always less than or equal to n. This results in g′ ij being no less than g ij − (n * δ).
Case 2: Assume that there are n xy variables that are active in both minterms p x and p y . In order for the theorem to be violated, the weight summation of minterms in set_M i must increase and that of minterms in set_M j must decrease. In the worst case either all the common active variables will have a positive deviation or all will have a negative deviation. In the following, we prove the case assuming that they all have a positive deviation. Similar arguments hold when they all have a negative deviation. We have that W x is increased by n x * δ and W y is decreased by (n y − n xy ) * δ. However, n x + n y − n xy is always less than or equal to n. This results in g′ ij being no less than g ij − (n * δ).
From (5), we have that g ij − g′ ij is less than τ. If the weight configuration of a TLG satisfies that τ is not greater than min{g ij }, then none of the minterms in the group set_M i will have less delay than any minterm in the group set_M j for any deviated weight configuration, when i is less than j. This proves Theorem 1.
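The statement of Theorem 1 can be checked numerically on the gate G 1 configuration (4, 4, 2, 2 : 5) with δ = 0.2. The sketch below is an illustration, not part of the proof: it enumerates only the corner cases in which every input weight deviates by exactly +δ or −δ, which suffices here because each weight sum is linear in the deviations. The helper names are assumptions of this example, and the threshold deviation is omitted because it shifts all groups equally.

```python
from itertools import product

def onset_groups(weights, threshold):
    """Onset minterms grouped by designed weight sum, set_M 1 first."""
    groups = {}
    for bits in product((0, 1), repeat=len(weights)):
        total = sum(w * b for w, b in zip(weights, bits))
        if total > threshold:
            groups.setdefault(total, []).append(bits)
    return [groups[key] for key in sorted(groups)]

def ordering_preserved(weights, threshold, delta):
    """True if every pattern of set_M i keeps a smaller weight sum than
    every pattern of set_M j (i < j) under all +/- delta corner deviations."""
    groups = onset_groups(weights, threshold)
    for signs in product((-1, 1), repeat=len(weights)):
        deviated = [w + s * delta for w, s in zip(weights, signs)]
        sums = [[sum(w * b for w, b in zip(deviated, bits)) for bits in group]
                for group in groups]
        # Checking adjacent groups is enough; the order is transitive.
        for i in range(len(sums) - 1):
            if max(sums[i]) >= min(sums[i + 1]):
                return False
    return True

print(ordering_preserved([4, 4, 2, 2], 5, 0.2))  # within the bound of (4)
print(ordering_preserved([4, 4, 2, 2], 5, 1.0))  # far beyond the bound
```

With δ = 0.2 the group order is preserved for every corner deviation, whereas a deviation of 1.0 (well above the bound) lets a pattern of set_M 1 overtake a pattern of set_M 2 . Within a single group the deviated sums generally differ, which is the subgroup effect discussed for set_M 1 .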
Not all the test vectors of S generated by DTL using designed weights may excite the maximum delay for some deviated weight configurations. However, at least one will do. In the above example, the highest delay group (set_M 1 ) of the designed weights is subdivided into two subgroups (set_M 1.1 and set_M 1.2 ) for one of the deviated weight configurations. Only the patterns in set_M 1.1 excite the highest delay for this deviated weight configuration. Hence, all patterns of S must be applied instead of only one. However, only the vectors in S that bring different input assignments to gate G must be applied. Using the above, DTL is enhanced into algorithm enhanced DTL (EDTL).
The overview of EDTL is presented in Algorithm 3. First, GROUP (line 2) generates the set of input patterns set_M i , where i is the minimum value in algorithm DTL that produces a pattern which excites the maximum possible transition delay at gate G. The patterns that excite and propagate the latched error to an observable point are generated by PROPAGATION (line 3). Then procedure MAP (line 5) constructs one or more input patterns D j for each pattern p j in set_M i at the inputs of the gate G. A pattern S j which brings the distinct maximum delay pattern p j to the gate G is formed by selecting one of the patterns in the intersection of D j and P (line 7). That way, a collection of patterns S j (j ≥ 1) is formed, where each S j brings a distinct p j of set_M i to the gate G, sensitizes the maximum possible delay, and propagates to an observable point in the TN. This collection of patterns S j (j ≥ 1) is denoted by S.
The vector set generation by EDTL is illustrated for the STR fault on gate G 3 in the circuit in Fig. 3. In this example, set_M i = set_M 1 . First, the sets set_M i and P are determined for the given fault. The patterns in P are listed in the right-hand side of Table II. The set of patterns in set_M 1 is shown in column 1 of Table III. There are five input patterns in set_M 1 which excite the maximum delay for a rising transition at gate G 3 . Patterns D j and D j ∩ P for each minterm p j in set_M 1 are shown in columns 2 and 3 of Table III. There are no primary input patterns that justify the patterns "10101" or "11001" of set_M 1 at the inputs of gate G 3 .
Each S j , 1 ≤ j ≤ 3 consists of one test vector from each D j ∩ P, 1 ≤ j ≤ 3 which brings distinct input assignments that excite the maximum delay at gate G 3 for STR fault under any weight deviations. Patterns S j are listed in the fourth column of Table III. Observe that the number of patterns generated by EDTL to detect TF under weight deviations is reduced to three.
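The per-minterm selection of EDTL can be sketched in the same set-based style. The five local minterms below are the ones stated for gate G 3 , but the primary input pattern names ("p1" to "p9"), the contents of the mapping, and the set P are hypothetical placeholders, since the entries of Tables II and III are not reproduced here. The function name `edtl` is also an assumption of this example.

```python
def edtl(set_m, map_to_primary, propagation_set):
    """Pick one propagating primary input pattern S_j per local minterm p_j.

    Returns a dict from each local minterm that can be both justified and
    propagated to the pattern selected for it.
    """
    selected = {}
    for pattern in sorted(set_m):
        candidates = map_to_primary.get(pattern, set()) & propagation_set
        if candidates:
            # Any element of D_j & P would do; min() keeps the pick repeatable.
            selected[pattern] = min(candidates)
    return selected

set_m1 = {"11010", "11001", "10110", "10101", "01111"}
# Hypothetical stand-in for MAP: two minterms cannot be justified at all.
mapping = {
    "11010": {"p1", "p2"},
    "10110": {"p3", "p4", "p5"},
    "01111": {"p6", "p7"},
}
P = {"p1", "p3", "p4", "p7", "p9"}  # hypothetical propagation set
print(edtl(set_m1, mapping, P))
```

With these placeholder inputs three patterns are selected, one per justifiable minterm, which is the same count as in the example above.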
V. TEST SET COMPACTION
This section presents a compact ATPG which we call compact EDTL (CEDTL). For each STR or STF fault at gate G, algorithm EDTL generates several test functions D j ∩P (step 6 of Algorithm 3) and then selects a test pattern S j (step 8 of Algorithm 3). A compact test set is obtained by manipulating the test functions of several gates.
Algorithm CEDTL is presented in Algorithm 4. Clusters of gates are formed by traversing the TN in reverse topological order. The size of each cluster is limited to a predetermined constant value c. The clustering phase helps the scalability of algorithm CEDTL. This procedure is called CLUSTERING.
Consider an STR fault for gate G at some cluster C, and an immediate predecessor gate G′ at C which is connected to G with an input that has a positive weight. Then CEDTL will generate a compact test set by considering the test functions for an STR fault at G′. If the weight of that input is negative, then CEDTL will compact by considering the test functions for an STF fault at G′. This is due to the unate timing property of TLGs [1]. That way, two sets of test functions are formed for each cluster C: 1) set C 1 consists of all functions that are compatible with the test functions for an STR fault at the output gate of cluster C and 2) set C 2 consists of all functions compatible with an STF fault at the output gate of the cluster. This procedure is called COMPATIBLE.
Then a greedy algorithm is applied to the functions in C i , 1 ≤ i ≤ 2. Any two test functions in C i are covered by a single function as long as their intersection is nonempty. The two functions must target faults at different gates in set C i since different test functions for the same gate contain disjoint minterms. For any nonempty function intersection, the two functions are substituted by their intersection, and they are not considered any further. This process is repeated for the test functions until only empty intersections are encountered among the test functions in each C i . This greedy algorithm is called COMPACT.
Procedure COMPACT is illustrated with the help of Tables IV and V. Table IV considers a cluster C containing gate G 3 of the TN of Fig. 3 and its two immediate predecessor gates G 2 and G 1 . We consider an STR fault at G 3 , i.e., procedure COMPACT operates on the set of functions C 1 . Since the input weights of G 3 are positive, the test functions for STR fault at G 2 and G 1 are considered by COMPACT (Line 5).
The second column of Table IV contains the three test functions for gate G 3 . According to the notation used in algorithm EDTL, they are labeled as S 1 , S 2 , and S 3 . The third column of Table IV lists the two test functions S 1 and S 2 for gate G 2 . Finally, the fourth column lists the two test functions generated by EDTL for gate G 1 . Clearly, EDTL will return seven patterns for this cluster. For simplicity in the notation, let the test function S i for the gate G j be denoted as S i (G j ).
In this example, algorithm COMPACT first considers S 1 (G 3 ), and tries to determine whether there is a nonempty intersection with one of the test functions S i , 1 ≤ i ≤ 2, of the predecessor gate G 2 . These functions are examined in increasing order. Therefore, it first examines whether S 1 (G 3 ) intersects with S 1 (G 2 ), and this turns out to be a nonempty test function. At this point, S 1 (G 3 ) and S 1 (G 2 ) are covered by the intersection of sets S 1 (G 3 ) and S 1 (G 2 ), and are not considered any further. The test set for the gates in the cluster is already reduced by one pattern.
Next, the test function resulting from the intersection of sets S 1 (G 3 ) and S 1 (G 2 ) is considered for possible intersections with the two test functions S 1 and S 2 of gate G 1 in column 4. They are considered in increasing order. The first intersection is empty but the second intersection turns out to be nonempty. Let T 1 be the set resulting from the intersection of S 1 (G 3 ), S 1 (G 2 ), and S 2 (G 1 ). Therefore, the sets S 1 (G 3 ), S 1 (G 2 ), and S 2 (G 1 ) are not considered any further. The test set for the cluster is reduced by another pattern. Now the algorithm backtracks to the test functions in the second column of Table IV, and considers S 2 (G 3 ). It does not examine whether it intersects with set T 1 since the latter function covers S 1 (G 3 ) and the result is known to be negative. Thus, it examines whether S 2 (G 3 ) intersects with S 2 (G 2 ). This results in a nonempty set T 2 which covers S 2 (G 3 ) and S 2 (G 2 ). The test set is reduced by another pattern at this point.
TABLE V: CEDTL FOR THE EXAMPLE IN TABLE IV
Next, the algorithm checks whether T 2 intersects with the only remaining test function S 1 (G 1 ). Set S 1 (G 1 ) is disjoint from T 2 , so their intersection is empty. Now the algorithm backtracks and considers the test function S 3 (G 3 ) in the second column. The algorithm checks whether it intersects with S 1 (G 1 ), which is the only remaining uncovered test function. However, these two test functions [S 3 (G 3 ) and S 1 (G 1 )] are disjoint, and the algorithm terminates.
In this example, CEDTL reduces the test set from seven patterns to four patterns. The right-hand side of Table V lists the four test functions, and the left-hand side lists the test patterns, one pattern per test function.
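The greedy walk described above can be sketched as follows. This is one possible reading of procedure COMPACT, written with Python sets in place of BDD test functions; the function name `compact` and the integer pattern identifiers in the usage example are hypothetical, chosen only so that the sequence of empty and nonempty intersections mirrors the narrative for gates G 3 , G 2 , and G 1 .

```python
def compact(gate_functions):
    """Greedily merge test functions of different gates in a cluster.

    gate_functions: one list of test functions (sets of patterns) per gate,
    output gate first. Functions of the same gate are assumed disjoint.
    Returns the list of resulting test functions.
    """
    covered = [[False] * len(funcs) for funcs in gate_functions]
    result = []
    for g, funcs in enumerate(gate_functions):
        for i, func in enumerate(funcs):
            if covered[g][i]:
                continue
            covered[g][i] = True
            current = set(func)
            # Try to absorb at most one uncovered function per later gate.
            for h in range(g + 1, len(gate_functions)):
                for k, other in enumerate(gate_functions[h]):
                    if covered[h][k]:
                        continue
                    common = current & other
                    if common:  # nonempty: both are replaced by the overlap
                        current = common
                        covered[h][k] = True
                        break
            result.append(current)
    return result

# Hypothetical pattern identifiers (not the entries of Table IV).
g3 = [{1, 2}, {3, 4}, {5}]   # S1, S2, S3 of G3
g2 = [{1, 2, 6}, {3, 7}]     # S1, S2 of G2
g1 = [{8, 9}, {2, 10}]       # S1, S2 of G1
print(compact([g3, g2, g1]))
```

With these placeholder sets the seven input functions collapse to four, following the same order of decisions as the example: S 1 (G 3 ) merges with S 1 (G 2 ) and S 2 (G 1 ), S 2 (G 3 ) merges with S 2 (G 2 ), and S 3 (G 3 ) and S 1 (G 1 ) remain on their own.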
VI. EXPERIMENTAL RESULTS
Experiments are presented which show that some patterns excite significantly more delay than others and that they must be selected when testing for delay defects. Then a software tool was employed to synthesize pipelined TNs for each ISCAS 85 and ITC 99 benchmark. Finally, the section demonstrates the efficiency of the three ATPG tools presented in this paper: the basic ATPG tool called DTL, the enhanced ATPG called EDTL that accommodates weight variation, and the compact ATPG method called CEDTL. All software tools were implemented in the C++ language. The experiments were conducted on a Sun-Blade 100 workstation with 1 GB of RAM on the ISCAS 85 and ITC 99 benchmark circuits [17], [18]. As described in Sections II-V, the proposed method uses BDDs for each embedded TLG in order to identify the maximum delay pattern set for the gate, and a separate BDD for the whole circuit in order to generate the test patterns. Therefore, we did not experiment with the multiplier circuit c6288 since traditional BDDs cannot easily handle such circuit functionalities. For such circuits, biconditional BDDs have been introduced in [19], and the proposed method can generate patterns by operating on this data structure. However, such implementation details are beyond the scope of this paper.
First, an experimental study was conducted to determine the impact of the different input patterns on the delays that they excite at CMTL gates. There exist 17 representative TL functions which represent all the TL functions of exactly four inputs [1]. All these functions were considered. The rising transition delay value for each onset minterm of each function was calculated using (2). Many minterms of a function exhibited the same delay. For each function, they were categorized into different groups set_M i , and all the minterms in the same group set_M i had the same delay. For each function, up to five groups were formed, i.e., set_M 1 to set_M 5 , with the patterns in set_M 1 exhibiting the highest delay. Fig. 5 shows the percentage reduction in the rising transition delay value for set_M i , i ≥ 2, in comparison with the rising transition delay value for set_M 1 . The reduction is listed for each possible four input TL function. Note that only 16 out of 17 functions are listed because, when analyzing the results of Fig. 5, we observed that for one of them only set_M 1 can be formed. The reduction in the delay value for set_M 2 is at most 43% when compared with the delay value of set_M 1 , and 34% on average. Similarly, the reduction in the delay value for set_M 3 is at most 54% when compared with the delay value of set_M 1 , and 46% on average. Additional experiments have indicated similar behavior for falling transition delay values. These results show that it is important to test each TF with patterns in set_M 1 . Patterns from set_M 2 should only be applied if none of the patterns in set_M 1 propagates to an output, and so on.
Subsequently, a synthesis tool was developed so that each benchmark is synthesized into a pipelined TN. The gates of the input combinational circuit were mapped into CMTL gates. The tool is a modification of [7], since [7] is based on the asummability property of [1] and does not assign weights. Thus, the tool of [7] was modified in order to assign weights to each TLG. The TL function identification method proposed in [20] was employed to check whether a particular Boolean gate cluster can be implemented as a TLG. The fan-in bound of a TLG was set to eight. Fig. 6 presents the total number of transition (both STR and STF) faults in the pipelined TN for each benchmark circuit considered for the proposed ATPG methods. The TFs of the synchronization buffers are not examined by the ATPG because the set of patterns which detects the TFs at a CMTL gate output will also detect the TFs of the synchronization CMTL buffer connected to it. Hence, we consider only the transition (both STR and STF) faults at the output stem of each CMTL gate. Fig. 6 also presents the total number of TFs in the original Boolean circuit for each benchmark. The reduction in the total number of TFs in the TN is 39% in the best case when compared with the original Boolean circuit, and 28% on average.
The remainder of the section focuses on the efficiency of the presented ATPG algorithms. Fig. 7 shows the fault coverage by DTL for each benchmark circuit. The fault coverage is at least 96.9%, the minimum being observed for circuit c3540. The fault coverage was 99% on average among all benchmarks. The best fault coverage was 100%. Fig. 7 also shows the fault coverage when considering only the patterns in set_M 1 . It was at least 95%, the minimum again being observed for circuit c3540. On average, it was 97% among all benchmarks. The best observed fault coverage was 99%. It took approximately 13 s to handle all the TFs in the benchmark c1908. It was also observed that the maximum execution time never exceeded 74 s for any of the benchmark circuits. Thus, DTL is a very scalable ATPG method. Fig. 9 shows the time performance of algorithm EDTL, which considers weight variations. EDTL took approximately 16 s to handle all the TFs in benchmark c1908. It was also observed that the maximum execution time never exceeded 92 s for any of the benchmark circuits. This is also a very scalable ATPG. Note that the fault coverage of EDTL is the same as that of DTL, since EDTL substitutes a test pattern of DTL by a test set in order to ensure that the fault is detected under any weight variation.
Finally, the experiments focused on the efficiency (test set reduction) and the time performance of algorithm CEDTL. Let |S| be the total number of patterns needed to detect all TFs by EDTL, and |C| be the total number of patterns needed to detect all TFs by CEDTL. Then the percentage reduction in test set size by CEDTL is calculated as ((|S| − |C|)/|S|) * 100. Fig. 10 shows the average percent reduction in the number of test vectors that were needed to detect all the TFs under any weight deviation in each benchmark circuit. The cluster size bound was set to four in all benchmark circuits. The reduction was at least 40%, the minimum being observed for circuit b11. On average among all benchmark circuits, the reduction was 52%. In the best case, it was 62%. Fig. 11 shows the time performance of CEDTL (excluding the time taken by EDTL). It took approximately 4 s to handle all the TFs in the benchmark c1908. It was also observed that the maximum execution time never exceeded 22 s for any of the benchmark circuits. This is also a very scalable ATPG.
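As a small illustration of the reduction formula, it can be applied to the single cluster example of Section V, where EDTL returns seven patterns and CEDTL returns four. This figure concerns that one cluster only and is not a benchmark result.

```latex
\frac{|S| - |C|}{|S|} \times 100 = \frac{7 - 4}{7} \times 100 \approx 42.9\%
```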
VII. CONCLUSION
In this paper, we presented ATPG tools for CMTL gate circuits which are designed using CMOS technology. They use the TF model, which can handle small delay defects due to the pipelined nature of the designs. Since different patterns excite different delays at the fault site, the ATPG tools focus on generating patterns that excite the maximum possible delay for each fault. Three ATPG tools have been presented. The basic ATPG tool is very scalable and ensures very high fault coverage. A second ATPG tool was developed to handle instances where the manufactured weights differ from the designed weights due to process variations. A compact ATPG has also been presented that reduces the test set size by approximately 52% on average across the benchmark circuits.
