Abstract: In this paper, a new approach for generating test vectors that detect faults in combinational circuits is introduced. The approach is based on automatically designing a circuit that implements the D-algorithm, an automatic test pattern generation (ATPG) algorithm, specialized for the combinational circuit under test. Our approach exploits fine-grain parallelism by performing the following in three clock cycles: direct backward/forward implications, conflict checking, selecting the next gate to propagate the fault or to justify a line, decisions on gate inputs, and loading the state of the circuit after backup. We show the feasibility of this approach in terms of hardware cost and speed, and compare it with software-based techniques.
I. INTRODUCTION
ATPG is the process of either finding input vectors that detect a fault in a digital circuit, by distinguishing the faulty and fault-free circuit behavior at the primary outputs (POs), or flagging the fault as redundant when no such vector exists. This process requires a large amount of CPU time, and in many cases ATPG tools abort many of the hard-to-detect faults. ATPG is known to be NP-complete even for combinational circuits [13].
Most existing deterministic ATPG techniques employ a branch-and-bound technique [1] to examine all input combinations. The D-algorithm [12] examines all input combinations by making decisions at internal circuit nodes as well as at primary inputs, and alternates between fault propagation and line justification until a faulty value appears at a primary output (the fault is detected) or the search space is exhausted. In the latter case, the fault is flagged as redundant. PODEM [7] examines all input combinations by making decisions only on primary inputs (PIs), which reduces the number of nodes in the search tree. To achieve this, all decisions on internal lines are traced back to the PIs. The FAN algorithm [9] improves on the basic PODEM algorithm as follows: the tracing of objectives stops at some internal lines (head lines) in addition to PIs, and multiple objectives are backtraced instead of the single objective backtraced in PODEM. FAN was further improved by finding mandatory assignments based on dominators and by finding nodes where values can be assigned independently of other nodes [10]. SOCRATES uses a unique sensitization technique based on dominators, together with implication learning, to speed up the justification process [11]. Recursive learning, which avoids the use of a decision tree, is proposed in [14]. Other improvements that speed up the ATPG process are found in [16].

Emulation systems are increasingly used in the design, verification, and rapid prototyping of digital systems [3]. To broaden the use of these systems, several methods have been proposed to emulate Computer-Aided Design (CAD) algorithms such as fault simulation [4], [5], ATPG [1], [17], satisfiability (SAT) [1], [6], [18], and fault diagnosis [8], [15]. In [4], a method is proposed to emulate serial fault simulation.
In [5], a method is proposed to emulate the critical path-tracing algorithm. In [1], a method is proposed to emulate the PODEM algorithm, with an application to SAT. In all of these works, a significant speed-up was obtained over software-based implementations.
In this paper, we present a new method to emulate the D-algorithm on reconfigurable hardware. The method achieves a significant speed-up over software-based ATPG techniques with similar or better results, where the quality of the results is measured in terms of fault coverage. This is achieved by using reconfigurable hardware to exploit the fine-grain parallelism in the D-algorithm. This paper is organized as follows. In Section II, the concurrent D-algorithm is presented. In Section III, the overall architecture of the implementation is given. In Section IV, we present results. Finally, we present our conclusions and future work in Section V.
II. CONCURRENT D-ALGORITHM
The D-algorithm presented in [12] involves several steps. Concurrency can be exploited in almost all of these steps if the algorithm runs on reconfigurable hardware. The major steps, and how concurrency can be achieved in each of them, are listed as follows.
1) Computation of forward implications: All forward implications can be computed concurrently on reconfigurable hardware.
2) Computation of backward implications: Backward direct implications can be computed concurrently using a backward model of the circuit.
3) Conflict checking: The lines in the forward and backward models can be augmented with conflict check modules that enable concurrent conflict checking.
4) Maintenance of D- and J-frontier sets: The gates in the forward model of the circuit can be augmented with frontier status indicator logic, so that the frontier status signals of all gates are computed concurrently.
5) Computing decisions: After a frontier is selected from one of the sets, the decisions on its fan-ins can be computed and assigned concurrently.
6) Restoring the circuit state after backtrack: After a backup, it is necessary to restore the line values that existed before the decision that caused the backup. These values are computed from the decisions made on some gates. The decisions can be stored in memory elements, and after a backup they can be used to restore the line values concurrently.
Besides these major steps, there are many minor steps where concurrency can be exploited, such as checking whether the error has propagated to any primary output or whether all primary outputs are binary. Considering all of these cases, a concurrent version of the D-algorithm can be designed.
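Steps 1) and 3) can be illustrated in software. The following is a minimal Python sketch, not the authors' implementation, of forward implication in three-valued logic: every gate is re-evaluated from the same snapshot of line values, mimicking combinational logic that settles in parallel. The netlist format (a dict from output line to gate type and input lines), the gate set, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

X = "X"  # unknown value in three-valued logic
Value = Union[int, str]


def and3(vs: List[Value]) -> Value:
    """Three-valued AND: any 0 gives 0, all 1s give 1, otherwise X."""
    if 0 in vs:
        return 0
    if all(v == 1 for v in vs):
        return 1
    return X


def or3(vs: List[Value]) -> Value:
    """Three-valued OR: any 1 gives 1, all 0s give 0, otherwise X."""
    if 1 in vs:
        return 1
    if all(v == 0 for v in vs):
        return 0
    return X


def not3(v: Value) -> Value:
    return X if v == X else 1 - v


GATE_FUNCS: Dict[str, Callable[[List[Value]], Value]] = {
    "AND": and3,
    "OR": or3,
    "NAND": lambda vs: not3(and3(vs)),
    "NOR": lambda vs: not3(or3(vs)),
    "NOT": lambda vs: not3(vs[0]),
    "BUF": lambda vs: vs[0],
}

# Hypothetical netlist format: output line -> (gate type, input lines).
Circuit = Dict[str, Tuple[str, List[str]]]


def settle(circuit: Circuit, values: Dict[str, Value], max_steps: int = 1000) -> Dict[str, Value]:
    """Re-evaluate all gates from the same snapshot until no line changes.

    Each step models one gate delay of parallel evaluation, not a clock cycle.
    """
    for _ in range(max_steps):
        new = dict(values)
        for out, (gtype, ins) in circuit.items():
            new[out] = GATE_FUNCS[gtype]([values[i] for i in ins])
        if new == values:
            return new
        values = new
    raise RuntimeError("network did not settle (combinational loop?)")
```

For example, with `c = AND(a, b)` and `d = OR(c, e)`, setting `a = b = 1` and leaving `e` unknown settles to `c = 1` and `d = 1`.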
The concurrent D-algorithm is shown in Fig. 1. The algorithm either generates a test for a fault or declares the fault redundant. It associates with each gate a list of fan-in states and four signals indicating, respectively, that the gate is on the D-frontier, that it is on the J-frontier, that it has a direct implication, and the logic value of that direct implication. For buffers and inverters, only the last two signals are generated. In addition, a state is associated with each line l in the circuit.
The algorithm starts by activating a fault and initializing the line states from the gate fan-in states (initially, all fan-in states are undefined; they may change during the execution of the algorithm). The algorithm then performs forward implications, during which it sets the frontier and direct implication indicators of each gate. The state of a line l that is a fan-in of a gate with a direct implication is updated with that gate's direct implication value. The conflict signal is set if a directly implied value differs from the binary value stored in the line state; it is also set during forward implication when the implied value differs from the stored binary value. The forward implication and backward direct implication processes are repeated until a conflict occurs or no more direct implications exist. In the case of a conflict, control is transferred to a backtrack procedure, which reverses the previous decision. Otherwise, the procedure tries to justify the gates in the J-frontier if the error is at a primary output (PO), or propagates the error through a gate in the D-frontier if it is not. If all gates in the J-frontier are justified, a test is found; if justification fails, control is transferred to backtrack. During justification, the selected gate is pushed onto the stack, and the states of its fan-ins are either initialized, if the gate is pushed for the first time, or set to the next appropriate fan-in state, as indicated by the increment flag. These steps are also performed in one clock cycle. Similar steps are performed during the propagation process, which is activated when the error has not reached any PO. Next, the algorithm loads the changes on the fan-ins of the selected gate that are due to the decision into the line states, and resets the increment and BackUpSet flags. These steps are repeated until a test is found or the fault is found to be redundant.
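The repeated forward/backward implication step with conflict detection can be sketched as follows. This is a minimal Python model, assuming AND/NAND/OR/NOR gates only and textbook direct-implication rules (a non-controlled output forces every input to the non-controlling value; a controlled output with a single unknown input and all other inputs non-controlling forces that input to the controlling value). The paper's own DIMP equations (3)-(8) are not reproduced, all names are hypothetical, and for simplicity forward implied values are merged into the stored line states. Each pass reads one snapshot, loosely mirroring a concurrent hardware evaluation.

```python
from typing import Dict, List, Tuple, Union

X = "X"
Value = Union[int, str]
# (controlling value, output inversion) per gate type
CTRL = {"AND": (0, 0), "NAND": (0, 1), "OR": (1, 0), "NOR": (1, 1)}
Circuit = Dict[str, Tuple[str, List[str]]]  # output line -> (gate type, input lines)


def forward_value(gtype: str, ins: List[Value]) -> Value:
    c, inv = CTRL[gtype]
    if c in ins:                       # any controlling input fixes the output
        return c ^ inv
    if all(v == 1 - c for v in ins):   # all inputs non-controlling
        return (1 - c) ^ inv
    return X


def backward_implications(gtype: str, out: Value, ins: List[Value]) -> Dict[int, int]:
    """Direct (one-level) backward implications of one gate: {input position: value}."""
    if out == X:
        return {}
    c, inv = CTRL[gtype]
    nc = 1 - c
    if out == nc ^ inv:   # non-controlled output: every input must be non-controlling
        return {k: nc for k, v in enumerate(ins) if v == X}
    unknown = [k for k, v in enumerate(ins) if v == X]
    if len(unknown) == 1 and all(v == nc for v in ins if v != X):
        return {unknown[0]: c}   # the only free input must supply the controlling value
    return {}


def implicate(circuit: Circuit, values: Dict[str, Value]) -> Tuple[Dict[str, Value], bool]:
    """Alternate forward and backward passes until a conflict or a fixpoint.

    Returns (line values, conflict flag).
    """
    values = dict(values)
    while True:
        changed = False
        snap = dict(values)              # forward pass: all gates read the same snapshot
        for out, (gtype, ins) in circuit.items():
            fv = forward_value(gtype, [snap[i] for i in ins])
            if fv == X:
                continue
            if values[out] == X:
                values[out], changed = fv, True
            elif values[out] != fv:
                return values, True      # implied value differs from stored value
        snap = dict(values)              # backward pass, again from one snapshot
        for out, (gtype, ins) in circuit.items():
            for k, v in backward_implications(gtype, snap[out], [snap[i] for i in ins]).items():
                line = ins[k]
                if values[line] == X:
                    values[line], changed = v, True
                elif values[line] != v:
                    return values, True  # two gates imply different values on one line
        if not changed:
            return values, False
```

With `c = AND(a, b)`, `d = OR(c, e)`, `e = 0`, and `d = 1`, the loop implies `c = 1` and then `a = b = 1` without a conflict, whereas `a = 0` with `c = 1` on the AND gate is reported as a conflict in the first forward pass.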
Note that when a gate is selected from the D-frontier or J-frontier, the decisions on the fan-ins of the gate last pushed onto the stack are examined first; if more decisions exist for that gate, it is selected again. Otherwise, a new gate is selected. In the implementation, the same selection logic serves both the D-frontier and J-frontier sets, and the ErrorAtPO signal determines which set it operates on.
The way the decision tree is used in the concurrent version of the D-algorithm is explained in Fig. 2. Gates that have already been tried are marked with dotted circles, while untried ones are marked with solid circles. Upon activation of a fault, the D-frontier set includes gates 3 and 7. First, gate 3 is selected to propagate the fault and is pushed onto the stack. After suitable decisions are made on the inputs of gate 3, the updated D-frontier set includes gates 2, 7, and 8. The search continues in a depth-first manner; therefore, gate 2 is selected to make decisions on its inputs to propagate the fault, and at the same time gate 2 is pushed onto the stack. The circuit state is determined by the decisions made on gates 3 and 2. Assume now that the fault has propagated to at least one primary output and that gates 6, 10, and 13 are the J-frontiers. Gate 6, which has only one possible decision, is selected for justification and is pushed onto the stack at the same time. If the justification of gate 6 fails, a backup to gate 2 occurs, and gate 6 is stored as the last processed frontier. Recomputing the circuit state from the decisions made on gates 3 and 2 regenerates gates 6, 10, and 13 as the J-frontiers. Since the frontiers are processed in priority order and gate 6 is the last processed frontier, the first frontier after gate 6, which is gate 10, is selected as the current frontier to make a decision. If there were another possible decision on gate 6, the last processed frontier would be set to 6-1, viz. 5, during backup, which guarantees that gate 6 is reselected to make the next decision. During backups, the BackUpSet flag is always set. This flag is necessary to determine whether all J-frontiers at some level (e.g., gates 6, 10, and 13 at level 3) have been justified successfully. For example, if gate 13 fails, a backup to gate 2 at level 2 occurs, and the circuit state is recomputed from the decisions made on gates 2 and 3.
This time, however, there is no J-frontier, since the next frontier after gate 13 is null. If the BackUpSet flag is set, a backup to gate 3 occurs; otherwise, the flag indicates that all J-frontiers are justified. In Fig. 2, 8.1 and 8.2 indicate the first and next decisions on gate 8, respectively. The search continues in the same way until a solution is found or the entire search space is exhausted.
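The frontier re-selection rule used after a backup in the Fig. 2 walk-through can be captured in a few lines. The Python sketch below assumes, as in the example, that a lower gate index means higher priority and that every frontier whose index is less than or equal to the last processed frontier is blocked; the function names and the returned action labels are hypothetical.

```python
from typing import List, Optional, Tuple


def select_frontier(frontiers: List[int], last_processed: Optional[int]) -> Optional[int]:
    """Highest-priority (lowest-index) frontier not blocked by the last processed one."""
    candidates = [g for g in frontiers if last_processed is None or g > last_processed]
    return min(candidates) if candidates else None


def after_backup(frontiers: List[int], last_processed: Optional[int],
                 backup_set: bool) -> Tuple[str, Optional[int]]:
    """Decide what happens once the circuit state has been recomputed after a backup."""
    gate = select_frontier(frontiers, last_processed)
    if gate is not None:
        return ("select", gate)
    # No frontier left at this level: BackUpSet tells failure apart from success.
    return ("backup", None) if backup_set else ("all_justified", None)
```

With the J-frontiers {6, 10, 13} of the example, a last processed frontier of 6 yields gate 10, setting it to 5 reselects gate 6, and a last processed frontier of 13 leaves nothing to select, so the outcome depends on BackUpSet.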
The circuit depicted in Fig. 3 is used to illustrate the execution of the concurrent D-algorithm for the stuck-at-1 fault at line 8. It is also assumed that the gates in the circuit are prioritized in increasing order of their indexes (i.e., a gate with a lower index has priority over a gate with a higher index). The stored line values, the composite line values on the circuit, the selected frontiers (i.e., gates), and the decisions on the fan-ins of the selected frontiers are shown over time in Tables I-IV, respectively.
In Table I, each cell corresponds to the stored value of a line at a given time step. It is assumed that the value of the faulty line, viz. line 8, is set to 1 at the start (to inject the stuck-at-0 fault on line 8). As seen in Table II, this setting causes backward direct implications on the fan-ins of gate 2 (i.e., the direct implication signal of gate 2 is set) and forward implications at lines 10, 11, and 12 in the same clock cycle. Next, these […] (see Table III). In the same clock cycle, the decisions are also computed. As a result of the decision, line 9 is set to 0. In the next clock cycle, the decisions on the fan-ins of gate 5 are stored in its Fanin state (see line 9 in Table IV). Next, the Fanin state value is copied to the corresponding line states (see the line 9 value in Table I). As a result of storing 0 in line 9 in Table I, the fault effect is propagated to line 14. Next, the line states are updated with the implications caused by the decision on line 9. Since there is no direct backward implication at this point, gate 6 is selected to propagate the fault (see the frontier entry in Table III), and its fan-in 13 is set to 1 in Table IV. Next, the decision on line 13, a 1, is copied to the corresponding line state in Table I. A 1 value at line 13 causes propagation of the fault to the primary output and a backward direct implication on line 7, which is immediately observed in Table II. Next, the line states are updated with the new implications (i.e., 1 is stored in the state of line 7). Setting line 7 to 1 causes direct implications on lines 1 and 2 in the same clock cycle, as shown in Table II. Next, the new implications are stored in the line states of lines 1 and 2 in Table I. Since there are no more backward direct implications, a J-frontier, namely gate 3, is selected for justification (see Table III). In the next clock cycle, the first possible decisions on its fan-ins are stored in their corresponding Fanin states in Table IV (i.e., line 5 is set to 0, and line 6 remains unknown). Next, the values in the Fanin state of gate 3 are copied to the corresponding line states in Table I; as a result, the state of line 5 is updated with 0. Since there is no backward direct implication, no J-frontier remains to be justified, and the error is at the output, the algorithm finishes its search for a vector for the stuck-at-1 fault at line 8. The test vector is the set of values stored in the primary inputs' line states.
III. AN ARCHITECTURE AND IMPLEMENTATION OF CONCURRENT D-ALGORITHM
The overall data-path architecture is shown in Fig. 4. It consists of the Fault Activator, Line State, Forward Network, Backward Network, Signals Computation, Frontier Selection, Stack, Decision Block, and Decision Storage blocks (all equations use the convention described in Table V). The circuit starts by activating a fault, which is performed in the fault activator. The forward network in the data path computes the effect of changes in the line states on the faulty (FFN) and fault-free (GFN) forward networks. The fault-free forward network is the fault-free gate-level structural model of the circuit under test; similarly, the faulty forward network is the faulty gate-level structural model of the circuit under test. The forward network sets the conflict signal when implied values conflict with previously decided values. The signal block computes for each gate its D-frontier (DF), J-frontier (JF), direct implication (DIMP), and direct implication value signals according to (3)-(8) (explained later). The backward network propagates directly implied values one level backward in the faulty (FBN) and fault-free (GBN) backward networks, and these values are stored in the corresponding line states. These steps are repeated until either FCONFLICT or BCONFLICT is set or no DIMP signal is set. The Frontier Selector block consists of a priority encoder that selects either a DF gate, if the error is not at a PO, or a JF gate otherwise. The identifier of the previously processed gate is stored in LastProcessedGate. It is passed to the stack if more decisions can be made on this gate; otherwise, it is used to exclude that gate and all higher-priority gates from the selection process, because the search space associated with these gates has already been explored. The selected gate is pushed onto the STACK, the associated decisions on its fan-ins are computed in the Decision Maker block, and the decisions are stored in the Decision Storage block.
TABLE V. NOMENCLATURE
In the case of a conflict, when the error is not at a PO and no DF signal is set, or when all justifications associated with the set JF gates fail, the gate on the top of the stack is popped and stored in LastProcessedGate. All these steps are repeated until a test is found or no more decisions can be made, which signals a redundant fault.
1) Fault Activator:
The fault activator (FACT) is similar to the circuit in [4] and consists of a shift register in which each flip-flop corresponds to a stuck-at fault on a line of the circuit under test. Fig. 5 shows an illustrative fault activator for a fictitious fault set, where each element represents a stuck-at fault on a line. A logic 1 in a flip-flop of the fault activator activates the corresponding fault. Initially, HEAD is 1, which indicates that none of the faults is active. After all faults have been activated one by one (i.e., by shifting the register to the right one position at a time), HEAD is set back to 1, which indicates that all faults have been activated. The size of the fault activator is thus proportional to the number of faults.
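A behavioral Python sketch of the fault activator follows. It assumes the register is a ring in which the last flip-flop feeds back into HEAD (the text only states that HEAD returns to 1 after all faults have been activated); the class name and fault labels such as "l8/0" are hypothetical.

```python
from typing import List, Optional


class FaultActivator:
    """One-hot ring: position 0 is HEAD, position k + 1 activates fault k.

    Assumption: the rightmost flip-flop feeds back into HEAD.
    """

    def __init__(self, faults: List[str]):
        self.faults = faults
        self.bits = [1] + [0] * len(faults)   # only HEAD set: no fault active

    def shift(self) -> None:
        # shift right by one position, wrapping the last bit into HEAD
        self.bits = [self.bits[-1]] + self.bits[:-1]

    @property
    def head(self) -> int:
        return self.bits[0]

    def active_fault(self) -> Optional[str]:
        for k, b in enumerate(self.bits[1:]):
            if b:
                return self.faults[k]
        return None
```

Shifting a three-fault activator three times activates each fault in turn; the fourth shift returns the single 1 to HEAD, signalling that every fault has been processed.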
After a fault is activated, the fault activator sets the state of the faulty line in the Line State block to the value opposite to the stuck-at value. For example, upon activation of a stuck-at-0 fault on a line, the state of that line is set to 1. In the next subsection, the Line States are explained in detail.
2) Line States: The array of line states stores the computed or decided good values of the lines in a circuit. The faulty line Table Coding, ], an -line circuit requires flip-flops. Because only one branch of a stem may be faulty, all branches are counted as separate lines so that decisions can be made on each branch individually. Thus, the line count includes the branches of the circuit as well.
These line states may be updated from the backward network, the decision block, or the fault activator, depending on the line's structural position in the circuit and on the stuck-at faults defined on it. Therefore, different logic configurations are used to update a line in the different cases, which are explained below.
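The configurations below all choose among at most three sources with the same precedence: an active fault from the fault activator, then the load decision signal, then the backward network. The following Python function is a hypothetical behavioral model of one line-state register under that reading; a configuration that lacks a source simply keeps the corresponding input inactive.

```python
from typing import Optional, Union

X = "X"
Value = Union[int, str]


def next_line_state(backward_value: Value,
                    sa0_active: bool = False,
                    sa1_active: bool = False,
                    load_decision: bool = False,
                    decision_value: Optional[Value] = None) -> Value:
    """Hypothetical next-state function of one line-state register.

    Precedence: active fault > load decision > backward network.
    """
    if sa0_active and sa1_active:
        raise ValueError("at most one fault is active at a time")
    if sa0_active:
        return 1   # activating stuck-at-0 drives the good value to 1
    if sa1_active:
        return 0   # activating stuck-at-1 drives the good value to 0
    if load_decision:
        if decision_value is None:
            raise ValueError("load_decision requires a decision value")
        return decision_value
    return backward_value
```

For a fault-free input of a multi-input gate (the first case), both fault inputs stay at their defaults, so only the load decision signal and the backward network matter.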
1) If the line is fault-free and is an input of a gate with more than one fan-in, the line state can be updated only from the backward network or from the decision block. Such lines are the fault-free inputs of multi-input gates. When the load decision signal is set, the line state receives the decision made on that line in the decision block; otherwise, it is updated from the backward network.
The decision block of a gate stores all the decisions made on the fan-ins of that gate.
2) If the line has only a stuck-at-0 fault and is not an input of a gate with more than one fan-in, the line state can be updated only from the backward network or from the fault activator. Such lines are the stems, primary outputs, and inputs of buffers and inverters with stuck-at-0 faults. When the fault activator activates the stuck-at-0 fault on the line, the line state is set to 1; otherwise, it is updated from the corresponding line in the backward network.
3) If the line has only a stuck-at-1 fault and is not an input of a gate with more than one fan-in, the line state can be updated only from the backward network or from the fault activator. Such lines are the stems, primary outputs, and inputs of buffers and inverters with stuck-at-1 faults. When the fault activator activates the stuck-at-1 fault on the line, the line state is set to 0; otherwise, it is updated from the backward network.
4) If the line has both stuck-at-1 and stuck-at-0 faults and is not an input of a gate with more than one fan-in, the line state can be updated only from the backward network or from the fault activator. Such lines are the stems, primary outputs, and inputs of buffers and inverters with both stuck-at-0 and stuck-at-1 faults. When the fault activator activates the stuck-at-0 (stuck-at-1) fault on the line, the line state is set to 1 (0). If neither fault is active, the line state is updated from the backward network.
5) If the line has a stuck-at-0 fault and is an input of a gate with more than one fan-in, the line state can be updated from the backward network, from the decision block, or from the fault activator. Such lines are the inputs of multi-input gates that have a stuck-at-0 fault. When the stuck-at-0 fault is activated, the line state is set to 1 regardless of the load decision signal. When the fault is not active and the load decision signal is set, the line state receives the decision made on that line in the decision block. Otherwise, it is updated from the backward network.
6) If the line has a stuck-at-1 fault and is an input of a gate with more than one fan-in, the line state can be updated from the backward network, from the decision block, or from the fault activator. Such lines are the inputs of multi-input gates that have a stuck-at-1 fault. Upon activation of the stuck-at-1 fault, the line state is set to 0 regardless of the load decision signal. When the fault is not active and the load decision signal is set, the line state receives the decision made on that line in the decision block. Otherwise, it is updated from the backward network.
7) If the line has both stuck-at-1 and stuck-at-0 faults and is an input of a gate with more than one fan-in, the line state can be updated from the backward network, from the decision block, or from the fault activator. Such lines are the inputs of multi-input gates that have both stuck-at-1 and stuck-at-0 faults. When the stuck-at-1 (stuck-at-0) fault is activated by the fault activator, the line state is set to 0 (1) regardless of the load decision signal. When neither fault is active and the load decision signal is set, the line state receives the decision made on that line in the decision block. Otherwise, it is updated from the backward network. Fig. 7 shows the logic block for this case.
After a decision is made on a gate, the decisions on the inputs of this gate are copied into their corresponding line states. To do this, the load signal of the gate is asserted, which in turn sets the load decision signal of every line that is an input of the gate. The other case in which the line states are updated from the decision blocks is after a backup. In that case, all of the line states are updated with the corresponding decisions in the decision blocks by asserting the corresponding load signal.
3) Forward Network: The forward network in the data path computes the effect of changes in the line states on the faulty (FFN) and fault-free (GFN) forward networks. The fault-free forward network is a gate-level structural fault-free model of the circuit under test; similarly, the faulty forward network is a gate-level structural faulty model of the circuit under test. Thus, each forward network consists of the gates of the circuit under test interconnected by lines. Moreover, each forward network computes in three-valued logic (i.e., 0, 1, and X).
A relaxed-OR operator is defined in Table VI . This operator is used in this section as well as in the subsequent sections.
To emulate three-valued logic, each three-valued signal is encoded in 2 bits. With this encoding, a line in the circuit under test is represented by two lines, and a gate in the circuit under test is modeled by two gates in each forward network. Together, FFN and GFN emulate nine-valued logic. Lines in FFN and GFN are classified as good (i.e., no stuck-at fault on the line) or faulty (i.e., stuck-at fault(s) on the line) and are modeled accordingly, as shown in Fig. 8.
Fig. 8. Line models in FFN and GFN, including panels for a line with only a stuck-at-0 fault, a line with only a stuck-at-1 fault, and a line with both stuck-at-0 and stuck-at-1 faults.
• Final value and conflict computation for a line in GFN 
The line models presented here are interconnected with the gate models to construct the FFN and GFN networks. Fig. 9(a) shows a two-input AND gate, and Fig. 9(b) and (c) show its corresponding models in the good and faulty forward networks, respectively. Although the original gate in Fig. 9(a) computes in binary, the corresponding gates in Fig. 9(b) and (c) compute in three-valued logic. The outputs of these gates are the forward implied values of their output lines. Fig. 10 shows the actual interconnections for the same gate under the assumption that one of its lines is fault-free and the other two are faulty.
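The two-bit representation can be illustrated with a dual-rail code. The code assignment below, (1, 0) for 1, (0, 1) for 0, and (0, 0) for X, is an assumption for illustration, since the paper's own encoding table is not reproduced here. Under this code, a three-valued AND maps to one binary AND and one binary OR, which is consistent with modeling each gate of the circuit under test by two gates. All names are hypothetical.

```python
from typing import Dict, Tuple, Union

X = "X"
Value = Union[int, str]
Rail = Tuple[int, int]  # (one-rail, zero-rail)

# Assumed dual-rail code; (1, 1) is left unused.
ENC: Dict[Value, Rail] = {1: (1, 0), 0: (0, 1), X: (0, 0)}
DEC: Dict[Rail, Value] = {code: v for v, code in ENC.items()}


def and2(a: Rail, b: Rail) -> Rail:
    # output is 1 only if both inputs are 1; output is 0 if either input is 0
    return (a[0] & b[0], a[1] | b[1])


def or2(a: Rail, b: Rail) -> Rail:
    # output is 1 if either input is 1; output is 0 only if both inputs are 0
    return (a[0] | b[0], a[1] & b[1])


def not1(a: Rail) -> Rail:
    # inversion just swaps the rails, so it needs no logic at all
    return (a[1], a[0])
```

For example, `DEC[and2(ENC[0], ENC[X])]` is 0 and `DEC[and2(ENC[1], ENC[X])]` is X, matching the three-valued AND truth table.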
The forward final line values computed by the forward networks and the direct implication values computed by the Signal block are passed to the backward networks, which propagate these values one level backward. The Signal block is described next, followed by the backward network.
4) Computation of Frontier Signals: The signal block computes for each gate in the circuit under test a DF signal to indicate that the gate is a D-frontier, a JF signal to indicate that the gate is a J-frontier, a DIMP signal to indicate that there is a direct implication on the gate, and the direct implication value. DF and JF are computed for all gates with more than one fan-in, whereas DIMP and the direct implication value are computed for all gates. Equations (3)-(8) compute these signals from the fan-in and fan-out values of a gate in five-valued logic (D and D̄ represent 1/0 and 0/1, respectively). The gate type is encoded into these equations through its controlling value: the controlling value of an AND/NAND gate is 0, while that of an OR/NOR gate is 1. All of these signals can be computed from a gate's fan-in values and its computed and stored output values. Fig. 11 shows the other parameters used in these equations for a multi-input gate. The final values of a gate fan-in in the GFN and FFN are composed and then mapped into five-valued logic. Similarly, the forward implied values of the gate's fan-out in the GFN and FFN are composed into a five-valued composite value.
Definition 1: A gate is a J-frontier if its output's state value is known, the implied or computed value for the same line is unknown, and at least two inputs of the gate have unknown values. The JF signal of a gate is computed according to this definition.
Definition 2: A gate is a D-frontier if its output's state value is unknown, the implied or computed value for the same line is unknown, and at least one input of the gate has a D or D̄ value. The DF signal of a gate is computed according to this definition.
5) Backward Network: Like the line models in the forward network, the lines in FBN and GBN are classified as good or faulty and are modeled accordingly, as in Fig. 12(a)-(c). The backward direct implication value of each line is computed using (9): it is equal to the direct implication value of a gate if the line is a fan-in of that gate and the forward final value of the same line is unknown; otherwise, it is unknown.
Conflicts may be encountered during backward implication propagation. They occur because of information loss in the computation of the forward composite final values. For example, if the composite forward value of a line is X as a result of a 0/X composition and the backward direct implication value is 1, this backward direct implication causes a conflict in the good backward network. Similarly, if the composite value is X as a result of an X/1 composition and the backward direct implication value is 0, it causes a conflict in the faulty backward network. Therefore, conflict signals are also computed for every line in the backward network using (10b), (10d), and (10f). Computations in the backward networks use the convention given in Table V.
• Final value and conflict computation for a line in GBN (10a) (10b)
• Final value and conflict computation for a good line in FBN (10c) (10d)
• Final value and conflict computation for a faulty line in FBN (10e)
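The composition into five-valued logic, Definitions 1 and 2, the rule of (9), and the information-loss conflict can be checked with a small Python model. It is a sketch under stated assumptions: values are 0, 1, "X", "D", and "D'" (standing for D̄), a gate's output state, output implied value, and fan-in values are passed explicitly, and all function names are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple, Union

X = "X"
D, DBAR = "D", "D'"   # D = 1/0 and D-bar = 0/1 (good/faulty)
Value = Union[int, str]


def compose(good: Value, faulty: Value) -> Value:
    """Map a GFN/FFN pair to five-valued logic; pairs such as 0/X collapse to X."""
    if good == faulty:
        return good            # 0/0 -> 0, 1/1 -> 1, X/X -> X
    if (good, faulty) == (1, 0):
        return D
    if (good, faulty) == (0, 1):
        return DBAR
    return X                   # information loss: 0/X, X/1, ...


def is_j_frontier(out_state: Value, out_implied: Value, fanins: List[Value]) -> bool:
    """Definition 1: output state known, output implied value unknown, >= 2 unknown fan-ins."""
    return out_state != X and out_implied == X and sum(v == X for v in fanins) >= 2


def is_d_frontier(out_state: Value, out_implied: Value, fanins: List[Value]) -> bool:
    """Definition 2: output state and implied value unknown, some fan-in carries D or D-bar."""
    return out_state == X and out_implied == X and any(v in (D, DBAR) for v in fanins)


def backward_direct_value(forward_final: Value, gate_dimp_value: Value) -> Value:
    """Rule of (9): a fan-in takes its gate's direct implication value
    only while its own forward final value is unknown."""
    return gate_dimp_value if forward_final == X else X


def backward_conflicts(good: Value, faulty: Value, implied: Value) -> Tuple[bool, bool]:
    """(GBN conflict, FBN conflict) for a fault-free line receiving an implied value."""
    if implied == X:
        return (False, False)
    return (good != X and good != implied, faulty != X and faulty != implied)
```

The 0/X example from the text behaves as described: `compose(0, X)` is X, so (9) lets an implied 1 through, and `backward_conflicts(0, X, 1)` flags the conflict only in the good backward network.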
Computing the backward final value and conflict signal at a stem line in the backward network requires special treatment. The backward final value of a stem depends on the backward final values of its branches in the backward network and on the forward final value of the stem in the forward network. In this implementation, a stem is not a fan-in of any gate; therefore, there […]. The conflict signal at the same stem in GBN is computed using (11c). Similarly, the backward final values and conflict signals of good and faulty stems in FBN are computed using (11d)-(11i).
• Final value and conflict computation at a stem with multiple branches in GBN (11a) (11b) (11c)
• Final value and conflict computation at a multi-branch good stem in FBN (11d) (11e) (11f)
• Final value and conflict computation at a multi-branch faulty stem in FBN
The intermediate value propagated from the stem's branches, along with the intermediate conflict signals, can be computed using the line models presented before. Fig. 13 shows an implementation of the intermediate value and intermediate conflict signal computation for a stem with three branches. In this figure, the shaded (nonshaded) area refers to the modules at a stem in the faulty (good) backward network. The backward good and faulty final values of the lines are composed, and the encoding in Table VIII is applied to the composition to obtain the value that is stored in the corresponding line state.
6) Frontier Selection: The frontier selector module prioritizes the frontiers globally and selects one frontier either among the D-frontiers or among the J-frontiers. Two levels of prioritization are used. At the first level, propagation has priority over justification if the error has not propagated to any primary output; otherwise, justification has priority. At the second level, the D-frontiers (J-frontiers) are prioritized among themselves. In the present implementation, a gate with a lower index has priority over a gate with a higher index. To assign the indexes (starting from 0) to the gates in the circuit under test, the gates are levelized, and the assignment is made starting from the first level (the primary input level). However, other assignment methods, such as random assignment, could be applied as well.
The detailed implementation of the selector block is depicted in Fig. 14. This block interacts with the Last Processed Frontier block through the PASS signals depicted in Fig. 16. The Last Processed Frontier block stores the last processed frontier during backup to enable the selection of the next active frontier by blocking that frontier and all higher-priority frontiers. At other times, during the forward search, this block is disabled so that all PASS signals are set. In Fig. 14, if ERROR@PO is set, the JF signals are passed to the next level; otherwise, the DF signals are passed. At the next level, the priority encoder receives the signals filtered by the PASS signals. The encoder sets only one ENGATE signal, which indicates the currently selected frontier. The ENGATE signals are passed to the ID generator to compute the index of the selected frontier. The gate type of the selected frontier is computed by ORing the ENGATE signals corresponding to AND/NAND gates. With this logic, GTYPE is active if the selected frontier is an AND or NAND gate and inactive if it is an OR or NOR gate. There is no need to distinguish between AND and NAND gates because both have the same controlling value, 0. Similarly, OR and NOR gates have the same controlling value, 1, so there is no need to distinguish them either. The inverse of GTYPE is therefore the controlling value of the selected frontier.
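The two-level selection described above can be sketched in Python as follows, assuming per-gate signals are given as lists indexed by gate ID, so that the lowest set index wins in the priority encoder. The function and parameter names are hypothetical; `and_type[i]` marks whether gate i is an AND or NAND gate.

```python
from typing import List, Optional, Tuple


def frontier_selector(df: List[int], jf: List[int], error_at_po: bool,
                      pass_: List[int], and_type: List[int]
                      ) -> Tuple[List[int], Optional[int], Optional[int]]:
    """Return (ENGATE one-hot list, ID, GTYPE); ID and GTYPE are None if nothing is selectable."""
    level1 = jf if error_at_po else df                 # level 1: justification vs propagation
    masked = [s & p for s, p in zip(level1, pass_)]    # PASS hides already explored frontiers
    engate = [0] * len(masked)
    for i, s in enumerate(masked):                     # level 2: lowest index wins
        if s:
            engate[i] = 1
            break
    if not any(engate):
        return engate, None, None
    gtype = int(any(e & a for e, a in zip(engate, and_type)))  # OR of ENGATE over AND/NAND gates
    return engate, engate.index(1), gtype
```

The controlling value of the selected frontier is then `1 - gtype`, matching the statement that the inverse of GTYPE is the controlling value.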
An implementation of the ID generator block is depicted in Fig. 15. Given a bit stream in which exactly one bit is 1 and the rest are 0, this block generates the index of the 1 bit. The stream is constructed from the ENGATE signals in the order of their indexes, beginning with index 0. In this block, all bits to the left of the 1 bit are first converted to 1 (e.g., 0010000 into 1110000), and the 1s are then summed using an adder tree to obtain the index of the 1.
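The ID generator can be modeled as a thermometer conversion followed by an adder tree. Note that with the conversion in the example (0010000 into 1110000), the number of 1s is the 0-based index plus one, so this hypothetical Python sketch subtracts one; the hardware may absorb that offset differently.

```python
from typing import List


def thermometer(onehot: List[int]) -> List[int]:
    """Set the 1 bit and every bit to its left (bit 0 is leftmost): 0010000 -> 1110000."""
    out = [0] * len(onehot)
    seen = 0
    for i in range(len(onehot) - 1, -1, -1):   # scan from the right, OR-ing as we go
        seen |= onehot[i]
        out[i] = seen
    return out


def adder_tree(bits: List[int]) -> int:
    """Sum bits pairwise, level by level, like a balanced adder tree."""
    level = list(bits)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def id_generator(engate: List[int]) -> int:
    if sum(engate) != 1:
        raise ValueError("ENGATE must be one-hot")
    return adder_tree(thermometer(engate)) - 1   # count of 1s = 0-based index + 1
```

The adder tree has logarithmic depth in the number of gates, which is why it is preferred here over a serial counter.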
In Fig. 16, the LASTGATE register stores the Last Processed Frontier popped from the stack during backup. When the BACKEDUP flag is set (during backup) and LASTGATE is not empty, the DECODER is enabled. If the Last Processed Frontier has index k, only the k-th decoder output is active, which makes PASS signals 0 through k inactive and the remaining PASS signals active. If the BACKEDUP flag is unset, all decoder outputs are inactive, which makes all PASS signals active. Additionally, the ID, ENGATE, and GTYPE signals from Fig. 14 and the GATE signals from Fig. 16 are used during decision making and when updating the corresponding line states with the new decisions.
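The decoder and PASS logic reduce to a one-hot decode followed by a suffix OR. The Python sketch below assumes n frontiers indexed 0 to n-1, with a lower index meaning higher priority; the names are hypothetical.

```python
from typing import List, Optional


def decoder(n: int, k: int) -> List[int]:
    return [int(i == k) for i in range(n)]


def pass_signals(n: int, backed_up: bool, last_gate: Optional[int]) -> List[int]:
    """PASS[i] = 0 for the last processed frontier and every higher-priority one."""
    d = decoder(n, last_gate) if backed_up and last_gate is not None else [0] * n
    out = [0] * n
    seen = 0
    for i in range(n - 1, -1, -1):   # PASS[i] = NOT (d[i] OR d[i+1] OR ... OR d[n-1])
        seen |= d[i]
        out[i] = 1 - seen
    return out
```

Storing k-1 instead of k (the 6-1 convention of the Fig. 2 example) leaves frontier k selectable again.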
7) Stack:
The stack stores the indexes (IDs) of the selected frontiers. When a frontier is selected for a decision, its index is pushed onto the stack. Upon backtrack, the frontier on the top of the stack is popped and stored in the Last Processed Frontier block.
Since any gate in the circuit under test may be selected, the depth of the stack is bounded by the number of gates (counting only gates with more than one fan-in), and each stack entry holds a gate index.
Fig. 17. The decision maker module.
Fig. 18. Implementation of the decision maker module.
8) Decision Maker:
Fig. 17 shows the decision maker block. The task of this logic block is to compute the decisions on the fan-ins of the currently selected frontier, identified by ID. There is only one such block in the datapath. Since the same block is used to compute decisions for all frontiers, its size is determined by the maximum fan-in count among the frontiers. Decision computation requires the gate type (GTYPE) to deduce the controlling value, the frontier type, and whether an initial or a next decision (LOAD/INC) is to be made on the frontier. In addition to these parameters, the block receives the fan-ins of each gate, either from the forward network to compute the initial decisions or from the previous-decision blocks to compute the next decisions. The outputs of this block are the updated fan-in values of the selected frontier and, for J-frontiers, the index of the fan-in on which the decision was made, so that this decision can be reversed in the next decision.
Implementation of the decision maker module is shown in Fig. 18. It is assumed that a gate in the circuit has at most three fan-ins. To make decisions on the fan-ins, it is sufficient to know whether the value of a fan-in is binary, unknown, or D/D̄. There is no need to distinguish D from D̄, or vice versa, on a fan-in during decision computation. Therefore, the fan-in values supplied in nine-valued logic from the forward network can be encoded according to Table IX, and the decision maker module can compute the decisions based on this encoding. In this coding, D and D̄ are mapped to the same otherwise unused code, viz. "00." In this implementation, if LOAD is asserted, the encoded values of the fan-ins of all gates from the forward network are passed to the next level. Otherwise, the previous decision values of the fan-ins of the gates, which are stored in encoded form, are passed to the next level. At the next level, only the fan-ins of the selected frontier are passed, using the implementation presented in Fig. 19. In Fig. 18, the three fan-in inputs refer to the encoded fan-in values of the selected frontier. At the third level, the output of each AND gate is set to 1 if the corresponding fan-in has an unknown value. At the fourth level, decisions on the fan-ins are computed according to GTYPE and the frontier type using priority encoders. Here, the decision value, derived from GTYPE and the frontier type, is expressed in two bits. If the current frontier is a D-frontier, all unknown fan-ins are set to the decision value. Otherwise, the first fan-in with an unknown value is set to the decision value, and the rest remain intact. At this level, the decisions are known. At the final level, INDEX is computed using an INDEX generator that is similar to the ID generator.
Fig. 19 shows a 1-fan-in cell implementation for the SELECTION GATE module. In this figure, each input is one fan-in of one gate. This cell can be repeated once per fan-in position to implement a SELECTION GATE module for multi-fan-in gates. It is guaranteed that only the ENGATE signal associated with the selected frontier is set and the rest are unset. Therefore, the output of each cell carries the value of the corresponding fan-in of the selected frontier.
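The SELECTION GATE of Fig. 19 behaves as an AND-OR multiplexer driven by the one-hot ENGATE signals. The sketch below is an assumption-labeled model: `selection_gate` is a hypothetical name, fan-in values are two-bit code strings, and every gate is padded to the same fan-in count, mirroring how unused fan-ins are tied to constants.

```python
from typing import List


def selection_gate(engate: List[int], fanins: List[List[str]]) -> List[str]:
    """AND-OR selection of the selected frontier's encoded fan-ins.

    engate -- one-hot ENGATE bits, one per gate
    fanins -- fanins[i][j] is the 2-bit code of fan-in j of gate i; all gates
              are padded to the same fan-in count (unused inputs tied off)
    """
    width = len(fanins[0])
    selected = []
    for j in range(width):                  # one cell per fan-in position
        code = ""
        for b in range(2):                  # each code has two bits
            bit = 0
            for i, en in enumerate(engate):
                bit |= en & int(fanins[i][j][b])   # AND with ENGATE, then OR
            code += str(bit)
        selected.append(code)
    return selected


if __name__ == "__main__":
    print(selection_gate([0, 1], [["10", "01", "11"],
                                  ["11", "00", "01"]]))  # ['11', '00', '01']
```

Because exactly one ENGATE bit is set, each OR collapses to the corresponding bit of the selected gate.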
As mentioned before, the same block is used for gates with fewer fan-ins (e.g., 2). In such a case, all unused fan-ins (from left to right) are set to a constant value such as 1 or 0. Because a binary fan-in is never treated as unknown, such a configuration has a don't-care effect on the decision computation process.
Below, two examples are presented to explain the decision computation process using the decision maker block.
Example 1: A 3-fan-in AND gate whose fan-in values are, from left to right, X, D (or D̄), and X is the selected D-frontier. Both GTYPE and the frontier type are 1 for this gate; therefore, the decision value is "10," which is the encoded value for logic 1. Since X and D are coded as "11" and "00," respectively, the outputs of the AND gates at the third level are 1, 0, and 1 from left to right. Since the frontier type is 1, all priority encoders (PE) are enabled, and their input values (i.e., 1, 0, and 1) are transferred to their outputs, which are the enable signals for the selectors (SE). As a result, the first and last selectors are enabled, and the middle one is disabled. Therefore, the values at the outputs of the selectors are "10," "00," and "10" from left to right. INDEX is 0. These decision values are passed to the decision storage unit for the selected frontier.
Example 2: A 3-input NOR gate whose fan-in values are X, X, and 0 from left to right is the selected J-frontier. Both GTYPE and the frontier type are 0 for this gate; therefore, the decision value is "10," which is the encoded value for logic 1. Since X is coded as "11," the outputs of the AND gates at the third level are 1, 1, and 0 from left to right. The highest-priority PE (i.e., the left-most PE) is enabled and receives the input value 1; therefore, it sends a disable signal (i.e., 1) to all lower-priority PEs and sets its own output to 1. The outputs of the PEs are 1, 0, and 0 from left to right. As a result, only the first SE from the left is enabled; its output is set to "10" (i.e., 1), and the others remain intact. Moreover, INDEX is set to 3. So the current decisions on the fan-ins are 1, X, and 0. If this decision fails, the decision maker block receives 1, X, and 0 as the fan-in values and computes 1, 1, and 0 as the next partial decision. It is called a partial decision because reversing the previous decision addressed by INDEX is deferred to the decision storage unit. So, after the reversal, the complete next decision sets the fan-ins to 0, 1, and 0 from left to right.
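The two examples can be reproduced with a behavioral Python sketch of the decision maker. Several details are inferred rather than stated: logic 0 is assumed to use the remaining code "01"; the frontier-type bit is taken to be 1 for a D-frontier and 0 for a J-frontier, so the decision value is "10" exactly when GTYPE equals that bit; and INDEX is taken to count fan-ins 1-based from the right (so the left-most of three is 3) and to be 0 for D-frontiers. The name `decide` is hypothetical.

```python
from typing import List, Tuple


def decide(fanins: List[str], gtype: int, ft: int) -> Tuple[List[str], int]:
    """Compute (encoded decisions, INDEX) for the selected frontier.

    fanins -- two-bit codes, left to right: "10" = 1, "11" = X,
              "00" = D or D-bar, and (assumed) "01" = 0
    gtype  -- 1 for AND/NAND, 0 for OR/NOR
    ft     -- frontier type, assumed 1 for a D-frontier, 0 for a J-frontier
    """
    # Non-controlling value for a D-frontier, controlling value for a
    # J-frontier; both cases reduce to XNOR(gtype, ft).
    value = "10" if gtype == ft else "01"
    # Third level: AND the two code bits, i.e., flag unknown ("11") fan-ins.
    unknown = [int(c[0]) & int(c[1]) for c in fanins]
    # Fourth level: for a J-frontier, the priority chain keeps only the
    # left-most unknown; for a D-frontier, every unknown is kept.
    enable, taken = [], 0
    for u in unknown:
        en = u if ft else (u & (1 - taken))
        taken |= u
        enable.append(en)
    decisions = [value if en else c for en, c in zip(enable, fanins)]
    # Final level: INDEX is 1-based from the right; 0 means nothing to reverse.
    n = len(fanins)
    index = 0 if ft else next((n - k for k, en in enumerate(enable) if en), 0)
    return decisions, index


if __name__ == "__main__":
    # Example 1: AND D-frontier with fan-ins X, D, X.
    print(decide(["11", "00", "11"], gtype=1, ft=1))  # (['10', '00', '10'], 0)
    # Example 2: NOR J-frontier with fan-ins X, X, 0.
    print(decide(["11", "11", "01"], gtype=0, ft=0))  # (['10', '11', '01'], 3)
    # Next partial decision after the first one fails.
    print(decide(["10", "11", "01"], gtype=0, ft=0))  # (['10', '10', '01'], 2)
```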
The computed decisions are passed to the selected frontier's decision storage unit. In the next section, the functionality of decision storage units is discussed in detail.
9) Decision Storage: The D-algorithm needs to store the decisions made on the fan-ins of the gates for two reasons: 1) to restore the line states from them after backup, and 2) to compute the next decisions on the fan-ins. Any gate can be on the decision tree and is a potential D-frontier or J-frontier; therefore, a decision storage unit is statically assigned to every gate in the circuit under test. Such a unit stores the decided fan-in values and the INDEX computed by the decision maker block. In addition, control signals such as the storage unit select and the operation type are supplied to the unit. Fig. 20 shows a decision storage block along with its inputs and outputs for a gate identified by ID. When its select signal is set, the storage unit for gate ID is enabled. This signal is set in two cases: 1) the selected frontier is ID (i.e., the corresponding ENGATE signal is set), or 2) the last processed frontier is ID (i.e., the corresponding decoder output is set), it has no further decisions, and the decisions stored in the unit for this frontier need to be cleared (i.e., RESET is set). The other inputs to the unit are LOAD and INCREMENT, which indicate the first and next decisions, respectively. The unit also receives CLOCK and a global reset signal, which are not shown in the figure. The outputs of this unit are the decisions on the gate's fan-ins.
Fig. 21 shows a storage unit implementation for a two-input gate. In this figure, the global reset signal is used to reset all storage units simultaneously. The internal signal GS indicates that the unit is selected for either loading (i.e., the first decision) or incrementing (i.e., the next decision). In the case of the next decision computation, the last INDEX value stored in the index register sets one of the per-fan-in reverse signals, which indicates that the corresponding fan-in value needs to be reversed before storing.
The stored fan-in values are passed to the corresponding line states and back to the decision maker module. The values passed to the line states are re-encoded according to Table X before they are stored. Since only fault-free (good) values are stored in the line states, fan-ins with value D or D̄ (coded as "00") are mapped to unknown (coded as "11"). The decided fan-in values are passed to the decision maker module unchanged.
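A behavioral sketch of one decision storage unit, replaying the reversal from Example 2, is given below. The class and method names are hypothetical; clearing to all-unknown on RESET and the "01" code for logic 0 are assumptions. `line_state_values` only applies the D/D̄-to-unknown mapping in the decision-maker codes; the Table X encoding itself is not reproduced here.

```python
from typing import List

# Assumed codes: "10" = 1 (from the text), "01" = 0 (the remaining code).
REVERSE = {"10": "01", "01": "10"}


class DecisionStorage:
    """Behavioral sketch of one gate's decision storage unit."""

    def __init__(self, num_fanins: int) -> None:
        self.num_fanins = num_fanins
        self.clear()

    def clear(self) -> None:
        # Assumption: a cleared unit holds all-unknown ("11") and INDEX 0.
        self.decisions = ["11"] * self.num_fanins
        self.index = 0

    def clock(self, select: bool, load: bool, increment: bool,
              new_decisions: List[str], new_index: int,
              reset: bool = False) -> None:
        """One clock edge; nothing happens unless the unit is selected."""
        if not select:
            return
        if reset:
            self.clear()
            return
        values = list(new_decisions)
        if increment and self.index:
            # Reverse the fan-in addressed by the previously stored INDEX
            # (counted 1-based from the right, as in Example 2).
            pos = self.num_fanins - self.index
            values[pos] = REVERSE[values[pos]]
        if load or increment:
            self.decisions = values
            self.index = new_index

    def line_state_values(self) -> List[str]:
        """Values for the line states: D/D-bar ("00") becomes unknown ("11")."""
        return ["11" if code == "00" else code for code in self.decisions]


if __name__ == "__main__":
    unit = DecisionStorage(3)
    unit.clock(True, True, False, ["10", "11", "01"], 3)    # first decision
    unit.clock(True, False, True, ["10", "10", "01"], 2)    # next (partial)
    print(unit.decisions, unit.index)  # ['01', '10', '01'] 2
```

The second call ends with fan-ins 0, 1, and 0, matching the complete next decision described in Example 2.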
The numbers of flip-flops needed to store the decision values and the index value of a gate both grow with its fan-in count, and their sum gives the number of memory elements in one decision storage unit. Multiplying by the number of gates, using the average fan-in count per gate, gives the total number of memory elements required by the decision storage units. During backup, the last processed frontier is checked for the availability of a next decision, indicated by its storage unit's next-decision signal. The next-decision signals from all decision storage units are combined as in Fig. 22 to compute the LASTSTATE signal. When LASTSTATE is set, the last processed frontier has a possible next decision. Since only one decoder output is set, LASTSTATE takes the value of the next-decision signal of the last processed frontier.
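Because exactly one decoder output is active, the combination in Fig. 22 can be modeled as an AND-OR reduction. This reading of Fig. 22, and the names `last_state` and `next_available`, are assumptions for illustration.

```python
from typing import List


def last_state(decoded: List[int], next_available: List[int]) -> int:
    """LASTSTATE = OR over k of (decoded[k] AND next_available[k]).

    decoded        -- one-hot decoder outputs for the Last Processed Frontier
    next_available -- per storage unit: 1 if a further decision exists
    With a one-hot `decoded`, the result equals next_available[k].
    """
    return int(any(d & n for d, n in zip(decoded, next_available)))


if __name__ == "__main__":
    print(last_state([0, 0, 1], [1, 0, 1]))  # 1
    print(last_state([0, 1, 0], [1, 0, 1]))  # 0
```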
10) Controller and Test Generation Time in Hardware:
The controller of the implementation uses nine states to manage the flow of computation. Computation starts by initializing the fault activator, which takes one state. Fault activation and initialization of the circuit for the current fault follow and take one more state; therefore, two states are executed for each fault. After a fault is activated, or after the circuit is re-initialized following a backtrack, two states are required to compute the direct implications and store them in the line states. These two states are repeated until there are no more backward or forward direct implications. All forward implications and one level of backward implications are computed in one clock cycle. When no direct implications remain, one state is required to compute the assignments for the selected frontier. A backtrack requires at most three states to back up the circuit. Consequently, knowledge of the steps involved in the D-algorithm and of the state machine can be used to estimate the number of clock cycles required to generate test vectors in hardware.
IV. APPROXIMATION OF TEST GENERATION TIMES
To evaluate the efficiency of this approach, we compare the run time of a software-based D-algorithm with that of a hardware implementation. For the hardware implementation, we include the following parameters: preprocessing time, reconfiguration time, and test generation time. The test generation procedure generates a test for each testable fault and does not fault simulate these vectors. Knowing how many states are executed for each operation, the test generation time in hardware can be estimated in terms of the number of faults $N_F$, the number of concurrent backward direct implications $N_I$, the number of assignments $N_A$, and the number of backups $N_B$. The test generation time in hardware is estimated using (12), in which the coefficients indicate the number of states (i.e., clock cycles) required for each operation and $T_{clk}$ is the clock period:

$$T_{TG} = \left(2N_F + 2N_I + N_A + 3N_B\right) T_{clk} \qquad (12)$$

According to (12), for each fault, the test generation circuit requires two clock cycles to initialize and inject the fault. During the execution of the algorithm, the circuit requires two clock cycles to process direct implications, one clock cycle to process an assignment, and at most three clock cycles to process a backtrack. As a result, (12) gives the total test generation time for all faults. Experiments are performed using the ISCAS85 benchmark circuits, whose characteristics are shown in Table XI. For example, circuit c432 consists of 160 gates, with a total of 524 faults to be injected into it. When all faults are targeted using the D-algorithm, the numbers of assignments $N_A$, backtracks $N_B$, and implications $N_I$ are 60 366, 5980, and 2350, respectively. These numbers are used to compute the number of clock cycles required by the circuit.
TABLE XI. DESCRIPTIONS OF THE BENCHMARK CIRCUITS
TABLE XII. HARDWARE SPEED-UP OVER SOFTWARE FOR BENCHMARKS
TABLE XIII. HARDWARE OVERHEAD
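As a worked example, the c432 counts quoted above can be plugged into the per-operation cycle costs of (12). This sketch charges every backtrack the three-cycle upper bound, counts implications sequentially as the approximation does, and excludes preprocessing and reconfiguration time, so it estimates only the hardware test generation component; it is not a value taken from Table XII. The function name `hw_cycles` is hypothetical.

```python
def hw_cycles(n_faults: int, n_implications: int, n_assignments: int,
              n_backtracks: int) -> int:
    """Clock cycles per (12): 2 per fault, 2 per implication step,
    1 per assignment, and the 3-cycle upper bound per backtrack."""
    return (2 * n_faults + 2 * n_implications
            + n_assignments + 3 * n_backtracks)


# c432 counts as reported in the text.
cycles = hw_cycles(n_faults=524, n_implications=2350,
                   n_assignments=60_366, n_backtracks=5980)
seconds_at_1mhz = cycles / 1e6     # clock period of 1 microsecond
print(cycles, seconds_at_1mhz)     # 84054 0.084054
```

Under these assumptions, c432 needs about 84 thousand cycles, roughly 84 ms at 1 MHz, of which the 60 366 assignments account for the largest share.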
Table XII shows, for each circuit, the time required by a software D-algorithm, the preprocessing time, the time required by the hardware running at 1 MHz, and the speed-up. Note that although most of the implications are computed in parallel in our approach, they are counted as sequential in the approximation. For large circuits, the hardware achieves a speed-up over software ranging from 3.25 to 14.8 times.
Table XIII presents the hardware overhead in terms of gate count. All gates have one output and a fan-in count of less than six. In Table XIII
V. CONCLUSION
We presented a new approach for generating test vectors that detect faults in combinational circuits. The approach is based on automatically designing a circuit which implements the D-algorithm, an ATPG algorithm, specialized for the combinational circuit. Our approach exploits fine-grain parallelism by performing the following in three clock cycles: direct backward/forward implications, conflict checking, selecting the next gate to propagate the fault or to justify a line, decisions on gate inputs, and loading the state of the circuit after backup. We showed the feasibility of this approach in terms of hardware cost and speed, and how it compares with software-based techniques.
For large circuits, we achieve speed-ups of up to 14.8 times over software. The method presented here can be applied to other CAD algorithms.
VI. FUTURE WORK
As future work, the D-algorithm will be mapped onto reconfigurable hardware. A computer program will read the CUD description and automatically generate from it a specialized circuit that enables the D-algorithm to run in hardware for that particular circuit. With this technique, all forward implications, one level of backward implications, maintaining the frontier sets, selecting a frontier from the set, and making decisions on that node together require one clock cycle. Moreover, recomputing the state of the circuit when a backup occurs may require a few clock cycles. This would be substantially faster than the software-based D-algorithm.
