In this paper, a new approach for generating test vectors that detect faults in combinational circuits is introduced. The approach is based on automatically designing a circuit which implements the D-algorithm, an Automatic Test Pattern Generation (ATPG) algorithm, specialized for the combinational circuit. Our approach exploits fine-grain parallelism by performing the following in three clock cycles: direct backward/forward implications, conflict checking, selecting the next gate to propagate a fault or to justify a line, decisions on gate inputs, and loading the state of the circuit after backup. We show the feasibility of this approach in terms of speed, and how it compares with software-based techniques.
Introduction
ATPG is the process of either finding input vectors that detect a fault in a digital circuit by distinguishing the faulty and fault-free circuit behavior at the Primary Outputs (POs), or flagging the fault as redundant when no such vector exists. This process requires a large amount of CPU time, and in many cases ATPG tools abort on many of the hard-to-detect faults. ATPG is known to be NP-complete even for combinational circuits [13].
Most existing deterministic ATPG techniques employ a branch-and-bound [1] technique to examine all input combinations. The D-algorithm [12] examines all input combinations by making decisions at internal circuit nodes as well as primary inputs, and alternates between fault propagation and line justification until a faulty value appears at a primary output (PO), in which case the fault is detected, or the search space is exhausted. In the latter case, the fault is flagged as redundant. PODEM [7] examines all input combinations by making decisions only on the Primary Inputs (PIs), which reduces the number of nodes in the search tree. To achieve this, all decisions on internal lines are traced back to the PIs. The FAN algorithm [9] makes the following improvements to the basic PODEM algorithm: tracing of objectives stops at some internal lines (head lines) in addition to PIs, and multiple objectives are backtraced instead of the single objective backtraced in PODEM. Further improvements to FAN give better performance by finding mandatory assignments based on dominators and by finding nodes where values can be assigned independently of other nodes [10]. SOCRATES uses a unique sensitization technique based on dominators and implication learning to speed up the justification process [11]. Recursive learning, which avoids the use of a decision tree, is proposed in [14]. Another improvement to speed up the ATPG process is found in [16].
Emulation systems are being used increasingly in the design, verification, and rapid prototyping of digital systems [3]. To increase the use of these emulation systems, several methods have been proposed to emulate Computer-Aided Design (CAD) algorithms such as fault simulation [4, 5], Automatic Test Pattern Generation (ATPG) [1], Satisfiability (SAT) [1, 6], and fault diagnosis [8, 15]. In [4], a method is proposed to emulate serial fault simulation. In [5], a method is proposed to emulate the critical path-tracing algorithm. In [1], a method is proposed to emulate the PODEM algorithm, with an application to SAT. For all of those algorithms, a significant speed-up was obtained over the software-based implementation.
In this paper, we present a new method to emulate the D-algorithm on reconfigurable hardware. The method achieves significant speed-up over software-based ATPG techniques with similar or better results, where the quality of the results is measured in terms of fault coverage. This is achieved by using reconfigurable hardware, which provides a way to exploit the fine-grain parallelism in the D-algorithm. This paper is organized as follows: in section 2, the concurrent D-algorithm is presented. In section 3, the overall architecture of the implementation is given. In section 4, we present results. Finally, we present conclusions and future work in section 5.
Concurrent D-Algorithm
The concurrent D-algorithm is shown in Fig. 1. The forward implication and the direct backward implication processes are repeated until a conflict occurs or no more direct implications exist. In our implementation, each iteration requires one clock cycle. In the case of a conflict, the program transfers control to a backtrack procedure which reverses the previous decision. Otherwise, the procedure tries to justify nodes in the J-frontier when an error is at a PO, or propagates an error in the D-frontier when it is not. If all nodes in the J-frontier are justified, then a test is found. If justification fails, control is transferred to backtrack. During justification, each gate G is pushed onto the stack, and the state of its fan-in is either initialized (InitFaninState(G)) if G is pushed for the first time (indicated by the Increment flag), or advanced to the next appropriate fan-in state (NextFaninState(G)). These steps are also performed in one clock cycle. Similar steps are performed during the propagation process, which is activated when no error is at any PO. Next, the algorithm loads the changes on the fan-in nodes of gate G that result from a decision into S[I] (LoadLineStateFromFanin(G)) and resets the Increment and BackUpSet flags. These steps are repeated until a test is found or the fault is found to be redundant.
Fig. 1. Concurrent D-algorithm.
Note that when a gate is selected from the D-frontier or J-frontier (SelectGateFromDF/JF()), the decision on the fan-in of the gate that was last pushed onto the stack is examined. If more decisions exist, that gate is selected again; otherwise a new gate is selected. In the implementation, the same logic is used for selecting a gate from either the D-frontier or the J-frontier set, depending on the ErrorAtPO signal.
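The repeat-until-stable implication step described above can be illustrated in software. The following is a minimal sketch, not the generated hardware: it assumes a hypothetical netlist format, handles only AND and OR gates with three-valued logic (0, 1, unassigned), and sweeps the gates sequentially, whereas the circuit evaluates all gates in parallel in each clock cycle. The names `imply`, `eval_gate`, and the dictionary layout are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical netlist format: output line -> (gate type, input lines).
Netlist = Dict[str, Tuple[str, List[str]]]
State = Dict[str, Optional[int]]  # a missing key or None means unassigned


def eval_gate(kind: str, ins: List[Optional[int]]) -> Optional[int]:
    """Three-valued evaluation of an AND or OR gate."""
    c = 0 if kind == "AND" else 1        # controlling value of the gate
    if c in ins:
        return c                         # one controlling input decides the output
    if all(v is not None for v in ins):
        return 1 - c                     # every input is non-controlling
    return None                          # output still unknown


def imply(netlist: Netlist, state: State) -> Tuple[State, bool]:
    """Apply forward and direct backward implications until nothing changes.

    Returns the extended state and a flag that is True when a conflict occurs.
    """
    state = dict(state)
    changed = True
    while changed:
        changed = False
        for out, (kind, ins) in netlist.items():
            c = 0 if kind == "AND" else 1
            fwd = eval_gate(kind, [state.get(i) for i in ins])
            cur = state.get(out)
            if fwd is not None:
                # Forward implication, with conflict check against the state.
                if cur is None:
                    state[out] = fwd
                    changed = True
                elif cur != fwd:
                    return state, True
            elif cur == 1 - c:
                # Backward: all unassigned inputs must be non-controlling.
                for i in ins:
                    if state.get(i) is None:
                        state[i] = 1 - c
                        changed = True
            elif cur == c:
                # Backward: a single unassigned input must be controlling.
                unknown = [i for i in ins if state.get(i) is None]
                if len(unknown) == 1:
                    state[unknown[0]] = c
                    changed = True
    return state, False


# Example circuit (an assumption for illustration): g2 = (a AND b) OR c.
EXAMPLE = {"g1": ("AND", ["a", "b"]), "g2": ("OR", ["g1", "c"])}
```

Calling `imply(EXAMPLE, {"g2": 0, "a": 1})` forces `g1`, `c`, and `b` to 0 without any decision, while `imply(EXAMPLE, {"a": 1, "b": 1, "g2": 0})` reports a conflict, which in the algorithm above would trigger the backtrack procedure.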
The overall architecture is shown in Fig. 2. The implication steps are repeated until either FCONFLICT or BCONFLICT is set, or no DIMP signal is set. The Frontier Selector block consists of a priority encoder that selects a DF gate if the error is not at a PO, or a JF gate otherwise. The identifier of the previously processed gate is stored in LastProcessedGate. It is passed to the stack if more decisions can be made on this gate; otherwise it is used to exclude gates with lower priorities from the selection process, because the search space associated with these gates has already been explored. The selected gate is pushed onto the STACK, the associated decisions on its Fanin(G,I) are computed in the Decision Maker block, and the decisions are stored in the corresponding G[I]. If a conflict occurs, or the error is not at the output and no DF is set, or all justifications associated with the set JF gates have failed, the gate on the top of the stack is popped and stored in LastProcessedGate. All these steps are repeated until a test is found or no more decisions can be made, which signals a redundant fault. All equations use the convention described in Table 1.
The forward network consists of the circuit under test, in which the gates are interconnected with lines. Each line is modeled as in Fig. 3(a)-(c). The forward networks compute the final value of a line by considering the state value ($S_i$), the implied value computed by the fan-in gate ($I_i$), and the fault injection signal ($F_i$). The final value of a line is computed using Eq. (1.a). A conflict signal is also computed for every line using Eq. (1.b). When no fault ($F_i$) is injected at the line, the expression in Eq. (1.a) is simplified. Fig. 4 shows a 2-input AND gate and its corresponding model in the faulty forward network. The signal block computes the following for each gate $i$: $JF_i$, $DF_i$, $DIMP_i$, and $VAL_i$. $JF_i$ and $DF_i$ are computed for all gates with more than one fan-in; $DIMP_i$ and $VAL_i$ are computed for all gates. Equations (3-8) compute $JF_i$, $DF_i$, $DIMP_i$, and $VAL_i$ using 5-valued logic ($D$ and $\overline{D}$ represent 1/0 and 0/1 respectively). The gate type is encoded into these equations by the inclusion of its controlling value $c$. Fig. 6 shows the other parameters used in these equations. The final line values in the good and faulty forward networks are combined and mapped into a 5-valued logic, represented by $L_i$ and $R_i$ in Fig. 6. I Gf: the inputs of the gate i from its gate fan-in state 1 are supplied from the forward network. When INC is set, the next decision is made on the gate. In this case, the previous decision values are loaded from Fanin(G,I). The next decision is computed after loading, and the index is set to the location where a reverse decision may be made.
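The mapping from good and faulty line values to the 5-valued logic can be sketched as follows. This is an illustrative software model rather than Equations (3-8): it represents each composite value as a (good, faulty) pair, evaluates each component with three-valued logic, and collapses any pair with an unknown component to X. The function names and the spelling `D'` for $\overline{D}$ are assumptions made for the example.

```python
from typing import Optional, Tuple

# A composite value is a (good, faulty) pair; None marks an unknown component.
Pair = Tuple[Optional[int], Optional[int]]

ZERO: Pair = (0, 0)
ONE: Pair = (1, 1)
D: Pair = (1, 0)       # 1 in the good circuit, 0 in the faulty circuit
DBAR: Pair = (0, 1)    # 0 in the good circuit, 1 in the faulty circuit
X: Pair = (None, None)

NAMES = {ZERO: "0", ONE: "1", D: "D", DBAR: "D'"}


def to_five_valued(p: Pair) -> str:
    """Name the composite value; partially known pairs collapse to X."""
    return NAMES.get(p, "X")


def and3(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Three-valued AND: 0 is the controlling value."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return None


def and5(p: Pair, q: Pair) -> str:
    """5-valued AND obtained from the good and faulty components."""
    return to_five_valued((and3(p[0], q[0]), and3(p[1], q[1])))


print(and5(D, ONE))    # D
print(and5(D, DBAR))   # 0
print(and5(D, X))      # X
```

Evaluating the two components separately mirrors the use of a good and a faulty forward network, with the combination step done only when a 5-valued result is needed.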
Results
To evaluate the efficiency of this approach, we compare the run time of a software-based D-algorithm with that of a hardware implementation. For the hardware implementation we include the following parameters: preprocessing time ($T_P$), reconfiguration time ($T_C$), and test generation time ($T_G$). The test generation time $T_G$ is computed in terms of the number of faults ($F$), the number of direct parallel implications ($I$), the number of assignments ($A$) (i.e., the number of frontier selections), and the number of backtracks ($B$). The test generation procedure generates a test for each testable fault and does not fault simulate these vectors. For each fault, the test generation circuit requires 3 clock cycles to initialize and inject the fault. During the execution of the algorithm, the circuit requires 2 clock cycles to process direct implications, 2 clock cycles to process an assignment, and 3 clock cycles to process a backtrack. Thus, the total test generation time required by the generated circuit, in clock cycles, is $T_G = 3F + 2I + 2A + 3B$. Therefore, the total time to finish the test generation for all faults is $T_P + T_C + T_G$.
To perform the experiment we use the ISCAS85 benchmark. Characteristics of these circuits are shown in Table 2. For example, circuit c432 consists of 160 gates, and a total of 524 faults were injected into this circuit. When all faults are targeted using the D-algorithm, the numbers of assignments, backtracks, and implications are 60366, 5980, and 2350 respectively. These numbers are used to compute the number of clock cycles required by the circuit. Table 3 shows, for each of the circuits, the time required by a software D-algorithm, the preprocessing time plus configuration time, the time required by the hardware running at 1 MHz, and the speed-up. Although most of the implications are computed in parallel in our approach, we count them as sequential in our speed-up computation. For the large circuits, we see a speed-up over software ranging between 3.25 and 14.18 times.
Conclusion
We presented a new approach for generating test vectors that detect faults in combinational circuits. The approach is based on automatically designing a circuit which implements the D-algorithm, an Automatic Test Pattern Generation (ATPG) algorithm, specialized for the combinational circuit. Our approach exploits fine-grain parallelism by performing the following in three clock cycles: direct backward/forward implications, conflict checking, selecting the next gate to propagate a fault or to justify a line, decisions on gate inputs, and loading the state of the circuit after backup. We showed the feasibility of this approach in terms of speed, and how it compares with software-based techniques. For large circuits, we achieve a speed-up of between 3.25 and 14.18 times over software.
