Carry-lookahead adders have long been used in complex arithmetic units because their regular structure leads to efficient VLSI implementations of fast adders. In this paper, timing-driven testability synthesis is first performed on a tree adder. It is shown that the structure of the tree adder leads to high fanout and an imbalanced tree, which likely contributes to a racing effect and increases the delay of the circuit. Timing optimization is then realized by reducing the maximum fanout of the adder and by balancing the tree circuit. For a 56-b testable tree adder, the optimization produces a 6.37% increase in the speed of the critical path at the cost of only a 2.16% area overhead. Full testability of the circuit is achieved in the optimized adder design.
INTRODUCTION
A tree is a circuit constructed from identical modules interconnected in a regular fashion so that there is only one signal path between any two points. The modules or cells used to construct a tree circuit can have internal reconvergent fanout, but fanout is not allowed among the modules. A tree may have multiple outputs, and its modules may be interconnected by multiple-bit buses. The circuits studied in this paper are convergent tree circuits. In a convergent tree circuit, as one moves from the input module to the output module, the number of (data) signal lines in the circuit decreases. Testing adders built this way can be very difficult: a large and complicated circuit requires many test patterns to detect all functional faults. To reduce the number of test patterns needed to detect all possible faults in the entire circuit, the convergent tree circuit should be composed of identical modules interconnected in a one-dimensional array, so that the array interconnection allows the tests used for one module to be applied to the other modules [1].
Pseudo-exhaustive testing techniques based on partitioning are well suited to circuits structured as iterative logic arrays (ILAs). An ILA such as the ripple-carry adder can be pseudo-exhaustively tested with a number of tests that does not depend on the number of cells in the array.
In this paper, we are interested in the 56-b carry-lookahead adder. To make a convergent tree C-testable [2], we must have one-dimensional ILAs. Fig. 1 shows how a convergent tree module can be interpreted as the module of a one-dimensional array [3, 4]. The testability of convergent tree circuits can be characterized in terms of the testing properties of their n one-dimensional array types.
The worst-case propagation delays in carry-lookahead adders depend on how the full adders are grouped structurally into blocks, as well as on the number of levels and the fanout. A fully testable 56-b carry-lookahead adder is studied for timing as well as for testability. The 56-b carry-lookahead convergent tree adder forms one-dimensional arrays among its GP modules. These arrays are used to make the circuit C-testable, thus reducing the number of tests needed to detect all faults in the circuit. In this design, the branch contribution to each of the carry modules causes racing and high fanout at the output of the carry module for the first 16 bits. The fanout of 5 at the carry-16 module causes a load imbalance that affects the delay of the circuit. This adder is then compared with a timing-driven testable adder that is optimized for better timing performance by reducing the fanout through a balanced convergent tree. Both designs are tested and compared for timing, testability, and area.
TESTABLE CONVERGENT TREE ADDER (TCTA)
The architecture of the carry-lookahead adder can be obtained from the fundamental carry operation fco. Let $+$ denote the logic OR operation, and let the juxtaposition of two variables denote the logic AND operation. To define the fundamental carry operation, we must first look at the concepts of carry generation and carry propagation. For two operands $A = \{a_{n-1}, \ldots, a_k, \ldots, a_m, \ldots, a_0\}$ and $B = \{b_{n-1}, \ldots, b_k, \ldots, b_m, \ldots, b_0\}$, the carry generate and carry propagate are then defined as
then
$$[p_{i:j}, g_{i:j}] = (p_{i:k+1}, g_{i:k+1}) \,\mathrm{fco}\, (p_{k:j}, g_{k:j}),$$
which is defined as
$$p_{i:j} = p_{i:k+1}\, p_{k:j} \quad \text{and} \quad g_{i:j} = g_{i:k+1} + p_{i:k+1}\, g_{k:j}.$$
The fco operator is associative, which can be represented by
$$[(p_{i:m+1}, g_{i:m+1}) \,\mathrm{fco}\, (p_{m:k+1}, g_{m:k+1})] \,\mathrm{fco}\, (p_{k:j}, g_{k:j}) = (p_{i:m+1}, g_{i:m+1}) \,\mathrm{fco}\, [(p_{m:k+1}, g_{m:k+1}) \,\mathrm{fco}\, (p_{k:j}, g_{k:j})]$$
for $j < k < m < i$.
Now we can express the carry output equation for a 4-b ripple carry adder as
$$(((pg_0 \,\mathrm{fco}\, pg_1) \,\mathrm{fco}\, pg_2) \,\mathrm{fco}\, pg_3).$$
In this equation, we can see how the propagate and generate signals for the least significant bit (LSB) groups $pg_0$ and $pg_1$ are combined first; that result is then combined with the following group, and so on in a linear fashion. If instead we combined the two lower and the two upper groups simultaneously and then combined the results, we would get the following result for four bits:
$$(pg_0 \,\mathrm{fco}\, pg_1) \,\mathrm{fco}\, (pg_2 \,\mathrm{fco}\, pg_3).$$
As can be seen, by using the associative property of the fco, both results are equivalent.
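The equivalence of the two groupings can be checked by brute force. The following Python sketch is an illustration only: the function names are hypothetical, and the operand order (more significant group as the first argument) follows the definition of fco given above rather than the left-to-right order in which the chain is written in the text.

```python
from itertools import product


def fco(hi, lo):
    """Fundamental carry operation on (p, g) pairs.

    `hi` is the more significant group and `lo` the less significant one
    (an assumption about argument order made for this sketch).
    """
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


# Every possible (p, g) pair of single-bit values.
PAIRS = list(product((0, 1), repeat=2))


def is_associative():
    """Check (x fco y) fco z == x fco (y fco z) for all 64 combinations."""
    return all(
        fco(fco(x, y), z) == fco(x, fco(y, z))
        for x, y, z in product(PAIRS, repeat=3)
    )


def linear(pg3, pg2, pg1, pg0):
    """Ripple-style grouping: pg0 and pg1 first, then one group at a time."""
    return fco(pg3, fco(pg2, fco(pg1, pg0)))


def tree(pg3, pg2, pg1, pg0):
    """Tree grouping: combine the upper and lower halves, then merge them."""
    return fco(fco(pg3, pg2), fco(pg1, pg0))


def groupings_agree():
    """Compare both groupings over all 256 input combinations."""
    return all(linear(*pgs) == tree(*pgs) for pgs in product(PAIRS, repeat=4))
```

Calling `is_associative()` and `groupings_agree()` exercises every input combination, so the check is exhaustive for the 4-b case; it says nothing about gate-level delay.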
TCTA Circuit
The convergent tree carry-lookahead adder circuit was built by connecting several distinct modules. The first modules are the generate/propagate modules (gp and GP), which are used to find the carry out function in the carry-lookahead adder. These modules are grouped for every 8 input bits to the adder. The results of these modules are combined with the carry from the lower 8 bits in the carry module, and the sum is then calculated through groups of 4- and 8-b ripple carry adders. All of the modules are described in detail below.
gp Module
In the previous section, we showed the equations to calculate the generate and propagate bits. Using those results, the carry out function for a 2-b sum can be expressed as
The g and p values are calculated by using the gp module shown in Fig. 2.
Now that we have calculated the generate and propagate values for a single bit, we must extend the concept to include 2 bits. From the 2-b carry out equation we can find
When the value for $C_1$ is substituted, the equation becomes
where we can define the 2-b carry generate and propagate, respectively, as
Figure 3 shows the logic implementation of the GP module.
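The displayed 2-b equations referred to above can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation: it uses the standard carry recurrence $C_i = g_i + p_i C_{i-1}$, assumes bits are numbered from 1 with $C_0$ as the carry-in, and uses the symbols $G$ and $P$ for the 2-b group signals. The resulting group terms agree with the expressions $(p_1 p_2)$ and $(g_2 + g_1 p_2)$ that the text later gives for the outputs of a 2-b GP module.

```latex
\begin{align}
C_1 &= g_1 + p_1 C_0 \\
C_2 &= g_2 + p_2 C_1 \\
    &= g_2 + g_1 p_2 + p_1 p_2 C_0 \\
    &= G + P\, C_0, \qquad G = g_2 + g_1 p_2, \quad P = p_1 p_2
\end{align}
```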
For the 4-b carry out ($C_4$), the Boolean expression is
When substituting the $C_2$, $C_3$ values,
The structure of the above expression is shown in Fig. 4. From the above carry out expression, we can also have alternate GP expressions for four bits as
From those equations, we find that $(p_3 p_4)$ and $(g_4 + g_3 p_4)$ are the propagate and generate outputs of a 2-b GP module whose inputs are fed from the outputs of the bit-3 and bit-4 gp modules. Similarly, $(p_1 p_2)$ and $(g_2 + g_1 p_2)$ are the outputs of a 2-b GP module whose inputs are fed from the outputs of the bit-1 and bit-2 gp modules. The alternate tree structure for a 4-b GP is shown in Fig. 5. The critical path of the 4-b GP in Fig. 5 is shorter than that in Fig. 4. Therefore, our timing-driven convergent tree adder design adopts the timing-optimized 4-b GP in Fig. 5.
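The functional equivalence of the two 4-b GP structures can be illustrated with a short Python sketch. It is not taken from the design files: the function names are hypothetical, and the nested form used for the serial structure is an assumption about what Fig. 4 implements, obtained by substituting the carry recurrence bit by bit. The tree form reuses the 2-b GP expressions quoted above.

```python
from itertools import product


def gp2(g_hi, p_hi, g_lo, p_lo):
    """2-b GP module: returns (G, P) = (g_hi + g_lo p_hi, p_hi p_lo)."""
    return g_hi | (g_lo & p_hi), p_hi & p_lo


def gp4_serial(g, p):
    """4-b GP as a nested chain (assumed shape of the serial structure)."""
    g1, g2, g3, g4 = g
    p1, p2, p3, p4 = p
    G = g4 | (p4 & (g3 | (p3 & (g2 | (p2 & g1)))))
    P = p4 & p3 & p2 & p1
    return G, P


def gp4_tree(g, p):
    """4-b GP as a tree of three 2-b GP modules."""
    g1, g2, g3, g4 = g
    p1, p2, p3, p4 = p
    G_hi, P_hi = gp2(g4, p4, g3, p3)  # (g4 + g3 p4, p3 p4)
    G_lo, P_lo = gp2(g2, p2, g1, p1)  # (g2 + g1 p2, p1 p2)
    return gp2(G_hi, P_hi, G_lo, P_lo)


def forms_agree():
    """Compare both structures over all 256 (g, p) input combinations."""
    return all(
        gp4_serial(bits[:4], bits[4:]) == gp4_tree(bits[:4], bits[4:])
        for bits in product((0, 1), repeat=8)
    )
```

In the tree form the two lower-level `gp2` calls do not depend on each other, which is why the signal passes through two GP levels instead of a longer serial chain.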
8-b GP Functional Tree
The 8-b functional tree presented here will later be modified to reduce racing effects and fanout. The design combines gp and GP modules to produce the generate and propagate bits for an 8-b addition. The 8-b carry equation is
By expanding the equation it takes on the following form
The propagate bit can be expressed as
The solutions for the generate and propagate bits produce a C-testable tree that can be seen in Fig. 6. These equations will be combined to optimize the design and change the tree model.
Carry Module
The carry module propagates the carry out through the carry-lookahead adder by combining the signals of an 8-b functional tree with the carry out of the previous 8-b functional tree. The equation that characterizes the carry module is
The logic diagram can be seen in Fig. 7.
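The role of the carry module can be sketched in Python as follows. Several points are assumptions made for illustration: the carry module is taken to compute $C_{out} = G + P\,C_{in}$, the bit-level signals are taken to be $g = ab$ and $p = a + b$, and all function names are hypothetical. The sketch chains the carry modules one after another, so it models only the logic function of the block carries, not the interconnect or fanout of Fig. 8.

```python
def carry_module(G, P, c_in):
    """Assumed carry module function: C_out = G + P * C_in."""
    return G | (P & c_in)


def bits(x, lo, n):
    """Return n bits of x starting at bit position lo, LSB first."""
    return [(x >> (lo + i)) & 1 for i in range(n)]


def block_gp(a_bits, b_bits):
    """Group (G, P) of one block; assumes g = a AND b and p = a OR b."""
    G, P = 0, 1  # identity element: generates nothing, propagates everything
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a | b
        G, P = g | (p & G), p & P  # fold in the next more significant bit
    return G, P


def block_carries(a, b, c_in=0, width=56, block=8):
    """Carry out of each 8-b block (C8, C16, ..., C56)."""
    carries, c = [], c_in
    for lo in range(0, width, block):
        G, P = block_gp(bits(a, lo, block), bits(b, lo, block))
        c = carry_module(G, P, c)
        carries.append(c)
    return carries


def reference_carries(a, b, c_in=0, width=56, block=8):
    """Same carries computed directly from integer addition."""
    out = []
    for hi in range(block, width + 1, block):
        mask = (1 << hi) - 1
        out.append(((a & mask) + (b & mask) + c_in) >> hi)
    return out
```

Comparing `block_carries` with `reference_carries` for a few operand pairs gives a quick functional check of the assumed carry module equation.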
Convergent Tree Adder Design
Combining the modules described above, we reach the first design of the 56-b TCTA, which was synthesized, studied, and tested. The 56-b TCTA is shown in Fig. 8. The design makes this adder fully testable. However, the adder has a significant problem: the output of the carry-16 module, $C_{16}$, has a fanout of 5. This severely affects the delays of the circuit; furthermore, the high fanout produces very unbalanced circuits in which delays vary significantly from one output bit to the next. Differences in propagation delay can cause racing effects and produce incorrect outputs. In Fig. 8, since $C_{24}$, $C_{32}$, $C_{40}$, and $C_{48}$ use the same carry value from $C_{16}$, the number of tree levels increases because of the larger number of partitioned inputs, resulting in high fanout and unequal propagation delay paths from inputs to outputs. Note that the critical path, drawn as a bold line, starts at input bit 17.
TIMING DRIVEN TESTABLE CONVERGENT TREE ADDER (TDTCTA)
As shown earlier, the 56-b carry-lookahead adder has several significant problems related to the fanout of the 8-b GP functional tree modules. We start the optimization process by looking at the equation for the carry of the first four bits
By expanding this equation we get
This equation can be regrouped and expressed as
This expression is more balanced and allows more freedom when we implement the logic diagram. We can now extend this notion to the 8-b GP functional tree module
where
$$G_8 = g_8 + g_7 p_8 + g_6 p_8 p_7 + g_5 p_8 p_7 p_6 + g_4 p_8 p_7 p_6 p_5 + g_3 p_8 p_7 p_6 p_5 p_4 + g_2 p_8 p_7 p_6 p_5 p_4 p_3 + g_1 p_8 p_7 p_6 p_5 p_4 p_3 p_2$$
$$\phantom{G_8} = g_8 + p_8 (g_7 + p_7 (g_6 + g_5 p_6)) + p_8 p_7 p_6 p_5 (g_4 + p_4 (g_3 + p_3 (g_2 + g_1 p_2)))$$
and
$$P_8 = p_8 p_7 p_6 p_5 p_4 p_3 p_2 p_1 = (p_8 (p_7 (p_6 p_5)))\,(p_4 (p_3 (p_2 p_1))).$$
These two expressions can now be optimized by regrouping the equations as
$$G_8 = (g_8 + g_7 p_8) + p_7 p_8 (g_6 + g_5 p_6) + p_8 p_7 p_6 p_5 ((g_4 + g_3 p_4) + p_3 p_4 (g_2 + g_1 p_2))$$
and
$$P_8 = (p_8 p_7 p_6 p_5)\,(p_4 p_3 p_2 p_1).$$
This expression can also be described using the fundamental carry operation as
$$(pg_4 \,\mathrm{fco}\, (pg_3 \,\mathrm{fco}\, (pg_1 \,\mathrm{fco}\, pg_2))) \,\mathrm{fco}\, (pg_8 \,\mathrm{fco}\, (pg_7 \,\mathrm{fco}\, (pg_5 \,\mathrm{fco}\, pg_6)))$$
and then it can be optimized to
$$((pg_1 \,\mathrm{fco}\, pg_2) \,\mathrm{fco}\, (pg_3 \,\mathrm{fco}\, pg_4)) \,\mathrm{fco}\, ((pg_5 \,\mathrm{fco}\, pg_6) \,\mathrm{fco}\, (pg_7 \,\mathrm{fco}\, pg_8)).$$
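That the regrouping leaves $G_8$ unchanged can be confirmed exhaustively. The Python sketch below is an illustration with hypothetical function names; it evaluates the sum-of-products form, the nested form, and the regrouped form of $G_8$ given above for all $2^{16}$ combinations of $g_1 \ldots g_8$ and $p_1 \ldots p_8$. Index 0 of each tuple is a placeholder so that the code indices match the subscripts in the text.

```python
from itertools import product


def g8_flat(g, p):
    """Sum-of-products form: each g_i ANDed with every p_j for j > i."""
    total = 0
    for i in range(1, 9):
        term = g[i]
        for j in range(i + 1, 9):
            term &= p[j]
        total |= term
    return total


def g8_nested(g, p):
    """Nested form, with a serial chain inside each 4-b half."""
    upper = p[8] & (g[7] | (p[7] & (g[6] | (g[5] & p[6]))))
    inner = g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (g[1] & p[2])))))
    return g[8] | upper | (p[8] & p[7] & p[6] & p[5] & inner)


def g8_balanced(g, p):
    """Regrouped form built from 2-b groups in each 4-b half."""
    upper = (g[8] | (g[7] & p[8])) | (p[7] & p[8] & (g[6] | (g[5] & p[6])))
    lower = (g[4] | (g[3] & p[4])) | (p[3] & p[4] & (g[2] | (g[1] & p[2])))
    return upper | (p[8] & p[7] & p[6] & p[5] & lower)


def forms_agree():
    """Compare the three forms over all 65,536 input combinations."""
    for bits in product((0, 1), repeat=16):
        g = (None,) + bits[:8]  # g[1]..g[8]
        p = (None,) + bits[8:]  # p[1]..p[8]
        if not (g8_flat(g, p) == g8_nested(g, p) == g8_balanced(g, p)):
            return False
    return True
```

The check covers logic equivalence only; the timing benefit comes from the shape of the resulting gate network, which this sketch does not model.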
The equations above give shape to the new optimized 8-b GP functional tree shown in Fig. 9. Using these expressions, we avoid the large fanout problem as well as the unequal propagation delay paths from the various inputs to their respective outputs. In this new adder design, the inputs are partitioned every 8 bits to obtain a more balanced tree circuit. Using the new 8-b functional tree module, we obtain the new 56-b testable timing-driven convergent tree adder circuit shown in Fig. 10. The comparison of module counts between the 56-b TCTA and the timing-optimized tree design, the 56-b TDTCTA, is shown in Table I.
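The effect of balancing on path lengths can be illustrated by counting how many fco operations lie between each input group and the output. The Python sketch below is a rough illustration: the nested tuples are transcribed from the two fco expressions in the text (leaf k stands for $pg_k$), the names are hypothetical, and an fco count is only a proxy for delay, since it ignores gate sizes, fanout, and wiring in Figs. 6 and 9.

```python
def leaf_depths(tree, depth=0):
    """Map each leaf label to the number of fco operators above it."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    left, right = tree
    depths = leaf_depths(left, depth + 1)
    depths.update(leaf_depths(right, depth + 1))
    return depths


# Grouping before optimization: a chain inside each 4-b half.
CHAINED = ((4, (3, (1, 2))), (8, (7, (5, 6))))

# Grouping after optimization: a balanced binary tree.
BALANCED = (((1, 2), (3, 4)), ((5, 6), (7, 8)))


def depth_spread(tree):
    """Return (shortest, longest) input-to-output path in fco levels."""
    depths = leaf_depths(tree).values()
    return min(depths), max(depths)
```

Under this count, `depth_spread(CHAINED)` gives paths of between 2 and 4 fco levels, while `depth_spread(BALANCED)` gives 3 levels for every input, which matches the qualitative claim that the balanced tree equalizes the paths and shortens the longest one.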
SYNTHESIS AND SIMULATION RESULTS
Both 56-b TCTA and TDTCTA were synthesized and tested in CMOS technology. Table II summarizes the results of critical path delay and silicon area.
HITEC [5] was used for fault grading of both the 56-b TCTA and the TDTCTA. The total numbers of equivalent single stuck-at faults for the 56-b TCTA and TDTCTA are 2,672 and 2,600, and all faults are detected by 218 and 190 test vectors, respectively. Both the 56-b TCTA and the TDTCTA are testable designs with 100% fault coverage. The fault grading results are summarized in Table III.
CONCLUSION
An implementation of timing-driven TCTAs has been presented. The architecture of this design has been studied and analyzed. The circuit was then optimized through the use of the associativity property of the fundamental carry operation. At the cost of a small increase in area, the performance of the circuit was improved. For the 56-b adder, the timing-optimized design uses two more GP modules than the original design, and therefore the area of the design increases. The addition of the two modules balances the circuit and increases the speed. It improves timing by 6.37% over the original design while maintaining full testability.
For future work in this area, other circuits will have to be studied and analyzed to take advantage of the timing optimization process. The reduction of the high fanout and the balancing of the circuit would produce faster circuits. The circuits will still maintain the testability properties using the associativity property of the fundamental carry operation. 
