Abstract: We present a performance-driven generator for integer adders with the following feature: the generator is parametrized in the operands' bit length $n$, the delay $t_n$ of the addition, and the fault model FM. FM may in particular be chosen as the classical stuck-at fault model, the cellular fault model, or the robust path delay fault model. The output of the generator is a performance-oriented conditional sum type adder, i.e., an area-minimal $n$-bit adder of the "conditional sum type" with delay $t_n$ (if it exists), together with a small complete test set with respect to the chosen fault model FM.
I. Introduction
Since 1982, various VLSI designs for fast addition have been proposed [1], [2]. The delays of the adders presented are asymptotically optimal.
However, the actual optimal structure of a realistic adder depends on the cell library used (see, e.g., [3]). Moreover, designers often do not want the fastest area-minimal adder but an area-minimal one in the class of adders with a delay less than some bound $t$. As an example, consider floating point multipliers. Because of the delay of the multiplication of the mantissas, the circuit performing the addition of the exponents need not be time-optimal. This leads to constraint-driven adder design, especially time-driven adder design. What is needed are generators, parameterized by the operands' bit length $n$ and by an upper bound $t_n$ on the delay, that compute an area-minimal $n$-bit adder with delay $t_n$ (if it exists).
Recently, Wei and Thompson [4] and Chan et al. [5] formulated area-time optimal adder design as a dynamic programming problem. Their approach is based on Ladner and Fischer's parallel prefix computation [6] and is essentially a look-ahead addition. Unfortunately, Wei and Thompson do not take into account the testability (with respect to some fault model) of the area-time optimal adders constructed by their generator. However, as a result of technological improvements, testability aspects are gaining more and more importance and should therefore be considered as early as possible.
Modern VLSI electronic circuitry may contain hundreds of thousands of transistors on a single silicon chip. Even if the chips are correctly designed, i.e., design verification has been done successfully, a non-negligible fraction of them will have physical defects caused by imperfections occurring during the manufacturing process (e.g., open connections induced by dust particles). Therefore, there has to be a test phase in which production verification is performed, i.e., in which the good chips are sorted from the bad ones. For a detailed treatment of the topic see [7]. Because of the variety of possible defects, it is necessary to restrict attention to a subset of the possible faults; these simplifying assumptions, based on many years of experience, are captured in fault models. Since tests are generated for the fault mechanisms described by these assumptions, the reliability of the chip is at least partially determined by the accuracy and effectiveness of the fault model (measured, e.g., in detected physical failures).
Manuscript received June 22, 1993; revised April 12, 1994. This work was supported in part by DFG grant Be 1176/4-1 and SFB124. This paper was recommended by Associate Editor K. Keutzer.
B. Becker and R. Drechsler are with the Department of Computer Science, J.W.Goethe-University Frankfurt, D-60054 Frankfurt/Main, Germany. E-mail: <name>@kea.informatik.uni-frankfurt.de.
P. Molitor is with the Institute of Computer Science, Martin-Luther University Halle, D-06099 Halle/Saale, Germany. E-mail: molitor@infsparc2.informatik.uni-halle.de.
The classical stuck-at fault model (SAFM) is well known and used throughout the industry [8]. It assumes that a defect causes a basic cell input or output to be fixed to either logic 0 or logic 1. Thus, all failures with this effect will be detected by tests for stuck-at faults, but there exist other static faults that are not covered by this simple model [9], [10], [11], [12]. Other fault models are therefore necessary, e.g., flexible fault models which verify the correct static behavior of a combinational circuit based on inductive fault analysis [13], [14]. The strongest cell-based fault model for checking the correct static behavior of a combinational circuit is the cellular fault model (CFM), which tries to verify the function of each basic cell in the circuit completely [15], [16]. However, it has been observed by many authors that a large number of defects typical of today's VLSI technologies are not covered by static fault models like the stuck-at or cellular fault model. In many cases dynamic fault models, which allow stuck-open faults or timing issues to be modeled [17], [18], [19], are necessary. In particular, it is natural (and necessary) to check delays in a circuit that has been constructed by a time-driven design methodology, since physical failures can also influence the timing behavior of a circuit. The generator can only use the information from the library, but not the actual delay behavior. We therefore also consider the robust path delay fault model [18] in this paper. The robust path delay fault model is a very powerful fault model whose objective is to check the propagation delay of every path in the circuit. The more powerful a fault model is, the more faults are detected in the circuit; e.g., a complete robust path delay fault test detects many multiple stuck-at faults [20].
With increasing complexity of VLSI circuits, the costs of the test phase have increased dramatically; at least 25% and up to 60-70% of the total product costs are due to testing [21], [22]. Because of these increasing costs, specialists in the field of testing agree that testability issues have to be considered from the very beginning of the design process in order to control the test costs and to guarantee the testability of the circuit at the end of the manufacturing process. This requires a detailed analysis of the relation between structural properties of the circuit under test and testability properties. Successful steps in this direction can be observed: relevant subclasses of circuits which, due to their structural properties, allow efficient test generation algorithms and as a result yield complete test sets of small size have to be classified and investigated. In this paper we present a realistic and systematic method for constructing area-time optimal parallel adders together with efficient complete tests with respect to a chosen fault model. Our approach to generating adders is based on the conditional sum scheme [29]. The delay model used is the intrinsic-plus-fanout delay model given in many industrial cell libraries (see, e.g., [30]) and also used by the logic synthesis and optimization benchmarks of the Microelectronics Center of North Carolina [31]. For the computation of the area, we restrict ourselves to the sum of the cell areas because the adder design presented in the following is done on a symbolic design level. The lengths of the wires cannot be estimated in general, since they largely depend on the technology used. For some design processes this simplifying assumption is realistic, e.g., channelless gate arrays or sea-of-gates [32], [33]. As in [5], [4], the area-time optimal adder design itself is formulated as a dynamic programming problem. We present two methods for this first phase.
The first one was presented in [34] and is based on a straightforward dynamic program. However, this algorithm is impracticable for large $n$-bit adders ($n > 64$), because the memory requirement is too large (see footnote 1). The second algorithm solves the problem of area-time optimal adder design by dynamic programming where each step is based on point location [35]. This algorithm decreases the actual runtime and the memory requirements of the first one. The technique presented can also be applied to other circuit structures, e.g., to the look-ahead addition principle. The algorithms have been implemented; examples are given in Section IV. The experimental results show that the second algorithm is also applicable for large $n$, although its asymptotic worst-case behavior is worse. An efficient complete test with respect to a chosen fault model is generated after the performance oriented conditional sum type adder (POCSTA) has been constructed. The testability results concentrate on the generation of complete test sets for the stuck-at fault model, the cellular fault model, and the robust path delay fault model. When we use a static fault model, we assume that there is at most one fault in the circuit (single fault assumption). This is a usual assumption made in the literature and by industry. For static fault models the size of the test set is linear in the depth of the circuit (stuck-at fault model) or linear in the size of the circuit (cellular fault model). For the robust path delay fault model the size of the test set is bounded by $O(n^2 \cdot \mathit{depth})$, where $\mathit{depth}$ denotes the depth of the circuit. The quality of the test sets is analysed by giving lower bounds for their sizes which differ only slightly from the upper bounds provided by the construction. In many cases the tests are even asymptotically optimal. For the stuck-at fault model test we generalize a test strategy given in [36] for the classical conditional sum adder (CSA). The construction method for the cellular fault model can easily be adapted to any static cell-based fault model between the cellular fault model and the stuck-at fault model. To obtain a fully testable adder under the robust path delay fault model, the original adder has to be modified.
1 Analogous problems arise during the implementation of the generator of Chan, Schlag, Thomborson, and Oklobdzija [5]. Their generator requires, e.g., 60 Mbyte of memory to generate the trade-off of the 84-bit adder.
The worst-case runtime of the first phase, i.e., of the generation of the $n$-bit POCSTA with delay $t_n$, is $O(n^2 t_n^2 \log t_n)$, where $t_n$ is measured, e.g., in units of 0.1 ns. The runtime of the second phase, i.e., of the generation of a complete efficient test, is bounded by $O(n^2)$ for static fault models and by the size of the test set for the robust path delay fault model. The paper is structured as follows: Adders of the conditional sum type (CST) are described in Section II. Section III provides an introduction to basic notations and concepts of testing and introduces the fault models used. The testability of the CST-adders is examined, and we obtain lower and upper bounds for the sizes of the test sets. The observations on testability are valid for all CST-adders, independent of the structure of the specific adder. We thus guarantee that all generated adders are testable, so the designer does not have to worry about testability. The formulation of the dynamic programming problem performing area-time optimal adder design with respect to a given library is presented in Section IV, where the computation models used are also described. We finish with a summary of the results in Section V.
II. Conditional sum type adders
In this section the class of the CST-adders considered in this paper is described.
CST-adders are a generalization of the adder presented in [29]. They are defined in the following way: Let $a = \sum_{i=0}^{n-1} a_i 2^i$ and $b = \sum_{i=0}^{n-1} b_i 2^i$ be two $n$-bit numbers, and for any $k \in \{1, \ldots, n-1\}$ let $a^{(k,1)} := \sum_{i=k}^{n-1} a_i 2^{i-k}$ be the number represented by the most significant $n-k$ bits of $a$, and $a^{(k,0)} := \sum_{i=0}^{k-1} a_i 2^i$ the number represented by the least significant $k$ bits. $b^{(k,1)}$ and $b^{(k,0)}$ are defined analogously. Now suppose that we have already designed a $k$-bit adder and an $(n-k)$-bit adder, each computing simultaneously both the sum and the sum plus 1 of its inputs. Then $a+b$ and $a+b+1$ can be computed by a $k$-bit adder, an $(n-k)$-bit adder, and a selection stage. The selection stage is denoted by $SEL_{n-k+1}$ and is called a SEL-chain or SEL-stage of size $n-k+1$. It consists of $n-k+1$ SEL-cells. The leftmost SEL-cell of a SEL-chain is called the carry-SEL. The remaining ones are called block-SELs. The right input signals of a SEL-chain are connected to the right input pins of each SEL-cell in this chain. The SEL-cell itself is shown in Fig. 2. Here, a MUX-cell is a multiplexer whose select input is on the right. If logic 1 (0) is applied to the select input, the cell outputs the value applied to the leftmost (rightmost) upper pin.
The specification of the recursively defined adder is completed by the description of the 1-bit adder $A_1$ (see Fig. 3). To simplify matters in the sequel, we say that an $A_1$-cell is at bit position $j$ if $a_j$ and $b_j$ are applied to it. A SEL-cell is said to be at bit position $j$ if its upper input pins are connected either to the two output pins of the $A_1$-cell at bit position $j$ or to the two output pins of a SEL-cell at bit position $j$.
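The recursive composition described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `a1`, `cst_add`, `choose_k`, `to_bits`, and `value` are hypothetical, every signal is stored as a pair (value for carry-in 1, value for carry-in 0), and the 1-bit adder is modelled arithmetically rather than by the cell realization of Fig. 3.

```python
def mux(select, left, right):
    """MUX-cell: output the leftmost upper pin if select is 1, else the rightmost."""
    return left if select else right

def a1(a, b):
    """1-bit adder A_1 (arithmetic model, an assumption of this sketch).

    Returns [carry pair, sum pair]; each pair is
    (value for carry-in 1, value for carry-in 0).
    """
    carry = ((a + b + 1) >> 1, (a + b) >> 1)
    total = ((a + b + 1) & 1, (a + b) & 1)
    return [carry, total]

def cst_add(a_bits, b_bits, choose_k):
    """Conditional-sum-type addition of two bit lists (most significant bit first).

    choose_k(n) is a hypothetical policy returning the size k of the low part,
    1 <= k <= n-1. Returns the pairs [carry, s_{n-1}, ..., s_0].
    """
    n = len(a_bits)
    if n == 1:
        return a1(a_bits[0], b_bits[0])
    k = choose_k(n)
    high = cst_add(a_bits[:n - k], b_bits[:n - k], choose_k)  # (n-k)-bit adder
    low = cst_add(a_bits[n - k:], b_bits[n - k:], choose_k)   # k-bit adder
    c1, c0 = low[0]  # carry out of the low part for carry-in 1 and 0
    # SEL-chain of size n-k+1: one SEL-cell (two MUX-cells) per pair of the high part.
    selected = [(mux(c1, u1, u0), mux(c0, u1, u0)) for (u1, u0) in high]
    return selected + low[1:]

def to_bits(x, n):
    """n-bit binary representation of x, most significant bit first."""
    return [(x >> i) & 1 for i in reversed(range(n))]

def value(pairs, carry_in):
    """Read the (n+1)-bit result for the given carry-in from a list of pairs."""
    index = 0 if carry_in else 1
    result = 0
    for pair in pairs:
        result = (result << 1) | pair[index]
    return result

pairs = cst_add(to_bits(11, 4), to_bits(6, 4), lambda n: n // 2)
print(value(pairs, 0), value(pairs, 1))
```

Different choices of `choose_k` give different adder structures computing the same function, which is the degree of freedom the generator exploits.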
For notational convenience we use symbolic values to describe the functional behavior of the adders, namely, A = (0, 0), G = (1, 1), P = (1, 0), and R = (0, 1). The abbreviations are A (absorb), G (generate), and P (propagate). Table I shows the symbolic function computed by the $A_1$-cell (see Fig. 3).

TABLE I: Symbolic function computed by the $A_1$-cell
input            A   P   G   R
$(c_1, c_0)$     A   P   G   P
$(s_1, s_0)$     P   R   P   R

For the following it will be helpful to look at CST-adder circuits as tree-like structures computing products over M and thus to lift our testability discussions to a symbolic level. Obviously, the choices of $k$ determine the tree-like structure of the adder and, as a detailed analysis in the following will show, influence its performance.
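The symbolic view can be checked with a few lines of Python. The sketch below assumes that a symbolic value is the pair (value for carry-in 1, value for carry-in 0), that the input column of Table I is the pair $(a_j, b_j)$ read as a symbolic value, and that a SEL-cell consists of two MUX-cells as described in Section II; the function names `a1_symbolic` and `sel` are hypothetical.

```python
VALUES = {"A": (0, 0), "G": (1, 1), "P": (1, 0), "R": (0, 1)}
NAMES = {bits: name for name, bits in VALUES.items()}

def a1_symbolic(x):
    """Symbolic outputs (carry, sum) of the A_1-cell for the input pair named x."""
    a, b = VALUES[x]
    carry = ((a + b + 1) >> 1, (a + b) >> 1)  # carry for carry-in 1 and 0
    total = ((a + b + 1) & 1, (a + b) & 1)    # sum bit for carry-in 1 and 0
    return NAMES[carry], NAMES[total]

def sel(upper, right):
    """SEL-cell: each right line selects between the two upper lines."""
    u1, u0 = VALUES[upper]
    r1, r0 = VALUES[right]
    return NAMES[(u1 if r1 else u0, u1 if r0 else u0)]

for x in "APGR":
    print(x, a1_symbolic(x))

# With upper value A or G the output does not depend on the right value,
# whereas upper value P passes the right value through unchanged.
print([sel("A", r) for r in "APGR"], [sel("P", r) for r in "APGR"])
```

Under these assumptions the first loop reproduces the rows of Table I, and the last line illustrates why A and G behave as left-stable elements while P acts as a left identity.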
The tree-like structure is best reflected in the structure tree of a CST-adder. We will see that the structure tree uniquely determines a CST-adder $A_n$ and vice versa. The structure tree of $A_n$ is a labeled binary tree with $n$ leaf nodes. Each inner node corresponds to exactly one recursive composition step by a chain of SEL-cells. The labels at the nodes define the bit size of the adder resulting from the composition step: each leaf node has label 1, and a non-leaf node $v$ has label $k_1 + k_2$ if the left child (right child) of $v$ has label $k_1$ ($k_2$). The length of the SEL-chain at node $v$ is then given by $k_1 + 1$. The leftmost SEL-cell is the carry-SEL of the chain at $v$. Notice that the structure tree of $A_n$ is contained as a subcircuit in $A_n$: consider the tree built by the carry-SELs. For an illustration see Fig. 4.
A SEL-chain is a boundary chain if the outputs of the corresponding block-SELs are directly connected to primary output ports. Block-SELs (carry-SELs) in a boundary chain are called boundary block-SELs (boundary carry-SELs). It easily follows that the number of boundary block-SELs is exactly $n - 1$ for any CST-adder $A_n$, independent of the structure of $A_n$. We will use boundary chains for the construction of complete test sets as well as for the estimation of the quality of these sets.
Two other notions are important: The left-depth $d_{left}(A_n)$ of a CST-adder $A_n$ is the maximum number of left (= vertical) edges on a path from a leaf to the root of the structure tree. $d_{left}$ varies from 1 to $n - 1$ in the class of $n$-bit CST-adders. The left-depth of the classical $n$-bit conditional sum adder ($n = 2^k$) is $\log n = k$ (= depth of the corresponding structure tree), since in this case the structure tree is a complete balanced binary tree of depth $k$. In the example of Fig. 4 we have $d_{left} = 2$. Height $i$ of the structure tree comprises all nodes whose distance to the root of the structure tree is $i$. A SEL-cell in $A_n$ has height $i$ if its SEL-chain corresponds to a node of height $i$ in the structure tree. Thus, the height of a cell measures the distance from the primary outputs, whereas the usual depth $d$ of a cell measures the distance from the primary inputs. The height $h_{tree}(A_n)$ of $A_n$ is the maximum height of a node in $A_n$. Clearly, $h_{tree}(A_n) = d(A_n)$, which denotes the usual depth of the circuit $A_n$.
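To make the notions of label, SEL-chain length, and left-depth concrete, the following Python sketch encodes a structure tree as nested tuples. The encoding (a leaf is the integer 1, an inner node is a pair `(left, right)`) and all function names are assumptions of this sketch and do not come from the paper.

```python
def label(tree):
    """Bit size of the adder at this node: 1 for a leaf, k1 + k2 for an inner node."""
    return 1 if tree == 1 else label(tree[0]) + label(tree[1])

def sel_chain_lengths(tree):
    """Lengths k1 + 1 of the SEL-chains at all inner nodes (children first)."""
    if tree == 1:
        return []
    left, right = tree
    return sel_chain_lengths(left) + sel_chain_lengths(right) + [label(left) + 1]

def left_depth(tree):
    """Maximum number of left edges on a path from a leaf to the root."""
    if tree == 1:
        return 0
    left, right = tree
    return max(left_depth(left) + 1, left_depth(right))

def tree_depth(tree):
    """Maximum number of edges on a path from a leaf to the root."""
    if tree == 1:
        return 0
    return 1 + max(tree_depth(tree[0]), tree_depth(tree[1]))

def balanced(k):
    """Complete balanced binary structure tree with 2**k leaves."""
    return 1 if k == 0 else (balanced(k - 1), balanced(k - 1))

classical = balanced(3)           # classical 8-bit conditional sum adder
right_comb = (1, (1, (1, 1)))     # 4-bit adder, always splitting off the top bit
left_comb = (((1, 1), 1), 1)      # 4-bit adder, always splitting off the lowest bit
for t in (classical, right_comb, left_comb):
    print(label(t), left_depth(t), tree_depth(t), sum(sel_chain_lengths(t)))
```

The two comb-shaped trees show the extreme cases of the left-depth for the same depth of the structure tree.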
III. Testability of CST-adders
In this section we study the testability of CST-adders with respect to the stuck-at fault model (SAFM), the cellular fault model (CFM), and the robust path delay fault model (RPDFM). In Section III-A we provide some basic notions and concepts of testing which are important for the derivation of the testability results in Sections III-B to III-D. In Sections III-B and III-C we give lower and upper bounds for the complexity of complete test sets of an $n$-bit adder $A_n$ with respect to the SAFM and CFM. In the case of the RPDFM we first describe a modification of the CST-adder to obtain robust path delay fault testability for all paths and then derive bounds for a complete test in this fault model as well (Section III-D). All upper bounds are based on the effective construction of complete test sets. For an overview, Table II summarizes in asymptotic form the results that will be proved in this section.
The proof methods are independent of the choices of the parameter $k$, which determine the structure of the adders. All test sets can be generated from the structure tree of the adder. Thus, the test sets can be constructed efficiently after the generation of the area-time optimal adders (see Section IV).
A. Notations and definitions
As mentioned before, we consider combinational logic circuits realizing the addition of two binary numbers. In general, a combinational logic circuit (CLC) is defined over a fixed library and modeled as a directed acyclic graph $C = (V, E)$ with some additional properties: Each vertex $v \in V$ is labeled with the name of a basic cell or with the name of a primary input (PI) or primary output (PO). The collection of basic cells available is given by the fixed library. There is an edge $(u, v)$ in $E$ from vertex $u$ to $v$ iff an output pin of the cell associated with $u$ is connected to an input pin of the cell associated with $v$; i.e., edges contain additional information to specify the pins of the source and sink node they are connected to. Vertices have exactly one incoming edge per input pin. Nodes labeled as PI (PO) have no incoming (outgoing) edges.
In this paper we use two libraries, the library $LIB_{CST}$ := ... Fig. 3 can easily be replaced by its basic cell realization. A CST-adder circuit $C_1$ over $LIB_{CST}$ can easily be interpreted as a circuit $C_2$ over STD just by replacing each basic cell by its standard cell realization. $C_2$ is then called an expansion of $C_1$.
2 The results obtained in the following can be directly transferred to the case where other libraries are given, e.g., NAND-, NOR-, and NOT-based libraries, which are most often used in CMOS design processes.
We are now ready to give a short description of the fault models (FM) considered in this article.
A ... Usually, the stuck-at model is only considered on the gate level, i.e., for circuits over the library STD. Here, we have given an obvious generalization to circuits over arbitrary libraries.
In the cellular fault model (CFM) [15], [16] it is assumed that a fault modifies the behavior of exactly one node $v$ in a given CLC $C$ and that the modified behavior is still combinational. Since this fault can be detected by observing the incorrect output values of $v$ for one suitable input combination, it suffices to test for faults of the following kind. A cellular fault in $C$ is a tuple $(v, I, X/Y)$, where $v$ is the faulty node (= fault location), $I$ is the input for which $v$ does not behave correctly, and $X$ ($Y$) is the output of the correct (faulty) node on input $I$.
In general, a cell-based static fault model (CBFM) is obtained from the CFM by a restriction of the fault set that has to be considered. Any function $g_{v,\mathrm{faulty}}$ which differs from the correct behavior $g_v$ at node $v$ defines a possible fault in the CFM. In many cases it is too expensive to check this large variety of possible faults, and methods like inductive fault analysis may indicate that satisfactory results are obtainable if only a certain subset $F_v$ of the possible faults at node $v$ is checked. The stuck-at fault model defined above is a well-known example of a CBFM.
We finish the discussion of CFM, CBFM, and SAFM with some general definitions and remarks on the relation between the fault models. For this, let $C$ be any CLC over a fixed library and FM a fault model as defined above. A fault $f$ in FM is testable iff there exists a test $t$ for this fault, i.e., iff the output values of $C$ on applying $t$ in the presence of $f$ are different from the output values of $C$ in the fault-free case. The goal of any test pattern generation process is a complete test set for the circuit under test in the considered fault model FM, i.e., a test set that contains a test for each testable fault. It easily follows from the definitions that, given a fixed circuit $C$, a complete test set in CFM is also complete in any CBFM and thus also in SAFM. The cellular fault model is therefore stronger than all other static fault models.
A widely accepted criterion for the quality of a complete test set is the size of the set, since it determines the test application time. We therefore define the test complexity of a circuit $C$ with respect to a static FM by $TC_{FM}(C) := \min\{|T| : T \text{ is complete for } C \text{ in FM}\}$, where $|T|$ denotes the size of the test set, i.e., the number of tests. Now consider a circuit $C_2$ which results from a circuit $C_1$ by "expansion". Then one can easily show that a complete test set for $C_1$ in CFM is also a complete test set for $C_2$ in CFM. Thus, the CFM becomes more powerful as the size of the basic cells increases. We call this property the completeness property of CFM. Notice that in SAFM there is a trend in the opposite direction. This is the reason why in general the strongest version of SAFM, i.e., SAFM for circuits over STD, is considered.
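The definitions of a test, a complete test set, and the test complexity can be illustrated by brute force on a very small gate-level circuit. The following Python sketch is not the test generation method of this paper: the netlist encoding, the function names `simulate` and `min_complete_test_set`, and the simplification that faults sit on whole lines (fanout branches are not distinguished) are assumptions made only for illustration.

```python
from itertools import combinations, product

def simulate(gates, inputs, outputs, pattern, fault=None):
    """Evaluate a netlist; fault is (line, stuck_value) or None for the fault-free case.

    gates: list of (output line, gate type, input lines) in topological order.
    """
    values = {}

    def assign(line, value):
        # A stuck-at fault overrides whatever value the line would carry.
        if fault is not None and fault[0] == line:
            value = fault[1]
        values[line] = value

    for line, value in zip(inputs, pattern):
        assign(line, value)
    for out, kind, ins in gates:
        vals = [values[i] for i in ins]
        if kind == "AND":
            result = int(all(vals))
        elif kind == "OR":
            result = int(any(vals))
        elif kind == "NOT":
            result = 1 - vals[0]
        else:
            raise ValueError(kind)
        assign(out, result)
    return tuple(values[o] for o in outputs)

def min_complete_test_set(gates, inputs, outputs):
    """Smallest set of patterns detecting every testable single stuck-at fault."""
    patterns = list(product((0, 1), repeat=len(inputs)))
    lines = list(inputs) + [gate[0] for gate in gates]
    detects = {}
    for fault in product(lines, (0, 1)):
        hits = {p for p in patterns
                if simulate(gates, inputs, outputs, p, fault)
                != simulate(gates, inputs, outputs, p)}
        if hits:  # untestable (redundant) faults need no test
            detects[fault] = hits
    for size in range(len(patterns) + 1):
        for subset in combinations(patterns, size):
            chosen = set(subset)
            if all(hits & chosen for hits in detects.values()):
                return size, subset

and_gate = [("y", "AND", ("a", "b"))]
print(min_complete_test_set(and_gate, ("a", "b"), ("y",)))
```

The exhaustive search is exponential in the number of inputs, which is exactly why the structural arguments of the following sections are needed for adders of realistic size.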
The other fault model we consider in this paper is the robust path delay fault model (RPDFM). The RPDFM [18] is a dynamic FM which checks whether the propagation delays of all paths in a given combinational logic circuit are less than the system clock interval. For our discussion of the path delay fault model the CST-adders are considered as combinational circuits over STD. A path $\pi$ is given by an alternating sequence of nodes and edges $(v_0, e_0, v_1, \ldots, v_n, e_n, v_{n+1})$ starting at a PI $v_0$ and ending at a PO $v_{n+1}$. Inputs of nodes on the path where no edge $e_i$ of the path ends are called side inputs. A rising ($0 \to 1$) or falling ($1 \to 0$) transition propagates along $\pi$ if a sequence of transitions $t_0, t_1, \ldots, t_{n+1}$ occurs at the nodes $v_0, v_1, \ldots, v_{n+1}$ such that $t_i$ occurs as a result of $t_{i-1}$. $\pi$ has a path delay fault for the rising (falling) transition if the actual propagation delay of the transition along $\pi$ exceeds the system clock interval. For the detection of a path delay fault a pair of patterns $(I_1, I_2)$ is required rather than a single pattern as in the cellular fault model and the stuck-at fault model: the initialization vector $I_1$ is applied and all signals of the circuit $C$ are allowed to stabilize; then the propagation vector $I_2$ is applied, and after the system clock interval the outputs of $C$ are checked. A two-pattern test is called a robust test for a path delay fault on $\pi$ if it detects that fault independently of all other delays in the circuit and all other delay faults not located on $\pi$.
In Section III-D of this paper we concentrate on robust testing of path delay faults. A controlling value at the input of a node is the value that completely determines the value at the output; e.g., 1 (0) is the controlling value for OR (AND), and 0 (1) is the non-controlling value for OR (AND). It turns out that the construction of tests with the following property is possible: for each path delay fault there exists a robust test $(I_1, I_2)$ which sets all side inputs to the non-controlling values on application of $I_1$ and keeps them stable during application of $I_2$, i.e., the values on the side inputs are not invalidated by hazards or races. These tests are called strong robust PDF tests. In the following we only use such tests for the RPDFM, but for simplicity we also call them robust PDF tests. For a detailed classification of path delay faults see [38].
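The side-input condition of a strong robust test can be stated as a small check. The Python sketch below is a simplification under stated assumptions: the encoding of a path as a list of gate types, the function name `side_inputs_ok`, and the representation of side-input values are hypothetical, and the sketch only compares the static values under $I_1$ and $I_2$; it does not analyse hazards or races between the two vectors.

```python
# Non-controlling input values as given in the text: 1 for AND, 0 for OR.
NON_CONTROLLING = {"AND": 1, "OR": 0}

def side_inputs_ok(path, side_values_i1, side_values_i2):
    """Check that all side inputs carry non-controlling values under I1 and I2.

    path: gate types along the path, e.g. ["AND", "OR"].
    side_values_i1, side_values_i2: for each gate on the path, the tuple of
    values its side inputs take under the initialization vector I1 and the
    propagation vector I2, respectively.
    """
    for gate, before, after in zip(path, side_values_i1, side_values_i2):
        required = NON_CONTROLLING[gate]
        if any(v != required for v in before) or any(v != required for v in after):
            return False
    return True

# Path through an AND gate followed by an OR gate, one side input each.
print(side_inputs_ok(["AND", "OR"], [(1,), (0,)], [(1,), (0,)]))
print(side_inputs_ok(["AND", "OR"], [(1,), (0,)], [(1,), (1,)]))
```

In the second call the side input of the OR gate takes its controlling value under $I_2$, so the transition on the path would be masked.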
B. Testability of CST-adders in the SAFM
Next, we discuss the testability of the CST-adders in the fault models introduced before. In this section the weakest fault model, the SAFM, is considered. It is shown that a lower bound for the test complexity is given by twice the left-depth of the CST-adder. An upper bound of $O(\mathit{depth})$ is obtained by the effective construction of a complete test set. For this, two construction phases, "generation at the lower leaves" and "generation to the higher leaves", are introduced, which make it possible to efficiently apply methods presented in [36] for the test of CST-adders with a complete balanced binary structure tree.
We start with the lower bound.
Theorem 1 (lower bound): Let $A_n$ be an $n$-bit CST-adder defined over the library $LIB_{CST}$. Then we have $TC_{SAFM}(A_n) \ge 2\, d_{left}(A_n)$.
Proof: For the lower bound we specify a set $F$ of $2\, d_{left}(A_n)$ stuck-at faults such that each fault in $F$ requires an extra test. First, consider a path $\pi'$ with maximal left-depth in the structure tree, and look at the corresponding path $\pi$ of carry-SELs in $A_n$. There are exactly $d_{left}(A_n)$ carry-SELs on $\pi$ whose upper input lines correspond to a left edge on $\pi'$. Among these cells, exactly the carry-SEL which is nearest to the destination of $\pi$ is a boundary carry-SEL. The remaining $d_{left}(A_n) - 1$ cells are non-boundary carry-SELs. Then the following two classes of stuck-at faults at the right input lines of these carry-SELs define $F$: the stuck-at 1 fault at the upper right input line and the stuck-at 0 fault at the lower right input line, if the cell is a non-boundary carry-SEL; the stuck-at 1 fault and the stuck-at 0 fault at the lower right input line, if the cell is a boundary carry-SEL.
The following observations now lead to the lower bound of $2\, d_{left}(A_n)$:
As mentioned earlier, only the values A, P, G can be generated at the inputs of carry-SELs. This can easily be proved by induction. Thus, the application of P at the upper input is necessary for the detection of any stuck-at fault at the right input lines, because A and G are left-stable elements.
It can easily be seen that a stuck-at 1 fault at the upper right input line and a stuck-at 0 fault at the lower right input line of a non-boundary carry-SEL are testable and require the application of the values A and G at the right input lines, respectively. Therefore, tests which apply PA and PG to the non-boundary carry-SELs of $\pi$ must be contained in any complete test set.
Let $x$ be the boundary carry-SEL on $\pi$. It follows by inspection of Table I that the stuck-at 1 fault at the upper right input line is redundant, whereas the stuck-at 0 fault at the lower right input line remains testable and therefore requires the application of PG to this cell. A stuck-at 1 fault at the lower right input line is testable and requires the application of PA or PP to the cell inputs.
If P is to be generated at the upper input of a carry-SEL $x$ on $\pi$, then the whole subtree of carry-SELs rooted at $x$, especially the carry-SELs of $\pi$ below $x$, must receive PP as inputs and cannot receive PA or PG, which would be necessary to test a fault at carry-SELs below $x$.
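The local part of these observations can be checked by enumeration. The Python sketch below only examines which input combinations of a single carry-SEL make a stuck-at fault at a right input line visible at the cell outputs; propagation to the primary outputs is not modelled. It assumes that the upper (lower) right input line carries the component for carry-in 1 (0), and that for a boundary carry-SEL only the carry-in-0 output is observable. Both are interpretations made for this sketch, and the function names are hypothetical.

```python
# Values that can occur at the inputs of carry-SELs: (carry-in 1, carry-in 0).
VALUES = {"A": (0, 0), "P": (1, 0), "G": (1, 1)}

def sel_outputs(upper, right):
    """Two MUX-cells: each right line selects between the two upper lines."""
    (u1, u0), (r1, r0) = upper, right
    return (u1 if r1 else u0, u1 if r0 else u0)

def exciting_inputs(line, stuck, observed=(0, 1)):
    """Input combinations (upper, right) for which the fault changes an observed output.

    line: 0 for the upper right input line, 1 for the lower right input line.
    observed: indices of the cell outputs that are assumed to be observable.
    """
    found = set()
    for upper_name, upper in VALUES.items():
        for right_name, right in VALUES.items():
            faulty_right = list(right)
            faulty_right[line] = stuck
            good = sel_outputs(upper, right)
            bad = sel_outputs(upper, tuple(faulty_right))
            if any(good[i] != bad[i] for i in observed):
                found.add(upper_name + right_name)
    return found

print(exciting_inputs(0, 1))                  # non-boundary, stuck-at 1, upper right
print(exciting_inputs(1, 0))                  # non-boundary, stuck-at 0, lower right
print(exciting_inputs(0, 1, observed=(1,)))   # boundary, stuck-at 1, upper right
print(exciting_inputs(1, 1, observed=(1,)))   # boundary, stuck-at 1, lower right
```

In every case a fault is only excited when P is applied at the upper input, in line with the left-stability of A and G.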
We now come to the construction of complete test sets, which proves that the lower bound is also an asymptotic upper bound for adders whose depth and left-depth do not differ. A complete test in the SAFM for the classical conditional sum adder, where the carry tree is a complete binary tree, has already been constructed in [36]. We show how this construction can be generalized to CST-adders. First, some notions for the description of test patterns have to be introduced: it turns out to be helpful to consider the structure tree if patterns are to be applied to the cells of the circuit. Let $v$ be a node whose left (right) child $v_1$ ($v_2$) is labeled with $k_1$ ($k_2$). Then the left incoming, the right incoming, and the outgoing edge of $v$ correspond to $k_1 + 1$, $k_2 + 1$, and $k := k_1 + k_2 + 1$ uniquely defined pairs of signal lines in $A_n$, respectively. A valid assignment of the left incoming edge (right incoming edge) of $v$ is a string $S_1 := X_1 Y^1_1 \ldots Y^1_{k_1}$ ($S_2 := X_2 Y^2_1 \ldots Y^2_{k_2}$). $X_i$ denotes the output value of the carry-SEL at node $v_i$, and $Y^i_j$ denotes the value of the $j$-th pair of signal lines (numbered from left to right). If $(S_1, S_2)$ are applied to node $v$, the resulting output assignment $S$ at node $v$ is given by
$S = (X_1 \circ X_2)\,(Y^1_1 \circ X_2) \ldots (Y^1_{k_1} \circ X_2)\, Y^2_1 \ldots Y^2_{k_2}$.
Since $S$ results from $S_1$, $S_2$ by repeated application of $\circ$, we write $S = S_1 \circ S_2$. Now consider the patterns $P^k_1 := G^k P$, $P^k_2 := A^k P$, $P^k_3 := P R^k$, $P^k_4 := A P R^{k-1}$, and $P^k_5 := G P R^{k-1}$, which are valid assignments for a node with label $k$. For simplicity of notation we often omit the superscript $k$ and write $P_i$ instead of $P^k_i$. Using this and the composition rules $P_1 = P_1 \circ P_1$, $P_2 = P_2 \circ P_2$, $P_3 = P_3 \circ P_3$, $P_4 = P_4 \circ P_3$, and $P_5 = P_5 \circ P_3$, we can easily conclude:
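The five composition rules can be checked mechanically. The Python sketch below assumes a particular reading of the composition: every symbol of the left string $S_1$ passes through one SEL-cell whose right input is the carry value $X_2$ (the first symbol of $S_2$), while the remaining symbols of $S_2$ are passed on unchanged. This reading, the two-MUX model of the SEL-cell, and the function names are assumptions of the sketch.

```python
VALUES = {"A": (0, 0), "G": (1, 1), "P": (1, 0), "R": (0, 1)}
NAMES = {bits: name for name, bits in VALUES.items()}

def sel(upper, right):
    """SEL-cell on symbolic values: each right line selects between the upper lines."""
    u1, u0 = VALUES[upper]
    r1, r0 = VALUES[right]
    return NAMES[(u1 if r1 else u0, u1 if r0 else u0)]

def compose(s1, s2):
    """Output assignment at a node for left assignment s1 and right assignment s2."""
    x2 = s2[0]  # output value of the carry-SEL of the right child
    return "".join(sel(symbol, x2) for symbol in s1) + s2[1:]

def patterns(k):
    """The patterns P_1, ..., P_5 for a node with label k."""
    return {
        1: "G" * k + "P",
        2: "A" * k + "P",
        3: "P" + "R" * k,
        4: "AP" + "R" * (k - 1),
        5: "GP" + "R" * (k - 1),
    }

# Rule "target: (left, right)" encodes P_target = P_left o P_right.
RULES = {1: (1, 1), 2: (2, 2), 3: (3, 3), 4: (4, 3), 5: (5, 3)}

def rules_hold(k1, k2):
    """Check all five composition rules for children with labels k1 and k2."""
    left, right, out = patterns(k1), patterns(k2), patterns(k1 + k2)
    return all(compose(left[i], right[j]) == out[t] for t, (i, j) in RULES.items())

print(all(rules_hold(k1, k2) for k1 in range(1, 6) for k2 in range(1, 6)))
```

Because the check runs over all pairs of child labels, it also illustrates that the rules do not depend on the lengths of the SEL-chains.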
Lemma 1: Let $v$ be any node in the structure tree of a CST-adder $A_n$. Then $P_i$ for $i = 1, \ldots, 5$ can be generated at the output edge of $v$ by setting the inputs of $A_n$ appropriately.
The lemma above is a generalization of a corresponding lemma in [36] and demonstrates that the composition laws for the $P_i$ do not depend on the length of the SEL-chains. A complete test set in the SAFM for CST-adders with a complete balanced binary structure tree (CBBST) is now given by the following lemma, which is a summary of the testability properties derived in [36].
Lemma 2 (upper bound for CBBST): A CST-adder $A_n$ with CBBST is completely testable over the library $LIB_{CST}$ in the SAFM with $6 \log n + 2$ $(= 6\, d_{left}(A_n) + 2)$ test patterns, as defined in Fig. 5. Furthermore, the test set fulfills the following propagation properties:
- Let $F$ be a testable fault at the input lines of a cell $x$ of height $j$. Then $F$ is tested by a pattern constructed in step (1) or by a pattern constructed in one of the steps (2j), ..., (6j), or (7j).
- If $x$ is a block-SEL at bit position $k$, the faulty difference is propagated by the uniquely determined sequence of block-SELs to the output port for bit position $k$. Along this path the faulty difference is only combined from the right with output values of carry-SELs.
- If $x$ is a carry-SEL at bit position $n$, the faulty difference is propagated by the uniquely determined sequence of carry-SELs to the output port for the carry bit of the adder $A_n$. Along this path the faulty difference is only combined from the right with output values of carry-SELs.
- If $x$ is a carry-SEL at bit position $k < n$, the faulty difference is propagated to an output port for a bit position $k' > k$. $k'$ uniquely defines a sequence of carry cells followed by a sequence of block cells which propagates the difference. The first block cell $x'$ in this sequence is the only cell of the whole sequence where the faulty difference is not combined with the output value of a carry-SEL but with the value P at the upper input lines of $x'$. Now consider the SEL-stage of $x'$. Assume $x''$ is another block cell in this SEL-stage that receives P at the upper input lines. Let $k''$ be the bit position of $x''$. Then the faulty difference is also propagated to an output port for a bit position $k'' > k$.

Fig. 5:
(1) For $i = 1, 2$ apply $(P_i, P_i)$ to all nodes.
For any height $j = 0, \ldots, \log n - 1$:
(2j) Apply $(P_3, P_4)$ to all nodes of height $j$.
(3j) Apply $(P_3, P_5)$ to all nodes of height $j$.
(4j) Apply $(P_1, P_2)$ to all nodes of height $j$.
(5j) Apply $(P_2, P_1)$ to all nodes of height $j$.
(6j) Apply $(P_3, P_4)$ and $(P_3, P_5)$ alternately to the nodes of height $j$, starting with $(P_3, P_4)$ at the rightmost node.
(7j) Apply $(P_3, P_5)$ and $(P_3, P_4)$ alternately to the nodes of height $j$, starting with $(P_3, P_5)$ at the rightmost node.

For the proof of the lemma we refer to the original paper [36]. We rather show that a (non-trivial) generalization to CST-circuits is possible. For this, consider the structure tree of a CST-adder circuit $A_n$ that is not necessarily complete and balanced. Now consider the smallest complete balanced binary tree which can be pruned to the structure tree of $A_n$, i.e., the smallest complete binary tree obtained by expanding the structure tree of $A_n$. We call it the pseudo structure tree of $A_n$. For an illustration see Fig. 6. The non-bold part does not belong to the original structure tree. The pseudo structure tree corresponds to a larger CBBST-adder in which $A_n$ is embedded. Now, a complete test set for $A_n$ is constructed in the following way: Compute a pseudo complete test set for this pseudo structure tree by using Lemmas 1 and 2.
For each test pattern this leads to a labeling of the pseudo structure tree with the $P_i$'s; in particular, values are assigned to the outputs of all carry-SELs. Use this information to modify any pseudo test pattern $P_{pseudo}$ to a test pattern $P$ of $A_n$ as follows: Consider the leaf nodes of the structure tree of $A_n$ one by one. If the leaf node considered is also a leaf node in the pseudo structure tree, maintain the labeling of the node. If the leaf node corresponds to an inner node of the pseudo tree, replace the label by one of $P^1_1, P^1_2, P^1_3$, such that the first element (= output value of the carry-SEL) remains unchanged. This uniquely defines the new labels, if one takes into account that the equations $P^1_1 = P^1_5$ and $P^1_2 = P^1_4$ hold. Of course, the modification defines an input pattern for $A_n$. Now, look at a cell $x_{pseudo}$ in the pseudo adder that stands for a "real" cell $x$ in $A_n$. Assume that $x_{pseudo}$ and $x$ have height $j$ and that the pseudo test pattern $P_{pseudo}$ is the test constructed for a stuck-at fault at cell $x_{pseudo}$.
Then the resulting test pattern $P$ for $A_n$ is a test for this stuck-at fault at $x$ for the following reasons: Due to the special patterns considered and the generation laws given in the proof of Lemma 1, $P$ and $P_{pseudo}$ generate exactly the same values at the inputs of $x$ and $x_{pseudo}$, respectively. Furthermore, our modification of $P_{pseudo}$ to $P$ guarantees that the values on outputs of carry-SELs remain unchanged. Look at Fig. 5 and the definition of the $P_i$'s to see that $P_{pseudo}$ and $P$ apply P to the upper inputs of at least one block-SEL in any SEL-chain with height $< j$. Now apply the propagation properties of Lemma 2 to obtain that $P$ is a test for the considered fault.
We end with a complete test set for A n in SAFM. The construction is conceptually simple, but it may require exponential time, since the pseudo tree is possibly exponentially larger than the structure tree of the circuit considered. In the proof of the next theorem we show, that this disadvantage can be avoided. On the whole, we get the following theorem:
Theorem 2 (SAFM-test) Let $A_n$ be an n-bit CST-adder. Then a complete test set $T_{SAFM}$ for $A_n$ in SAFM with $|T_{SAFM}| = 6 \cdot d(A_n) + 2$ can be constructed in time $O(n \cdot d(A_n))$.
Proof: Note that the depth of the pseudo structure tree is equal to the depth of the structure tree of $A_n$. Therefore, only the time bound remains to be proved. For this look at the test patterns of Fig. 5. The construction has to be done efficiently for all seven types of patterns defined there. For type (1) there is nothing to prove.
Therefore consider a fixed height $j$ and the pattern of type (2j). $(P_3, P_4)$ is easily generated at the inputs of any non-leaf node of height $j$ by choosing the leaf inputs of the leaves in height $k$ with $k > j$ appropriately according to the composition laws given in Lemma 1. We call this phase of the construction "generation to the higher leaves". There may be leaf nodes in a height $k \le j$. Remember that it was important for the construction of the test patterns from the pseudo test patterns that the value at the output of the carry-SELs remained unchanged. Application of $(P_3, P_4)$ in height $j$ generates carry values A in the lower heights. We therefore place $P^1_2$ at the leaves in heights $k$ with $k \le j$. This phase of the construction is called "generation at the lower leaves". This completes the construction for patterns of type (2j).
Types (3j), (4j), (5j) are handled analogously. $P_1$, $P_1$, $P_2$, respectively, is used for the generation at the lower leaves.
The remaining types (6j) and (7j) are slightly more complicated. We first define the parity of a node $v$ in the structure tree as follows: If the outgoing edge of $v$ is a right input edge of a node in the next (lower) height, the parity of $v$ is even. Otherwise the parity is odd. Now consider type (6j). In a first phase, the "generation to the higher leaves", we generate $(P_3, P_4)$ ($(P_3, P_5)$) at the even (odd) non-leaf nodes of height $j$. The "generation at the lower leaves" works as follows:
Even (odd) leaf nodes of height j are labeled with P 2 (P 1 ). (Even (odd) nodes of height j have to produce A (G) at the outputs of their carry-SELs!) Leaf nodes in heights k with k < j are labeled with P 1 . (At the nodes in heights k < j the outputs of carry-SELs have to be G.) The construction for type (7j) is done analogously.
It follows from the construction that we obtain the same complete test set as with the help of the pseudo structure tree. Furthermore it can easily be seen that the time complexity can be bounded from above by O(n d(A n )).
Thus, we have given the construction of a complete test set whose size is linear in the depth of the circuit. From Theorem 1 we know that the left-depth is a lower bound, i.e., for circuits with non-differing depth and left-depth we have constructed an optimal test from the asymptotic point of view. The algorithm described above has been implemented to generate test sets for CST-adders.
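The size bound of Theorem 2 and the remark about the possibly exponential pseudo structure tree can be illustrated with a minimal Python sketch. It assumes that the leaves of the structure tree are 1-bit adders and that the depth is the number of edges on a longest root-to-leaf path; the function names and the `cut` callback (which returns the cut position $k$ for a given bitlength) are hypothetical and not taken from the paper.

```python
from typing import Callable


def depth(n: int, cut: Callable[[int], int]) -> int:
    """Depth of the structure tree of an n-bit adder that is split
    recursively into a k-bit and an (n-k)-bit adder with k = cut(n).
    Assumption: 1-bit adders are the leaves and have depth 0."""
    if n == 1:
        return 0
    k = cut(n)
    if not 1 <= k < n:
        raise ValueError("cut position must satisfy 1 <= k < n")
    return 1 + max(depth(k, cut), depth(n - k, cut))


def safm_test_size(d: int) -> int:
    """Size 6*d + 2 of the SAFM test set stated in Theorem 2."""
    return 6 * d + 2


def pseudo_tree_leaves(d: int) -> int:
    """Number of leaves of a complete balanced binary tree of depth d,
    i.e. of the pseudo structure tree."""
    return 2 ** d


# Balanced cut (k = n/2) versus ripple-like cut (k = n-1) for 16 bits.
d_balanced = depth(16, lambda n: n // 2)
d_ripple = depth(16, lambda n: n - 1)
```

Under these assumptions the balanced 16-bit tree has depth 4, whereas the ripple-like tree has depth 15, so its pseudo structure tree has $2^{15}$ leaves although the adder has only 16 bits. This is the blow-up that the direct construction in the proof of Theorem 2 avoids.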
Our test for the SAFM is based on the library $LIB_{CST}$. A complete test based on a library with primitives of lower complexity is often desirable in the SAFM to obtain a set of patterns covering many potential physical defects. From [36] it is known that a (log n)-test is possible for the classical conditional sum adder viewed as a circuit over STD. Since the same methods are used as before, a generalization along the lines given above is possible for the CST-adders. We omit the detailed construction and switch over to the construction of a complete test set for a more powerful static fault model.
C. Testability of CST-adders in the CFM
CFM is a more powerful fault model than the SAFM. We show that for CST-adders this leads to a lower bound of n for the test complexity. We construct complete test sets whose worst case size is quadratic in n. We start again with the proof of a lower bound.
Theorem 3 (lower bound) Let $A_n$ be an n-bit CST-adder over the library $LIB_{CST}$. Then we have $TC_{CFM}(A_n) \ge n$.
Proof: For the lower bound consider the $n - 1$ boundary block-SELs and the output of the rightmost $A_1$-cell.
Notice that for each cell there exists a testable cellular fault that requires the output P in the correct case. Fix the boundary block-cell $x$ at a bit position $i$. P as value at the output requires P on the upper and right input (R cannot be generated at the right input!). Now, it follows by induction on $n$ that the right and upper input of any boundary block-cell with bit position $< i$ necessarily have value P and R, respectively, and that the two rightmost output pins of the rightmost $A_1$-cell have value R. In particular neither a boundary block-SEL to the right of $x$ nor the rightmost $A_1$ can have output P at the same time as $x$.
The theorem above shows that in CFM there is no hope to find a complete test set which has a sublinear size compared to the number of input bits. In the SAFM this is possible and can really be done as we have shown above. In this sense the CFM is computationally harder than the SAFM. A lower bound for a CBFM cannot be given in a general form. It strongly depends on the subset of faults that have to be tested.
For the construction of a complete test set it is useful to determine the input combinations which can be generated at SEL-cells and the differences that can be propagated from the output of SEL-cells to the primary outputs. For this consider a SEL-cell $x$. The upper input of $x$, the right input of $x$ and the right inputs to the SEL-cells on the path from $x$ to the output are pairwise independent from each other, since they all correspond to disjoint subcircuits. The value R can never be generated at an input of a carry-SEL. A and G can never be generated at the right output of $A_1$-cells. All the remaining input combinations can be generated at the inputs of SEL-cells. We therefore obtain the following lemma for any CST-adder $A_n$.
Lemma 3 (generation for CFM)
- At the inputs of carry-SELs 9 different input combinations described by XY with $X, Y \in \{A, P, G\}$ can be generated.
- At block-SELs that receive their upper input from an $A_1$-cell 6 different input combinations given by XY with $X \in \{R, P\}$ and $Y \in \{A, P, G\}$ can be generated.
- At the remaining SEL-cells 12 different input combinations described by XY with $X \in \{A, P, G, R\}$, $Y \in \{A, P, G\}$ can be generated.
It turns out that the differences X/Y with X/Y $\ne$ A/P, P/A, G/R, R/G can be made visible at the primary outputs independent of the fault location from where they have to be propagated. We therefore call them easy in the following. For the remaining differences propagation may be impossible. They are called hard. Note that hard differences are exactly the differences which lead to a faulty result only on the sum+1 line. More precisely, we obtain the following lemma whose proof follows from the overall structure of CST-adders and Table I:
Lemma 4 (propagation for CFM)
- Hard differences at the rightmost outputs of the rightmost $A_1$-cell and at the outputs of boundary block or boundary carry cells cannot be made visible. (The final sum plus 1 is not of interest!)
- If an easy difference has to be propagated from the output of a carry-SEL, consider the tree of carry-SELs and propagate the difference along the uniquely determined path to the output of this tree by combining it with the neutral element P.
- If a hard difference has to be propagated from the output of a carry-SEL that is not a boundary carry cell, again consider the tree of carry-SELs and the corresponding path to the output. There exists at least one side input on this path which is a right edge. Set this input to G and thus map the hard difference to an easy one. The remaining side inputs are set to P.
- If an easy difference has to be propagated from the bottom output of a block-SEL, consider the uniquely determined path to the primary output at the same bit position and set the side inputs, which all are outputs of carry-SELs, to P.
- If a hard difference has to be propagated from the bottom output of a block-SEL which is not a boundary block cell, again consider the uniquely determined path to the primary output at the same bit position. Set the first side input to G, the remaining ones to P.
For an n-bit CST-adder $A_n$ define $n_{SEL}$ as the number of SEL-cells in $A_n$, $n_{carry}$ as the number of carry-SELs in $A_n$, $n_{lowblock}$ as the number of block-SELs that receive their upper input from an $A_1$-cell, and $n_{highblock} = n_{SEL} - n_{carry} - n_{lowblock}$. Note that the equations $n_{carry} = n - 1$ and $n_{lowblock} = n - 1$ hold. We combine the lemmas above to obtain the following theorem:
Theorem 4 (CFM-test) Let $A_n$ be an n-bit CST-adder. Then a complete test set $T_{CFM}$ for $A_n$ in CFM with
$|T_{CFM}| = 8n + 18 n_{carry} + 12 n_{lowblock} + 24 n_{highblock} = 24 n_{SEL} - 10n + 18 = O(|A_n|)$
can be constructed in time $O(n^2)$.
Proof: For each cell $x$ and each input combination which can be generated at $x$, two test patterns are required, namely one to propagate an easy difference and one to propagate a hard difference from the output line of $x$. Thus, 8 patterns suffice for the complete test of an $A_1$-cell. The remaining number of tests suffices for the test of the SEL-cells according to Lemma 3 and 4. Since $O(|A_n|) = O(n^2)$, the time bound follows if we make sure that each test requires constant construction time. We use again Lemma 3 and 4. From the lemmas it follows that we only have to store the values at special positions for the generation and the propagation, but not all input values. Therefore we perform a bottom-up computation and store all information in the structure tree of the adder. Thus, each node in the tree is visited only once.
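The two forms of the test set size in Theorem 4 can be checked against each other with a few lines of Python. This is only an arithmetic sketch: the function name is hypothetical, and the number of SEL-cells is passed in as a parameter because it depends on the structure of the particular adder.

```python
def cfm_test_size(n: int, n_sel: int) -> int:
    """Size of the CFM test set of Theorem 4 for an n-bit CST-adder
    with n_sel SEL-cells, using n_carry = n_lowblock = n - 1."""
    n_carry = n - 1
    n_lowblock = n - 1
    n_highblock = n_sel - n_carry - n_lowblock
    if n_highblock < 0:
        raise ValueError("n_sel is too small for an n-bit adder")

    # Two patterns per generable input combination (Lemma 3):
    # 4 combinations per A_1-cell, 9 per carry-SEL, 6 per low
    # block-SEL and 12 per remaining block-SEL.
    by_cell_type = 8 * n + 18 * n_carry + 12 * n_lowblock + 24 * n_highblock
    closed_form = 24 * n_sel - 10 * n + 18
    assert by_cell_type == closed_form
    return closed_form
```

The internal assertion holds for all admissible arguments, since $18(n-1) + 12(n-1) - 48(n-1) = -18(n-1)$ and therefore $8n - 18(n-1) = -10n + 18$.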
From an asymptotic point of view the test is not optimal in many cases, e.g., for the classical conditional sum adder. At the moment it is open, whether there exists a test with test complexity O(n) for any n-bit CST-adder. It should be mentioned at this point that the test set can easily be adapted to a general cell based static fault model: Patterns that test for cellular faults which are not contained in the fault model are simply deleted.
D. Testability of CST-adders in the RPDFM
We now focus on a powerful dynamic fault model. We first show that the adders described in Section II are not fully robust path delay fault (RPDF) testable and so a reconstruction is given to obtain a testable circuit. The modifications described are similar to the ones applied in [39] to design testable adders. The construction of the test sets is completely different, since the structures of the adders are different. For the modified adders we prove that they are fully RPDF testable with a test set of size $O(n^2 \cdot d_{left}(A_n))$. Furthermore, we show that this is optimal up to the factor of $d_{left}(A_n)$.
Construction of testable adders
We start with a consideration of the carry-tree of the CST adders. The carry-computation is realized by multiplexers. It is easy to see that MUXes are not RPDF testable without the application of value R=(0,1) at the northern input (see Fig. 7), since for a robust test we must apply non-controlling values at the side inputs along the path to be tested. But R=(0,1) can never be generated at the northern inputs of the carry-tree. One possibility to overcome this difficulty is to modify the function on the don't care set, so that the same function is realized on the applicable values.
A cell for such a function is shown on the left side in Fig. 8. If we replace the multiplexers of the carry computation by the cell on the left side in Fig. 8 we obtain a circuit realizing the operation given in Table III. This is the same operation on the applicable values A, P and G as shown in Table I, i.e., such a replacement will not change the correct functional behavior of the circuit.
The new SEL-cell is "locally" RPDF testable, but the composition of two such cells (illustrated in Fig. 8 on the right) still has some untestable paths. When we have a look at the path starting at the upper left input and then going to the AND-gates X and Y, it is not difficult to check that this path cannot be robustly tested, since we have a reconvergence at Y. By examining the thick line in Fig. 8 we can see that a stuck-at 1 fault on this signal cannot be propagated to the outputs: A test for a stuck-at 1 fault requires a 0 at the upper left input and this determines the values of the outputs. Thus, we can remove the thick line from the circuit and also the AND-gate X. We have to directly connect the output of the OR-gate Z with the former output of gate X. It can be checked that the resulting circuit has no untestable paths with respect to the RPDFM. By induction we obtain that the vertical composition of n such cells, called carry stage, is RPDF testable without the application of the value R.
Next, we have a look at some carry-stages simultaneously, so that the output of the first one is the side input of the next one (see Fig. 9). It is shown that this can lead again to an untestable path due to a reconvergence (thick line in Fig. 9). It turns out that the idea of "path doubling" and subsequent removal of stuck-at redundancies as it is presented in [40] for the removal of PDF untestabilities can also be applied in the case of reconverging paths: The AND-gate X in Fig. 9 has to be doubled. One AND-gate is then used for the carry-computation and the other one for the multiplexers. On the left input of the AND-gate for the carry-computation is an untestable stuck-at 1 fault. So it can be removed as in Fig. 8. Again this construction can easily be generalized to the complete carry-tree and results in the following modifications: Carry cells whose output lines correspond to a vertical edge in the carry tree are replaced by the upper right cells in Fig. 9. Carry cells whose output lines correspond to a non-vertical edge in the carry tree are replaced by the cell given in Fig. 10. Boundary cells are left unchanged.
All in all, we obtain modified adders of the CST with identical functional behavior that, as we will see in the following, are fully RPDF testable. These testable adders are called RCST-adders (= robust CST-adders).
Testability and test complexity
In this subsection we prove the RPDF testability of all paths in the adders obtained from the modifications described in the last subsection. We assume $k \in \{n/2, \ldots, n-1\}$. This assumption is based on a heuristic optimizing the generator with respect to runtime (see Section IV). It simplifies the construction of the test patterns and the calculation of the sizes of the test sets.
We divide the paths to be tested into three classes: sum paths, carry paths and carry-sum paths (see Lemmas 5-7). The test patterns for the test of the paths will be denoted by the values at the output of the $A_1$-cells, using the coding introduced in Section II. The number of tests obtained in this way has to be multiplied by a factor 4 to determine the test complexity of the whole adder. (It is easy to see that all paths in the $A_1$-cells are RPDF testable.)
The patterns are denoted as indicated by the following example: PR,..,PR,PR/GP,PR,..,PR,GP,PR,..,PR means that all $A_1$-cells keep their value except one, which has the PR/GP-change. This notation is only used in Lemma 5. All the other proofs use arguments on the structure of the adders to describe the test sets. If we have a bound for the number of paths, this directly implies a bound for the number of test patterns, too.
Lemma 5 (sum paths) All sum paths $S(A_n)$ of an RCST-adder are RPDF testable by $2n^2$ test patterns.
Proof: At the s-outputs of the $A_1$-cells we only have the values R or P (consider the EXOR-NOT-combination and Table I). The signals directly run through the SEL-stages to the outputs (see Fig. 11 at the left). At the SEL-stages we use all combinations of A and G at the side inputs, since the right (left) upper input of a multiplexer in a SEL-cell is tested using the value A (G). This is sufficient to sensitize a sum path and thus we get a number of tests exponential in the number of SEL-stages where the path runs through. The number is limited by $\log n$, since $k \in \{n/2, \ldots, n-1\}$. A detailed analysis shows that we obtain patterns of the form: PR,..,PR,PR/GP,PR,..,PR,GP,PR,..,PR,AP,PR,..,PR.
The positions where the values AP and GP must be applied can be directly derived from the structure tree. We get the upper bound by using this construction for all A 1 -cells separately.
The RCST adders were constructed in such a way that all paths are robust PDF testable. In the following we omit the detailed construction of the test patterns. Instead, we prove upper and lower bounds for the size of the test set. For this, we use the structure of the adders as described in Section II. We first give an upper bound on the number $C(A_n)$ of carry paths depending on the left-depth $d_{left}(A_n)$ of the specific adder $A_n$. For this, we determine $C_i(A_n)$, which describes the number of carry paths in an n-bit RCST-adder without boundary SELs, i.e., $A_n$ is a subcircuit of a larger adder corresponding to a subtree of the structure tree, whose root node is not a boundary cell.
Lemma 6 (carry paths) Let $A_n$ be an n-bit RCST adder. Then $C_i(A_n) \le c \cdot d_{left}(A_n) \cdot n$ and $C(A_n) \le 2 \cdot c \cdot d_{left}(A_n) \cdot n$ hold for all $c \ge 3$.
Proof: It can directly be seen that the first proposition holds for $n = 2$, since $C_i(A_2)$ is equal to 5. The general case is proven by induction. The number $C_i(A_n)$ of "inner" carry paths is given by
$C_i(A_n) = C_i(A_k) + C_i(A_{n-k}) + n - k$. (1)
Equation (1) [...] (3)
Again the formula can be derived from Section II. We obtain the factor $2 \cdot CS(A_{n-k})$, because in the new SEL-chain each path can continue in two ways. $2 \cdot C(A_k) \cdot (n-k)$ describes all carry paths of $A_k$ that can branch out in the SEL-stage. The proposition holds for $n = 2$, since $CS(A_2)$ is equal to 8. $d$ denotes a constant that will be determined in the following. Thus we obtain:
The last estimation holds for $d \ge 4c = 12$ and $k \ge n/2$. This can be seen by a straightforward computation.
We summarize the results of Lemmas 5-7.
Theorem 5 (upper bound) All paths in an n-bit RCST-adder $A_n$ are robust PDF testable by a test set of size $O(n^2 \cdot d_{left}(A_n))$. [...] (A_n) n (2n + 1).
Now, we give a general lower bound for the size of the test set for an arbitrary member of the class of RCST-adders.
Theorem 6 (lower bound) Each test set of an adder of the RCST has size $\Omega(n^2)$.
Proof: We have a look at the inputs from $a_1$ to $a_{n/4}$ and at the outputs of the adder from the $\frac{3}{4}n$-th to the n-th position. We only examine the carry outputs of the $A_1$-cells. Each $a_i$ can influence each $s_j$ and thus there must exist a carry-sum path for each pair $(a_i, s_j)$. Thus all in all $n^2/16$ carry-sum paths must be tested.
The RCST adders are defined recursively and by the assumption $k \in \{n/2, \ldots, n-1\}$ it follows that all the input signals $a_i$ and all the output signals $s_j$ are only connected by two carry signals (see Fig. 1).
We now examine the carry-sum paths in more detail. The paths starting at the c-outputs of the A 1 -cells must run through the multiplexers in the SEL-chains. Therefore a P must be present at the northern inputs of the multiplexer. To generate a P we must use a G or an A at the inputs of the A 1 -cells (see Table I ), but this destroys a parallel test since A and G are left-stable elements (see Table III ).
Thus, the paths enumerated above must be tested separately and we obtain the lower bound.
We finish with an examination of two special adders of the RCST and prove optimal asymptotic bounds for their test sets: the classical conditional sum adder (CCSA), where $k$ is recursively chosen as $n/2$, and the carry ripple adder (CRA), where $k$ is recursively chosen as $n - 1$. We prove the lower bounds for these adders using the equations (1), (2) and (3).
The carry paths of the CCSA $C(A_n)$, with $k = n/2$, are given by:
$C(A_n) = C(A_k) + 2 \cdot C_i(A_{n-k}) \ge 2 \cdot C_i(A_{n-k}) \ge \frac{n}{4} \log \frac{n}{2}$
Thus, the number of carry-sum paths can be calculated by equation (3):
$CS(A_n) \ge 3 \cdot CS(A_k) + n \cdot \log \frac{n}{4} \cdot \frac{n}{8} = \Omega(n^2 \log n)$
So the size of the test set is optimal from the asymptotic point of view, since the left-depth of the CCSA is $d_{left}(CCSA) = \log n$.
The same considerations for the CRA show that the number of carry-sum paths is bounded by $\Omega(n^2)$. This is also optimal, since it has left-depth $d_{left}(CRA) = 1$.
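For the CCSA, recurrence (1) from the proof of Lemma 6 can be evaluated directly and compared with the bound of the lemma. The following Python sketch does this for bitlengths that are powers of two, using only the base case $C_i(A_2) = 5$ and $d_{left}(CCSA) = \log n$ stated in the text; the function names are hypothetical, and the bound is read as an upper bound with $c = 3$.

```python
def inner_carry_paths_ccsa(n: int) -> int:
    """C_i(A_n) for the CCSA (k = n/2) by recurrence (1):
    C_i(A_n) = C_i(A_k) + C_i(A_{n-k}) + n - k, with C_i(A_2) = 5."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two with n >= 2")
    if n == 2:
        return 5
    k = n // 2
    return inner_carry_paths_ccsa(k) + inner_carry_paths_ccsa(n - k) + (n - k)


def lemma6_bound_ccsa(n: int, c: int = 3) -> int:
    """Bound c * d_left(A_n) * n of Lemma 6 with d_left(CCSA) = log2(n)."""
    d_left = n.bit_length() - 1  # exact log2 for powers of two
    return c * d_left * n
```

For powers of two the recurrence unrolls to $C_i(A_n) = \frac{n}{2}(\log n + 4)$, which stays below $3 n \log n$ for all $n \ge 2$ and grows like $n \log n$, in line with the lower-bound discussion above.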
IV. Generator
In this section we focus on the problem of finding an area-time optimal adder of the CST. First a definition of the computation models for area and time is given (Section IV-A). Then area-time optimal adder design is formulated as a dynamic programming problem (Section IV-B). We start with a description of a straightforward implementation. Unfortunately, this generation method is impracticable for large bitlengths because the memory requirement is too large. Therefore we integrate point location into the multi-dimensional dynamic programming approach, which makes the method applicable in practice because it reduces the memory requirement. The computational complexity of the algorithm proposed is investigated in Section IV-D. We close Section IV by presenting some heuristics to speed up the computation (Section IV-E). Statistical experiments illustrate and motivate the different steps shown.
We do not focus on the construction of RCST adders, as they have been defined in the last section. But the methods presented can be directly transferred.
A. Computation models
The delay model is based on the intrinsic-plus-fanout delay model which is also used by many industrial VLSI design systems, e.g., VENUS S-S, Semicustom Design System, 1.0 µm from SIEMENS, Munich [30], and by the logic synthesis and optimization benchmarks of the Microelectronics Center of North Carolina [31]. In this model, the description of a cell $x$ with one output pin contains [41]
- the intrinsic delay times $t_{up}(x, i)$ and $t_{down}(x, i)$ of the ith input of cell $x$, i.e., the propagation delay times of $x$ with respect to the rising edge $i$ and the falling edge $i$, respectively, if the capacitive load $CL$ on the output of $x$ is equal to 0,
- the capacitance $c_{in}(x, i)$ of the ith input of $x$ measured in load units (LU), where one load unit corresponds to the input capacitance of a two-input NAND gate,
- the load dependences $ld_{up}(x)$ and $ld_{down}(x)$ of the output of $x$ measured in ns/LU.
For our purpose we only have to consider basic cells with one output pin. Since $t_{up}(x, i)$, $t_{down}(x, i)$, $t_{up}(x, j)$ and $t_{down}(x, j)$ differ from each other only a little bit for $i \ne j$ and $x \in \{AND, OR, EXOR, MUX\}$, we consider $t(x) := \max_i \{t_{up}(x, i), t_{down}(x, i)\}$ as the intrinsic delay time $t(x)$ of cell $x$ in order to simplify matters. Furthermore, we consider $ld(x) := \max\{ld_{up}(x), ld_{down}(x)\}$ as the load dependence $ld(x)$ of cell $x$.
The timing of a CMOS cell depends on the capacitive load on the cell output, i.e., the input capacitances of the following CMOS cells and the wiring. The delay time $d(x)$ of cell $x$ for a given capacitive load $CL$ can be approximately determined by the equation $d(x) = t(x) + ld(x) \cdot CL$ where $CL := C_{cells} + C_{wires}$ holds and $C_{cells}$ denotes the sum of all the capacitances of the cell input pins at the driven network. $C_{wires}$ denotes the sum of the wiring capacitance. Since the design presented is done on a symbolic design level and since the physical wire lengths are unknown, we assume $C_{wires} = 0$ for this paper. Note that the wire lengths, i.e., the wire capacitances, can easily be approximated if the physical design is done along the lines of the symbolic design. The cell characteristics used in our examples are taken from the SIEMENS library mentioned above. For each cell $x$ except the MUX-cell the capacitances of the inputs of $x$ differ from each other only a little bit so that we can confine ourselves to the maximum of these values. The capacitance of the select input $s$ of the MUX-cell is much larger than the capacitance of the remaining data inputs (denoted by $d$) so that we differentiate between these two values.
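The simplified delay model can be written down in a few lines of Python. The sketch below only restates the formulas $t(x)$, $ld(x)$ and $d(x) = t(x) + ld(x) \cdot CL$ from above; the class name, the field names and all numeric values in the example are hypothetical and are not taken from the SIEMENS library.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Cell:
    name: str
    t_up: Sequence[float]    # intrinsic rising-edge delay per input (ns)
    t_down: Sequence[float]  # intrinsic falling-edge delay per input (ns)
    ld_up: float             # load dependence, rising edge (ns/LU)
    ld_down: float           # load dependence, falling edge (ns/LU)

    def intrinsic(self) -> float:
        """t(x): maximum over all inputs and both edges."""
        return max(max(self.t_up), max(self.t_down))

    def load_dependence(self) -> float:
        """ld(x): maximum of the two load dependences."""
        return max(self.ld_up, self.ld_down)

    def delay(self, c_cells: float, c_wires: float = 0.0) -> float:
        """d(x) = t(x) + ld(x) * CL with CL = C_cells + C_wires.
        C_wires defaults to 0, as assumed in the text."""
        return self.intrinsic() + self.load_dependence() * (c_cells + c_wires)


# Hypothetical two-input cell driving a load of 3 LU.
example = Cell("AND2", t_up=(0.5, 0.6), t_down=(0.4, 0.55),
               ld_up=0.10, ld_down=0.12)
example_delay = example.delay(c_cells=3.0)
```

Taking the maximum over inputs and edges makes the estimate pessimistic, which matches the intention of guaranteeing an upper bound on the delay of the generated adder.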
The area-time optimal adder design presented in the following section is done on a symbolic design level. Therefore, we approximate the area of a circuit by the cell area, i.e., we do not take the interconnect area into account. The lengths of the wires cannot be estimated in general, since they largely depend on the technology used. For some design processes this simplifying assumption is realistic, i.e., channelless gate arrays or sea-of-gates [32], [33]. Note once more that the wire length can easily be approximated if the physical design is done along the lines of the symbolic design.
B. Formulation as dynamic programming problem
The problem can be described as follows:
Given the number n of bits and the delay t n (e.g. measured in units of 0.1ns) of the n-bit addition.
Construct an area-minimal n-bit adder of the conditional sum type with delay t n if it exists.
We denote such an adder by performance oriented conditional sum type adder of type 1 ($POCSTA_1$). Now, assume that for any $m < n$ and $t_m \le t_n$ an m-bit [...] If $k$ is not uniquely determined by the constraints above, we choose any one minimizing the resulting area.
Now, consider an array AREA of $n$ rows and $t_n$ columns. Assume that for all $m < n$ and $t_m \le t_n$ the entry $AREA[m, t_m]$ is equal to $+\infty$ if no m-bit adder of the conditional sum type with delay $t_m$ exists, and equal to [...] Let find1($n, t_n$) be the function computing $k$ as well as the area of the corresponding $POCSTA_1$ and storing these values in the arrays K and AREA given the entries of $K[m, t_m]$ and $AREA[m, t_m]$ for all $m < n$ and $t_m \le t_n$. Then the program given in Fig. 12 computes a $POCSTA_1$ for given $n$ and $t_n$. Procedure init initializes the arrays, namely the entries for $m = 1$ and $m = 2$ because there is no area-time trade-off for the 1-bit $POCSTA_1$ and the 2-bit $POCSTA_1$. The term $t_2$ denotes the delay of the 2-bit $POCSTA_1$. The runtime of the function find1($m, t_m$) is $c' \cdot m$. Thus, the runtime of the dynamic program above performing area-time optimal adder design is $c \cdot n^2 \cdot t_n$. The constant $c$ is small.
Theorem 7: Let $n$ and $t_n$ be given. An n-bit $POCSTA_1$ with delay $t_n$ can be generated in time $O(n^2 \cdot t_n)$ and in space $8 \cdot n \cdot t_n + O(1)$ bytes if there exists such an n-bit adder.
Proof: Because of the discussion above we only have to prove that $8 \cdot n \cdot t_n + O(1)$ bytes suffice. Each entry of the array AREA occupies 4 bytes, that is $4 \cdot n \cdot t_n$ bytes for the whole array. After having computed the area of the area-minimal adder which satisfies the delay requirement, the corresponding adder has to be constructed. This can be done top-down because of array K which occupies $4 \cdot n \cdot t_n$ bytes of memory, too.
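The shape of such a dynamic program can be sketched in Python as follows. This is a toy model and not the program of Fig. 12: the cost constants, the load penalty and the function name `generate` are assumptions made for illustration, the table stores the minimal area for delay at most $t$ (rather than exactly $t$), and only the 1-bit row is initialized. It keeps the two arrays AREA and K and the three nested loops that lead to a runtime proportional to $n^2 \cdot t_n$.

```python
import math

INF = math.inf

# Hypothetical cost model (not the SIEMENS library data).
A1_AREA, A1_DELAY = 4, 2    # area and delay (time units) of a 1-bit adder
SEL_AREA, SEL_DELAY = 3, 1  # area and intrinsic delay of one SEL-cell
LOAD_STEP = 4               # one extra time unit per LOAD_STEP driven SEL-cells


def generate(n: int, t_max: int):
    """Return (AREA, K). AREA[m][t] is the minimal area of an m-bit adder
    with delay <= t in the toy model (INF if none exists); K[m][t] is a
    cut position k attaining it (0 if there is none)."""
    AREA = [[INF] * (t_max + 1) for _ in range(n + 1)]
    K = [[0] * (t_max + 1) for _ in range(n + 1)]
    for t in range(A1_DELAY, t_max + 1):
        AREA[1][t] = A1_AREA
    for m in range(2, n + 1):
        for t in range(t_max + 1):
            for k in range(1, m):  # low part: k bits, high part: m - k bits
                high = m - k
                t_high = t - SEL_DELAY
                # The carries of the low part drive `high` SEL-cells,
                # so the low part gets a smaller delay budget.
                t_low = t_high - high // LOAD_STEP
                if t_low < 0:
                    continue
                area = AREA[k][t_low] + AREA[high][t_high] + high * SEL_AREA
                if area < AREA[m][t]:
                    AREA[m][t] = area
                    K[m][t] = k
    return AREA, K
```

In this toy model a 4-bit adder with delay bound 4 needs area 28 (balanced cut $k = 2$), whereas relaxing the bound to 5 allows the smaller ripple-like solution with area 25 (cut $k = 3$). The adder itself can then be reconstructed top-down from K, as described in the proof above.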
The algorithm has been applied to the 16-bit $POCSTA_1$s. There are 21 different $POCSTA_1$s when using the SIEMENS library mentioned above and a maximum fanout of up to 20 LU. The area of the slowest adder (with delay 17.7ns) is two thirds of the area of the fastest one (with delay 6.9ns). There is no area-time optimal adder with tree depth equal to 4, i.e., the adder computing the carries through a totally balanced tree is not area-time optimal with respect to the library used. Fig. 13 shows an example of a 16-bit [...]
Looking at Fig. 1 we observe that the $2k$ rightmost output pins of the k-bit adder are directly connected to the output pins of the whole n-bit adder whereas the 2 carry output pins of the k-bit adder have to drive the multiplexers of the selection stage. Thus, in order to obtain better area-time optimal adders, it seems to be reasonable to differentiate between the signal delays of paths going from the input pins of an adder to the carry output pins and the signal delays of those paths going from the input pins to the remaining output pins.
As in Section III we denote paths going from the input pins to the carry output pins of an adder by carry-paths and paths going from the input pins to the remaining output pins of an adder by sum-paths. The delays of the carry-paths and of the sum-paths are denoted by carry-delay and sum-delay, respectively.
Differentiating between carry-delays and sum-delays has the consequence that a generator parametrized by the operands' bitlength $n$, by an upper bound $t^{(c)}_n$ for the carry-delay, and by an upper bound $t^{(s)}_n$ for the sum-delay computing an area-minimal n-bit adder with carry-delay $t^{(c)}_n$ and sum-delay $t^{(s)}_n$ is demanded. In any adder constructed by the generator proposed in the last paragraph the carry-delay always equals the sum-delay (see Fig. 1). Thus, to realize the idea of differentiating between carry-delays and sum-delays we have to "disconnect" the carry-paths and the sum-paths from each other. This can be done by directly routing the carries driving the selection stage to the carry-SELs and by disconnecting these paths from the sum-paths by introducing non-inverting buffers (denoted by $\triangle$) in front of the block-SELs (see Fig. 14). Of course, the buffers are not inserted if the cut position $k$ lies within a constant distance of $n$, where the constant has to be determined dependent on the library used. Note that this modification does not influence the test complexity and testability. Now, the problem is given as follows:
Given the number $n$ of bits and the delays $t^{(c)}_n$ and $t^{(s)}_n$ (e.g. measured in units of 0.1ns) of the n-bit addition.
Construct an area-minimal n-bit adder of the CST with carry-delay $t^{(c)}_n$ and sum-delay $t^{(s)}_n$.
We denote such an adder by performance oriented conditional sum type adder of type 2 ($POCSTA_2$). Analogously to the solution presented above, this problem can be formulated as a dynamic programming problem. Assume that for any $m < n$, [...] Fig. 15 realizes the generator required. Note that only during the generation process we differentiate between carry-delays and sum-delays. The designer himself only asks for an area-minimal n-bit adder with delay less than some bound $t_n$. The runtime of function find2($m, t^{(c)}_m, t^{(s)}_m$) is $c'' \cdot m$. Thus, the runtime of the program is $C \cdot n^2 \cdot t_n^2$. The constant $C$ is small. The memory requirement increases from $8 \cdot n \cdot t_n$ bytes to $8 \cdot n \cdot t_n^2$ bytes.
Theorem 8: Let $n$ and $t_n$ be given. An n-bit $POCSTA_2$ with carry-delay $t_n$ and sum-delay $t_n$ can be generated in time $O(n^2 \cdot t_n^2)$ and in space $8 \cdot n \cdot t_n^2$ bytes if it exists.
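To get a feeling for the space bounds of Theorems 7 and 8, the table sizes can be computed directly. The Python helper below is a hypothetical convenience function; it assumes 4-byte entries and the two arrays AREA and K, as in the proof of Theorem 7, with one delay dimension for $POCSTA_1$ and two for $POCSTA_2$.

```python
def memory_bytes(n: int, t_n: int, delay_dims: int = 2,
                 store_cuts: bool = True, entry_bytes: int = 4) -> int:
    """Size of the dynamic programming tables in bytes.

    delay_dims = 1 corresponds to Theorem 7 (one delay parameter),
    delay_dims = 2 to Theorem 8 (carry-delay and sum-delay).
    store_cuts = False drops the array K of cut positions."""
    arrays = 2 if store_cuts else 1
    return arrays * entry_bytes * n * t_n ** delay_dims


# 64-bit adders with delays up to 70.4 ns, measured in units of 0.1 ns.
full = memory_bytes(64, 704)
area_only = memory_bytes(64, 704, store_cuts=False)
```

For $n = 64$ and $t_n = 704$ this gives 253,755,392 bytes with both arrays and 126,877,696 bytes if only AREA is stored, compared with 360,448 bytes for the one-dimensional tables of Theorem 7.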
We have applied this new algorithm to the 16-bit POCSTA$_2$s, too. As was hoped, the trade-off consisting of 31 different POCSTA$_2$s is much finer than the previous one, which consists of only 21 different POCSTA$_1$s. The fastest POCSTA$_2$ is 0.9 ns (about 13 percent) faster than the fastest POCSTA$_1$, although its area is 5 percent smaller. Once again, there is no POCSTA$_2$ of tree depth equal to 4.
The generator proposed always requires about $8 \cdot n \cdot t_n^2$ bytes of memory. Of course, the memory requirement can be decreased to $4 \cdot n \cdot t_n^2$ bytes by not storing the cut positions, i.e., by not computing the entries of array $K$. (Note that they are only needed during the computation of print_POCSTA$_2$($n, t_n$). The entries needed can be computed in a top-down phase.) Nevertheless, this requirement is too high. Consider Table IV, where the memory requirement is shown for the two cases. The first column denotes the operands' bitlength, the second one denotes an approximation of the delay of the slowest $n$-bit POCSTA$_2$ measured in units of 0.1 ns, and the third and fourth columns denote the memory the generator requires. So, to generate the trade-off of the 64-bit adders, where the delay of the slowest one is about 70.4 ns, the generator requires $4 \cdot 64 \cdot 704^2 = 126877696$ bytes or $8 \cdot 64 \cdot 704^2 = 253755392$ bytes of memory. In the remainder of this section we show how to refine the dynamic program in order to solve this problem.
There is an important feature of our problem which has not yet been exploited, namely that tightening up the delay requirements results in larger adders. Let $N := \{1, 2, \ldots$ [...]
Proof: Obviously, if we decrease the delay time, we can only construct larger adders, since more computations must be performed in parallel. The case that we can construct smaller adders with smaller delay can never occur, since in this case we can also choose the smaller adder for the position in the matrix where more area is available.
The lemma is also valid if we take the interconnection area into account, although the resulting adders might be different. Note that the proof of the lemma is independent of the structure of the circuits considered. Thus, the statement also holds for other structures like other adder types, multipliers and so on.
[...] $V_m$ yet, add it. In this case, $(x, y+1)$ lies on a directed edge which has to be broken. Note that the edge starts from a vertex which dominates $(x, y)$,
- a directed edge from $(x, y)$ to $(x+1, y)$, if $(x, y) \in V_m \setminus \{s, t\}$ is a vertex of outdegree 0. If $(x+1, y)$ is not in $V_m$ yet, add it and break the edge on which this new vertex lies. Note that the edge ends in a vertex which dominates $(x, y)$.
In order to realize function find2($n, t_n^{(c)}, t_n^{(s)}$), it suffices to store the graphs $G_m$ for all $m < n$. If this is done, the problem to be solved is point location [35] for $G_m$. In general, point location is defined as follows: Given a subdivision of the plane induced by a planar graph $G$ and a test point $p$, identify the region which contains $p$.
In general, point location can be solved by the following lemma:
Lemma 9 ([43]): Point location in an $N$-vertex planar subdivision can be effected in $O(\log N)$ time using $O(N)$ storage, given $O(N \log N)$ preprocessing time.
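Lemma 9 addresses general planar subdivisions. As a much simpler, one-dimensional illustration of why a monotone table can be queried without storing every entry, the following Python sketch compresses a single non-increasing row of area values into its breakpoints and answers queries by binary search. It is neither the graph $G_m$ of this section nor the method of [43]; the names `compress_row` and `lookup` and the sample values are hypothetical.

```python
from bisect import bisect_right


def compress_row(row):
    """Keep only the positions where a non-increasing row changes value.

    row[t] is assumed to be the minimal area for delay bound t, so that
    row[t] >= row[t + 1]. Returns (starts, values), where values[i] is
    valid for starts[i] <= t < starts[i + 1].
    """
    starts, values = [], []
    for t, area in enumerate(row):
        if not values or area != values[-1]:
            starts.append(t)
            values.append(area)
    return starts, values


def lookup(starts, values, t):
    """Return the area for delay bound t using binary search.

    Delay bounds beyond the last breakpoint reuse the last value,
    since a looser bound never requires more area.
    """
    if not starts or t < starts[0]:
        raise ValueError("delay bound below the first stored entry")
    return values[bisect_right(starts, t) - 1]


# Made-up sample row: area per delay bound, non-increasing.
row = [90, 90, 70, 70, 70, 55, 55, 55, 55, 40]
starts, values = compress_row(row)
assert all(lookup(starts, values, t) == row[t] for t in range(len(row)))
```

The compressed form stores one pair per distinct area value instead of one entry per delay bound, which is the kind of saving the monotony of the table makes possible.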
In the next subsection we show that the application of the lemma does not improve the generator on the assumption that every entry of $AREA[m, \cdot, \cdot]$ is a vertex of $G_m$. However, experiments showed that this worst case seems never to occur. In order to present these experiments, we have to work out the key idea of the proof of the lemma above. This leads to the introduction of notions we need in the following. The exact proof of the lemma can be found in [43], [35].
The crucial point of this lemma is the construction of a monotone complete set of paths in $G$. A set $M$ of paths is called complete if the union of all the paths contains the graph. A set $M$ of paths is called monotone if for any two paths $C_i$ and $C_j$ of $M$, the vertices of $C_i$ which are not vertices of $C_j$ "lie on the same side" of $C_j$, i.e., the paths are ordered. Thus, one can apply the bisection principle, i.e., binary search, to $G$. If there are $r$ paths in $M$ and the longest path has $p$ vertices, the search takes $O(\log p \cdot \log r)$ time in the worst case. The search time can be reduced to $O(\log N)$ by the bridged chain method introduced by Edelsbrunner, Guibas and Stolfi [43].
A further improvement is motivated by the observation that the bitlength $n-k$ of the left adder $A_{n-k}(t_{n-k}^{(c)}, t_{n-k}^{(s)})$ is less than the bitlength $k$ of the right adder $A_k(t_k^{(c)}, t_k^{(s)})$ of an $n$-bit POCSTA$_2$ in most cases. Table VI shows the number of $n$-bit POCSTA$_2$s for which the inequality $(*)$ $k \geq n/2 - 1$ does not hold ($20 \leq n \leq 90$), using the SIEMENS library mentioned above. Thus, it seems tolerable to restrict the algorithm to cuts $k \geq n/2 - 1$. This modification of the algorithm has only a small influence upon the trade-off itself. However, it speeds up the actual runtime by a factor of 2.
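The effect of restricting the cuts can be illustrated by counting the candidate cut positions that have to be examined for one bitlength. The Python sketch below rests on assumptions not stated in the text: it assumes that an unrestricted search tries every cut $1 \leq k \leq n-1$, it rounds $n/2 - 1$ up to an integer, and the name `candidate_cuts` is hypothetical.

```python
def candidate_cuts(n, restricted=True):
    """Cut positions k to be tried for an n-bit adder.

    Assumption: unrestricted cuts are 1 <= k <= n - 1. With the
    restriction, only cuts k >= n/2 - 1 (rounded up) are kept, i.e.
    the right adder has at least roughly half of the bits.
    """
    if restricted:
        lowest = max(1, (n + 1) // 2 - 1)
    else:
        lowest = 1
    return range(lowest, n)


if __name__ == "__main__":
    for n in (20, 64, 90):
        full = len(candidate_cuts(n, restricted=False))
        kept = len(candidate_cuts(n))
        print(f"n={n}: {kept} of {full} cuts remain")
```

For $n = 64$ this leaves 33 of 63 candidate cuts, which is in line with the speed-up by roughly a factor of 2 reported above.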
V. Conclusion
We presented a realistic and systematic method for constructing area-time optimal adders of the CST together with efficient complete test sets with respect to a chosen fault model FM.
FM can be chosen as a static (SAFM, CFM) or dynamic (RPDFM) fault model. We have proven that all generated adders can be tested using small test sets. The optimal test set for an adder of the (R)CST can be directly generated from the structure tree of the adder.
We presented various methods to generate the area-time optimal adders. The complexity analysis of the generators and the experimental results showed that it is also possible to generate adders of large bit-lengths. In Fig. 18 the time for the computation on a Sparc 2+ with 32 MB of memory is presented.
The idea of the generation is based on point location. This technique can also be applied to other circuit structures, not only to CST adders. One key point is the monotony of the function AREA in Section IV-C. This property is also valid if we take the interconnection area into account, although the resulting adders might be different. Finally, some heuristics were presented to speed up the runtime further.
