This paper presents the design and implementation of a time-driven adder generator architecture. A large variety of adders has been designed to satisfy different computation requirements; in particular, we list the Carry Look Ahead (CLA) adder, the skip adder, the ripple adder and the carry select adder (CSA). These architectures offer different delays, and it is up to the user to choose among them. The design we present here allows the architecture to be parametrized to fit one's design constraints: from the word length and the desired delay, the generator outputs a suitable architecture.
INTRODUCTION
There exists such a large set of adder generators (Sklansky, 1990), (Bedrij, 1962), (Brent, 1982), (Cavanagh, 1984), (Hwang, 1979), (Muller, 1989), each one implementing a particular architecture, that choosing an adder may be difficult. We present here an alternative intended to replace all the others. We impose a time delay criterion on the generator, which then outputs a matching adder (the architecture of the adder is thus variable).
Moreover, the need for such a generator is justified by the optimization of electrical power consumption and area. Addition units are found in almost all complex circuits. For example, in a general purpose processor, certain adders are allocated to address computation, while others are integrated in floating point units and are most of the time linked to multipliers. The applications are varied, and the constraints (computation time, area, power consumption) differ from one application to another. However, we find almost the same type of adder (the fastest) in such designs, even though this is not always necessary. For example, in address computations a full clock cycle is available to carry out the operation; we can thus relax the computation delay requirement and use a slower adder. This allows a gain in power consumption and silicon area. The generator we designed allows us to fit our performance requirements exactly, with the best possible optimizations.
From the word lengths of the two operands and the computation time, the generator outputs an adder in four different views (structural, behavioral, physical and placement). It also outputs a number of functional patterns. This is illustrated in figure 1.
Figure 1 Generator Interface
The generator is designed following a methodology developed at the MASI laboratory (Aberbour, 1995), (Houelle, 1994), and directly inherited from the silicon compiler approach (Johansen, 1979). The architecture of the tree can in fact take several forms, more or less parallel. To illustrate this we propose an example of an 8-bit adder in three configurations, as depicted in figures 2(a), 2(b) and 2(c).
Figure 2 Alternative adder architectures
The first configuration, figure 2(a), computes the propagation and generation values in a serial fashion and represents a fully sequential adder. This adder contains eight cells and its computation delay is that of eight combinational stages.
The carry anticipation adder, figure 2(b), uses a binary tree to compute the P and G signals. Its delay grows logarithmically (Sklansky, 1990): three stages in our case.
The last configuration, figure 2(c), represents an intermediate adder between the fully serial adder and the carry anticipation adder. It is built up of ten logic cells and presents a delay of four stages. The variable architecture adder generator is thus capable of generating a tree containing both a parallel section and a serial section. For an N-bit adder the generator outputs an operator with a number of stages varying from log2(N) to N.
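As a minimal sketch of the bounds just stated, the following Python function returns the smallest and largest number of stages for an N-bit adder. The name stage_bounds is hypothetical and not part of the generator, and the sketch assumes that log(N) denotes the base-2 logarithm rounded up to an integer.

```python
def stage_bounds(n_bits):
    """Return (min_stages, max_stages) for an adder of n_bits bits.

    Assumption: the fully parallel tree needs ceil(log2(n_bits)) stages and
    the fully serial adder needs n_bits stages, as stated in the text.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    # (n_bits - 1).bit_length() equals ceil(log2(n_bits)) for n_bits >= 1,
    # computed with integers only to avoid floating-point rounding.
    min_stages = max(1, (n_bits - 1).bit_length())
    return min_stages, n_bits


if __name__ == "__main__":
    for width in (8, 32):
        print(width, stage_bounds(width))
```

For the 8-bit example of figure 2 this gives 3 stages for the fully parallel tree and 8 stages for the fully serial adder.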
THE MAIN ADDERS ARCHITECTURES
We start the discussion with the ripple adder. It is the slowest (delay in O(N)), but it occupies the smallest area (in O(N)). It is used where a very small area and low power consumption are needed.
At the opposite end is the Carry Look Ahead adder. Its computation time is in O(log2(N)) and its area is in O(N*log2(N)), which makes this adder the largest in terms of size. It is readily built in a recursive fashion, and this makes it suitable for implementation as a generator. There exists a large set of architectures with intermediate characteristics. A skip adder architecture offers still better performance.
We also find in the literature the carry select adder (Bedrij, 1962). It is broken down into several blocks. Each block carries out two additions in parallel, one anticipating a carry-in of 0 and the other a carry-in of 1. The result is then selected according to the true value of the carry-in. The delay is in ( ) and the area is also in ( )
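The carry selection principle can be sketched in a few lines of Python. This is only an illustrative behavioral model, not the generator's implementation: the function name carry_select_add, the fixed block size and the use of integer arithmetic inside each block are assumptions made for the example.

```python
def carry_select_add(a, b, width=8, block=4):
    """Add two unsigned integers block by block, carry-select style.

    Returns (sum modulo 2**width, carry_out). Assumes width is a multiple
    of block and that a and b fit in width bits.
    """
    if width % block != 0:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    result, carry = 0, 0
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate sums are independent of the incoming carry, so a
        # hardware block can compute them in parallel.
        sum0 = a_blk + b_blk        # anticipates carry-in = 0
        sum1 = a_blk + b_blk + 1    # anticipates carry-in = 1
        # The true carry-in only drives the final selection (a multiplexer).
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(200, 100))
```

In this model the carry only ripples from block to block through the selection step, which is what shortens the critical path compared with a plain ripple adder.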
The addition architecture used is the one introduced by (Sklansky, 1990). It relies on the ∆ operator, which allows the simple computation of the carry propagation and generation functions $P_{i,j}$, $G_{i,j}$, starting from position i up to position j. The properties of this operator are listed below
• Idempotence
• Non Commutativity
The most important property is the way the intermediate propagation and generation functions are computed
which corresponds to ( ) ( )
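As an illustration, the sketch below assumes that the operator is the usual carry operator of parallel-prefix adders, (g, p) ∆ (g', p') = (g + p·g', p·p'), where the left operand covers the more significant bit range. Under that assumption it checks the two listed properties exhaustively on single-bit values. The function name delta is hypothetical, and associativity, which is checked as well, is a property of this assumed operator that the list above does not mention.

```python
from itertools import product


def delta(left, right):
    """Assumed carry operator: (g, p) ∆ (g', p') = (g | (p & g'), p & p').

    left is the (generate, propagate) pair of the more significant span,
    right the pair of the less significant span.
    """
    g_hi, p_hi = left
    g_lo, p_lo = right
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


# All four possible (generate, propagate) pairs over single bits.
PAIRS = list(product((0, 1), repeat=2))

# Idempotence: x ∆ x == x for every pair.
idempotent = all(delta(x, x) == x for x in PAIRS)

# Commutativity fails as soon as one counterexample exists.
commutative = all(delta(x, y) == delta(y, x)
                  for x, y in product(PAIRS, repeat=2))

# Associativity lets the cells be grouped serially or as a tree.
associative = all(delta(delta(x, y), z) == delta(x, delta(y, z))
                  for x, y, z in product(PAIRS, repeat=3))

if __name__ == "__main__":
    print(idempotent, commutative, associative)
```

With this definition, a counterexample to commutativity is (1, 0) ∆ (0, 0) = (1, 0) whereas (0, 0) ∆ (1, 0) = (0, 0).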
Figure 3 The configurable adder architecture (labels: CLA Section, Ripple Section; MSB, LSB)
The chosen architecture must be adjustable according to the imposed computation time. This means that the length of the critical path must vary from one configuration to another. Moreover, since the generator must be able to generate adders with a propagation delay lying between the CLA delay and the ripple delay, the implemented architecture is a hybrid between the CLA (the fastest) and the ripple (the slowest) architectures. Now suppose that the delay constraint forces us to build an adder whose critical path consists of k ∆ cells; the adder will then be as shown in figure 3.
We notice three distinct parts. Let n be the bit width of the adder and k the number of stages (k is also the number of cells in the ripple part of the adder).
The first group of x parallel cells is placed between positions (n-x,k) and (n-1,k); x will be determined later on. The inputs to these cells are
The first values $(P, G)_{i,n-x}$ are generated by the group of ∆ cells situated above the previous group, precisely from the point (n+1-x,k-x+1) to the point (n-1,k-1). It is then sufficient to specify the value of x to obtain a basis for building the adder. We have seen that cells are placed from the point (1,1) to the point (n-x-1,k-1) inclusive. Since these cells are cascaded, i.e. they lie on a diagonal, we have k-1=n-x-1, which yields x=n-k.
Unfortunately, there exists a limiting case which restricts the application domain of the algorithm. The cells of the second group start from position (n+1-x,k-x+1), where k-x+1 is the reference number of the stage, and the topmost stage has number 1. We conclude that k-x+1≥1; with x=n-k we get k≥n/2, or n≤2k.
However, this inequality may be violated. In this case we apply the same process as for building a CLA adder: we instantiate n/2 cells from (n/2+1,k) to (n-1,k), then we elaborate two adders of n/2 bits starting from the stage referenced by the number k-1.
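The construction rule can be summarized by the following Python sketch. It is a simplified model under stated assumptions, not the generator's code: the function name plan_adder and the dictionary layout are hypothetical, the split step assumes an even width, and only the sizes derived in the text (x = n-k when n ≤ 2k, otherwise two n/2-bit adders at stage k-1) are modeled.

```python
def plan_adder(n, k):
    """Describe the hybrid adder for n bits and k stages.

    Returns a nested dict: either the size x of the parallel group
    (case n <= 2k) or the two half-width sub-adders (case n > 2k).
    """
    # Feasible stage counts lie between ceil(log2(n)) and n.
    if k < max(1, (n - 1).bit_length()) or k > n:
        raise ValueError("k must lie between ceil(log2(n)) and n")
    if n <= 2 * k:
        # Diagonal of cascaded cells gives k-1 = n-x-1, hence x = n-k.
        return {"bits": n, "stages": k, "x": n - k}
    if n % 2:
        raise ValueError("the split step of this sketch assumes an even n")
    # Limiting case n > 2k: proceed as for a CLA adder, with two
    # half-width adders starting from stage k-1.
    half = plan_adder(n // 2, k - 1)
    return {"bits": n, "stages": k, "split": [half, half]}


if __name__ == "__main__":
    print(plan_adder(32, 20))
    print(plan_adder(32, 5))
```

For a 32-bit adder, k = 20 gives a parallel group of x = 12 cells, while k = 5 violates n ≤ 2k and is split recursively down to 4-bit blocks with 2 stages each.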
RESULTS
Three different VLSI comparisons have been carried out on a 32-bit adder for each configuration, that is, for every value of k from 5 to 31. For these tests we used the ECPD07 cell library from ATMEL-ES2.
First of all, we focused our comparisons on the routed circuit area. The automatic placement and routing were done with the CADENCE tools. At first glance, the curve shown in figure 4 does not seem meaningful. However, we can distinguish two intervals. In the first, for a number of stages less than 16 (= n/2), the area gain is significant and the curve is steep. Elsewhere, the curve is flat and offers no advantageous zone in which to find the best area-delay compromise for the adder. The curve indicates that the area decays exponentially as the architecture tends to become fully serial. Let us now focus on the propagation time results for the technology used, illustrated in figure 5.
The curve is quasi-ideal because the delay grows linearly with the number of stages, as expected. The measured delays for a 32-bit adder vary from 9 to 30 nanoseconds. The curve obtained for power consumption is smooth and grows exponentially. Once more we can extract two distinct intervals:
a steep, fast-growing part for a number of stages less than 16 (= n/2), and an almost straight line, which represents a not very interesting power consumption-delay compromise.
CONCLUSION
The main goal achieved in this work is the replacement of all possible adder generators by a single generator with a parametrized, time-driven addition architecture. This is possible only if we impose the addition computation time on the generator. The generator provides very good results, since we can adjust the computation time very precisely by using different numbers of intermediate combinational stages. The study of the curves representing the performance of the adders leads to the conclusion that the compromise is optimal for a number of stages less than n/2, where n is the precision of the adder. In this interval, the area and power consumption decrease very rapidly, whereas the delay grows slowly. This means that if we can tolerate increasing the computation time of the addition by about 10%, which is equivalent to increasing the number of stages by a few units, then we can achieve a power consumption and area gain of about 15%. Outside this interval (≥n/2), the area and power decrease slowly, and the gain is negligible.
