Arithmetic adders vary widely in power consumption, delay, and area. To obtain finer-grained trade-offs along the power-delay curve of a binary adder, a heterogeneous adder architecture is adopted. In a heterogeneous adder architecture, a binary adder is decomposed into sub-adder blocks with different carry propagation schemes and bit-widths. The method thus expands the original design space of a specific type of adder into a finer-grained design space by mixing the design spaces of the individual sub-adders. In this paper, a design method for heterogeneous adders is presented that performs power optimization under delay constraints, or delay optimization under power constraints, by determining the bit-width of each sub-adder. The effectiveness of the proposed method is demonstrated by the ratio of the power consumption of the heterogeneous adder to that of a conventional adder.
Introduction
The heterogeneous adder can be described as an adder in which various types of carry propagation adders, such as the ripple carry adder (RCA), carry skip adder (CSKA), and carry look-ahead adder (CLA), are concatenated using the carry-in and carry-out signals of these component adders. [1, 2] Therefore, for an implementation of an n-bit adder, the bit-width of each component adder can be adjusted for an optimum design. In this paper, the architecture of heterogeneous adders is extended to permit design trade-offs between power and delay. The proposed method can also be used for area-related design optimization. An integer linear programming (ILP) based methodology is applied to configure heterogeneous adders. The ILP provides the best type and bit-width of sub-adders for two applications: (i) power-constrained delay optimization and (ii) delay-constrained power optimization. Compared to previous works, [3, 4, 5] the proposed approach provides a higher-level view of arithmetic optimization without considering low-level circuit design issues such as fanout size and wiring complexity. It also facilitates a more systematic optimization method through mathematically modeled delay and power.
Commercial CAD tools such as Synopsys Design Compiler optimize a conventional adder by selecting a specific type of adder from the design library, [6] and gate-level optimization is then conducted to meet the given design constraints. In contrast, the optimization of heterogeneous adders is performed in a mixed design space composed of heterogeneous bit-level component adders, so it provides more flexibility by allowing the power or delay data to be acquired from various sources such as gate-level design, layout design, and numerical estimation. This paper is organized as follows. The architecture of heterogeneous adders and their characteristics are explained in detail in Sec. 2. In Sec. 3, we explain the mathematical modeling and optimization of the heterogeneous adders. Experimental results are presented in Sec. 4 to show the efficiency of the proposed method. Finally, conclusions are drawn in Sec. 5.
Heterogeneous Adders
The conventional adder implementation for a specified bit-width is done by selecting a single type of adder from a given library. Consider Fig. 1, which shows the delay and the power of three different types of adders synthesized by Synopsys tools with a 0.18 µm CMOS library. [9] The delay variation for a given type of adder with a given bit-width is due to synthesis or circuit optimization. It is seen that, for a given bit-width, the delay of RCA is the largest, followed by those of CSKA and CLA, whereas the power consumption of RCA is the smallest, followed by those of CSKA and CLA. A heterogeneous adder is implemented by combining different types of component sub-adders with various bit-widths. Figure 2 shows an example of an architecture for heterogeneous adders. The heterogeneous adder shown in Fig. 2 consists of CLA, CSKA, and RCA with a variable bit-width for each sub-adder. Figure 2 suggests that the overall delay or power consumption is a combination of the individual delays or power consumptions of the different types of sub-adders. For example, when the bit-width is 96, the range of the overall delay or power consumption of heterogeneous adders is determined by the individual bit-width allocation to each type of sub-adder.
Let a Sub-Adder $SA_i(n_i)$ be an $n_i$-bit sub-adder whose carry propagating scheme is denoted by $SA_i$. When the number of available sub-adders is $I$, an n-bit heterogeneous adder is defined as an n-bit adder which concatenates $SA_i(n_i)$, where $1 \le i \le I$ and $0 \le n_i \le n$. $SA_i(n_i)$ uses the carry-out signal of $SA_{i-1}(n_{i-1})$ as its carry-in signal. The sum of $n_i$ over all $SA_i$ ($i = 1, \ldots, I$) should be equal to $n$.
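The definition above can be illustrated with a small behavioral sketch. It models only the carry chaining between concatenated sub-adders; the function name `heterogeneous_add` and its arguments are hypothetical, and the internal carry propagation scheme of each block (RCA, CSKA, or CLA) is deliberately not modeled, since it affects delay and power but not the computed sum.

```python
def heterogeneous_add(a, b, widths, carry_in=0):
    """Add two n-bit integers with concatenated sub-adder blocks.

    widths lists the bit-width n_i of each block from LSB to MSB and is
    expected to sum to n. Blocks with n_i == 0 are skipped (type unused).
    Returns (sum modulo 2**n, final carry-out).
    """
    total, shift, carry = 0, 0, carry_in
    for n_i in widths:
        if n_i == 0:
            continue
        mask = (1 << n_i) - 1
        # Each block adds its slice of the operands plus the carry-in
        # received from the previous (less significant) block.
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        total |= (s & mask) << shift
        carry = s >> n_i          # carry-out feeds the next block
        shift += n_i
    return total, carry


if __name__ == "__main__":
    # 128-bit example split into three blocks of 8, 24, and 96 bits.
    print(heterogeneous_add(2**128 - 1, 1, [8, 24, 96]))
```

Any allocation of bit-widths that sums to n produces the same arithmetic result, which is why the allocation can be chosen purely on power and delay grounds.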
The bit-level implementations of such heterogeneous adders enable extended design space exploration, allowing more fine-grained power-delay trade-offs.
Optimization by ILP

Introduction to ILP
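Many problems can be modeled as maximizing or minimizing an objective subject to limited resources and binding constraints. When the objective function and the constraints can be defined as linear functions of integer variables, the formulation is called "integer linear programming" (ILP). The standard form of integer linear programming is defined as follows.

Definition 1.

$$\text{maximize } c^{T}x \quad \text{subject to } Ax \le b, \quad x \ge 0, \quad x \in \mathbb{Z}^{n} \qquad (1)$$

In the above equations, $A$ is an $m \times n$ matrix with elements $a_{ij}$, $b$ is an $m$-dimensional vector, $c$ and $x$ are $n$-dimensional vectors, and $c^{T}x$ is the inner product of the two vectors $c^{T}$ and $x$. In Eq. (1), $Ax$ is a matrix-vector product.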
ILP has a variety of applications, especially in computer-aided design areas such as scheduling, resource allocation, covering, and matching, [8] and numerous algorithms for solving ILP have already been introduced. Thus, once a target problem has been formulated in ILP form, it can be solved by an available linear program solver such as "CPLEX" or "lp_solve".
Example 1. (Application of ILP to the knapsack problem)
Let us consider the popular "knapsack problem" for ILP formulation. Table 1 lists the objects that can be put in a knapsack, together with the value and the capacity of each object. With the capacity of the knapsack given as 17, we want to maximize the total value of the objects placed in the knapsack. Finding the number of each object to be put in the knapsack under the given condition can be formulated as an ILP. From Definition 1, c = [ … ], and b = 17. Here the variable $x_j$ indicates the number of copies of object $j$ to be put in the knapsack and takes only non-negative integer values. Thus, the acquired ILP formulation for the above knapsack problem is as follows:
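As an illustration of this kind of formulation, the sketch below solves a small knapsack ILP with capacity 17. The object values and capacities used in the example call are hypothetical and are not the entries of Table 1. The function `solve_knapsack_ilp` is an illustrative name, and dynamic programming is used here in place of a general ILP solver.

```python
def solve_knapsack_ilp(values, sizes, capacity):
    """Maximize sum(values[j] * x[j]) subject to
    sum(sizes[j] * x[j]) <= capacity, with x[j] a non-negative integer.

    Returns (best total value, list of counts x[j]).
    """
    # best[w] holds the best (value, counts) achievable with capacity w.
    best = [(0, [0] * len(values)) for _ in range(capacity + 1)]
    for w in range(1, capacity + 1):
        best[w] = best[w - 1]
        for j, (v, s) in enumerate(zip(values, sizes)):
            if s <= w and best[w - s][0] + v > best[w][0]:
                counts = best[w - s][1].copy()
                counts[j] += 1
                best[w] = (best[w - s][0] + v, counts)
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical data: three object types, knapsack capacity b = 17.
    value, x = solve_knapsack_ilp(values=[4, 5, 10], sizes=[3, 4, 7], capacity=17)
    print(value, x)
```

In ILP terms, `values` plays the role of the vector c, `sizes` is the single row of A, and `capacity` is b.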
Problem formulation
With a given order of $I$ types of sub-adders, seeking a solution of the delay-constrained power optimization problem is equivalent to "finding each bit-width $n_i$ for sub-adder $SA_i$ to minimize the power of the n-bit heterogeneous adder, while satisfying the constraint that the total delay should not exceed $\theta_{delay}$", where $\theta_{delay}$ denotes the upper bound of the total delay allowed for the heterogeneous adder. In the case of the power-constrained delay optimization problem, delay is minimized under a power constraint.
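The problem statement can be made concrete with a minimal sketch that replaces the ILP solver by exhaustive enumeration. Everything numeric here is an assumption for illustration: the per-bit power and delay figures are hypothetical, both quantities are assumed to grow linearly with bit-width, and the total delay is simplified to the sum of the sub-adder delays rather than the path-based delay model of the paper.

```python
from itertools import product


def min_power_under_delay(n, theta_delay, power_per_bit, delay_per_bit):
    """Exhaustively search bit-width allocations n_i with sum(n_i) == n.

    power_per_bit / delay_per_bit map a sub-adder type name to a
    hypothetical per-bit cost. Returns (min power, allocation dict),
    or None when no allocation meets the delay bound.
    """
    types = list(power_per_bit)
    best = None
    for head in product(range(n + 1), repeat=len(types) - 1):
        last = n - sum(head)
        if last < 0:
            continue  # allocation exceeds n bits
        alloc = head + (last,)
        delay = sum(delay_per_bit[t] * w for t, w in zip(types, alloc))
        if delay > theta_delay:
            continue  # violates the delay upper bound
        power = sum(power_per_bit[t] * w for t, w in zip(types, alloc))
        if best is None or power < best[0]:
            best = (power, dict(zip(types, alloc)))
    return best


if __name__ == "__main__":
    # Hypothetical per-bit figures (arbitrary units).
    power = {"RCA": 10, "CSKA": 15, "CLA": 25}
    delay = {"RCA": 100, "CSKA": 50, "CLA": 20}
    print(min_power_under_delay(32, 2000, power, delay))
```

Exhaustive search is only practical for a few sub-adder types and moderate bit-widths; it serves here to show what the ILP optimizes, not how a solver works internally.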
ILP formulation
A delay-constrained power optimization can be described by the following expressions:

$$\min\ \mathrm{POWER}(\text{Heterogeneous Adder})$$

under the constraint

$$\mathrm{DELAY}(\text{Heterogeneous Adder}) \le \theta_{delay}$$
In the foregoing expressions, "POWER" and "DELAY" are functions returning the average power consumption and the delay, respectively. The functions "POWER" and "DELAY" are expressed as linear combinations of variables in order to use ILP [10] for finding an optimal solution. The function "POWER" is expressed as follows:

$$\mathrm{POWER}(\text{Heterogeneous Adder}) = \sum_{i=1}^{I} \mathrm{POWER}(i\text{th type Sub-Adder}). \qquad (4)$$

As mentioned previously, the order of the sub-adder types of heterogeneous adders is assumed to be as shown in Fig. 2, and $I$ (= 3) different types of sub-adders $SA_i$, $1 \le i \le 3$, are placed from the least significant bit (LSB) to the most significant bit (MSB).
• Power Modeling: For the $i$th type of sub-adder, $\mathrm{POWER}(SA_i)$ can be expressed as follows:

$$\mathrm{POWER}(SA_i) = \sum_{n_i} P^{SA_i}_{n_i}\, x^{SA_i}_{n_i}$$

where $P^{SA_i}_{n_i}$ is the power consumption of the $i$th type sub-adder with bit-width $n_i$ and $x^{SA_i}_{n_i}$ is a binary integer variable taking values of "0" or "1". The inequality constraint […]

With a heterogeneous adder of a given configuration as shown in Fig. 2, the maximum delay of $SA_1(n_1)$, corresponding to the CLA sub-adder, is $D$ […] Finally, with all the sub-adders RCA, CSKA, and CLA, the delay is given as max{$D$ […] Fig. 2. It is known for such a generalization that the delay of the sum is larger than the delay of the carry for each type of sub-adder.
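The role of the binary selection variables can be sketched as follows. The idea is that a table of measured power values, which need not be linear in the bit-width, becomes a linear expression once one binary variable is attached to each candidate bit-width. The candidate widths and power values below are hypothetical, and the helper names are illustrative; treating an all-zero vector as "sub-adder type not used" is an assumption of this sketch.

```python
def one_hot(selected_width, candidate_widths):
    """Binary vector x with x[k] = 1 only for the selected bit-width.

    A selected_width that is not a candidate (for example 0) yields the
    all-zero vector, interpreted here as the sub-adder type being unused.
    """
    return [1 if w == selected_width else 0 for w in candidate_widths]


def linear_power(power_table, x):
    """POWER(SA_i) as a linear combination sum_k P[k] * x[k]."""
    return sum(p * xk for p, xk in zip(power_table, x))


def encoded_width(candidate_widths, x):
    """Bit-width n_i recovered linearly from x as sum_k width[k] * x[k]."""
    return sum(w * xk for w, xk in zip(candidate_widths, x))


if __name__ == "__main__":
    widths = [2, 4, 8, 16]          # hypothetical candidate bit-widths
    power_uw = [30, 55, 120, 260]   # hypothetical table values
    x = one_hot(8, widths)
    print(x, linear_power(power_uw, x), encoded_width(widths, x))
```

Because both the power and the encoded bit-width are linear in x, they can appear directly in ILP objectives and constraints.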
To implement delay-constrained power optimization fit for ILP, the DELAY (Heterogeneous Adder) is formulated as follows:
From foregoing discussions, the detailed expressions for delay-constrained power optimization are obtained, depending on I value, as follows: 
Similarly, the power-constrained delay optimization is expressed as follows:
I = 3:
In the above expressions, $\theta_{power}$ and $\theta_{delay}$ denote the upper bounds allowed for the average power consumption and the delay, respectively. $d_{max}$ is a variable indicating the upper bound of the delay, which also serves as the minimax objective in power-constrained delay optimization. From the proposed ILP formulation for the delay-constrained power optimization, note that the following proposition holds for an arbitrary $\theta_{delay}$.

Proposition 1. Let …

This is meaningful in that, if we take an upper bound ($\theta_{delay}$ or $\theta_{power}$) from an implementation of a conventional adder, it can be applied to ILP optimization to find optimal configurations of the heterogeneous adder satisfying the upper bound.

Proof. Let us assume that we have a set of adders $CA_1, CA_2, \ldots, CA_n$ such that $\mathrm{DELAY}(CA_1) \ge \mathrm{DELAY}(CA_2) \ge \cdots \ge \mathrm{DELAY}(CA_n)$ and $\mathrm{POWER}(CA_1) \le \mathrm{POWER}(CA_2) \le \cdots \le \mathrm{POWER}(CA_n)$ for all the bit-widths. For example, $CA_1$ can be RCA, $CA_2$ can be CSKA, and $CA_3$ can be CLA. Assume that we take a specific adder $CA_i$ and we have a given upper delay bound $\theta_{delay}$ for the adder $CA_i$. In this case, $\theta_{delay}$ can be expressed as $\mathrm{DELAY}(CA_i) + \delta$, where $\delta$ is a positive or zero number indicating some marginal delay. Then a heterogeneous adder HA built using only $CA_i$ trivially satisfies $\theta_{delay} \ge \mathrm{DELAY}(HA) = \mathrm{DELAY}(CA_i)$. Now we can take $CA_j$ such that $j < i$ (thus, $\mathrm{DELAY}(CA_j) \ge \mathrm{DELAY}(CA_i)$ and $\mathrm{POWER}(CA_j) \le \mathrm{POWER}(CA_i)$) to make a heterogeneous adder by combining $CA_j$ and $CA_i$. Using the marginal delay, we can increase $\mathrm{DELAY}(HA)$ by combining $CA_i$ with the slower adder component $CA_j$ in a heterogeneous adder with a proper bit-width assignment of those adders. The bit-width of $CA_j$ can be selected to meet the condition $\theta_{delay} \ge \mathrm{DELAY}(HA)$. The increased delay, which stays within $\delta$, leads to decreased power consumption since $\mathrm{POWER}(CA_j) \le \mathrm{POWER}(CA_i)$.

Without loss of generality, the proposition can be rewritten with $\theta_{power}$ instead of $\theta_{delay}$ with little modification.
Experiments
The three types of sub-adders RCA, CSKA, and CLA were considered in the power-delay optimization experiments on four bit-widths of heterogeneous adders. The possible bit-widths of the sub-adders range from 2 to 128 bits, since 2 bits is the minimum bit-width needed to distinguish the sub-adder types. The delay and power estimates were acquired from designs synthesized using Synopsys with 0.18 µm CMOS technology. [9] The ILP optimization was performed using an LP solver called "lp_solve". [10] The ILP solver produced the sub-adder concatenations of RCA, CSKA, and CLA (the symbol "_" denotes a concatenation between adjacent sub-adders). For example, in the case of RCA_CSKA_CLA, a CLA was placed starting from the LSB, a CSKA was used in the middle part, and an RCA was placed up to the MSB.
In Fig. 3, the heterogeneous adder architectures obtained by "lp_solve" are shown for the power-constrained and delay-constrained optimization of the four bit-widths of heterogeneous adders. Figures 3(a), 3(c), 3(e), and 3(g) display the power-constrained delay optimization results for the heterogeneous adders with bit-widths of 128, 96, 64, and 32 bits, respectively. The delay-constrained power optimization results of the 128-bit, 96-bit, 64-bit, and 32-bit heterogeneous adders are depicted in Figs. 3(b), 3(d), 3(f), and 3(h), respectively. As the upper bound $\theta_{delay}$ (or $\theta_{power}$) is changed, the corresponding optimal heterogeneous adder configurations are found. Several configurations of the optimized heterogeneous adder are indicated in Fig. 3(a). For instance, RCA97_CSKA31 (17.00/1600), obtained as the result of domination, indicates that the optimal delay of the heterogeneous adder is 17.00 ns when $\theta_{power}$ = 1600 µW, with 97 bits of RCA and 31 bits of CSKA. In these figures, the two values in parentheses are the "delay" and "$\theta_{power}$" pair or the "$\theta_{delay}$" and "power" pair of the corresponding heterogeneous adder. The design space covered by RCA_CSKA_CLA in Fig. 3 contains solutions with various configurations of RCA, CSKA, and CLA. In Fig. 3(a), it is observed that RCA alone gives the optimized delay under a tight upper bound ($\theta_{power}$ = 1490 µW), as expected from Fig. 1. As $\theta_{power}$ increases, the RCA_CSKA configuration becomes the optimal architecture of the heterogeneous adder because it consumes less power than a CSKA-only adder. With the maximum $\theta_{power}$ (= 3240 µW) used for ILP, the CLA must be used to obtain a small delay under the relaxed power constraint. In the middle range of $\theta_{power}$, a combination of RCA, CSKA, and CLA is used to obtain the configurations of the optimized heterogeneous adder.
As indicated in Fig. 3(a), RCA_CSKA is a good candidate ordering for replacing CSKA, and RCA_CSKA_CLA and RCA_CLA are good candidate orderings for replacing CLA in power-constrained delay optimization. It is found that RCA2_CSKA126 (i.e., RCA is used for the 2 most significant bits and CSKA is adopted for the remaining 126 bits) is the optimal configuration for the range 1980 ≤ $\theta_{power}$ < 2320. Figure 3(b), corresponding to delay-constrained power optimization, illustrates similar configurations of the heterogeneous adder. The above explanation of the 128-bit heterogeneous adder configurations also applies to Figs. 3(c)-3(h) for the other bit-widths, except that RCA_CSKA_CLA does not appear in Fig. 3(g), i.e., the power-constrained delay optimization of the 32-bit heterogeneous adder. This is because the design space of RCA_CSKA_CLA is absorbed into that of RCA_CLA: the compromise design space of RCA_CSKA cannot replace that of RCA for a short bit-width such as 32. Therefore, the optimal configuration of the heterogeneous adder can be obtained by either power-constrained delay optimization or delay-constrained power optimization through the ILP formulation and "lp_solve".
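The sweep over upper bounds described above can be sketched as follows for power-constrained delay optimization. This is a toy model, not the experimental setup: the per-bit power and delay figures are hypothetical, both are assumed linear in bit-width, the total delay is simplified to a sum over sub-adders, and exhaustive search stands in for "lp_solve".

```python
def min_delay_under_power(n, theta_power, power_per_bit, delay_per_bit):
    """Return (min delay, (n_rca, n_cska, n_cla)) or None if infeasible.

    power_per_bit and delay_per_bit are 3-tuples of hypothetical per-bit
    costs ordered as (RCA, CSKA, CLA).
    """
    best = None
    for n_rca in range(n + 1):
        for n_cska in range(n - n_rca + 1):
            alloc = (n_rca, n_cska, n - n_rca - n_cska)
            power = sum(p * w for p, w in zip(power_per_bit, alloc))
            if power > theta_power:
                continue  # violates the power upper bound
            delay = sum(d * w for d, w in zip(delay_per_bit, alloc))
            if best is None or delay < best[0]:
                best = (delay, alloc)
    return best


def sweep_power_bounds(n, bounds, power_per_bit, delay_per_bit):
    """Solve one optimization per upper bound, as in a trade-off sweep."""
    return {theta: min_delay_under_power(n, theta, power_per_bit, delay_per_bit)
            for theta in bounds}


if __name__ == "__main__":
    results = sweep_power_bounds(32, range(320, 801, 80),
                                 power_per_bit=(10, 15, 25),
                                 delay_per_bit=(100, 50, 20))
    for theta, result in results.items():
        print(theta, result)
```

In this toy model the tightest feasible bound selects an all-RCA adder and the loosest selects an all-CLA adder, with mixed configurations in between, which mirrors the qualitative trend reported for Fig. 3(a).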
The bit-widths of the sub-adders shown in Fig. 3 clearly indicate that the heterogeneous adder allows much better power-delay trade-offs than the conventional adder design. The solutions with RCA_CSKA, RCA_CSKA_CLA, and RCA_CLA are newly introduced design points. These newly introduced Pareto-optimal points cannot be obtained without using the heterogeneous adder architecture. Figure 4 shows the delay and power reductions in percent. The reduction of delay and power is calculated with respect to the delay and the power of a conventional adder matched to $\theta_{power}$ or $\theta_{delay}$. For example, in the case of RCA_CSKA for delay optimization, the component adder RCA, which incurs the larger delay while satisfying the power upper bound, is used as the 128-bit, 96-bit, 64-bit, and 32-bit adder to compute the reference delay. In Fig. 4(a), $DR4_{MAX}$, $DR3_{MAX}$, $DR2_{MAX}$, and $DR1_{MAX}$ denote the maximum delay reduction of the heterogeneous adders with bit-widths of 128, 96, 64, and 32, respectively. It is observed that a $DR4_{MAX}$ of around 68% is achieved by the RCA_CSKA configuration when $\theta_{power}$ = 3200 µW. As the bit-width becomes smaller, $DRi_{MAX}$ also gets lower, with $DR1_{MAX}$ ≈ 54%.
Exploration of Power-Delay Trade-Offs with Heterogeneous Adders by ILP 799
However, it is noteworthy that the difference among the power reductions of the four bit-widths of heterogeneous adders is negligible, since they differ from each other by less than 1%. The maximum power reduction of the four bit-widths of heterogeneous adders, $PR_{MAX}$, is around 40%.
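The reduction percentages discussed above can be written compactly under one assumption: that each reduction is the relative difference between the conventional reference adder and the heterogeneous adder. The symbols $D_{conv}$, $D_{HA}$, $P_{conv}$, and $P_{HA}$ are introduced here only for illustration and denote the delay and power of the matched conventional adder and of the heterogeneous adder, respectively.

```latex
DR = \frac{D_{conv} - D_{HA}}{D_{conv}} \times 100\,\%,
\qquad
PR = \frac{P_{conv} - P_{HA}}{P_{conv}} \times 100\,\%
```

Under this reading, $DRi_{MAX}$ and $PR_{MAX}$ are the largest values of $DR$ and $PR$ over the swept upper bounds for a given bit-width.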
Finally, Table 2 shows the maximum running time of "lp_solve" over the optimizations with the whole range of upper bounds. "lp_solve" was run once for each ILP optimization with a specific $\theta_{delay}$ or $\theta_{power}$, and its running time depended on the upper bound used for each ILP optimization. The experiments were performed on an Intel Pentium 2.4 GHz CPU running Linux. For example, the maximum running time of "lp_solve" for the 128-bit delay-constrained power optimizations is 3.04 s over the 174 runs of "lp_solve". These results show the high efficiency of the proposed method even for a considerably large bit-width such as 128.
The experimental results in Fig. 3 present the delay-constrained power optimization and the power-constrained delay optimization of heterogeneous adders of various bit-widths. In addition, the power reduction relative to a conventional adder and the running time of "lp_solve" for an ILP formulation indicate the feasibility of the proposed method for real applications. As shown in Fig. 4, the delay reduction increases as the bit-width of the heterogeneous adder becomes larger, whereas the power reduction is almost identical across the different bit-widths of the heterogeneous adder.
Conclusions
For designing power-delay efficient adders, a heterogeneous adder architecture is adopted and an ILP formulation of the power and delay of the heterogeneous adder is introduced in this paper. Considering four adder bit-widths (128, 96, 64, and 32), it is shown that ILP finds multiple configurations of the heterogeneous adder. In comparison with conventional, homogeneous adders, the heterogeneous adders are more useful in that significant reductions in delay and power consumption can be achieved with a small optimization time.
