Power-aware Design of Logarithmic Prefix Adders in Sub-threshold Regime: A Comparative Analysis  by Gupta, Priya et al.
 Procedia Computer Science  46 ( 2015 )  1401 – 1408 
1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the International Conference on Information and Communication Technologies (ICICT 2014)
doi: 10.1016/j.procs.2015.02.058 
International Conference on Information and Communication Technologies (ICICT 2014) 
Power-Aware Design of Logarithmic Prefix Adders in Sub-threshold Regime: A Comparative Analysis
Priya Gupta a,*, Anu Gupta b, Abhijit Asati c
a Research Scholar, EEE Dept, BITS Pilani, 333031, India
b Associate Professor, c Assistant Professor, EEE Dept, BITS Pilani, 333031, India
Abstract 
This paper presents the design and comparative analysis of Han-Carlson and Kogge-Stone adders in the sub-threshold regime using
three different hybrid logic families. The performance metrics considered for the analysis of the adders are power, delay and
power delay product (PDP). Simulation studies are carried out for 8-, 16-, 32- and 64-bit input data widths. The proposed circuits
are shown to be energy efficient in Spectre simulations using BSIM3v3 and BSIM4 models for 90 nm CMOS technology at a 0.4 V
supply voltage. The proposed adder implementations outperform their counterparts, exhibiting lower power consumption and
smaller propagation delay than conventional adders operated in the sub-threshold region.
 
 
Keywords: Power delay product (PDP); Reverse body biasing (RBB); Pass transistor (PT); Transmission gate (TG); Kogge-Stone (KS); Han-Carlson (HC)
 
1. Introduction 
In many VLSI systems such as DSPs, microprocessors and other application-specific architectures, binary adders
are the fundamental arithmetic units. To design high-performance, energy-efficient arithmetic circuits using
cell-based VLSI design, designers rely on energy-efficient fast adder architectures. Logarithmic prefix adder topologies
 
 
* Corresponding author. Tel.: +91 8823970682; fax: +91 9899156015.
E-mail address: priya.gupta@pilani.bits-pilani.ac.in 
 
such as the Kogge-Stone (KS), Brent-Kung and Han-Carlson (HC) adders offer a highly efficient solution to the
binary addition problem and provide low computation delay and low power in the sub-threshold regime. Many papers
have been published on optimizing the power, performance per watt and silicon area of sub-threshold adders,
which underlines the importance of this area.
 
 Megha Talsania et al. [1] investigated the performance of six different parallel prefix adders
implemented using four different TSMC technology nodes. Based on their simulation studies, the KS adder has the
best performance among all the adders for all the input data widths considered. Hoang Q. Dao et al. [2] carried out
a transistor-level analysis of different 64-bit adder topologies using the method of logical effort and validated it by
H-SPICE circuit simulation in a 0.18 μm Fujitsu technology at a 1.8 V power supply. Their simulation results show
that the KS adder gave the best results among all. T. Han et al. [3] presented a new algorithm for prefix computation
which gives a fast, area-efficient binary adder. G. Dimitrakopoulos et al. [4] proposed a novel bit-level algorithm
to implement energy-efficient parallel-prefix adders using the static CMOS logic family. Their experimental results
show that the proposed adders achieve significant power reductions while maintaining the same operation speed as
conventional parallel-prefix adders. Deepa Yagain et al. [5] presented an overall analysis of the design and
comparison of high-speed parallel-prefix adders using both static CMOS logic and transmission gate (TG) logic.
In that comparison, the KS Ling adder performs much more efficiently than the other adders.
 
 On the other hand, several optimization approaches have been proposed to reduce the overall power
consumption of VLSI circuits, either by manipulating the gate sizing (W/L) at the cost of an increase in the maximum
delay of the circuit, or by using lower supply voltages in non-critical paths. Different arithmetic
algorithms have also been proposed to improve computational efficiency in terms of performance, power,
PDP and regularity of structure. The novelty of the proposed approach is that it directly reduces the overall PDP of
logarithmic prefix adders through the use of hybrid logic families, which include the reverse body biasing (RBB)
scheme, in the sub-threshold regime. The logarithmic prefix adders chosen for this analysis are the radix-2 KS and
radix-2 HC adders. The performance metrics considered for these two adders are power, delay and PDP, using hybrid
pass-transistor (PT) logic, hybrid transmission gate (TG) logic and static CMOS logic with and without the RBB
scheme. Simulation studies are carried out for 8-, 16-, 32- and 64-bit input data widths using UMC 90 nm
CMOS technology at a 0.4 V supply voltage.
Section 2 describes the impact and use of hybrid logic families in the sub-threshold regime. Section 3 gives some
background information on logarithmic prefix adders, while Section 4 gives a detailed analysis of the low-power
logarithmic prefix adder architectures. In Section 5, simulation results and their comparison are presented. Finally,
Section 6 gives the concluding remarks.
 
2. An Analysis of Basic Logic Gates using Competing Hybrid Logic Families: Reverse Body Biasing Effects
 Dynamic logic styles are often a favourable choice in the design of high-speed digital electronic modules, but
not for low-power circuit implementations, because of their higher toggle rate compared with static logic. In this
paper, the RBB scheme is analysed in detail and its effect on the device response is observed at the gate level.
 This paper focuses on the static logic style, which is suitable for low-power implementation of arbitrary
combinational circuits. In order to compare performance and power dissipation in the sub-threshold regime across the
competing hybrid logic families, namely static CMOS [8,9], PT [10,11], CPL [12], SRPL [8], DPL [6] and TG [7] logic, we
first implemented basic gates (AND, OR and XOR) with and without the RBB scheme. The most energy-efficient
basic gates are then used to design the KS and HC adder circuits. The circuit realization of these basic logic gates
with the competing logic families is shown in Fig. 1. The technology used for implementation of the basic gates is a
90 nm CMOS technology at a 0.3 V power supply.
 
 
 
 
 
Fig. 1. Circuit Realization of Basic Logic Gates with Competing Logic Families [6-12]
 
 After extensive research, it was found that the swapped, or reverse, body biasing (RBB) scheme works
efficiently and is well suited to further improving sub-threshold arithmetic circuits. The body biasing scheme
manages the threshold voltage (Vth) and the power supply voltage (Vdd) at the same time. Effective body biasing
techniques can also adjust the transistor Vth to compensate for temperature fluctuation, maintaining uniform
performance and thus adjusting the leakage current. In the conventional configuration, the bulk terminals of the
NMOS and PMOS devices are tied to ground and to the power supply voltage (Vdd), respectively. In the RBB scheme,
the bulk terminals of the two devices are interchanged, i.e. the PMOS bulk is tied to ground and the NMOS bulk is
tied to Vdd. Due to this interchange, there is a noticeable increase of the drain current, leading to higher switching
speeds and less power dissipation. On the other hand, interchanging the bulk terminal connections in the
above-threshold regime degrades the delays significantly. The ability to increase sub-threshold conduction currents
in this way is assessed by examining the OFF current (IOFF) with modified bulk potential.
 In [13], a test chip of a TCP processor core with the RBB scheme was designed and compared with the
conventional body biasing scheme in 90 nm technology. The overall result shows that the PDP of the CMOS logic
gates with the RBB scheme is higher than with the conventional body biasing scheme at low voltages. The
principal design objective is to minimize the overall power delay product (PDP). PDP is a good measure for
comparing the logic styles used in low-power, energy-efficient digital systems in the sub-threshold regime.
Table 1 gives a measured PDP summary for optimized logic gates and an overall comparison of the different
logic families.
 
[Fig. 1 layout: columns AND Gate, OR Gate, XOR Gate; rows Static CMOS, PT, CPL, SRPL, DPL, TG.]
 
Table 1. PDP summary for optimized logic gates of different logic families [14]

Logic family        AND         OR          XOR
Static CMOS+RBB     1.92E-17    2.63E-17    6.23E-16
TG+RBB              4.95E-18    9.46E-18    4.51E-16
PT+RBB              3.66E-18    5.86E-18    2.29E-16
SRPL+RBB            3.43E-17    2.45E-17    1.16E-15
CPL+RBB             6.84E-17    6.16E-17    1.83E-15
DPL+RBB             1.65E-16    1.85E-17    7.87E-15

(a) Without RBB Scheme    (b) With RBB Scheme
 
 From Table 1, the cumulative result is that hybrid PT/TG (conventional PT/TG with the RBB scheme),
static CMOS, SRPL and DPL are the most energy-efficient logic families that can be used to implement
arithmetic circuits in the sub-threshold regime, whereas CPL shows the least efficient result of all.
The comparison among logic gates illustrates that the XOR gate with the RBB scheme gives more efficient results
than the conventional XOR gate, whereas the AND gate with the RBB scheme showed energy-efficient results in
all PT/TG logic families but not in the static CMOS logic family. The OR gate does not show any effect with the
RBB scheme, as the values are similar in both cases.
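
The per-gate effect of the RBB scheme can be checked directly from the Table 1 values. The following Python sketch is an illustration only: it copies the PDP entries of Table 1, taking the rows labelled "+RBB" as the with-RBB case and the unlabelled rows as the without-RBB case. The names `PDP_NO_RBB`, `PDP_RBB` and `rbb_ratio` are introduced here and do not come from the paper. A ratio below 1 means the RBB variant has the lower PDP.

```python
# PDP values copied from Table 1; tuple order is (AND, OR, XOR).
GATES = ("AND", "OR", "XOR")

PDP_NO_RBB = {
    "Static CMOS": (8.23e-18, 1.80e-17, 7.25e-16),
    "TG": (5.85e-18, 5.41e-18, 9.43e-16),
    "PT": (1.13e-17, 2.04e-18, 3.97e-16),
    "SRPL": (2.79e-17, 2.47e-17, 1.45e-15),
    "CPL": (6.89e-17, 6.17e-17, 2.45e-15),
    "DPL": (1.80e-17, 2.34e-17, 1.01e-15),
}

PDP_RBB = {
    "Static CMOS": (1.92e-17, 2.63e-17, 6.23e-16),
    "TG": (4.95e-18, 9.46e-18, 4.51e-16),
    "PT": (3.66e-18, 5.86e-18, 2.29e-16),
    "SRPL": (3.43e-17, 2.45e-17, 1.16e-15),
    "CPL": (6.84e-17, 6.16e-17, 1.83e-15),
    "DPL": (1.65e-16, 1.85e-17, 7.87e-15),
}


def rbb_ratio(family: str, gate: str) -> float:
    """PDP with RBB divided by PDP without RBB for one gate of one family."""
    k = GATES.index(gate)
    return PDP_RBB[family][k] / PDP_NO_RBB[family][k]


if __name__ == "__main__":
    for family in PDP_NO_RBB:
        ratios = [round(rbb_ratio(family, g), 2) for g in GATES]
        print(f"{family:12s}", dict(zip(GATES, ratios)))
```

Printing the ratios per family and gate makes it easy to see where RBB lowers the PDP and where it does not.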
3. Logarithmic Adder 
 The slow speed and high power dissipation of carry-lookahead adders have led to the implementation
of logarithmic prefix-based adders, particularly where a large number of adders is required. Small groups of
intermediate prefixes compute the carry first, and then larger groups of prefixes compute the carry, until all the
carry bits have been computed. In prefix-based adders, the carry computation scheme significantly increases the
speed of the adder at the expense of increased complexity; the delay increases with order log_b(n), where b is the
radix and n is the number of bits per input.
The final addition is performed in three stages. In the first stage, also known as the pre-computation
logic stage, the generate, propagate and temporary sum signals of the two input operands Ai and Bi are
computed bitwise. For any given bit position, the generate (Gi), propagate (Pi) and temporary sum (Ti) signals
are defined as follows:
\[ G_i = A_i \cdot B_i; \quad P_i = A_i + B_i; \quad T_i = A_i \oplus B_i \]
where ·, ⊕ and + represent the AND, XOR and OR operations, and i is an integer with 0 ≤ i ≤ n − 1. Notice that the
propagate, generate and temporary sum signals depend only on the input bits and thus will be valid after one gate
delay.
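
As a small illustration of the pre-computation stage, the Python sketch below evaluates the three signals bit by bit. It assumes the OR form of the propagate signal, as the operator list in the text suggests, and the function name `precompute` is introduced here only for illustration.

```python
def precompute(a: int, b: int, n: int):
    """Stage 1: bitwise generate, propagate and temporary-sum signals.

    Returns three lists indexed by bit position (index 0 is the LSB).
    """
    G, P, T = [], [], []
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        G.append(ai & bi)  # G_i = A_i AND B_i
        P.append(ai | bi)  # P_i = A_i OR  B_i (assumed form of propagate)
        T.append(ai ^ bi)  # T_i = A_i XOR B_i
    return G, P, T


if __name__ == "__main__":
    G, P, T = precompute(0b1011, 0b0110, 4)
    print("G =", G)  # LSB first
    print("P =", P)
    print("T =", T)
```

Each signal depends only on the two input bits at the same position, which is why this stage costs a single gate delay regardless of the word length.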
To avoid waiting for a rippling carry, group generate and group propagate signals are produced by two special
operators, dot (•) and semi-dot (◦), in the second stage (known as the prefix tree logic block), which in turn
increases the speed and reduces the delay. The final expressions of the dot and semi-dot operators are given in
equation (1) and equation (2), respectively [15].
     
\[ (P_i, G_i) \bullet (P_{i-1}, G_{i-1}) = (P_i \cdot P_{i-1},\; G_i + P_i \cdot G_{i-1}) \tag{1} \]
\[ (P_i, G_i) \circ (P_{i-1}, G_{i-1}) = G_i + P_i \cdot G_{i-1} \tag{2} \]
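
The two operators can be written as small Python functions on (P, G) pairs. This is a sketch under the pair ordering (P, G) used in equations (1) and (2); the function names `dot` and `semi_dot` are chosen here for illustration.

```python
def dot(hi, lo):
    """Dot operator of equation (1): merge two adjacent (P, G) groups.

    `hi` is the more significant group, `lo` the less significant one.
    """
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


def semi_dot(hi, lo):
    """Semi-dot operator of equation (2): only the group generate is kept."""
    p_hi, g_hi = hi
    _, g_lo = lo
    return g_hi | (p_hi & g_lo)


if __name__ == "__main__":
    # The upper group propagates (P=1, G=0) and the lower group generates (P=0, G=1).
    print(dot((1, 0), (0, 1)))       # -> (0, 1)
    print(semi_dot((1, 0), (0, 1)))  # -> 1
```

The dot operator is associative, which is what allows the same carries to be computed by differently shaped prefix trees.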
 
This stage is the major functional unit, with a dense arrangement of computations and logic functions that produce
the carry signals. The complexity of this stage has a major impact on power consumption. In the final stage, known
as the post-computation logic block, the final sum and carry-out are computed according to the following
equations.
  
\[ S_i = T_i \oplus G_{i-1:-1} \tag{3} \]
\[ C_{out} = G_{n-1} + P_{n-1} \cdot G_{n-2:-1} \tag{4} \]

Table 1 (a). PDP summary for optimized logic gates without RBB scheme

Logic family    AND         OR          XOR
Static CMOS     8.23E-18    1.80E-17    7.25E-16
TG              5.85E-18    5.41E-18    9.43E-16
PT              1.13E-17    2.04E-18    3.97E-16
SRPL            2.79E-17    2.47E-17    1.45E-15
CPL             6.89E-17    6.17E-17    2.45E-15
DPL             1.80E-17    2.34E-17    1.01E-15

 
                                                                                                      
where −1 is the position of the carry-input and G_{i:-1} = C_i.
All logarithmic prefix structures can be implemented with the equations above; however, equations (1) and (2) can
be applied in various orders that yield the same correct carries, which leads to a variety of parallel prefix trees.
The first and last stages involve only simple operations on signals local to each bit position, so these stages are
intrinsically fast. The general diagram of an n-bit logarithmic parallel-prefix structure is shown in Fig. 2.
 
  
Fig. 2.   N-bit logarithmic Parallel-Prefix Structure 
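
The three stages can be sketched end to end in Python. This is a minimal illustration, assuming the OR form of the propagate signal and evaluating the prefix stage serially, which is the simplest valid ordering of the dot operator and not a logarithmic tree. The function name `prefix_add` and the `cin` parameter are introduced here for illustration.

```python
def prefix_add(a: int, b: int, n: int, cin: int = 0):
    """n-bit addition in three stages; returns (sum, carry_out)."""
    # Stage 1: pre-computation, local to each bit position.
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    G = [x & y for x, y in zip(A, B)]
    P = [x | y for x, y in zip(A, B)]
    T = [x ^ y for x, y in zip(A, B)]

    # Stage 2: group generates. carry[k] holds G_{k-1:-1}, the carry into
    # bit k; position -1 is the carry-input. Serial evaluation is used here.
    carry = [cin]
    for i in range(n):
        carry.append(G[i] | (P[i] & carry[i]))

    # Stage 3: post-computation, S_i = T_i XOR G_{i-1:-1}.
    s = 0
    for i in range(n):
        s |= (T[i] ^ carry[i]) << i
    cout = carry[n]  # equals G_{n-1} + P_{n-1} * G_{n-2:-1}
    return s, cout


if __name__ == "__main__":
    s, cout = prefix_add(0b1011, 0b0110, 4)
    print(bin(s), cout)
```

Replacing the serial loop in stage 2 by a parallel prefix tree changes only the order in which the group generates are combined, not the resulting sum.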
4. Low Power Logarithmic Prefix Adders Architectures   
 In this paper, the radix-2 KS and radix-2 HC logarithmic prefix adder architectures have been chosen for the
low-power analysis. These logarithmic prefix adders compute the addition in two steps: one to obtain the carry at
each bit, and the next to compute the sum bit based on the carry bit. The HC logarithmic prefix adder is a hybrid of
the Kogge-Stone construction, which has log2(n) stages, and the Brent-Kung construction, which has
2·log2(n) − 1 stages. Combining the two provides reasonably high speed at lower complexity. The internal
architecture of the HC and KS adders, shown in Fig. 3, reveals that for the same word size the HC design has one
more prefix computation stage (logic level) than the KS design, whereas at the transistor level the number of
prefix operations is smaller in the HC design than in the KS design. Thus, the HC adder reduces the area used by
the adder circuitry in return for one extra stage of delay compared with the KS adder. The HC prefix tree can be
viewed as a sparse version of the KS prefix tree. In fact, the fan-out at all logic levels is the same (i.e. 2).
Comparing the simplest prefix trees of the KS and HC adders shows that the pseudo-code for the Kogge-Stone
structure can easily be modified to build an HC prefix tree. The major difference is that in each logic level the HC
prefix tree places cells at every other bit, and the last logic level accounts for the missing carries. Moreover, these
low-power logarithmic adders have been implemented for different data widths using static CMOS with and without
the RBB scheme, hybrid TG and hybrid PT logic families. The detailed results for the different hybrid logic families
are discussed in the next section.
 
 
 
 
 
Fig. 3. (a) Han-Carlson (HC) adder; (b) Kogge-Stone adder [16]
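
As a rough illustration of how the Kogge-Stone loop can be modified into a Han-Carlson tree, the following Python sketch builds both prefix trees over (P, G) pairs and counts logic levels and prefix cells. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the carry-input is taken as 0, and n is assumed to be a power of two.

```python
def dot(hi, lo):
    """Prefix (dot) operator on (P, G) pairs."""
    return (hi[0] & lo[0], hi[1] | (hi[0] & lo[1]))


def kogge_stone(pg):
    """Return (prefixes, levels, cells) for a Kogge-Stone prefix tree."""
    n, levels, cells, d = len(pg), 0, 0, 1
    while d < n:
        nxt = list(pg)
        for i in range(d, n):  # a cell at every bit position i >= d
            nxt[i] = dot(pg[i], pg[i - d])
            cells += 1
        pg, d, levels = nxt, d * 2, levels + 1
    return pg, levels, cells


def han_carlson(pg):
    """Same loop, but cells only at odd positions, plus one final level."""
    n, levels, cells, d = len(pg), 0, 0, 1
    while d < n:
        nxt = list(pg)
        for i in range(1, n, 2):  # every other bit only
            if i - d >= 0:
                nxt[i] = dot(pg[i], pg[i - d])
                cells += 1
        pg, d, levels = nxt, d * 2, levels + 1
    nxt = list(pg)
    for i in range(2, n, 2):  # last level fills in the missing even carries
        nxt[i] = dot(pg[i], pg[i - 1])
        cells += 1
    return nxt, levels + 1, cells


def carries(a, b, n, tree):
    """Carry out of each bit position (carry-in 0), plus level and cell counts."""
    pg = [(((a >> i) | (b >> i)) & 1, ((a >> i) & (b >> i)) & 1) for i in range(n)]
    out, levels, cells = tree(pg)
    return [g for _, g in out], levels, cells


if __name__ == "__main__":
    for name, tree in (("Kogge-Stone", kogge_stone), ("Han-Carlson", han_carlson)):
        _, levels, cells = carries(0, 0, 8, tree)
        print(name, "levels:", levels, "cells:", cells)
```

For n = 8, this sketch gives 3 levels and 17 cells for Kogge-Stone and 4 levels and 12 cells for Han-Carlson, in line with the one extra stage and the smaller number of prefix operations described above.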
5. Experimental Results  
 This section gives the overall performance achieved by a combination of two techniques, one at the circuit
level using sub-threshold design and another at the architectural level using hybrid logic families, which include
the RBB technique. Fig. 4 gives the overall PDP comparison graphs between the Kogge-Stone and Han-Carlson
adders with competing logic families in the sub-threshold regime.
 
[Fig. 4 panels: 8b, 16b, 32b and 64b. Each panel plots PDP (axis ticks 1.00E-14, 1.00E-13, 1.00E-12) of the KSA and Han-Carlson adders for Static CMOS, Static CMOS + RBB, PT+RBB and TG+RBB.]

 Fig. 4. PDP comparison between 8b, 16b, 32b and 64b Kogge-Stone and Han-Carlson adders with competing logic families
 
 
 
 
The performance metrics considered for these two proposed adders are power, delay and PDP, using hybrid
pass-transistor logic, hybrid transmission gate logic and static CMOS logic with and without the RBB scheme.
Simulation studies are carried out for 8-, 16-, 32- and 64-bit input data widths using UMC 90 nm CMOS technology
at a 0.4 V supply voltage. From the graphs, it is observed that hybrid PT logic and static CMOS logic with the RBB
scheme have little effect. Therefore, only hybrid TG and static CMOS logic are considered for the comparison.
Table 2 summarizes the power, delay and PDP of the HC and KS adders using hybrid logic families in the
sub-threshold regime. The results are compared with existing published results. As expected, the proposed adders
with hybrid logic consistently give the best results in the sub-threshold regime. The KS and HC adders using
static CMOS logic exhibit the best performance in terms of power, delay and PDP for both the 8- and 16-bit adder
categories, whereas for 32- and 64-bit input data widths hybrid TG logic has better circuit characteristics than
conventional logic families.
 
            Table 2. Analyzed Results 
 
6. Conclusions 
The main focus of this paper was to find the best logic families for the low-power implementation of
logarithmic prefix adders in the sub-threshold regime. The overall PDP improvement is achieved by the combination
of two techniques, one at the circuit level (the sub-threshold design technique) and another at the architectural
level (hybrid logic families combined with the RBB scheme). The simulation results show that for low bit-width
operands (i.e. 8b and 16b) the proposed static CMOS logic family gives the most energy-efficient results, with
approximately 63.7% and 29.24% PDP improvement in radix-2 KS adders and 32.69% and 43.64% PDP improvement in
radix-2 HC adders. On the other hand, for higher bit-width operands (i.e. 32b and 64b) the proposed hybrid TG logic
family is better, with approximately 40.59% and 58.99% PDP improvement in the radix-2 KS adder and 3.23% and
15.32% PDP improvement in the radix-2 HC adder. These results show that the choice of logic family greatly affects
the PDP of the circuits.
Table 2. Analyzed Results

                                     Kogge Stone Adder                    Han Carlson Adder
Logic Family           Bits [Ref]    Power      Delay      PDP            Power      Delay      PDP
Static CMOS            8 [5]         4.13E-3    2.18E-11   9.01E-14       10.8E-3    61.6E-9    6.65E-10
                       16 [5]        7.69E-3    2.87E-11   2.20E-13       13.5E-3    82.2E-9    1.11E-9
                       32 [5]        13.6E-3    3.01E-11   4.09E-13       13.9E-3    104.8E-9   1.45E-9
                       64 [13]       172E-3     1.40E-09   2.40E-10       47E-3      1.5E-9     7.05E-11
TG                     8 [5]         1.87E-3    2.33E-10   4.35E-13       1.91E-3    60.1E-9    1.14E-10
                       16 [5]        5.27E-3    2.38E-10   1.25E-12       6.41E-3    81.33E-9   5.21E-10
                       32 [5]        10.3E-3    2.70E-10   2.78E-12       9.82E-3    100.3E-9   9.84E-10
Proposed Static CMOS   8             1.92E-8    0.26E-6    4.99E-15       1.62E-8    3.95E-7    5.93E-15
                       16            0.44E-7    0.479E-6   21.07E-15      3.66E-8    4.49E-7    1.64E-14
                       32            1.03E-7    1.14E-6    1.17E-13       8.11E-8    5.34E-7    4.33E-14
                       64            2.34E-7    1.618E-6   0.378E-12      1.77E-7    6.19E-7    1.09E-13
Proposed Hybrid TG     8             2.03E-7    67.67E-9   13.77E-15      1.99E-7    4.42E-8    8.81E-15
                       16            4.07E-7    73.19E-9   29.78E-15      4.15E-7    7.01E-8    2.91E-14
                       32            8.832E-7   78.7E-9    69.50E-15      8.38E-7    5.00E-8    4.19E-14
                       64            1.905E-6   81.4E-9    0.155E-12      1.73E-6    5.31E-8    9.23E-14
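
The improvement figures quoted in the conclusions appear to be reproducible from the "Proposed" rows of Table 2 as the relative PDP difference between the two proposed logic families, that is, 100·(PDP_other − PDP_best)/PDP_other. The Python sketch below is a check under that assumption and not the authors' stated procedure; the dictionary and function names are illustrative.

```python
# PDP values copied from the "Proposed" rows of Table 2, keyed by bit width.
PROPOSED_PDP = {
    "KS": {
        "static": {8: 4.99e-15, 16: 21.07e-15, 32: 1.17e-13, 64: 0.378e-12},
        "tg": {8: 13.77e-15, 16: 29.78e-15, 32: 69.50e-15, 64: 0.155e-12},
    },
    "HC": {
        "static": {8: 5.93e-15, 16: 1.64e-14, 32: 4.33e-14, 64: 1.09e-13},
        "tg": {8: 8.81e-15, 16: 2.91e-14, 32: 4.19e-14, 64: 9.23e-14},
    },
}


def pdp_improvement(adder: str, bits: int):
    """Return (better family, percent PDP improvement over the other family)."""
    static = PROPOSED_PDP[adder]["static"][bits]
    tg = PROPOSED_PDP[adder]["tg"][bits]
    best, other, name = (static, tg, "static") if static < tg else (tg, static, "tg")
    return name, 100.0 * (other - best) / other


if __name__ == "__main__":
    for adder in ("KS", "HC"):
        for bits in (8, 16, 32, 64):
            name, pct = pdp_improvement(adder, bits)
            print(f"{adder} {bits:2d}b: {name:6s} better by {pct:.2f}%")
```

Under this assumption the computed percentages agree with the quoted figures to within rounding, with static CMOS ahead at 8 and 16 bits and hybrid TG ahead at 32 and 64 bits.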
References 
1. Megha Talsania, Eugene John, A Comparative Analysis of Parallel Prefix Adders, 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, 2009, p. 281-286.
2. Hoang Q. Dao, Vojin G. Oklobdzija, Performance Comparison of VLSI Adders Using Logical Effort, 12th International Workshop, PATMOS 2002, Seville, Spain, 2002, p. 25-34.
3. T. Han, D. A. Carlson, Fast Area-Efficient VLSI Adders, 8th IEEE Symposium on Computer Arithmetic, Como, Italy, 1987, p. 49-56.
4. P. M. Kogge, H. S. Stone, A parallel algorithm for the efficient solution of a general class of recurrence equations, IEEE Trans. on Computers, 1973; 22: 786-792.
5. Deepa Yagain, Vijaya Krishna A, Akansha Baliga, Design of High-Speed Adders for Efficient Digital Design Blocks, International Scholarly Research Network, ISRN Electronics, 2012, p. 1-12.
6. Zahi Moudallal, Ibrahim Issa, Mohammad Mansour, Ali Chehab, Ayman Kayssi, A Low-Power Methodology for Configurable Wide Kogge-Stone Adders, International Conference on Energy Aware Computing (ICEAC), 2011, p. 1-5.
7. Akansha Baliga, Deepa Yagain, Design of High speed adders using CMOS and Transmission gates in Submicron Technology: A Comparative Study, Fourth International Conference on Emerging Trends in Engineering & Technology, 2011, p. 284-289.
8. R. Zimmermann, W. Fichtner, Low-power logic styles: CMOS versus pass-transistor logic, IEEE J. Solid-State Circuits, 1997; 32: 1079-1090.
9. U. Ko, P. T. Balsara, W. Lee, Low-power design techniques for high-performance CMOS adders, IEEE Transactions On Very Large Scale Integration (VLSI) Systems, 1995; 3: 327-333.
10. D. Markovic, B. Nikolic, V. G. Oklobdzija, A general method in synthesis of pass-transistor circuits, Microelectr. J, 2000; 31: 991-998.
11. Makoto Suzuki, Toshinobu Shinbo, Toshiaki Yamanaka, Akihiro Shimizu, Katsuro Sasaki, Yoshinobu Nakagome, A 1.5-ns 32-b CMOS ALU in Double Pass-Transistor Logic, IEEE Journal of Solid-State Circuits, 1993; 28: 1145-1151.
12. Arijit Raychowdhury, Bipul Paul, Swarup Bhunia, Kaushik Roy, Computing with Subthreshold Leakage: Device/Circuit/Architecture Co-design for Ultralow-Power Subthreshold Operation, IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2005; 13: 1213-122.
13. G. Dimitrakopoulos, P. Kolovos, P. Kalogerakis, D. Nikolos, Design of High-Speed Low-Power Parallel-Prefix VLSI Adders, Springer-Verlag, 2004, p. 248-257.
14. Priya Gupta, Anu Gupta, Abhijit Asati, Design and Implementation of n-bit Sub-threshold Kogge Stone Adder with Improved Power Delay Product, European Journal of Scientific Research, 2014; 123: 106-116.
15. Uming Ko, Poras T. Balsara, Wai Lee, Low-power design techniques for high-performance CMOS adders, IEEE Transactions On Very Large Scale Integration Systems, 1995; 2: 321-323.
16. Patrick Ndai, Shih-Lien Lu, Dinesh Somesekhar, Kaushik Roy, Fine-Grained Redundancy in Adders, Proceedings of the 8th International Symposium on Quality Electronic Design (ISQED'07), 2007, p. 317-321.
 
 
