Implementation of An Efficient Gate Level Modified Square-Root Carry Select Adder Using HDL by Kumari, K.Surya & Kumar, K.Hemanth
International Journal of Science Engineering and AdvanceTechnology, IJSEAT, Vol 1, Issue 7, December - 2013 ISSN 2321-6905
www.ijseat.com Page 186
Implementation of An Efficient Gate Level Modified Square-Root Carry
Select Adder Using HDL
K.SURYA KUMARI, K.HEMANTH KUMAR
1Assistant Professor, ECE, Pragathi Engineering College, (Affiliated to JNTUK, A.P)
2M.Tech Scholar, Pragathi Engineering College, (Affiliated to JNTUK, A.P)
ABSTRACT:
The contribution of this work is to reduce the area and power of the Carry Select
Adder (CSLA) by a simple gate-level modification. Based on this modification, a CSLA
architecture has been developed and compared with the regular square-root (SQRT) CSLA
architecture. The Carry Select Adder is one of the best adders used in many
data-processing processors to perform fast and robust arithmetic functions. The
modified design reduces area and power consumption, but its delay is increased
compared to the regular SQRT CSLA. In this work we have evaluated performance in
terms of delay, area and power, using logical effort and an FPGA design flow: the
Xilinx tool for synthesis, and the Modelsim tool for simulation and graphical
verification. The result analysis shows that the proposed CSLA structure is better
than the regular SQRT CSLA.
KEYWORDS:
FPGA (Field Programmable Gate Array), area-efficient CSLA.
I. INTRODUCTION
The design of area- and power-efficient high-speed data path logic systems is one of
the most substantial areas of research in VLSI system design. In digital adders, the
speed of addition is limited by the time required to propagate a carry through the
adder. The sum for each bit position in an elementary adder is generated
sequentially, only after the previous bit position has been summed and a carry
propagated into the next position. The CSLA is used in many computational systems to
alleviate the problem of carry propagation delay by independently generating
multiple carries and then selecting a carry to generate the sum [1]. However, the
CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders
(RCA) to generate partial sums and carries for carry inputs Cin=0 and Cin=1; the
final sum and carry are then selected by the multiplexers (mux).
The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead
of the RCA with Cin=1 in the regular CSLA to achieve lower area and power
consumption [2]-[4]. The main advantage of this BEC logic comes from its smaller
number of logic gates compared with the n-bit Full Adder (FA) structure. The paper
is organised as follows. Section II deals with the delay and area evaluation
methodology of the basic adder blocks. Section III presents the detailed structure
and the function of the BEC logic. The SQRT CSLA has been chosen for comparison with
the proposed design because it has a more balanced delay and requires lower power
and area [5], [6]. The regular and modified SQRT CSLA are also presented in
Section III. The FPGA synthesis details and results are analyzed in Section IV.
Finally, the work is concluded in Section V.
Fig 1: Delay and area evaluation of an XOR gate
Fig 2: 4-b BEC
Fig 3: 4-B BEC WITH 8:4 MUX
The carry ripple adder is composed of many cascaded single-bit full adders, whereas
the carry-select adder is composed of two carry ripple adders with cin_0 and cin_1,
respectively. Through the multiplexer, we can select the correct output result
according to the logic state of the carry-in signal. The carry-select adder can
compute faster because the current adder stage does not need to wait for the
previous stage’s carry-out signal. The summation result is ready before the carry-in
signal arrives; therefore, we can get the correct computation result by only waiting
for one multiplexer delay in each single bit adder. In the carry select adder, the
carry propagation delay can be reduced by a factor of M as compared with the carry
ripple adder. However, the duplicated adder in the carry select adder results in
larger area and power consumption.
II. AREA-EFFICIENT CARRY SELECT ADDER
The carry ripple adder is constructed by cascading single-bit full adders [1]. In
the carry ripple adder, each full adder cannot start its computation until the
previous carry-out signal is ready. Therefore, the critical path delay in a carry
ripple adder is determined by its carry-out propagation path. For an N-bit adder as
illustrated in Fig. 4, the critical path is the N-bit carry propagation path through
the full adders. As the bit number N increases, the delay time of the carry ripple
adder increases linearly. To overcome this shortcoming and remove the linear
dependency between computation delay time and input word length, the carry select
adder is presented [2]. The carry select adder divides the carry ripple adder into M
parts, where each part consists of a duplicated (N/M)-bit carry ripple adder pair,
as illustrated in Fig. 5 for N=16 and M=4. This duplicated carry ripple adder pair
anticipates both possible carry input values: one carry ripple adder computes with
the carry input at logic “0” and the other computes with the carry input at logic
“1”. When the actual carry input is ready, either the result of the carry “0” path
or the result of the carry “1” path is selected by the multiplexer according to the
carry input value. An example of a 5-bit carry select adder is illustrated in
Fig. 6. Because both possible carry input values are anticipated in advance, each of
the M carry ripple adder pairs no longer needs to wait for the previous carry to
arrive. As a result, all M carry ripple adder pairs in the carry select adder can
compute in parallel.
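The block structure described above can be sketched behaviourally. The following is a minimal illustration, not the authors' HDL: the function names `ripple_carry_add` and `carry_select_add` are hypothetical, and the sketch assumes M equal blocks of N/M bits (the uniform case described in this section, not the SQRT grouping).

```python
def ripple_carry_add(a_bits, b_bits, cin):
    """Add two equal-length bit lists (LSB first); return (sum_bits, carry_out)."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_add(a, b, n=16, m=4, cin=0):
    """Add two n-bit integers using m equal blocks of n // m bits each."""
    assert n % m == 0, "this sketch assumes m divides n"
    width = n // m
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result_bits = []
    carry = cin
    for block in range(m):
        lo, hi = block * width, (block + 1) * width
        # Both candidates depend only on the operands, so in hardware
        # they are computed in parallel before the real carry arrives.
        sum0, cout0 = ripple_carry_add(a_bits[lo:hi], b_bits[lo:hi], 0)
        sum1, cout1 = ripple_carry_add(a_bits[lo:hi], b_bits[lo:hi], 1)
        # The multiplexer picks one candidate using the incoming carry.
        result_bits += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    total = sum(bit << i for i, bit in enumerate(result_bits))
    return total, carry
```

With the default arguments this corresponds to a 16-bit adder divided into 4 parts of 4 bits each; the software loop is sequential, so it models only the function of the adder, not its timing.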
Fig 4: The N-bit carry ripple adder constructed from N single-bit full adders
Fig 5: The 16-bit carry select adder divides the carry ripple adder into 4 parts, where each part consists of a duplicated 4-bit carry ripple adder pair.
Fig 6: 5-bit carry select adder [1], [2].
In this way, the critical path of the N-bit adder can be greatly reduced. In the
conventional N-bit carry ripple adder design, the critical path is the N-bit carry
propagation path plus one summation generation stage. In the N-bit carry select
adder, by contrast, the critical path is the (N/M)-bit carry propagation path plus M
multiplexer stages and one summation generation stage. Since M is much smaller than
N and the delay in the multiplexer is smaller than that in the full adder, the
computation delay in the carry select adder is much shorter than that in the carry
ripple adder. However, implementing the adder with a duplicated carry generation
circuit costs almost twice the hardware and twice the power consumption of the carry
ripple adder. Therefore, in this paper, we propose an area-efficient carry select
adder that shares the common Boolean logic term to remove the duplicated adder cells
of the conventional carry select adder. In this way, we can save many transistors
and achieve a lower power-delay product (PDP).
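The two critical paths compared above can be written as a small delay model. This is only a sketch: the function names and the per-stage delays `t_carry`, `t_mux` and `t_sum` are assumptions introduced for illustration, and their default values are arbitrary placeholders rather than figures from this paper.

```python
def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """N-bit carry propagation path plus one summation generation stage."""
    return n * t_carry + t_sum


def carry_select_delay(n, m, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """(N/M)-bit carry propagation path, M multiplexer stages, one summation stage."""
    return (n / m) * t_carry + m * t_mux + t_sum


if __name__ == "__main__":
    for n, m in [(16, 4), (64, 8)]:
        print(n, m, ripple_delay(n), carry_select_delay(n, m))
```

The model makes the trade-off visible: the ripple term grows with N, while the carry select term grows with N/M + M, so the benefit depends on choosing M well and on the multiplexer being faster than a full adder stage.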
A. Delay and area evaluation methodology of the basic adder blocks
The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig. 1.
The gates between the dotted lines perform their operations in parallel, and the
number shown on each gate indicates the delay contributed by that gate.
Fig 7 Regular 16-b SQRT CSLA
Table 1: Delay and area count of the basic blocks of CSLA
The delay and area evaluation methodology considers all gates to be made up of AND,
OR, and Inverter gates, each having a delay of 1 unit and an area of 1 unit. The
delay of a logic block is obtained by adding up the number of gates on its longest
path, which contributes the maximum delay. The area is obtained by counting the
total number of AOI gates required for the logic block. Based on this approach, the
CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in
Table I.
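The counting procedure can be reproduced with a small netlist evaluator. This is a sketch under stated assumptions: the helper names `evaluate` and `xor_gates` are hypothetical, and the gate-level decompositions of the XOR, mux, HA and FA below are conventional AOI realisations chosen for illustration, which may differ in detail from the circuits in the figures.

```python
from functools import lru_cache


def evaluate(netlist, outputs):
    """Return (area, delay) of a netlist of unit AOI gates.

    netlist maps each gate's output name to the names of its inputs;
    any name that is not a key is a primary input arriving at t=0.
    """
    @lru_cache(maxsize=None)
    def arrival(signal):
        if signal not in netlist:
            return 0
        return 1 + max(arrival(s) for s in netlist[signal])

    area = len(netlist)                       # one area unit per gate
    delay = max(arrival(o) for o in outputs)  # gates on the longest path
    return area, delay


def xor_gates(prefix, a, b):
    """AOI XOR: (a AND NOT b) OR (NOT a AND b). Returns (gates, output name)."""
    p = prefix
    gates = {
        f"{p}_na": [a],                      # inverter
        f"{p}_nb": [b],                      # inverter
        f"{p}_t1": [a, f"{p}_nb"],           # AND
        f"{p}_t2": [f"{p}_na", b],           # AND
        f"{p}_y": [f"{p}_t1", f"{p}_t2"],    # OR
    }
    return gates, f"{p}_y"


# XOR gate on its own.
x, x_out = xor_gates("x", "a", "b")
XOR = evaluate(x, [x_out])

# 2:1 mux: y = (d0 AND NOT s) OR (d1 AND s).
MUX = evaluate(
    {"ns": ["s"], "t0": ["d0", "ns"], "t1": ["d1", "s"], "y": ["t0", "t1"]},
    ["y"],
)

# Half adder: sum = a XOR b, carry = a AND b.
ha = dict(x)
ha["carry"] = ["a", "b"]
HA = evaluate(ha, [x_out, "carry"])

# Full adder: sum = (a XOR b) XOR cin, cout = (a AND b) OR ((a XOR b) AND cin).
fa, x1 = xor_gates("x1", "a", "b")
sum_gates, sum_out = xor_gates("x2", x1, "cin")
fa.update(sum_gates)
fa.update({"g": ["a", "b"], "p": [x1, "cin"], "cout": ["g", "p"]})
FA = evaluate(fa, [sum_out, "cout"])
```

Under these assumed decompositions the areas come out as 5 (XOR), 4 (mux), 6 (HA) and 13 (FA), which agree with the per-block gate counts used in the gate-count lists later in the text.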
Table 2: Function Table of the BEC
As stated above, the main idea of this work is to use a BEC instead of the RCA with
Cin=1 in order to reduce the area and power consumption of the regular CSLA. To
replace the n-bit RCA, an (n+1)-bit BEC is required. The structure and the function
table of a 4-b BEC are shown in Fig. 2 and Table II, respectively. Fig. 3
illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC
together with the mux. One input of the 8:4 mux receives (B3, B2, B1, B0) and the
other input of the mux is the BEC output. This produces the two possible partial
results in parallel, and the mux selects either the BEC output or the direct inputs
according to the control signal Cin. The importance of the BEC logic stems from the
large silicon area reduction when a CSLA with a large number of bits is designed.
The Boolean expressions of the 4-bit BEC are listed below (note the functional
symbols: ~ NOT, & AND, ^ XOR).
X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2)
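The four expressions can be checked exhaustively in a few lines. This is a minimal sketch: `bec4` and `bec_mux` are hypothetical names, and the integer packing of B3..B0 is an assumption made for convenience.

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter on an integer in 0..15."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    x0 = b0 ^ 1                 # X0 = ~B0
    x1 = b0 ^ b1                # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)         # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)    # X3 = B3 ^ (B0 & B1 & B2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


def bec_mux(b, cin):
    """8:4 mux: pass the direct inputs when cin=0, the BEC output when cin=1."""
    return bec4(b) if cin else b


if __name__ == "__main__":
    # The BEC output is the input plus one, modulo 16.
    assert all(bec4(b) == (b + 1) % 16 for b in range(16))
    # A 3-bit RCA with Cin=0 yields a 4-bit result (carry and sum); passing
    # it through the BEC and mux gives the result for either carry-in.
    for a in range(8):
        for b in range(8):
            for cin in (0, 1):
                assert bec_mux(a + b, cin) == a + b + cin
    print("BEC expressions verified")
```

The second check illustrates why an (n+1)-bit BEC can stand in for an n-bit RCA with Cin=1: it only has to add one to a result that the Cin=0 adder has already produced.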
Fig 8: Modified 16-b SQRT CSLA. The parallel RCA with Cin=1 is replaced with BEC.
A. Delay and area evaluation methodology of regular 16-b SQRT CSLA
The structure of the 16-b regular SQRT CSLA is shown in Fig. 7. It has five groups
of different size RCA. The delay and area evaluation of each group are shown in
Fig. 5, in which the numerals within [] specify the delay values, e.g., sum2
requires 10 gate delays. The steps leading to the evaluation are as follows.
1) The group2 [see Fig. 5(a)] has two sets of 2-b RCA. Based on the delay values of
Table I, the arrival time of the selection input c1[time(t)=7] of the 6:3 mux is
earlier than s3[t=8] and later than s2[t=6]. Thus, sum3[t=11] is the summation of s3
and mux[t=3], and sum2[t=10] is the summation of c1 and mux.
2) Except for group2, the arrival time of the mux selection input is always greater
than the arrival time of the data outputs from the RCAs. Thus, the delay of group3
to group5 is determined, respectively, as follows:
{c6,sum[6:4]}=c3[t=10]+mux.
{c10,sum[10:7]}=c6[t=13]+mux.
{cout,sum[15:11]}=c10[t=16]+mux.
3) One set of 2-b RCA in group2 has 2 FA for Cin=1 and the other set has 1 FA and
1 HA for Cin=0. Based on the area count of Table I, the total number of gate counts
in group2 is determined as follows:
Gate count =57(FA + HA + Mux)
FA=39(3*13)
HA=6(1*6)
Mux=12(3*4).
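The arithmetic in steps 2) and 3) can be reproduced as follows. This is a sketch only: the helper names `carry_chain` and `gate_count` are hypothetical, and the starting value c3[t=10], the mux delay of 3 and the per-block areas are taken as given from the steps above rather than derived.

```python
MUX_DELAY = 3                          # mux[t=3], as quoted in step 1
AREA = {"FA": 13, "HA": 6, "MUX": 4}   # per-block gate counts used in step 3


def carry_chain(first_arrival, stages, mux_delay=MUX_DELAY):
    """Arrival time of each later group's carry-out: previous carry plus one mux."""
    times = []
    t = first_arrival
    for _ in range(stages):
        t += mux_delay
        times.append(t)
    return times


def gate_count(blocks, area=AREA):
    """Total AOI gate count for a dict of {block name: how many}."""
    return sum(area[name] * count for name, count in blocks.items())


# c3 arrives at t=10 (step 2); groups 3, 4 and 5 each add one mux delay.
c6, c10, cout = carry_chain(first_arrival=10, stages=3)

# Group2: 3 FA and 1 HA across the two 2-b RCAs, plus a 6:3 mux,
# counted as three 2:1 muxes.
group2_regular = gate_count({"FA": 3, "HA": 1, "MUX": 3})

if __name__ == "__main__":
    print("c6, c10, cout arrival times:", c6, c10, cout)
    print("group2 gate count:", group2_regular)
```

Because every group after group2 waits only on the previous carry and one mux, the carry arrival times form a simple arithmetic progression in the mux delay.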
B. Delay and area evaluation methodology of modified 16-b SQRT CSLA
The proposed 16-b SQRT CSLA uses a BEC in place of the RCA with Cin=1 to optimize
area and power. The structure is again split into five groups. The steps leading to
the evaluation are given here.
1) The group2 [see Fig. 7(a)] has one 2-b RCA, which has 1 FA and 1 HA for Cin=0.
Instead of another 2-b RCA with Cin=1, a 3-b BEC is used, which adds one to the
output from the 2-b RCA. Based on the delay values of Table I, the arrival time of
the selection input c1[time(t)=7] of the 6:3 mux is earlier than s3[t=9] and
c3[t=10] and later than s2[t=4]. Thus, sum3 depends on s3 and the mux, and the final
c3 (output from mux) depends on the partial c3 (input to mux) and the mux. The sum2
depends on c1 and the mux.
2) For the remaining groups, the arrival time of the mux selection input is always
greater than the arrival time of the data inputs from the BECs. Thus, the delay of
the remaining groups depends on the arrival time of the mux selection input and the
mux delay.
3) The area count of group2 is determined as follows:
Gate count =43(FA + HA + Mux + BEC)
FA=13(1*13)
HA=6(1*6)
AND=1
NOT=1
XOR=10(2*5)
Mux=12(3*4).
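The group2 count for the modified design can be tallied the same way and set against the regular design. This is an illustrative sketch: the dictionary layout and the name `gate_count` are hypothetical, the unit areas are the ones used in the list above, and the regular group2 total of 57 is taken from the preceding subsection.

```python
# Unit areas: AND and NOT are single AOI gates; XOR, HA, FA and the
# 2:1 mux use the per-block counts from the lists in the text.
AREA = {"FA": 13, "HA": 6, "MUX": 4, "AND": 1, "NOT": 1, "XOR": 5}

# 3-b BEC: X0 = ~B0, X1 = B0 ^ B1, X2 = B2 ^ (B0 & B1).
BEC3 = {"NOT": 1, "AND": 1, "XOR": 2}

# Modified group2: one 2-b RCA (1 FA + 1 HA), the 3-b BEC, and a 6:3 mux
# counted as three 2:1 muxes.
GROUP2_MODIFIED = {"FA": 1, "HA": 1, "MUX": 3, **BEC3}

REGULAR_GROUP2_GATES = 57  # from the regular SQRT CSLA evaluation


def gate_count(blocks, area=AREA):
    """Total AOI gate count for a dict of {block name: how many}."""
    return sum(area[name] * count for name, count in blocks.items())


bec3_gates = gate_count(BEC3)
modified_gates = gate_count(GROUP2_MODIFIED)
gates_saved = REGULAR_GROUP2_GATES - modified_gates

if __name__ == "__main__":
    print("3-b BEC gates:", bec3_gates)
    print("modified group2 gates:", modified_gates)
    print("gates saved in group2:", gates_saved)
```

The saving comes from swapping a 2-b RCA built from two full adders (26 gates) for a 3-b BEC (12 gates); the mux and the remaining RCA are unchanged.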
IV. FPGA Synthesis / Implementation Results
The design proposed in this paper has been developed using VHDL and synthesized
with the Xilinx tool. The synthesized netlist and its respective design constraints
files (ngd/ngc) are shown below.
Fig 9: % reduction versus word size
Fig 10: % delay overhead versus word size
V. CONCLUSION
A simple approach is proposed in this paper to reduce the area and power of the SQRT
CSLA architecture. The reduced number of gates in this work offers a great advantage
in the reduction of area and also of total power. The modified CSLA architecture is
therefore low area, low power, simple and efficient for VLSI hardware
implementation. It would be interesting to test the design of the modified 128-b
SQRT CSLA.
VI. REFERENCES
1. B. Ramkumar, H. M. Kittur, and P. M. Kannan, “ASIC implementation of modified faster carry save adder,” Eur. J. Sci. Res., vol. 42, no. 1, pp. 53-58, 2010.
2. Y. He, C. H. Chang, and J. Gu, “An area efficient 64-bit square root carry-select adder for low power applications,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.
3. Y. Kim and L.-S. Kim, “64-bit carry-select adder with reduced area,” Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.
4. T. Y. Chang and M. J. Hsiao, “Carry-select adder using single ripple carry adder,” Electron. Lett., vol. 34, no. 22, pp. 2101-2103, Oct. 1998.
5. O. J. Bedrij, “Carry-select adder,” IRE Trans. Electron. Comput., pp. 340-344, 1962.
6. Neil H. E. Weste, David Harris, and Ayan Banerjee, “CMOS VLSI Design: A Circuits and Systems Perspective,” Third edition, Pearson Education, pp. 347-349.
7. Douglas A. Pucknell and Kamran Eshraghian, “Basic VLSI Design,” Third edition, 2003, PHI Publication, pp. 242-243.
8. Peter J. Ashenden, “VHDL Quick Start,” The University of Adelaide, © 1998 Peter J. Ashenden.
9. Morteza Fayyazi, Zainalabedin Navabi, and Armita Peymandoust, “Using VHDL Neural Network Models for Automatic Test Generation,” 1997.
K.HEMANTH KUMAR is pursuing M.Tech with
the specialization of  Embedded Systems in
Pragathi Engineering College, Surampalem. He
received the B.Tech degree in Electronics &
Communication Engineering from Pragathi
Engineering College, Surampalem in 2008.
K.SURYA KUMARI is working as Assistant Professor in the Department of Electronics &
Communication Engineering, Pragathi Engineering College, with about 12 years of
teaching experience.
