Abstract-This paper proposes a new theory of adders and its basic structure. The new adder is an asynchronous structure constructed from half adders, called the Parallel Feedback Carry Adder (PFCA) because its carry mode is parallel feedback. In theory, the area consumption of an n-bit PFCA is close to O(n) and the average length of the carry chain is O(log n). A CMOS gate implementation scheme is presented. HSPICE simulation results show that PFCA has obvious advantages over RCA, CLA and CSeA in speed and area, especially when n is large.
I. INTRODUCTION
As a basic computation function of data processing, integer addition is the most commonly used and important operation in digital circuit design. Therefore, the speed and area consumption of adders have a great impact on overall system speed and scale.
A ripple-carry adder (RCA) [1] is implemented using multiple copies of a 1-bit full adder, where the carry-out of the (i-1)-th full adder is fed into the carry input of the i-th full adder and the lowest-order full adder has a carry input of 0. The carries must ripple from the least-significant bit to the most-significant bit, resulting in an addition time of O(n) (n is the word length of the adder). The ripple-carry adder scheme is therefore very slow for building a wide adder, although its area consumption is very small.
To speed up adders, a wide variety of techniques have been proposed. These techniques include synchronous adders, such as the Condition Carry Adder (CCA), Carry Lookahead Adder (CLA), Ling Adder, Manchester Carry Chain Adder, Conventional Carry Skip Adder (CCSKA), Modified Carry Skip Adder, Carry Select Adder (CSeA) and Carry Save Adder (CSA) [2~4], and asynchronous adders, such as the Carry Completion Sensing Adder (CCSA), Delay Insensitive Carry Lookahead Adder (DICLA) and Speed-up DICLA [5~6]. The synchronous adder circuits operate at the worst-case rate, while the asynchronous ones operate at the average rate. The adders mentioned above are all constructed from full adders, although they have different structures. In general, an adder with the smallest area consumption cannot also reach the fastest speed. As a result, area-time efficiency has been proposed to evaluate the performance of various adders: the smaller the product of area and computation time, the better the area-time efficiency.
This paper proposes a new adder theory and its structure, which is constructed from half adders. In theory, this adder has an asymptotic area requirement of O(n) and an average computation time of O(log n). This means that the area of PFCA is not larger than that of RCA and its speed is not slower than that of CLA. Because a full adder consists of two half adders and one OR gate, PFCA leaves room for a more than fourfold increase in area-time efficiency, in both area and speed. An implementation of PFCA based on CMOS gates is provided to verify this theory. HSPICE simulation results show that PFCA has an obvious advantage over RCA, CLA and CSeA in speed and area, showing potential applications especially when the word length is long.
Ⅱ. PFCA THEORY

A．Basic Unit of Adders
Let A and B be two n-bit binary numbers, as follows:
Let S be their sum:
where a_i and b_i are the i-th bits of A and B.
Formula (3) is the full adder expression; the structure and simplified symbol of a full adder are shown in Fig. 1(a) and Fig. 1(b). From Fig. 1(a), we know that a full adder consists of two AND gates, two XOR gates and one OR gate. A half adder consists of one AND gate and one XOR gate (shown in Fig. 2). From the perspective of logic gates, one full adder consists of two half adders and one OR gate. A half adder has a carry output but no carry input. As a result, all existing adders are based on full adders, and an n-bit adder needs at least n full adders. A good example is the n-bit RCA (shown in Fig. 3), which needs n full adders; its area consumption is the least, but it requires a long computation time.
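The gate-level decomposition described above can be checked with a short sketch. The function names `half_adder` and `full_adder` are illustrative only and do not come from the paper; bits are modelled as Python integers 0 and 1.

```python
from itertools import product

def half_adder(a, b):
    """Return (sum, carry) of two bits: one XOR gate and one AND gate."""
    return a ^ b, a & b

def full_adder(a, b, c_in):
    """Full adder assembled from two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)      # first half adder: a + b
    s, c2 = half_adder(s1, c_in)   # second half adder: partial sum + carry-in
    return s, c1 | c2              # OR gate merges the two carry outputs

# Exhaustive check over all 8 input combinations.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert a + b + c_in == s + 2 * c_out
```

The two carries c1 and c2 can never both be 1, which is why a single OR gate is enough to combine them.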
B．Theory and Basic Structure of PFCA
A new structure should be extracted from the basic structure of adders shown in equation (2) and equation (3) in order to implement an n-bit adder based on half adders. Let A and B be two n-bit binary numbers, let the superscript be the time variable counting the iterations, and let the subscript be the bit position in the binary number. Then
Obviously, the initial value is: Let and be the sum bit and carry bit of A, B;
which are also intermediate variables of iteration. They are as follows:
When a feedback mechanism is brought in, let
,
where , and constitute a new group of and
B , and they return to equation (4) and equation (5); the operation then repeats from equation (4) and equation (5) to equation (8) and equation (9). The ideal structure of PFCA is shown in Fig. 4.
) are not allowed to connect to
An ideal n-bit PFCA can be drawn by cascading n units of this kind, and its basic structure is shown in Fig. 4. The asymptotic area requirement of PFCA is O(n). All the half adders complete their operations in each iteration, and the results are then fed back to the inputs for the next operation. Therefore, PFCA is a parallel adder with feedback. Compared to all existing adders, PFCA has a shorter computation time and a smaller area, and its area-time efficiency can be increased 4 times because of the two advantages of PFCA: 1. parallel mode, which speeds up the adder circuit; 2. feedback mode, which increases the usage of the basic units.
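The iteration described in this subsection can be sketched in software as follows. This is a behavioural model under stated assumptions, not the paper's circuit: the name `pfca_add` is hypothetical, the n half adders are modelled by bitwise XOR and AND on whole words, the carry out of the most significant bit is discarded, and the pass in which all carries are found to be zero is counted as an iteration.

```python
def pfca_add(a, b, n):
    """Add two n-bit integers by iterating a row of n half adders.

    Returns (sum mod 2**n, number of iterations). Discarding the carry
    out of the top bit is an assumption of this sketch.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    iterations = 0
    while True:
        s = a ^ b        # sum bits of all n half adders, in parallel
        c = a & b        # carry bits of all n half adders, in parallel
        iterations += 1
        if c == 0:       # completion: every carry bit is zero
            return s, iterations
        # Feedback: the sum bits become the new A, and each carry bit,
        # moved up one position, becomes the new B.
        a, b = s, (c << 1) & mask

result, steps = pfca_add(0b01111111, 0b00000001, 8)
print(f"{result:08b}", steps)  # prints "10000000 8"
```

The operands are the ones used in the paper's timing simulation, where a single carry has to travel through seven bit positions before the sum settles.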
The theorem can be easily proved. Equation (6) generates one new addend:
generates the other new addend:
As the feedback mechanism is brought in, the next add operation will be:
Then the final sum is obtained after repeated iterations, once the carry bits are all zero. ii) If
C．The Analysis of Properties in the Basic
and the status changes all the time. The status of port S is the inverse of that of port A, forming negative feedback outside the HA, and the delay for a signal to travel from A to S inside the half adder is denoted T_HA. Therefore, the half adder trigger forms a "0-1" self-oscillation loop with oscillation period T_HA, which can be used as a local clock. The curve of C is the same as that of A, but with some delay.
D．The PFCA Theory Validation
PFCA is an asynchronous adder, so we use the average computation time to characterize the speed of PFCA. The average computation time is determined by the average length of the carry chain. The necessary and sufficient condition for a carry chain to exist is as follows:
There must be a natural number m that satisfies the equation:
The occupied area of an n-bit PFCA is close to O(n) (as can easily be seen from Fig. 4). The average length of the carry chain in an n-bit asynchronous adder is O(log n) [7~13]. We verify this by a statistical method; Table 1 and Fig. 6 give the statistical data. They show that, when the word length of PFCA is smaller than 18, the average number of iterations obtained statistically, i.e. the length of the carry chain of an n-bit PFCA, is close to log n.
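A statistical check of this kind can be sketched as below. The sampling method is an assumption: operands are drawn uniformly at random with a fixed seed, which may differ from how Table 1 was produced, and the helper names `iterations` and `average_iterations` are hypothetical. An iteration is counted for every half-adder pass, including the last one in which all carries are zero.

```python
import math
import random

def iterations(a, b, n):
    """Number of half-adder passes until all carry bits are zero."""
    mask = (1 << n) - 1
    count = 0
    while True:
        s, c = a ^ b, a & b
        count += 1
        if c == 0:
            return count
        a, b = s, (c << 1) & mask   # feed sum and shifted carries back

def average_iterations(n, samples=10000, seed=0):
    """Mean iteration count over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = sum(iterations(rng.getrandbits(n), rng.getrandbits(n), n)
                for _ in range(samples))
    return total / samples

# Compare the sampled average with log2(n) for a few word lengths.
for n in (4, 8, 16, 32):
    print(n, round(average_iterations(n), 2), round(math.log2(n), 2))
```

For small word lengths the sampling could be replaced by an exhaustive loop over all operand pairs, which removes the dependence on the random seed.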
Ⅲ. THE IMPLEMENTATION OF PFCA IN CMOS GATES
In theory, PFCA has substantial potential advantages in area and speed, but there are some technological problems to resolve in the implementation of hardware circuits:
ⅰ) The feedback iterations may need a synchronous clock to control the circuits, while it is not suitable to introduce the system clock.
ⅱ) The feedback signals must be separated from the input signals.
ⅲ) As an asynchronous circuit, PFCA requires a start signal and a finish signal.
ⅳ) The difference between the delays from ports A and B to ports S and C will grow after several iterations, which may lead to the failure of PFCA.
ⅴ) The attenuation of output signals will shorten the transmission distance.
This section proposes a CMOS gate implementation scheme. Two rules are obeyed in the design of PFCA: 1) the occupied area and time consumption should be as small as possible; 2) the delays from inputs A and B to outputs C and S should be equal. The simulation uses the HSPICE2007.3 platform with the MOS transistor parameters L=1U, W=20U, LEVEL=1 and defaults for the others.
A．The Implementation of Half Adder Trigger in CMOS Gates
The half adder trigger shown in Fig. 5 is implemented in HSPICE2007.3 and its timing sequence is shown in Fig. 7. In Fig. 7, if B = 1, the status of C and S is determined by the transient status, which is not certain, and S is a "0-1" oscillating wave; if B = 0, C is '0' all the time no matter whether the transient status is '0' or '1'. The simulation result agrees with the theory of PFCA.
B．The Implementation of PFCA in CMOS Gates
In the CMOS implementation scheme, at least one drive circuit is used in each basic unit to solve the signal attenuation problem. The structure of PFCA is shown in Fig. 8, and the CMOS structure of the basic unit (2-bit PFCA) is shown in Fig. 9. In Fig. 9, dashed box ① encircles a MUX2-1 (not including the NOT gate of the Start signal); dashed box ② encircles an AND gate, which generates output S; dashed box ③ encircles an XOR gate, which generates output C; dashed box ④ encircles a drive circuit for the carry output; dashed box ⑤ encircles an OR gate, which is used as an addition completion test mechanism: when the OR gate outputs '0', the addition of the 2-bit PFCA is complete. The PFCA structure additionally has two MUX2-1s, a Start signal, a drive circuit and a completion test circuit in each unit. The controlling Start signal and the two MUX2-1s resolve problem ⅰ) and problem ⅱ) by giving PFCA an enable signal to select between the input signals and the feedback signals. Problem ⅲ) is resolved by designing a mechanism to test all the output bits: if the output bits are all '0', a falling edge is generated, and PFCA thus finishes one add operation.
C．Timing Simulation in HSPICE
The timing simulation of the 8-bit PFCA is shown in Fig. 10. To demonstrate the process of data transmission, let A = 01111111, B = 00000001. In this case, PFCA iterates 8 times and then obtains the right result, S = 10000000. In Fig. 10, when the Start signal falls from '1' to '0', the computation begins; after 8 iterations, the finish signal falls from '1' to '0', and the results appear at the outputs S0-S7. The total delay of the 8 iterations is 2.55 ns and the average delay per iteration is 0.28 ns, excluding the initial time of 0.35 ns.
D．Eliminate Race Hazard
A race hazard is a common phenomenon in combinational logic design, as shown in Fig. 10. The main reason for the race hazard in Fig. 10 is that the delays from the inputs (A, B) to C and S are not equal. If the delays from the inputs to C and S vary only slightly, the race hazard does not influence the final result; otherwise, the circuit will be unstable. To solve this problem, more drive circuit units are introduced in PFCA to compensate for the unequal delays from the inputs to C and S. The improved circuit of PFCA is shown in Fig. 11 and Fig. 12. Fig. 11 is a basic unit of the PFCA shown in Fig. 12. One drive circuit unit is brought in after every bit of PFCA. To compare with the simulation results in Fig. 10, let A = 01111111, B = 00000001; in this case, PFCA iterates 8 times and then obtains the right result, S = 10000000. The simulation results in HSPICE are shown in Fig. 13. Obviously, the race hazard has been eliminated, so the structure in Fig. 12 is more stable.
Figure 12. Improved structure of n-bit PFCA
In order to evaluate the performance of PFCA, this section chooses three well-known adders (RCA, CLA and CSeA) and compares their area and time consumption. Table 2 shows the simulation results of the 4 adders with different word lengths. The unimproved scheme (Fig. 8) is selected here for its small time consumption; it is not an optimized scheme, so the optimized schemes [14~16] of CSeA are not adopted. Normally, we choose the standard CMOS gates, such as the NOT gate, OR gate, AOI gate and OAI gate. These 4 adders (PFCA, RCA, CLA and CSeA) ... each group is a 4-bit RCA.
A．The Area of PFCA
The structure of PFCA consists of three parts: the half adder trigger, the drive circuit and the completion test unit.
One half adder trigger includes two MUX2-1s, one AND gate and one XOR gate, and it needs 42 transistors. Therefore n half adder triggers need 42n transistors in total.
One drive circuit consists of two NOT gates and one OR gate, and it needs 10 transistors. Given that the last two bits of PFCA do not need a drive circuit, an n-bit adder needs 5n-10 transistors in total for its drive circuits (one drive circuit per 2-bit basic unit).
The completion test unit includes several levels of n-input OR gates. Table 2 shows the number of transistors used in RCA, CLA, CSeA and PFCA when the word length is n = 4, 8, 12, 16 and 32, respectively. Compared to the other three adders, the number of transistors used in PFCA is nearly the same as that in RCA, and much less than that in CLA and CSeA. What is more, the advantage becomes more obvious when n is larger.
B．The Time Consumption of PFCA
The delay of an average iteration can be computed as follows:
Ⅴ. CONCLUSIONS
PFCA is a new adder based on a more basic logic structure and is also feasible in hardware. Compared to existing adders, PFCA shows obvious advantages, which can be demonstrated in four parts: 1) PFCA can shorten the computation time and reduce the area consumption by a large amount; 2) the advantage becomes more obvious when n is larger; 3) it is easy to implement PFCA even when the word length n is large, because the implementation has nothing to do with the length of the adder: implementing a 1024-bit PFCA is as easy as implementing a 16-bit PFCA; 4) the speed is not influenced by the word length of the PFCA when PFCAs of different word lengths are used to complete an addition of the same size. For an 8-bit addition, the computation time in a 32-bit RCA is 4 times as much as in an 8-bit RCA; however, the two are the same when using PFCA.
The implementation scheme proposed in this paper mainly aims to verify the feasibility of PFCA in hardware, and it is not an optimized scheme. If the functions of the half adder trigger are made use of sufficiently when implementing the hardware circuits, the hardware implementation of PFCA will achieve higher performance.
