Abstract - Transmission-gate conditional-sum (TGCS) adders have been realized in a standard 2.5-μm CMOS technology. These adders offer short propagation delay and latency time (12.5 ns for 32-bit addition) and consume only moderate chip area (i.e., 80 × 460 μm² for 1 bit in a 32-bit adder). They allow static operation and consume only dynamic power (like standard CMOS). The layout exhibits high regularity and can be easily adjusted to various word lengths. Design and layout techniques are described in detail and experimental data are given.
I. INTRODUCTION
SEVERAL approaches have been used in constructing adders for high-speed signal processing. The highest throughput rate is achieved by carry-save (vector merging) adders [1]. But, as these adders have a long latency time, they are not suited for every application (e.g., recursive structures [2]). They also consume a large chip area, which grows with the square of the word length. These adders are, therefore, usually used only for moderate word lengths [1].
The other very fast adder, with an addition time independent of word length (latency time equals throughput rate), is the redundant binary adder [3]. This adder consumes about twice the chip area (though NMOS) of our adder, but its application has speed advantages at large word lengths (32 bits or more). In such systems, however, the normal high-speed binary adder is still necessary for conversion to the normal binary number representation [3].
High-speed adders with low latency time, large word length, and normal binary number representation are still being developed. Commonly they are realized as carry-select adders [4] or as carry-lookahead structures [5]. Carry-lookahead adders have a high transistor count and, hence, they are economical only when employing dynamic circuit techniques [5]. Dynamic adders have the disadvantage that they need proper clock timing and have problems with noise margins. Also the precharge cycle may reduce the speed performance in recursive structures. Hence, the carry-select adders are widely used as the optimum compromise between high-speed operation and small chip area. In these circuits groups of ripple-carry adders (most commonly 4 bits) are formed. Their outputs are selected based on the less significant carry signals using multiplexers [6]. In this work we present an adder concept which in our view yields the fastest static operation and the shortest latency time possible in a given CMOS technology. The penalty is an increase in chip area, though moderate when compared to the carry-select adders.
II. TGCS ADDITION LOGIC
As early as 1960, Sklansky regarded the conditional-sum addition logic as theoretically the fastest [7], [8]. The very general mathematical approach of Winograd [9] leads to the same conclusion when restricted to fan-ins of 3. These adders are assembled using only two different circuit elements (Fig. 1): conditional cells and multiplexers with two inputs. One conditional cell is provided for each bit of the adder. It calculates the sum and the carry for each bit twice (Fig. 1(a)): $S^0$ and $C^0$ are valid if the carry-in of the stage is 0, and $S^1$ and $C^1$ are valid for $C_{in} = 1$. The selection of the right sums and carries is done by the multiplexers, which are controlled by the carry-outs of the less significant bits. Although Sklansky and Winograd regarded this structure as the fastest one if restricted to low fan-ins, it has very rarely been realized. The adder becomes very large if we construct it using standard NAND and NOR gates, and it will not be faster than a carry-select adder. But the carry-select adder gains its speed advantage from the fact that transmission gates are used in the carry path.
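The conditional-sum scheme described above can be illustrated with a purely behavioral Python model. This is only a sketch of the logic, not of the transmission-gate circuit: the names `conditional_cell`, `mux`, and `conditional_sum_add` are hypothetical, the word length is assumed to be a power of two, and the final selection by the external carry-in is modeled as one extra selection step for simplicity.

```python
def conditional_cell(a, b):
    """One bit: (sum, carry) for carry-in 0 and for carry-in 1."""
    s0, c0 = a ^ b, a & b          # valid if the carry-in is 0
    s1, c1 = 1 - (a ^ b), a | b    # valid if the carry-in is 1
    return s0, c0, s1, c1

def mux(select, in0, in1):
    """Two-input multiplexer."""
    return in1 if select else in0

def conditional_sum_add(x, y, n=8, carry_in=0):
    """Behavioral n-bit conditional-sum addition (n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    # Each block holds (sums if cin=0, cout if cin=0, sums if cin=1, cout if cin=1),
    # with sum bits stored least significant bit first.
    blocks = []
    for i in range(n):
        s0, c0, s1, c1 = conditional_cell((x >> i) & 1, (y >> i) & 1)
        blocks.append(([s0], c0, [s1], c1))
    # Each pass is one row of multiplexers and doubles the block size.
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            lo_s0, lo_c0, lo_s1, lo_c1 = lo
            hi_s0, hi_c0, hi_s1, hi_c1 = hi
            # The carry-out of the lower block selects the upper block's results.
            up0 = [mux(lo_c0, p, q) for p, q in zip(hi_s0, hi_s1)]
            up1 = [mux(lo_c1, p, q) for p, q in zip(hi_s0, hi_s1)]
            merged.append((lo_s0 + up0, mux(lo_c0, hi_c0, hi_c1),
                           lo_s1 + up1, mux(lo_c1, hi_c0, hi_c1)))
        blocks = merged
    s0, c0, s1, c1 = blocks[0]
    bits = mux(carry_in, s0, s1)
    total = sum(bit << i for i, bit in enumerate(bits))
    return total, mux(carry_in, c0, c1)

print(conditional_sum_add(200, 100, n=8))  # (sum modulo 2**n, carry-out)
```

Only the two element types named in the text appear in the model: one conditional cell per bit, and two-input multiplexers in every later row.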
So our approach was to use transmission gates as well for all the circuit elements of the conditional-sum adder. Not only the multiplexers, but also the conditional cells have been built that way (Fig. 1(b)). It is important that all the functions of the conditional cell can be realized using only one series transistor. Each multiplexer also contributes one series transistor. So the number of transistors connected in series becomes

$$t = 1 + \log_2 n$$

where $t$ is the number of transistors connected in series and $n$ is the word length.
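As a quick check of the series-transistor count, the following sketch assumes one series transistor for the conditional cell plus one per multiplexer row, with one row per doubling of the word length (that is, t = 1 + log2 n). The function name `series_transistors` is hypothetical, and the word length is assumed to be a power of two.

```python
def series_transistors(n):
    """Series transistors from input to output: 1 (conditional cell) + log2(n) (mux rows)."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    mux_rows = n.bit_length() - 1  # log2(n) for a power of two
    return 1 + mux_rows

for word_length in (8, 16, 32):
    print(word_length, series_transistors(word_length))
```

For a 32-bit adder this gives six series transistors, which agrees with the figure quoted in the text.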
It is an important advantage that the adder structure does not have to be modified for various word lengths to be realized. For example, in a 32-bit transmission-gate conditional-sum (TGCS) adder there are six transistors connected in series, and such a structure is usually considered to be very slow. But in our adder this does not degrade the speed. This results from the fact that the transmission gates (see Fig. 2) switch one after the other from left to right during the addition. So there is enough time for the parasitic capacitances to be charged correctly. If, for example, the left-most transmission gate were to switch last, then all capacitances would have to be charged afterwards and the chain would degrade the speed of the adder. Fortunately, this is not the case. In addition, when the last (i.e., the right-most) transmission gate switches, the properly charged parasitic capacitances diminish the dynamic impedance at its input. It should also be noted that the chain in Fig. 2 is not very long if we consider the word length. Furthermore, the chain leads directly from the inputs to the carry and sum outputs. So we do not need any carry generate or carry propagate functions in advance or a sum function to be calculated afterwards. What we do need is an output buffer (see Table III) which could be combined with a latch if appropriate.
Another problem is that if we have a large adder, many multiplexer gates have to be switched. If n is the word length, n/2 multiplexers have to be controlled by one signal (i.e., the carry-out signal of the first half of the adder). So what we need are specially optimized cascaded CMOS buffers to drive the multiplexer gates. The design of these buffers is listed in Table I for adders up to 32 bits.
It can be seen from [8] that for every doubling of the word length, one additional row of multiplexers and buffers is needed. The additional delay time for every doubling is given by the delay of one buffer and one multiplexer and increases rather slowly.
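To see how the buffer load grows, the sketch below tabulates the number of sum multiplexers driven by one select signal in each multiplexer row. It assumes that in row k a carry-out selects the results of an upper block of 2^(k-1) bits, which reproduces the n/2 figure for the last row; the extrapolation to the earlier rows and the name `select_fanout_per_row` are assumptions, and the carry multiplexer itself is not counted.

```python
def select_fanout_per_row(n):
    """Sum multiplexers driven by one select signal in rows 1..log2(n)."""
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"
    rows = n.bit_length() - 1  # one row per doubling of the word length
    return [2 ** (k - 1) for k in range(1, rows + 1)]

print(select_fanout_per_row(32))  # last entry equals n/2
```

The load doubles from row to row, which is why the buffers driving the later rows need to be larger than those driving the earlier ones.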
III. REALIZATION AND LAYOUT
The adder was designed using a regular layout that can be compared to the binary tree described in [10] (see Fig. 3). Hence even larger buffers that have to drive more multiplexer gates fit well into the design.
For simplicity of design, all NMOS and PMOS transistors have the same channel widths of 7 and 15 μm, respectively, in the multiplexers and the conditional cells. The channel lengths of both devices have been chosen to be equal, so an adaptation of the adder layout to arbitrary word lengths is very easy. The adders were realized in a standard CMOS technology with single-metal and single-polysilicon layers. The channel lengths were 2.5 μm drawn, though mask lengths of some devices were shrunk in order to obtain channel lengths of 2.0 and 1.5 μm in addition. Fig. 4 shows the layouts of the conditional cell and of a group of four multiplexers. These multiplexers can be easily assembled in such a way that uninterrupted metal paths are used for the control signals. Fig. 5 shows the layout of the buffer B3.
For ease and validity of measuring, an 8-bit adder was realized combined with a latch in a recursive structure (Fig. 6). This configuration allows the determination of the maximum possible clock frequency for a digital system using this adder. In addition, the adder was realized for 16- and 32-bit word lengths (Fig. 7).
The area consumption is 80 × 460 μm² for 1 bit in a 32-bit adder. The measured delay times are listed in Tables II and III. An E-beam measurement of the 32-bit 2.0-μm adder is shown in Fig. 8. From this figure estimations for larger adders can be made: 13 ns for 64-bit and 17 ns for 128-bit addition time (2.0 μm) seem to be realistic. If there is no latch to be connected to the outputs of the adders, but the outputs are connected to inputs of an adder of the same type, additional buffers have to be inserted. The delay of these buffers is included in Table II. Even in that case there is a significant speed advantage over carry-select adders [4].
The power consumption of the 8-bit adder/latch combination is 180 μW/MHz at 5-V power supply voltage.
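The area and power figures above can be turned into totals with simple arithmetic. The sketch below assumes that the per-bit area of 80 × 460 μm² covers everything belonging to one bit slice of the 32-bit adder, and that power scales linearly with clock frequency. The function names and the 50-MHz clock are hypothetical and chosen only for illustration.

```python
BIT_AREA_UM2 = 80 * 460        # area for 1 bit in a 32-bit adder, in square micrometers
POWER_UW_PER_MHZ = 180         # 8-bit adder/latch combination at 5 V

def adder_area_mm2(bits=32):
    """Total area in mm^2; the per-bit figure is only stated for a 32-bit adder."""
    return bits * BIT_AREA_UM2 / 1e6

def dynamic_power_mw(clock_mhz):
    """Dynamic power in mW of the 8-bit adder/latch at the given clock frequency."""
    return POWER_UW_PER_MHZ * clock_mhz / 1000

print(adder_area_mm2(32))      # about 1.18 mm^2 for the 32-bit adder
print(dynamic_power_mw(50))    # 9.0 mW at an assumed 50-MHz clock
```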
IV. CONCLUSIONS
We have described a new adder for standard CMOS technologies. It combines the conditional-sum addition logic with the transmission-gate logic design. The resulting adder has a very low delay time (i.e., its latency time is about 30 percent shorter than that of carry-select adders).
The important features are static operation and fully complementary design. Therefore, no problems typical for dynamic techniques occur, such as low noise margin, charge sharing, or clock distribution. Also, there is no dc power consumption.
The adder design exhibits a high regularity, leading to low design effort for adders with arbitrary word lengths. The architecture does not have to be changed if the word length varies.
These adders can find their application in cases where the binary addition is the bottleneck in larger digital systems. Examples are carry-save multipliers [4], the transformation from the redundant binary-number representation to normal binary-number representation [3] , and recursive structures (e.g., predictive coders [2]), where TGCS adders would improve the performance.
