I. INTRODUCTION
Arithmetic processing units are basic building blocks of many digital systems, such as microprocessors, digital signal processors and most dedicated signal processing circuits [1] [2] [3] [4] [5] [6]. In any arithmetic processing unit, adders are among the most crucial components because of the resources they consume and the delay they introduce. Adders are also part of multipliers, which are another resource-intensive component of arithmetic circuits [7]. The choice of a particular adder is therefore an important design consideration for an efficient implementation of arithmetic circuits [1] [2] [3] [4] [5] [6] [7] [8] [9]. Because the adder implementation matters so much, this work compares some existing adder implementations with respect to delay, resource utilization and power consumption.
Two bits can be added with a half adder. The basic adder that has always been used for addition, however, is the full adder, which adds three bits. Several 1-bit full adders can be combined to add multiple operands [8]. Adding more than two operands calls for a multi-operand adder [10, 11]. In 2005, R. D. Kenney and M. J. Schulte [1] introduced and analyzed three techniques for fast multi-operand addition, and constructed and synthesized multi-operand adder designs for 6 to 12 input operands. S. Singh and R. Waxman [8] described a scheme for multiple-operand addition and multiplication based on bit partitioning. Each partition contains $m$ bits of each of the $k$ numbers, where $m = \lceil \log_2 (k-1) \rceil$ is the smallest integer $\ge \log_2 (k-1)$. With this scheme, the final sum can be obtained in $m+1$ addition cycles. In 2013, J. Hormigo, J. Villalba et al. [10] implemented compressor trees [11] efficiently on FPGAs. By using the specialized carry chains in a linear-array compressor tree, they improved both area and speed. Linear-array compressor trees give marked speed improvements over carry propagate adder (CPA) approaches, in general with no additional hardware cost. Furthermore, the high-level definition of carry save adder (CSA) arrays based on CPAs makes the designs easy to use and portable.
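The half and full adders mentioned above can be stated precisely as small behavioral models. The following minimal Python sketch uses the textbook construction of a full adder from two half adders and an OR gate. The function names are hypothetical, and the sketch is not taken from the paper's Verilog sources. The loop checks exhaustively that each adder's (sum, carry) pair encodes the arithmetic sum of its input bits.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (sum, carry)."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add three bits; return (sum, carry_out).

    Built from two half adders plus an OR gate (a common textbook construction).
    """
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2


# Exhaustive check: 2 * carry + sum must equal the integer sum of the input bits.
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        assert 2 * c + s == a + b
        for cin in (0, 1):
            s, c = full_adder(a, b, cin)
            assert 2 * c + s == a + b + cin
```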
A multi-operand adder can be represented simply as an architecture built around a compressor tree [13], which reduces the operands to partial sums and propagated carries [10]. There are many types of multi-operand adders. The ones considered in this work are the Array tree adder, the Wallace tree adder, the Balanced delay tree adder and the Overturned-stairs tree adder.
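The following minimal word-level Python sketch shows the compressor-tree idea. It assumes unsigned operands and uses Python's unbounded integers, so word-width growth is not modeled. A 3:2 carry-save stage turns three words into a partial-sum word and a carry word. Stages are repeated until two words remain, and a single carry-propagate addition (modeled here by Python's `+`) gives the result. The names `csa` and `compressor_tree_add` are hypothetical. The order in which this sketch groups words is arbitrary. The tree shapes compared in this paper differ precisely in how that grouping is done.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """One carry-save (3:2) stage applied bitwise to whole words.

    Each bit position is an independent full adder: the sum bit stays in place
    and the carry bit is weighted one position higher, so no carry ripples.
    """
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def compressor_tree_add(operands: list[int]) -> int:
    """Reduce the operands to two words with CSA stages, then add them once."""
    words = list(operands)
    while len(words) > 2:
        s, c = csa(words[0], words[1], words[2])
        words = words[3:] + [s, c]
    # Final carry-propagate addition of the remaining sum and carry words.
    return sum(words)


print(compressor_tree_add([3, 5, 7, 9, 11, 13]))  # 48
```

Each CSA stage preserves the total, because $x + y + z = (x \oplus y \oplus z) + 2\,\mathrm{maj}(x, y, z)$. Only the last addition needs a carry chain.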
The operands to be added can be single-bit or multi-bit, so both the inputs and the output of the adder can span multiple bits. Because 8-bit, 16-bit and 32-bit multi-operand adders are used in many present-day circuits, adders of these widths are compared in terms of the parameters mentioned above. The adders are described in Verilog and synthesized on the Xilinx ISE 13.4 platform. The target device is a Virtex-6 (XC6VLX240T) in the FF1156 package.
II. CLASSIFICATION OF MULTI-OPERAND ADDERS
The popular multi-operand adders [13] chosen for implementation are discussed below:
a. Array Tree Adders
The Array tree adder is a straightforward multi-operand adder that adds the operands and accumulates the partial sums [14].
b. Wallace Tree Adder
Figure 2 shows the architectures of the Wallace tree adder for 6 and 9 operands, respectively. In Figure 2, all operands are used in parallel in the first level itself, through multiple carry save adders (CSAs). The partial sums and carries ($S_i$ and $C_i$) generated in the first level are processed in the subsequent levels of the CSA tree. These levels generate the inputs of the carry propagate adder (CPA), which produces the final outputs.
For 6 operands, the resulting Overturned-stairs tree architecture is the same as that of a Balanced delay tree adder. The Overturned-stairs tree, however, does not wait to balance the sets of inputs given to the CSA tree; all operands are accommodated in the first level itself. For 9 operands, therefore, the Overturned-stairs tree adder accommodates all operands in the first level, but the Balanced delay tree does not. The Balanced delay tree takes some sets of operands in the first level and the others in the next level, so as to balance the partial sums of the adder.
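One way to see why tree shape matters is to count the CSA levels that each structure needs before the final CPA. The following Python sketch assumes that the Array tree adder is a linear chain, in which the first CSA takes three operands and each later CSA adds one more. It also assumes that the Wallace tree compresses every complete group of three words into two at each level and passes leftover words through. The function names are hypothetical. The sketch counts only CSA levels and ignores the final CPA, wiring and FPGA LUT mapping, so it is a rough proxy for delay, not a reproduction of the measured results.

```python
def array_levels(k: int) -> int:
    """CSA levels of a linear array: the first CSA takes three operands and
    each following CSA folds in one more, so k operands need k - 2 levels."""
    return max(k - 2, 0)


def wallace_levels(k: int) -> int:
    """CSA levels of a Wallace tree: at each level every complete group of
    three words becomes two words; leftover words pass through unchanged."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3
        levels += 1
    return levels


for k in (6, 9, 12):
    print(k, array_levels(k), wallace_levels(k))
# 6 4 3
# 9 7 4
# 12 10 5
```

The number of array levels grows linearly with the number of operands, while the number of Wallace levels grows roughly logarithmically.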
III. BUILDING BLOCKS
A brief description of the components used to build the multi-operand adders described in the preceding section is given below.
a. Carry Propagate Adder (CPA)
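The carry propagate adder [18] is built from 1-bit full adders (FAs). A cascade of n FAs, in which the carry output of each FA drives the carry input of the next, gives an n-bit CPA. Figure 5 shows the block diagram of a 4-bit CPA, which adds two 4-bit operands.
b. Carry Save Adder (CSA)
A carry save adder [17, 19] is simply a row of full adders, as in a ripple carry adder. The difference is that the carries are stored as a separate carry vector rather than propagated. Figure 6 shows the block diagram of a CSA.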
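The difference between the two building blocks can be shown with a bit-level Python sketch. Operands are represented as lists of bits, least significant bit first, and both inputs are assumed to have equal length. The CPA chains the carry from one full adder to the next. The CSA uses the same full adders independently and returns the carries as a separate vector, which carries weight one position higher. All function names are hypothetical and do not come from the paper's Verilog.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def ripple_carry_add(x: list[int], y: list[int], cin: int = 0) -> tuple[list[int], int]:
    """n-bit CPA: a cascade of n full adders (bits LSB first).
    The carry out of stage i is the carry in of stage i + 1."""
    total, carry = [], cin
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        total.append(s)
    return total, carry


def carry_save_add(x: list[int], y: list[int], z: list[int]) -> tuple[list[int], list[int]]:
    """n-bit CSA: n independent full adders. Carries are saved in their own
    vector (weighted one position higher) instead of being propagated."""
    sums, carries = [], []
    for a, b, c in zip(x, y, z):
        s, cout = full_adder(a, b, c)
        sums.append(s)
        carries.append(cout)
    return sums, carries


def to_bits(value: int, n: int) -> list[int]:
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


# 4-bit CPA: 11 + 6 = 17, i.e. sum bits 0001 with a carry out of 1.
s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s) + (cout << 4))  # 17
```

Because a CSA has no carry chain, its delay is that of a single full adder regardless of word width. This is why CSA levels are used inside the trees and a single CPA is used only at the end.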
IV. RESULTS AND DISCUSSIONS
The adders are described in Verilog and synthesized on the Xilinx ISE 13.4 platform. The target device is a Virtex-6 (XC6VLX240T) in the FF1156 package.
Performance parameters such as delay, power and resource utilization (measured as the number of look-up tables, LUTs) have been used for the comparison.
Tables I, II and III show the performance parameters of the various multi-operand adders in terms of logic delay and routing delay. In Table III, 12-operand adders with the same bit lengths have been considered. According to the simulation results, the Wallace tree adder gives the lowest overall propagation delay and the Array tree adder gives the highest. The Balanced delay tree and Overturned-stairs tree adders give the same results for 6 operands, because their architectures are identical for 6 operands. Thus, as the number of operands and the bit length increase, the Wallace tree adder offers the lowest propagation delay of all the adders, together with the lowest power consumption.
V. CONCLUSIONS
In this paper, different multi-operand adders have been analyzed in terms of propagation delay, power consumption and resource utilization. The adders are described in Verilog and synthesized on the Xilinx ISE 13.4 platform. The target device is a Virtex-6 (XC6VLX240T) in the FF1156 package. The simulation results show that the Wallace tree adder gives the best performance among all the adders for all the parameters considered.
