Two modified architectures for modulo 2^n + 1 adders are introduced in this paper. In the first architecture, a sparse carry computation unit computes only some of the carries of the modulo 2^n + 1 addition. This sparse approach is enabled by the inverted circular idempotency property of the parallel-prefix carry operator. In addition, a modified pre-processing stage and carry-select blocks that incorporate the multiplexer operation allow a diminished-one adder to be implemented with fewer LUTs and lower power consumption, while maintaining the same operating speed and delay. In the second architecture, modulo 2^n + 1 adders are easily derived by adding extra logic to modulo 2^n - 1 adders.
INTRODUCTION
Modulo 2^n + 1 arithmetic has been used in cryptography [1], [2]. Cryptography is the art of protecting information by transforming it (encryption) into an unreadable format called ciphertext. Decryption uses a secret key to transform the message back into plaintext. Cryptographic systems can generally be classified into symmetric-key systems and public-key systems. A symmetric-key system uses a single key shared by both the sender and the recipient, whereas a public-key system uses two keys: a public key known to everyone and a private key that only the recipient of the message uses.
The complexity of a modulo 2^n + 1 arithmetic unit is determined by the representation chosen for the operands. Three representations have been considered, namely the normal weighted one, the diminished-one [3], and the signed-LSB representation [4]. In the following, we consider only the first two, since adopting the signed-LSB representation does not lead to more efficient circuits in delay or area terms. In every case, when performing modulo 2^n + 1 arithmetic, the input operands and the results are limited to the range 0 to 2^n. In the normal weighted representation, each operand requires n + 1 bits, but only 2^n + 1 of the 2^(n+1) values that these bits can provide are used. The diminished-1 representation offers a denser encoding of the input operands and simplified arithmetic operations modulo 2^n + 1. In the diminished-1 representation, each number Z is represented as $a_z Z^*$, where $a_z$ is a single bit, often called the zero-indication bit, and $Z^*$ is an n-bit vector, often called the number part. If Z > 0, then $a_z = 0$ and $Z^* = Z - 1$, whereas for Z = 0, $a_z = 1$ and $Z^* = 0$.
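As a small illustration of the encoding just described, the following Python sketch maps an integer in [0, 2^n] to the pair $(a_z, Z^*)$ and back. The function names `to_diminished_one` and `from_diminished_one` are hypothetical and chosen for this example only.

```python
def to_diminished_one(z: int, n: int) -> tuple[int, int]:
    """Encode z in [0, 2**n] as (a_z, z_star) in the diminished-1 representation."""
    if not 0 <= z <= 2 ** n:
        raise ValueError("operand must lie in [0, 2**n]")
    if z == 0:
        return 1, 0            # zero-indication bit set, number part all zeros
    return 0, z - 1            # Z* = Z - 1 always fits in n bits because Z <= 2**n


def from_diminished_one(a_z: int, z_star: int) -> int:
    """Decode (a_z, z_star) back to the integer it represents."""
    return 0 if a_z else z_star + 1


# Example for n = 4 (modulo 17): 16 is encoded with number part 1111.
print(to_diminished_one(16, 4))   # (0, 15)
print(to_diminished_one(0, 4))    # (1, 0)
```

Note that all 2^n + 1 values are covered with n + 1 bits, but arithmetic only ever operates on the n-bit number part once the zero cases are handled separately.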
Related Work
Many papers have attacked the problem of designing efficient diminished-1 adders. The majority of them rely on an inverted end-around carry (IEAC) n-bit adder, which is an adder that accepts two n-bit operands and provides a sum increased by one compared to their integer sum if their integer addition does not result in a carry output. Although an IEAC adder can be implemented by using an integer adder whose carry output is connected back to its carry input via an inverter, such direct feedback is not a good solution: since the carry output depends on the carry input, a direct connection between them forms a combinational loop that may lead to an unwanted race condition [5]. Considering the diminished-1 representation for modulo 2^n + 1 arithmetic, [3] and [4] used an IEAC adder based on an integer adder along with an extra carry look-ahead (CLA) unit. The CLA unit computes the carry output, which is then inverted and used as the carry input of the integer adder. Zimmermann [6] proposed IEAC adders that make use of a parallel-prefix computation unit along with an extra prefix level that handles the inverted end-around carry. Although these architectures are faster than the carry look-ahead ones proposed in [7], for sufficiently wide operands they are slower than the corresponding parallel-prefix integer adders because of the extra prefix level. In [7], it has been shown that the recirculation of the inverted end-around carry can be performed within the existing prefix levels, that is, in parallel with the carry computation. In this way, the extra prefix level is eliminated, and parallel-prefix IEAC adders are derived that can operate as fast as their integer counterparts. Unfortunately, this level of performance requires more area than the solutions of [6], since a double parallel-prefix computation tree is required in several levels of the carry computation unit.
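To make the IEAC behaviour concrete, here is a minimal Python sketch of what such an adder computes, not how it is built in hardware. `ieac_add` is a hypothetical name, and the model assumes unsigned n-bit operands. Because the carry of the plain integer sum is computed first and only then fed back, the model has no combinational loop, which is exactly the property a hardware IEAC design must also achieve.

```python
def ieac_add(a: int, b: int, n: int) -> int:
    """Behavioural model of an n-bit inverted end-around-carry (IEAC) adder.

    The carry output of the plain integer sum a + b is inverted and added
    back as the carry input, so the result is (a + b + 1) mod 2**n when
    a + b < 2**n and (a + b) mod 2**n otherwise.
    """
    mask = (1 << n) - 1
    carry_out = (a + b) >> n              # 1 iff a + b >= 2**n
    return (a + b + (1 - carry_out)) & mask


# 3 + 4 = 7 produces no carry, so the IEAC adder returns 7 + 1 = 8.
print(ieac_add(3, 4, 4))    # 8
# 9 + 11 = 20 produces a carry, so the result is 20 mod 16 = 4.
print(ieac_add(9, 11, 4))   # 4
```

Applied to the number parts of two non-zero diminished-1 operands, this function yields the number part of their modulo 2^n + 1 sum.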
To reduce the area complexity of parallel-prefix IEAC adders, select-prefix [8] and circular carry-select [9] IEAC adders have been proposed. A modulo 2^n + 1 adder that follows the (n+1)-bit weighted representation can be designed following the principles of generic modulo adder design [10], [11]. However, it has recently been shown [12] that a weighted adder can be designed efficiently by using an IEAC adder and a carry-save adder (CSA) stage. As a result, improving the design of an IEAC adder would improve weighted adder design as well.
Parallel-Prefix Addition Basics
A parallel-prefix n-bit adder is generally considered a three-stage circuit, consisting of the pre-processing stage, the carry computation unit, and the post-processing stage. Suppose that $A = A_{n-1}A_{n-2}\dots A_0$ and $B = B_{n-1}B_{n-2}\dots B_0$ represent the two numbers to be added and $S = S_{n-1}S_{n-2}\dots S_0$ denotes their sum.
Pre Processing Stage
The pre-processing stage computes three types of signal bits: the carry-generate bits $G_i$, the carry-propagate bits $P_i$, and the half-sum bits $H_i$, for every $i$, $0 \le i \le n-1$, according to

$$G_i = A_i \cdot B_i, \qquad P_i = A_i + B_i, \qquad H_i = A_i \oplus B_i,$$

where $\cdot$, $+$, and $\oplus$ denote the logical AND, OR, and EXCLUSIVE-OR, respectively. The pre-processing stage is shown in figure 2.
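The three pre-processing equations translate directly into bit operations. The following sketch (with the hypothetical helper name `preprocess`) returns the bit lists least-significant bit first; it is a functional model of the stage, not a gate-level netlist.

```python
def preprocess(a: int, b: int, n: int) -> tuple[list[int], list[int], list[int]]:
    """Return the bit lists (G, P, H); index 0 is the least-significant bit."""
    g, p, h = [], [], []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g.append(ai & bi)      # carry-generate  G_i = A_i AND B_i
        p.append(ai | bi)      # carry-propagate P_i = A_i OR B_i
        h.append(ai ^ bi)      # half-sum        H_i = A_i XOR B_i
    return g, p, h


print(preprocess(0b1001, 0b1011, 4))
# ([1, 0, 0, 1], [1, 1, 0, 1], [0, 1, 0, 0])
```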
Carry Computation Unit
The second stage of the adder, hereafter called the carry computation unit, computes the carry signals $C_i$, for $0 \le i \le n-1$, using the carry-generate and carry-propagate bits $G_i$ and $P_i$. Carry computation is transformed into a parallel-prefix problem using the $\bullet$ operator, which associates pairs of generate and propagate signals and is defined as

$$(G, P) \bullet (G', P') = (G + P \cdot G',\; P \cdot P').$$
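In a series of associations of consecutive generate/propagate pairs $(G, P)$, the notation $(G_{k:j}, P_{k:j})$, with $k > j$, is used to denote the group generate/propagate term produced out of bits $k, k-1, \dots, j$, that is,

$$(G_{k:j}, P_{k:j}) = (G_k, P_k) \bullet (G_{k-1}, P_{k-1}) \bullet \dots \bullet (G_j, P_j).$$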
Since every carry $C_i = G_{i:0}$, a number of algorithms have been introduced for computing all the carries using only the $\bullet$ operator. The prefix operator is shown in figure 3.
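Because $\bullet$ is associative, all carries $C_i = G_{i:0}$ can be obtained with a logarithmic-depth scan. The sketch below uses a Kogge-Stone-style schedule purely as one example of such an algorithm (the text does not prescribe one); `dot`, `prefix_carries`, and `prefix_add` are hypothetical names, and the adder assumes a zero carry input.

```python
Pair = tuple[int, int]          # a (G, P) pair


def dot(high: Pair, low: Pair) -> Pair:
    """Prefix operator: (G, P) . (G', P') = (G + P*G', P*P')."""
    g_hi, p_hi = high
    g_lo, p_lo = low
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_carries(g: list[int], p: list[int]) -> list[int]:
    """Return C_i = G_{i:0} for every i, using a Kogge-Stone-style scan.

    After the level with distance d, position i holds the group term covering
    bits i .. max(0, i - 2d + 1), so ceil(log2 n) levels suffice.
    """
    pairs = list(zip(g, p))
    dist = 1
    while dist < len(pairs):
        pairs = [dot(pairs[i], pairs[i - dist]) if i >= dist else pairs[i]
                 for i in range(len(pairs))]
        dist *= 2
    return [gp[0] for gp in pairs]


def prefix_add(a: int, b: int, n: int) -> int:
    """Three-stage parallel-prefix integer adder (carry input 0), result mod 2**n."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    h = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    c = prefix_carries(g, p)
    s = h[0]                                   # S_0 = H_0 when there is no carry input
    for i in range(1, n):
        s |= (h[i] ^ c[i - 1]) << i            # S_i = H_i XOR C_{i-1}
    return s


print(prefix_add(9, 11, 4))   # 20 mod 16 = 4
```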
Post Processing Stage
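The third stage computes the sum bits from the half-sum bits and the carries according to

$$S_i = H_i \oplus C_{i-1}, \quad 0 \le i \le n-1,$$

where $C_{-1} = C_{in}$ denotes the carry input of the adder.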
The post-processing stage is shown in figure 4. The computation of modulo 2^n - 1 addition is, in fact, a conditional operation defined as

$$|A + B|_{2^n-1} = \begin{cases} (A + B + 1) \bmod 2^n, & \text{if } A + B \ge 2^n,\\ A + B, & \text{otherwise.} \end{cases}$$

A modulo 2^n - 1 adder can therefore be implemented using an integer adder that also increments its sum when the carry output is one, that is, when A + B ≥ 2^n. The conditional increment can be implemented by an additional carry-increment stage, as shown in figure 5. In this case, one extra level of $\bullet$ cells, driven by the carry output of the adder, is required.
Depending on the implementation of the modulo 2^n - 1 adder, for bitwise-complementary inputs, i.e., when A + B = 2^n - 1, the adder may produce an all-1s output vector in place of the expected result, which is zero. In most applications, this is acceptable as a second representation of zero.
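A short behavioural sketch of the conditional-increment definition, assuming n-bit unsigned operands (`mod_2n_minus_1_add` is a hypothetical name). It also reproduces the double representation of zero discussed above: bitwise-complementary inputs give the all-1s vector.

```python
def mod_2n_minus_1_add(a: int, b: int, n: int) -> int:
    """Modulo 2**n - 1 addition by an integer adder plus a conditional increment.

    Operands are n-bit values. The all-ones vector is returned for
    bitwise-complementary inputs, i.e. zero has two representations.
    """
    mask = (1 << n) - 1
    s = a + b
    if s >> n:                  # carry output is 1, i.e. A + B >= 2**n
        s = (s & mask) + 1      # increment; the result is at most 2**n - 1, so no new carry
    return s


print(mod_2n_minus_1_add(9, 11, 4))   # 20 mod 15 = 5
print(mod_2n_minus_1_add(5, 10, 4))   # 15: all ones, the second zero
```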
International Journal of Computer Applications (0975 -8887) Volume 70-No.4, May 2013

Fig. 5. Design of a prefix modulo 2^8 - 1 adder
The implementation of a modulo 2^n - 1 adder requires the connection of the carry output $C_{n-1} = G_{n-1:0}$ of an integer adder to its carry-input port. The carries $C^-_i$ of the modulo 2^n - 1 adder, which also take the carry-input port into account, are equal to $C^-_i = G_{i:0} + P_{i:0} \cdot C_{in}$. Therefore, connecting the carry output to the carry input leads to $C^-_i = G_{i:0} + P_{i:0} \cdot G_{n-1:0}$. This relation contains many redundant terms and can be simplified to

$$C^-_i = G_{i:0} + P_{i:0} \cdot G_{n-1:i+1}.$$
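The simpler equation can be equivalently expressed using the $\bullet$ operator as follows:

$$(C^-_i, P_{n-1:0}) = (G_i, P_i) \bullet \dots \bullet (G_0, P_0) \bullet (G_{n-1}, P_{n-1}) \bullet \dots \bullet (G_{i+1}, P_{i+1}). \tag{3}$$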
The above equation (3), which computes the modulo 2^n - 1 carries, has a cyclic form and, in contrast to integer addition, the number of generate and propagate pairs $(G_i, P_i)$ that need to be associated for each carry is equal to n. This means that the parallel-prefix carry computation unit of a modulo 2^n - 1 adder has significantly higher area complexity than that of a corresponding integer adder. In terms of delay, the carries $C^-_i$ can be computed in $\log_2 n$ levels using regular parallel-prefix structures with an end-around technique. At each level of the parallel-prefix structure, larger groups of $(G_i, P_i)$ pairs are progressively associated, and the carries $C^-_i$ are computed at the last level. The final sum bits are equal to $S^-_i = H_i \oplus C^-_{i-1}$, where $C^-_{-1} = C^-_{n-1}$. This form of modulo 2^n - 1 adder suffers from the double representation of zero. Few solutions have been reported on the design of a modulo 2^n - 1 adder with a single zero representation. Those proposed in [13] have an increased delay compared to those with a double zero representation, since they rely on using $H_i$ instead of $P_i$ as the carry-propagate signal, while those proposed in [14] compute the modulo carries $C^-_i$ as

$$(C^-_i, P_{n-1:0}) = (G_i, P_i) \bullet \dots \bullet (G_0, P_0) \bullet (G_{n-1}, P_{n-1}) \bullet \dots \bullet (G_{i+2}, P_{i+2}) \bullet (P_{i+1}, P_{i+1}),$$

that is, by using $P_{i+1}$ instead of $G_{i+1}$. Although this change seems minor, it ruins the regularity of the adders and increases the interconnect area. In the rest of this paper, we consider modulo 2^n - 1 adders with a double representation of zero.
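The cyclic carry relation can be checked functionally. The sketch below evaluates $C^-_i = G_{i:0} + P_{i:0} \cdot G_{n-1:i+1}$ directly, folding the $\bullet$ operator serially (so it models the equation, not a log-depth prefix structure), and forms $S^-_i = H_i \oplus C^-_{i-1}$ with $C^-_{-1} = C^-_{n-1}$. The helper names `group` and `mod_2n_minus_1_prefix_add` are hypothetical.

```python
def group(g, p, hi, lo):
    """Group term (G_{hi:lo}, P_{hi:lo}); an empty range (hi < lo) yields (0, 1)."""
    G, P = 0, 1                                # identity element of the prefix operator
    for k in range(lo, hi + 1):                # (G_k, P_k) . (G_{k-1:lo}, P_{k-1:lo})
        G, P = g[k] | (p[k] & G), p[k] & P
    return G, P


def mod_2n_minus_1_prefix_add(a: int, b: int, n: int) -> int:
    """Modulo 2**n - 1 sum built from the cyclic carries C-_i of equation (3)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    h = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    c = []
    for i in range(n):
        g_lo, p_lo = group(g, p, i, 0)           # (G_{i:0}, P_{i:0})
        g_hi, _ = group(g, p, n - 1, i + 1)      # G_{n-1:i+1} (0 when i = n - 1)
        c.append(g_lo | (p_lo & g_hi))          # C-_i = G_{i:0} + P_{i:0} G_{n-1:i+1}
    # S-_i = H_i XOR C-_{i-1}; Python's c[-1] is c[n-1], matching C-_{-1} = C-_{n-1}.
    return sum((h[i] ^ c[i - 1]) << i for i in range(n))


print(mod_2n_minus_1_prefix_add(9, 11, 4))   # 5
```

Comparing this against the conditional-increment definition over all operand pairs confirms that the cyclic carries reproduce the same double-zero modulo 2^n - 1 adder.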
Modulo 2^n + 1 Adder
Diminished-1 modulo 2^n + 1 addition is more complex, since special care is needed when at least one of the input operands is zero (1 00...0). The sum of a diminished-1 modulo adder is derived according to the following cases:
(1) When none of the input operands is zero ($a_z = b_z = 0$), their number parts $A^*$ and $B^*$ are added modulo 2^n + 1. As discussed in the following, this operation can be handled by an IEAC adder.
(2) When one of the two inputs is zero, the result is equal to the non-zero operand.
(3) When both operands are zero, the result is zero.
In any case in which the result is equal to zero (case 1 or 3), the zero-indication bit of the sum needs to be set, and the number part of the sum should be equal to the all-zero vector. According to the above, a diminished-1 adder is needed only in case 1, while in the other cases the sum is known in advance.
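The three cases above can be captured in a small reference model. This is a sketch under the assumption that case 1 is handled by IEAC addition of the number parts; `dim1_add` is a hypothetical name, and it returns the pair (zero-indication bit, number part) of the sum.

```python
def dim1_add(a_z: int, a_star: int, b_z: int, b_star: int, n: int) -> tuple[int, int]:
    """Diminished-1 addition modulo 2**n + 1; returns (s_z, s_star)."""
    mask = (1 << n) - 1
    if a_z and b_z:                  # case (3): both operands are zero
        return 1, 0
    if a_z:                          # case (2): the result is the non-zero operand
        return 0, b_star
    if b_z:
        return 0, a_star
    # case (1): IEAC addition of the number parts
    total = a_star + b_star
    carry_out = total >> n
    s_star = (total + 1 - carry_out) & mask
    # the sum is zero exactly when A + B = 2**n + 1, i.e. A* + B* = 2**n - 1
    s_z = 1 if total == mask else 0
    return s_z, s_star


# 5 + 7 = 12 (mod 17): number parts 4 and 6 give number part 11.
print(dim1_add(0, 4, 0, 6, 4))   # (0, 11)
```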
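When none of the input operands is zero, i.e., $a_z, b_z \ne 1$, the number part of the diminished-1 sum is derived from the number parts $A^*$ and $B^*$ of the input operands as follows:

$$S^* = \begin{cases} (A^* + B^*) \bmod 2^n, & \text{if } A^* + B^* \ge 2^n,\\ (A^* + B^* + 1) \bmod 2^n, & \text{otherwise.} \end{cases}$$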
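In a way analogous to the modulo 2^n - 1 case, [8] has shown that the carry at the i-th bit position of an IEAC adder, when the carry input is fed with the inverted carry output, $C_{in} = \overline{G_{n-1:0}}$, can be computed more simply by

$$C^*_i = G_{i:0} + P_{i:0} \cdot \overline{G_{n-1:i+1}}. \tag{5}$$

Equivalently, using the $\bullet$ operator, the IEAC addition carries can be expressed as

$$(C^*_i, P_{n-1:0}) = (G_i, P_i) \bullet \dots \bullet (G_0, P_0) \bullet \overline{(G_{n-1}, P_{n-1}) \bullet \dots \bullet (G_{i+1}, P_{i+1})},$$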
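where, by definition, $\overline{(G, P)}$ is equal to $(\overline{G}, P)$. Using the above simplified carry equations, the final sum bits are equal to $S^*_i = H_i \oplus C^*_{i-1}$, with $C^*_{-1} = \overline{G_{n-1:0}}$.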
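The simplified IEAC carry equation (5) can be validated against the behavioural IEAC definition. In the sketch below, the hypothetical function `ieac_prefix_add` evaluates (5) by folding the $\bullet$ operator serially and uses $\overline{G_{n-1:0}}$ as the carry into bit 0. This is a functional check of the equations, not a model of a specific hardware topology.

```python
def group(g, p, hi, lo):
    """(G_{hi:lo}, P_{hi:lo}); an empty range (hi < lo) gives the identity (0, 1)."""
    G, P = 0, 1
    for k in range(lo, hi + 1):
        G, P = g[k] | (p[k] & G), p[k] & P
    return G, P


def ieac_prefix_add(a: int, b: int, n: int) -> int:
    """IEAC sum from the carries C*_i of equation (5)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    h = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    c = []
    for i in range(n):
        g_lo, p_lo = group(g, p, i, 0)           # (G_{i:0}, P_{i:0})
        g_hi, _ = group(g, p, n - 1, i + 1)      # G_{n-1:i+1}
        c.append(g_lo | (p_lo & (1 - g_hi)))    # C*_i = G_{i:0} + P_{i:0} NOT(G_{n-1:i+1})
    c_minus_1 = 1 - group(g, p, n - 1, 0)[0]    # C*_{-1} = NOT(G_{n-1:0})
    s = h[0] ^ c_minus_1
    for i in range(1, n):
        s |= (h[i] ^ c[i - 1]) << i              # S*_i = H_i XOR C*_{i-1}
    return s


print(ieac_prefix_add(3, 4, 4))   # 8
```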
SPARSE MODULO 2^n + 1 ADDERS
In this section, we focus on the design of diminished-1 modulo 2^n + 1 adders with a sparse parallel-prefix carry computation stage that can use the same carry-select blocks as sparse integer adders.
4.1 Partially Regular Sparse Parallel-Prefix Adders
The carries of diminished-1 modulo 2^n + 1 addition can be associated in the very same way as the carries of integer addition. To this end, the inverted circular idempotency property is introduced by the following theorem: for $0 \le j \le i \le n-1$,

$$(G_{i:0}, P_{i:0}) \bullet \overline{(G_{n-1:i+1}, P_{n-1:i+1})} = (G_{i:0}, P_{i:0}) \bullet \overline{(G_{n-1:i+1}, P_{n-1:i+1}) \bullet (G_{i:j}, P_{i:j})}.$$

The inverted circular idempotency indicates that we can repeat the $(G_i, P_i)$ terms that appear at the front of a prefix relation of the form suggested by (5), inverted, at its tail. Armed with the inverted circular idempotency, we present the modified methodology using as an example the design of a sparse-4 parallel-prefix modulo 2^16 + 1 adder. Since we assume a sparsity of 4, only one out of every four carries is generated, at positions 3, 7, 11, and 15 (or, equivalently, -1).
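The idempotency property can be confirmed exhaustively for small widths. This sketch reads the property as: appending the front group $(G_{i:j}, P_{i:j})$ inside the inverted tail of $(G_{i:0}, P_{i:0}) \bullet \overline{(G_{n-1:i+1}, P_{n-1:i+1})}$ leaves the result unchanged, for every $0 \le j \le i$. That reading, and the helper names `dot`, `inv`, `group`, and `check_inverted_circular_idempotency`, are our assumptions for illustration.

```python
from itertools import product

Pair = tuple[int, int]


def dot(high: Pair, low: Pair) -> Pair:
    """Prefix operator (G, P) . (G', P') = (G + P*G', P*P')."""
    return high[0] | (high[1] & low[0]), high[1] & low[1]


def inv(pair: Pair) -> Pair:
    """Overline operator: invert only the generate component, (G, P) -> (NOT G, P)."""
    return 1 - pair[0], pair[1]


def group(g, p, hi, lo) -> Pair:
    """(G_{hi:lo}, P_{hi:lo}); an empty range (hi < lo) gives the identity (0, 1)."""
    G, P = 0, 1
    for k in range(lo, hi + 1):
        G, P = g[k] | (p[k] & G), p[k] & P
    return G, P


def check_inverted_circular_idempotency(n: int) -> bool:
    """Exhaustively test the property for every pair of n-bit operands."""
    for a, b in product(range(1 << n), repeat=2):
        g = [(a >> i) & (b >> i) & 1 for i in range(n)]
        p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
        for i in range(n):
            front = group(g, p, i, 0)              # (G_{i:0}, P_{i:0})
            tail = group(g, p, n - 1, i + 1)       # (G_{n-1:i+1}, P_{n-1:i+1})
            lhs = dot(front, inv(tail))
            for j in range(i + 1):
                # repeat the front terms i..j, inverted, at the tail
                rhs = dot(front, inv(dot(tail, group(g, p, i, j))))
                if lhs != rhs:
                    return False
    return True


print(check_inverted_circular_idempotency(4))   # True
```

The equality holds because a repeated front group can only change the result when $G_{i:j} = 1$, and in that case $G_{i:0} = 1$ already forces the generate output to one.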
Totally Regular Parallel-Prefix Units
The methodology presented in [7] is the only approach known so far that can organize the computation of the carries $C^*_i$, in the case of modulo 2^n + 1 addition, in a parallel-prefix-like form with $\log_2 n$ prefix levels. As also shown in the figure, some prefix operators are doubled up, since two carry computations need to be performed in parallel: one on the normal generate and propagate signals and the other on their complements. The problem gets worse when the input operands' width is not a power of two. Although the sparse version of the parallel-prefix adders introduced in this paper considerably alleviates the regularity and area-overhead problems, as can be verified from figure 6, there is still a lot of room for improvement.
In the following, this problem is solved by introducing a new prefix operator and an even simpler parallel-prefix carry computation unit. The new technique is presented via an example. Consider the design of a sparse-4 diminished-1 modulo 2^16 + 1 adder. In this case, a carry computation unit is needed that implements the prefix equation (5) only for the sparse carry positions 3, 7, 11, and 15 (or, equivalently, -1).
To this end, a new operator, called the gray operator, is introduced. Its implementation is given in figure 7. It accepts five inputs and produces four outputs. Three of the inputs of a gray operator residing at prefix level j - 1 form the operator's vertical input bus, while the remaining two form its lateral input bus. The lateral bus signals are driven inverted to the operator. The gray operator produces three signals for its vertical successor at prefix level j and one ($c_j$) for its lateral successor. Note that, compared to the prefix operator, the gray one requires one extra gate but does not require additional logic levels. Considering a sparse-2^k parallel-prefix carry computation unit, gray operators are not used in the first k prefix levels, since these need only compute the group generate and propagate terms out of 2^k adjacent bit positions.
Fig. 7. Design of a gray operator
The logic equations performed by a gray operator residing at prefix level j -1 are Consider now that we connect to 0, and to and , respectively, and and to and , respectively.
Then, the gray operator will provide as its lateral output. More importantly, it will also provide at its vertical outputs:
This information, as we will show in the following, suffices for the vertical successor to compute out of and . Consider the vertical successor of the aforementioned gray operator, which resides in prefix level j, and use a gray operator in its place, in which we connect and to and , respectively, and , and to the vertical outputs of the gray operator of level j - 1 mentioned above, that is, and The lateral output of this operator will be equal to = = , = = and it will provide at its vertical outputs:
Applying the same procedure recursively, the lateral output of the last vertical successor of a gray operator will be equal to
That is, equal to
From the above analysis, it is concluded that, starting from a sparse architecture with doubled-up operators, it suffices to:
(1) remove the doubled-up operators that associate inverted signals;
(2) replace the top operator of every column, excluding the leftmost, that accepts a feedback signal with a gray one, with its input tied to zero; and
(3) replace every vertical successor of a gray operator introduced by the previous step with a gray one.
To attain a diminished-1 modulo 2^n + 1 adder, these steps yield a sparse carry computation unit in which two gray operators are used. The top one, which resides at prefix level 3, accepts a feedback signal and therefore has its input tied to zero. This operator computes a term that is necessary for the computation of both carries that depend on it. Its vertical successor is also replaced by a gray operator that computes the final carry. Many efficient architectures have also been proposed for modulo 2^n - 1 addition. These architectures preserve all the benefits of parallel-prefix carry computation units and can be easily designed for every n. More specifically, Dimitrakopoulos et al. [16] generalized the design of such units for all values of n and provided easy-to-follow topological design rules. The resulting structures for n ≠ 2^k save a significant amount of area without sacrificing delay. Therefore, mapping the diminished-1 modulo 2^n + 1 adder design problem to that of modulo 2^n - 1 addition would be beneficial, given all the efficient architectures that have been proposed for the latter. In the following, it is shown that this mapping requires only a constant-time post-processing stage, and its area and time overhead are analyzed.
Modulo 2^n - 1 Unification Theory
In order to unify the parallel-prefix modulo 2^n - 1 and diminished-1 modulo 2^n + 1 addition principles, there is a need to explore the relation between the carries of these two addition operations, that is, between $C^-_i$ and $C^*_i$.
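The relationship that connects the recirculation carry bits $C^-_{n-1} = G_{n-1:0}$ of the modulo 2^n - 1 adder and $C^*_{-1} = \overline{G_{n-1:0}}$ of the IEAC adder, which are employed for the derivation of the sum bit at the least-significant position zero, is trivial:

$$C^*_{-1} = \overline{C^-_{n-1}}. \tag{6}$$

For the remaining positions, Theorem 2 relates the two carries as $C^*_i = C^-_i \oplus (P_{i:0} \cdot \overline{G_{i:0}})$. Indeed, if $G_{i:0} = 1$, both carries are equal to one and the correction term is zero, while if $G_{i:0} = 0$, the right-hand side reduces to $P_{i:0} \cdot \overline{G_{n-1:i+1}}$. The derived term is therefore the definition (5) of the carry signal in the case of diminished-1 modulo 2^n + 1 addition.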
A direct consequence of the newly derived relationship is that the carries of modulo 2^n + 1 adders can be computed directly from a modulo 2^n - 1 carry computation unit, followed by a stage of XOR gates that combines the carries $C^-_i$ with the terms $P_{i:0} \cdot \overline{G_{i:0}}$.
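The carry relation can be checked exhaustively for small n. The sketch assumes the relation reads $C^*_i = C^-_i \oplus (P_{i:0} \cdot \overline{G_{i:0}})$, with $C^-_i$ taken from equation (3) and $C^*_i$ from equation (5); `group` and `check_carry_relation` are hypothetical helper names.

```python
from itertools import product


def group(g, p, hi, lo):
    """(G_{hi:lo}, P_{hi:lo}); an empty range (hi < lo) gives the identity (0, 1)."""
    G, P = 0, 1
    for k in range(lo, hi + 1):
        G, P = g[k] | (p[k] & G), p[k] & P
    return G, P


def check_carry_relation(n: int) -> bool:
    """Test C*_i == C-_i XOR (P_{i:0} AND NOT G_{i:0}) for all operands and positions."""
    for a, b in product(range(1 << n), repeat=2):
        g = [(a >> i) & (b >> i) & 1 for i in range(n)]
        p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
        for i in range(n):
            g_lo, p_lo = group(g, p, i, 0)            # (G_{i:0}, P_{i:0})
            g_hi, _ = group(g, p, n - 1, i + 1)       # G_{n-1:i+1}
            c_minus = g_lo | (p_lo & g_hi)            # modulo 2**n - 1 carry, eq. (3)
            c_star = g_lo | (p_lo & (1 - g_hi))       # IEAC carry, eq. (5)
            if c_star != c_minus ^ (p_lo & (1 - g_lo)):
                return False
    return True


print(check_carry_relation(4))   # True
```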
At first, it may seem complicated to compute $P_{i:0} \cdot \overline{G_{i:0}}$, since a complete carry tree would have to be added for the computation of $G_{i:0}$. However, based on Theorem 3 below, the computation of $P_{i:0} \cdot \overline{G_{i:0}}$ is straightforward and can be implemented at low cost.
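Theorem 3.

$$P_{i:0} \cdot \overline{G_{i:0}} = H_{i:0},$$

where $H_{i:0} = H_i \cdot H_{i-1} \cdots H_0$.

Proof: By definition, we know that $G_{i:0} = G_i + P_i \cdot G_{i-1:0}$ and that $P_{i:0} = P_i \cdot P_{i-1:0}$. Therefore, the term $P_{i:0} \cdot \overline{G_{i:0}}$ can be written as

$$P_{i:0} \cdot \overline{G_{i:0}} = P_i \cdot P_{i-1:0} \cdot \overline{G_i} \cdot (\overline{P_i} + \overline{G_{i-1:0}}) = (P_i \cdot \overline{G_i}) \cdot (P_{i-1:0} \cdot \overline{G_{i-1:0}}).$$

The term $P_i \cdot \overline{G_i} = (A_i + B_i) \cdot \overline{A_i \cdot B_i}$ can easily be proven to be equal to $H_i = A_i \oplus B_i$. Hence,

$$P_{i:0} \cdot \overline{G_{i:0}} = H_i \cdot (P_{i-1:0} \cdot \overline{G_{i-1:0}}).$$

In the same manner, the term in the parentheses can be expanded, leading to $H_i \cdot H_{i-1} \cdot (P_{i-2:0} \cdot \overline{G_{i-2:0}})$. Applying the same rule recursively, we get $P_{i:0} \cdot \overline{G_{i:0}} = H_i \cdot H_{i-1} \cdots H_0 = H_{i:0}$.

Therefore, from the two newly introduced theorems, the diminished-1 modulo 2^n + 1 sum can be derived from the corresponding modulo 2^n - 1 sum as follows. By definition, we know that $S^*_i = H_i \oplus C^*_{i-1}$. Replacing $C^*_{i-1}$ with its new value, $C^-_{i-1} \oplus H_{i-1:0}$, we get $S^*_i = H_i \oplus C^-_{i-1} \oplus H_{i-1:0}$. We identify that $H_i \oplus C^-_{i-1}$ is the corresponding modulo 2^n - 1 sum bit $S^-_i$.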
Thus, it holds that

$$S^*_i = S^-_i \oplus (H_{i-1} \cdot H_{i-2} \cdots H_0), \quad i \ne 0. \tag{7}$$

Also, based on (6), the sum bit $S^*_0$ is simply equal to $\overline{S^-_0}$. An arithmetic example illustrating the derivation of a diminished-1 modulo 17 sum via a modulo 15 adder and some extra logic is given in the figure. According to section 3, one or both of the input operands in the diminished-1 representation may be equal to zero. It was decided to handle this case by setting the corresponding diminished-1 carries to zero. However, when a modulo 2^n - 1 adder is used to implement a diminished-1 adder, an even simpler approach can be employed: when at least one of the input operands is zero, i.e., $a_z = 1$ or $b_z = 1$, we ignore the term $H_{i-1} \cdots H_0$ in the derivation of the bit $S^*_i$ and keep the original sum of the modulo 2^n - 1 adder. This simple condition can be efficiently implemented by the following equations:

$$S^*_i = S^-_i \oplus \left(\overline{a_z + b_z} \cdot H_{i-1} \cdots H_0\right) \text{ for } i \ne 0, \quad \text{and} \quad S^*_0 = S^-_0 \oplus \overline{a_z + b_z}. \tag{8}$$
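The post-processing of equations (7) and (8) can be sketched on top of a behavioural modulo 2^n - 1 adder and compared with ordinary modulo 2^n + 1 arithmetic. All function names below are hypothetical; `theorem3_holds` additionally checks the identity $P_{i:0} \cdot \overline{G_{i:0}} = H_i \cdots H_0$ exhaustively. Only the number part of the sum is modelled; the zero-indication bit of the result is assumed to be produced separately.

```python
def mod_2n_minus_1_add(a: int, b: int, n: int) -> int:
    """Double-zero modulo 2**n - 1 adder (end-around carry)."""
    s = a + b
    return (s & ((1 << n) - 1)) + 1 if s >> n else s


def dim1_sum_via_mod_2n_minus_1(a_z: int, a_star: int, b_z: int, b_star: int, n: int) -> int:
    """Number part S* of the diminished-1 sum using the XOR correction of eq. (8)."""
    s_minus = mod_2n_minus_1_add(a_star, b_star, n)
    h = a_star ^ b_star                          # half-sum bits H_i
    enable = 1 - (a_z | b_z)                     # NOT(a_z OR b_z)
    s_star = 0
    run = 1                                      # H_{i-1} ... H_0 (empty product = 1)
    for i in range(n):
        s_i = (s_minus >> i) & 1
        s_star |= (s_i ^ (enable & run)) << i    # bit 0 is inverted when enabled
        run &= (h >> i) & 1
    return s_star


def theorem3_holds(n: int) -> bool:
    """Check P_{i:0} AND NOT G_{i:0} == H_i ... H_0 for all n-bit operand pairs."""
    for a in range(1 << n):
        for b in range(1 << n):
            G, P, run = 0, 1, 1
            for i in range(n):
                gi, pi = (a >> i) & (b >> i) & 1, ((a >> i) | (b >> i)) & 1
                G, P = gi | (pi & G), pi & P     # (G_{i:0}, P_{i:0})
                run &= ((a ^ b) >> i) & 1        # H_i ... H_0
                if (P & (1 - G)) != run:
                    return False
    return True


# 5 + 7 = 12 (mod 17): number parts 4 and 6 give number part 11.
print(dim1_sum_via_mod_2n_minus_1(0, 4, 0, 6, 4))   # 11
```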
Modified Pre Processing Stages in Both Architectures
Fig. 11. Design of a Modified Pre Processing Stage
Here, Boolean simplification is used to reduce the gate count. The normal pre-processing stage contains 7 gates, whereas in the modified pre-processing stage 3 gates are removed, so that only 4 gates are needed to produce the generate, propagate, and half-sum bits.
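The text does not list the simplified Boolean expressions themselves. One standard identity that lets the half-sum reuse the generate and propagate outputs is $H_i = P_i \cdot \overline{G_i}$ (the single-bit case of Theorem 3). The sketch below assumes this kind of sharing purely for illustration; it is not a reproduction of the authors' 4-gate circuit, and `preprocess_shared` is a hypothetical name.

```python
def preprocess_shared(a: int, b: int, n: int) -> tuple[list[int], list[int], list[int]]:
    """G and P computed directly; H derived as P AND NOT G (illustrative sharing)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]        # G_i = A_i AND B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]      # P_i = A_i OR B_i
    h = [pi & (1 - gi) for pi, gi in zip(p, g)]            # H_i = P_i AND NOT G_i
    return g, p, h


print(preprocess_shared(0b1001, 0b1011, 4))
# ([1, 0, 0, 1], [1, 1, 0, 1], [0, 1, 0, 0])
```

Deriving H from P and G replaces a separate XOR per bit with a single AND-with-inverted-input, which is the kind of gate reuse a simplified pre-processing stage can exploit.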
Simulation And Implementation Results
Initially, the design is coded in VHDL and then implemented with XILINX ISE 9.1E. The implementation is performed with 8-bit inputs. Experimental results for power, delay, frequency, and LUT count are obtained for the modulo 2^8 + 1 design. The parallel addition operation uses three stages; since the first stage (the pre-processing stage) is modified, it consumes less power than in the earlier method. The results are tabulated as follows.
CONCLUSIONS
Power-efficient modulo 2^n + 1 adders are valuable in a variety of computer applications, such as cryptography. In this paper, two contributions to the modulo 2^n + 1 addition problem are offered.
First, the sparse, totally regular parallel-prefix carry computation unit has been modified by applying Boolean simplification to the parallel-prefix carry computation for modulo 2^n + 1 addition. The experimental results indicate that the modified architecture reduces the LUT count and power consumption compared to the earlier solutions, while maintaining the same operating speed and delay.
The modulo 2^n + 1 addition problem was also shown to be related to the modulo 2^n - 1 addition problem. The unified theory presented in this paper shows that a simple post-processing stage, composed of an XOR gate for each output bit, needs to be added to a modulo 2^n - 1 adder to obtain a modulo 2^n + 1 adder.
REFERENCES
[1] www.cs.kent.edu/~rothstei/modular_arith.ppt 
