The modulo $2^n+1$ multiplier is a key block in circuit implementations of cryptographic algorithms such as IDEA. It is also widely used in residue arithmetic, digital signal processing, and data encryption, applications that demand low power, small area, and high-speed operation. In this paper, a new circuit implementation of an area- and power-efficient self-checking modulo $2^n+1$ multiplier based on residue codes is proposed. A modulo $2^n+1$ multiplier has three major stages: the partial product generation stage, the partial product reduction stage, and the final adder stage. The last two stages determine the speed and power of the entire circuit. The proposed self-checking multiplier detects errors online: a fault at any single gate produces an error at the gate output, which may propagate through the subsequent gates and generate an error at the output of the modulo multiplier. The proposed self-checking modulo multipliers for various values of input are specified in Verilog Hardware Description Language (HDL), simulated using Xilinx ISE, and synthesized using the Cadence RTL Encounter tool.
INTRODUCTION
Nowadays, information safety is given the utmost priority. The security and confidentiality of data transmitted through channels are becoming more and more important due to the rapid increase in the popularity of the internet and wireless communication nodes, which gives cryptography a vital role in this information age. Various cryptographic algorithms have therefore been studied and implemented to secure data transmission. The International Data Encryption Algorithm (IDEA), a symmetric-key algorithm, is one of the most reliable cryptographic algorithms used for secure data transmission. The modulo $2^n+1$ multiplier is one of the key blocks in the circuit implementation of IDEA. Hence, an area- and power-efficient self-checking modulo $2^n+1$ multiplier is proposed to protect data against errors in security applications.
Transient and permanent faults that occur during data transmission can be detected online by using self-checking circuits and systems. Self-checking circuits are designed using various coding schemes, such as arithmetic codes, to check the functionality of the circuits. In recent years, the residue number system (RNS) has been widely used in modulo arithmetic applications, so in many industrial applications self-checking arithmetic circuits are designed using residue codes to detect faults online as soon as they occur during data transmission. RNS has become popular because it computes efficiently: computations on large integers are performed in parallel on a set of small integers, which increases the overall performance of the system.
In past decades, different architectures of the modulo $2^n+1$ multiplier have been proposed. According to Curiger [3], three different multiplication architectures have been proposed. The first architecture uses an (n+1) × (n+1)-bit multiplier followed by modulo adders to correct errors caused by the carry. The second architecture is realized using a modulo $2^n+1$ adder, which consists of a carry-save adder and a final carry-select addition unit, to reduce design complexity [2]. The third architecture modifies the second: it reduces the circuit area significantly and increases the operating speed by introducing a bit-pair recoding scheme in the carry-save adder block [3]. The last two architectures are suited to full-custom design [2], which brings design challenges such as layout and fabrication complexity. Later, Zimmermann [4] implemented a new high-speed, low-power modulo $2^n+1$ multiplier, which has three major parts: the partial product generation stage, the partial product reduction stage, and the final addition stage.
In this paper, an area- and power-efficient self-checking modulo $2^n+1$ multiplier based on residue codes is implemented. Section 2 introduces the three stages of the modulo $2^n+1$ multiplier and the different types of compressors used in it [1]. Section 3 discusses the implementation of the self-checking modulo $2^n+1$ multiplier, the modulo generators, and the dual-rail checker. Simulation results for the self-checking modulo $2^n+1$ multiplier circuits are given in Section 4.
MODULO $2^n+1$ MULTIPLIER
The modulo $2^n+1$ multiplier is widely used in many data security applications, such as digital signal processors and cryptographic applications. Current modulo $2^n+1$ multipliers combine high speed, low power, and small area, which makes them suitable for VLSI implementation. The modulo $2^n+1$ multiplier consists of three major parts: the partial product generation block, the partial product reduction block, and the final addition stage. The last two stages determine the speed and power of the whole circuit. Enhancements in the partial product reduction stage and the final adder stage yield higher speed and lower power, because these two stages form the critical path of the multiplier.
Algorithm for Partial Product Generation Stage
Let A and B be the two inputs, represented as $A = a_n a_{n-1} \ldots$ Due to the repositioning operation, the final n × n partial product matrix results in a correction factor of:
Algorithm for Partial Product Reduction Stage
In this stage, the final n × n partial product matrix is reduced to a sum vector and a carry vector using compressors. The carry-out bit of each compressor level has to be fed back as the carry-in bit of the next level, which leads to a correction factor of:
For an n-bit modulo $2^n+1$ multiplier, the final correction factor is 3: a correction factor of 2 is added in the partial product reduction stage, and the remaining correction factor of 1 is added in the final adder stage due to the inverted carry feedback. The nth bit of the carry vector is inverted and repositioned as shown in Figure 5.
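The invert-and-reposition step rests on the identity $2^n \equiv -1 \pmod{2^n+1}$: a partial product bit $x$ of weight $2^{n+k}$ contributes $-x \cdot 2^k$, which equals $\bar{x} \cdot 2^k - 2^k$, so the bit can be inverted and moved to weight $2^k$ at the cost of a constant correction. The Python sketch below illustrates this under simplifying assumptions that are not taken from the paper: both operands are restricted to n bits (the most significant bits $a_n$ and $b_n$ are ignored), no compressor tree is modeled, and the function name `mod_mul_repositioned` is hypothetical. The constant it accumulates covers only this behavioural model and is not meant to reproduce the correction factors of the actual circuit.

```python
def mod_mul_repositioned(a, b, n):
    """Multiply two n-bit operands modulo 2**n + 1 via repositioned partial products.

    Returns (result, correction). The correction is the constant that
    compensates for inverting every bit that wraps around; it depends only
    on n, never on the operand values.
    """
    modulus = (1 << n) + 1
    total = 0
    correction = 0
    for i in range(n):
        b_i = (b >> i) & 1
        for j in range(n):
            bit = ((a >> j) & 1) & b_i        # partial product bit a_j * b_i
            weight = i + j
            if weight >= n:
                k = weight - n                # 2**(n+k) is congruent to -2**k
                total += (1 - bit) << k       # inverted bit, repositioned to 2**k
                correction -= 1 << k          # constant term from -x = (1 - x) - 1
            else:
                total += bit << weight        # bit stays at its original weight
    return (total + correction) % modulus, correction


if __name__ == "__main__":
    result, _ = mod_mul_repositioned(13, 11, 4)
    print(result == (13 * 11) % 17)
```

Because the correction is independent of the inputs, hardware can fold it into the reduction tree as a fixed constant, which is the role the correction factors play in the text above.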
Description of Compressors
The partial product reduction stage determines the speed and power of the entire modulo $2^n+1$ multiplier [1], so a group of low-power, high-speed compressors must be designed for this stage. Traditional compressors are built from full adders, but these designs occupy too much chip area and consume more power. To overcome this, compressors are now designed with MUX and XOR-XNOR subcircuits. The MUX-based compressor architecture meets the low-power and high-speed requirements because it uses fewer transistors than full-adder-based compressors.
Among these architectures, the one in Figure 6(c) is the best choice, because it has less delay, consumes less power, and occupies less silicon area than the remaining architectures. This architecture is therefore widely used in low-power, high-speed applications.
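As a behavioural illustration of the MUX and XOR-based approach, the following Python sketch models a 3:2 and a 4:2 compressor in which every carry is produced by a 2-to-1 multiplexer controlled by an XOR signal. It uses the textbook XOR/MUX decomposition and is not claimed to match any specific architecture of Figure 6; the function names `mux`, `compressor_3_2`, `compressor_4_2`, and `verify_compressors` are hypothetical.

```python
from itertools import product


def mux(select, when_one, when_zero):
    """2-to-1 multiplexer on single bits."""
    return when_one if select else when_zero


def compressor_3_2(a, b, c):
    """Return (sum, carry) with a + b + c == sum + 2 * carry."""
    x = a ^ b
    return x ^ c, mux(x, c, a)                # carry is the majority of a, b, c


def compressor_4_2(x1, x2, x3, x4, cin):
    """Return (sum, carry, cout) with x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    t1 = x1 ^ x2
    t2 = x3 ^ x4
    t3 = t1 ^ t2
    cout = mux(t1, x3, x1)                    # does not depend on cin
    total = t3 ^ cin
    carry = mux(t3, cin, x4)
    return total, carry, cout


def verify_compressors():
    """Exhaustively check both compressors against integer addition."""
    for a, b, c in product((0, 1), repeat=3):
        s, cy = compressor_3_2(a, b, c)
        if a + b + c != s + 2 * cy:
            return False
    for bits in product((0, 1), repeat=5):
        s, cy, co = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (cy + co):
            return False
    return True


if __name__ == "__main__":
    print(verify_compressors())
```

In the 4:2 model, `cout` does not depend on `cin`, so carries do not ripple horizontally across a row of such compressors.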
Algorithm for the Final Addition Stage
The final adder stage is a sparse-tree-based inverted End-Around-Carry (EAC) adder, which is derived from the conventional Kogge-Stone adder. The three major operations performed in the final addition stage are: a) computation of the carry-generate bits $G_i$ and the carry-propagate bits $P_i$, b) carry computation, and c) 4-bit conditional sum generation.
The carry-generate bits $G_i$ and the carry-propagate bits $P_i$ are computed from the final sum vector and carry vector of the partial product reduction stage, for every $i$, $0 \le i \le n-1$, according to:
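Carry computation is performed using the parallel-prefix operator $\circ$, which associates pairs of carry-generate and carry-propagate signals and was defined in [7] as:

$$(G_k, P_k) \circ (G_j, P_j) = (G_k + P_k \cdot G_j,\; P_k \cdot P_j)$$

$$(G_{k:j}, P_{k:j}) = (G_k, P_k) \circ (G_{k-1}, P_{k-1}) \circ \cdots \circ (G_j, P_j)$$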
Here $(G_{k:j}, P_{k:j})$ denotes the group generate and propagate terms, with $k > j$. Since every carry satisfies $C_i = G_{i:0}$, several algorithms have been introduced for computing all the carries using only $\circ$ operators. Among these, the sparse-tree-based inverted end-around-carry adder is preferred because it significantly reduces the area of the design and the delay of the final adder stage, while also solving the wire interconnection problem. The design of the sparse-tree adder relies on a sparse parallel-prefix carry computation unit and conditional sum generator blocks.
The 4-bit conditional sum generator computes two sets of sum bits, corresponding to the two possible values of the incoming carry. When the carry is computed, the correct sum is selected without any delay overhead.
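The interplay of prefix carries, the inverted end-around carry, and conditional sum selection can be sketched behaviourally in Python. The sketch assumes XOR-type propagate bits, takes two plain n-bit operands `x` and `y` in place of the sum and carry vectors, and models the adder as computing $(x + y + \overline{c_{out}}) \bmod 2^n$, where $c_{out}$ is the carry out of $x + y$. For clarity it forms the group terms serially and then uses only those at 4-bit block boundaries, whereas a real sparse tree would compute just those boundary carries in logarithmic depth. The names `prefix_op` and `sparse_inverted_eac_add` are hypothetical.

```python
def prefix_op(left, right):
    """Prefix operator on (G, P) pairs; `left` is the more significant pair."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def sparse_inverted_eac_add(x, y, n, block=4):
    """Return (x + y + NOT carry_out) mod 2**n, where carry_out is that of x + y."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]   # generate bits
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]   # propagate bits (XOR type)

    # group[i] holds (G_{i:0}, P_{i:0})
    group = [(g[0], p[0])]
    for i in range(1, n):
        group.append(prefix_op((g[i], p[i]), group[i - 1]))

    eac = 1 - group[n - 1][0]                 # inverted end-around carry

    result = 0
    for lo in range(0, n, block):
        width = min(block, n - lo)
        bmask = (1 << width) - 1
        xa = (x >> lo) & bmask
        yb = (y >> lo) & bmask
        sum0 = (xa + yb) & bmask              # block sum if incoming carry is 0
        sum1 = (xa + yb + 1) & bmask          # block sum if incoming carry is 1
        if lo == 0:
            carry = eac
        else:
            g_grp, p_grp = group[lo - 1]
            carry = g_grp | (p_grp & eac)     # carry into this block
        result |= (sum1 if carry else sum0) << lo
    return result


if __name__ == "__main__":
    n = 8
    x, y = 200, 100
    expected = (x + y + 1 - ((x + y) >> n)) % (1 << n)
    print(sparse_inverted_eac_add(x, y, n) == expected)
```

Both candidate sums of a block are available before its carry arrives, so the carry only drives the final selection, which is the property the text attributes to the conditional sum generator.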
International Journal of Computer Applications (0975 -8887)

SELF-CHECKING MODULO $2^n+1$ MULTIPLIER DESIGN
Error detection and correction in a binary multiplier makes use of residue codes, which require computing the residue modulo A of a binary number. Therefore, a self-checking circuit is designed that extracts the residue of a binary input number of arbitrary width with respect to any odd modulus A. A self-checking circuit should satisfy three properties to detect errors online. The properties are:

Fault secure: A circuit is fault secure if, for any fault in the fault set, it never generates an incorrect code word for any input code word.
Self-testing: For every fault in the fault set, at least one input code word generates an output that is not a valid code word.
Modulo A Generator
Modulo A generators are widely used in self-checking digital circuits to detect errors as soon as they occur. In a modulo A generator based on residue codes, the input operands are divided into n-bit vectors, which are then added by a sparse-tree inverted end-around-carry adder. Consider an input operand that is a 16-bit binary number, with the check base taken as 4; the input operand is then divided into 4 groups based on…
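The general principle behind a residue generator and a residue-code check can be sketched in Python. The sketch assumes a check modulus of the form $2^k+1$ with $k = 4$, an illustrative choice that the truncated example above does not confirm. Since $2^k \equiv -1 \pmod{2^k+1}$, the k-bit groups of the operand are added with alternating signs. The check is shown for a plain integer product rather than for the modulo multiplier of the paper, and the names `residue_mod_2k_plus_1` and `residue_check` are hypothetical.

```python
def residue_mod_2k_plus_1(value, k):
    """Residue of a non-negative integer modulo 2**k + 1 by alternating chunk sums."""
    modulus = (1 << k) + 1
    mask = (1 << k) - 1
    acc = 0
    sign = 1
    while value:
        acc += sign * (value & mask)          # k-bit group with alternating sign
        value >>= k
        sign = -sign                          # 2**k is congruent to -1
    return acc % modulus


def residue_check(a, b, product, k):
    """Return True if `product` is consistent with a * b under the residue code."""
    modulus = (1 << k) + 1
    expected = (residue_mod_2k_plus_1(a, k) * residue_mod_2k_plus_1(b, k)) % modulus
    return residue_mod_2k_plus_1(product, k) == expected


if __name__ == "__main__":
    a, b = 1234, 567
    good = a * b
    bad = good ^ (1 << 5)                     # inject a single-bit error
    print(residue_check(a, b, good, 4), residue_check(a, b, bad, 4))
```

A single-bit error changes the product by a power of two, which an odd modulus never divides, so such an error always alters the residue and is detected.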
Dual Rail Checker
The dual-rail checker plays a major role in self-checking circuits, since it detects all faults that occur in the circuit. It is used to check dual blocks and duplicate blocks by inverting one of the outputs. It has four coded input vectors of various lengths and two outputs: one output indicates correct operation, and the other indicates the presence of an error.
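A minimal behavioural sketch of dual-rail checking follows. It uses the standard two-rail checker cell from the self-checking literature, which may differ from the checker used in the paper: each input is a pair of bits expected to be complementary, and the two outputs are complementary only if every input pair is. This output encoding is an assumption and differs from the "correct" and "error" outputs described above. The names `two_rail_cell`, `two_rail_checker`, and `compare_dual_rail` are hypothetical.

```python
def two_rail_cell(pair_a, pair_b):
    """Combine two dual-rail pairs; the output pair is complementary
    only when both input pairs are complementary."""
    x0, y0 = pair_a
    x1, y1 = pair_b
    return (x0 & y1) | (y0 & x1), (x0 & x1) | (y0 & y1)


def two_rail_checker(pairs):
    """Reduce a non-empty list of dual-rail pairs to one output pair."""
    out = pairs[0]
    for pair in pairs[1:]:
        out = two_rail_cell(out, pair)
    return out


def compare_dual_rail(word, inverted_word, width):
    """Return True when inverted_word is the bitwise complement of word."""
    pairs = [((word >> i) & 1, (inverted_word >> i) & 1) for i in range(width)]
    f, g = two_rail_checker(pairs)
    return f != g                             # complementary outputs: no error


if __name__ == "__main__":
    word = 0b1011
    print(compare_dual_rail(word, ~word & 0xF, 4))
    print(compare_dual_rail(word, (~word & 0xF) ^ 0b0100, 4))
```

Comparing a block's output with the inverted output of its duplicate in this way flags any mismatch between the two copies.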
EXPERIMENTAL RESULTS
The area- and power-efficient self-checking modulo $2^n+1$ multiplier was simulated using Xilinx ISE and synthesized using the Cadence RTL Encounter tool. Table 1 compares the area and power of the 3:2, 4:2, and 5:2 compressors; the 3:2 compressor occupies less area and consumes less power than the 4:2 and 5:2 compressors.
CONCLUSION
In this paper, a new design of an area- and power-efficient self-checking modulo $2^n+1$ multiplier based on residue codes is proposed to detect errors online. To reduce area and power, MUX-based compressors are used instead of full adders to implement the modulo multipliers and modulo A generators. A sparse-tree-based inverted end-around-carry adder is used instead of a Kogge-Stone adder to shorten the critical path of the circuit. The proposed self-checking modulo $2^n+1$ multiplier based on residue codes is simulated using Xilinx ISE and synthesized using the Cadence RTL Encounter tool.
REFERENCES
