I. Introduction
Cryptography is a method of storing and transmitting data in a particular form so that only those for whom it is intended can read and process it. It is essential for security in data transmission. In hardware implementations, the Montgomery modular multiplication algorithm is commonly used, mostly in the data encryption process of public key cryptography. Montgomery multipliers can be classified into two types based on their operation: Full-Carry-Save Montgomery modular multiplication (FCS-MM) and Semi-Carry-Save Montgomery modular multiplication (SCS-MM). In FCS-MM, both the carry and the sum obtained are taken as outputs. In SCS-MM, only the sum is taken as the output. Compared to FCS-MM, SCS-MM occupies less area because the basic algorithm uses fewer adder levels.
In this paper we discuss the Modified SCS-MM2 architecture and analyse it for 128-bit inputs. The SCS-MM algorithm has three inputs, A, B and N, and one sum output, S, where A is the multiplicand, B is the multiplier and N is the modulus. The inputs must satisfy two rules: all inputs must have the same length, and the modulus must always be greater than both the multiplicand and the multiplier.
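The input rules can be illustrated with a small software reference model of bit-serial radix-2 Montgomery multiplication. This is a minimal Python sketch, not the Verilog design evaluated in this paper. The function name `montgomery_radix2` and the parameter `k` (operand width in bits) are hypothetical. The check for an odd modulus is a standard requirement of Montgomery multiplication that the text does not state explicitly.

```python
def montgomery_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n using bit-serial radix-2 Montgomery steps.

    a is the multiplicand, b the multiplier, n the modulus and k the
    operand width in bits (hypothetical parameter name).
    """
    # Input rules from the text: same width, modulus greater than a and b.
    if n.bit_length() > k:
        raise ValueError("modulus does not fit in k bits")
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("a and b must be smaller than the modulus")
    # Assumption: standard Montgomery requirement, not stated in the text.
    if n % 2 == 0:
        raise ValueError("modulus must be odd")

    s = 0
    for i in range(k):
        a_i = (a >> i) & 1             # i-th bit of the multiplicand
        q = (s + a_i * b) & 1          # chosen so the next sum is even
        s = (s + a_i * b + q * n) >> 1 # exact division by 2
    # s stays below 2n, so one conditional subtraction is enough.
    return s - n if s >= n else s
```

The result carries the usual Montgomery factor 2^(-k), so multiplying it by 2^k modulo N gives back A * B mod N.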
II. SCS-MM2 Algorithm
The modified SCS-MM2 algorithm is shown in Fig. 1. Initially, the carry and sum values are set to the sum of the multiplier and the modulus; this is the pre-computation step. Steps 3 to 4 iterate K times, where K is the number of bits and i denotes the i-th bit. In Fig. 1, the suffix 0 denotes the least significant bit. Among the many types of adders, the carry-save adder is efficient because it has a low propagation delay. An n-bit carry-save adder consists of n parallel adders, which produce an n-bit sum and an n-bit carry. The inputs to the carry-save adder are SS, SC and the mux output. The mux output depends on the single-bit values "aa" and "qa". Here "aa" is taken as the AND of A[i] with the least significant bit of B, and "qa" represents the sum of SS and "aa".
Fig. 1. Modified SCS-MM2 algorithm

Fig. 1(a). Block diagram of SCS-MM2 algorithm
In Fig. 2, the third input of the CSA varies depending on the values of "aa" and "qa". This loop iterates n times. The sum of the final stage is taken as the final output. Internally, the CSA block consists of full adders.
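The loop around the CSA can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the circuit of the figures. The names `csa` and `scs_mm_sketch` are hypothetical. The mux is assumed to choose among 0, N, B and the pre-computed B + N, selected by the bit A[i] and by "qa". The pre-computation and the final conversion from carry-save form use ordinary integer addition, and "qa" is taken from the least significant bits of SS, SC and "aa".

```python
from typing import Tuple


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """Carry-save adder: bitwise full adders, sum + carry == x + y + z."""
    s = x ^ y ^ z                           # per-bit sum outputs
    c = ((x & y) | (x & z) | (y & z)) << 1  # per-bit carries, shifted left
    return s, c


def scs_mm_sketch(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n, keeping the running value as (SS, SC)."""
    if n % 2 == 0 or n.bit_length() > k or not (0 <= a < n and 0 <= b < n):
        raise ValueError("need odd k-bit modulus n with a, b < n")

    d = b + n  # pre-computation step (assumption: plain addition here)
    mux = {(0, 0): 0, (0, 1): n, (1, 0): b, (1, 1): d}

    ss, sc = 0, 0
    for i in range(k):
        a_i = (a >> i) & 1
        aa = a_i & (b & 1)         # A[i] AND least significant bit of B
        qa = (ss ^ sc ^ aa) & 1    # parity that makes the new sum even
        s, c = csa(ss, sc, mux[(a_i, qa)])
        ss, sc = s >> 1, c >> 1    # both are even, so the shift is exact

    result = ss + sc               # convert carry-save form to binary
    return result - n if result >= n else result
```

Only the binary sum leaves the function, which mirrors the semi-carry-save idea of keeping carry-save form internal to the loop.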
III. Proposed System
The main advantage of the proposed system is an increase in the speed of the algorithm. This is achieved by implementing each full adder using two Peres gates. Fig. 2(a) shows the full adder logic built from two Peres gates. Peres gates are reversible gates. The first Peres gate acts as a half adder, and so does the second. These gates produce three garbage outputs, which are not required and are neglected here. The performance of the RCSA is higher than that of the CSA. The RCSA consists of n full adders made of Peres gates, so its performance increases, and the overall performance of the SCS-MM2 algorithm increases as well.
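The full adder built from two Peres gates can be checked with a short truth-table model. This is a minimal Python sketch that assumes the commonly used Peres gate definition (P = A, Q = A xor B, R = AB xor C) and one possible wiring of the two gates. The function names are hypothetical. The unused outputs are returned only for inspection, and the number of outputs the paper's figure counts as garbage may differ from this wiring.

```python
from typing import Tuple


def peres(a: int, b: int, c: int) -> Tuple[int, int, int]:
    """Peres gate on single bits: (a, b, c) -> (a, a ^ b, (a & b) ^ c)."""
    return a, a ^ b, (a & b) ^ c


def peres_full_adder(a: int, b: int, cin: int) -> Tuple[int, int, Tuple[int, int]]:
    """Full adder from two Peres gates; returns (sum, carry_out, unused)."""
    # First gate with constant 0 acts as a half adder:
    # p = a ^ b (partial sum), ab = a & b (partial carry).
    g1, p, ab = peres(a, b, 0)
    # Second gate adds the carry-in:
    # s = a ^ b ^ cin, cout = ((a ^ b) & cin) ^ (a & b).
    g2, s, cout = peres(p, cin, ab)
    return s, cout, (g1, g2)
```

Running all eight input combinations through `peres_full_adder` and comparing `s + 2 * cout` with `a + b + cin` confirms that the wiring behaves as a full adder.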
IV. Results and Comparison
The SCS-MM2 and Modified SCS-MM2 designs were described in Verilog Hardware Description Language (Verilog HDL). The simulation results were obtained using ModelSim 6.3c, and synthesis performance was estimated using Xilinx ISE 10.1 for 16-bit inputs.
Fig. 3(a). Simulation waveform of SCS-MM2 algorithm
In Fig. 3(a), A, B and N are the inputs and S is the final output; SS and SC are internal registers. All inputs and outputs are 16 bits wide. The final sum value is obtained by multiplying A and B and reducing the result by the modulus.
Fig. 3(b). Simulation waveform of Modified SCS-MM2 algorithm
In Fig. 3(b), A, B and N are the inputs and S is the final output; SS and SC are internal registers. All inputs and outputs are again 16 bits wide. The final sum value is obtained by multiplying A and B and reducing the result by the modulus, and it is the same as the SCS-MM2 value. Table 1 presents the timing report for the SCS-MM2 and Modified SCS-MM2 architectures on a Virtex-2 FPGA using the Xilinx ISE 10.1 tool; it shows that the performance of the proposed system increases.
Table 1. Parametric analysis of SCS-MM2 and Modified SCS-MM2 for critical path
V. Conclusion
The proposed SCS-MM2 (Semi-Carry-Save Montgomery modular multiplication) radix-2 architecture reduces the critical path delay compared to the existing logic. The architecture is simulated using ModelSim, and design verification and the area and timing reports are produced using Xilinx ISE 10.1. The proposed architecture thus achieves a shorter critical path and a higher speed of operation.
