

C Divya Vani \* et al. (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH Volume No.5, Issue No.5, August - September 2017, 7408-7410.

# VLSI Implementation Of High Performance Montgomery Modular Multiplication For Crypto Graphical

#### C DIVYA VANI

M.Tech Student, Department Of ECE, Indur Institute of Engineering & Technology, Siddipet, T.S, India. V SRINU

Assistant Professor, Department Of ECE, Indur Institute of Engineering & Technology, Siddipet, T.S, India.

Abstract: The multiplier factor receives and outputs the data with binary illustration and uses solely onelevel Carry Save Adder (CSA) to avoid the carry propagation at each addition operation. This CSA is additionally accustomed perform operand pre-computation and format conversion from the carry save format to the binary illustration, leading to an occasional hardware price and short important path delay at the expense of additional clock cycles for finishing one standard multiplication. To beat the weakness, a Configurable CSA (CCSA), that may be one full-adder or 2 serial half-adders, is projected to scale back the additional clock cycles for quantity pre-computation and format conversion by 0.5. The mechanism which will notice and skip the surplus carry-save addition operations in the one-level CCSA design whereas maintaining the short important path delay is developed. The additional clock cycles for quantity pre-computation and format conversion is hidden and the high turnout is obtained. AES relies on a style principle called a substitution-permutation network, combination of each substitution and permutation, and is quick in each software package and hardware. AES doesn't use a Festal network. AES is a variant of Irondale that encompasses a fastened block size of 128 bits, and a key size of 128, 192, or 256 bits. By contrast, the Irondale specification in and of itself is nominative with block and key sizes that will be any multiple of thirty-two bits, both with a minimum of 128 and a most of 256 bits.AES operates on a 4×4 column-major order matrix of bytes, termed the state, though some versions of Irondale has a bigger block size and have further columns within the state. Most AES calculations square measure tired a special finite field.

*Keywords:* Carry-save addition; low-cost architecture; Montgomery modular multiplier; public-key cryptosystem;

## I. INTRODUCTION

In Many public-key cryptosystems, modular multiplication (MM) with large integers is the most critical and time-consuming operation. Therefore, numerous algorithms and hardware implementation has been presented to carry out the MM more quickly, and Montgomery's algorithm is one of the most well-known MM algorithms. Montgomery's algorithm determines the quotient only depending on the least significant digit of operands and replaces the complicated division in conventional MM with a series of shifting modular additions to produce  $S = A \times B \times R^{-1} \pmod{N}$ , where N is the bit modulus, R-1 is the inverse of R modulo N, and  $R = 2k \mod N$ . As a result, it can be easily implemented into VLSI circuits to speed up the encryption/decryption process. However, the threeoperand addition in the iteration loop of Montgomery's requires long carry propagation for large operands in binary representation. To solve this problem, several approaches of based on carrysave addition were proposed to achieve a significant speedup of Montgomery MM. Based on the representation of input and output operands, these approaches can be roughly divided into semicarry-save (SCS) strategy and full carry-save (FCS) strategy. In the SCS strategy, the input and output operands of the Montgomery MM are represented in binary, but intermediate results of shifting modular additions are kept in the carry-save format

to avoid the carry propagation. However, the format conversion from the carry-save format of the final modular product into it's the binary representation is needed at the end of each MM. This conversion can be accomplished by an extra carry propagation adder (CPA) or reusing the carry-save adder (CSA) architecture iteratively. Contrary to the SCS strategy, the FCS strategy maintains the input and output operands A, B, and S in the carry-save format denoted as (AS, AC), (BS, BC), and (SS, SC), respectively, to avoid the format conversion, leading to fewer clock cycles for completing an MM. Nevertheless, this strategy implies that the number of operands will increase and that more CSAs and registers for dealing with these operands are required. Therefore, the FCS based Montgomery modular multipliers possibly have higher hardware complexity and longer critical path than the SCS-based multipliers.

## **II. PREVIOUS STUDY**

The crucial path delay of SCS-based multiplier factor can be reduced by combining the benefits of FCSMM-2 and SCS-MM-2. That's pre-cipher D = B + N and reuse the one-level CSA design to perform B+N and the format conversion. Figure shows the changed SCS-based Montgomery multiplication (MSCS-MM) algorithm and attainable hardware design, respectively. To zero, which might be accomplished exploitation one NOR operation. The Q\_L circuit decides the energy



price. The carry propagation addition operations of B + N and also the format conversion area unit performed by the one-level CSA design of the MSCS-MM multiplier factor through repeated execution the carry-save addition (SS, SC) = SS +SC + zero till SC = zero. In addition, we tend to conjointly pre-compute Ai and energy in iteration i-1 (this is going to be explained more clearly in Section III-C) in order that they'll be want to immediately choose the required input quantity from zero, N, B, and D through the electronic device money supply in iteration I [4]. Therefore, the crucial path delay of the MSCSMM multiplier will be reduced into TMUX4 + TFA. However, additionally to acting the three-input carry-save additions k + a pair of times, several additional clocks cycles area unit needed to perform B + N and also the format conversion via the one-level CSA design as a result of they must be performed once in each millimetre. Furthermore, the additional clock cycles for performing B+N and also the format conversion through repeated execution the carry-save addition (SS, SC) = SS+SC+0 area unit captivated with the longest carry propagation chain in SS + SC. If SS = 111...1112 and SC = 000...0012, the one-level CSA design desires k clock cycles to complete SS + SC. That is 3k clock cycles within the worst case area unit needed for finishing one millimetre. Thus, it is crucial to scaling back the specified clock cycles of the MSCS-MM multiplier factor [5].



Fig.2.1. FCS-MM-1 multiplier.

## III. PROPOSED ADVANCED ENCRYPTION STANDARD

AES is predicated on a style principle referred to as a substitution-permutation network, combination of both substitution and permutation, and is quick in each software and hardware. Its forerunner DES, AES does not use a gala network. AES may be a variant of Irondale which encompasses a mounted block size of 128 bits, and a key size of 128, 192, or 256 bits. against this, the Rijndael specification as such is given with block and key sizes that may be any multiple of thirty-two bits, each with a minimum of 128 and a most of 256 bits. AES operates on a 4×4 column-major order matrix of bytes, termed the state, though some versions of Rijndael have a bigger block size and have additional columns within the state. Most AES calculations are drained a special finite field. The key size used for associate degree AES cipher number of repetitions specifies the of

transformation rounds that convert the input, known as the plaintext, into the ultimate output, known as the ciphertext. Every spherical consist of many process steps, every containing four similar however completely different stages, including one that depends on the encoding key itself. A set of reverse rounds square measure applied to transform cipher text back to the initial plaintext using an equivalent encoding key.



Fig.3.1. SCS-MM-New multiplier.

# IV. SIMULATION RESULTS

In this section, we tend to 1st analyze the important path delay and space of the planned SCS-MM-New number according to the knowledge listed in Table II. Then, the delay and area square measure compared thereupon of previous styles. Additionally, the average clock cycles of various Montgomery multipliers to complete onemillimetre operation are measured. Finally, several Montgomery multipliers square measure enforced and synthesized to demonstrate the potency of the planned approach. To more verify the potency of the planned style, we synthesized those Montgomery standard multipliers listed in Table III by Synopsys style Compiler with TSMC 90-nm CMOS cell library. Later, the Cadence Sock Encounter was utilized to perform the position and routing. Delay estimations were obtained behind RC extraction from the placed and routed net lists. The implementation results, including important path delay (Delay), the hardware space (Area), the execution time (Time), and also the outturn rate of those modular multipliers square measure given in Table III. The execution time is the needed time to accomplish one Montgomery millimeter, i.e.,  $\#Cycle \times Delay$ . The outturn rate is developed because the key size increased by the frequency (the reciprocal of Delay) and then divided by #Cycle. As the results are shown in Table III, the planned SCS-MM-New number has the shortest important path delay and needs fewer clock cycles to finish one Montgomery MM, and so spends the smallest amount execution time and achieves the highest outturn rate. Note that the important path delay of SCS-MM-1(64) is considerably long by the 64-bit CPA\_FC. On the opposite hand, the SCS-MM-2 number generally has smaller space than alternative styles. The planned SCS-MM-New number additionally desires additional space than the SCS-MM-2



because of additional multiplexers number introduced to shorten the important path delay and cut back the desired clock cycles. Withal, the realm of the planned SCS-MM-New multiplier remains but that of FCS-based numbers. As a consequence, SCS-MM-New will get the tiniest ATP than previous radix-2 Montgomery multipliers. When compared with the FCS-MMM42 number, the planned 1024-bit (2048-bit) SCS-MM-New number achieves 28.1% (22.4%) shorter important path and thirty-three.5% (34.4%) smaller hardware space, resulting in twenty-nine.8% (21.5%) outturn enhancement and forty eight.8% (46.0%) ATP improvement. The ends up in Table III square measure in keeping with the analyses in Section IV-A and show that the planned approach is indeed capable of considerably enhancing the performance of radix-2 CSA-based Montgomery number whereas maintaining low hardware complexness.



Fig.4.1. Output functions.

# V. CONCLUSION

To enhance the performance of Montgomery MM hardware maintaining the low whereas complexness, this paper has changed the SCSbased Montgomery multiplication rule a cheap and high performance Montgomery standard number. The multiplier used one-level CCSA design and skipped the extra carry-save addition operations to mostly scale back the important path delay and needed clock cycles for finishing the onemillimeter operation. FCSbased multipliers maintain the input and output operands of the Montgomery millimeter within the carry-save format to flee from the format conversion, leading to fewer clock cycles, however, and larger space than SCS-based multiplier. In Future, for cryptographers, a crypto logical "break" is something quicker than a brute force performing arts one trial decipherment for every key (see Cryptanalysis). This includes results that square measure infeasible with current technology. The biggest successful publically glorious brute force attack against any block-cipher encoding was against a 64- bit RC5 key.

## **VI. REFERENCES**

[1] R. L. Rivets, A. Shamir, and L. Adelman, "A technique for getting digital signatures and public-key cryptosystems," Commun. ACM, vol. 21, no. 2, pp. 120–126, Feb. 1978.

- [2] V. S. Miller, "Use of Elliptic curves in cryptography," in Advances in Cryptology. Berlin, Germany: Springer-Vela, 1986, pp. 417–426.
- [3] N. Koblitz, "Elliptic curve cryptosystems," Math. Comput., vol. 48, no. 177, pp. 203– 209, 1987.
- P. L. Montgomery, "Modular multiplication while not Trial Division," Math. Comput., vol. 44, no. 170, pp. 519–521, Apr. 1985.
- [5] Y. S. Kim, W. S. Kang, and J. R. Choi, "Asynchronous implementation of 1024-bit standard processor for RSA cryptosystem," in Proc. 2nd IEEE Asia-Pacific Conf. ASIC, Aug. 2000, pp. 187–190.
- [6] V. Bunimov, M. Shimmer, and B. Tong, "A complexity-effective version of Montgomery's algorithm," in Proc. Workshop advanced. Effective styles, May 2002.
- [7] H. Shingling, R. M. Al Should, and V. P. Smirching, "An economical architecture of 1024-bits crypto processor for RSA cryptosystem based mostly on changed Montgomery's formula," in Proc. Fourth IEEE Int. Workshop Intel. Knowledge Acquisition Adv. Compute. Syst., Sep. 2007, pp. 643–646.
- [8] Y.-Y. Zhang, Z. Li, L. Yang, and S.-W. Zhang, "An economical CSA architecture for Montgomery standard multiplication," Microprocessors Microsyst., vol. 31, no. 7, pp. 456–459, Nov. 2007.
- [9] C. McIvor, M. Malone, and J. V. McCanny, "Modified Montgomery modular multiplication and RSA mathematical process techniques," IEE Proc.- Comput. Digit. Techn., vol. 151, no. 6, pp. 402–408, Nov. 2004.
- [10] S.-R. Kuang, J.-P. Wang, K.-C. Chang and H.-W. Hsu, "Energy-efficient highthroughput Montgomery standard multipliers for RSA Cryptosystems," IEEE Trans. terribly Giant Scale Integr. (VLSI) Syst., vol. 21, no. 11, pp. 1999–2009, Nov. 2013.