Brief Abstract We review the issues involved in building a special-purpose chip for performing RSA encryption/decryption, and review a few of the current implementation efforts.
I. Review of the RSA Cryptosystem
The "RSA cryptosystem" [RSA78] was the first published solution to the problem of implementing a public-key cryptosystem [DH76], a concept invented by Diffie and Hellman. It remains today the preeminent proposal for practical use. In this paper we review some of the considerations involved in implementing the RSA cryptosystem with special-purpose VLSI chips.
We begin by reviewing the RSA cryptosystem itself. The reader who wishes a more detailed review of public-key cryptography might consult [De82], [DH76], [DH79], or [RSA78].
A user A of the RSA cryptosystem creates his keys as follows:
• He first chooses at random two large (e.g. 100 decimal digit) prime numbers p and q.
• He then multiplies them together to get his public modulus n = p · q.
• He then chooses at random a large integer d which has no divisors in common with either p - 1 or q - 1.
• He then computes e as the multiplicative inverse of d, modulo (p - 1) · (q - 1).
• He publishes as his public key the pair (e, n), and keeps as his secret key the pair (d, n). (He may also wish to keep as part of his secret key the primes p and q.)
Anyone else can then encrypt a message M for A using A's public key, resulting in ciphertext C, using the equation:
$C = M^e \pmod{n}$
Similarly, A can decrypt the ciphertext C using the equation:
$M = C^d \pmod{n}$
As an example, if we choose p = 47 and q = 59, we have n = 2773. If we then choose d = 157 we can compute e = 17 using the technique given in [RSA78]. The public key is then (e, n) = (17, 2773) and the secret key is (d, n) = (157, 2773). The message M = 31 can be encrypted using the public key to obtain the ciphertext $C = 31^{17} = 587 \pmod{2773}$; decrypting yields the original message back: $587^{157} = 31 \pmod{2773}$.
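The worked example above can be checked with a few lines of Python. This is a minimal sketch with toy-sized numbers, not a secure implementation; the function names `encrypt` and `decrypt` are illustrative choices, and the modular inverse uses Python's built-in `pow(d, -1, phi)` (Python 3.8 or later) rather than the technique given in [RSA78].

```python
from math import gcd

# Toy parameters taken from the worked example in the text.
p, q = 47, 59
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # (p - 1) * (q - 1)

d = 157                        # secret exponent chosen in the text
# d must share no divisor with p - 1 or q - 1.
assert gcd(d, p - 1) == 1 and gcd(d, q - 1) == 1

# e is the multiplicative inverse of d modulo (p - 1) * (q - 1).
e = pow(d, -1, phi)            # requires Python 3.8+


def encrypt(message, e, n):
    """Return message^e mod n."""
    return pow(message, e, n)


def decrypt(ciphertext, d, n):
    """Return ciphertext^d mod n."""
    return pow(ciphertext, d, n)


M = 31
C = encrypt(M, e, n)
print(n, e, C, decrypt(C, d, n))  # prints: 2773 17 587 31
```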
II. Security of the RSA Cryptosystem
The security of the RSA cryptosystem depends on the difficulty for the enemy of factoring the published modulus n. If the enemy can factor the number n, he can compute the secret key (d, n) and read all of A's private mail (or forge A's digital signatures).
The security of the RSA cryptosystem is not known to be equivalent to the problem of factoring; it may be possible to break the RSA cryptosystem without factoring n. However, the most efficient attacks found to date are all provably equivalent to factoring. One can prove that computing the secret key is equivalent to factoring, and some variations on the basic RSA scheme are provably equivalent to factoring for some attacks (see [Ra79, Wi80]).
One interesting result, due to Andy Yao, is that the RSA system is "uniformly secure" in the sense that there can be no large sets of "weak messages": if an enemy can decrypt a significant fraction of messages encrypted with the RSA cryptosystem, then he could effectively decrypt all messages. Putting it another way, if the RSA cryptosystem offers security for the encrypted messages, then it offers uniformly high security for all messages. This follows from the multiplicative nature of the RSA scheme.
Even stronger results along this line have been proven by a number of researchers (see [ACGS84] and its extensive list of references). The essence of these results is that if the RSA cryptosystem is secure, then the enemy will not even be able to get various kinds of partial information about the message from the ciphertext. (If he could, he would be able to get the whole message.)
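The "multiplicative nature" behind the uniform-security argument can be illustrated with the toy key from Section I. The sketch below is only an illustration of the idea, not the proof: it shows that the product of two ciphertexts is the ciphertext of the product, and that a ciphertext can be multiplied by a random factor before being handed to a decryption procedure. The names `oracle` and `decrypt_via_randomization` are hypothetical, and the oracle here is simply full decryption standing in for an enemy who can decrypt only some fraction of messages.

```python
import math
import random

# Toy key from the worked example in Section I (not secure).
n, e, d = 2773, 17, 157


def encrypt(message):
    return pow(message, e, n)


def decrypt(ciphertext):
    return pow(ciphertext, d, n)


def decrypt_via_randomization(ciphertext, oracle, rng):
    """Recover the message by asking the oracle about a randomized ciphertext.

    The oracle never sees the original ciphertext, only the encryption of
    (message * r) mod n for a random r that is invertible modulo n.
    """
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    masked = (ciphertext * pow(r, e, n)) % n      # encrypts (message * r) mod n
    masked_message = oracle(masked)
    return (masked_message * pow(r, -1, n)) % n   # divide r back out (Python 3.8+)


# Multiplicative property: E(a) * E(b) = E(a * b) (mod n).
assert (encrypt(5) * encrypt(7)) % n == encrypt(35)

print(decrypt_via_randomization(587, decrypt, random.Random(0)))  # prints: 31
```

If the oracle failed on some randomized ciphertexts, one would simply retry with a fresh r, which is the intuition for why a large set of "weak messages" cannot exist.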
II.A. How Hard is Factoring?
The best available algorithms for factoring large composite integers have a running time which is proportional to:
$e^{\sqrt{\ln(n) \cdot \ln(\ln(n))}}$
for factoring a k-bit number n. A very crude approximation to this, in the range we are interested in, is:
$5 \cdot 10^{9 + (k/50)}$.
In the range of interest, the difficulty of factoring seems to grow by roughly one order of magnitude with each extra 50 bits (15 decimal digits) of modulus. At the moment, using available supercomputers, numbers with 71 digits can be factored in a reasonable length of time. Numbers with up to 100 decimal digits are plausibly factorable in the future using the best available algorithms and special-purpose hardware.
If we take as a bench-mark data point that a 75-digit number can be factored in about one day with today's technology, and using the above formulas, we can derive the following table:
75 digits     $9 \cdot 10^{12}$ operations    1 day
100 digits    $2 \cdot 10^{15}$ operations    255 days
125 digits    $3 \cdot 10^{17}$ operations    103 years
150 digits    $3 \cdot 10^{19}$ operations    9,755 years
175 digits    $2 \cdot 10^{21}$ operations    70 thousand years
200 digits    $1 \cdot 10^{23}$ operations    36 million years
225 digits    $5 \cdot 10^{24}$ operations    1 billion years
250 digits    $2 \cdot 10^{26}$ operations    60 billion years
300 digits    $1 \cdot 10^{29}$ operations    $5 \cdot 10^{13}$ years
In our original paper [RSA78] we proposed that 200 decimal digits (around 664 bits) would be a reasonable modulus size; we still feel that this is a reasonable choice.
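The operation counts in the table can be approximated directly from the running-time expression $e^{\sqrt{\ln(n) \cdot \ln(\ln(n))}}$. The sketch below assumes a constant of proportionality of 1 and scales times from the one-day, 75-digit benchmark; the function names are illustrative, and the printed values are rough estimates that will differ from the table in rounding.

```python
import math

BENCHMARK_DIGITS = 75  # the text's benchmark: a 75-digit number takes about one day


def factoring_operations(digits):
    """Estimate exp(sqrt(ln n * ln ln n)) for a number n with the given decimal digits."""
    ln_n = digits * math.log(10)
    return math.exp(math.sqrt(ln_n * math.log(ln_n)))


def factoring_days(digits):
    """Scale the operation count so that the benchmark size costs exactly one day."""
    return factoring_operations(digits) / factoring_operations(BENCHMARK_DIGITS)


for digits in (75, 100, 200, 300):
    ops = factoring_operations(digits)
    days = factoring_days(digits)
    print(f"{digits} digits: about {ops:.0e} operations, "
          f"{days:.3g} days ({days / 365:.3g} years)")
```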
III. Implementation Basics and the Need for Special-Purpose VLSI

III.A. Implementation Basics
Multiplication of two k-bit integers takes time:
• $O(k^2)$ on a microcomputer using a standard algorithm,
• $O(k)$ with special-purpose serial/parallel multiplication hardware ($O(k)$ gates),
• $O(\log k)$ with special-purpose parallel-parallel multiplication hardware ($O(k^2)$ gates).
Using today's technology, the serial-parallel approach seems the best trade-off point.
Modular multiplication of two k-bit integers modulo a third k-bit integer takes time:
• $O(k^2)$ on a microcomputer using standard algorithms,
• $O(k)$ with special-purpose hardware ($O(k)$ gates),
• $O((\log k)^{1+\epsilon})$ with special-purpose hardware ($O(k^2)$ gates).
Again, with today's technology, the O(k)-time, O(k)-hardware approach seems best.
Modular exponentiation is an interesting computational problem in that it seems intrinsically "sequential": using extra hardware or extra parallelism doesn't seem to help beyond the amount it helps to speed up the underlying modular multiplications. To raise a k-bit number to a k-bit power modulo a k-bit modulus thus seems to require O(k) multiplications.
We note that so-called "strong" primes are not intrinsically more difficult to find than random primes. (See the paper by J. Gordon in this proceedings.)
The second step of generating an RSA key-set, finding e from d, is not harder than modular exponentiation, since we have the relation:
Another approach, using the extended Euclidean algorithm for finding greatest common divisors, can also be used (see [RSA78] for details). The algorithm chosen here doesn't matter much since the bulk of work for key-generation will be in finding the large prime numbers.
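The extended Euclidean approach to finding e from d can be sketched as follows. This is a generic textbook formulation, not necessarily the exact procedure described in [RSA78]; the function names are illustrative.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def modular_inverse(d, modulus):
    """Return e with (d * e) % modulus == 1, or raise if no inverse exists."""
    g, x, _ = extended_gcd(d, modulus)
    if g != 1:
        raise ValueError("d has no inverse modulo the given modulus")
    return x % modulus


# Toy example from Section I: p = 47, q = 59, d = 157.
print(modular_inverse(157, 46 * 58))  # prints: 17
```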
III.B. Implementation Ideas for Speed
The following ideas may help speed up an implementation, over and above the basic approach outlined above.
A fast clock rate may of course be very helpful.
Using a short encryption exponent (e.g. e = 3, as suggested by Knuth [Kn81, p. 386]) gives a 300-fold or so improvement in the speed of encryption and signature verifications (operations which use the public key), but does not help with decryption or signing (operations which use the secret key). This trick cannot be used on d as well, since the length of e plus the length of d should be approximately the length of n. Furthermore, if d is short it could be guessed, so a short d provides little security.
Using the Chinese Remainder Theorem (working modulo p and modulo q separately) can help speed up decryption and signing by a factor of 4 on a microcomputer and a factor of 2 to 4 using O(k) hardware.
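A minimal sketch of decryption with the Chinese Remainder Theorem, using the toy key from Section I, is given below. The recombination step shown is Garner's formula, which is one common choice and an assumption here rather than something the text specifies; the function name and the variables `dp`, `dq`, and `q_inv` are likewise illustrative.

```python
def crt_decrypt(ciphertext, d, p, q):
    """Decrypt by exponentiating modulo p and modulo q separately, then recombining."""
    dp = d % (p - 1)              # reduced exponent for the mod-p half
    dq = d % (q - 1)              # reduced exponent for the mod-q half
    q_inv = pow(q, -1, p)         # inverse of q modulo p (Python 3.8+)

    mp = pow(ciphertext % p, dp, p)   # message modulo p
    mq = pow(ciphertext % q, dq, q)   # message modulo q

    # Garner's recombination: the unique value below p*q matching both residues.
    h = (q_inv * (mp - mq)) % p
    return mq + h * q


print(crt_decrypt(587, 157, 47, 59))  # prints: 31
```

Each of the two exponentiations works with numbers and exponents of about half the length of n, which is where the saving comes from.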
There are two basically different exponentiation algorithms one may use: the left-to-right algorithm and the right-to-left algorithm. These algorithms examine the bits of the exponent in different orders. Suppose the exponent e has a binary representation of $e_{k-1} e_{k-2} \ldots e_1 e_0$. Then the algorithms for computing a ciphertext C from a message M both begin by setting C to 1, and then proceed as follows:
• The Left-to-Right Algorithm: for i from k - 1 down to 0, this algorithm first sets C to $C^2 \pmod{n}$ and then, if $e_i = 1$, sets C to $C \cdot M \pmod{n}$.
• The Right-to-Left Algorithm: for i from 0 up to k - 1, this algorithm first sets C to $C \cdot M \pmod{n}$ if $e_i = 1$, and then (in any case) sets M to $M^2 \pmod{n}$.
If the left-to-right algorithm is used, then the number of modular multiplications required in the worst case can be reduced from $2 \cdot k$ to $k + (k/j)$ by precomputing a table of $M^1, \ldots, M^{2^j - 1}$ (i.e. by modifying the left-to-right algorithm to consider the exponent e in radix $2^j$ instead of radix 2).
If the right-to-left algorithm is used, then by using twice as much hardware one can obtain a two-fold speed-up, since each squaring modular multiplication can be performed in parallel with the "accumulation" modular multiplication.
We note that the above two optimization techniques are incompatible, since they require different underlying exponentiation algorithms.
An elegant approach for speeding up the computation is to perform modular multiplication directly, rather than first performing an integer multiplication and then reducing the result modulo n as a separate step. This can yield a six-fold (approximately) speed-up, since the modular multiplication of two k-bit numbers can now be performed in approximately k clock cycles instead of approximately $6 \cdot k$. (See [Br82].)
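The two exponentiation algorithms of this section, and the table-based variant of the left-to-right algorithm, can be sketched in software as follows. This is a sequential sketch only: it does not model the parallel hardware or the direct modular multiplication discussed above, and the window-width parameter name `j` and all function names are assumptions made for illustration.

```python
def left_to_right(m, e, n):
    """Scan exponent bits from most to least significant: square, then multiply if the bit is 1."""
    c = 1 % n
    for i in range(e.bit_length() - 1, -1, -1):
        c = (c * c) % n
        if (e >> i) & 1:
            c = (c * m) % n
    return c


def right_to_left(m, e, n):
    """Scan exponent bits from least to most significant: multiply if the bit is 1, then square m."""
    c = 1 % n
    for i in range(e.bit_length()):
        if (e >> i) & 1:
            c = (c * m) % n    # "accumulation" multiplication
        m = (m * m) % n        # squaring, independent of the accumulation
    return c


def left_to_right_radix(m, e, n, j=4):
    """Left-to-right exponentiation in radix 2**j with a precomputed table of powers of m."""
    table = [1 % n]
    for _ in range(2 ** j - 1):
        table.append((table[-1] * m) % n)   # table[t] == m**t mod n

    digits = []                              # radix-2**j digits, least significant first
    while e > 0:
        digits.append(e & (2 ** j - 1))
        e >>= j

    c = 1 % n
    for digit in reversed(digits):
        for _ in range(j):                   # j squarings per digit
            c = (c * c) % n
        if digit:
            c = (c * table[digit]) % n       # at most one multiplication per digit
    return c


print(left_to_right(31, 17, 2773), right_to_left(31, 17, 2773),
      left_to_right_radix(31, 17, 2773))  # prints: 587 587 587
```

In `right_to_left` the squaring and the accumulation in one iteration do not depend on each other, which is what allows the two to run on separate hardware in parallel.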
IV. Overview of Existing/Planned Chips
In this section we review briefly six designs for RSA chips. These reviews are brief, and are only intended to give the reader a feel for the kinds of chips possible with today's technology. For more details the reader should consult the references. Also, there are other chips in the design stage for which no references exist; these chips are not listed here.
IV.A. The "first" RSA chip
This chip was designed by Rivest, Shamir, and Adleman, and is described in [Ri80].
It was a single-chip nMOS design; using 4-micron design rules, the chip occupied 42 mm^2. It contained a 512-bit ALU in bit-slice design with eight 512-bit registers for storage of intermediate results, carry-save adder logic, and up-down shifter logic. The 224-word microcode ROM contained control routines for encryption, decryption, finding large primes, gcd, etc. It used a 5 V supply, and drew approximately 1 watt of power. It contained approximately 40,000 transistors.
It communicated with a host microprocessor using an 8-bit I/O port. The encryption rate was designed to be slightly in excess of 1200 bits/second. Due to an as yet undiagnosed error in the memory cell design, this chip never worked reliably.
IV.B. The NEC/Miyaguchi Design
This chip design was described in [Mi82]; I do not know if it was ever fabricated. The design was for a cascadable chip set, with each chip having a 2-bit slice. (So 333 chips would be needed for a 200 decimal digit modulus.) Each chip would contain a 2 by 8 multiplier; multiplication would be done byte-wise (8 by n). An encryption rate of 50,000 bits/second was claimed possible for a 512-bit modulus using this design, or 29,000 bits/second using a 200 decimal-digit modulus.
IV.C. The First Sandia Design
This chip, described in [RSWB82], used a two-chip set to work with numbers up to 336 bits in length. The two chips are identical, and each could perform a modular multiplication of 336-bit numbers. Using the right-to-left exponentiation algorithm, one chip repeatedly squared the message while the other chip accumulated the product of the desired powers.
The chip was fabricated using 3-micron CMOS technology; the total area of the chip is 41 performs modular multiplications directly.
(For 512-bit moduli, four chips would be needed.)
The chip will be cascadable; the first chips made are likely to be a 128-bit slice of the set.
IV.E. The "RSA Security" Design
RSA Security, Inc., a new start-up in the data-encryption area, is designing an RSA chip for commercial use [RSA84]. Currently in the design stage, the chip should be available in sample quantities in mid-1985.
Using 3-micron CMOS design rules, the chip should be approximately 47 mm^2 in size. It will be able to handle numbers up to 200 decimal digits (664 bits) in length, and should be able to do one encryption in under 65 milliseconds (i.e. the data rate should be in excess of 9600 bits/second for a full-size modulus).
V. The Future ...
It is interesting to observe that seven years ago, when the RSA cryptosystem was invented, the task of implementing the RSA scheme in a reasonably secure manner was quite expensive. (For example, we built a $3000 TTL implementation that could only handle numbers slightly over 300 bits in length.) Today, a very secure implementation (664 bits) fits nicely on one chip. Seven years from now we may move from a 3-micron technology to a submicron (say 0.3 micron) technology, giving a 100-fold reduction in area. In this case the same RSA implementation will take only 1% of a typical chip. The steady progress of technology will clearly make cryptography so cost-effective that no information system that handles data that is at all sensitive or that needs to be authenticated can afford to do without it.
