In look-up table based multiplication schemes, techniques based on tables of squares require less memory than techniques based on direct implementations. In this paper, we present a method to realize an n-bit multiplier using a table of squares for n-bit integers. A new technique to store tables of squares is also presented. The new scheme is shown to compare favorably, in terms of storage requirements, with a scheme wherein the entire table of squares is stored directly. The addressing requirements of the new storage technique are also discussed.
Introduction
The look-up table approach for multiplication is well known [1]. Rather than a direct implementation, it is of advantage to use a look-up table based on a single operand. This reduces memory requirements significantly. Ling [2] proposed a single operand-based transform technique to realize a multiplier. Another alternative is a table of squares. In this technique, the squares of all the numbers in the input range are stored in a single table. Given two numbers to be multiplied, the table of squares and a set of simple arithmetic operations are used to obtain their product.
In this paper, we present a technique by which a table of squares for n-bit numbers can be used to realize an n-bit multiplier. We also present a technique by which a table of squares is stored in a compact form in several ROMs. We refer to such an implementation as a split ROM implementation. The square of a number is obtained by simply concatenating the outputs from the different tables stored in memory. We demonstrate that, when compared to the single-table approach, the cost of our technique, in terms of storage requirements, is significantly lower. In general, addressing requirements will also be simplified. The rest of this paper is organized as follows. In Section 2, a procedure to multiply two numbers based on the table of squares is presented. Split-ROM implementations for the table of squares are discussed in Section 3. The paper concludes in Section 4 with a summary of our results.
Squares-Based Multiplication
The technique of using squares to compute products is well known [3]. However, to realize an n-bit multiplier, the method in [3] requires a table of squares for (n + 1)-bit integers. An n-bit table of squares is sufficient to realize an n-bit multiplier. Let A and B be the two numbers whose product is required, and let x and y denote the quantities, derived from A and B, that are used to access the table of squares. Clearly, for arbitrary integers one or both of x and y may be negative. Given an integer z, the value of $z^2$ is independent of the sign of z. If negative numbers are represented in the sign-magnitude representation, then storing the squares of integers in the positive half of the input range will be sufficient to realize the multiplier. However, if negative numbers are represented in the two's complement notation, then the squares of the numbers in both the positive and negative halves of the input range will have to be stored. This doubles memory requirements. This redundancy may be avoided by using only the magnitudes of x and y to access the table of squares. Alternatively, using only the magnitudes of the input operands A and B and ensuring that $|A| \geq |B|$ will also suffice. The two's complement representation of the product, if it is negative, is computed after the magnitude of the product is obtained. We now consider the determination of the square of a given binary number.
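As an illustration of the idea, the following sketch multiplies two signed integers using only a table of squares for n-bit magnitudes. It rests on an assumed identity that is not taken from the text: for magnitudes p >= q >= 0, pq = x^2 - y^2 + r, with x = floor((p + q)/2), y = floor((p - q)/2), and r = q when p + q is odd and r = 0 otherwise. This is one possible way to keep both table addresses within n bits and may differ from the formulation intended in this section. The names `make_square_table` and `multiply_with_squares` are hypothetical.

```python
def make_square_table(n_bits):
    """Table of squares for all n-bit magnitudes 0 .. 2**n_bits - 1."""
    return [v * v for v in range(1 << n_bits)]


def multiply_with_squares(a, b, table):
    """Multiply signed integers a and b using only look-ups in `table`.

    Assumed identity (not taken from the text): for magnitudes p >= q >= 0,
        p*q = x*x - y*y + r,  x = (p + q) // 2,  y = (p - q) // 2,
    where r = q if p + q is odd and r = 0 otherwise.
    """
    negative = (a < 0) != (b < 0)
    # Work on magnitudes only, with the larger magnitude first.
    p, q = max(abs(a), abs(b)), min(abs(a), abs(b))
    x = (p + q) >> 1          # never exceeds p, so it fits in n bits
    y = (p - q) >> 1          # non-negative because p >= q
    magnitude = table[x] - table[y]
    if (p + q) & 1:           # correction term when the sum is odd
        magnitude += q
    # The sign is restored only after the magnitude has been obtained.
    return -magnitude if negative else magnitude


table = make_square_table(6)
print(multiply_with_squares(61, -13, table))  # prints -793
```

Only the magnitudes are used as addresses, so a single table covering the non-negative half of the input range is enough in this sketch.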
Reducing Memory Requirements
In this section, we present a technique to reduce the memory needed to store the table of squares.
Split-ROMs
Let $N_1$ and $N_2$ be two binary numbers such that the k least significant bits of $N_1$ and $N_2$ are identical. Then, the least significant (k + 1) bits in $N_1^2$ and $N_2^2$ are also identical. Consequently, the low-order bits of the squares need to be stored only once, and the table of squares can be split into two tables. The two tables, for such a bifurcation, are shown in Figure 1. For ease of understanding, the row address is given as a decimal number.
Given a 6-bit binary number N, its square is obtained as follows. Table A is used to obtain the 7 most significant bits of $N^2$; the value of N is used to address the appropriate row in Table A. Table B is used to obtain the 5 least significant bits of $N^2$; the 4 least significant bits of N are used to address the appropriate row in Table B.
As an example, consider the 6-bit binary number N = 61 = 111101. The four least significant bits are 1101, which form the decimal number $N_1$ = 13. The 7-bit number stored at address 61 in Table A is 1110100. The 5-bit number stored at address 13 in Table B is 01001. Concatenating the two outputs gives 111010001001, which is 3721 = $61^2$. [...] entries. The total number of bits used to implement the table would then be 544.
The table of squares may also be stored in more than two parts. The squared value is then obtained by concatenating the outputs from all the tables. Figure 2 shows one method of storing the squared values of all the numbers in the range 0 to 63 in three separate tables. The following procedure is used to determine the square of a 6-bit binary number using the three-component table shown in Figure 2. Given a 6-bit binary number N, let $N_1$ be the binary number formed by the 5 least significant bits of N. Similarly, let $N_2$ be the binary number formed by the 3 least significant bits of N. The 6 most significant bits of $N^2$ are obtained from Table A by accessing the entry at address N. The next 2 significant bits are obtained from Table B by accessing the entry at address $N_1$. Finally, the 4 least significant bits are obtained from Table C by accessing the entry at address $N_2$. The number of bits needed to store the table of squares for 6-bit numbers in three parts is 64 × 6 + 32 × 2 + 8 × 4 = 480 bits.
Thus, one can reduce the number of bits needed to save data considerably by storing the table of squares in a partitioned form. It may be noted that the separate tables may all be accessed simultaneously. Further, the desired output is obtained by simply concatenating the outputs of the separate tables. Therefore, the time needed to access separate tables will not be very different from the time needed to access a single table. Tables of squares are often used in residue-based number system applications [3]. In such an application there are two options. One may store the residues of the squares, for various moduli, directly. Alternatively, the residues can be obtained from a table of squares. The partitioning techniques described in this paper are applicable only when the table of squares is stored, or if the modulus is a power of two.
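The split described above works because, if $N_2 = N_1 + m \cdot 2^k$, then $N_2^2 - N_1^2 = m \cdot 2^{k+1} N_1 + m^2 \cdot 2^{2k}$, which is a multiple of $2^{k+1}$ for k >= 1. The sketch below builds the two-table and three-table arrangements for 6-bit inputs and recovers a square by concatenating the table outputs. The tuple encoding `(address_bits, low_bit, width)` and the function names are illustrative assumptions, not notation from the figures.

```python
# Each table is described by (address_bits, low_bit, width): it is addressed by
# the `address_bits` least significant bits of N and supplies bits
# low_bit .. low_bit + width - 1 of N*N.
TWO_TABLES = [(6, 5, 7), (4, 0, 5)]               # Tables A and B, two-way split
THREE_TABLES = [(6, 6, 6), (5, 4, 2), (3, 0, 4)]  # Tables A, B and C, three-way split


def build_tables(layout):
    """Fill every table with the relevant bit field of k*k for each address k."""
    tables = []
    for address_bits, low_bit, width in layout:
        mask = (1 << width) - 1
        tables.append([((k * k) >> low_bit) & mask
                       for k in range(1 << address_bits)])
    return tables


def square_from_tables(n, layout, tables):
    """Concatenate the outputs of all tables to obtain n*n."""
    result = 0
    for (address_bits, low_bit, _width), table in zip(layout, tables):
        address = n & ((1 << address_bits) - 1)   # low-order bits of n
        result |= table[address] << low_bit
    return result


def storage_bits(layout):
    """Total number of stored bits over all tables of a layout."""
    return sum((1 << address_bits) * width
               for address_bits, _low_bit, width in layout)


two = build_tables(TWO_TABLES)
print(format(two[0][61], "07b"), format(two[1][13], "05b"))  # prints 1110100 01001
print(square_from_tables(61, TWO_TABLES, two))               # prints 3721
print(storage_bits(THREE_TABLES))                            # prints 480
```

Since every table is addressed directly by low-order bits of N, all look-ups are independent and could be performed at the same time.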
Table of Squares in n Blocks
A partitioning technique similar to that used in Figure 1 may be applied repeatedly to the table of squares to partition it into n blocks. Consider the table of squares for n-bit numbers. After the initial bifurcation, the lower table, Table B1, is of order $2^{n-1} \times n$. In Table B1, the entries in row k are the n least significant bits of $k^2$. Two (n - 1)-bit numbers can have up to (n - 2) bits in common. Consequently, Table B1 can itself be partitioned into two tables, Table B and Table C1, where Table C1 is of order $2^{n-2} \times (n-1)$. In Table C1, the entries in row k are the (n - 1) least significant bits of $k^2$. Similarly, Table C1 can be further partitioned into two tables, Table C and Table D1.
Thus, after the initial bifurcation, a table of order $2^{n-k} \times (n-k+1)$ can be partitioned into two tables, one of order $2^{n-k} \times 1$ and the other of order $2^{n-k-1} \times (n-k)$. This procedure is executed for all k from k = 1 through k = (n - 2). The smallest table is of order 2 × 2. Figure 3 illustrates the division into the largest possible number of components, using our technique, for n = 4. The square of the number is obtained using a method similar to that used for the tables in Figure 2. That is, given a number N, the least significant bit is used to address the appropriate entry in Table D [...] In contrast, the number of bits required to implement the table of squares directly is $n \cdot 2^{n+1}$. Table 1 compares the storage requirements of the two schemes for different values of n. Even if such a complete partition may not be feasible, the numbers in Table 1 also serve as a bound on the number of bits needed to implement the table of squares.
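The repeated partitioning can be sketched as follows. The layout is a reconstruction under stated assumptions: one table of order $2^n \times n$ for the n most significant bits, one single-bit table of order $2^{n-k} \times 1$ for each k from 1 to n - 2, and a final 2 × 2 table. Under this reconstruction the stored bits sum to $(n+1) \cdot 2^n$. The printed values are computed by the sketch and are not quoted from Table 1, and the function names are hypothetical.

```python
def complete_partition(n):
    """Layout of the complete partition for n-bit inputs (n >= 2).

    Each entry is (address_bits, low_bit, width): the table is addressed by
    the `address_bits` low-order bits of N and holds bits
    low_bit .. low_bit + width - 1 of N*N.
    """
    layout = [(n, n, n)]                                    # n most significant bits
    layout += [(n - k, n - k, 1) for k in range(1, n - 1)]  # one bit per step k
    layout.append((1, 0, 2))                                # smallest table, 2 x 2
    return layout


def partitioned_bits(n):
    """Bits stored by the complete partition."""
    return sum((1 << a) * w for a, _low_bit, w in complete_partition(n))


def direct_bits(n):
    """Bits stored by a single table of 2**n entries, each 2n bits wide."""
    return n * (1 << (n + 1))


def square(value, n):
    """Reassemble value*value from the entries each table would hold."""
    result = 0
    for address_bits, low_bit, width in complete_partition(n):
        k = value & ((1 << address_bits) - 1)
        # Entry that the ROM would store at address k of this table.
        entry = ((k * k) >> low_bit) & ((1 << width) - 1)
        result |= entry << low_bit
    return result


for n in (4, 6, 8):
    print(n, partitioned_bits(n), direct_bits(n))
```

For n = 4 the layout has four tables, which matches the number of blocks stated for the complete partition.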
Address Decoding
When a table of squares is realized using more than one integrated circuit, partitioning the table of squares can lead to a direct reduction in the memory requirements without any additional costs. Assume, for example, that ROMs of size 1024 × 4 are being used to realize a table of squares for 12-bit numbers. The squares would be 24-bit numbers. Twenty-four integrated circuits would be required to realize the conventional table of squares. Recall that the least significant twelve bits are periodic with period $2^{11}$. Consider the case where the table of squares is partitioned into two components, wherein the least significant 12 bits are stored only once. Then, the number of integrated circuits needed to implement the table of squares is only eighteen. None of the integrated circuits in the partitioned table need be modified in any manner. When the table is partitioned into three components, only 16 integrated circuits are required.
[...] 7 = 448 bits. Further, only six 8 × 1 multiplexers, one 4 × 1 multiplexer and one 2 × 1 multiplexer are needed for the column decoder. In many decoding schemes the cost of the column decoder is proportional to the sizes of the multiplexers. In such implementations the split ROM technique will also lower addressing costs. However, in other implementations the cost of the column decoder may be increased by requiring that different types of multiplexers be used. Consequently, the exact reduction in area obtained with the split ROM technique will be [...]
[...] OR gates are required to obtain an appropriate decoder from the decoder inputs to column $c_{i-1}$. The main drawback of this approach is that it requires additional decoding logic. However, since a 2-input OR gate contains only 4 transistors, a reduction in transistor count is obtained in many cases. Even so, because of the additional addressing complexity, this technique may not lead to a net decrease in area.
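Returning to the 12-bit example earlier in this section, the integrated-circuit counts can be checked with a short calculation. The sketch assumes that a table of R rows and W bits per row needs ceil(R / 1024) × ceil(W / 4) ROMs of size 1024 × 4. The particular three-way split used here (upper 12 bits, bits 8 to 11, and bits 0 to 7) is an assumption chosen to reach the stated count; the text does not specify it.

```python
from math import ceil

ROM_WORDS, ROM_WIDTH = 1024, 4   # ROM size taken from the 12-bit example


def chips_needed(layout):
    """layout: list of (address_bits, width) pairs, one per component table."""
    return sum(ceil((1 << address_bits) / ROM_WORDS) * ceil(width / ROM_WIDTH)
               for address_bits, width in layout)


conventional = [(12, 24)]               # 4096 rows of 24 bits
two_components = [(12, 12), (11, 12)]   # low 12 bits stored once, period 2**11
# Assumed three-way split: upper 12 bits, bits 8..11 (period 2**11),
# and bits 0..7 (period 2**7).
three_components = [(12, 12), (11, 4), (7, 8)]

print(chips_needed(conventional),
      chips_needed(two_components),
      chips_needed(three_components))   # prints 24 18 16
```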
Two Dimensional Decoding
However, even when redundant information is stored, the split ROM technique can provide a reduction in the size of the memory and the complexity of the decoding logic. It must be noted that the real benefits which accrue will be dependent on many choices made during the implementation of the system. We conclude with a summary of our results.
Conclusion
The use of look-up tables for multiplication is well known. The amount of data to be stored is reduced by using procedures based on the table of squares. In this paper, a new method based on split ROMs to store the table of squares was presented. The number of bits required to store the table of squares is reduced considerably using the new approach. In comparison to the look-up table based multiplication scheme in [2], the algorithm presented in this paper is simpler and requires less memory. An address decoding scheme for the partitioned table was also discussed.
