k − 49 · 2 k + 4. We intend to show that the above estimates may be improved with the help of the result [3], which states that a sum of $n$ bits may be computed with $4.5n$ operations instead of $5n$ as in the naive approach. The resulting bounds are $M(n) \le 5.5n^2 - 6.5n - 1 + (n \bmod 2)$ and
where $\{\Phi_k\}$ is the Fibonacci sequence: $\Phi_1 = \Phi_2 = 1$, $\Phi_{k+2} = \Phi_{k+1} + \Phi_k$.

Auxiliary circuits. The circuits below are built from the following subcircuits: half-adders HA, $\mathrm{HA}^{\pm}$, (3, 2)-carry save adders (CSA) $\mathrm{FA}_3$, $\mathrm{SFA}_3$, $\mathrm{FA}_3^-$, and the circuits MDFA, $\mathrm{MDFA}^-$ (the MDFA circuit was proposed in [3]). Specifically, they implement the functions:
HA: $(x_1, x_2) \to (u; v)$, where $x_1 + x_2 = 2u + v$;
$\mathrm{HA}^{\pm}$: $(x_1, x_2) \to (u; v)$, where $x_1 - x_2 = -2u + v$;
$\mathrm{FA}_3$: $(x_1, x_2, x_3) \to (u; v)$, where $x_1 + x_2 + x_3 = 2u + v$;
…, where $x_1 + x_2 - x_3 = 2u + v$, if $x_1 + x_2 - x_3 \ge 0$; …, where …

* Translated from the Russian original published in: Proc. of XXII Conf. "Information means and technology" (Moscow, November 18-20, 2014). Vol. 3. Moscow, MPEI, 2014, 180-187.
† e-mail: isserg@gmail.com
These circuits are shown in Fig. 1. The gates AND, OR, XOR are denoted by the symbols ∧, ∨, ⊕, respectively. Inverted inputs are marked by small circles.
Figure 1: Auxiliary circuits
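The arithmetic identities that define the simplest auxiliary circuits can be checked exhaustively. The Python sketch below is only an illustration: the gate-level realizations are assumptions (the circuits of Fig. 1 are not reproduced here), $\mathrm{FA}_3$ is taken to be the standard 5-gate full adder, and the function names `ha`, `ha_pm`, `fa3` and `check_identities` are hypothetical.

```python
from itertools import product

def ha(x1, x2):
    """Half-adder: returns (u, v) with x1 + x2 = 2u + v (2 gates)."""
    return x1 & x2, x1 ^ x2

def ha_pm(x1, x2):
    """HA+-: returns (u, v) with x1 - x2 = -2u + v (2 gates).

    u acts as a borrow; it is an AND with the first input inverted.
    """
    return (1 - x1) & x2, x1 ^ x2

def fa3(x1, x2, x3):
    """Assumed standard full adder: x1 + x2 + x3 = 2u + v (5 gates)."""
    p = x1 ^ x2
    v = p ^ x3
    u = (p & x3) | (x1 & x2)
    return u, v

def check_identities():
    """Verify the defining identities on all Boolean inputs."""
    for x1, x2 in product((0, 1), repeat=2):
        u, v = ha(x1, x2)
        assert x1 + x2 == 2 * u + v
        u, v = ha_pm(x1, x2)
        assert x1 - x2 == -2 * u + v
    for x1, x2, x3 in product((0, 1), repeat=3):
        u, v = fa3(x1, x2, x3)
        assert x1 + x2 + x3 == 2 * u + v
    return True
```

Calling `check_identities()` runs through all input combinations and raises an `AssertionError` if any identity fails.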
Standard method. The first stage of the standard method of multiplying $n$-bit integers involves $n^2$ bit multiplications. The next stage, multiple addition, may be performed by summing the bits in consecutive columns (a column is the set of bits of the same order). The summation uses the auxiliary subcircuits described above. The result of a column summation is one bit of the product and a set of carries to the next order.
Let us index the columns from 1 in increasing order. Then, after the first stage, the $k$-th column contains $n - |n - k|$ bits, $k = 1, \ldots, 2n - 1$. Consider the following rule of column summation: if there are at least 5 summand bits, use MDFA; else, if there are at least 3 summand bits, use $\mathrm{FA}_3$; else, if there are 2 summand bits, use HA.
Denote by $h(k)$ the number of summand bits in the $k$-th column after the summation in all lower-order columns is complete. Clearly, $h(1) = 1$. Let us check by induction that $h(k) = 2k - 2$ for $2 \le k \le n$. Obviously, the statement holds for $k = 2$. Assume that it also holds for $k = t$, where $t < n$, and consider the summation in the $t$-th column. By the declared strategy, the summation of $2t - 2$ bits involves $\lfloor t/2 \rfloor - 1$ circuits MDFA, one circuit HA and, if $t$ is odd, one more circuit $\mathrm{FA}_3$. In total, it produces $t - 1$ carries to the next order. Since the $(t + 1)$-th column initially contains $t + 1$ bits, we get $h(t + 1) = (t + 1) + (t - 1) = 2(t + 1) - 2$, as required.
By analogy, we conclude that $h(n + 1) = 2n - 2$ and $h(2n - k) = 2k + 1$ for $0 \le k \le n - 2$. For the summation in the $(n + 1)$-th column one uses the same set of circuits as for the $n$-th column. For the summation in the $(2n - k)$-th column we use $\lfloor k/2 \rfloor$ circuits MDFA and, in the case of odd $k \ne 1$, an additional circuit $\mathrm{FA}_3$. For $k = 1$ we need a circuit $\mathrm{SFA}_3$ instead of $\mathrm{FA}_3$, since the summation in the previous column involves MDFA.
The use of MDFA requires a conversion to the special bit encoding $(x, y) \to (x, x \oplus y)$. All MDFA outputs encoded this way may be connected to MDFA inputs of the same encoding, with the exception of the last MDFA, in the $(2n - 2)$-th column. Therefore, to execute all summations we need $q + 1$ additional XOR gates, where $q$ is the number of MDFA circuits.
Therefore, if $n \ge 4$, the second stage of multiplication uses $n$ circuits HA, $n - 3 + 2(n \bmod 2)$ circuits $\mathrm{FA}_3$, one circuit $\mathrm{SFA}_3$, $q = (n^2 - 3n)/2 + 1 - (n \bmod 2)$ circuits MDFA and $q + 1$ XOR gates. (The number of MDFA circuits is easy to derive from the number of (3, 2)-CSA: an MDFA reduces the total number of summand bits by 2, while an $\mathrm{FA}_3$ or $\mathrm{SFA}_3$ reduces it by 1; the number of summand bits before the summation stage is $n^2$, and at the end it is $2n$.) Summing up the complexities of the subcircuits, we can bound the complexity $M(n)$ of the multiplication circuit as

$$M(n) \le 5.5n^2 - 6.5n - 1 + (n \bmod 2).$$
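The counting argument for the standard method can be replayed mechanically. The following Python sketch applies the greedy column rule and tallies the subcircuits. It rests on stated assumptions rather than on figures quoted in the text: the gate costs in `COST` (2 for HA, 5 for $\mathrm{FA}_3$, 4 for $\mathrm{SFA}_3$, 8 for MDFA) are assumed values, a 3-input adder is counted as $\mathrm{SFA}_3$ when its column has no MDFA of its own but follows a column that used one, and the $q + 1$ recoding XOR gates are charged only when $q > 0$. The names `simulate_standard_method` and `total_gates` are hypothetical.

```python
# Assumed gate costs per subcircuit (not quoted from the text above).
COST = {"HA": 2, "FA3": 5, "SFA3": 4, "MDFA": 8}

def simulate_standard_method(n):
    """Apply the greedy column rule to the n x n array of bit products.

    Returns (h, counts): h[k] is the number of summand bits in column k
    (1-based) before it is summed; counts maps subcircuit name to uses.
    """
    counts = {name: 0 for name in COST}
    h = {}
    carries = 0
    prev_used_mdfa = False
    for k in range(1, 2 * n + 1):
        bits = h[k] = n - abs(n - k) + carries
        carries = 0
        used_mdfa = False
        while bits >= 5:      # MDFA: 5 bits -> 1 bit here + 2 carries
            counts["MDFA"] += 1
            bits -= 4
            carries += 2
            used_mdfa = True
        if bits >= 3:         # 3-input adder: 3 bits -> 1 bit + 1 carry
            # Modelling assumption: SFA3 when this column has no MDFA
            # but the previous column used one.
            name = "SFA3" if prev_used_mdfa and not used_mdfa else "FA3"
            counts[name] += 1
            bits -= 2
            carries += 1
        if bits == 2:         # HA: 2 bits -> 1 bit + 1 carry
            counts["HA"] += 1
            carries += 1
        prev_used_mdfa = used_mdfa
    return h, counts

def total_gates(n):
    """n^2 AND gates, the summation subcircuits, and the recoding XORs."""
    _, counts = simulate_standard_method(n)
    q = counts["MDFA"]
    recoding_xors = q + 1 if q else 0
    return n * n + sum(COST[c] * counts[c] for c in counts) + recoding_xors
```

Under these assumptions the tallies can be compared with the closed-form counts and with the bound on $M(n)$ for a range of $n$, which gives a quick sanity check of the induction.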
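The estimate also holds for $2 \le n \le 3$.

Karatsuba method. Represent the two $m$-bit multiplication operands as $A_1 2^n + B_1$ and $A_2 2^n + B_2$, where $n = \lceil m/2 \rceil$, $0 \le B_i < 2^n$, $0 \le A_i < 2^{m-n}$. Then the product may be computed by the formula:

$$(A_1 2^n + B_1)(A_2 2^n + B_2) = A_1 A_2 2^{2n} + \big((A_1 + B_1)(A_2 + B_2) - A_1 A_2 - B_1 B_2\big) 2^n + B_1 B_2.$$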
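The splitting of the operands can be illustrated numerically. The Python sketch below performs one halving step with the standard three-product Karatsuba identity on ordinary integers. It models only the arithmetic, not the gate-level circuit, and the function name `karatsuba_step` is hypothetical.

```python
def karatsuba_step(x, y, m):
    """One halving step for m-bit operands x and y (0 <= x, y < 2**m)."""
    n = (m + 1) // 2                 # n = ceil(m / 2)
    mask = (1 << n) - 1
    a1, b1 = x >> n, x & mask        # x = a1 * 2**n + b1
    a2, b2 = y >> n, y & mask        # y = a2 * 2**n + b2
    high = a1 * a2                   # product of (m - n)-bit operands
    low = b1 * b2                    # product of n-bit operands
    mid = (a1 + b1) * (a2 + b2)      # product of (n + 1)-bit operands
    # Final addition-subtraction: mid - high - low is the coefficient
    # of 2**n and is always non-negative.
    return (high << (2 * n)) + ((mid - high - low) << n) + low
```

Only three multiplications of roughly half-size operands appear; the two additions `a1 + b1`, `a2 + b2` and the final combination correspond to the addition subcircuits of the construction.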
The implied circuit consists of two addition circuits computing $A_1 + B_1$ and $A_2 + B_2$, three multiplication subcircuits for $(n + 1)$-bit, $(m - n)$-bit and $n$-bit operands, and a subcircuit for the final addition-subtraction. The structure of this final addition is shown in the pattern below (see Fig. 2). The symbols "+" and "−" denote summand and subtrahend bits, respectively. Columns are indexed so that index $i$ corresponds to a bit of weight $2^i$. Pairs of bits in brackets are missing when $m$ is odd.

Denote by $h^+(k)$ and $h^-(k)$ the numbers of summand and subtrahend bits in the $k$-th column after the summation in all lower-order columns is complete. One can easily verify that $h^+(n) = h^-(n) = 2$, $h^+(n + 1) = h^-(n + 1) = 3$, and $h^+(k) = 3$, $h^-(k) = 4$ for $n + 2 \le k \le 2m - n - 1$. We use one $\mathrm{FA}_3^-$ and one HA in the $n$-th column, one $\mathrm{MDFA}^-$ and one $\mathrm{HA}^{\pm}$ in the $(n + 1)$-th column, and one $\mathrm{MDFA}^-$ and one $\mathrm{FA}_3^-$ in every subsequent column up to the $(2m - n - 1)$-th. When $m$ is odd, we have $h^+(k) = h^-(k) = 3$ for $k = 3n - 2, 3n - 1$. Therefore, one $\mathrm{MDFA}^-$ and one $\mathrm{HA}^{\pm}$ should be used in each of these columns.
Further, h
use HA elsewhere, but use XOR in the most significant column, since no carry is required there.
As in the standard method, the conversion to the special pair-of-bits encoding requires $q + 1$ additional XOR gates, where $q$ is the number of $\mathrm{MDFA}^-$ subcircuits.

We now estimate the complexity $K(m)$ of the multiplication circuit. It is well known that the complexity of the addition of two $n$-bit numbers is $5n - 3$, and that the addition of an $n$-bit number and an $(n - 1)$-bit number has complexity $5n - 6$ (see e.g. …).

Note that one can save some gates. Columns indexed by $n + i$ and $2n + i$, for $i = 0, \ldots, n - 1$, contain identical pairs of bits (from the summands $B_1 B_2$, $-A_1 A_2 2^n$ and $-B_1 B_2 2^n$, $A_1 A_2 2^{2n}$, respectively). Arrange the computation so that these bits are passed to the inputs of $\mathrm{FA}_3^-$ in the columns indexed by $n$ and $n + 2, \ldots, 2m - n - 1$, and to the inputs of $\mathrm{MDFA}^-$ encoded by $(x, x \oplus y)$ in the other columns (that is, in the $(n + 1)$-th column and, in the case of odd $m$, also in the $(3n - 2)$-th and $(3n - 1)$-th columns).
Thus, for $i = 0$ and $2 \le i \le n - 1 - 2(m \bmod 2)$ we can save two gates, XOR and ANDNOT, by using a CSA from Fig. 3a in the lower-order column and a CSA from Fig. 3b in the higher-order column (the CSAs are different since the signs of the bits in the low-order and high-order columns are rearranged). For $i = 1$ and $n - 2(m \bmod 2) \le i \le n - 1$, one XOR gate is saved.
So, the following recurrent formulae hold:
A halving iteration of the Karatsuba method provides an advantage when $m = 16$ or $m \ge 18$. Bounds on the complexity $L(m)$ of multiplication of $m$-bit numbers for $m \le 18$ are collected in Table 1.

For the convenience of comparison, let us derive the complexity of the Karatsuba multiplication circuit in an explicit form for $m = 2^k$. Denote … The relations (2) imply $X_{k+1} \le A X_k + b_k$ for $k \ge 4$. By standard calculations, we obtain (1) as the solution of the latter inequality (the initial values of the complexity should be taken from Table 1).
Research supported in part by RFBR, grant 14-01-00671a.
