A novel algorithm for computing the discrete logarithm modulo $2^k$ that is suitable for fast software or hardware implementation is described. The preferred implementation is based on a linear-time multiplier-less method and has a critical path of fewer than $k$ shift-and-add operations modulo $2^k$.
Introduction and summary: Hardware capabilities for integer arithmetic generally include addition, multiplication, and division with precision $k$ typically chosen as 16, 32 or 64. Multiplication and division are often implemented by recursive bit-serial algorithms employing $O(k)$ serial additions to avoid the size and power requirements of a large multiplier. The integer addition and multiplication operations realised are effectively 'exact' residue arithmetic operations with modulus $2^k$. Hardware support for applications where fast residue arithmetic computation is desirable is typically limited to residue addition and multiplication. There is a need to find efficiently implementable algorithms for other fundamental residue operations for the 'hardware friendly' modulus $2^k$. Furthermore, for implementations where hardware support does not include a large multiplier, there is a particular need for additive bit-serial algorithms for these additional residue operations.
The fundamental residue arithmetic operations supplementing residue addition and multiplication that are of particular interest for hardware implementation are: multiplicative inverse, powering (or exponentiation), and discrete logarithm. Following [1] we herein employ $|n|_{2^k} = j$ to denote the congruence relation $n \equiv j \pmod{2^k}$ with the residue $j$ satisfying $0 \le j \le 2^k - 1$. The discrete logarithm modulo $2^k$ with logarithmic base 3, $\mathrm{dlg}(j) = e$, of an odd residue $j$, $1 \le j \le 2^k - 1$, is the minimum exponent $e$, when it exists, such that $|3^e|_{2^k} = j$. Similarly, $e = \mathrm{dlg}_{(b,M)}(j)$ represents the discrete logarithm modulo $M$ with logarithmic base $b$ of $j$: $|b^e|_M = j$. From [2-4], $\mathrm{dlg}(j)$ exists whenever $|j|_8 \in \{1, 3\}$, and also $0 \le \mathrm{dlg}(j) \le 2^{k-2} - 1$. Furthermore, for any odd residue $j$ with $1 \le j \le 2^k - 1$, there is a unique sign, exponent pair $(s, e)$ with $s \in \{0, 1\}$, $0 \le e \le 2^{k-2} - 1$ such that

$$|(-1)^s\, 3^e|_{2^k} = j \tag{1}$$
In [3] we showed that the pair $(s, e)$ of (1) can be determined from the odd input $j$ employing $O(k)$ dependent modular multiplications. Our main result in this Letter is showing how to determine the pair $(s, e)$ by a bit-serial (shift-and-add) algorithm employing only $O(k)$ dependent additions and a lookup table of size roughly $k^2$ bits.
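For small $k$ the sign, exponent pair can be checked by exhaustive search over the powers of 3. The following is a minimal reference sketch, not the method of this Letter; the function name `sign_exponent_bruteforce` is hypothetical and is introduced only for illustration.

```python
def sign_exponent_bruteforce(j, k):
    """Return (s, e) with (-1)**s * 3**e congruent to j modulo 2**k.

    Exhaustive search over e = 0 .. 2**(k-2) - 1; useful only as a
    reference for small k.
    """
    if j % 2 == 0:
        raise ValueError("j must be odd")
    m = 1 << k
    j %= m
    power = 1  # holds 3**e mod 2**k
    for e in range(max(1, m // 4)):
        if power == j:
            return 0, e
        if (m - power) % m == j:
            return 1, e
        power = (power * 3) % m
    raise ValueError("no (s, e) pair found")


if __name__ == "__main__":
    for j in (1, 3, 65, 255):
        print(j, sign_exponent_bruteforce(j, 8))
```

The search costs $O(2^k)$ multiplications, which is why a method with $O(k)$ dependent operations is of interest.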
Discrete logarithm modulo $2^k$ - algebraic properties: Lemma 1 represents the core result for Algorithm 1. We omit a formal proof and instead point out the essential mathematical properties that lead to a constructive proof and its corresponding algorithm.
Lemma 1: For any $k \ge 2$, every odd integer $j$ with $1 \le j \le 2^k - 1$ has a unique modular factorisation
with factor selection specified by $s \in \{0, 1\}$ and
Notationally we use $a_j$ as shorthand for the $j$th digit of $A$, and $a_j^i$ for the $j$th digit of $A_i$. Also, we call a residue $t_i = |2^i + 1|_{2^k}$, $3 \le i < k$, a two-ones residue. The key advantage of multiplying by two-ones residues $t_i$ is that a multiplication by $t_i$ can be performed simply as a less expensive shift-and-add operation:

$$|P_i \times t_i|_{2^k} = |P_i + (P_i \ll i)|_{2^k}$$

where $P_i \ll i$ represents an $i$-bit left shift of $P_i$.
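The shift-and-add form of a multiplication by a two-ones residue can be sketched as follows. The helper name `mul_two_ones` is hypothetical; reduction modulo $2^k$ is done here by masking to the $k$ least significant bits.

```python
def mul_two_ones(p, i, k):
    """Multiply p by the two-ones residue 2**i + 1, modulo 2**k.

    Uses one left shift and one addition instead of a multiplier.
    """
    mask = (1 << k) - 1
    return (p + (p << i)) & mask


if __name__ == "__main__":
    k = 8
    p = 0b01000001  # 65
    print(format(mul_two_ones(p, 6, k), "08b"))
```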
Our method consists of three stages. The first stage is initialising $P_2 = A$. The second stage and main iteration step is updating
can be directly computed if $\mathrm{dlg}(B_i)$ are all known and $P_k \equiv |1|_{2^k}$. For more mathematical details the reader is referred to [3]. We choose $B_i = t_i$ and update $P_i$ such that its last $i$ digits become $00\ldots01$ (i.e. $|P_i|_{2^i} = 1$):
Observation 1: Whenever $|P_i|_{2^i} = 1$ and the binary digit $p_i^i$ of $P_i$ equals 1, multiplying $P_i$ by the two-ones residue $t_i = (2^i + 1)$ results in a product $P_{i+1} = P_i \times t_i$ that is congruent with 1 modulo $2^{i+1}$. That is:

$$|P_{i+1}|_{2^{i+1}} = |P_i \times (2^i + 1)|_{2^{i+1}} = 1$$

When the binary digit $p_i^i$ of $P_i$ equals 0, $P_{i+1}$ can be set to $P_{i+1} = (P_i \times 1)$ and it is still congruent with 1 modulo $2^{i+1}$.
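Observation 1 can be checked numerically with a single update step. This is a minimal sketch under the assumption that the partial product already satisfies $|P_i|_{2^i} = 1$; the function name `clear_bit_step` is hypothetical.

```python
def clear_bit_step(p, i, k):
    """One update step: given p congruent to 1 modulo 2**i,
    return a value congruent to 1 modulo 2**(i + 1).

    If bit i of p is 1, multiply by 2**i + 1 (one shift and one add);
    otherwise p is returned unchanged (multiplication by 1).
    """
    mask = (1 << k) - 1
    if (p >> i) & 1:
        return (p + (p << i)) & mask
    return p


if __name__ == "__main__":
    k = 8
    for i in range(3, k):
        # Check every k-bit residue whose i low-order bits are 0...01.
        for p in range(1, 1 << k, 1 << i):
            assert clear_bit_step(p, i, k) % (1 << (i + 1)) == 1
    print("Observation 1 holds for k = 8")
```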
We show in Table 1 the (valid) 8-bit dlgs associated with the corresponding two-ones residues (i.e. $t_i \equiv 3^{\mathrm{dlg}(t_i)}$). Also, in the last column we suggest how the updating of the partial products $P_{i+1}$ works when $p_i^i = 1$ and values $t_i$ are to be used. The values $\mathrm{dlg}(t_i)$ can be precomputed using any dlg method, e.g. the one we presented in [3]. Storing these values requires a lookup table of uncompressed size smaller than $k^2$ bits.

Table 1: 8-bit dlgs of two-ones residues (rows for $i = 6, 7$)

i    t_i          dlg(t_i)    Update of P_{i+1} when p_i^i = 1
6    0100 0001    01 0000     p_7^6 100 0001 × 100 0001 → p_7^7 000 0001
7    1000 0001    10 0000     1000 0001 × 1000 0001 → 0000 0001
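The table of $\mathrm{dlg}(t_i)$ values can be precomputed by any dlg method. The sketch below uses a plain exhaustive search as a stand-in, not the method of [3], and the name `dlg_table` is hypothetical. For $k = 8$ it reproduces the entries 01 0000 and 10 0000 for $i = 6$ and $i = 7$.

```python
def dlg_table(k):
    """Return {i: dlg(2**i + 1)} modulo 2**k for 3 <= i < k.

    Brute-force precomputation: walk through 3**e mod 2**k for
    e = 0 .. 2**(k-2) - 1 and record the exponents that hit a
    two-ones residue.
    """
    m = 1 << k
    wanted = {(1 << i) + 1: i for i in range(3, k)}
    table = {}
    power = 1  # holds 3**e mod 2**k
    for e in range(m // 4):
        if power in wanted:
            table[wanted[power]] = e
        power = (power * 3) % m
    return table


if __name__ == "__main__":
    k = 8
    for i, e in sorted(dlg_table(k).items()):
        # Each exponent fits in k - 2 bits.
        print(i, format((1 << i) + 1, "08b"), format(e, "06b"))
```

With $k - 3$ entries of $k - 2$ bits each, the uncompressed table stays below $k^2$ bits.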
Shift-and-add DLG algorithm:
Stimulus: A modulus $2^k$ with $k \ge 3$ and an odd valued residue $A = a_{k-1} a_{k-2} \ldots a_0$.
Response: $\mathrm{dlg}(A)$, expressed as an $(s, e)$ pair where $|(-1)^s\, 3^e|_{2^k} = A$.
The first (initialisation) stage is performed in lines L1-L3. If $A$ is not congruent with 1 or 3 modulo 8, then $|-A|_{2^k}$ is, and the algorithm determines the dlg of $|-A|_{2^k}$ (i.e. $P = |-P|_{2^k}$ in L2). The variable $e$ represents the exponent of 3 that gives $|P^{-1}|_{2^i} = |3^e|_{2^i}$. It is set to 0 in L1, corresponding to $|P|_{2^3} = 1$. In the case $|P|_{2^3} = 011_2$, $e$ is adjusted in line L3 to be 1, along with the corresponding update of $P$ (which is equivalent to $P = |3 \times P|_{2^k}$). In the second stage $P$ is iteratively updated (conceptually) as a series of multiplications $P_{i+1} = P_i \times t_i$, while $e$ is updated with the corresponding values $\mathrm{dlg}(t_i)$ looked up from a table. The updating of $e$ and $P$ in line L5 can be performed concurrently. The final result is computed in line L6 as the sign $s$ and the exponent $|-e|_{2^{k-2}}$. This is because $e$ really represents $e = \mathrm{dlg}(|(-1)^s A^{-1}|_{2^k})$.
As can be seen from Algorithm 1, its time complexity is essentially $k$ dependent shift-and-add operations modulo $2^k$. Figs. 1 and 2 are schematic diagrams of an implementation of the datapath portion of Algorithm 1. Fig. 1 implements lines L1-L3 and sets up the appropriate values in the P and B registers based on the sign of $A$ and the values of the least significant bits. Note that no error-checking circuitry is included and it is assumed that only odd values of $A$ are used. Fig. 2 implements the iterative portion of the algorithm described in lines L4-L5. This circuit consists of a counter, a small lookup table that may be in compressed form, and three add/accumulate units. The values of P and B are stored in shift registers that shift content to the left. These values are replaced by multiples of $2^k$ depending on the value of each $p_i$ bit.
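As a software counterpart to the datapath, the three stages described for Algorithm 1 can be sketched as below. This is a reconstruction from the prose description, not the original listing: the mapping of statements to lines L1-L6 is an assumption, the names `dlg_shift_add` and `dlg_table` are hypothetical, and the lookup table is filled by exhaustive search as a stand-in for any dlg method.

```python
def dlg_table(k):
    """Stand-in precomputation of {i: dlg(2**i + 1)} for 3 <= i < k."""
    m = 1 << k
    wanted = {(1 << i) + 1: i for i in range(3, k)}
    table, power = {}, 1
    for e in range(m // 4):
        if power in wanted:
            table[wanted[power]] = e
        power = (power * 3) % m
    return table


def dlg_shift_add(a, k, table=None):
    """Return (s, e) with (-1)**s * 3**e congruent to a modulo 2**k."""
    if k < 3 or a % 2 == 0:
        raise ValueError("need k >= 3 and odd a")
    if table is None:
        table = dlg_table(k)
    mask = (1 << k) - 1
    p, s, e = a & mask, 0, 0          # assumed L1: initialise
    if (p & 7) not in (1, 3):         # assumed L2: work on -A instead
        p, s = (-p) & mask, 1
    if (p & 7) == 3:                  # assumed L3: multiply by 3 = 2 + 1
        p, e = (p + (p << 1)) & mask, 1
    for i in range(3, k):             # assumed L4: k - 3 iterations
        if (p >> i) & 1:              # assumed L5: shift-and-add, table add
            p = (p + (p << i)) & mask
            e += table[i]
    # Here p == 1, so e is the dlg of the inverse; negate modulo 2**(k-2).
    return s, (-e) % (1 << (k - 2))   # assumed L6


if __name__ == "__main__":
    print(dlg_shift_add(65, 8))
```

For $k = 8$ and $A = 65$ the sketch returns $(0, 16)$, in agreement with $3^{16} \equiv 65 \pmod{256}$. The loop body uses only a shift, an addition modulo $2^k$, and a table addition, and the updates of `p` and `e` are independent of each other within an iteration.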
