Synthesis of Parallel Binary Machines by Dubrova, Elena
arXiv:1105.4514v1 [cs.CR] 23 May 2011
Synthesis of Parallel Binary Machines
Elena Dubrova,
Royal Institute of Technology, IMIT/KTH, 164 46 Kista, Sweden
Abstract—Binary machines are a generalization of Feedback
Shift Registers (FSRs) in which both feedback and feedforward
connections are allowed and no chain connection between the
register stages is required. In this paper, we present an algorithm
for synthesis of binary machines with the minimum number of
stages for a given degree of parallelization. Our experimental
results show that for sequences with high linear complexity
such as complementary, Legendre, or truly random, parallel
binary machines are an order of magnitude smaller than parallel
FSRs generating the same sequence. The presented approach can
potentially be of advantage for any application which requires
sequences with high spectrum efficiency or high security, such as
data transmission, wireless communications, and cryptography.
Index Terms—Feedback shift register, sequences, nonlinear
complexity
I. INTRODUCTION
In information theory, it is known that any binary sequence
with a finite period can be generated by a binary machine
shown in Figure 1 [1]. An n-stage binary machine consists of
an n-stage binary register, n updating Boolean functions, and
a clock. At each clock cycle, the current values of all stages
of the register are synchronously updated to the next values
computed by the updating functions. Binary machines can be
viewed as a more general version of Feedback Shift Registers
(FSRs).
Suppose we would like to construct a binary machine which
generates the following binary sequence:
A2 = (0,0,1,1,0,1,1,1,0,0,1,0,1,1,1,0,1,1,0,0).
Since the output of a binary machine equals the least
significant bit of its current state, any assignment of states
S2 = (s0,s1, . . . ,s19) such that si mod 2 = ai results in a binary
machine which generates A2. For example, we can use
S2 =(0,2,1,3,4,5,7,9,6,8,11,10,13,15,17,12,19,21,14,16)
where even and odd integers are assigned in an increasing
order. From S2 we can easily see how many stages a binary
machine should have to generate A2. The largest element of
S2 is 21. We need 5 bits to expand it in binary. Thus, a binary
machine generating A2 should have at least 5 stages.
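The state assignment above can be checked mechanically. The following is a minimal sketch in Python (the variable names are illustrative and not part of the paper): it confirms that every state carries the required output bit in its least significant bit, that all states are distinct, and that the largest state needs 5 bits.

```python
A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
S2 = [0, 2, 1, 3, 4, 5, 7, 9, 6, 8, 11, 10, 13, 15, 17, 12, 19, 21, 14, 16]

# Every state must expose the required output bit as its least significant bit.
lsb_ok = all(s % 2 == a for s, a in zip(S2, A2))

# States on the cycle must be pairwise distinct, otherwise the next state
# would not be a function of the current state.
distinct = len(set(S2)) == len(S2)

# Number of binary stages = number of bits of the largest state.
stages = max(S2).bit_length()

print(lsb_ok, distinct, stages)  # True True 5
```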
As in the case of traditional Finite State Machines (FSM)
synthesis [2], for different state assignments we usually get
different next state functions. The circuit complexity of these
functions may vary substantially for different state assign-
ments. We can also use the one-hot encoding instead of the
binary one. Then, the number of stages will increase, but the
complexity of functions might decrease in some cases.
Next we describe an intuitive idea behind the algorithm for
synthesis of parallel binary machines presented in this paper.
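[Figure 1: an n-stage register (stages n−1, n−2, ..., 0) driven by a clock; the updating functions fn−1, fn−2, ..., f0 compute the next values of the stages, and the output is taken from stage 0.]
Fig. 1. A binary n-stage machine with the degree of parallelization one.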
Suppose that we use the encoding (00) = 0, (01) = 1, (10) = 2,
(11) = 3 to encode the binary sequence A2 from the example
above into the following quaternary sequence:
A4 = (0,3,1,3,0,2,3,2,3,0).
We can construct a quaternary machine generating A4 (in
which the stages of the register can store 4 different values and
the updating functions are 4-valued) by choosing a sequence of
states S4 =(s0,s1, . . . ,s9) such that si mod 4= ai. For example,
we can assign the states as follows:
S4 = (0,3,1,7,4,2,11,6,15,8).
Note that the largest element of S4 is 15. We need 2 quaternary
digits to represent it. Thus, we can generate A4 using a
quaternary machine with 2 stages (see Figure 4(a)). Such a
quaternary machine can, in turn, be converted into a binary
machine by encoding each 4-valued function by a pair of
Boolean functions and by replacing each quaternary stage
by two binary stages (see Figure 4(b)). The resulting 4-stage
binary machine generates the same binary sequence A2 as in
the example above, but two bits per clock cycle. Note that in
the example above we needed 5 stages to generate A2 one bit
per clock cycle. So, we constructed a parallel binary machine
which has fewer stages than the theoretical lower bound on
the number of stages in a binary machine generating the same
sequence sequentially, bit by bit.
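The quaternary example can be reproduced with a few lines of Python. This is only a sketch: the helper names `pack_bits` and `num_digits` are hypothetical, and the first bit of each pair is taken as the most significant one, matching the encoding (01) = 1, (10) = 2.

```python
def pack_bits(bits, p):
    """Group consecutive p-bit blocks (first bit most significant) into 2**p-ary digits."""
    assert len(bits) % p == 0
    return [int("".join(map(str, bits[i:i + p])), 2) for i in range(0, len(bits), p)]

def num_digits(value, m):
    """Number of base-m digits needed to write a non-negative integer."""
    digits = 1
    while value >= m:
        value //= m
        digits += 1
    return digits

A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
S4 = [0, 3, 1, 7, 4, 2, 11, 6, 15, 8]

A4 = pack_bits(A2, 2)
states_match = all(s % 4 == a for s, a in zip(S4, A4))
quaternary_stages = num_digits(max(S4), 4)
binary_stages = 2 * quaternary_stages  # each quaternary stage becomes two binary stages

print(A4)                                # [0, 3, 1, 3, 0, 2, 3, 2, 3, 0]
print(quaternary_stages, binary_stages)  # 2 4
```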
Later in the paper, we show that the number of stages
can be reduced even further by using the 8-ary encoding.
What is even more important, we reduce not only the number
of stages, but also the circuit complexity of the updating
functions. Our experimental results show that for sequences
with high linear complexity such as complementary, Legendre,
or truly random, parallel binary machines are an order of
magnitude smaller than parallel FSRs generating the same
sequence. Therefore, the presented approach can potentially
be useful for any application which requires sequences with
high spectrum efficiency or high security. Such applications
include data transmission, wireless communications, cryp-
tography, and many others [3], [4], [5], [6]. A particularly
attractive application is encryption and authentication systems
for smartcards and Radio Frequency IDentification (RFID)
tags. A low-cost RFID tag can spare only a few hundred
gates for security functionality [7]. None of the available
cryptographic systems satisfies this requirement at present [8].
The rest of the paper is organised as follows. Section II
describes basic notation and definitions used in the sequel.
Section III summarizes previous work. In
Section IV, we present an algorithm for constructing an m-ary
machine with the minimum number of stages generating
a given m-ary sequence. In Section V, we show how m-ary
machines can be encoded to generate binary sequences
in parallel and demonstrate that such an encoding can be
of advantage. Section VI presents the experimental results.
Section VII concludes the paper.
II. PRELIMINARIES
Let M = {0,1, . . . ,m− 1}. An m-ary sequence is a vector
Am = (a0,a1, . . .) where ai ∈ M for all i ≥ 0.
If there exist k > 0 and k0 ≥ 0 such that ai = ai+k for all
i ≥ k0, then A is called eventually (or ultimately) periodic. If
k0 = 0, then A is called purely periodic, or simply periodic.
The least integers k0 and k with this property are called pre-
period and period of the sequence, respectively [9].
For a multiple-valued function f : M^n → M, the i-set of f
is defined by [10]
i-set(f) = {x ∈ M^n : f(x) = i}.
In the binary case, 0-set and 1-set correspond to off-set and
on-set of f , respectively [11].
An m-ary n-stage machine consists of n m-ary storage
elements, called stages. Each stage i ∈ {0,1, . . . ,n−1} has an
associated state variable xi ∈ M which represents the current
value of the stage i and an updating function fi : M^n → M
which determines how the value of xi is updated.
A state of an n-stage machine is a vector of values of
its state variables. At every clock cycle, the next state of a
machine is determined from its current state by updating
the values of all stages simultaneously to the values of the
corresponding fi’s.
The degree of parallelization of an n-stage machine is the
number of stages p, 1 ≤ p ≤ n, which are used to produce its
output at each clock cycle.
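The definitions above can be illustrated with a small simulator. This is a sketch under stated assumptions: a state is stored as a tuple with `state[i]` holding x_i, the output is assumed to be read from stages p−1, ..., 0 (most significant first), and the 2-stage example machine is a hypothetical toy, not one from the paper.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]  # state[i] holds the value x_i of stage i

def step(state: State, updates: Sequence[Callable[[State], int]]) -> State:
    """Synchronous update: every f_i is evaluated on the same current state."""
    return tuple(f(state) for f in updates)

def run(state: State, updates: Sequence[Callable[[State], int]],
        cycles: int, p: int = 1) -> List[int]:
    """Collect the values of stages p-1, ..., 0 at each clock cycle."""
    out: List[int] = []
    for _ in range(cycles):
        out.extend(state[i] for i in reversed(range(p)))
        state = step(state, updates)
    return out

# Hypothetical binary 2-stage machine behaving as a 2-bit counter:
# f_0 = NOT x_0, f_1 = x_1 XOR x_0.
updates = [lambda x: 1 - x[0], lambda x: x[1] ^ x[0]]

serial = run((0, 0), updates, 4)         # one bit per clock cycle
parallel = run((0, 0), updates, 4, p=2)  # two bits per clock cycle

print(serial)    # [0, 1, 0, 1]
print(parallel)  # [0, 0, 0, 1, 1, 0, 1, 1]
```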
III. PREVIOUS WORK
For the case of Linear FSRs (LFSRs), there are two main
approaches to constructing an LFSR with the degree of par-
allelization p: (1) synthesis of subsequences representing p
decimation of some phase shift of the original LFSR sequence
and (2) computation of the set of states reachable from any
state in p steps.
Let S be a sequence produced by an LFSR whose characteristic
polynomial g(x) of degree n is irreducible over GF(2).
Let α be a root of g(x) and let T be the period of S. In the
method based on synthesis of subsequences [12], the sequence
S is decomposed into p subsequences S_p^j, 0 ≤ j < p, each representing a
p-decimation of the jth phase shift of S. In other words, the ith
element of S_p^j is equal to the (i·p + j)th element of S. By Zierler’s
theorem [13], for 0 ≤ j < p, the subsequences S_p^j can be
generated by an LFSR with the following properties:
• The minimum polynomial of α^d in GF(2^n) is the characteristic
polynomial q*(x) of the new LFSR which has:
– Period T* = T/gcd(d,T),
– Degree n*, which is the multiplicative order of 2 in
Z_{T*}.
The Berlekamp-Massey algorithm [14] or its generalizations [15]
can be used to find the smallest LFSR for each
subsequence S_p^j. The size of each LFSR is n*, which is at most
n, i.e. the overall number of bits in p LFSRs is at most p×n.
This method is applicable to any degree of parallelization p
which is not a multiple of the period T .
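The decomposition into decimated subsequences described above can be sketched as follows. The function name `decimate` and the 7-element input are illustrative assumptions (the input is one period of the sequence satisfying s[t+3] = s[t+1] XOR s[t]); the sketch only shows the indexing, not the LFSR synthesis itself.

```python
from math import gcd

def decimate(period, p, j):
    """Return one period of the p-decimation of the j-th phase shift.

    Element i of the result is period[(i*p + j) mod T], where T = len(period).
    """
    T = len(period)
    length = T // gcd(p, T)  # the index pattern repeats after T/gcd(p, T) elements
    return [period[(i * p + j) % T] for i in range(length)]

# Illustrative input (an assumption, not taken from the paper).
S = [1, 0, 0, 1, 0, 1, 1]
p = 2
subsequences = [decimate(S, p, j) for j in range(p)]

# Emitting one element of every subsequence per clock cycle yields
# p consecutive elements of S per cycle.
parallel_output = [sub[i] for i in range(len(subsequences[0])) for sub in subsequences]

print(subsequences[1])  # [0, 1, 1, 1, 0, 0, 1]
```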
The second approach is based on computing the set of
states reachable from any state in p steps. This is usually
done by computing pth power of the connection matrix of
the LFSR [16], [17]. Such an approach is applicable to the
degrees of parallelization 1 < p ≤ n. The size of the register
with the degree of parallelization p in this case is the same as
the size of the original LFSR, n.
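The second approach can be illustrated by raising the connection matrix to the pth power over GF(2). The sketch below assumes a hypothetical 3-stage Fibonacci LFSR with recurrence s[t+3] = s[t+1] XOR s[t] and state vector (s[t], s[t+1], s[t+2]); the helper names are not from the cited works.

```python
def mat_mul_gf2(A, B):
    """Multiply two square 0/1 matrices over GF(2)."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]

def mat_pow_gf2(M, p):
    """Compute M**p over GF(2) by repeated squaring."""
    n = len(M)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while p:
        if p & 1:
            result = mat_mul_gf2(result, M)
        M = mat_mul_gf2(M, M)
        p >>= 1
    return result

def apply_gf2(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in M]

# Connection matrix of the assumed LFSR: next state = M * state (mod 2).
M = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]

p = 3
M_p = mat_pow_gf2(M, p)  # one application advances the register by p steps

state = [1, 0, 0]
stepwise = state
for _ in range(p):
    stepwise = apply_gf2(M, stepwise)
print(apply_gf2(M_p, state) == stepwise)  # True
```

Each row of `M_p` describes one XOR network, so the register keeps its n stages and only the updating logic changes.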
For the case of Non-Linear FSRs (NLFSRs), algorithms for
finding a shortest NLFSR generating a given binary sequence
have been presented in [18], [19], [20], and [9]. An NLFSR
with the degree of parallelization p can be constructed by
computing the set of states reachable from any state in p
steps, as in the approach (2) for LFSR. This can be done by
computing pth power of the transition relation of the NLFSR.
However, the size of pth power of the transition relation of
an NLFSR usually grows much faster than in the LFSR case.
Therefore, in practice, in applications which use NLFSRs with
the degree of parallelization p, NLFSRs are selected so that
variables of the p left-most stages of the NLFSR are not used
in the updating functions. In such a case, an NLFSR with the
degree of parallelization p can be constructed by duplicating
the updating functions p times [21], [22], [23].
For binary machines with the degree of parallelization
one, an algorithm for constructing a shortest binary machine
generating a given binary sequence has been presented in [24].
IV. SYNTHESIS ALGORITHM
The algorithm presented in this section exploits the property
of m-ary n-stage machines that any m-ary n-tuple can be the
next state of a given current state. Note that, in the traditional
n-stage NLFSRs in the Fibonacci configuration [1], the next
state overlaps with a current state in n−1 positions. NLFSRs
in the Galois configuration are more flexible. However,
since they do not allow feedforward connections, their set of
possible next states is still restricted to a certain subset of all
possible states [25].
Algorithm 1 Construct an m-ary machine which generates
an m-ary sequence A = (a_0, a_1, ..., a_{k−1}) with the degree of
parallelization one.
1: for every i from 0 to m−1 do
2:   N_i := 0; /* counts the number of digits with value i ∈ M */
3: end for
4: for every j from 0 to k−1 do
5:   N_{a_j} := N_{a_j} + 1;
6: end for
7: N_max := max_{i∈M} N_i;
8: for every i from 0 to m−1 do
9:   B_i := ∅;
10:   for every j from 0 to N_max−1 do
11:     B_i := B_i ∪ {j·m + i};
12:   end for
13: end for
14: for every i from 0 to m−1 do
15:   B_i := [b_{i,0}, b_{i,1}, ..., b_{i,N_max−1}] is an arbitrary permutation of B_i;
16:   r_i := 0; /* records how many elements of B_i were used */
17: end for
18: for every j from 0 to k−1 do
19:   s_j := b_{a_j, r_{a_j}}; /* the r_{a_j}th element of B_{a_j} */
20:   r_{a_j} := r_{a_j} + 1;
21: end for
22: n := ⌈log_m N_max⌉ + 1;
23: for every j from 0 to k−1 do
24:   Expand s_j as an m-ary vector s_j := (s_{j,n−1}, s_{j,n−2}, ..., s_{j,0}) ∈ M^n;
25: end for
/* The resulting sequence S = (s_0, s_1, ..., s_{k−1}) is interpreted as a sequence of states of an m-ary n-stage machine */
26: for every p from 0 to n−1 do
27:   for every i from 0 to m−1 do
28:     i-set(f_p) := ∅;
29:   end for
30: end for
31: for every j from 0 to k−1 do
32:   for every p from 0 to n−1 do
33:     i := s_{j+1,p}; /* digit p of the next state; j+1 is taken mod k */
34:     i-set(f_p) := i-set(f_p) ∪ {(s_{j,n−1}, s_{j,n−2}, ..., s_{j,0})};
35:   end for
36: end for
37: Return (f_0, f_1, ..., f_{n−1});
The input of the algorithm is an m-ary sequence A of
length k. First, we show how to construct a sequence of
integers S = (s_0, s_1, ..., s_{k−1}) such that s_j mod m = a_j for all
j ∈ {0,1,...,k−1}. We count the number of occurrences N_i of
each digit i ∈ M in A and determine
the largest number of occurrences, N_max = max_{i∈M} N_i.
For each i ∈ M, let B_i be the set consisting of the N_max non-negative
integers of the form j·m + i, j ∈ {0,1,...,N_max−1}. Let
B_i = [b_{i,0}, b_{i,1}, ..., b_{i,N_max−1}] be an arbitrary permutation of B_i.
Initially, for all i ∈ M, we set to zero a counter r_i which
counts how many elements of B_i have been used. Then, for every
j from 0 to k−1, we take the jth element of the sequence
A, a_j, assign s_j to the r_{a_j}th element of B_{a_j}, and increment r_{a_j}. It is easy to see
from our construction that s_j mod m is equal to a_j.
Let S = (s_0, s_1, ..., s_{k−1}) be a sequence constructed as described
above. Each integer s_i ∈ S can be represented as an m-ary
expansion (s_{i,n−1}, s_{i,n−2}, ..., s_{i,0}) ∈ M^n where n is the number
of m-ary digits needed to represent the largest integer of S and
s_{i,0} is the least significant digit of the expansion. We interpret
each n-tuple (s_{i,n−1}, s_{i,n−2}, ..., s_{i,0}) as a state of an m-ary n-stage
machine. By construction, s_{i,0} = a_i for all i ∈ {0,1,...,k−1}.
Next, we define a mapping s_i ↦ s_{i+1}, for all i ∈ {0,1,...,k−1},
where "+" is mod k. This mapping assigns s_{i+1} to be the
next state of a current state s_i of an m-ary n-stage machine.
Each of the m^n − k remaining states of the m-ary n-stage machine
is left unspecified. This gives us the freedom to specify the
updating functions in a way which minimizes their circuit
complexity.
The i-sets of the updating functions implementing the resulting
mapping are derived as follows. Initially i-set(f_j) = ∅,
for all j ∈ {0,1,...,n−1} and all i ∈ M. For every j from 0
to k−1, and every p from 0 to n−1, we add
(s_{j,n−1}, s_{j,n−2}, ..., s_{j,0}) to the i-set of
f_p where i = s_{j+1,p} and "+" is mod k.
The algorithm described above is summarized as Algo-
rithm 1. Its worst-case time complexity is O(n · k) (assuming
k > m which is normally the case).
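The construction described above can be sketched in Python as follows. This is an illustration rather than the author's implementation: the names `ceil_log` and `synthesize` are hypothetical, and the "arbitrary permutation" of each B_i is assumed to be the increasing order, which reproduces the state sequences used in the worked examples.

```python
from collections import Counter

def ceil_log(m, x):
    """Smallest e >= 0 with m**e >= x (integer arithmetic, no floating point)."""
    e, power = 0, 1
    while power < x:
        power *= m
        e += 1
    return e

def synthesize(A, m):
    """Return (n, S, isets) for one period A of an m-ary sequence.

    S is the list of integer states; isets[p][i] is the i-set of f_p,
    stored as a set of n-tuples (s_{n-1}, ..., s_0).
    """
    k = len(A)
    n_max = max(Counter(A).values())   # largest number of occurrences of a digit
    n = ceil_log(m, n_max) + 1         # number of m-ary stages
    used = [0] * m                     # counters r_i
    S = []
    for a in A:                        # B_a is consumed in increasing order
        S.append(used[a] * m + a)
        used[a] += 1

    def expand(s):
        return tuple((s // m ** d) % m for d in reversed(range(n)))

    states = [expand(s) for s in S]
    isets = [[set() for _ in range(m)] for _ in range(n)]
    for j in range(k):
        nxt = states[(j + 1) % k]      # the successor index wraps around mod k
        for p in range(n):
            isets[p][nxt[n - 1 - p]].add(states[j])
    return n, S, isets

A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
n2, S2, _ = synthesize(A2, 2)
print(n2)  # 5
print(S2)  # [0, 2, 1, 3, 4, 5, 7, 9, 6, 8, 11, 10, 13, 15, 17, 12, 19, 21, 14, 16]
```

The sketch does not minimize the updating functions; the unspecified states are simply absent from every i-set.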
Theorem 1: Algorithm 1 constructs an m-ary n-stage
machine generating an m-ary sequence A of length k with the
degree of parallelization one, where n is given by
n = ⌈log_m N_max⌉ + 1, (1)
where N_max = max_{i∈M} N_i.
Proof: At step 7 of Algorithm 1, for each i ∈ M, N_i
equals the number of digits with the value i in the sequence
A. From step 6 of Algorithm 1 we can conclude that,
for each i ∈ M, the largest integer s_i ∈ S such that s_i mod
m = i is equal to m(N_i − 1) + i. We need ⌈log_m N_i⌉ + 1 m-ary
digits to express this integer for any N_i > 0. Since k > 1, the
number of stages in the m-ary n-stage machine is given by
⌈log_m N_max⌉ + 1 where N_max = max_{i∈M} N_i.
✷
The lemma below shows the conditions under which the
bound given by (1) is an exact lower bound.
Lemma 1: Given a purely periodic m-ary sequence Am with
the period k, any m-ary machine which generates Am with the
degree of parallelization one has at least n stages, where n
is given by (1).
Proof: The existence of an m-ary machine with n =
⌈logmNmax⌉+ 1 stages which can generate Am follows from
the Theorem 1. It remains to prove that no m-ary n′-stage
machine with n′ < n can generate Am.
Assume that such a machine exists. Then, if Am is purely
periodic and has the period k, to be able to generate one digit
of Am per clock cycle with the period k, the m-ary n′-stage
machine must have at least Ni distinct states whose 0th stage
has the value i. We need at least ⌈logm Ni⌉ + 1 m-ary stages to
implement the largest of these states for any Ni > 0. So, we
can conclude that n′ ≥ ⌈logm Nmax⌉ + 1, which contradicts the
assumption that n′ < n.
✷
As an example, consider the 4-ary sequence from the
Introduction section:
A4 = (0,3,1,3,0,2,3,2,3,0).
We have Nmax = 4. So:
B0 = {0,4,8,12},
B1 = {1,5,9,13},
B2 = {2,6,10,14},
B3 = {3,7,11,15}.
Suppose we use the following permutations of the sets Bi:
B0 = [0,4,8,12],
B1 = [1,5,9,13],
B2 = [2,6,10,14],
B3 = [3,7,11,15].
Then we get:
S4 = (0,3,1,7,4,2,11,6,15,8).
Since Nmax = 4, from Theorem 1 we can conclude that
the quaternary machine which generates A4 has 2 stages. By
applying the mapping described in Algorithm 1 to S4, we
get the following i-sets for the updating functions f0 and f1:
0-set( f1) = {(00),(03),(10),(20)}
1-set( f1) = {(01),(13),(23)}
2-set( f1) = {(02),(33)}
3-set( f1) = {(12)}
0-set( f0) = {(13),(20),(33)}
1-set( f0) = {(03)}
2-set( f0) = {(10),(23)}
3-set( f0) = {(00),(01),(02),(12)}.
The defining tables of these functions are shown in Figure 2.
The symbol "-" stands for a don't care value.
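As a sanity check, the i-sets can be turned into lookup tables and the resulting 2-stage quaternary machine simulated. In this sketch states are written as two-character strings (x1 x0), the names `ISETS`, `table` and `simulate` are illustrative, and the 3-set of f1 is taken to be {(12)}, which is the entry that the defining table in Figure 2 assigns the value 3.

```python
# i-sets of f1 and f0, keyed by function index and then by value i.
ISETS = {
    1: {0: ["00", "03", "10", "20"], 1: ["01", "13", "23"], 2: ["02", "33"], 3: ["12"]},
    0: {0: ["13", "20", "33"], 1: ["03"], 2: ["10", "23"], 3: ["00", "01", "02", "12"]},
}

def table(isets):
    """Turn the i-sets of one function into a dict state -> value.

    States that appear in no i-set stay unspecified (don't care).
    """
    mapping = {}
    for value, states in isets.items():
        for state in states:
            assert state not in mapping  # i-sets of one function are disjoint
            mapping[state] = value
    return mapping

f1, f0 = table(ISETS[1]), table(ISETS[0])

def simulate(start, cycles):
    """Run the machine and collect the value of stage 0 at every clock cycle."""
    out, state = [], start
    for _ in range(cycles):
        out.append(int(state[1]))          # stage 0 is the rightmost digit
        state = f"{f1[state]}{f0[state]}"  # both stages are updated simultaneously
    return out

generated = simulate("00", 10)
print(generated)  # [0, 3, 1, 3, 0, 2, 3, 2, 3, 0]
```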
Function f1(x0,x1)
x0\x1  0  1  2  3
0      0  0  0  -
1      1  -  -  -
2      2  3  -  -
3      0  1  1  2
Function f0(x0,x1)
x0\x1  0  1  2  3
0      3  2  0  -
1      3  -  -  -
2      3  3  -  -
3      1  0  2  0
Fig. 2. Defining table for the updating functions of the 4-ary 2-stage machine
in Figure 4(a). The symbol "-" stands for a don't care (unspecified) value.
Function f11(x00,x01,x10,x11)
x01x00\x11x10  00  01  10  11
00             0   0   0   0
01             0   0   0   0
10             1   1   0   0
11             0   0   0   1
Function f10(x00,x01,x10,x11)
x01x00\x11x10  00  01  10  11
00             0   0   0   0
01             1   0   0   0
10             0   1   0   0
11             0   1   1   0
Function f01(x00,x01,x10,x11)
x01x00\x11x10  00  01  10  11
00             1   1   0   0
01             1   0   0   0
10             1   1   0   0
11             0   0   1   0
Function f00(x00,x01,x10,x11)
x01x00\x11x10  00  01  10  11
00             1   0   0   0
01             1   0   0   0
10             1   1   0   0
11             1   0   0   0
Fig. 3. Defining tables for the updating functions of the binary 4-stage
machine in Figure 4(b) for the case when all don't cares are specified to 0.
The pairs ( f11, f10) and ( f01, f00) encode the 4-valued functions f1 and f0 in
Figure 2, respectively.
Note that, in Lemma 1, we require that A is purely periodic
with the period k. The need for the latter condition is obvious:
if A repeats two or more times within the input sequence
of length k given to Algorithm 1, then we need fewer stages
than given by (1) to generate A. The former condition is necessary
because, if the sequence is eventually periodic, we might be
able to generate it with a binary machine with fewer stages
than given by (1). As an illustration, consider the eventually periodic
binary sequence (1,1,0,0,1,0,1,0,1) with pre-period 3 and
period 2. By using Algorithm 1, we can construct a binary
machine with 4 stages which repeats this sequence with the
period 9. However, we can also construct a binary machine
with 3 stages whose state transition graph has a cycle of
length 2, corresponding to the period (0,1), and a branch
implementing (1,1,0) which leads to the cycle. In some cases,
the binary machine constructed by the latter approach might
be smaller than the one constructed using Algorithm 1.
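One possible 3-stage machine for the eventually periodic example is sketched below. The state assignment in `NEXT` is a hypothetical choice made for illustration; any assignment whose least significant bits follow the sequence and whose states are distinct would work equally well.

```python
# Hypothetical state assignment; the output is the least significant bit of the state.
# Branch 1 -> 3 -> 0 implements the pre-period (1,1,0);
# the cycle 2 -> 5 -> 2 implements the period (0,1).
NEXT = {1: 3, 3: 0, 0: 2, 2: 5, 5: 2}

def generate(start, cycles):
    """Collect the least significant bit of the state at every clock cycle."""
    out, state = [], start
    for _ in range(cycles):
        out.append(state & 1)
        state = NEXT[state]
    return out

sequence = generate(1, 9)
stages = max(NEXT).bit_length()  # bits needed for the largest state

print(sequence)  # [1, 1, 0, 0, 1, 0, 1, 0, 1]
print(stages)    # 3
```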
V. GENERATION OF BINARY SEQUENCES
We can use m-ary n-stage machines for generating binary
sequences by encoding their m-ary stages and m-valued func-
tions using at most (⌈log2m⌉ · n) binary stages and Boolean
functions.
As an example, consider the quaternary 2-stage machine
from the example in the previous section. Figure 4(a) shows
its quaternary implementation. Figure 4(b) shows the same
machine in which each updating function fi, i ∈ {0,1}, is encoded
by a pair of Boolean functions ( fi1, fi0), using the
encoding 0 = (00), 1 = (01), 2 = (10), 3 = (11). The defining
tables for the Boolean functions are shown in Figure 3. We
specified all don't cares of f0 and f1 to 0. The resulting binary
4-stage machine generates the following sequence A2 two bits
per clock cycle:
A2 = (0,0,1,1,0,1,1,1,0,0,1,0,1,1,1,0,1,1,0,0). (2)
As we showed in the Introduction, if instead of using the
quaternary encoding, we use Algorithm 1 to construct a binary
machine for A2 directly, we get N0 = 9 and N1 = 11 and thus
a machine with n = ⌈log2 11⌉ + 1 = 5 stages.
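[Figure 4: (a) a register with quaternary stages 1 and 0 and updating functions f1, f0; (b) a register with binary stages 11, 10, 01, 00 and updating functions f11, f10, f01, f00.]
Fig. 4. (a) A quaternary 2-stage machine with the degree of parallelization
one. (b) The machine from (a) encoded as a binary 4-stage machine with the
degree of parallelization two.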
Let us see whether we can reduce the number of stages even
more if we use the 8-ary encoding. We group the bits of A2 in
triples to get the following 8-ary sequence:
A8 = (1,5,6,2,7,3,0).
Note that we have added an extra 0 to A2 to make its length
a multiple of 3. Using Algorithm 1 we can derive the
following sequence of integers S8 = (s0,s1, . . . ,s6) such that
sj mod 8 = aj for all j ∈ {0,1, . . . ,6}:
S8 = (1,5,6,2,7,3,0).
As we can see, S8 = A8, because none of the digits of A8 occurs
more than once. By Theorem 1, we need n = ⌈log8 1⌉ +
1 = 1 stage to implement this sequence by an 8-ary machine.
The updating function of this machine is defined in Figure 5.
By encoding the 8-ary 1-stage machine in binary, we get a
binary 3-stage machine with the updating functions defined in
Figure 6 which generates three bits of A2 per clock cycle. So,
we saved one more stage by using the 8-ary encoding.
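The 8-ary example can be replayed as follows. The sketch groups the padded bits into triples, builds the successor table of the single 8-ary stage directly from A8 (possible here because no digit repeats), and emits three bits per clock cycle. The names `F` and `run` are illustrative.

```python
A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
padded = A2 + [0]  # one extra 0 makes the length a multiple of 3

# First bit of each triple is the most significant one.
A8 = [4 * padded[i] + 2 * padded[i + 1] + padded[i + 2] for i in range(0, len(padded), 3)]

# Every digit occurs once, so the state sequence equals A8 and the updating
# function maps each digit to its successor; the value 4 stays unspecified.
F = {A8[j]: A8[(j + 1) % len(A8)] for j in range(len(A8))}

def run(start, cycles):
    """Emit the three bits of the current state at every clock cycle."""
    bits, x = [], start
    for _ in range(cycles):
        bits += [(x >> 2) & 1, (x >> 1) & 1, x & 1]
        x = F[x]
    return bits

print(A8)                    # [1, 5, 6, 2, 7, 3, 0]
print(run(1, 7) == padded)   # True
```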
Before presenting the main result of the paper, let us
formally define m-ary encodings.
Definition 1: For m = 2^p, p > 0, an m-ary encoding of a
binary sequence A2 of length k is the m-ary sequence Am
of length ⌈k/p⌉ which is obtained from A2 by replacing
the consecutive p-tuples of bits of A2, (a_i, a_{i+1}, ..., a_{i+p−1}),
i ∈ {0, p, 2p, ..., (⌈k/p⌉−1)·p}, by the value
a_i·2^{p−1} + a_{i+1}·2^{p−2} + ... + a_{i+p−1}·2^0.
If k mod p ≠ 0, then the length of A2 is
extended to the minimum k′ such that k′ mod p = 0 and
k′ > k. The appended bits are chosen so that the resulting
N_max = max_{i∈M} N_i is minimum.
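A direct reading of Definition 1 can be sketched as below. Two assumptions are made: the bits of a p-tuple are weighted by powers of 2 with the first bit most significant (this is what the worked examples for A4 and A8 use), and the padding bits are found by exhaustive search, which is only practical because fewer than p bits are ever appended. The name `encode` is hypothetical.

```python
from collections import Counter
from itertools import product

def encode(bits, p):
    """Return the 2**p-ary encoding of `bits`, padded so that N_max is minimal."""
    pad_len = (-len(bits)) % p
    best = None
    # product(..., repeat=0) yields a single empty tuple, i.e. no padding.
    for pad in product((0, 1), repeat=pad_len):
        padded = list(bits) + list(pad)
        digits = [sum(b << (p - 1 - t) for t, b in enumerate(padded[i:i + p]))
                  for i in range(0, len(padded), p)]
        n_max = max(Counter(digits).values())
        if best is None or n_max < best[0]:
            best = (n_max, digits)
    return best[1]

A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
print(encode(A2, 2))  # [0, 3, 1, 3, 0, 2, 3, 2, 3, 0]
print(encode(A2, 3))  # [1, 5, 6, 2, 7, 3, 0]
```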
The following theorem gives the lower bound on the
number of stages in a binary machine with the degree of
parallelization p.
Theorem 2: Let A2 be a purely periodic binary sequence
with the period k. Any binary machine which generates A2
with the degree of parallelization p ≥ 1 has at least n stages,
where n is given by:
n = ⌈log2 Nmax⌉ + p
where Nmax = max_{i∈M} Ni and Ni is the number of digits with
the value i in the m-ary encoding of A2, m = 2^p.
x0      0  1  2  3  4  5  6  7
f0(x0)  1  5  7  0  -  6  2  3
Fig. 5. Defining table for the updating function f0 of the 8-ary 1-stage
machine from the example.
Proof: Let m = 2^p where p is the degree of parallelization,
p > 0. From step 6 of Algorithm 1 we can conclude
that, for each i ∈ M, the largest integer si ∈ S such that si mod
m = i is equal to m(Ni − 1) + i. We need ⌈log2 Ni⌉ + p binary
digits to express this integer for any Ni > 0. Therefore, for
k > 1, the number of stages in the binary n-stage machine is
at most ⌈log2 Nmax⌉ + p, where Nmax = max_{i∈M} Ni.
To be able to generate p bits of A2 per clock cycle,
the binary n-stage machine must have at least Ni distinct
states whose p least significant bits correspond to the binary
encoding of the value i. If A2 is purely periodic with the
period k, we need at least ⌈log2 Ni⌉ + p binary stages to
implement the largest of these states for any Ni > 0. Therefore,
n ≥ ⌈log2 Ni⌉ + p for every i ∈ M, and hence n ≥ ⌈log2 Nmax⌉ + p.
So, we can conclude that n = ⌈log2 Nmax⌉ + p.
✷
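The bound of Theorem 2 can be evaluated for the running example. The sketch below is illustrative only: `stage_bound` is a hypothetical helper, the padding is chosen by exhaustive search, and ⌈log2 N⌉ is computed as `(N - 1).bit_length()` to avoid floating-point logarithms.

```python
from collections import Counter
from itertools import product

def stage_bound(bits, p):
    """Evaluate ceil(log2(N_max)) + p for the 2**p-ary encoding of `bits`."""
    pad_len = (-len(bits)) % p
    n_max = None
    for pad in product((0, 1), repeat=pad_len):
        padded = list(bits) + list(pad)
        groups = [tuple(padded[i:i + p]) for i in range(0, len(padded), p)]
        worst = max(Counter(groups).values())
        n_max = worst if n_max is None else min(n_max, worst)
    return (n_max - 1).bit_length() + p  # (N - 1).bit_length() == ceil(log2 N) for N >= 1

A2 = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]
bounds = {p: stage_bound(A2, p) for p in (1, 2, 3)}
print(bounds)  # {1: 5, 2: 4, 3: 3}
```

These values agree with the 5-stage, 4-stage and 3-stage machines obtained for A2 with the binary, quaternary and 8-ary encodings.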
The technique presented above opens a new possibility
for increasing the throughput of FSR-based binary sequence
generators. As we mentioned in Section III, at present, the
generation of p bits of a sequence per clock cycle is usually
achieved by duplicating the combinational logic implementing
updating functions of the FSR p times [21], [22], [23].
As an example, consider the sequence A2 given by (2).
According to Example V.1 in [9]¹, the shortest non-linear
FSR in the Fibonacci configuration which can generate A2 has
7 stages and the following updating function of the stage 6:
f6 = x0x1 ⊕ x0x1⊕ x0x1⊕ x0x1x2x3⊕ x0x1x2x3
⊕x0x1x2x3⊕ x0x1x2x3x4x5x6⊕ x0x1x2x3x4x5x6.
The updating functions of the remaining stages of the NLFSR
are of type fi = xi+1, for i∈ {0,1, . . . ,5}. If we use the number
of 2-input XORs and ANDs as a measure of cost, then the cost
of f6 is 24 ANDs + 7 XORs.
On the other hand, as shown above, we can generate 3 bits
of A2 per clock cycle using the 3-stage binary machine with
the updating functions defined in Figure 6. We can express
these functions as follows, where ¬ denotes complement:
f02 = x00·¬x01 ⊕ ¬x00·x01·¬x02
f01 = ¬x00·x01 ⊕ x00·x02
f00 = ¬x02 ⊕ x00·x01.
In total, f02, f01 and f00 have 6 ANDs and 3 XORs. So, the cost
of generating 3 bits of A2 per clock cycle using this binary
3-stage machine is 3 binary stages of a register + 6 ANDs +
3 XORs.
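The Boolean expressions can be checked against the 8-valued updating function of Figure 5. In this sketch the complemented literals are an assumption: they are chosen so that the three expressions reproduce the defining tables of Figure 6 with the don't care specified to 0, and `1 ^ x` stands for the complement of x.

```python
# Updating function of Figure 5 with the don't care (x0 = 4) specified to 0.
F8 = {0: 1, 1: 5, 2: 7, 3: 0, 4: 0, 5: 6, 6: 2, 7: 3}

def f02(x00, x01, x02):
    return (x00 & (1 ^ x01)) ^ ((1 ^ x00) & x01 & (1 ^ x02))

def f01(x00, x01, x02):
    return ((1 ^ x00) & x01) ^ (x00 & x02)

def f00(x00, x01, x02):
    return (1 ^ x02) ^ (x00 & x01)

def next_state(x):
    """Next 3-bit state computed from the three Boolean updating functions."""
    x00, x01, x02 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return 4 * f02(x00, x01, x02) + 2 * f01(x00, x01, x02) + f00(x00, x01, x02)

matches = all(next_state(x) == F8[x] for x in range(8))
print(matches)  # True
```

Counting 2-input gates in these expressions gives 3 + 2 + 1 = 6 ANDs and 3 XORs; inverters are not counted.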
x02x01x00  000  001  010  011  100  101  110  111
f02        0    1    1    0    0    1    0    0
f01        0    0    1    0    0    1    1    1
f00        1    1    1    0    0    0    0    1
Fig. 6. Defining tables for the updating functions ( f02, f01, f00) representing
the binary encoding of the 8-valued function in Figure 5 for the case when
the don't care is specified to 0.
¹ The sequence in Example V.1 in [9] does not contain the last bit of
A2, but this does not change the updating functions of the NLFSR.
To make a crude comparison of the two costs, let us assume
that the costs of the 2-input AND and the 2-input XOR are 1,
and the cost of one stage of a register is 2. Then, the cost of
the NLFSR is 45, while the cost of the binary machine is 15.
So, the binary machine is not only 3 times faster, but also 3
times smaller.
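The arithmetic behind the comparison is spelled out below, using the unit costs assumed in the text (2 per register stage, 1 per 2-input AND or XOR). The function name `cost` is illustrative.

```python
def cost(stages, ands, xors, stage_cost=2, gate_cost=1):
    """Crude area estimate: weighted sum of register stages and 2-input gates."""
    return stage_cost * stages + gate_cost * (ands + xors)

nlfsr_cost = cost(stages=7, ands=24, xors=7)  # 7-stage NLFSR, one bit per cycle
bm_cost = cost(stages=3, ands=6, xors=3)      # 3-stage binary machine, three bits per cycle

print(nlfsr_cost, bm_cost, nlfsr_cost / bm_cost)  # 45 15 3.0
```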
VI. EXPERIMENTAL RESULTS
To evaluate the presented approach, we compared the areas
of binary machines, LFSRs and NLFSRs generating the same
sequence for 3 types of sequences: truly random, complemen-
tary, and Legendre. All experiments were run on a PC with
Intel dual-core 1.8 GHz processor and 2 Gbytes of memory.
The area was computed using ABC synthesis tool [26] by first
optimizing the circuits with resyn script and then by mapping
them with map. In the results reported below, 1 unit of area
is equal to the area of a 2-input NAND gate.
In the first set of experiments, for each n in the range 4 ≤
n ≤ 16, we generated 20 truly random sequences of length 2^n
using the method [27]. Columns 2-4 of Table I show the areas
of the resulting LFSRs, NLFSRs and binary machines (BM)
for the degree of parallelization one. Columns 5-7 of Table I
show similar results for the degree of parallelization equal
to the number of stages in binary machines (which is always
less than or equal to the number of stages in LFSRs and NLFSRs).
Each entry is an average for 20 sequences.
LFSRs are quite bad for generating truly random sequences.²
The number of their stages grows roughly as a
half of the sequence length. For NLFSRs, the number of
stages grows much slower. However, the combinational area
of parallel NLFSRs grows so fast that they become hard to
synthesize for random sequences longer than 256 bits. As we
can see from Table I, on average, the area of parallel binary
machines is an order of magnitude smaller than the area of
parallel LFSRs and NLFSRs.
² Note that there is a subset of pseudo-random sequences, called m-sequences,
for which LFSRs are extremely efficient. An n-stage LFSR with
a primitive polynomial of degree n generates an m-sequence of length 2^n − 1.
If the primitive polynomial has k non-zero terms, then to implement such an
LFSR with the degree of parallelization p, we need n stages and no more
than k·p XORs. However, due to the linearity of LFSRs, m-sequences
are easy to reconstruct from a short segment.
Table II shows the results for complementary sequences.
Complementary sequences are a pair of sequences whose
aperiodic autocorrelation coefficients sum up to zero [28].
These sequences are known to have a low peak-to-mean
envelope power ratio, good error detection capabilities, and
high nonlinearity [4]. They are recommended for orthogonal
frequency division multiplexing [4] and for multicarrier code
division multiple access systems [5]. We can see that, on
average, parallel binary machines are an order of magnitude
smaller than parallel LFSRs and NLFSRs.
Table III shows the results for extended Legendre sequences.
Extended Legendre sequences are known to have the asymp-
totic merit factor of 6.3421, which is the highest of all known
families of sequences of an arbitrary length [6]. The higher the
merit factor of a sequence which is used to modulate a signal,
the more uniformly the signal energy is distributed over the
frequency range. This is important for spread-spectrum com-
munication systems, ranging systems, and radar systems [5],
[6]. Again, on average, parallel binary machines are an order
of magnitude smaller than parallel LFSRs and NLFSRs.
VII. CONCLUSION
In this paper, we present a method for constructing binary
machines with the minimum number of stages for a given
degree of parallelization. Our experimental results show that,
for sequences with high linear complexity, such as comple-
mentary, Legendre, or truly random sequences, parallel binary
machines are an order of magnitude smaller than parallel
LFSRs and NLFSRs generating the same sequence.
Our results can be beneficial for any application which
requires sequences with high spectrum efficiency or high
security, such as data transmission, wireless communications,
and cryptography.
VIII. ACKNOWLEDGMENTS
This work was supported in part by a research grant 621-
2010-4388 from the Swedish Research Council.
REFERENCES
[1] S. Golomb, Shift Register Sequences. Aegean Park Press, 1982.
[2] G. De Micheli, R. Brayton, and A. Sangiovanni-Vincentelli, “Optimal
state assignment for finite state machines,” Computer-Aided Design of
Integrated Circuits and Systems, IEEE Transactions on, vol. 4, pp. 269
– 285, july 1985.
[3] K. Zeng, C. Yang, D. Wei, and T. R. N. Rao, “Pseudo-random bit
generators in stream-cipher cryptography,” Computer, 1991.
[4] J. Davis and J. Jedwab, “Peak-to-mean power control in OFDM, Golay
complementary sequences, and Reed-Muller codes,” IEEE Transactions
on Information Theory, vol. 45, pp. 2397–2417, Nov. 1999.
[5] B. Popovic, “Spreading sequences for multicarrier cdma systemscom-
plementary series,” IEEE Transactions on Communications, vol. 47,
pp. 918–926, June 1999.
[6] R. Kristiansen and M. Parker, “Binary sequences with merit factor >
6.3,” IEEE Transactions on Information Theory, vol. 50, pp. 3385–3389,
Dec. 2004.
[7] A. Juels, “RFID security and privacy: a research survey,” Selected Areas
in Communications, IEEE Journal on, vol. 24, pp. 381–394, Feb. 2006.
[8] T. Good and M. Benaissa, “ASIC hardware performance,” New Stream
Cipher Designs: The eSTREAM Finalists, LNCS 4986, pp. 267–293,
2008.
[9] K. Limniotis, N. Kolokotronis, and N. Kalouptsidis, “On the nonlinear
complexity and Lempel-Ziv complexity of finite length sequences,”
IEEE Transactions on Information Theory, vol. 53, no. 11, pp. 4293–
4302, 2007.
Columns a1-a3: degree of parallelization = 1. Columns a4-a6: degree of parallelization = number of stages in BM. Columns a4/a6 and a5/a6: improvement.
Sequence  LFSRs     NLFSRs    BM         LFSRs     NLFSRs    BM         Improvement
length    a1        a2        a3         a4        a5        a6         a4/a6   a5/a6
2^4       47.35     32.93     72.38      86.3      98.68     20.85      4.14    4.73
2^5       104.33    53.4      153        249.48    257.85    41.08      6.07    6.28
2^6       218.85    86.15     340        654.65    1007.62   79.03      8.28    12.75
2^7       449.03    136.18    724.5      1501.53   4081.98   151.33     9.92    26.97
2^8       885.85    236.65    1600.1     3715.9    26638.6   371.43     10      71.72
2^9       1910.28   407.3     3258.6     8707.73   -         859.18     10.13   -
2^10      3889.13   757.95    7306.78    22727.68  -         1759.3     12.92   -
2^11      8540.27   1399.75   15057.5    -         -         3588.9     -       -
2^12      15664.25  2567.45   30128.6    -         -         7777.03    -       -
2^13      30208.38  4765.86   58946.55   -         -         15719.8    -       -
2^14      -         8817.89   114325.91  -         -         32981.89   -       -
2^15      -         16084.3   219473.62  -         -         63694.7    -       -
2^16      -         -         419118.45  -         -         123947.6   -       -
TABLE I
AREA RESULTS FOR RANDOM SEQUENCES (AVERAGE FOR 20 SEQUENCES); '-' STANDS FOR TIME OUT TO COMPUTE THE RESULT (15 MIN).
Columns a1-a3: degree of parallelization = 1. Columns a4-a6: degree of parallelization = number of stages in BM. Columns a4/a6 and a5/a6: improvement.
Sequence  LFSRs   NLFSRs  BM        LFSRs   NLFSRs  BM       Improvement
length    a1      a2      a3        a4      a5      a6       a4/a6   a5/a6
2^4       49      34      81.5      115     155     20.5     5.61    7.56
2^5       105     54.5    164       241     411.5   48.5     4.97    8.48
2^6       279     92.5    347.5     782     6347.5  91.5     8.55    69.37
2^7       493     -       707       1747    -       165.5    10.56   -
2^8       1093    -       1486.5    4556    -       470      9.69    -
2^9       2161    -       2737      12531   -       909.5    13.78   -
2^10      4509    -       6348.5    34660   -       1865.5   18.58   -
2^11      9097    -       11269     82954   -       3874     21.41   -
2^12      19379   -       23073.5   -       -       8324.5   -       -
2^13      36951   -       39905     -       -       13888    -       -
2^14      74089   -       80422.5   -       -       22720.5  -       -
2^15      -       -       140433    -       -       43094.5  -       -
2^16      -       -       292710.5  -       -       82670    -       -
TABLE II
AREA RESULTS FOR COMPLEMENTARY SEQUENCES; '-' STANDS FOR TIME OUT TO COMPUTE THE RESULT (15 MIN).
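Columns a1 to a3: degree of parallelization = 1. Columns a4 to a6: degree of parallelization = stages in BM. Last two columns: improvement.
| Sequence length | LFSRs (a1) | NLFSRs (a2) | BM (a3) | LFSRs (a4) | NLFSRs (a5) | BM (a6) | a4/a6 | a5/a6 |
|---|---|---|---|---|---|---|---|---|
| 17 | 42 | 33 | 68.5 | 84 | 110 | 19 | 4.42 | 5.79 |
| 31 | 97 | 44.5 | 146.5 | 281 | 192.5 | 31.5 | 8.92 | 6.11 |
| 61 | 231.5 | 83.5 | 311 | 667.5 | 1248.5 | 89.5 | 7.46 | 13.95 |
| 127 | 482 | 136.5 | 640.5 | 1901 | 10157.5 | 180.5 | 10.53 | 56.27 |
| 257 | 833 | 248 | 1357.5 | 3115 | 20787 | 247 | 12.61 | 84.16 |
| 557 | 2144.5 | 408 | 2900 | 9629.5 | - | 862.5 | 11.16 | - |
| 1021 | 3796 | 733 | 6779 | 19369 | - | 1906 | 10.16 | - |
| 2053 | 8016.5 | 1356.5 | 13080 | 47263.5 | - | 3652 | 12.94 | - |
| 4099 | 16358 | 2596.5 | 25491.5 | - | - | 7654 | - | - |
| 8233 | 33930.5 | - | 50691 | - | - | 16211 | - | - |
| 10223 | 42422 | - | 71780 | - | - | 20160.5 | - | - |
| 16127 | 63012 | - | 116037 | - | - | 32423.5 | - | - |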
TABLE III
AREA RESULTS FOR EXTENDED LEGENDRE SEQUENCES; ’-’ STANDS FOR TIME OUT TO COMPUTE THE RESULT (15 MIN).
[10] E. Dubrova, “Multiple-valued logic synthesis and optimization,” in Logic
Synthesis and Verification, Eds.: S. Hassoun and T. Sasao, (Kluwer
Academic Publishers), pp. 89–114, 2002.
[11] R. K. Brayton, C. McMullen, G. Hachtel, and A. Sangiovanni-Vincentelli,
Logic Minimization Algorithms For VLSI Synthesis. Kluwer
Academic Publishers, 1984.
[12] A. Lempel and W. L. Eastman, “High speed generation of maximal
length sequences,” IEEE Trans. Comput., vol. 20, pp. 227–229, February
1971.
[13] N. Zierler, “Linear recurring sequences,” Journal of the Society for
Industrial and Applied Mathematics, vol. 7, pp. 31–48, 1959.
[14] J. L. Massey, “Shift-register synthesis and BCH decoding,” IEEE
Transactions on Information Theory, vol. 15, pp. 122–127, 1969.
[15] G. L. Feng and K. K. Tzeng, “Algorithm for multisequence shift-register
synthesis with applications to decoding cyclic codes,” IEEE Transactions
on Information Theory, vol. 37, no. 5, pp. 1274–1287, 1991.
[16] I. Goldberg and D. Wagner, “Architectural considerations for
cryptanalytic hardware,” tech. rep., Secrets of Encryption Research, Wiretap
Politics and Chip Design, 1998.
[17] S. Mukhopadhyay and P. Sarkar, “Application of LFSRs for parallel
sequence generation in cryptologic algorithms,” in Computational
Science and Its Applications - ICCSA 2006, vol. 3982 of Lecture Notes in
Computer Science, pp. 436–445, Springer Berlin / Heidelberg, 2006.
[18] C. J. A. Jansen, “The maximum order complexity of sequence
ensembles,” in Proceedings of the 10th International Conference on Theory
and Application of Cryptographic Techniques, EUROCRYPT’91, (Berlin,
Heidelberg), pp. 153–159, Springer-Verlag, 1991.
[19] P. Rizomiliotis and N. Kalouptsidis, “Results on the nonlinear span of
binary sequences,” IEEE Transactions on Information Theory, vol. 51,
no. 4, pp. 1555–1563, 2005.
[20] P. Rizomiliotis, N. Kolokotronis, and N. Kalouptsidis, “On the quadratic
span of binary sequences,” IEEE Transactions on Information Theory,
vol. 51, no. 5, pp. 1840–1848, 2005.
[21] M. Hell, T. Johansson, and W. Meier, “Grain - a stream cipher for
constrained environments,” citeseer.ist.psu.edu/732342.html.
[22] C. D. Canniere and B. Preneel, “TRIVIUM specifications,”
citeseer.ist.psu.edu/734144.html.
[23] B. Gittins, H. A. Landman, S. O’Neil, and R. Kelson, “A presentation
on VEST hardware performance, chip area measurements, power
consumption estimates and benchmarking in relation to the AES,
SHA-256 and SHA-512.” Cryptology ePrint Archive, Report 2005/415, 2005.
http://eprint.iacr.org/.
[24] E. Dubrova, “Synthesis of binary machines,” IEEE Transactions on
Information Theory, 2011, to appear.
[25] E. Dubrova, “A transformation from the Fibonacci to the Galois
NLFSRs,” IEEE Transactions on Information Theory, vol. 55,
pp. 5263–5271, November 2009.
[26] Berkeley Logic Synthesis and Verification Group, “ABC: A system for
sequential synthesis and verification, release 70930,” 2007.
[27] D. Rijmenants, “One-time pad,” 2011.
[28] M. J. E. Golay, “Complementary series,” IRE Transactions on
Information Theory, vol. 7, pp. 82–87, Oct. 1961.
