An Algorithm for Constructing a Smallest Register with Non-Linear Update
  Generating a Given Binary Sequence by Li, Nan & Dubrova, Elena
arXiv:1306.5596v1 [cs.IT] 24 Jun 2013
An Algorithm for Constructing a Smallest Register
with Non-Linear Update Generating a Given Binary
Sequence
Nan Li, Student Member, IEEE and Elena Dubrova, Member, IEEE
Abstract—Registers with Non-Linear Update (RNLUs) are a generalization of Non-Linear Feedback Shift Registers (NLFSRs) in which both
feedback and feedforward connections are allowed and no chain connection between the stages is required. In this paper, a new algorithm for
constructing RNLUs generating a given binary sequence is presented. The expected size of RNLUs constructed by the presented algorithm is proved to
be $O(n/\log_2(n/p))$, where n is the sequence length and p is the degree of parallelization. This is asymptotically smaller than the expected size of
RNLUs constructed by previous algorithms and the expected size of LFSRs and NLFSRs generating the same sequence. The presented algorithm
can potentially be useful for many applications, including testing, wireless communications, and cryptography.
Index Terms—Binary sequence, LFSR, NLFSR, binary machine, circuit-size complexity, BIST.
1 INTRODUCTION
Binary sequences are important for many areas, including
cryptography, wireless communications, and testing.
In cryptography, pseudo-random binary sequences are used
in stream cipher-based encryption. A stream cipher produces
a ciphertext by combining a pseudo-random keystream with
a message, usually by bit-wise addition [1]. The security
of stream ciphers is directly related to statistical properties
of pseudo-random sequences. At present, there is no secure
method for generating pseudo-random sequences which satisfy
the extreme limitations of technologies like RFID. Low-cost
RFID tags cannot dedicate more than a few hundred
gates to security functionality [2]. Even the most compact
of today’s encryption systems contain over 1000 gates [3].
The lack of adequate protection mechanisms gives rise to
many security problems and blocks off a variety of potential
applications of RFID technology.
In wireless communications, pseudo-random sequences are
used for scrambling and spreading of the transmitted signal.
Scrambling is performed to give a transmitted signal some
useful engineering properties, e.g. to reduce the probability
of interference with adjacent channels or to simplify timing
recovery at the receiver [4]. Spreading increases the bandwidth
of the original signal, making it possible to maintain, or even
increase, communication performance when signal power is
below the noise floor [5]. For both scrambling and spreading,
it is important to select pseudo-random sequences carefully,
because their length, bit rate, correlation and other properties
determine the capabilities of the resulting systems. Today’s
wireless communication systems typically use Linear Feed-
back Shift Register (LFSR) sequences, or sequences obtained
by linearly combining pairs of LFSR sequences, such as Gold
codes [6]. There are many theoretical results demonstrating the
advantages of using nonlinear sequences in wireless communications.
For example, complementary sequences can solve the
notorious problem of power control in Orthogonal Frequency
Division Multiplexing (OFDM) systems by maintaining a
tightly bounded peak-to-mean power ratio [7]. Popovic [8]
has shown that multi-carrier spread spectrum systems using
complementary and extended Legendre sequences outperform
the best corresponding multi-carrier Code Division Multiple
Access (CDMA) system using Gold codes. However, due to
the lack of efficient hardware methods for generating nonlinear
sequences, their theoretical advantages cannot be utilized at
present.
The authors are with the Royal Institute of Technology (KTH), Stockholm,
Sweden.
Built-In Self-Test (BIST) uses pseudo-random binary
vectors, usually generated on-chip by an LFSR, as test
patterns [9]. The hardware cost of an LFSR-based BIST is low.
However, the test time of BIST may be long due to random-
pattern resistant faults. Several methods for coping with these
faults have been proposed, including modification of the circuit
under test [10], insertion of control and observe points into
the circuit [11], modification of the LFSR to generate a
sequence with a different distribution of 0s and 1s [12], and
generation of top-off test patterns for random-pattern resistant
faults using some deterministic algorithm and storing them
in a Read-Only Memory (ROM) [13]. The latter approach
can help detect not only random-pattern resistant faults,
but also delay faults which are not handled efficiently by
the pseudo-random patterns. However, the memory required
to store the top-off patterns in BIST can exceed 30% of the
memory used in a conventional ATPG approach [14]. Finding
alternative ways of generating top-off patterns is an important
open problem.
Any binary sequence can be generated using a Register with
Non-Linear Update (RNLU) shown in Figure 1a. A k-stage
RNLU consists of k binary stages, k updating functions, and
a clock. At each clock cycle, the current values of all stages
are synchronously updated to the next values computed by the
updating functions. RNLUs can be viewed as a more general
type of Non-Linear Feedback Shift Registers (NLFSRs) (see
Figure 1b) in which both feedback and feedforward connections
are allowed and no chain connection between the stages
is required.
RNLUs are typically smaller and faster than NLFSRs gen-
erating the same sequence. For example, consider the 4-stage
NLFSR with the updating function
f (x0,x1,x2,x3) = x0⊕ x3⊕ x1 · x2⊕ x2 · x3,
where “⊕” is the Boolean exclusive-OR, “·” is the Boolean
AND, and xi is the variable representing the value of the
stage i, i ∈ {0,1,2,3}. If this NLFSR is initialized to the state
(x3x2x1x0) = (0001), it generates the output sequence
(1,0,0,0,1,1,0,1,0,1,1,1,1,0,0) (1)
with the period 15. The same sequence can be generated by
the 4-stage RNLU with the updating functions
f3(x0,x3) = x0⊕ x3
f2(x1,x2,x3) = x3⊕ x1 · x2
f1(x2) = x2
f0(x1) = x1.
We can see that the RNLU uses 3 binary operations, while the
NLFSR uses 5 binary operations.
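The equivalence claimed in this example can be checked by direct simulation. The following Python sketch is illustrative only. It assumes that the output is taken from stage 0 and that the RNLU is started from the same state (0001) as the NLFSR, which the text does not state explicitly; the helper name `run` is hypothetical.

```python
def run(update, state, steps):
    """Clock a 4-stage register; state is (x0, x1, x2, x3), output is x0."""
    out = []
    for _ in range(steps):
        out.append(state[0])
        state = update(*state)
    return out

def nlfsr(x0, x1, x2, x3):
    # Stages shift towards stage 0; stage 3 receives the feedback function f.
    f = x0 ^ x3 ^ (x1 & x2) ^ (x2 & x3)
    return (x1, x2, x3, f)

def rnlu(x0, x1, x2, x3):
    # Every stage has its own updating function: (f0, f1, f2, f3).
    return (x1, x2, x3 ^ (x1 & x2), x0 ^ x3)

init = (1, 0, 0, 0)  # (x0, x1, x2, x3), i.e. (x3 x2 x1 x0) = (0001)
seq_nlfsr = run(nlfsr, init, 15)
seq_rnlu = run(rnlu, init, 15)
```

Under these assumptions both registers produce sequence (1), although their internal state sequences differ.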
While RNLUs can potentially be smaller than NLFSRs,
the search space for finding a smallest RNLU for a given
sequence is considerably larger than the corresponding one
for NLFSRs. Algorithms for constructing RNLUs with the
minimum number of stages were presented in [15], [16].
However, since, for large k, the size of a circuit implementing
a k-input Boolean function is typically much larger than the
size of a single stage of a register, usually these algorithms do
not minimize the total size of an RNLU.
In this paper, we present an algorithm which minimizes
the size of the support set of updating functions, i.e. the
number of variables on which the updating functions depend.
For most Boolean functions, the size of a circuit computing the
function grows exponentially with the number of variables
in its support set [17]. Therefore, by reducing the number
of variables of updating functions to the minimum, we can
minimize the total size of an RNLU. To support this claim,
we derive expressions for the expected size of RNLUs con-
structed by the presented method and previous approaches.
Our analysis shows that RNLUs constructed by the presented
method are asymptotically smaller. For completeness, we
also compare RNLUs to linear and nonlinear feedback shift
registers generating the same sequence.
The rest of this paper is organized as follows. Section 2
lists the notation and basic concepts used in the paper. Sec-
tion 3 discusses the related work. Section 4 gives a general
introduction to the presented approach. Section 5 describes
the algorithm for constructing RNLUs. Section 6 compares
RNLUs constructed by the presented method to the RNLUs
constructed using previous approaches, as well as to linear
and nonlinear feedback shift registers. Section 7 presents the
experimental results. Section 8 concludes the paper.
Fig. 1: General structure of RNLUs and NLFSRs. (a) An RNLU with the degree of parallelization one. (b) An NLFSR with the degree of parallelization one.
2 PRELIMINARIES
In this section, we present basic definitions and notation used
in the paper.
2.1 Boolean functions
A k-variable Boolean function is a mapping of type $f : B^k \to B$,
where $B = \{0,1\}$. The support set of a Boolean function
$f(x_0, x_1, \ldots, x_{k-1})$, $\mathrm{sup}(f)$, is the set of variables on which f
depends:
$$\mathrm{sup}(f) = \{x_i \mid f|_{x_i=0} \neq f|_{x_i=1}\},$$
where $f|_{x_i=j} = f(x_0, \ldots, x_{i-1}, j, x_{i+1}, \ldots, x_{k-1})$, for $j \in \{0,1\}$.
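For small k, the definition of the support set can be evaluated by brute force: a variable belongs to the support set if its two cofactors differ on at least one assignment. The sketch below is a minimal illustration; the function name `support` and the use of Python callables to represent Boolean functions are assumptions made here, not part of the paper.

```python
from itertools import product

def support(f, k):
    """Return the indices i such that the k-variable function f depends on x_i."""
    sup = set()
    for i in range(k):
        for a in product((0, 1), repeat=k):
            lo = a[:i] + (0,) + a[i + 1:]   # assignment with x_i = 0
            hi = a[:i] + (1,) + a[i + 1:]   # assignment with x_i = 1
            if f(*lo) != f(*hi):
                sup.add(i)
                break
    return sup

# Updating function f2 of the introductory example, written over x0..x3.
f2 = lambda x0, x1, x2, x3: x3 ^ (x1 & x2)
sup_f2 = support(f2, 4)
```

For f2 the result contains the indices of x1, x2, and x3 but not x0, in agreement with the argument list f2(x1,x2,x3) used in the introduction.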
A k-variable Boolean function f can be computed by a
logic circuit with k inputs and one output, such that, for every
input combination a ∈ Bk, the circuit output is f (a). The size
of a circuit is the number of gates required to implement it.
Typically gates are restricted to a certain set, e.g. {AND, OR,
NOT} [18].
2.2 Registers with Non-Linear Update
A k-stage Register with Non-Linear Update (RNLU) (also
called binary machine [15], [19]) consists of k binary storage
elements, called stages, each capable of storing one bit of
information. Every stage $i \in \{0,1,\ldots,k-1\}$ has an associated
state variable $x_i \in \{0,1\}$ which represents the current value
of the stage i and a Boolean updating function $f_i : \{0,1\}^k \to \{0,1\}$
which determines how the value of $x_i$ is updated to its
next value, $x_i^+$:
$$x_i^+ = f_i(x_0, x_1, \ldots, x_{k-1}).$$
A state of an RNLU is a vector of values of its state
variables. At every clock cycle, the next state of an RNLU
is computed from its current state by updating the values
of all stages simultaneously to the values of the corresponding
updating functions.
The degree of parallelization p of a k-stage RNLU is the
number of stages used for producing the output at each clock
cycle, 1 ≤ p ≤ k. Throughout the paper, we assume that the p
rightmost stages of the RNLU are used for producing its output.
2.3 Feedback Shift Registers
A k-stage Feedback Shift Register (FSR) can be viewed as a
special case of a k-stage RNLU satisfying
$$\begin{aligned}
x_0^+ &= x_1 \\
x_1^+ &= x_2 \\
&\;\;\vdots \\
x_{k-2}^+ &= x_{k-1} \\
x_{k-1}^+ &= f(x_0, x_1, \ldots, x_{k-1})
\end{aligned}$$
The updating function of the stage k−1 is called the feedback
function of the FSR.
If the feedback function of an FSR is linear, then the FSR
is called a Linear Feedback Shift Register (LFSR). Otherwise,
it is called a Non-Linear Feedback Shift Register (NLFSR).
It is known that the recurrence relation generated by the
feedback function of a k-stage LFSR has a characteristic
polynomial of degree k [19]. If this polynomial is primitive¹,
then the LFSR follows a periodic sequence of $2^k - 1$ states
which consists of all possible non-zero k-bit vectors [19].
This result is very important, because it makes possible the
generation of pseudo-random sequences of length $2^k - 1$ with
a device of size O(k). No analogous result has been found
for the nonlinear case yet.
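The maximum-period property can be observed on a small case. The sketch below steps a 4-stage LFSR whose feedback is the XOR of stages 0 and 1, which corresponds to the polynomial 1 + x + x^4 used later in Section 5. The function name `lfsr_states` and the integer encoding of the state (bit i holds the value of stage i) are choices made for this illustration only.

```python
def lfsr_states(k, taps, state):
    """Yield successive states of a k-stage LFSR.

    Bit i of `state` holds x_i; `taps` is a bit mask of the stages that are
    XORed together to form the next value of stage k-1.
    """
    while True:
        yield state
        feedback = bin(state & taps).count("1") & 1
        state = (state >> 1) | (feedback << (k - 1))

gen = lfsr_states(4, 0b0011, 0b0001)
cycle = [next(gen) for _ in range(15)]
wraps_around = next(gen) == cycle[0]
```

Starting from the state 0001, the register visits all 15 non-zero 4-bit states before returning to its initial state.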
3 PREVIOUS WORK
There are many different ways of generating binary sequences.
A thorough treatment of this topic is given by Knuth in [21].
In this section, we focus on FSR-based binary sequence
generators and their generalizations.
LFSRs are one of the most popular devices for generating
pseudo-random binary sequences. They have numerous ap-
plications, including error-detection and correction [22], data
compression [23], testing [24], and cryptography [25].
The Berlekamp-Massey algorithm can be used to construct a
smallest LFSR generating a given binary sequence. It was orig-
inally invented by Berlekamp for decoding Bose-Chaudhuri-
Hocquenghem (BCH) codes [26]. Massey [27] linked
Berlekamp’s algorithm to LFSR synthesis and simplified it.
There were many subsequent extensions and improvements
of the algorithm, for example Mandelbaum [28] developed
its arithmetic analog, Imamura and Yoshida [29] presented an
alternate and easier derivation, Fitzpatrick [30] found a version
which is more symmetrical in its treatment of the iterated pairs
of polynomials, and Fleischmann [31] modified it to extend
the model sequence in both directions around any given data
bit. It has also been shown that results similar to those of the
Berlekamp-Massey algorithm can be obtained with the Euclidean
algorithm [32] and continued fractions [33].
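For reference, a compact textbook-style version of the Berlekamp-Massey algorithm over GF(2) is sketched below. It returns only the length of the shortest LFSR generating the given bits (the linear complexity). It is not taken from [26] or [27]; the function and variable names are chosen for this illustration.

```python
def linear_complexity(bits):
    """Length of the shortest LFSR generating `bits` (Berlekamp-Massey, GF(2))."""
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # connection polynomial before the last length change
    length, m = 0, -1   # current LFSR length, index of the last length change
    for i in range(n):
        # Discrepancy between bits[i] and the bit predicted by the current LFSR.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, previous
    return length

# One period of the sequence defined by s[t] = s[t-4] XOR s[t-3].
s = [1, 0, 0, 0]
for t in range(4, 15):
    s.append(s[t - 4] ^ s[t - 3])
```

Applied to `s`, the function reports a 4-stage LFSR; applied to a run of zeros followed by a single one, it reports as many stages as there are bits.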
The Berlekamp-Massey algorithm constructs traditional LFSRs,
which generate one output bit per clock cycle. A number
of techniques have been developed for constructing LFSRs
with the degree of parallelization p. Two main approaches
are: (1) synthesis of subsequences representing the p-decimation
of some phase shift of the original LFSR sequence [34] and
(2) computation of the set of states reachable from any state
in p steps. The latter is usually done by computing the pth power
of the connection matrix of the LFSR [25]. LFSRs with a high
degree of parallelization are used in applications where a high
data rate is important, such as the Cyclic Redundancy Check (CRC),
widely used in data transmission and storage for detecting
burst errors [22].
1. An irreducible polynomial of degree k is called primitive if the smallest
m for which it divides $x^m + 1$ is equal to $2^k - 1$ [20].
NLFSRs have been much less studied compared to LF-
SRs [35]. The first algorithm for constructing a smallest
NLFSR generating a given binary sequence was presented by
Jansen in 1991 [36], [37]. Alternative algorithms were given
by Linardatos et al. [38], Rizomiliotis et al. [39], and Limniotis
et al. [40].
Similarly to the LFSR case, an NLFSR can be re-designed
to generate p bits of the sequence per clock cycle. This
is usually done by duplicating the updating functions of an
NLFSR p times, as in [41]–[43]. Such a technique requires
that the p left-most stages of the NLFSR are not used as inputs
to feedback functions or output functions. More generally,
the problem of constructing an NLFSR with the degree of
parallelization p can be solved by computing the pth power
of the transition relation induced by its feedback functions.
However, the size of circuits computing the pth power of the
transition relation may grow substantially larger than a factor
of p [44].
An FSR may need up to n stages to generate a binary
sequence of length n. For example, the smallest LFSR and
NLFSR generating the binary sequence
$$\underbrace{00\cdots0}_{n-1}1$$
have n and n − 1 stages, respectively [36].
On average, an LFSR needs n/2 stages to generate a binary
sequence of length n [45] and an NLFSR needs 2 log2 n stages
to generate such a sequence [36]. Note that these bounds
reflect the size of stages only; they do not take into account the
size of circuits computing feedback functions. Since the nonlinear
feedback function of an NLFSR is typically larger than the
linear feedback function of an LFSR, a k-stage NLFSR may
be considerably larger than a k-stage LFSR.
The first algorithm for constructing an RNLU with the
minimum number of stages for a given binary sequence was
presented in [15]. This algorithm exploits the unique property
of RNLUs that any binary n-tuple can be the next state of
a given current state. The algorithm assigns every 0 of a
sequence a unique even integer and every 1 of a sequence
a unique odd integer. Integers are assigned in an increasing
order starting from 0. For example, if an 8-bit sequence
A = (0,0,1,0,1,1,0,1) is given, the sequence of integers
(0,2,1,4,3,5,6,7) can be used. This sequence of integers is
interpreted as a sequence of states of an RNLU. The largest
integer in the sequence of states determines the number of
stages. In the example above, ⌈log2 7⌉= 3, thus the resulting
RNLU has 3 stages.
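The state assignment described above is easy to reproduce. The following sketch follows the description in this paragraph rather than the original implementation of [15]; the function name `assign_states` is hypothetical.

```python
def assign_states(bits):
    """Give each 0 the next unused even integer and each 1 the next unused odd one."""
    next_even, next_odd = 0, 1
    states = []
    for bit in bits:
        if bit == 0:
            states.append(next_even)
            next_even += 2
        else:
            states.append(next_odd)
            next_odd += 2
    return states

A = [0, 0, 1, 0, 1, 1, 0, 1]
states = assign_states(A)
num_stages = max(states).bit_length()   # bits needed to hold the largest state
```

Since the least significant bit of every state equals the corresponding sequence bit, the output can be read from a single stage.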
Fig. 2: Structure of RNLUs constructed by the presented
algorithm.
In [16], the algorithm [15] was extended to RNLUs generating
p bits of the output sequence per clock cycle. The main
idea is to encode a binary sequence into a $2^p$-ary sequence
which can be generated by a smaller RNLU. As an example,
suppose that we use the 4-ary encoding (00) = 0, (01) = 1,
(10) = 2, (11) = 3 to encode the binary sequence A from the
example above into the quaternary sequence (0,2,3,1). Then,
we can construct an RNLU generating the sequence A 2 bits
per clock cycle using the sequence of states (0,2,3,1). Note that
⌈log2 3⌉ = 2, so the resulting RNLU has one stage less than
the RNLU generating one bit per clock cycle in the previous
example.
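The encoding step of this example can be sketched as follows. The helper name `encode` is an assumption made for illustration, and the sequence length is assumed to be a multiple of p.

```python
def encode(bits, p):
    """Pack consecutive p-bit groups into integers (first bit of a group is the MSB)."""
    symbols = []
    for i in range(0, len(bits), p):
        value = 0
        for bit in bits[i:i + p]:
            value = (value << 1) | bit
        symbols.append(value)
    return symbols

A = [0, 0, 1, 0, 1, 1, 0, 1]
quaternary = encode(A, 2)
num_stages = max(quaternary).bit_length()
```

In this particular example every 2-bit vector occurs only once, so the encoded symbols can be used directly as states.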
RNLUs have been successfully applied to the storage of
cryptographic keys [46] and deterministic test patterns [47].
For example, it was shown in [46] that an RNLU may take
less than a quarter of the size of a read-only memory storing
the same sequence.
4 INTUITIVE IDEA
We can separate each state of a k-stage RNLU with the degree
of parallelization p into two parts: p output bits which contain
the output sequence and k − p extra bits which are used
for differentiating the states whose output bits are the same.
Output bits are defined by the sequence to be generated. For
the extra bits, we can use any k− p bit vector that is not used
in another state with the same output bits.
As we mentioned previously, the overall size of an RNLU
is typically dominated by the size of circuits computing its
updating functions. The size of these circuits greatly depends
on the support sets of updating functions. In order to minimize
the support sets, we use extra bit vectors which are unique for
every specified state. In other words, not only the states with
the same output bits, but also all other specified states are
assigned a unique (k− p)-bit extra bit vector. Such a state
encoding allows us to reduce the support sets of updating
functions to variables representing extra bits only, as shown
in Figure 2.
Suppose we would like to construct an RNLU generating
a binary sequence A of length m × p with the degree of
parallelization p. In order to distinguish between identical p-
bit vectors in A, we need at least ⌈log2 m⌉ extra bits. Therefore,
the number of stages in the resulting RNLU is given by:
k = ⌈log2 m⌉+ p.
This number is typically greater than the minimum possible
number of stages in an RNLU which can generate A. The
minimum number of stages is determined by partitioning
A into p-bit vectors, computing the decimal representation
for each p-bit vector, and counting the largest number of
occurrences among all p-bit vectors with the same decimal
representation, Nmax. For example, in the 10-bit sequence
A = (0,1,0,0,0,1,1,1,0,1) the 2-bit vector (0,1) occurs 3
times, so Nmax = 3. The minimum number of stages in an
RNLU generating A is given by [16]:
kmin = ⌈log2 Nmax⌉+ p. (2)
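Both stage counts can be computed directly from the sequence. The sketch below evaluates k = ⌈log2 m⌉ + p for the presented method and kmin from equation (2); the helper names are hypothetical, and the sequence length is assumed to be a multiple of p.

```python
from collections import Counter

def ceil_log2(x):
    """Smallest integer r with 2**r >= x, for x >= 1."""
    return (x - 1).bit_length()

def stage_counts(bits, p):
    """Return (k, k_min, N_max) for a sequence split into p-bit vectors."""
    vectors = [tuple(bits[i:i + p]) for i in range(0, len(bits), p)]
    m = len(vectors)
    n_max = max(Counter(vectors).values())
    k = ceil_log2(m) + p            # presented method
    k_min = ceil_log2(n_max) + p    # equation (2)
    return k, k_min, n_max

A = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1]
k, k_min, n_max = stage_counts(A, 2)
```

For the 10-bit example with p = 2, the presented method uses one stage more than the minimum.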
The presented method reduces the support sets of the
updating functions to the minimum. Updating functions of
output bits cannot depend on fewer than ⌈log2 m⌉ variables since
otherwise the RNLU would not be able to generate all m
p-bit vectors constituting a partitioning of A.
Note that the size of an RNLU can be further reduced by
removing the stages representing output bits and taking the
output directly from the updating functions.
5 ALGORITHM
In this section, we present an algorithm for constructing RN-
LUs which minimizes the support sets of updating functions
to ⌈log2 m⌉ variables representing extra bits.
The pseudocode of the algorithm ConstructRNLU(A, p) is
shown as Algorithm 1. The input is a binary sequence A =
(a0,a1, · · · ,an−1) and the desired degree of parallelization p.
The output is the defining tables of p+ r updating functions
of the RNLU generating A with the degree of parallelization
p, where r = ⌈log2 m⌉ and m = ⌈n/p⌉.
The algorithm begins by selecting an r-stage extra bits
generator G using the procedure ChooseGenerator(m,r). As
we mentioned in the previous section, the size of an RNLU
depends on the order of extra bit vectors used for state
encoding. In principle, any permutation of r-bit vectors can
be used; however, a good choice of the generator reduces the
size of the resulting RNLU. For example, if we use an r-stage
LFSR or a binary counter as generators of extra bit vectors,
then the updating functions of extra bits can be computed by
a circuit of size O(r).
The selected generator G is set to some initial state $g_0 \in B^r$.
For LFSRs, g0 must be a non-zero state. For binary counters,
g0 can be any state. Then, the defining table of updating
functions of output bits is constructed as follows. At every step
i, i ∈ {0,1, · · · ,m− 1}, the input part of the table is assigned
to be the current state of the generator G, gi, and the output
part of the table is assigned to be the ith p-bit vector of the
input sequence A.
All remaining $2^r - m$ input assignments are mapped to
don’t-care values. This gives us a possibility to specify the
functions f0, f1, · · · , fp−1 so that the size of their circuits is
minimized.
Since, by construction, the values of functions
f0, f1, · · · , fp−1 at step i correspond to the ith p-tuple of
A, for i ∈ {0,1, · · · ,m− 1}, the resulting RNLU generates A
with the degree of parallelization p.
Algorithm 1 ConstructRNLU(A, p): Constructs an RNLU
generating a binary sequence $A = (a_0, a_1, \ldots, a_{n-1})$ with the
degree of parallelization p.
  $m = \lceil n/p \rceil$;
  $r = \lceil \log_2 m \rceil$;
  G = ChooseGenerator(m,r);
  Initialize G to an initial state $g_0 \in B^r$;
  for every i from 0 to m − 1 do
    for every j from 0 to p − 1 do
      $f_j(g_i) = a_{i \cdot p + j}$;
    end for
    $g_{i+1}$ = ComputeNextState(G, $g_i$);
  end for
  for every i from 0 to r − 1 do
    $f_{p+i}$ = updating function of the stage i of G;
  end for
  Return $f_0, f_1, \ldots, f_{p+r-1}$;
As an example, let us construct an RNLU which generates
the following 40-bit binary sequence with the degree of
parallelization 4:
A = (1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,1,0,1,0,1,0,1,0,
0,0,0,1,1,0,0,0,0,1,1,0,1,1,1,0)
We need $r = \lceil \log_2 10 \rceil = 4$ extra bits to assign to each of
the ten 4-bit vectors of A a unique extra bit vector. Suppose
that we use the 4-stage LFSR with the primitive generator
polynomial $g(x) = 1 + x + x^4$ for generating extra bits. If we
choose (0001) as the initial state of the LFSR, then extra bit
vectors are assigned according to the following sequence of
LFSR states:
(1,8,4,2,9,12,6,11,5,10).
This gives us the following defining table for the updating
functions of output bits:
x7 x6 x5 x4 | f3 f2 f1 f0
 0  0  0  1 |  1  0  0  1
 1  0  0  0 |  0  1  0  0
 0  1  0  0 |  1  1  0  0
 0  0  1  0 |  0  1  0  0
 1  0  0  1 |  0  1  0  1
 1  1  0  0 |  0  1  0  1
 0  1  1  0 |  1  0  0  0
 1  0  1  1 |  0  0  0  1
 0  1  0  1 |  0  1  1  0
 1  0  1  0 |  0  1  1  1
These functions can be implemented as follows:
f3(x7,x6,x5,x4) = x7(x5 + x6)(x4 + x5 + x6)
f2(x7,x6,x5,x4) = (x7 +(x5⊕ x6))(x4 + x5 + x6 + x7)
f1(x7,x6,x5,x4) = (x4 + x7)(x6 + x7)(x5x6 + x4x5)
f0(x7,x6,x5,x4) = x4x7 +(x7⊕ x5x6)
where “+” is the Boolean OR and $\bar{x}$ denotes the Boolean
complement of x.
Algorithm 2 ChooseGenerator(m,r): Chooses an r-stage
generator of extra bits with at least m states.
  if $m < 2^r$ then
    G = Any r-stage LFSR with a primitive generator
        polynomial of degree r;
  else
    G = r-stage binary counter;
  end if
  Return G;
Fig. 3: 8-stage RNLU constructed for the example.
The updating functions of extra bits, f7, f6, f5, f4 are defined
by the LFSR:
f7(x4,x5) = x4⊕ x5
f6(x7) = x7
f5(x6) = x6
f4(x5) = x5
Figure 3 shows the structure of the resulting RNLU. The
block labeled by Fout computes the updating functions of
output bits f3, f2, f1, f0.
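The table-building loop of Algorithm 1 can be sketched in a few lines for this example. The code below is an illustration under stated assumptions, not the authors' implementation: the names `construct_table` and `lfsr_next` are hypothetical, extra-bit states are stored as integers with bit 0 holding x4 and bit 3 holding x7, and the sequence length is assumed to be a multiple of p.

```python
def construct_table(bits, p, next_state, g0):
    """Map each extra-bit state g_i to the i-th p-bit vector (f_0, ..., f_{p-1}).

    States that do not appear in the returned dict are don't-cares.
    """
    table = {}
    g = g0
    for i in range(0, len(bits), p):
        table[g] = tuple(bits[i:i + p])
        g = next_state(g)
    return table

def lfsr_next(g):
    # 4-stage LFSR of the example: f7 = x4 XOR x5, the other stages shift.
    feedback = (g ^ (g >> 1)) & 1
    return (g >> 1) | (feedback << 3)

A = [1,0,0,1, 0,0,1,0, 0,0,1,1, 0,0,1,0, 1,0,1,0,
     1,0,1,0, 0,0,0,1, 1,0,0,0, 0,1,1,0, 1,1,1,0]
table = construct_table(A, 4, lfsr_next, 0b0001)
```

The dictionary keys appear in the order of the LFSR states listed above, and each value lists (f0, f1, f2, f3), that is, the corresponding row of the defining table read from right to left.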
6 EXPECTED SIZE ANALYSIS
In this section, we derive expressions for the expected size
of RNLUs constructed using the presented algorithm and the
algorithms [15] and [16]. For completeness, we also show
results for LFSRs and NLFSRs generating the same sequence.
In 1949, Shannon [17] proved that an (asymptotically)
large fraction of Boolean functions of k variables
cannot be computed by circuits of size smaller than $2^k/k$.
In 1962, Lupanov [48] showed that, if we allow the circuit size
to be larger by a small fraction of $2^k/k$, namely $[1+o(1)]2^k/k$,
then we can compute all k-variable Boolean functions. In both
cases, it is assumed that circuits are composed of AND, OR
and NOT gates with at most two inputs.
From these two bounds, we can conclude that “most”
Boolean functions of k variables require a circuit of size $\alpha 2^k/k$
to be computed, where α is a constant such that 1 ≤ α ≤ 2.
In the analysis below, we assume that one storage element counts
as β gates. Since the analysis is asymptotic, without loss
of precision we use $\log_2 n$ instead of $\lceil \log_2 n \rceil$.
6.1 Degree of Parallelization One
Let A be a binary sequence of length n in which every element
is selected independently and uniformly at random from B.
Throughout this section, we call such a sequence a random
sequence. Suppose that Algorithm 1 is used to construct an
RNLU generating A with the degree of parallelization one.
Then, the resulting RNLU has:
• one stage for the output bit,
• $\log_2 n$ stages for extra bits,
• $\log_2 n$ updating functions of the extra bits,
• one updating function of the output bit.
The updating functions of the extra bits can be computed
by a circuit of size $O(\log_2 n)$. The updating function $f_0$ of the
output bit is expected to depend on all $\log_2 n$ state variables
of extra bits. This is because the probability that $f_0|_{x_i=0} = f_0|_{x_i=1}$
for some $i \in \{1, 2, \ldots, \log_2 n\}$ goes to 0 as the
sequence length increases. Therefore, $f_0$ requires a circuit of
size $\alpha n/\log_2 n$ to be computed. So, the expected size of the
RNLU constructed by the presented algorithm is
$$E[\mathrm{RNLU}(n,1)] = \beta(1 + \log_2 n) + \alpha n/\log_2 n + O(\log_2 n) = O(n/\log_2 n). \quad (3)$$
Next, suppose that the algorithm [15] is used to construct
an RNLU for the same sequence. This algorithm constructs an
RNLU with the minimum number of stages $k_{min}$ given by (2).
For sufficiently large random sequences, this number can be
approximated as:
$$k_{min} \approx 1 + \log_2(n/2) = \log_2 n.$$
In this case, the resulting RNLU has $k_{min}$ stages and $k_{min}$
updating functions with the support set of size $k_{min}$. These
functions require $k_{min}$ circuits of size $\alpha 2^{k_{min}}/k_{min}$ to be
computed, so their expected size is given by:
$$k_{min} \cdot \alpha 2^{k_{min}}/k_{min} = \alpha 2^{\log_2 n} = \alpha n.$$
Therefore, the expected size of the RNLU constructed by the
algorithm [15] is:
$$E[\mathrm{RNLU}(n,1)] = \alpha n + \beta \log_2 n = O(n). \quad (4)$$
Next, suppose that the Berlekamp-Massey algorithm [27] is
used to construct an LFSR for the same sequence. Suppose
that this LFSR has l stages. According to [45], for sufficiently
large random sequences, $l \approx n/2$. The linear feedback function
of the LFSR can be computed by a circuit of size O(n). So,
the expected size of the LFSR is
$$E[\mathrm{LFSR}(n,1)] = \beta n/2 + O(n) = O(n). \quad (5)$$
Finally, suppose an r-stage NLFSR is constructed for the
same sequence, e.g. using the algorithm [38]. According
to [36], for sufficiently large random sequences, $r \approx 2\log_2 n$.
Thus, the feedback function of the NLFSR has the support set
of size $2\log_2 n$. It requires a circuit of size $\alpha \cdot 2^{2\log_2 n}/(2\log_2 n)$
to be computed. Therefore, the expected size of the NLFSR
is
$$\begin{aligned}
E[\mathrm{NLFSR}(n,1)] &= 2\beta \log_2 n + \alpha \cdot 2^{2\log_2 n}/(2\log_2 n) \\
&= 2\beta \log_2 n + \alpha n^2/(2\log_2 n) \\
&= O(n^2/\log_2 n). \quad (6)
\end{aligned}$$
As we can see from equations (3), (4), (5), and (6), for
sufficiently large random sequences, RNLUs with the degree
of parallelization one constructed by the presented algorithm
are asymptotically smaller than RNLUs constructed by the
algorithm [15], LFSRs, and NLFSRs.
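To get a feel for the magnitudes involved, the four expressions can be evaluated numerically. The sketch below is only indicative: it sets α = β = 1 and replaces each O(·) term by its argument, which are assumptions made here for illustration and not values taken from the analysis.

```python
import math

def expected_sizes(n, alpha=1.0, beta=1.0):
    """Evaluate the leading terms of equations (3)-(6) for sequence length n."""
    lg = math.log2(n)
    return {
        "RNLU (presented)": beta * (1 + lg) + alpha * n / lg + lg,  # (3), O(log n) -> log n
        "RNLU [15]": alpha * n + beta * lg,                         # (4)
        "LFSR": beta * n / 2 + n,                                   # (5), O(n) -> n
        "NLFSR": 2 * beta * lg + alpha * n ** 2 / (2 * lg),         # (6)
    }

sizes = expected_sizes(2 ** 16)
smallest = min(sizes, key=sizes.get)
```

With these assumed constants and n = 2^16, the presented construction is the smallest of the four by more than an order of magnitude, and the NLFSR estimate is the largest.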
6.2 Degree of Parallelization p
In this section, we extend the analysis to the degree of
parallelization p.
Let A be a random binary sequence of length n. Suppose
that Algorithm 1 is used to construct an RNLU generating A
with the degree of parallelization p. Let m = ⌈n/p⌉. Then this
RNLU has:
• p stages for the output bits,
• log2 m stages for extra bits,
• log2 m updating functions of the extra bits,
• p updating functions of the output bits.
The updating functions of the extra bits can be computed
by a circuit of size $O(\log_2 m)$. Each of the p updating
functions of the output bits is expected to depend on all
$\log_2 m$ state variables of extra bits. This is because, for any
$j \in \{0, 1, \ldots, p-1\}$, the probability that $f_j|_{x_i=0} = f_j|_{x_i=1}$ for
some $i \in \{p, p+1, \ldots, (p + \log_2 m) - 1\}$ goes to 0 as the
sequence length increases. Therefore, the updating functions
of output bits require p circuits of size $\alpha m/\log_2 m$ to be
computed. Thus, the expected size of the RNLU constructed
by the presented algorithm is
$$\begin{aligned}
E[\mathrm{RNLU}(n,p)] &= \beta(p + \log_2 m) + p\alpha m/\log_2 m + O(\log_2 m) \\
&= O(n/\log_2 m) \\
&= O(n/\log_2(n/p)). \quad (7)
\end{aligned}$$
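The simplification in (7) can be spelled out term by term. The following is a sketch of the reasoning, assuming that the ceiling in m = ⌈n/p⌉ is ignored (so that pm ≈ n) and that m is sufficiently large:

```latex
\begin{align*}
p\,\alpha\,\frac{m}{\log_2 m} &\approx \alpha\,\frac{n}{\log_2 (n/p)},
  && \text{since } pm \approx n,\\
\beta p = \beta\,\frac{n}{m} &\le \beta\,\frac{n}{\log_2 m},
  && \text{since } m \ge \log_2 m,\\
\beta \log_2 m + O(\log_2 m) &= O\!\left(\frac{n}{\log_2 m}\right),
  && \text{since } (\log_2 m)^2 \le m \le n \text{ for large } m.
\end{align*}
```

Under these assumptions every term is bounded by a constant multiple of $n/\log_2(n/p)$, so the sum is as well.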
Suppose that the algorithm [16] is used to construct an
RNLU for the same sequence. The number of stages $k_{min}$ is
given by (2). Since $1 \le N_{max} \le m$, we get
$$p \le k_{min} \le p + \log_2 m.$$
The lower bound is reached when each p-bit vector occurs in
A exactly once. This is possible only if $n \le 2^p$. Therefore
$$\log_2 n \le k_{min} \le p + \log_2 m. \quad (8)$$
The $k_{min}$ updating functions require $k_{min}$ circuits of size
$\alpha 2^{k_{min}}/k_{min}$ to be computed, so their expected size is $\alpha 2^{k_{min}}$.
From (8), we get:
$$\alpha n \le \alpha 2^{k_{min}} \le \alpha m 2^p$$
Therefore, the lower bound on the expected size of the RNLU
constructed by the algorithm [16] is:
$$E[\mathrm{RNLU}(n,p)] \ge \beta \log_2 n + \alpha n \ge O(n). \quad (9)$$
An LFSR with the degree of parallelization p has the
same number of stages as the LFSR with the degree of
parallelization one, but its feedback function is modified to
compute the pth power of the connection matrix. This implies
that the expected size of the circuit computing the feedback
function of the LFSR increases p times. So, the expected size
of the LFSR is
$$E[\mathrm{LFSR}(n,p)] = \beta n/2 + O(pn) = O(pn). \quad (10)$$
Similarly, an NLFSR with the degree of parallelization p is
constructed by modifying its feedback function to compute
the pth power of its transition relation. This may increase
the size of the circuit computing the pth power of the transition
relation by more than a factor of p due to multiplication of non-linear
terms [44]. The expected size of the NLFSR is thus
$$\begin{aligned}
E[\mathrm{NLFSR}(n,p)] &\ge 2\beta \log_2 n + \alpha \cdot p \cdot 2^{2\log_2 n}/(2\log_2 n) \\
&\ge 2\beta \log_2 n + \alpha p n^2/(2\log_2 n) \\
&\ge O(pn^2/\log_2 n). \quad (11)
\end{aligned}$$
From equations (7), (9), (10), and (11), we can conclude
that, for sufficiently large random sequences, RNLUs with
the degree of parallelization p constructed by the presented
algorithm are asymptotically smaller than RNLUs constructed
by the algorithm [16], LFSRs, and NLFSRs.
Note that our analysis does not take into account that two
circuits implementing two k-variable Boolean functions may
share some gates, and therefore their cost may be smaller than
$2\alpha 2^k/k$. However, since the analysis is asymptotic, this factor
is not likely to affect the results.
7 EXPERIMENTAL RESULTS
To compare the analytical results to the actual size of RNLUs,
we applied the presented algorithm and algorithms [15], [16],
to randomly generated binary sequences of length up to $10^5$
bits.
For all algorithms, circuits for the updating functions were
synthesized using the logic synthesis tool ABC [49]. The
generic library of gates mcnc.genlib was used for technology
mapping.
Figures 4a and 4b show the results for the degrees of
parallelization 1 and 100, respectively. 2-input AND is used
as a unit of gate size. We can see that RNLUs constructed by
the presented algorithm are considerably smaller than RNLUs
constructed by the algorithms [15] and [16]. The improvement
is particularly striking for the degree of parallelization one.
For example, for sequences of length $10^5$, RNLUs constructed
by the algorithm [15] are 6.67 times larger than RNLUs
constructed by the presented algorithm. For the degree of
parallelization 100 and sequences of length $10^5$, RNLUs constructed
by the algorithm [16] are 65.1% larger than RNLUs
constructed by the presented algorithm.
8 CONCLUSION
In this paper, we presented an algorithm for constructing
RNLUs in which the support set of updating functions is
reduced to the minimum. We proved that the expected size
of the resulting RNLUs is asymptotically smaller than the
expected size of RNLUs constructed by previous approaches.
The presented method might be useful for applications
which require efficient generation of binary sequences, such
as testing, wireless communication, and cryptography.
Fig. 4: Comparison of RNLUs constructed by the presented
algorithm to RNLUs constructed using the algorithms [15],
[16]. (a) Degree of parallelization one (algorithm [15] vs. presented).
(b) Degree of parallelization 100 (algorithm [16] vs. presented).
Both panels plot the size of the RNLU in gates against the sequence
length in bits. Each dot is computed as an average for 100 randomly
generated sequences of the same length.
ACKNOWLEDGEMENT
This work was supported in part by the research grant No.
2011-03336 from the Swedish Governmental Agency for
Innovation Systems (VINNOVA) and in part by the research grant
No. 621-2010-4388 from the Swedish Research Council.
REFERENCES
[1] M. Robshaw, “Stream ciphers,” Tech. Rep. TR - 701, July 1994.
[2] A. Juels, “RFID security and privacy: a research survey,” IEEE Journal
on Selected Areas in Communications, vol. 24, pp. 381–394, Feb. 2006.
[3] T. Good and M. Benaissa, “ASIC hardware performance,” New Stream
Cipher Designs: The eSTREAM Finalists, LNCS 4986, pp. 267–293,
2008.
[4] B. G. Lee and B.-H. Kim, Scrambling Techniques for CDMA
Communications. Berlin: Springer, 2001.
[5] R. L. Pickholtz et al., “Theory of spread spectrum communications
- a tutorial,” IEEE Trans. on Communications, vol. 30, no. 5,
pp. 855–883, 1982.
[6] R. Gold, “Optimal binary sequences for spread spectrum multiplexing
(corresp.),” IEEE Transactions on Information Theory, vol. 13,
pp. 619–621, Oct. 1967.
[7] J. Davis and J. Jedwab, “Peak-to-mean power control in OFDM, Golay
complementary sequences, and Reed-Muller codes,” IEEE Trans. on Inf.
Theory, vol. 45, no. 7, pp. 2397–2417, 1999.
[8] B. Popovic, “Spreading sequences for multicarrier CDMA systems,”
IEEE Transactions on Communications, vol. 47, pp. 918–926, June
1999.
[9] E. McCluskey, “Built-in self-test techniques,” IEEE Design and Test of
Computers, vol. 2, pp. 21–28, 1985.
[10] E. B. Eichelberger and E. Lindbloom, “Random-pattern coverage
enhancement and diagnosis for LSSD logic self-test,” IBM J. Res. Dev.,
vol. 27, pp. 265–272, May 1983.
[11] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, “Embedded
deterministic test,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 23, pp. 776–792, May 2004.
[12] C. Chin and E. J. McCluskey, “Weighted pattern generation for built-in
self test,” Tech. Rep. TR - 84-7, Stanford Center for Reliable Computing,
Aug. 1984.
[13] J. Savir, G. S. Ditlow, and P. H. Bardell, “Random pattern testability,”
IEEE Transactions on Computers, vol. C-33, pp. 79–90, Jan. 1984.
[14] G. Hetherington et al., “Logic BIST for large industrial designs:
real issues and case studies,” in Proc. of International Test Conference,
pp. 358–367, 1999.
[15] E. Dubrova, “Synthesis of binary machines,” IEEE Transactions on
Information Theory, vol. 57, pp. 6890–6893, 2011.
[16] E. Dubrova, “Synthesis of parallel binary machines,” in Proc. of
ICCAD’2011, (San Jose, CA, USA), Nov. 2011.
[17] C. E. Shannon, “The synthesis of two-terminal switching circuits,” Bell
System Technical Journal, vol. 28, no. 1, pp. 59–98, 1949.
[18] I. Wegener, The Complexity of Boolean Functions. John Wiley and Sons
Ltd, 1987.
[19] S. Golomb, Shift Register Sequences. Aegean Park Press, 1982.
[20] R. Lidl and H. Niederreiter, Introduction to Finite Fields and their
Applications. Cambridge Univ. Press, 1994.
[21] D. E. Knuth, The Art of Computer Programming Volume 2,
Seminumerical Algorithms. Boston, MA, USA: Addison-Wesley Reading, 1969.
[22] J. McCluskey, “High speed calculation of cyclic redundancy codes,” in
Proceedings of the 1999 ACM/SIGDA seventh international symposium
on Field programmable gate arrays, FPGA ’99, (New York, NY, USA),
pp. 250–256, ACM, 1999.
[23] G. Mrugalski, J. Rajski, and J. Tyszer, “Ring generators - New devices
for embedded test applications,” Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 23, no. 9, pp. 1306–1320,
2004.
[24] R. David, Random Testing of Digital Circuits. New York: Marcel
Dekker, 1998.
[25] S. Mukhopadhyay and P. Sarkar, “Application of LFSRs for parallel
sequence generation in cryptologic algorithms,” in Computational
Science and Its Applications - ICCSA 2006, vol. 3982 of Lecture Notes in
Computer Science, pp. 436–445, Springer Berlin / Heidelberg, 2006.
[26] E. R. Berlekamp, “Nonbinary BCH decoding,” in International
Symposium on Information Theory, (San Remo, Italy), 1967.
[27] J. Massey, “Shift-register synthesis and BCH decoding,” IEEE
Transactions on Information Theory, vol. 15, pp. 122–127, 1969.
[28] D. Mandelbaum, “An approach to an arithmetic analog of Berlekamp’s
algorithm,” IEEE Transactions on Information Theory, vol. 30, no. 5,
pp. 758–762, 1984.
[29] K. Imamura and W. Yoshida, “A simple derivation of the
Berlekamp-Massey algorithm and some applications,” IEEE Transactions on
Information Theory, vol. 33, no. 1, pp. 146–150, 1987.
[30] P. Fitzpatrick, “New time domain errors and erasures decoding algorithm
for BCH codes,” Electronics Letters, vol. 32, no. 2, pp. 110–111, 1994.
[31] M. Fleischmann, “Modified Berlekamp-Massey algorithm for two-sided
shift-register synthesis,” Electronics Letters, vol. 31, no. 8, pp. 605–606,
1995.
[32] J. Dornstetter, “On the equivalence between Berlekamp’s and Euclid’s
algorithms,” IEEE Transactions on Information Theory, vol. 33, no. 3,
pp. 428–431, 1987.
[33] L. Welch and R. Scholtz, “Continued fractions and Berlekamp’s
algorithm,” IEEE Transactions on Information Theory, vol. 25, no. 1,
pp. 19–27, 1979.
[34] A. Lempel and W. L. Eastman, “High speed generation of maximal
length sequences,” IEEE Trans. Comput., vol. 20, pp. 227–229, February
1971.
[35] H. Fredricksen, “A survey of full length nonlinear shift register cycle
algorithms,” SIAM Review, vol. 24, no. 2, pp. 195–221, 1982.
[36] C. J. Jansen, Investigations On Nonlinear Streamcipher Systems:
Construction and Evaluation Methods. Ph.D. Thesis, Technical University
of Delft, 1989.
[37] C. J. A. Jansen, “The maximum order complexity of sequence
ensembles,” Lecture Notes in Computer Science, vol. 547, pp. 153–159, 1991.
Adv. Cryptology-Eurocrypt’1991, Berlin, Germany.
[38] D. Linardatos and N. Kalouptsidis, “Synthesis of minimal cost nonlinear
feedback shift registers,” Signal Process., vol. 82, no. 2, pp. 157–176,
2002.
[39] P. Rizomiliotis and N. Kalouptsidis, “Results on the nonlinear span of
binary sequences,” IEEE Transactions on Information Theory, vol. 51,
no. 4, pp. 1555–5634, 2005.
[40] K. Limniotis, N. Kolokotronis, and N. Kalouptsidis, “On the nonlinear
complexity and Lempel-Ziv complexity of finite length sequences,”
IEEE Transactions on Information Theory, vol. 53, no. 11,
pp. 4293–4302, 2007.
[41] C. De Cannière and B. Preneel, “Trivium,” New Stream Cipher Designs:
The eSTREAM Finalists, LNCS 4986, pp. 244–266, 2008.
[42] M. Hell, T. Johansson, A. Maximov, and W. Meier, “The Grain family of
stream ciphers,” New Stream Cipher Designs: The eSTREAM Finalists,
LNCS 4986, pp. 179–190, 2008.
[43] B. Gittins, H. A. Landman, S. O’Neil, and R. Kelson, “A presentation
on VEST hardware performance, chip area measurements, power
consumption estimates and benchmarking in relation to the AES, SHA-256
and SHA-512.” Cryptology ePrint Archive, Report 415, 2005.
[44] E. Dubrova and S. Mansouri, “A BDD-based approach to constructing
LFSRs for parallel CRC encoding,” in Proc. of International Symposium
on Multiple-Valued Logic, pp. 128–133, 2012.
[45] R. Rueppel, “Linear complexity and random sequences,” in Advances
in Cryptology – EUROCRYPT’85 (F. Pichler, ed.), vol. 219 of Lecture
Notes in Computer Science, pp. 167–188, Springer Berlin Heidelberg,
1986.
[46] N. Li, S. S. Mansouri, and E. Dubrova, “Secure key storage using
state machines,” in Multiple-Valued Logic (ISMVL), 2013 IEEE 43rd
International Symposium on, pp. 290–295, 2013.
[47] N. Li and E. Dubrova, “Embedding of deterministic test data for in-field
testing,” tech. rep., arXiv, January 2013.
[48] O. B. Lupanov, “Complexity of formula realization of functions of
logical algebra,” 1962.
[49] Berkeley Logic Synthesis and Verification Group, “ABC: A system for
sequential synthesis and verification, release 70930.”
