Pseudo Random Number Generation on FPGA by Herendi, Tamás
T. Herendi / Carpathian Journal of Electronic and Computer Engineering 4 (2011) 55-58                            55 
 
 
Pseudo Random Number Generation on FPGA 
 
Tamás Herendi*
* University of Debrecen, Faculty of Informatics, Debrecen, Hungary
 
 
 
Abstract—The aim of the present paper is to show the 
theoretical background of the construction of uniformly 
distributed (UD) pseudo random number (PRN) sequences 
with good properties and efficient implementation on 
FPGA. 
I. INTRODUCTION 
Pseudo random numbers play an essential role in many
applications, such as simulations, Monte Carlo methods,
coding and cryptography. There are numerous ways to
produce sequences of pseudo random numbers; see e.g.
[4]. Generating pseudo random bit sequences with linear
feedback shift registers (LFSR) is a well-known technique.
It has both advantages and disadvantages. One of its good
properties is the very simple structure of the generators,
which makes implementation easy both on classical
computer architectures and in FPGAs. On the negative side,
however, the sequences consist of single bits, and since
consecutive members are strongly correlated, multi-bit
numbers cannot be assembled from them while keeping the
properties of the original sequence.
Knuth [4] shows that one should be very careful when
combining uniformly distributed (UD) sequences to
obtain a new one, since the new sequence does not
necessarily remain UD. Lidl and Niederreiter [5] give a
generalization of LFSRs to finite fields and show a method
for constructing UD sequences in these structures.
However, since they use a more complex algebraic
structure, the implementation of such generators is not
simple. In the following sections, we give a generalization
of LFSRs to residue ring structures and show that the
internal representation is also possible in the general case.
II. BASIC DEFINITIONS AND PROPERTIES 
Definition 1 
Let $a_0, \dots, a_{d-1}$ be integers and let $u = \{u_n\}_{n=0}^{\infty}$ be an
infinite sequence of integers satisfying the recurrence
relation
$$u_{n+d} = a_{d-1}u_{n+d-1} + \dots + a_0 u_n$$
for all $n = 0, 1, \dots$ Then $u$ is called a linear recurring
sequence (LRS), where $a_0, \dots, a_{d-1}$ are the coefficients
and $u_0, \dots, u_{d-1}$ are the initial values of $u$.
The order of the recurrence is $d$ and the corresponding
characteristic polynomial is
$$P(x) = x^d - a_{d-1}x^{d-1} - \dots - a_0 .$$
 
Definition 2 
Let $u$ be the integer LRS defined by the coefficients
$a_0, \dots, a_{d-1}$ and initial values $u_0, \dots, u_{d-1}$.
Denote by
$$\bar{u}_n^{(k)} = (u_n, \dots, u_{n+k-1})$$
the $n$th $k$-dimensional state vector and by
$$M(u) = \begin{pmatrix}
0 & 1 & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & 1 & 0 \\
0 & 0 & \dots & 0 & 1 \\
a_0 & a_1 & \dots & a_{d-2} & a_{d-1}
\end{pmatrix}$$
the companion matrix of $u$.
 
Remark 
With the above notation we may write
$$\bar{u}_n^{(d)} = M(u)^n \, \bar{u}_0^{(d)} .$$
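The relation between the recurrence and the companion matrix can be checked numerically. The following is a minimal sketch, assuming the state vector is treated as a column vector; the function names and the sample coefficients `[2, 0, 1]` are illustrative choices and do not come from the paper.

```python
def companion_matrix(a):
    """Companion matrix M(u) for coefficients a = [a_0, ..., a_{d-1}]."""
    d = len(a)
    rows = [[1 if j == i + 1 else 0 for j in range(d)] for i in range(d - 1)]
    rows.append(list(a))          # last row holds a_0, ..., a_{d-1}
    return rows


def mat_vec(m, v):
    """Product of a square matrix with a column vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def lrs_terms(a, init, count):
    """First `count` terms of the LRS, computed directly from the recurrence."""
    d = len(a)
    u = list(init)
    while len(u) < count:
        n = len(u) - d
        u.append(sum(a[j] * u[n + j] for j in range(d)))
    return u[:count]


# Hypothetical sample recurrence u_{n+3} = u_{n+2} + 2 u_n with impulse start.
a, init = [2, 0, 1], [0, 0, 1]
terms = lrs_terms(a, init, 12)
state = list(init)
for n in range(9):
    # Applying M once moves the state vector from index n to index n + 1.
    assert state == terms[n:n + 3]
    state = mat_vec(companion_matrix(a), state)
print(terms)
```

Multiplying by the matrix once per step is the same as computing $M(u)^n$ applied to the initial state, which is why the loop reproduces the identity of the Remark.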
 
Definition 3 
Let $u$ be a sequence of integers and $s$ be a positive integer. We
say that $u$ reduced modulo $2^s$ is periodic with period
length $\rho$ if there exist a positive integer $\rho$ and a
non-negative integer $\rho_0$ such that
$u_n \equiv u_{n+\rho} \bmod 2^s$, i.e. $u_n$ and $u_{n+\rho}$ have the same residue
when divided by $2^s$, for all $n \ge \rho_0$. The least such $\rho$ is called the
minimal period length of the sequence.
If $\rho_0 = 0$ then $u$ is called purely periodic.
 
Remark 
Let $u$ be an LRS of integers. Then $u$ reduced modulo $2^s$
is periodic for any positive $s$, since only finitely many
different state vectors exist modulo $2^s$.
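Because the number of state vectors modulo $2^s$ is finite, the preperiod and the minimal period can be found by following the states until one repeats. The sketch below assumes this brute-force approach, which is practical only for small orders; the function name `period_mod` is hypothetical.

```python
def period_mod(a, init, s):
    """Return (rho0, rho): preperiod and minimal period of the LRS modulo 2**s.

    a    -- coefficients [a_0, ..., a_{d-1}]
    init -- initial values [u_0, ..., u_{d-1}]
    """
    m = 1 << s
    state = tuple(x % m for x in init)
    first_seen = {}               # state vector -> index of its first occurrence
    n = 0
    while state not in first_seen:
        first_seen[state] = n
        nxt = sum(c * x for c, x in zip(a, state)) % m
        state = state[1:] + (nxt,)
        n += 1
    rho0 = first_seen[state]
    return rho0, n - rho0


# Fibonacci numbers modulo 4 are purely periodic.
print(period_mod([1, 1], [0, 1], 2))
# u_{n+2} = u_{n+1} with start (1, 0) becomes constant after one step.
print(period_mod([0, 1], [1, 0], 1))
```

The second call shows a sequence that is periodic but not purely periodic, i.e. with $\rho_0 > 0$.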
 
Remark 
If we set $s = 1$ in Definition 3, we arrive at the
mathematical model of the well-known LFSR. Here the
coefficients of the recurrence are 0 and 1. In practice,
the values in registers corresponding to
coefficients equal to 1 are fed back through an exclusive-or
(XOR) gate, while the values in registers corresponding to
coefficients equal to 0 are simply stored for future use.
 
Definition 4 
Let $u$ be a sequence of integers and $s$ be a positive
integer. We say that $u$ is uniformly distributed (UD)
reduced modulo $2^s$ if
$$\lim_{N \to \infty} \frac{1}{N} \, \#\{ n \le N \mid u_n \equiv a \bmod 2^s \} = \frac{1}{2^s}$$
for all integers $a$, i.e. if we observe a reasonably long
segment of the sequence, the relative frequencies of the
residues obtained by reducing the members of the sequence
modulo $2^s$ are close to each other.
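For a periodic sequence the limit in Definition 4 equals the relative frequency over one full period, so the UD property can be tested by counting. The following sketch assumes a brute-force walk through one period and is meant for small examples only; `residue_counts` and `is_ud` are illustrative names.

```python
from collections import Counter


def residue_counts(a, init, s):
    """Count the residues modulo 2**s over one full period of the LRS."""
    m = 1 << s
    state = tuple(x % m for x in init)
    first_seen = {}               # state vector -> index of first occurrence
    outputs = []                  # outputs[n] = u_n mod 2**s
    while state not in first_seen:
        first_seen[state] = len(outputs)
        outputs.append(state[0])
        nxt = sum(c * x for c, x in zip(a, state)) % m
        state = state[1:] + (nxt,)
    cycle = outputs[first_seen[state]:]   # drop the preperiod, keep one period
    return Counter(cycle)


def is_ud(a, init, s):
    """True if every residue modulo 2**s occurs equally often in one period."""
    counts = residue_counts(a, init, s)
    m = 1 << s
    total = sum(counts.values())
    return all(counts[r] * m == total for r in range(m))


# Fibonacci numbers modulo 2 run 0, 1, 1, 0, 1, 1, ... and are not UD.
print(residue_counts([1, 1], [0, 1], 1), is_ud([1, 1], [0, 1], 1))
```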
 
Remark 
In the case of the most common LFSR ($s = 1$) the UD
property implies that the relative frequency of the 0s (and
similarly the relative frequency of the 1s) approaches 0.5.

ISSN 18474 – 9689

Figure 1. LFSR with 6 registers
 
Remark 
General conditions for the UD of LRSs can be found in 
[6], [7], [1] and [2]. 
 
III. THEORETICAL BACKGROUND 
 
The main idea of the presented construction of PRN
generators is to choose an initial UD sequence and
extend it to a more complex one. In our case the
basic generator is a suitable LFSR, which is converted by a
transformation that is simple but has high time complexity.
 
Example 
Fig. 1 shows an LFSR composed of 6 storage
registers and 4 XOR gates. Sending [0,0,0,0,0,1] to the
input, located on the left, one gets
[0, 0, 0, 0, 0, 1, 0, 1, 1, 0,
 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
 0, 1, 0, 0, 1, 0, 0, 0, 1, 1,
 0, 0, 0, 0, 0, 1, 0, 1, 1, … ]
on the output, located on the right. The sequence repeats
from the fourth line. The period length of the repetition is
30. The formal description of the shift register is
$$u_{n+6} = u_{n+4} \otimes u_{n+3} \otimes u_{n+2} \otimes u_{n+1} \otimes u_n ,$$
where $\otimes$ denotes the XOR operation.
The mathematical model of the sequence is
$$u_{n+6} = u_{n+4} + u_{n+3} + u_{n+2} + u_{n+1} + u_n \bmod 2 .$$
Here the order $d$ of the recurrence is 6, while the
characteristic polynomial is
$$P(x) = x^6 - x^4 - x^3 - x^2 - x - 1 .$$
By the general theory, one can prove that $u_n$ is UD modulo 2,
which can be checked by counting the members of the
sequence (any consecutive subsequence of length 30
contains exactly 15 0s and 15 1s).
The sequence defined above has two weaknesses:
first, its period is rather short, and second, numbers
greater than 1 can be generated only by tying
together some $k$ bits using a suitable selection method, e.g.
taking $k$ consecutive members of the sequence. However, the
resulting number sequence loses the UD property. Both
problems can be solved using the following results.
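The example can be reproduced in a few lines. The sketch below generates the bit sequence from the recurrence above, counts the bits in one period, and then ties $k = 2$ consecutive bits together using overlapping windows. The choice of overlapping windows is an assumption made here for illustration; the text does not fix a particular selection method.

```python
from collections import Counter


def lfsr_bits(count):
    """Bits of u_{n+6} = u_{n+4} ^ u_{n+3} ^ u_{n+2} ^ u_{n+1} ^ u_n."""
    u = [0, 0, 0, 0, 0, 1]        # impulse start [0,0,0,0,0,1]
    while len(u) < count:
        n = len(u) - 6
        u.append(u[n + 4] ^ u[n + 3] ^ u[n + 2] ^ u[n + 1] ^ u[n])
    return u[:count]


bits = lfsr_bits(61)
period = bits[:30]
print(period)
print(Counter(period))            # frequencies of 0 and 1 in one period

# Tie two consecutive bits together: 2-bit numbers 2*u_n + u_{n+1}.
pairs = Counter(2 * bits[n] + bits[n + 1] for n in range(30))
print(sorted(pairs.items()))      # frequencies of the values 0..3
```

One period yields 30 two-bit numbers, and 30 is not divisible by 4, so the four values cannot occur equally often. This is a small instance of the loss of the UD property mentioned above.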
   
Definition 5 
An LFSR (or in general an LRS) with initial values 
[0,0, … ,0,1] is called an impulse response sequence (IR). 
 
Theorem 1 
Let $u$ be an IR LRS of integers with characteristic
polynomial
$$P(x) \equiv (x + 1)^2 Q(x) \bmod 2 ,$$
where $Q(x)$ is a polynomial of degree $k$ that is irreducible
modulo 2. Then $u$ is UD modulo 2. Choosing $Q(x)$ with
a little care, $u$ has period length $2^{k+1} - 2$.
The proof of Theorem 1 can be found in [2].
 
 
Remark 
Thanks to Theorem 1, one can, in theory, easily
create a UD LRS with a long period generated by an LFSR.
Choosing the degree of $Q(x)$ large enough makes
the period length of the sequence sufficiently large. For
instance, for $k = 1000$ the period length is $2^{1001} - 2$,
which is approximately $2.1 \cdot 10^{301}$.
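Both statements can be checked with integer arithmetic. In the sketch below, polynomials over GF(2) are stored as bit masks (bit $i$ is the coefficient of $x^i$), which is a representation chosen here for convenience. The factor $Q(x) = x^4 + x + 1$ is worked out here by dividing the characteristic polynomial of the Section III example by $(x+1)^2$ modulo 2; it is not stated in the text.

```python
def gf2_mul(p, q):
    """Multiply two GF(2) polynomials stored as bit masks."""
    r = 0
    while q:
        if q & 1:
            r ^= p                # addition modulo 2 is XOR
        p <<= 1
        q >>= 1
    return r


x_plus_1_squared = gf2_mul(0b11, 0b11)     # (x + 1)^2 = x^2 + 1 modulo 2
Q = 0b10011                                 # x^4 + x + 1, degree k = 4
P = gf2_mul(x_plus_1_squared, Q)
print(bin(P))                               # bits of x^6 + x^4 + x^3 + x^2 + x + 1

period_k4 = 2 ** (4 + 1) - 2                # period length from Theorem 1, k = 4
period_k1000 = 2 ** 1001 - 2                # period length for k = 1000
print(period_k4, len(str(period_k1000)))    # 30 and the number of decimal digits
```

The product agrees modulo 2 with the characteristic polynomial of the example, and the period $2^{5} - 2 = 30$ matches the period observed there.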
 
For the construction of pseudo random number (PRN) 
sequences the following result is important. 
 
Theorem 2 
Let $Q(x)$ be an irreducible polynomial modulo 2 of
degree $k$ and let
$$P(x) \equiv (x + 1)^2 Q(x) \bmod 2 .$$
Further, let
$$\begin{aligned}
P_1(x) &= P(x), \\
P_2(x) &= P(x) - 2, \\
P_3(x) &= P(x) - 2x, \\
P_4(x) &= P(x) - 2x - 2 ,
\end{aligned}$$
and let the corresponding IR LRSs be $u^{(1)}$, $u^{(2)}$, $u^{(3)}$
and $u^{(4)}$, respectively.
Then exactly one of the sequences $u^{(l)}$ is UD reduced
modulo $2^s$ for all $s \ge 1$. With a suitable $Q(x)$ we can
assume that the period length of the sequences is $2^s(2^{k+1} - 2)$.
The proof of Theorem 2 can be found in [2].
 
Remark 
By Theorem 2, if we find the LRS $u^{(l)}$ with the strong
UD property, then the sequence reduced modulo $2^s$
provides pseudo random integers of $s$ bits.
 
IV. ALGORITHM 
The algorithm presented below is based on the previous
results and on their proofs, and lets us construct LRSs
which are UD modulo $2^s$ for all $s \ge 1$.
Step 1.  
Choose a suitable integer $k$ and find a monic
polynomial $Q(x)$ of degree $k$ whose reduction modulo 2
is irreducible. If $2^k - 1$ is a Mersenne prime, the
search for a suitable $Q(x)$ is simpler. Let $\rho = 2^k - 1$.
Step 2.
Calculate the polynomial
$$P(x) = x^{k+2} - p_{k+1}x^{k+1} - \dots - p_0$$
by the relation
$$P(x) \equiv (x + 1)^2 Q(x) \bmod 2 ,$$
where the coefficients $p_0, \dots, p_{k+1} \in \{0, 1\}$, and let
$$P'(x) \equiv (x + 1) Q(x) \bmod 2 .$$
Further, let
$$\begin{aligned}
P_1(x) &= P(x), \\
P_2(x) &= P(x) - 2, \\
P_3(x) &= P(x) - 2x, \\
P_4(x) &= P(x) - 2x - 2 .
\end{aligned}$$

Figure 2. The LRS generator $u_{n+d} = a_{d-1}u_{n+d-1} + \dots + a_0 u_n$

Step 3.  
Calculate the companion matrices $M^{(i)}$ corresponding to
the characteristic polynomials $P_i(x)$. Check for which $i$ the
relation $M^{(i)} \bar{1} \equiv \bar{1} \bmod 4$ holds. Keep the two matrices
which satisfy the congruence and denote them by $M_1$ and $M_2$.
Step 4.  
Compute the matrix $(M_1)^{2\rho} \bmod 4$. If $(M_1)^{2\rho} \equiv E \bmod 4$,
where $E$ is the unit matrix, then the sequence we
are looking for is the LRS given by the recurrence relation
corresponding to $M_2$; otherwise it is the LRS given by the
recurrence relation corresponding to $M_1$.
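Steps 1 to 4 can be sketched for small $k$ as follows. This is an illustration under several assumptions rather than the implementation of [2] or [3]: $Q(x)$ is passed as a GF(2) bit mask and its irreducibility is not checked, $\bar{1}$ is read as the all-ones vector, $M_1$ and $M_2$ are numbered in the order of $P_1, \dots, P_4$, and plain Python lists stand in for the matrix hardware. All function names are hypothetical.

```python
def gf2_mul(p, q):
    """Multiply two GF(2) polynomials stored as bit masks."""
    r = 0
    while q:
        if q & 1:
            r ^= p
        p <<= 1
        q >>= 1
    return r


def companion(a):
    """Companion matrix with last row a_0, ..., a_{d-1}."""
    d = len(a)
    return [[int(j == i + 1) for j in range(d)] for i in range(d - 1)] + [list(a)]


def mat_mul(x, y, m):
    n = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(n)) % m for j in range(n)]
            for i in range(n)]


def mat_pow(x, e, m):
    """Square-and-multiply matrix power modulo m."""
    n = len(x)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = mat_mul(result, x, m)
        x = mat_mul(x, x, m)
        e >>= 1
    return result


def select_recurrence(q_mask, k):
    """Return the coefficients [a_0, ..., a_{k+1}] selected by Steps 1-4."""
    rho = 2 ** k - 1                                   # Step 1
    p_mask = gf2_mul(0b101, q_mask)                    # Step 2: (x+1)^2 Q mod 2
    p = [(p_mask >> j) & 1 for j in range(k + 2)]      # p_0, ..., p_{k+1}
    candidates = []
    for add0, add1 in ((0, 0), (2, 0), (0, 2), (2, 2)):   # P_1, P_2, P_3, P_4
        a = list(p)               # P - 2 raises a_0 by 2, P - 2x raises a_1 by 2
        a[0] += add0
        a[1] += add1
        candidates.append(a)
    # Step 3: M * (1,...,1)^T = vector of row sums, compared modulo 4.
    kept = [a for a in candidates
            if all(sum(row) % 4 == 1 for row in companion(a))]
    m1, m2 = kept
    d = k + 2
    identity = [[int(i == j) for j in range(d)] for i in range(d)]
    # Step 4: choose M_2 if M_1^(2 rho) is the unit matrix modulo 4.
    if mat_pow(companion(m1), 2 * rho, 4) == identity:
        return m2
    return m1


def is_ud(a, s):
    """Brute-force UD check of the impulse response sequence modulo 2**s."""
    m, d = 1 << s, len(a)
    start = (0,) * (d - 1) + (1,)
    state, counts = start, [0] * m
    while True:
        counts[state[0]] += 1
        state = state[1:] + (sum(c * x for c, x in zip(a, state)) % m,)
        if state == start:        # purely periodic because a_0 is odd
            break
    return len(set(counts)) == 1


a = select_recurrence(0b111, 2)   # Q(x) = x^2 + x + 1, k = 2
print(a, [is_ud(a, s) for s in (1, 2, 3)])
```

For $Q(x) = x^2 + x + 1$ the sketch selects $u_{n+4} = u_{n+3} + u_{n+1} + 3u_n$, and counting over one period confirms uniform distribution modulo 2, 4 and 8. The brute-force check is feasible only for very small $k$ and $s$; for realistic sizes only the matrix test of Step 4 is practical.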
 
Remark 
The proofs of the theorems and correctness of the 
algorithms and the detailed construction of LRSs can be 
found in [2].  
 
Remark 
The most time-consuming part of the algorithm is the
computation of $(M_1)^{2\rho} \bmod 4$. If the value of $k$ is around
1000, then we have to multiply matrices of
size $1000 \times 1000$ about 1000 times.
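The order of magnitude of this estimate can be illustrated by counting matrix products. The sketch assumes a plain right-to-left square-and-multiply scheme, which the text does not specify, and uses $\rho = 2^k - 1$ as in Step 1 of the algorithm. The function name is hypothetical.

```python
def count_products(e):
    """Matrix products used by right-to-left binary exponentiation.

    Returns (squarings, multiplications into the result) for exponent e.
    """
    squarings = multiplications = 0
    while e:
        if e & 1:
            multiplications += 1  # result = result * base
        e >>= 1
        if e:
            squarings += 1        # base = base * base
    return squarings, multiplications


k = 1000
print(count_products(2 * (2 ** k - 1)))
```

Under this assumption the exponent $2\rho = 2^{k+1} - 2$ needs $k$ squarings, and since it has $k$ one bits, up to $k$ further products. The number of matrix products therefore grows linearly in $k$, while each product of $d \times d$ matrices costs on the order of $d^3$ scalar operations when done naively.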
 
V. IMPLEMENTATION 
The practical use of LRSs as PRN generators raises two
issues. First, one has to find the parameters of a suitable
LRS. Second, once the parameters of a suitable LRS are known,
the question remains how to compute the members of the
sequence efficiently.
Finding the coefficients of a good generator is, as
mentioned before, a time-consuming task. At the present
state of technology, a single-processor (single-core)
computer can check the UD property of a given LRS
within a reasonable time (i.e. in a few weeks) for about
$k = 10000$. However, [3] describes a construction for an
FPGA which increases the speed of the computations by a
factor of around 1000. The design developed in that paper
uses tiny processors for computing dot products of vectors
of small integers. 400 of these processors are
connected in a network to compute the row-column
products of matrices. Since the storage capacity of
an FPGA is rather limited, the most difficult problem
during execution is to move a large amount of data to
the processing units in time. Organizing the order of
computation of the partial products and the data flow, however,
makes it possible for all important units to work
continuously. This makes the design very efficient:
either the running time can be reduced to a few
hours, or the size of the recurrence can be increased to
30000, which already provides an extremely large PRN
generator with period length more than $2^{30000}$.
Thus, in theory and ultimately in practice, we are able to
calculate the parameters of large LRSs for PRN generator
purposes. Although computing the members of an
LRS is a very simple algorithmic problem, the
length of the core of the generator reduces
the speed of the calculation. For example, if we want to
use a recurrence of order 10000 for integers of 1024
bits, we have to store 10000 numbers of 128 bytes each, and
to determine the next member of the sequence we have to
perform, depending on the properties of the coefficients,
approximately 5000 additions on numbers of this
size. The amount and structure of the operations to be
executed during the computation of the sequence immediately
suggest parallelization.
Furthermore, since the applied operations are quite simple,
the capabilities of a complex processor remain
unexploited. If one could build small reduced
instruction set processors, the efficiency would increase and
the space occupied by one unit would decrease. An FPGA
is ideal for these requirements.
In the case of LFSRs, it is well known that
reorganizing the connections of the structure can
considerably decrease the number of necessary clock
cycles. Implementing them directly by the definition
yields the so-called external LFSR representation.
The speed of this representation strongly depends on the
speed of the applied operation (in this case XOR).
The efficiency of the external representation can be
increased by the use of operation networks. Furthermore,
the internal representation of an LFSR reduces the
operational time to one clock cycle.
Implementation of general LRS generators 
General LRS generators can have a physical structure
similar to that of LFSRs, with the flip-flops replaced by
storage registers and the XOR gates by operational
components. However, extra units have to be
added to realize the multiplications. Fig. 2 shows
the flow chart of the generator corresponding to the
recurrence relation
$$u_{n+d} = a_{d-1}u_{n+d-1} + \dots + a_0 u_n .$$
 
If the additive operation is associative and
commutative (as ordinary addition and modular
addition are), we can redraw the structure as an
operation network (see Fig. 3). Such a network provides
the result of the operations in logarithmic time in the best
case, compared to the linear time of sequential execution.
In general, an operation network would require more
space than the serial structure, but thanks to the special
algebraic properties, in our case there is no increase in
the area used by the components. One has to be careful,
however, with the increasing number of interconnections.
Further improvement can be achieved by generalizing
the internal representation of LFSRs to the general
case.

Figure 3. Network of associative and commutative operations

Figure 4. Internal representation of an LRS generator $u_{n+d} = a_{d-1}u_{n+d-1} + \dots + a_0 u_n$
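The difference between serial execution and an operation network can be illustrated in software by comparing a chain of additions with a pairwise (tree) reduction. This is only a model of the circuit depth, assuming two-input modular adders; the function names are hypothetical.

```python
def serial_sum(terms, m):
    """Chain of adders: returns (sum mod m, adder levels, adders used)."""
    total = terms[0] % m
    for t in terms[1:]:
        total = (total + t) % m
    return total, len(terms) - 1, len(terms) - 1


def tree_sum(terms, m):
    """Pairwise reduction: returns (sum mod m, adder levels, adders used)."""
    level = [t % m for t in terms]
    depth = adders = 0
    while len(level) > 1:
        nxt = [(level[i] + level[i + 1]) % m
               for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2:        # an unpaired value passes to the next level
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth, adders


terms = list(range(1, 9))         # eight sample terms
print(serial_sum(terms, 1 << 10))
print(tree_sum(terms, 1 << 10))
```

Both arrangements use the same number of adders, which mirrors the remark that the area does not grow, while the number of levels drops from $n - 1$ to about $\log_2 n$.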
 
Theorem 3 
Let $u$ be an LRS generated by the recurrence relation
$$u_{n+d} = a_{d-1}u_{n+d-1} + \dots + a_0 u_n \qquad (1)$$
and let the vector sequence $v$ be defined by
$$\begin{aligned}
v_0^n &= a_0 u_{n-1} \\
v_1^n &= a_0 u_{n-2} + a_1 u_{n-1} \\
&\;\;\vdots \\
v_{d-2}^n &= a_0 u_{n-d+1} + \dots + a_{d-2} u_{n-1} \\
v_{d-1}^n &= a_0 u_{n-d} + \dots + a_{d-2} u_{n-2} + a_{d-1} u_{n-1} .
\end{aligned}$$
Then
$$u_n = v_{d-1}^n \qquad (2)$$
and
$$v_l^n = v_{l-1}^{n-1} + a_l u_{n-1} . \qquad (3)$$

Proof.
Substituting $n - d$ for $n$ in (1) we obtain
$$u_n = a_{d-1}u_{n-1} + \dots + a_0 u_{n-d} . \qquad (4)$$
The right-hand side of this equation is just the definition of
$v_{d-1}^n$, which proves (2).
Fix $l$ with $1 \le l \le d - 1$. Substituting the definition of $v_{l-1}^{n-1}$,
i.e. $a_0 u_{n-l-1} + \dots + a_{l-1} u_{n-2}$, into (3), we get
the defining equation of $v_l^n$, which proves (3).
 
A corollary of Theorem 3 is that the internal
representation of LFSRs can be generalized to arbitrary
LRSs. This makes it possible to implement an
LRS as shown in Fig. 4.
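The equivalence of the two representations can be checked with a short simulation. The sketch below compares the direct (external) recurrence with a register chain in which every register takes the value its left neighbour held in the previous step plus $a_l$ times the fed-back output, i.e. $v_l^{n+1} = v_{l-1}^{n} + a_l u_n$ with $v_0^{n+1} = a_0 u_n$, which follows from the defining sums of Theorem 3. The function names, the modulus and the sample coefficients are illustrative assumptions.

```python
def external_lrs(a, init, count, m):
    """Direct evaluation of u_{n+d} = a_{d-1} u_{n+d-1} + ... + a_0 u_n mod m."""
    d = len(a)
    u = [x % m for x in init]
    while len(u) < count:
        n = len(u) - d
        u.append(sum(a[j] * u[n + j] for j in range(d)) % m)
    return u[:count]


def internal_lrs(a, init, count, m):
    """Generate u_d, u_{d+1}, ... with the register chain of Theorem 3."""
    d = len(a)
    # Registers at n = d, filled from the defining sums:
    # v_l^d = a_0 u_{d-l-1} + ... + a_l u_{d-1}.
    v = [sum(a[j] * init[d - 1 - l + j] for j in range(l + 1)) % m
         for l in range(d)]
    out = []
    for _ in range(count):
        u_n = v[d - 1]            # the last register holds the next member
        out.append(u_n)
        # Every register needs one multiply-add, all of them in parallel.
        v = [(a[0] * u_n) % m] + [(v[l - 1] + a[l] * u_n) % m
                                  for l in range(1, d)]
    return out


a, init, m = [3, 1, 0, 1], [0, 0, 0, 1], 8     # hypothetical sample parameters
print(external_lrs(a, init, 12, m)[4:])
print(internal_lrs(a, init, 8, m))
```

In the external form the new member depends on a sum over all $d$ registers, whereas in the internal form each register update involves a single addition, which is what allows one step per clock cycle in hardware.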
 
The physical realization of an LRS generator on FPGA
still has some limitations. If we want to produce a
sequence of 1024-bit integers, we have to place 1024-bit
adders on the device. Even the largest
FPGAs have room for only a few such adders.
The solution to this problem is to execute only a
segment of the addition at once and then shift the
sequence to the adder chain again. The result is a
generator approximately 100 times faster than one can
create on a conventional computer.
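The idea of performing a wide addition in segments can be modelled as follows. The segment width of 64 bits and the function name are assumptions made for illustration; the text does not state the width used on the device.

```python
def add_in_segments(x, y, total_bits=1024, seg_bits=64):
    """Add x and y modulo 2**total_bits, reusing one seg_bits-wide adder.

    total_bits is assumed to be a multiple of seg_bits.
    """
    mask = (1 << seg_bits) - 1
    result, carry = 0, 0
    for shift in range(0, total_bits, seg_bits):
        # One pass through the narrow adder, including the incoming carry.
        part = ((x >> shift) & mask) + ((y >> shift) & mask) + carry
        result |= (part & mask) << shift
        carry = part >> seg_bits
    return result                 # the final carry is dropped (modular sum)


x = (1 << 1024) - 1
print(add_in_segments(x, 1))      # the carry ripples through all 16 segments
```

With these parameters a 1024-bit addition takes 16 passes through a 64-bit adder, trading clock cycles for area.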
Finding special recurrences with sparse coefficients
decreases the number of additions needed and hence increases
the speed of computing the next members of the
sequence.
ACKNOWLEDGMENTS 
Research supported by the TÁMOP 4.2.1./B-09/1/KONV-2010-0007
project and the TARIPAR3 project, grant Nr. TECH 08-A2/2-2008-0086.
 
REFERENCES 
[1] T. Herendi, “Uniform distribution of linear recurring sequences
modulo prime powers”, Journal of Finite Fields and Applications,
vol. 10, 2004, pp. 1–23.
[2] T. Herendi, “Construction of uniformly distributed linear recurring
sequences modulo powers of 2”, in press.
[3] T. Herendi and S. R. Major, “Modular exponentiation of matrices
on FPGA-s”, in press.
[4] D. E. Knuth, The Art of Computer Programming, Addison-Wesley, 1973.
[5] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their
Applications, Cambridge University Press, 1986.
[6] H. Niederreiter and J. S. Shiue, “Equidistribution of linear recurring
sequences in finite fields”, Indag. Math., vol. 39, 1977, pp. 397–405.
[7] H. Niederreiter and J. S. Shiue, “Equidistribution of linear recurring
sequences in finite fields II”, Acta Arith., vol. 38, 1980, pp. 197–207.
 
  
 
