Reed-Muller Realization of X (mod P) by Gorodecky, Danila
Danila A. Gorodecky 
United Institute of Informatics Problems of  
NAS of Belarus 
Minsk, Belarus 
danila.gorodecky@gmail.com 
 
 
Abstract—This article provides a novel technique of X (mod P) 
realization. It is based on the Reed-Muller polynomial expansion. 
The advantage of the approach lies in its capability to 
realize X (mod P) for an arbitrary P. The approach is competitive 
with the known realizations in processing speed. Advantages 
and results of a comparison with the known approaches for X [9:1] 
and P=7 are demonstrated. 
Keywords—modular arithmetic, residue number system, X (mod 
P), Reed-Muller expansion 
I.  INTRODUCTION 
The realization of the X (mod P) operation occupies a 
central place in cryptography; the efficiency of its realization in 
the residue number system (RNS) determines whether RNS will 
find wide implementation in practice or not. 
There are two ways for hardware realization of X (mod P). 
The pipelining realization is implemented for transformation of 
a data flow to a sequence of residues. An example of a fast 
pipelining realization in cryptography has been proposed in [1]. 
Another way is based on the iteration process. An iteration 
produces the bits in decreasing significance with later iterations 
producing bits of lower significance. Variations of these 
techniques referring to RNS have been proposed in [2, 3]. The 
main goal of the realization X (mod P) is to achieve high speed 
processing. 
The first way is suitable for an arbitrary value of P, but the 
speed of the approach is limited by the pipeline block, which 
includes three kinds of successive operations (comparison, 
multiplexing, and subtraction). The second way is efficient only 
for some types of P ($2^n$, $2^n \pm 1$, $2^n \pm 3$ [2,3,4] and some 
variations of them [5]). The article 
proposes an approach for X (mod P) realization which is 
suitable and efficient for an arbitrary value of P and is 
competitive with an iterative procedure. 
The result of the calculation of X (mod P)=S is a $\delta$-bit binary 
vector, where $\delta = \lfloor \log_2 P \rfloor + 1$, and every digit of 
$S = (S_\delta, S_{\delta-1}, \dots, S_1)$ is a Boolean function represented in the 
Zhegalkin (or positive polarity Reed-Muller) form, i.e. a 
polynomial XOR expansion with only uncomplemented variables. 
The rest of the material describes a 
technique for generating the polynomial expansions and 
their hardware realizations. 
The case $P = 2^k$ is not considered because $X \bmod 2^k$ is 
realized trivially: 
$(x_n, x_{n-1}, \dots, x_1) \bmod 2^k = (x_k, x_{k-1}, \dots, x_1)$. For example, if 
$X = (x_{10}, x_9, \dots, x_1)$ and 
$P = 2^3$, then $(x_3, x_2, x_1) = X \bmod 8$. 
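The power-of-two case can be illustrated with a minimal Python sketch. The function name `mod_pow2` and the parameter `k` (the exponent of the modulus) are chosen here for illustration only and do not come from the original tooling.

```python
def mod_pow2(x, k):
    """Return x mod 2**k by keeping the k least significant bits of x."""
    return x & ((1 << k) - 1)

# For any 10-bit X and P = 2**3 = 8 the residue is just (x3, x2, x1).
assert all(mod_pow2(x, 3) == x % 8 for x in range(2 ** 10))
```

No logic beyond wiring is needed in hardware for this case, which is why the remainder of the paper deals only with moduli that are not powers of two.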
II. X (MOD P) REALIZATION BY USING BOOLEAN FUNCTIONS 
The idea of the approach is to consider the result of X (mod 
P)=S as a system of $\delta$ Boolean functions. Let's define 
$X = (x_n, x_{n-1}, \dots, x_1)$, $P = (p_\delta, p_{\delta-1}, \dots, p_1)$, and 
$S = (S_\delta, S_{\delta-1}, \dots, S_1)$. Hence a Boolean function $S_i$, $i = \overline{1,\delta}$, 
depends on $n$ variables, i.e. $S_i = S_i(x_n, x_{n-1}, \dots, x_1)$. 
For example, the system of functions for 
the case X (mod 5)=S, where $X = (x_5, x_4, x_3, x_2, x_1)$, takes the 
following form: 

$$\begin{cases}
S_1 = S_1(x_5, x_4, x_3, x_2, x_1); \\
S_2 = S_2(x_5, x_4, x_3, x_2, x_1); \\
S_3 = S_3(x_5, x_4, x_3, x_2, x_1).
\end{cases}$$
 
An arbitrary Boolean function of $n$ variables may 
be represented by the set of truth numbers $A(S_i)$, i.e. the 
indices of the entries of the truth table vector $w(S_i)$ at which 
the function takes the value 1. Let's generate the Boolean functions 
in the following way: $S_i(x_n, x_{n-1}, \dots, x_1) = 1$ if and 
only if $X \bmod P = (S_\delta, S_{\delta-1}, \dots, S_i = 1, \dots, S_1)$. 
Equivalently, the set of truth numbers $A(S_i)$ consists of those 
numbers in the range from 0 to $2^n - 1$ whose residue modulo P 
has a unity in the $i$-th bit. For example, for $X = (x_3, x_2, x_1)$ 
and $P = 3$, the sets of truth numbers are 
$A(S_1) = \{1, 4, 7\}$ for $S_1$ and $A(S_2) = \{2, 5\}$ for $S_2$. 
There is a one-to-one correspondence between the set of 
truth numbers $A(S_i)$ and the truth vector $w(S_i)$: the $j$-th entry 
of the set of truth numbers gives the position of the $j$-th unity of the 
truth vector. Thus $A(S_1) = \{1, 4, 7\}$ is transformed into 
$w(S_1) = \{0,1,0,0,1,0,0,1\}$ and $A(S_2) = \{2, 5\}$ is transformed into 
$w(S_2) = \{0,0,1,0,0,1,0,0\}$. Let's recall that the truth vector $w(S_i)$ 
is the binary vector each entry of which corresponds to a term of 
the full disjunctive normal form (FDNF) of the function $S_i$. 
The FDNF is a disjunctive normal form in which every conjunction 
contains all variables the function depends on. 
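The construction of $A(S_i)$ and $w(S_i)$ can be sketched in a few lines of Python. This is an illustrative sketch, not the author's program; the function names `truth_numbers` and `truth_vector` are assumptions introduced here.

```python
def truth_numbers(n, P, i):
    """A(S_i): all X in 0 .. 2**n - 1 whose residue X mod P has bit i set (i counts from 1)."""
    return [x for x in range(2 ** n) if ((x % P) >> (i - 1)) & 1]

def truth_vector(n, P, i):
    """w(S_i): entry X is 1 exactly when X belongs to A(S_i)."""
    members = set(truth_numbers(n, P, i))
    return [1 if x in members else 0 for x in range(2 ** n)]

# The example from the text: X = (x3, x2, x1), P = 3.
print(truth_numbers(3, 3, 1))  # [1, 4, 7]
print(truth_numbers(3, 3, 2))  # [2, 5]
print(truth_vector(3, 3, 1))   # [0, 1, 0, 0, 1, 0, 0, 1]
print(truth_vector(3, 3, 2))   # [0, 0, 1, 0, 0, 1, 0, 0]
```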
For some criteria, the polynomial expansion of a function is a more 
efficient representation than other normal forms of Boolean 
functions, e.g. because of a smaller (in some cases much smaller) 
number of terms and of units in a circuit [6]. 
Just as a 1 in the truth vector corresponds to a term of the 
FDNF of the function $S_i$, a 1 in the Zhegalkin 
spectrum (or Reed-Muller spectrum [7]) corresponds to a 
term of the Zhegalkin (Reed-Muller) expansion. This 
spectrum is referred to as $r(S_i)$. The truth vector therefore has to be 
transformed into the Zhegalkin spectrum. This task may be 
solved by a number of methods; to demonstrate the 
procedure of transformation we will use the combinatorial 
method [8]. The principle of the transformation of $w(S_i)$ to 
$r(S_i)$ (and backward) for an arbitrary Boolean function $S_i$ is 
given by the following theorem. 
Theorem 1 [8]. The i th entry iw  of the truth 
vector    
1210
,...,,

 nwwwFw  of the Boolean function F  is 
calculated with the following formula: 
 



























,0
;2mod1...,1
21
otherwise
a
i
a
i
a
i
if
w qi  
where nai ,11 , 0







ja
i
 for jai  and 












 
1
1
1
121
,...,,1,0...,,0,0
an
a
a
nwww . In other words, q  is the number 
of unities of the truth vector    
1210
,...,,

 nwwwFw . 
 It is helpful to use a consequence of the Lucas theorem [9] 
to transform $w(S_i)$ to $r(S_i)$. 
 Theorem 2 [9]. $\binom{n}{a} \bmod 2 = 1$ $\Leftrightarrow$ each bit of $a$ is no 
more than the same bit of $n$. 
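Theorem 2 reduces the parity of a binomial coefficient to a bitwise test, which can be checked numerically. The sketch below is illustrative only; the helper name `binom_is_odd` is an assumption introduced here, and `math.comb` from the Python standard library (version 3.8 or later) is used purely as a reference.

```python
from math import comb

def binom_is_odd(n, a):
    """Theorem 2: C(n, a) is odd iff every set bit of a is also set in n."""
    return (a & n) == a

# Compare the bitwise rule with the directly computed binomial coefficient.
for n in range(64):
    for a in range(n + 1):
        assert binom_is_odd(n, a) == (comb(n, a) % 2 == 1)

print(binom_is_odd(6, 4), binom_is_odd(5, 2))  # True False
```

The bitwise form avoids computing large binomial coefficients when the spectrum of a function of many variables is generated.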
 Let's demonstrate the use of the theorems on the 
transformation of $w(S_1) = \{0,1,0,0,1,0,0,1\}$ to $r(S_1)$. According 
to Theorem 1, $w_0 = r_0 = 0$ and $w_1 = r_1 = 1$, hence 
$r(S_1) = (0, 1, r_2, \dots, r_7)$, and using Theorem 2 

$$\begin{aligned}
r_2 &= \binom{2}{1} \bmod 2 = \binom{10_2}{01_2} \bmod 2 = 0 \bmod 2 = 0, \\
r_3 &= \binom{3}{1} \bmod 2 = \binom{11_2}{01_2} \bmod 2 = 1 \bmod 2 = 1, \\
r_4 &= \left[\binom{4}{1} + \binom{4}{4}\right] \bmod 2 = \left[\binom{100_2}{001_2} + \binom{100_2}{100_2}\right] \bmod 2 = (0 + 1) \bmod 2 = 1, \\
r_5 &= \left[\binom{5}{1} + \binom{5}{4}\right] \bmod 2 = \left[\binom{101_2}{001_2} + \binom{101_2}{100_2}\right] \bmod 2 = (1 + 1) \bmod 2 = 0, \\
r_6 &= \left[\binom{6}{1} + \binom{6}{4}\right] \bmod 2 = \left[\binom{110_2}{001_2} + \binom{110_2}{100_2}\right] \bmod 2 = (0 + 1) \bmod 2 = 1, \\
r_7 &= \left[\binom{7}{1} + \binom{7}{4} + \binom{7}{7}\right] \bmod 2 = \left[\binom{111_2}{001_2} + \binom{111_2}{100_2} + \binom{111_2}{111_2}\right] \bmod 2 = (1 + 1 + 1) \bmod 2 = 1.
\end{aligned}$$

 As a result, $r(S_1) = \{0,1,0,1,1,0,1,1\}$. 
 Since the $q$-th unity of $r(S_1) = \{0,1,0,1,1,0,1,1\}$ corresponds to the 
$q$-th term of the Zhegalkin polynomial of the function $S_1$, 
$S_1(x_1, x_2, x_3) = x_1 \oplus x_1 x_2 \oplus x_3 \oplus x_2 x_3 \oplus x_1 x_2 x_3$. 
 The same procedure is used to generate the expansions for the 
remaining functions (here, $S_2$). 
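The worked example can be reproduced with a short Python sketch that applies Theorem 1 with the bitwise parity test of Theorem 2. It is a sketch under stated assumptions: the function names `zhegalkin_spectrum` and `term` are introduced here for illustration, and the symbol `^` in the printed polynomial stands for XOR.

```python
def zhegalkin_spectrum(A, n):
    """r_i = (sum of C(i, a) over a in A) mod 2, with C(i, a) mod 2 taken from Theorem 2."""
    return [sum(1 for a in A if (a & i) == a) % 2 for i in range(2 ** n)]

def term(i):
    """Product term selected by spectrum index i; bit k of i selects x_(k+1)."""
    if i == 0:
        return "1"
    return "".join(f"x{k + 1}" for k in range(i.bit_length()) if (i >> k) & 1)

r = zhegalkin_spectrum([1, 4, 7], 3)   # A(S_1) for X = (x3, x2, x1), P = 3
print(r)                               # [0, 1, 0, 1, 1, 0, 1, 1]
print(" ^ ".join(term(i) for i, bit in enumerate(r) if bit))
# x1 ^ x1x2 ^ x3 ^ x2x3 ^ x1x2x3
```

Calling `zhegalkin_spectrum([2, 5], 3)` gives the spectrum of $S_2$ in the same way.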
III. SOFTWARE REALIZATION OF X (MOD P) 
The generation of the converter for the calculation of X 
(mod P)=S consists of four steps: calculating the truth 
numbers $A(S_i)$ and the truth vector $w(S_i)$ of the function $S_i$; 
transforming $A(S_i)$ or $w(S_i)$ to $B(S_i)$ and $r(S_i)$ 
respectively; generating the polynomial $P(S_i)$; modeling and 
synthesizing (with ISE Xilinx or LeonardoSpectrum) the 
resulting polynomials. 
The proposed approach is realized by four software blocks: 
Python → Java → Python → VHDL. The scheme of the 
software realization is shown step by step in the 
Fig. 
The inputs of the first step are the values of P and X. The Python 
realization calculates the truth vector $w(S_i)$ and the truth 
numbers $A(S_i)$. The calculation for $X = (x_{15}, x_{14}, \dots, x_1)$ and 
$P = (p_4, p_3, p_2, p_1)$ takes 0.5 second. 
The second step is realized by a Java block. It transforms 
$w(S_i)$ and $A(S_i)$ to $r(S_i)$ and $B(S_i)$ respectively. The 
calculation of $r(S_i)$ and $B(S_i)$ takes approximately 
30 seconds for $X = (x_{15}, x_{14}, \dots, x_1)$. 
The third step generates all polynomials 
$P(S_i)$ from $r(S_i)$ ($B(S_i)$), where $i = \overline{1,\delta}$ and 
$S = (S_\delta, S_{\delta-1}, \dots, S_1)$. The developed Python realization performs 
the step in 10 seconds. 
The fourth and last step combines the previous steps. It joins the 
VHDL descriptions of all polynomials $P(S_1)$, 
$P(S_2)$, …, $P(S_\delta)$ in one file. The resulting description is 
synthesized. 
The next section presents the procedure of generating X 
(mod P)=S in detail. 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
IV. EXAMPLE OF X (MOD P)=S CALCULATION 
 Let's consider the process of generating a converter 
$X \bmod P = S$ through all steps on the following example, 
where $X = (x_5, x_4, \dots, x_1)$ and $P = 5$. 
A. The first step: calculating $w(S_i)$ 
According to the condition, $X \in \{0, 1, \dots, 31\}$ and 
$S = (S_3, S_2, S_1)$; the sets of truth numbers of the functions $S_1, S_2, S_3$ 
consist of numbers from the range from 0 to 31. The set of 
truth numbers and the truth vector for each function: 
– $S_1$. The truth numbers whose residue modulo 5 has a unity in the 1st bit are 
$A(S_1) = \{1, 3, 6, 8, 11, 13, 16, 18, 21, 23, 26, 28, 31\}$ and the truth 
vector is $w(S_1) = \{0,1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,1,0,1,0,0,1\}$; 
– $S_2$. The truth numbers whose residue modulo 5 has a unity in the 2nd bit are 
$A(S_2) = \{2, 3, 7, 8, 12, 13, 17, 18, 22, 23, 27, 28\}$ and the truth vector is 
$w(S_2) = \{0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0\}$; 
– $S_3$. The truth numbers whose residue modulo 5 has a unity in the 3rd bit are 
$A(S_3) = \{4, 9, 14, 19, 24, 29\}$ and the 
truth vector is $w(S_3) = \{0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0\}$. 
B. The second step: transformation of $A(S_i)$ to $B(S_i)$ (or 
$w(S_i)$ to $r(S_i)$) 
Since $A(S_i)$ is the analogue of $w(S_i)$, and $B(S_i)$ (the set of indices of 
the unities of the spectrum) is the analogue of $r(S_i)$, we demonstrate this 
step on the transformation of the truth numbers $A(S_i)$ to the Zhegalkin 
spectrum $r(S_i)$. 
A full illustration of the transformation is unwieldy. Therefore, 
we illustrate it only by transforming $A(S_3)$ to $B(S_3)$. From 
the previous step $A(S_3) = \{4, 9, 14, 19, 24, 29\}$, and according to 
Theorem 1 and Theorem 2, $r(S_3) = (0, 0, 0, 0, 1, r_5, \dots, r_{31})$ and 
$$\begin{aligned}
r_5 &= \binom{5}{4} \bmod 2 = 1, \\
r_6 &= \binom{6}{4} \bmod 2 = 1, \\
r_7 &= \binom{7}{4} \bmod 2 = 1, \\
r_8 &= \binom{8}{4} \bmod 2 = 0, \\
r_9 &= \left[\binom{9}{4} + \binom{9}{9}\right] \bmod 2 = 1, \\
r_{10} &= \left[\binom{10}{4} + \binom{10}{9}\right] \bmod 2 = 0, \\
r_{11} &= \left[\binom{11}{4} + \binom{11}{9}\right] \bmod 2 = 1, \\
r_{12} &= \left[\binom{12}{4} + \binom{12}{9}\right] \bmod 2 = 1, \\
r_{13} &= \left[\binom{13}{4} + \binom{13}{9}\right] \bmod 2 = 0, \\
r_{14} &= \left[\binom{14}{4} + \binom{14}{9} + \binom{14}{14}\right] \bmod 2 = 0, \\
r_{15} &= \left[\binom{15}{4} + \binom{15}{9} + \binom{15}{14}\right] \bmod 2 = 1, \\
r_{16} &= \left[\binom{16}{4} + \binom{16}{9} + \binom{16}{14}\right] \bmod 2 = 0, \\
r_{17} &= \left[\binom{17}{4} + \binom{17}{9} + \binom{17}{14}\right] \bmod 2 = 0, \\
r_{18} &= \left[\binom{18}{4} + \binom{18}{9} + \binom{18}{14}\right] \bmod 2 = 0, \\
r_{19} &= \left[\binom{19}{4} + \binom{19}{9} + \binom{19}{14} + \binom{19}{19}\right] \bmod 2 = 1, \\
r_{20} &= \left[\binom{20}{4} + \binom{20}{9} + \binom{20}{14} + \binom{20}{19}\right] \bmod 2 = 1, \\
r_{21} &= \left[\binom{21}{4} + \binom{21}{9} + \binom{21}{14} + \binom{21}{19}\right] \bmod 2 = 1, \\
r_{22} &= \left[\binom{22}{4} + \binom{22}{9} + \binom{22}{14} + \binom{22}{19}\right] \bmod 2 = 1,
\end{aligned}$$
 
Fig. Structure of the software realization of the proposed approach: the inputs X and P enter the calculation of $A(S_i)$ and $w(S_i)$ (Python), followed by the calculation of $B(S_i)$ and $r(S_i)$ (Java), the generating of $P(S_i)$ (Python), and the modeling and synthesis of $P(S_1)$, $P(S_2)$, …, $P(S_\delta)$ (VHDL, ISE / Leonardo). 
 
$$\begin{aligned}
r_{23} &= \left[\binom{23}{4} + \binom{23}{9} + \binom{23}{14} + \binom{23}{19}\right] \bmod 2 = 0, \\
r_{24} &= \left[\binom{24}{4} + \binom{24}{9} + \binom{24}{14} + \binom{24}{19} + \binom{24}{24}\right] \bmod 2 = 1, \\
r_{25} &= \left[\binom{25}{4} + \binom{25}{9} + \binom{25}{14} + \binom{25}{19} + \binom{25}{24}\right] \bmod 2 = 0, \\
r_{26} &= \left[\binom{26}{4} + \binom{26}{9} + \binom{26}{14} + \binom{26}{19} + \binom{26}{24}\right] \bmod 2 = 1, \\
r_{27} &= \left[\binom{27}{4} + \binom{27}{9} + \binom{27}{14} + \binom{27}{19} + \binom{27}{24}\right] \bmod 2 = 1, \\
r_{28} &= \left[\binom{28}{4} + \binom{28}{9} + \binom{28}{14} + \binom{28}{19} + \binom{28}{24}\right] \bmod 2 = 0, \\
r_{29} &= \left[\binom{29}{4} + \binom{29}{9} + \binom{29}{14} + \binom{29}{19} + \binom{29}{24} + \binom{29}{29}\right] \bmod 2 = 0, \\
r_{30} &= \left[\binom{30}{4} + \binom{30}{9} + \binom{30}{14} + \binom{30}{19} + \binom{30}{24} + \binom{30}{29}\right] \bmod 2 = 1, \\
r_{31} &= \left[\binom{31}{4} + \binom{31}{9} + \binom{31}{14} + \binom{31}{19} + \binom{31}{24} + \binom{31}{29}\right] \bmod 2 = 0.
\end{aligned}$$
As a result, $r(S_3) = (0,0,0,0,1,1,1,1,0,1,0,1,1,0,0,1,0,0,0,1,1,1,1,0,1,0,1,1,0,0,1,0)$. 
In the same manner we produce 
$r(S_1) = (0,1,0,0,0,1,1,1,1,0,1,0,1,1,0,0,1,0,0,0,1,1,1,1,0,1,0,1,1,0,0,1)$ and 
$r(S_2) = (0,0,1,0,0,0,1,1,1,1,0,1,0,1,1,0,0,1,0,0,0,1,1,1,1,0,1,0,1,1,0,0)$. 
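The three spectra can be cross-checked numerically. The sketch below does not use the combinatorial method of Theorems 1 and 2; it swaps in the standard butterfly form of the positive polarity Reed-Muller transform over GF(2), which yields the same spectrum. The function name `reed_muller_transform` is an assumption introduced here, and the code is not the Java block described in Section III.

```python
def reed_muller_transform(w):
    """Butterfly Reed-Muller transform over GF(2); len(w) must be a power of two."""
    r = list(w)
    step = 1
    while step < len(r):
        for start in range(0, len(r), 2 * step):
            for j in range(start, start + step):
                r[j + step] ^= r[j]   # fold the half without this variable into the half with it
        step *= 2
    return r

n, P = 5, 5
for i in (1, 2, 3):
    w = [((x % P) >> (i - 1)) & 1 for x in range(2 ** n)]
    r = reed_muller_transform(w)
    assert reed_muller_transform(r) == w          # the transform is its own inverse
    print(f"B(S_{i}) =", [k for k, bit in enumerate(r) if bit])
```

For $S_3$ the printed index set is 4, 5, 6, 7, 9, 11, 12, 15, 19, 20, 21, 22, 24, 26, 27, 30, which agrees with the entries $r_5, \dots, r_{31}$ computed above. The butterfly needs about $n 2^{n-1}$ XOR operations, so it may be convenient for larger $n$.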
C. The third step: generating the polynomials $P(S_1)$, $P(S_2)$, 
and $P(S_3)$ 
The step generates $P(S_i)$ according to $B(S_i)$ or 
$r(S_i)$. A unity in the Zhegalkin spectrum indicates which 
term is included in the polynomial. For example, 
a unity in the 6th bit of the Zhegalkin spectrum (it corresponds to 
the number $6 = (x_3 x_2 x_1) = (110)$ from the set of truth numbers) 
corresponds to the term $x_2 x_3$. 
The polynomial expansions of the functions $S_1, S_2, S_3$ are 
represented in VHDL below: 
S(1) <= x(1) xor (x(1) and x(3)) xor (x(2) and x(3)) xor 
        (x(1) and x(2) and x(3)) xor x(4) xor (x(2) and x(4)) xor 
        (x(3) and x(4)) xor (x(1) and x(3) and x(4)) xor x(5) xor 
        (x(3) and x(5)) xor (x(1) and x(3) and x(5)) xor 
        (x(2) and x(3) and x(5)) xor (x(1) and x(2) and x(3) and x(5)) xor 
        (x(1) and x(4) and x(5)) xor (x(1) and x(2) and x(4) and x(5)) xor 
        (x(3) and x(4) and x(5)) xor (x(1) and x(2) and x(3) and x(4) and x(5)); 
S(2) <= x(2) xor (x(2) and x(3)) xor (x(1) and x(2) and x(3)) xor 
        x(4) xor (x(1) and x(4)) xor (x(1) and x(2) and x(4)) xor 
        (x(1) and x(3) and x(4)) xor (x(2) and x(3) and x(4)) xor 
        (x(1) and x(5)) xor (x(1) and x(3) and x(5)) xor 
        (x(2) and x(3) and x(5)) xor (x(1) and x(2) and x(3) and x(5)) xor 
        (x(4) and x(5)) xor (x(2) and x(4) and x(5)) xor 
        (x(3) and x(4) and x(5)) xor (x(1) and x(3) and x(4) and x(5)); 
S(3) <= x(3) xor (x(1) and x(3)) xor (x(2) and x(3)) xor 
        (x(1) and x(2) and x(3)) xor (x(1) and x(4)) xor 
        (x(1) and x(2) and x(4)) xor (x(3) and x(4)) xor 
        (x(1) and x(2) and x(3) and x(4)) xor (x(1) and x(2) and x(5)) xor 
        (x(3) and x(5)) xor (x(1) and x(3) and x(5)) xor 
        (x(2) and x(3) and x(5)) xor (x(4) and x(5)) xor 
        (x(2) and x(4) and x(5)) xor (x(1) and x(2) and x(4) and x(5)) xor 
        (x(2) and x(3) and x(4) and x(5)); 
where S(1) = $S_1$, S(2) = $S_2$, and S(3) = $S_3$. 
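A generator for such assignments can be sketched in Python, the language the paper names for this step. This is not the author's generator: the function names `spectrum`, `vhdl_term`, `vhdl_assignment`, and `evaluate` are assumptions introduced here, and the constant term is omitted because $r_0 = 0$ for every residue function (0 mod P = 0).

```python
def spectrum(n, P, i):
    """Zhegalkin spectrum r(S_i), using the bitwise parity test of Theorem 2."""
    w = [((x % P) >> (i - 1)) & 1 for x in range(2 ** n)]
    return [sum(w[a] for a in range(k + 1) if (a & k) == a) % 2
            for k in range(2 ** n)]

def vhdl_term(k):
    """Product term for spectrum index k; bit b of k selects x(b+1)."""
    lits = [f"x({b + 1})" for b in range(k.bit_length()) if (k >> b) & 1]
    return lits[0] if len(lits) == 1 else "(" + " and ".join(lits) + ")"

def vhdl_assignment(n, P, i):
    """One VHDL signal assignment S(i) <= ... built from r(S_i)."""
    r = spectrum(n, P, i)
    terms = [vhdl_term(k) for k in range(1, 2 ** n) if r[k]]
    return f"S({i}) <= " + " xor ".join(terms) + ";"

def evaluate(r, x):
    """Value of the polynomial with spectrum r at the integer input x."""
    return sum(r[k] for k in range(len(r)) if (k & x) == k) % 2

n, P = 5, 5
for i in (1, 2, 3):
    print(vhdl_assignment(n, P, i))

# Sanity check: the three polynomials together reproduce X mod 5 for every X.
for x in range(2 ** n):
    s = sum(evaluate(spectrum(n, P, i), x) << (i - 1) for i in (1, 2, 3))
    assert s == x % P
```

Terms are emitted in ascending order of their spectrum index, which is the order used in the listing above.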
D. The fourth step: modeling and synthesizing the resulting 
polynomials 
This step is dedicated to the modeling and synthesis of the 
VHDL polynomials obtained in the previous step. 
Synthesis is performed with a computer-aided design system 
(e.g. ISE Xilinx or LeonardoSpectrum). 
V. DISCUSSION AND HARDWARE REALIZATION  
 This section provides the results of a comparison of area and 
processing speed between the proposed and known approaches. 
The comparison is produced for $X = (x_9, x_8, \dots, x_1)$ and $P = 7$. 
 Modeling and synthesis are performed in ISE 13.1 and in 
LeonardoSpectrum2010a_7. The best results in processing 
speed (in ns) and in area (in LUTs) from ISE and 
Leonardo were chosen, and they are presented in two tables. 
 Table 1 includes the results of the synthesis of the following approaches: 
– Pipelining approach [1] – PA. It is suitable for an 
arbitrary value of the modulus P; 
– Iterative approach [2,3] – IA. It is suitable for $P = 2^n - 1$; 
– Polynomial expansion approach (proposed approach) – 
PEA. It is suitable for an arbitrary value of the modulus P; 
– Polynomial expansion approach (proposed approach) 
after BDD-optimization – PEA (BDD). The number of terms of the 
proposed approach was reduced by the BDD-based optimization [10]. It 
is suitable for an arbitrary value of the modulus P. 
 The synthesis was performed on 
 – FPGA Xilinx Spartan 3 XC3S1000 FG456 – Spartan_3; 
 – FPGA Xilinx Virtex 7 XC7V285t 3FFG1157 – Virtex_7; 
– ASIC of the library POWER [11], which is used for the 
design of ASIC circuits at the hi-tech factory Integral (Minsk, 
Belarus) – POWER, where UNIT is an elementary measure 
of area. 
 The best indices are highlighted with bold. 
 
TABLE 1.   COMPARISON OF RESULTS OF THE SYNTHESIS 

Approach    Metric          FPGA Spartan_3   FPGA Virtex_7   ASIC POWER 
PA          T, ns           35.078           13.828          35.67 
PA          LUTs / UNITs    73               34              88 393 
IA          T, ns           12.033           6.599           9.7 
IA          LUTs / UNITs    14               10              23 313 
PEA         T, ns           12.585           6.45            14.96 
PEA         LUTs / UNITs    68               33              307 815 
PEA (BDD)   T, ns           11.532           7.547           6.91 
PEA (BDD)   LUTs / UNITs    58               31              37 124 
 
Table 2 presents the speed characteristics (in ns) and the 
area for FPGAs (in LUTs) for some prime moduli. The 
synthesis was performed for a 10-bit range of X and for 3-, 4-, and 
5-bit moduli P. Table 2 contains the best results of the 
synthesis in ISE 13.1 and in LeonardoSpectrum2010a_7 of the 
proposed polynomial expansion approach (PEA). 
TABLE 2.   RESULTS OF THE SYNTHESIS OF X (MOD P) FOR X [10:1] 

P                      7        11       13       17       19       23       29       31 
Spartan 3   time, ns   12.725   14.308   14.753   12.847   19.708   16.207   16.774   13.232 
Spartan 3   LUTs       92       179      171      140      245      255      303      98 
Virtex 7    time, ns   8.28     9.397    9.429    9.43     9.266    9.522    9.827    9.122 
Virtex 7    LUTs       49       81       105      114      126      136      161      100 
 
Table 3 contains an example of the synthesis for the prime 
modulus P=691. The synthesis was executed using the 
proposed approach under the same conditions as in the above 
examples. 
TABLE 3.   RESULTS OF THE SYNTHESIS OF X (MOD 691) FOR X [11:1] 

P                      691 
Spartan 3   time, ns   13.912 
Spartan 3   LUTs       128 
Virtex 7    time, ns   8.789 
Virtex 7    LUTs       111 
VI. CONCLUSIONS AND FURTHER WORK 
There are two main advantages of the proposed approach 
for X (mod P) realization: flexibility of P, because P can be an 
arbitrary number, and a memoryless hardware realization, 
because only XOR and AND operators are used. 
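Table 1 provides the results of the comparison for the case 
$P = 2^3 - 1$. As we see, the proposed technique has the speed 
processing advantage over the known realizations: the shortest delay 
on each target belongs to PEA (Virtex_7) or PEA (BDD) (Spartan_3 and POWER). 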
Theoretically, the proposed approach to the hardware 
realization of X (mod P) aims to calculate the operation for 
arbitrary X and P. But the bottleneck is the realization of X 
(mod P) for a large range of X, e.g. for X [30:1] and P=7 the 
realization process takes more than 24 hours. The 
synthesis on FPGA takes most of this time. 
Thus, the proposed approach has two directions for 
improvement: area optimization and expanding the range of P. 
Further work will primarily be directed at obtaining polynomials that 
are as short as possible and, as a consequence, at reducing the 
hardware complexity of the converter. The progress 
could be achieved at the expense of expanding the range of 
X. 
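The length of the polynomials mentioned above can be measured directly for small ranges of X. The following Python sketch counts the nonzero Zhegalkin coefficients of each output bit; the function name `term_counts` is an assumption introduced here, the butterfly transform is a standard technique rather than the paper's combinatorial method, and no synthesis results are implied.

```python
def term_counts(n, P):
    """Number of terms in the Zhegalkin polynomials of S_1, ..., S_delta for an n-bit X."""
    delta = P.bit_length()                 # delta = floor(log2 P) + 1
    counts = []
    for i in range(delta):
        r = [((x % P) >> i) & 1 for x in range(1 << n)]
        step = 1
        while step < len(r):               # butterfly Reed-Muller transform over GF(2)
            for start in range(0, len(r), 2 * step):
                for j in range(start, start + step):
                    r[j + step] ^= r[j]
            step *= 2
        counts.append(sum(r))
    return counts

for n in range(3, 13):
    print(n, term_counts(n, 7))
```

For n = 3 and P = 7 each of the three polynomials has two terms, because X mod 7 differs from X only at X = 7.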
REFERENCES 
[1] J.T. Butler, T. Sasao, “Fast hardware computation of x mod z”, Proc. of 
the 25th IEEE International Parallel and Distributed Processing 
Symposium, Anchorage, AK, pp. 289-292, May, 2011. 
[2] N.I. Cherviakov, P.A. Sahnuk, A.V. Shaposhnikov, S.A. Riadnov, 
“Modular parallel computing systems of microprocessors systems”, 
Moscow, Fizmatlit, 2003, (in Russian) 
[3] A. Omondi and B. Premkumar, “Residue number systems: Theory and 
implementation”, Imperial College Press, Singapore, 2007. 
[4] K. Navi, A.S. Molahosseini, and M. Esmaeildoust, “How to Teach 
Residue Number System to Computer Scientists and Engineers”, IEEE 
Transactions on Education, V. 54, № 1, pp. 156–163, February, 2011. 
[5] A.A. Zarandi, A.S. Molahosseini, and M. Hosseinzadeh, “Modern 
Residue Number System Moduli Sets: Efficiency vs. Complexity”, 
Neurocomputers,  No. 9, pp. 7–12, 2014. 
[6] A. Zakrevskij, Yu. Pottosin, and L. Cheremisinova, “Optimization in 
Boolean Space”, TUT Press, Tallinn, 2008. 
[7] T. Sasao and J.T. Butler, “The eigenfunction of the Reed-Muller 
transformation”, Proc. of the Reed-Muller Workshop, Norway, May, 
2007. 
[8] Danila A. Gorodecky, “Combinatorial Method of Polynomial Expansion 
of Symmetric Boolean Functions”, Proc. of the 11th International 
Workshop on Boolean Problems, Germany, pp. 211-218, September, 
2014. 
[9] A. Granville, “Arithmetic Properties of Binomial Coefficients I: 
Binomial coefficients modulo prime powers”, Canadian Mathematical 
Society Conf. Proc., Vol. 20, pp. 253-275, 1997. 
[10] P.N. Bibilo and U.I. Romanov, “Logical design of discrete circuits with 
representation of production-frame model”, Minsk: Belarusskaia 
navuka, 2011.  
[11] P.N. Bibilo and N.A. Kirienko, “Evaluation of energy consumption of 
CMOS circuits by switching their activity”, Mikroelektronika, No. 1, pp. 
65-77, 2012. 
 
