Abstract-This article presents a novel technique for X (mod P) realization based on the Reed-Muller polynomial expansion. The advantage of the approach lies in its capability to realize X (mod P) for an arbitrary P. The approach is competitive with the known realizations in processing speed. The advantages and the results of a comparison with the known approaches for X[9:1] and P=7 are demonstrated.
I. INTRODUCTION
The realization of the X (mod P) operation occupies a central place in cryptography; the efficiency of its realization in the residue number system (RNS) determines whether RNS will find wide practical use.
There are two approaches to the hardware realization of X (mod P).
The pipelined realization transforms a data flow into a sequence of residues. An example of a fast pipelined realization in cryptography has been proposed in [1]. The other approach is based on an iterative process: the iterations produce the bits in order of decreasing significance, with later iterations producing bits of lower significance. Variations of these techniques for RNS have been proposed in [2, 3]. The main goal of an X (mod P) realization is to achieve high processing speed.
The first approach is suitable for an arbitrary value of P, but its speed is limited by the pipeline block, which includes three kinds of successive operations (comparison, multiplexing, and subtraction). The second approach is efficient only for some types of P ($2^n$, $2^n \pm 1$, $2^n \pm 3$ [2, 3, 4] and some variations of them [5]). This article proposes an approach to X (mod P) realization that is suitable and efficient for an arbitrary value of P and is competitive with the iterative procedure.
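The three operations named above can be illustrated with a minimal Python sketch. It assumes a bit-serial, most-significant-bit-first organization in which each stage shifts in one bit of X and then compares, subtracts, and multiplexes. The function name `mod_bit_serial` and this stage organization are assumptions made for illustration; they are not taken from the design in [1].

```python
def mod_bit_serial(x: int, p: int, width: int) -> int:
    """Reduce a width-bit unsigned integer x modulo p, one bit per stage."""
    r = 0  # running residue, always kept in the range 0 .. p-1
    for i in reversed(range(width)):      # most significant bit first
        bit = (x >> i) & 1
        t = (r << 1) | bit                # shift in the next bit; t < 2*p
        ge = t >= p                       # comparison
        d = t - p                         # subtraction
        r = d if ge else t                # multiplexing between d and t
    return r


if __name__ == "__main__":
    # Hypothetical usage: a 10-bit input reduced modulo 7.
    print(mod_bit_serial(1000, 7, 10))
```

Because the three operations depend on each other inside every stage, the delay of one stage bounds the clock rate of such a pipeline, which is the limitation noted in the text.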
[Figure: result of the calculation of X (mod P)]
II. X (MOD P) REALIZATION BY USING BOOLEAN FUNCTIONS
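The idea of the approach is to consider the result of X (mod P) = S as a system of Boolean functions, one function for each bit of S. Each function equals 1 on the inputs X for which the corresponding bit of X (mod P) equals 1, and 0 in any other case. For example, the system of functions for the case X (mod 5) = S, where $X = (x_1, x_2, x_3, x_4, x_5)$ and $S = (S_1, S_2, S_3)$, takes the following form:
$$S_3 = f_3(x_1, x_2, x_3, x_4, x_5); \quad S_2 = f_2(x_1, x_2, x_3, x_4, x_5); \quad S_1 = f_1(x_1, x_2, x_3, x_4, x_5).$$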
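To make the construction concrete, the truth vectors of such a system can be tabulated directly from the definition of X (mod P). The following is a minimal Python sketch. The names `truth_vectors` and `truth_numbers` are illustrative assumptions, and bit $x_1$ of X and output $S_1$ are assumed to be the least significant bits.

```python
def truth_vectors(p: int, n: int) -> list[list[int]]:
    """Return one truth vector per output bit of X mod p for an n-bit X.

    Element [i][x] is bit i of (x mod p); index i = 0 stands for S_1
    (assumed to be the least significant output bit).
    """
    m = max(1, (p - 1).bit_length())      # number of output bits of S
    return [[((x % p) >> i) & 1 for x in range(2 ** n)] for i in range(m)]


def truth_numbers(vector: list[int]) -> list[int]:
    """Return the inputs x at which the function equals 1."""
    return [x for x, value in enumerate(vector) if value]


if __name__ == "__main__":
    s1, s2, s3 = truth_vectors(5, 5)      # the X (mod 5) example, 5-bit X
    print("truth numbers of S_3:", truth_numbers(s3))
```

For the X (mod 5) example, the inputs listed for $S_3$ are exactly the values of X whose residue is 4, the only residue with the third bit set.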
There is a one-to-one correspondence between the set of truth numbers of a function and its truth vector. The polynomial expansion of a function is more efficient than other normal forms of Boolean functions by some criteria, e.g. it has a smaller number of terms and of units in a circuit (in some cases much smaller) [6].
Just as each 1 in the truth vector corresponds to a term of the FDNF of the function $S_i$, each 1 in the Zhegalkin spectrum (or Reed-Muller spectrum [7]) corresponds to a term of the Zhegalkin (Reed-Muller) expansion. This expansion is referred to as $r(S_i)$. The truth vector therefore has to be transformed into the Zhegalkin spectrum. This task can be solved by a number of methods; to demonstrate the transformation procedure we will use the combinatorial method [8]. It is helpful to use a consequence of the Lucas theorem [9]: $\binom{n}{a} \equiv 1 \pmod 2$ if and only if each bit of $a$ is no more than the same bit of $n$.
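The consequence of the Lucas theorem can be checked numerically and then used to compute a spectrum. The sketch below assumes the standard binomial form of the transform, $r_q = \sum_{a=0}^{q} \binom{q}{a} s_a \pmod 2$, where $s$ is the truth vector and $r$ the spectrum. This form and the names `binom_is_odd` and `spectrum_combinatorial` are assumptions for illustration and may differ in notation from the method of [8].

```python
from math import comb


def binom_is_odd(n: int, a: int) -> bool:
    """Lucas consequence: C(n, a) is odd iff every bit of a is <= that bit of n."""
    return (a & n) == a


def spectrum_combinatorial(s: list[int]) -> list[int]:
    """Zhegalkin (Reed-Muller) spectrum r of a truth vector s.

    r[q] is the sum modulo 2 of s[a] over all a for which C(q, a) is odd.
    """
    return [
        sum(s[a] for a in range(q + 1) if binom_is_odd(q, a)) % 2
        for q in range(len(s))
    ]


if __name__ == "__main__":
    # Compare the bitwise test with the binomial coefficient itself.
    ok = all(binom_is_odd(n, a) == (comb(n, a) % 2 == 1)
             for n in range(64) for a in range(n + 1))
    print("Lucas consequence holds for n < 64:", ok)
    # Two-variable OR has the truth vector 0111.
    print(spectrum_combinatorial([0, 1, 1, 1]))
```

The bitwise test replaces the evaluation of binomial coefficients with a single AND and a comparison, which is what makes the combinatorial method practical.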
Let's demonstrate the application of these theorems to the transformation of a truth vector into its Zhegalkin spectrum. The $q$-th unity of the resulting spectrum corresponds to the $q$-th term of the Zhegalkin polynomial of the function.
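The path from a truth vector to the terms of the polynomial can be sketched in Python as follows. The sketch uses the in-place butterfly form of the transform rather than the combinatorial method of [8]; both yield the same spectrum. The names `reed_muller_spectrum` and `vhdl_terms`, and the assumption that $x_1$ is the least significant bit of X, are illustrative.

```python
def reed_muller_spectrum(s: list[int]) -> list[int]:
    """Zhegalkin (Reed-Muller) spectrum of a truth vector of length 2**n."""
    r = list(s)
    n = len(r).bit_length() - 1
    for k in range(n):                    # one butterfly pass per variable
        step = 1 << k
        for q in range(len(r)):
            if q & step:
                r[q] ^= r[q ^ step]       # XOR in the entry with bit k cleared
    return r


def vhdl_terms(spectrum: list[int]) -> list[str]:
    """Turn each 1 of the spectrum into an AND term in VHDL-like syntax."""
    terms = []
    for q, bit in enumerate(spectrum):
        if not bit:
            continue
        if q == 0:
            terms.append("'1'")           # constant term of the polynomial
            continue
        # Bit j of the index q selects variable x(j+1).
        lits = [f"x({j + 1})" for j in range(q.bit_length()) if (q >> j) & 1]
        terms.append(lits[0] if len(lits) == 1
                     else "(" + " and ".join(lits) + ")")
    return terms


if __name__ == "__main__":
    # Third output bit of X (mod 5) for a 5-bit X.
    s3 = [1 if x % 5 == 4 else 0 for x in range(32)]
    print("S(3) <= " + " xor ".join(vhdl_terms(reed_muller_spectrum(s3))) + ";")
```

Listing the terms in increasing order of the index $q$ keeps the correspondence between the positions of the unities in the spectrum and the terms of the polynomial.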
The same procedure is used to generate the expansions for the remaining functions of the system. The proposed approach is realized by four software blocks: Python → Java → Python → VHDL. The step-by-step scheme of the software realization is shown in the corresponding figure.
As a result:
S(1) <= x(1) xor (x(1) and x(3)) xor (x(2) and x(3)) xor (x(1) and x(2) and x(3)) xor x(4) xor (x(2) and x(4)) xor (x(3) and x(4)) xor (x(1) and x(3) and x(4)) xor x(5) xor (x(3) and x(5)) xor (x(1) and x(3) and x(5)) xor (x(2) and x(3) and x(5)) xor (x(1) and x(2) and x(3) and x(5)) xor (x(1) and x(4) and x(5)) xor (x(1) and x(2) and x(4) and x(5)) xor (x(3) and x(4) and x(5)) xor (x(1) and x(2) and x(3) and x(4) and x(5));
S(2) <= x(2) xor (x(2) and x(3)) xor (x(1) and x(2) and x(3)) xor x(4) xor (x(1) and x(4)) xor (x(1) and x(2) and x(4)) xor (x(1) and x(3) and x(4)) xor (x(2) and x(3) and x(4)) xor (x(1) and x(5)) xor (x(1) and x(3) and x(5)) xor (x(2) and x(3) and x(5)) xor (x(1) and x(2) and x(3) and x(5)) xor (x(4) and x(5)) xor (x(2) and x(4) and x(5)) xor (x(3) and x(4) and x(5)) xor (x(1) and x(3) and x(4) and x(5));
S(3) <= x(3) xor (x(1) and x(3)) xor (x(2) and x(3)) xor (x(1) and x(2) and x(3)) xor (x(1) and x(4)) xor (x(1) and x(2) and x(4)) xor (x(3) and x(4)) xor (x(1) and x(2) and x(3) and x(4)) xor (x(1) and x(2) and x(5)) xor (x(3) and x(5)) xor (x(1) and x(3) and x(5)) xor (x(2) and x(3) and x(5)) xor (x(4) and x(5)) xor (x(2) and x(4) and x(5)) xor (x(1) and x(2) and x(4) and x(5)) xor (x(2) and x(3) and x(4) and x(5));

The known realizations and the proposed approach were compared for X[9:1] and P = 7. Modeling and synthesis were performed in ISE 13.1 and in LeonardoSpectrum2010a_7. The best results for processing speed (in ns) and area (in LUTs) from ISE and Leonardo were chosen; they are presented in two tables.
- Polynomial expansion approach (proposed approach) after BDD-optimization - PEA (BDD). The optimization [10] of the number of terms was applied to the proposed approach. This optimization is based on BDD-optimization and is suitable for an arbitrary value of the modulus P.
The synthesis was performed on:
- FPGA Xilinx Spartan 3 XC3S1000 FG456 - Spartan_3;
- ASIC of the library POWER [11], which is used for the design of ASIC circuits at the hi-tech factory Integral (Minsk, Belarus) - POWER, where UNIT is an elementary measure of area.
The best indices are highlighted in bold. Table 2 presents the speed (in ns) and the area (in LUTs) on FPGA for some prime moduli. The synthesis was performed for a 10-bit range of X and for 3-, 4-, and 5-bit moduli P. Table 2 also contains the best results of the synthesis in ISE 13.1 and in LeonardoSpectrum2010a_7 for the proposed polynomial expansion approach (PEA). Table 3 contains an example of the synthesis for the prime modulus P=691. The synthesis was executed using the proposed approach under the same conditions as in the examples above.
There are two main advantages of the proposed approach to X (mod P) realization: flexibility in P, because P can be an arbitrary number, and a memoryless hardware realization, because only XOR and AND operators are used.
Table 1 provides the results of the comparison for the case $P = 2^3 - 1$. As we see, the proposed technique has a processing-speed advantage over the known realizations.
Theoretically, the proposed approach to the hardware realization of X (mod P) aims to calculate the operation for arbitrary X and P. The bottleneck is the realization of X (mod P) for a large range of X: e.g. for X[30:1] and P=7 the realization process takes more than 24 hours. The synthesis on FPGA takes most of this time.
Thus, the proposed approach has two directions for improvement: area optimization and expanding the range of P.
Further work will primarily be directed at obtaining the shortest possible polynomials and, as a consequence, at reducing the hardware complexity of the converter circuit. Progress could be achieved by expanding the range of X.
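The growth behind the bottleneck described above can be explored with a short Python sketch that counts the terms of each polynomial as the width of X increases. It measures only the length of the unoptimized expansions, not synthesis time or area, and it does not apply the BDD-optimization of [10]. The function name `term_counts` is an illustrative assumption.

```python
def term_counts(p: int, n: int) -> list[int]:
    """Number of Reed-Muller terms in each output bit of X mod p, n-bit X."""
    size = 1 << n                         # the truth vector has 2**n entries
    m = max(1, (p - 1).bit_length())      # number of output bits of S
    counts = []
    for i in range(m):
        r = [((x % p) >> i) & 1 for x in range(size)]
        for k in range(n):                # in-place butterfly transform
            step = 1 << k
            for q in range(size):
                if q & step:
                    r[q] ^= r[q ^ step]
        counts.append(sum(r))             # each 1 in the spectrum is one term
    return counts


if __name__ == "__main__":
    # Hypothetical exploration for P = 7 and small input widths.
    for n in range(3, 11):
        print(n, 1 << n, term_counts(7, n))
```

Since the truth vector doubles in length with every added bit of X, the cost of building and synthesizing the polynomials grows quickly, which is why shorter polynomials are the primary target of further work.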
