Quantum Key Distribution (QKD) is the process of using quantum communication to establish a shared key between two parties. It has been demonstrated that the unconditional security and effective communication of a quantum communication system can be guaranteed by a high-quality Gaussian random number generator with high speed and a long random period. In this paper, we propose to construct the Gaussian random number generator using a Field-Programmable Gate Array (FPGA), which is able to process large amounts of data at high speed.
of generated GRNs as well as compare these three algorithms. In Section 7, we discuss how our GRN generator design can be implemented in a continuous-variable QKD system. In Section 8, we give our conclusions.
| APPLICATION OF GAUSSIAN RANDOM NUMBER IN QUANTUM KEY DISTRIBUTION SYSTEM
In the field of quantum key distribution, randomness is an essential requirement. Even if the communication channel is eavesdropped, communication on this channel remains secure because of this randomness. Here we introduce a coherent-state QKD protocol, whose security relies on the distribution of a Gaussian key obtained by continuously modulating the phase and amplitude with GRNs [8] at Alice's (emitter) side and subsequently detecting the states at Bob's (receiver) side [12].
The protocol runs as follows [20]. Alice prepares displaced coherent states with quadrature components q and p that are realizations of two independent and identically distributed random variables Q and P. The random variables Q and P obey the same zero-centered normal distribution:

$$Q \sim P \sim \mathcal{N}(0, V)$$
where V is referred to as the variance. The displaced coherent states $|\alpha_1\rangle, \ldots, |\alpha_j\rangle, \ldots, |\alpha_n\rangle$ are expressed as:

$$|\alpha_j\rangle = |q_j + i p_j\rangle$$
The coherent states obey the usual eigenvalue equation:
where $\hat{q}$ and $\hat{p}$ are the quadrature operators, defined in the framework of shot-noise units [22]. After preparation of each coherent state, Alice transmits $|\alpha_j\rangle$ to Bob through a Gaussian quantum channel. Bob uses heterodyne detection to measure the eigenvalue of either one or both of the quadrature operators. In the last step, Bob sends correction data to Alice, and Alice then corrects her own data so that it has the same values as Bob's.
As illustrated above, Gaussian random numbers are needed in the first step to modulate the amplitude and phase information in order to obtain useful key elements. At the same time, the noise that the Gaussian random source introduces into the transmission system degrades the security. Thus, a high-quality Gaussian random number generator plays a significant role in the QKD system. In addition, high transmission speed is another advantage of quantum communication over a classical communication system. To improve the transmission speed further, a faster GRN generator is required.
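The role of the Gaussian random numbers in the state preparation step can be illustrated with a minimal software sketch. It assumes that each displacement is $\alpha_j = q_j + i p_j$ with q and p drawn from $\mathcal{N}(0, V)$, and that the modulators are driven by the amplitude and phase of $\alpha_j$. The function name `prepare_coherent_states` and the use of Python's built-in generator are illustrative assumptions, not part of the proposed hardware design.

```python
import cmath
import math
import random


def prepare_coherent_states(n, variance, seed=None):
    """Draw n displacements alpha_j = q_j + i*p_j with q, p ~ N(0, variance).

    Returns a list of (q, p, amplitude, phase) tuples, where amplitude and
    phase are the polar form of alpha_j (hypothetical modulator settings).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(variance)  # standard deviation of each quadrature
    states = []
    for _ in range(n):
        q = rng.gauss(0.0, sigma)
        p = rng.gauss(0.0, sigma)
        amplitude, phase = cmath.polar(complex(q, p))
        states.append((q, p, amplitude, phase))
    return states
```

In a hardware setting the two calls to `rng.gauss` would be replaced by the outputs of the Gaussian random number generator discussed in the following sections.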
In what follows, we propose a Gaussian random source with high output speed and low quantization noise to efficiently generate a secure quantum key sequence.
| ALGORITHM FOR GENERATING GAUSSIAN RANDOM NUMBER

| Box-Muller Algorithm
The Box-Muller transform, proposed by Box and Muller [5], is one of the exact algorithms for generating Gaussian random numbers. The Box-Muller algorithm is based on a property of the two-dimensional Cartesian system: assume that the X and Y coordinates are described by two independent and normally distributed random variables, i.e., $f_X(x) = \frac{1}{\sqrt{2\pi}\,\delta} e^{-x^2/(2\delta^2)}$ and $f_Y(y) = \frac{1}{\sqrt{2\pi}\,\delta} e^{-y^2/(2\delta^2)}$. If X and Y are transformed to the corresponding polar coordinate variables R and Θ, these random variables are also independent and are related to X and Y by:

$$X = R\cos\Theta, \qquad Y = R\sin\Theta$$
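where R obeys the Rayleigh distribution and Θ obeys the uniform distribution on $[0, 2\pi)$. Because R and Θ are statistically independent, their joint probability density factorizes as $f_{R\Theta}(r, \theta) = f_R(r)\, f_\Theta(\theta)$. The corresponding distribution functions of R and Θ are:

$$F_R(r) = 1 - e^{-r^2/(2\delta^2)}, \qquad F_\Theta(\theta) = \frac{\theta}{2\pi}$$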
Fortunately, the distribution functions $F_R(r)$ and $F_\Theta(\theta)$ are in closed form. Hence the Gaussian random variables can be generated by the inverse transformation method [33]. Since both $F_R(r)$ and $F_\Theta(\theta)$ take values in [0, 1], the Gaussian random variables X and Y in Cartesian coordinates can be obtained through a transformation of two sets of independent, uniformly distributed random numbers.
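To make the inverse transformation step explicit, the following short derivation may help. It writes $U_1$ and $U_2$ for two independent uniform variables on (0, 1) and uses the standard Rayleigh and uniform distribution functions that follow from the Gaussian densities with parameter δ. Setting $F_R(r) = 1 - U_1$ rather than $U_1$ is a convenience choice made here, valid because $1 - U_1$ is also uniform on (0, 1).

```latex
\begin{aligned}
F_R(r) = 1 - e^{-r^2/(2\delta^2)} = 1 - U_1
  &\;\Rightarrow\; r = \delta\sqrt{-2\ln U_1}, \\
F_\Theta(\theta) = \frac{\theta}{2\pi} = U_2
  &\;\Rightarrow\; \theta = 2\pi U_2, \\
X = r\cos\theta &= \delta\sqrt{-2\ln U_1}\,\cos(2\pi U_2), \\
Y = r\sin\theta &= \delta\sqrt{-2\ln U_1}\,\sin(2\pi U_2).
\end{aligned}
```

For δ = 1 this reduces to the usual Box-Muller mapping to standard normal samples.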
In practice, the Box-Muller algorithm draws two samples from the uniform distribution on the interval (0, 1) and then maps them to two standard Gaussian-distributed samples with zero expectation and unit variance. The algorithm is implemented as follows:
1. Generate a pair of independent, uniformly distributed random numbers in the interval (0, 1), denoted as $U_1$ and $U_2$ respectively.
2. Map the random point to the Cartesian coordinate axes through the transformation:

$$\alpha(U_1, U_2) = \sqrt{-2\ln U_1}\,\cos(2\pi U_2), \qquad \beta(U_1, U_2) = \sqrt{-2\ln U_1}\,\sin(2\pi U_2)$$

where $\alpha(U_1, U_2)$ and $\beta(U_1, U_2)$ are random numbers that each follow the Gaussian distribution.
The amplitude of the random numbers $\alpha(U_1, U_2)$ and $\beta(U_1, U_2)$ depends on the uniform random number $U_1$, and their phase equals the product of $U_2$ and the constant 2π.
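The two steps above can be sketched in software as follows. This is a minimal illustration of the textbook Box-Muller mapping, not the FPGA implementation; the function names `box_muller` and `gaussian_samples` are hypothetical, and Python's built-in generator stands in for the hardware uniform source.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniforms with 0 < u1 <= 1 to two independent N(0, 1) samples."""
    radius = math.sqrt(-2.0 * math.log(u1))  # amplitude, set by u1
    angle = 2.0 * math.pi * u2               # phase, set by u2
    return radius * math.cos(angle), radius * math.sin(angle)


def gaussian_samples(n_pairs, seed=None):
    """Generate 2 * n_pairs standard normal samples (alpha and beta interleaved)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is finite
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out
```

The logarithm, square root, and sine/cosine evaluations in `box_muller` correspond to the operations that a hardware version has to provide as dedicated arithmetic blocks.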
| Polarization Decision Algorithm
The polarization decision algorithm proposed by Bell [2] is also an exact approach to obtaining the two-dimensional Gaussian distribution. The polar algorithm is related to the Box-Muller transform but avoids the evaluation of trigonometric functions, which makes it cheaper to compute.
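Theoretically, we consider two independent and normally distributed random variables X and Y in Cartesian coordinates, i.e., $f_X(x) = \frac{1}{\sqrt{2\pi}\,\delta} e^{-x^2/(2\delta^2)}$ and $f_Y(y) = \frac{1}{\sqrt{2\pi}\,\delta} e^{-y^2/(2\delta^2)}$. Then the probability density functions are
Thus the square of F transformed to polar coordinates is: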
Similarly to the Box-Muller method, the transformation to polar coordinates makes θ uniformly distributed from 0 to 2π. The normalized distribution function of the radial distance r is:
The uniform random number U is also used here. Since U is uniformly distributed in the interval (0, 1), the point $(\cos(2\pi U), \sin(2\pi U))$ is uniformly distributed on the unit circle $x^2 + y^2 = 1$. A new point is generated by multiplying that point by the radial distance r: $(r\cos(2\pi U), r\sin(2\pi U))$. Finally, by the inverse transform, one obtains two variables that are independent standard normal random variables.
In practice, the polar method is realized by the rejection approach. Assume that y = f(x) is a function with finite integral, C is a set of points (x, y), and Z is a superset of C. Random points (x, y) are then selected uniformly from the set Z until a point (x, y) falls into C. The selected point (x, y) is returned as the random number [18]. The set C here is the unit circle:
Then a pair of normal random variables is obtained as [33]:
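The rejection step can be sketched with the textbook form of the polar method. Note the assumptions: this sketch draws the candidate point uniformly from the square $(-1, 1)^2$ and keeps it only if it falls inside the unit circle, whereas the design discussed in this paper starts from uniforms on (0, 1) and treats the sign separately. The names `polar_pair` and `polar_samples` are hypothetical.

```python
import math
import random


def polar_pair(rng):
    """Return (alpha, beta, attempts): two N(0, 1) samples and the number of draws used."""
    attempts = 0
    while True:
        attempts += 1
        v1 = 2.0 * rng.random() - 1.0  # candidate point in the square [-1, 1)^2
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:              # accept only points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * factor, v2 * factor, attempts


def polar_samples(n_pairs, seed=None):
    """Generate 2 * n_pairs samples and report the total number of candidate points."""
    rng = random.Random(seed)
    samples, total_attempts = [], 0
    for _ in range(n_pairs):
        a, b, attempts = polar_pair(rng)
        samples.extend((a, b))
        total_attempts += attempts
    return samples, total_attempts
```

In this form about π/4 of the candidate points are accepted, which is why a rejection-based generator delivers fewer outputs per uniform input than the Box-Muller mapping, while needing no sine or cosine evaluation.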
| Central Limit Algorithm
The central limit algorithm is based on the central limit theorem, which states that when a sufficiently large number of independent, identically distributed random variables (e.g., uniform variables) are averaged, the distribution of their arithmetic mean approaches a normal distribution. Thus, the central limit algorithm is an extremely efficient method for GRN generation, since it simply sums a sufficient number of independent and identically distributed uniform samples.
More formally, assume there are n independent and identically distributed uniform numbers $U_i \sim U(0, 1)$, and form their sum $S = \sum_{i=1}^{n} U_i$. The cumulative distribution function of S can be approximated as:

$$F_S(s) \approx \Phi\!\left(\frac{s - n/2}{\sqrt{n/12}}\right)$$

where Φ represents the cumulative distribution function of the standard Gaussian distribution. For a uniform random variable on (0, 1), the mean is 1/2 and the variance is 1/12, so the distribution of the normalized variable z is approximately Gaussian:

$$z = \frac{S - n/2}{\sqrt{n/12}} \sim \mathcal{N}(0, 1)$$
After normalization, a standard Gaussian distribution is obtained. The central limit algorithm can be used to transform uniform random numbers into Gaussian ones at a very low hardware cost. However, the error in the tail regions of the distribution is inversely proportional to the number of variables $U_i \sim U(0, 1)$ that are added. As a result, the GRNs produced by the central limit algorithm are not highly accurate in the tail regions [33].
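A minimal sketch of the central limit approach is given below, assuming the normalization uses the mean n/2 and variance n/12 of a sum of n uniforms on (0, 1). The function name `clt_gaussian` is hypothetical.

```python
import math
import random


def clt_gaussian(uniforms):
    """Normalize the sum of n uniforms on (0, 1) to zero mean and unit variance."""
    n = len(uniforms)
    s = sum(uniforms)
    return (s - n / 2.0) / math.sqrt(n / 12.0)


def clt_samples(count, n=12, seed=None):
    """Generate `count` approximately Gaussian samples from n uniforms each."""
    rng = random.Random(seed)
    return [clt_gaussian([rng.random() for _ in range(n)]) for _ in range(count)]
```

The sketch also makes the tail problem visible: with n = 12 the sum lies between 0 and 12, so the output can never leave the interval [-6, 6], whereas a true Gaussian variable is unbounded.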
| HARDWARE ARCHITECTURE OF GRN AND URN ALGORITHM
| Uniform Random Number Generator
FIGURE 1 General architecture of the multi-return shift register generator. $c_0, c_1, \ldots, c_{n-1}, c_n$ are the feedback coefficients and $a_1, a_2, a_3, \ldots, a_n$ are the output values.
Since uniform random numbers (URNs) U(0, 1) are essential for all three algorithms introduced in Sec. 3, an efficient and robust URN generator is indispensable to the whole system. Here we choose to use the Multi-return Shift Register Generator (MSRG) [41] in our design.
The Multi-return Shift Register Generator is one type of Linear Feedback Shift Register (LFSR) [38], which is one of the simplest and most effective ways to obtain uniform random numbers. The basic architecture of the MSRG is shown in Fig. 1. The MSRG is composed of a shift register and a feedback function, which can be represented as a polynomial in the variable x, referred to as the characteristic polynomial:

$$f(x) = \sum_{i=0}^{n} c_i x^i$$
where $c_1, c_2, c_3, \ldots, c_n$ are the feedback coefficients. The feedback coefficients are selected by the multiplexer, which is controlled by the control signal. When the multiplexer selects the output signal $a_i$ directly instead of the 'XOR' function, the corresponding coefficient $c_i$ is regarded as 0 in the characteristic polynomial. The input bit is a linear function of the current state, and the next state of an MSRG is uniquely determined from the previous one by the feedback network. The initial value of the register is called the seed, and the sequence produced is completely determined by this initial state [26]. Because the register has a finite number of possible states, the sequence repeats after a certain period. The period of an MSRG of order n is no more than $2^n - 1$. Only if the feedback coefficients are properly chosen is the output sequence an m-sequence, which has the longest period $2^n - 1$ [40]. In this paper, the primitive polynomial we choose is:
After obtaining the m-sequence with period $2^{32} - 1$, the uniform random number is obtained by dividing the m-sequence by $2^n - 1$ using a divider.

The uniform random number generator, which is the essential part of the Gaussian random number generator, has been obtained in Sec. 4. Next, we show the hardware architecture designs for implementing the Box-Muller, polarization decision, and central limit algorithms.

For the central limit algorithm, the new variable $S = \sum_{i=1}^{n} U_i$ is obtained by an adder, and the square root is calculated by the module 'SQRT' (see Fig. 4). The division is performed by the module 'DIV'. Two external multipliers and one adder are used in the design. As a result, one set of Gaussian random numbers is obtained, i.e., α.
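Returning to the uniform source, the shift-register recurrence and the final division by $2^n - 1$ can be sketched in software. The sketch below uses a Galois-type (modular) LFSR as a stand-in for the MSRG of Fig. 1, and a small 4-bit primitive polynomial $x^4 + x^3 + 1$ chosen purely for illustration so that the full period can be checked by hand; it is not the 32-bit polynomial used in this work. All function names are hypothetical.

```python
def lfsr_step(state, taps):
    """Advance a right-shifting Galois LFSR by one step."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= taps  # apply the feedback taps when a 1 is shifted out
    return state


def lfsr_period(seed, taps):
    """Count steps until the register returns to the seed.

    Terminates for any non-zero seed when the top bit of `taps` is set,
    because the step is then invertible and every state lies on a cycle.
    """
    if seed == 0:
        raise ValueError("the all-zero state never changes")
    state, count = lfsr_step(seed, taps), 1
    while state != seed:
        state = lfsr_step(state, taps)
        count += 1
    return count


def to_uniform(state, n_bits):
    """Scale a register state to (0, 1] by dividing by 2**n_bits - 1."""
    return state / ((1 << n_bits) - 1)


# Illustrative 4-bit example: x^4 + x^3 + 1 corresponds to the tap mask 0b1100.
ILLUSTRATIVE_TAPS = 0b1100
ILLUSTRATIVE_BITS = 4
```

With these illustrative values, `lfsr_period(1, ILLUSTRATIVE_TAPS)` visits all 15 non-zero states before repeating, which is the maximal period $2^4 - 1$. A non-primitive choice of taps would give a shorter cycle, which is why the choice of polynomial matters.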
| Hardware Architecture Design for Box-Muller Algorithm
| Hardware Architecture Design for Polarization Decision Algorithm
| Hardware Architecture Design for Central Limit Algorithm
| HARDWARE REALIZATION OF ALGORITHM FUNCTIONS
| Field-Programmable Gate Array
The Field-Programmable Gate Array (FPGA), which integrates programmable logic blocks and soft-core or hard-core processors, has become more and more common as a core technology used to build electronic systems. In most FPGAs, logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. The FPGA configuration is generally specified using a hardware description language, such as Verilog HDL, which we use in this work.
The main and most significant difference between a micro-controller and an FPGA is that the FPGA does not have a fixed hardware structure. On the contrary, an FPGA is programmable according to user applications. Processors, however, have a fixed hardware structure, which means that all the transistors, memory, peripheral structures, and connections are constant. The processor predefines the operations (addition, multiplication, I/O control, etc.), and users then make the processor perform these operations sequentially by means of software.

FIGURE 4 The integral structural design diagram of the Gaussian random number generator using the central limit algorithm. $U_1, U_2, \ldots, U_n$ are the uniform random numbers generated by the MSRG. n is the number of uniform random number sets. α is the output Gaussian random number. Single-precision floating-point numbers are used in the design.

1 The output is always larger than zero when $U_1, U_2 \in (0, 1)$. To obtain the bilateral Gaussian distribution, two additional sets of uniform random numbers $U_3, U_4 \in (0, 1)$ are indispensable. The key point here is to change the sign bit of $U_3, U_4$ to negative after conversion into floating-point numbers, and then to combine the results from $U_1, U_2, U_3, U_4$.
The hardware structure of the FPGA is not fixed but defined by the user. Although the logic cells in an FPGA are fixed, the functions they perform and the interconnections between them are determined by the user. The operations that an FPGA can perform are therefore not predefined. Users can have processes executed according to the written HDL code in parallel, that is, simultaneously. The ability to process in parallel is one of the most critical features that separate the FPGA from the processor and make it superior in many areas.
An FPGA is generally more useful for routine control of particular circuits. For example, an FPGA can be used for simple functions such as checking the quantum key signals from the communication channel. Such a task can also be handled quickly by many conventional micro-controllers (PIC series, etc.). However, an FPGA solution is more reasonable if users want to achieve highly efficient communication.
QKD processing requires handling large amounts of data at high speed, which makes this type of application very suitable for an FPGA capable of parallel processing. Since the user can determine the hardware structure of the FPGA, it can be programmed to process more data in fewer clock cycles. A processor cannot achieve this performance, because its data flow is limited by the processor bus (16-bit, 32-bit, etc.) and its processing speed. As a result, for applications that require more performance, such as intensive data processing, the FPGA has come to the fore for routine control operations. Moreover, micro-controllers can be embedded into the FPGA, since they are in fact logic circuits. It is thus possible to define and use a processor and user-specific hardware functions on a single chip by using an FPGA. This solution offers a high degree of control over the hardware because of its flexibility. Users can modify and update the whole design (the processor on the FPGA and other logic circuits) by changing only the code on the FPGA, without any change to the circuit board layout. In this way, users can add different functions, improve performance, and extend the lifetime of the design without having to redesign the boards.
According to the Gaussian random number generation algorithms described above, the Altera Cyclone IV E EP4CE115F29I8L FPGA chip is chosen to implement our design. This chip provides 528 I/O ports, 114,480 logic elements, and 7,155 logic array blocks, and it can reach a maximum operating frequency of 200 MHz.
FIGURE 5 The input and output signals of the FPGA floating-point IP cores. LOG is the ALTFP_LOG IP core. SIN/COS is the ALTFP_SINCOS IP core. DIV is the ALTFP_DIV IP core. SQRT is the ALTFP_SQRT IP core.
| FPGA Floating-Point IP Cores
Intellectual property (IP) cores are standalone modules that can be used in any field-programmable gate array, and their source code can be ported across various FPGA platforms. They are developed using HDLs such as VHDL, Verilog, and SystemVerilog. In this work we use soft IP cores to implement our design. Soft IP cores are completely flexible and do not depend on vendor technology. Hence, the IP cores can be modified according to the users' specific application and easily integrated with other modules.
| Logarithm IP Core
The logarithm calculation is achieved by the ALTFP_LOG IP core, which computes the natural logarithm of single-precision format numbers. Fig. 5 shows the input and output signals of the ALTFP_LOG IP core. The function of each port is defined as:
• clock: Clock input to the IP core.
• clk_en: Clock enable. When the clk_en port is asserted high, a natural logarithm operation takes place.
• aclr: Asynchronous clear. When the aclr port is asserted high, the function is asynchronously cleared.
• data[ ]: Floating-point input data.
• result[ ]: The natural logarithm of the value on input data.
• zero: Zero exception output. This occurs when the actual input value is 1.
• nan: NaN exception output. This occurs when the input is a negative number or NaN.
| Trigonometric IP Core
The trigonometric calculation is achieved by the ALTFP_SINCOS IP core, which performs the trigonometric sine and cosine functions on single-precision format numbers. Fig. 5 shows the input and output signals of the ALTFP_SINCOS IP core. The function of each port is defined as:
• clock: Clock input to the megafunction.
• clk_en: Clock enable. When the clk_en port is asserted high, a sine or cosine operation takes place.
• data[ ]: Floating-point input data.
| Division IP Core
The division is achieved by the ALTFP_DIV IP core, which performs the floating-point division operation. Fig. 5 shows the input and output signals of the ALTFP_DIV IP core. The function of each port is defined as:
• clk_en: Clock enable to the floating-point divider. This port enables division.
• dataa[ ]: Numerator data input.
• datab[ ]: Denominator data input.
• result[ ]: Divider output port. The division result.
• overflow: Overflow port for the divider. Asserted when the result of the division exceeds or reaches infinity.
• underflow: Underflow port for the divider. Asserted when the result of the division is zero even though neither of the inputs to the divider is zero, or when the result is a denormalized number.
• zero: Zero port for the divider. Asserted when the value of result[] is zero.
• nan: NaN port. Asserted when an invalid division occurs, such as infinity divided by infinity or zero divided by zero.
| Square Root Calculation IP Core
The square root calculation is achieved by the ALTFP_SQRT IP core. This IP core performs a square root calculation based on the input provided. Fig. 5 shows the input and output signals of the ALTFP_SQRT IP core. The function of each port is defined as:
• clk_en: Clock enable that allows square root operations when the port is asserted high.
• data[ ]: Floating-point input data.
• result[ ]: Square root output port for the floating-point result.
• zero: Zero port. Asserted when the value of the result[] port is 0.
• nan: NaN port. Asserted when an invalid square root occurs, such as negative numbers or NaN inputs.
• overflow: Overflow port. Asserted when the result of the square root exceeds or reaches infinity.
| THE RESULT OF SIMULATION AND TESTING
| FPGA Resource Usage Summary
Based on the designs of the three algorithms above, we implement them on the FPGA. Here we show the resource utilization of LUTs and logic elements, and the fan-out analysis, for each design. Note that we choose 12 sets of uniform random numbers as inputs to the central limit algorithm. Tab. 2 shows the logic element usage in normal mode and arithmetic mode. The design of the central limit algorithm requires fewer arithmetic resources than the others because it does not involve logarithm or trigonometric operations, whereas the Box-Muller algorithm uses considerably more resources. Tab. 3 gives the fan-out results. The maximum fan-out and total fan-out of the Box-Muller algorithm are much larger than those of the others. The polarization decision and central limit algorithms give similar maximum and total fan-out values. However, the central limit algorithm shows a larger average fan-out than the polarization decision algorithm.
| Statistical Analysis
To verify our designs of the three algorithms, we import the generated random numbers into MATLAB for statistical analysis and compare the results with random numbers generated by the MATLAB function 'randn' [28]. The number of sampled random numbers is 1,000,000 in all cases. Fig. 6 shows the histograms of the random numbers generated by the Box-Muller, polarization decision, and central limit algorithms. All sets of random numbers visibly follow the Gaussian profile. The Box-Muller and polarization decision methods generate two sets of random numbers simultaneously, i.e., the α set and the β set. The quantity of random numbers generated by the polarization decision algorithm is smaller than for the others, since the two uniform random numbers $U_1$ and $U_2$ are rejected if $U_1^2 + U_2^2 > 1$. A more robust way to assess the accuracy of the Gaussian random numbers is null hypothesis testing, which determines what outcomes of a study would lead to a rejection of the null hypothesis at a pre-specified level of significance [27]. We show the results of three such tests in the following, i.e., the Chi-square goodness-of-fit test, the Anderson-Darling test, and the Kolmogorov-Smirnov test.
| The Null Hypothesis Test
The Chi-squared test is a statistical test whose null hypothesis states that the frequency distribution of specific events observed in a sample is consistent with a particular theoretical distribution. The theoretical distribution here is the Gaussian distribution, and the test statistic is compared against the chi-squared distribution. This test is suitable for unpaired data from large samples [19]. The significance level is set to 5% here.
The results are presented in Tab. 4. For the central limit method, the null hypothesis is rejected. This indicates that its approximation of the Gaussian is poor, especially in the tails [33]. To improve the accuracy of the central limit method, one has to increase the number of uniform random numbers used for the approximation (see Sec. 4). However, a large number of uniform random numbers also constitutes a computational challenge. Thus, the central limit algorithm is not ideally suited to contemporary GRN generation. The Box-Muller method and the polarization method both pass the Chi-squared test. Also, the β set of Box-Muller shows a smaller P value than the one generated by MATLAB. It indicates that the quality of the GRNs generated by the Box-Muller algorithm in the FPGA is higher than that of the MATLAB software. Furthermore, the Anderson-Darling test and the Kolmogorov-Smirnov test are also applied here for estimation.
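As a rough illustration of how such a goodness-of-fit check can be reproduced outside MATLAB, the sketch below computes the one-sample Kolmogorov-Smirnov statistic against the standard normal distribution and applies the usual asymptotic 5% decision rule, rejecting when the statistic exceeds $1.358/\sqrt{n}$. The function names are hypothetical, and this is not the procedure used to produce the tables in this paper.

```python
import math
import random


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(samples, cdf=normal_cdf):
    """Largest distance between the empirical CDF of the samples and `cdf`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # compare against the empirical CDF just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_reject(samples, c_alpha=1.358):
    """Asymptotic decision at the 5% level: reject if D > c_alpha / sqrt(n)."""
    return ks_statistic(samples) > c_alpha / math.sqrt(len(samples))
```

Feeding this function samples that are uniform on (0, 1) leads to a clear rejection, since none of them are negative, while samples from a correct Gaussian generator produce a statistic that shrinks as the sample size grows.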
The Anderson-Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution [30], while the Kolmogorov-Smirnov test is a nonparametric test of the equality of continuous probability distributions that can be used to compare a sample with a reference probability distribution [24]. When applied to (
| DISCUSSION
The two groups of Gaussian random numbers generated through the Box-Muller and polarization decision algorithms have been shown to be well distributed and of high quality. To apply our GRN design to a QKD system, one additional electrical modulator is required. Through the modulator, the phase and magnitude signals, which follow the Gaussian profile, are modulated. The resulting signal is regarded as pseudo quantum states, which are usually generated by expensive optical devices. In a QKD system, high-speed and efficient communication between two parties is required. The parallel processing ability of the FPGA is an advantage in this case. Since the user can determine the hardware structure of the FPGA, it can be programmed to process more data in fewer clock cycles, so that high-speed communication is achieved. To guarantee the security of communication, a truly random signal is usually required. However, the GRNs generated by the FPGA can be regarded as true GRNs when their period is long enough. Because of the extensive and flexible resources of the FPGA, the period of the GRNs can be extremely long. Of course, an extended period requires a more capable FPGA, which in turn requires more investment.
| CONCLUSION
Quantum Key Distribution is the process of using quantum communication to establish a shared key between two parties. The absolute security and effective communication of a quantum communication system can be guaranteed by a good Gaussian random number generator with high speed and a long random period. In this paper, we propose a possible scheme whose results are shown to be satisfactory and which lays the foundation for subsequent work. We conclude that:
1. The unfixed hardware structure of the FPGA provides users with a parallel processing solution and makes the FPGA superior to the microprocessor in many areas. The FPGA is an ideal solution for QKD processing, which requires handling large amounts of data at high speed.
2. Among the three conventional GRN algorithms, the Gaussian random numbers generated through the polarization decision algorithm show higher quality than the others.
3. FPGA floating-point IP cores can be easily modified and integrated with other modules. They are appropriate choices for implementing complex mathematical operations in hardware.
ACKNOWLEDGEMENT
AL acknowledges the support of the National Undergraduate Innovation Program 2016. We are grateful to Prof. Guochun Wan and Prof. Meisong Tong of the Department of Electronics and Information Engineering, Tongji University. We are extremely thankful and indebted to them for sharing their expertise and for the sincere and valuable guidance and encouragement extended to us.
