Abstract-This work presents a simple study of stochastic arithmetic complex-number operators for addition and multiplication. Their usage is demonstrated by the design of a sum-of-products circuit. As the stochastic complex-number operators need more random control streams than stochastic rational-number operators, we optimized the number of random generators used in the real circuit. The final sum-of-products circuit contains two LFSRs; we therefore analyzed the impact of the choice of the LFSR seeds on the quality of the calculated results. Using an exhaustive search over the LFSR state space, we were able to reduce the output RMSE by 34% in comparison with the choice of seeds equally spaced over the LFSR state space.
I. INTRODUCTION
Stochastic arithmetic circuits work with numbers represented by streams of random bits, over which all computations are performed. In such circuits, a number is represented by the mean value of a single-bit stochastic data stream instead of by its binary representation, as in conventional parallel arithmetic. This brings the following advantages:
1) low-cost implementation in terms of occupied silicon area: logic circuits calculating arithmetic operations are simplified down to single gates performing bit-wise logic operations on the stochastic bit streams [1];
2) tolerance to soft errors: a single bit flip in the stochastic stream does not ruin the calculation, since an occasional bit flip is not statistically significant in a long stream of bits [2];
3) reduced design and verification effort and more reusable hardware compared with parallel arithmetic datapaths [2], because the implementation of the operators is nearly independent of the precision of the processed numbers and the arithmetic operators are simple to design.
Further, accuracy can be traded off against computation time [1].
On the other hand, this concept also has some disadvantages. The most important one is the low bandwidth: each additional bit of precision requires an exponential increase in the processing time [1].
Thanks to its advantages, stochastic arithmetic is an interesting concept. Indeed, many publications present applications and various building blocks: stochastic number generators [1], [3], arithmetic operators for rational numbers [1], [4], digital filtering [5], [6], more complicated functions [7], [8], [9], [10], and others.
The objective of this work is to present the design of basic complex-number stochastic arithmetic operators. A demonstration design calculating a sum of products is then implemented as a Matlab model as well as in VHDL at the RTL level on the Spartan-6 FPGA platform. Since the final implementation contains two LFSR-based random number generators, we also analyzed the influence of the choice of the generator seed on the precision of the calculated results. Using an exhaustive search over the LFSR state space, we were able to reduce the output RMSE by 34% in comparison with the choice of seeds equally spaced over the LFSR state space.
II. METHODS

A. Complex Number Representation
A complex number $z = \Re(z) + i\,\Im(z)$, where $\Re(z), \Im(z) \in (-V; V)$, is represented as a stochastic complex number (SCN) by two stochastic bit streams (SBS) carried by two lines, $W_r$ (real part) and $W_i$ (imaginary part). The real and imaginary parts of the number are represented by the probabilities $P_r$ and $P_i$ that the respective line is at logic 1. Each line is encoded using the single-line bipolar representation (form
we denote an SBS where the probability of logic 1 is equal to p; $\Re(z)$ and $\Im(z)$ also denote the stochastic numbers representing the real and imaginary parts of z.
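The representation above can be illustrated with a minimal Python sketch. It assumes the usual bipolar mapping x = V(2p - 1) between a value x in (-V; V) and the probability p of logic 1; the function names and the use of Python's `random` module in place of a hardware generator are illustrative choices, not taken from the paper.

```python
import random

def encode_bipolar(x, V, length, rng):
    """Return a list of 0/1 bits whose mean encodes x in (-V, V).

    Assumed bipolar mapping: p = (x / V + 1) / 2, i.e. x = V * (2p - 1).
    """
    p = (x / V + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode_bipolar(bits, V):
    """Estimate the encoded value from the fraction of ones in the stream."""
    p = sum(bits) / len(bits)
    return V * (2.0 * p - 1.0)

def encode_complex(z, V, length, rng):
    """Encode z as two streams: W_r (real part) and W_i (imaginary part)."""
    return (encode_bipolar(z.real, V, length, rng),
            encode_bipolar(z.imag, V, length, rng))

def decode_complex(w_r, w_i, V):
    """Recover an estimate of z from the two streams."""
    return complex(decode_bipolar(w_r, V), decode_bipolar(w_i, V))

rng = random.Random(0)  # software stand-in for a hardware RNG
w_r, w_i = encode_complex(0.5 - 0.25j, V=1.0, length=4096, rng=rng)
z_hat = decode_complex(w_r, w_i, V=1.0)
```

The estimate `z_hat` approaches the encoded value only statistically; longer streams give a smaller error.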
B. Stochastic Complex Number Generation
The device used to generate stochastic numbers is called the Stochastic Number Generator (SNG), see 
C. Summer and Multiplier
For rational numbers, a stochastic weighted summer operator is presented in [1]. This operator can easily be extended to the complex case by duplicating it for the real and imaginary parts, see Fig. 2a. The summer calculates $z_s = 0.5 z_1 + 0.5 z_2$. Since the inputs lie in (-V; +V), the sum lies in (-2V; +2V), so the result is multiplied by 0.5 to prevent overflow. The summer therefore needs two control SBS, one for the real and one for the imaginary multiplexer.
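Complex number multiplication $z_m = 0.5 z_1 z_2$ is performed using the well-known relationships
$$\Re(z_m) = 0.5\,[\Re(z_1)\Re(z_2) - \Im(z_1)\Im(z_2)], \qquad \Im(z_m) = 0.5\,[\Re(z_1)\Im(z_2) + \Im(z_1)\Re(z_2)] \qquad (1)$$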
The complex multiplier is built from two rational-number summers and four multipliers presented in [1]. Because of the summers, it needs control SBS, while the rational-number stochastic multiplier does not need any. In the summer as well as in the multiplier, the multiplexer control streams must be independent of the stochastic streams at the data inputs of the summing multiplexers [11].
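The summer and multiplier described in this subsection can be sketched in Python as follows. The sketch assumes V = 1, the standard bipolar constructions (a multiplexer with a p = 0.5 select stream as the weighted summer, an XNOR gate as the multiplier, and an inverter for negation), and statistically independent software-generated streams. All function names are hypothetical; the exact gate-level structure of the paper's Fig. 2 is not reproduced here.

```python
import random

def sbs(x, length, rng):
    """Bipolar SBS for x in (-1, 1): P(bit = 1) = (x + 1) / 2 (assumes V = 1)."""
    p = (x + 1.0) / 2.0
    return [1 if rng.random() < p else 0 for _ in range(length)]

def value(bits):
    """Decode a bipolar stream back to a number in [-1, 1]."""
    return 2.0 * sum(bits) / len(bits) - 1.0

def mux(a, b, sel):
    """Weighted summer: pass a when sel = 1, else b; P(sel = 1) = 0.5 gives 0.5a + 0.5b."""
    return [x if s else y for x, y, s in zip(a, b, sel)]

def xnor(a, b):
    """Bipolar multiplier for independent streams."""
    return [1 - (x ^ y) for x, y in zip(a, b)]

def inv(a):
    """Bipolar negation."""
    return [1 - x for x in a]

def complex_sum(z1, z2, sel_r, sel_i):
    """z_s = 0.5 z1 + 0.5 z2: one multiplexer per part."""
    (r1, i1), (r2, i2) = z1, z2
    return mux(r1, r2, sel_r), mux(i1, i2, sel_i)

def complex_mul(z1, z2, sel_r, sel_i):
    """z_m = 0.5 z1 z2: four multipliers and two summers, following Eq. (1)."""
    (r1, i1), (r2, i2) = z1, z2
    real = mux(xnor(r1, r2), inv(xnor(i1, i2)), sel_r)  # 0.5 (Re Re - Im Im)
    imag = mux(xnor(r1, i2), xnor(i1, r2), sel_i)       # 0.5 (Re Im + Im Re)
    return real, imag

N = 16384
rng = random.Random(1)
z1 = (sbs(0.6, N, rng), sbs(0.2, N, rng))    # z1 = 0.6 + 0.2i
z2 = (sbs(-0.4, N, rng), sbs(0.8, N, rng))   # z2 = -0.4 + 0.8i
sel_r, sel_i = sbs(0.0, N, rng), sbs(0.0, N, rng)  # control SBS with p = 0.5

s_r, s_i = complex_sum(z1, z2, sel_r, sel_i)
m_r, m_i = complex_mul(z1, z2, sel_r, sel_i)
z_s = complex(value(s_r), value(s_i))
z_m = complex(value(m_r), value(m_i))
```

Here the control streams are generated independently of the data streams, which is the condition the text requires for the multiplexer inputs.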
D. Parallel Complex Number Decoding
For conversion of the SCN back to the binary format, two ADaptive DIgital Elements (ADDIEs) [1] can be used in parallel, one for the real and one for the imaginary part. A single N-bit RNG serves both ADDIEs.
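As a rough illustration of the decoding step, the following Python sketch models one common formulation of an ADDIE: a saturating N-bit up/down counter whose content is compared with a random number to produce a feedback bit, and which counts up when the input is 1 and the feedback is 0, and down in the opposite case. This behavioral model, the class name `Addie`, the choice V = 1, and the software RNG are assumptions made for illustration; the element in [1] may differ in detail.

```python
import random

class Addie:
    """Behavioral model of an ADDIE as a saturating up/down counter (assumed form)."""

    def __init__(self, n_bits):
        self.top = (1 << n_bits) - 1
        self.counter = self.top // 2  # start near mid-scale

    def step(self, bit, rnd):
        """Process one input bit using the shared random number rnd."""
        feedback = 1 if self.counter > rnd else 0
        if bit == 1 and feedback == 0:
            self.counter = min(self.counter + 1, self.top)
        elif bit == 0 and feedback == 1:
            self.counter = max(self.counter - 1, 0)

    def estimate(self):
        """Bipolar value (V = 1) corresponding to the current counter state."""
        return 2.0 * self.counter / (self.top + 1) - 1.0

N_BITS, CYCLES, TAIL = 8, 30000, 10000
rng = random.Random(2)
real, imag = Addie(N_BITS), Addie(N_BITS)
acc = 0j
for t in range(CYCLES):
    b_r = 1 if rng.random() < 0.75 else 0   # real part 0.5  -> p = 0.75
    b_i = 1 if rng.random() < 0.25 else 0   # imag part -0.5 -> p = 0.25
    rnd = rng.randrange(1 << N_BITS)        # one random number shared by both ADDIEs
    real.step(b_r, rnd)
    imag.step(b_i, rnd)
    if t >= CYCLES - TAIL:                  # average the settled output
        acc += complex(real.estimate(), imag.estimate())
z_hat = acc / TAIL
```

Both elements consume the same random number in each cycle, mirroring the statement that one RNG serves both ADDIEs.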
E. Test Circuit And Number Of RNGs
A simple circuit calculating a complex sum of products (see Fig. 3) was used as the test circuit. While the implementation of the complex arithmetic operators is simple, they require more stochastic streams than the corresponding stochastic rational-number processing. The naive implementation in Fig. 3 would need 31 RNGs, see Table I, which would occupy a large silicon area in the final implementation.
First, to reduce the number of RNGs, we applied transformations based on Theorem 1 in [11] and shared one RNG among all CSNGs, see Fig. 4. A circular shift by 4 bits is used since it gives the smallest correlation between the generated stochastic numbers [11]. This way, only 1 RNG is needed for all the CSNGs.
Second, SBS are needed to control the multiplexers in the stochastic arithmetic operators. These bit streams must not be correlated with the data inputs of the multiplexers in the operators [11]. Here we can:
1) use the same SBS to drive both SN_I(a) and SN_R(a) in Fig. 2, so that only 7 SBS are needed, see Table I, mux RI shared;
2) use one SBS for all the operators in dashed box 1, see Fig. 3, a different SBS for all operators in box 2, and one more SBS for the adder in box 3, see Table I, mux levels. This reduces the number of necessary SBS to 3;
3) derive all three SBS driving boxes 1, 2, and 3 from LFSR1 by selecting, e.g., bits 0, 3, and 6 of its 8-bit output; the second generator, LFSR2, drives all the x CSNGs and, after a circular shift by 4 bits, all the y CSNGs. Both 8-bit LFSRs are identical but use different seeds, so only 2 RNGs are needed, see Table I, mux csng;
4) derive all three streams directly from the main LFSR driving the CSNGs, as in [11], so that only one RNG is needed in total, see line all shared in Table I. The MUXes are controlled by inverted bits 0 (box 1), 3 (box 2), and 5 (box 3) of the LFSR output. This configuration is the same as in [11] for the edge detection circuit.
Third, one more RNG is needed for the output ADDIEs. Here we can simply reuse the RNG driving the CSNGs, see Table I, addie shared.
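The idea of sharing one LFSR among several number generators via a circular shift can be sketched in Python as below. The sketch makes several assumptions that are not stated in the paper: the feedback taps (8, 6, 5, 4), a commonly tabulated maximal-length choice for 8 bits; a comparator-based generator that outputs 1 when the LFSR word is below the binary input; and the names `lfsr8_step`, `rotl8`, and `shared_sng`, which are hypothetical.

```python
def lfsr8_step(state):
    """Advance an 8-bit Fibonacci LFSR by one step (assumed taps 8, 6, 5, 4)."""
    bit = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) | bit) & 0xFF

def rotl8(value, k):
    """Circularly shift an 8-bit word left by k positions."""
    k %= 8
    return ((value << k) | (value >> (8 - k))) & 0xFF

def lfsr8_period(seed):
    """Count the steps until the LFSR returns to its seed."""
    state, steps = lfsr8_step(seed), 1
    while state != seed:
        state = lfsr8_step(state)
        steps += 1
    return steps

def shared_sng(x_level, y_level, seed, length):
    """Generate two bit streams from a single LFSR.

    The x stream compares the LFSR word directly with x_level; the y stream
    compares the word circularly shifted by 4 bits with y_level.
    """
    state = seed
    x_bits, y_bits = [], []
    for _ in range(length):
        x_bits.append(1 if state < x_level else 0)
        y_bits.append(1 if rotl8(state, 4) < y_level else 0)
        state = lfsr8_step(state)
    return x_bits, y_bits

SEED = 0b10000000
period = lfsr8_period(SEED)
x_bits, y_bits = shared_sng(x_level=128, y_level=64, seed=SEED, length=period)
```

Running the generator for one full LFSR period makes each stream's number of ones depend only on its comparison level, since the circular shift merely permutes the visited words.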
III. RESULTS AND DISCUSSION
The architectural options were evaluated using the RMSE measure calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_r} \sum_{n=1}^{N_r} \left| z_{\mathrm{calc},n} - z_{\mathrm{ideal},n} \right|^2}$$
where $z_{\mathrm{calc},n}$ is the stochastic circuit output in run n, $z_{\mathrm{ideal},n}$ is the ideal output expected from the circuit, and $N_r$ is the number of runs with different input x and y data. The lower the RMSE value, the better.
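For reference, the RMSE measure can be computed as in the following Python sketch. It assumes the usual root-mean-square definition with the error of each run taken as the modulus of the complex difference between the calculated and ideal outputs; the function name `rmse` is illustrative.

```python
import math

def rmse(z_calc, z_ideal):
    """Root-mean-square error between calculated and ideal complex outputs.

    z_calc[n] and z_ideal[n] are the circuit output and the expected output
    in run n; the error of a run is the modulus of their difference
    (assumed definition).
    """
    if len(z_calc) != len(z_ideal) or not z_calc:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(c - i) ** 2 for c, i in zip(z_calc, z_ideal))
    return math.sqrt(total / len(z_calc))

# Two runs, each off by exactly one unit in the imaginary part.
example = rmse([1 + 1j, 0 + 0j], [1 + 0j, 0 + 1j])
```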
A. Number of RNGs
First, we had to evaluate all the options for reducing the number of RNGs needed by the circuit. To do this, we ran the following experiments.
Sharing multiplexer control: Two experiments were run to evaluate the impact of shared MUX control, see Table II, mux RI shared and mux levels. No degradation of the RMSE was observed. Sharing random streams between the real and imaginary multiplexers in the datapath does not influence the overall precision of the circuit.
Sharing RNGs for the CSNGs: The experiment y shifted was run to check how sharing the RNG among the CSNGs influences the result of the stochastic computation. While an 8-bit LFSR was used as the RNG for all the CSNGs (either with or without the circular shift), the multiplexers were driven by random streams generated by the Matlab rand function so that the results of this experiment and the naive one can be compared directly. The achieved RMSE (see Table II) is better than for the naive experiment, as an LFSR is used instead of pseudorandom numbers.
Sharing LFSR between MUX and CSNGs: The possibility of sharing the LFSR between the MUXes and the CSNGs was evaluated, see Table II, mux csng and all shared. All RNGs in the Matlab model are here already implemented as LFSRs; LFSR1 has a seed of 10000000 and LFSR2 a seed of 11110000. The mux csng option achieved better performance than the all shared option. The RMSE of the all shared option is likely hampered by the correlation between the MUX control stream and the MUX input data, a corollary of Theorem 1 in [11]. For this reason, we chose the mux csng option with two LFSRs.
Finally, we ran a simulation of the circuit with the ADDIEs driven by the LFSR used to drive all the CSNGs, see Table II, addie shared. No significant degradation of performance was observed.
Figure caption: RMSE dependence on the LFSR2 seed plotted over the distance computed from the LFSR state sequence (e.g., 50 on the x axis means that the RMSE corresponds to the LFSR2 seed that is the 50th state of LFSR1 after its reset to 10000000).
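The notion of seed distance along the LFSR state sequence, and an exhaustive search over it, can be sketched in Python as follows. The LFSR taps (8, 6, 5, 4) are an assumed maximal-length choice, the function names are hypothetical, and `toy_cost` is only a placeholder: in the actual study the cost of a seed would be the RMSE obtained by simulating the whole circuit, which is not modeled here.

```python
def lfsr8_step(state):
    """Advance an 8-bit Fibonacci LFSR by one step (assumed taps 8, 6, 5, 4)."""
    bit = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) | bit) & 0xFF

def seeds_by_distance(seed1):
    """Map each distance d = 1..254 to the state LFSR1 reaches d steps after seed1."""
    seeds, state = {}, seed1
    for d in range(1, 255):
        state = lfsr8_step(state)
        seeds[d] = state
    return seeds

def exhaustive_seed_search(seed1, cost):
    """Try every candidate LFSR2 seed and return (distance, seed) with the lowest cost."""
    candidates = seeds_by_distance(seed1)
    best_d = min(candidates, key=lambda d: cost(seed1, candidates[d]))
    return best_d, candidates[best_d]

def toy_cost(seed1, seed2):
    """Placeholder cost, NOT the circuit RMSE: prefers seeds differing in 4 bits."""
    return abs(bin(seed1 ^ seed2).count("1") - 4)

SEED1 = 0b10000000
candidates = seeds_by_distance(SEED1)
best_distance, best_seed = exhaustive_seed_search(SEED1, toy_cost)
```

With an 8-bit LFSR the search space contains only 254 candidate seeds for LFSR2, so evaluating every one of them is feasible even when each evaluation is a full circuit simulation.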
The only way to decorrelate the outputs of the two identical LFSRs is to use different seeds. Using seeds to decorrelate LFSR outputs is a widely known technique (e.g., [6], [9], [8]), and the usual approach (e.g., [8]) is to choose seeds equally spaced over the LFSR state space.
IV. CONCLUSIONS
We presented the design of complex-number stochastic operators as a simple extension of the already used rational-number operators. Complex-number processing circuits require more RNGs than rational-number ones because the multiplier contains adders and because both the real and imaginary channels must be processed; we therefore reduced the number of RNGs using Theorem 1 from [11]. After the optimizations, a solution utilizing 2 LFSRs was chosen, since its RMSE is about three times lower than that of the circuit with only one LFSR (0.500 vs 1.535).
An exhaustive search was performed to find the best seed for LFSR2. The dependence of the output RMSE on the distance between the seeds of LFSR1 and LFSR2, measured along the LFSR sequence, is not monotonic; by choosing the LFSR2 seed at the global minimum of the RMSE, we were able to further reduce the output RMSE from 0.473 to 0.312 (by 34%). The exhaustive search also outperformed the commonly used choice of equally spaced seeds by 36% (RMSE of 0.49 vs the final 0.312).
Finally, the test circuit was implemented in VHDL at the RTL level according to Fig. 4 and verified against its Matlab model. The design was implemented in an xc6slx25-3 Xilinx Spartan-6 FPGA, occupying 160 flip-flops and 160 LUTs, with an operating frequency of 133 MHz (no pipelining was applied).
