We present the design of stochastic computing systems based on sequential logic to implement arbitrary polynomial functions. Stochastic computing is an emerging alternative computing paradigm that performs arithmetic operations on real-valued data represented as random bitstreams using digital logic gates. Stochastic computing systems are capable of realizing complex mathematical operations using a small number of hardware resources by expressing the computation in terms of probabilities. Moreover, the stochastic representation of data using random bitstreams is extremely robust against bit errors. We present a systematic approach to implement arbitrary polynomial functions in stochastic computing using sequential logic, and compare our approach against prior conventional and stochastic implementations.
INTRODUCTION
CMOS technology has been instrumental in the proliferation of digital computing with submicron CMOS transistors realizing complex computing systems operating at GHz clock frequencies. However, concerns about increasing power dissipation and declining reliability in recent nanometer scale CMOS technologies have prompted research into unconventional computing paradigms. Stochastic computing [2] is an emerging alternative to conventional computing that represents data as random bitstreams, and performs arithmetic operations using digital logic gates by expressing the computation in terms of probabilities. Stochastic computing systems, in general, have lower hardware requirements than conventional systems, and the random bitstream representation improves the fault tolerance of the computation. We present the design of stochastic computing systems based on sequential logic to implement arbitrary polynomial functions, and compare our proposed approach against prior conventional and stochastic implementations.
As previously mentioned, stochastic computing relies on random bitstreams to represent data. In particular, a real number x (0 ≤ x ≤ 1) is represented in a stochastic computing system as a sequence of random bits, where each bit in the sequence is logic '1' with a probability x. The random bitstream representation of numbers enables us to express a computation in terms of probabilities. As an example, consider a two-input AND gate driven by independent random bitstreams A and B representing real numbers a and b.
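$$\mathrm{Prob}(A = 1) = a, \qquad \mathrm{Prob}(B = 1) = b$$
The real number c represented by the output random bitstream C of the AND gate is given by,
$$c = \mathrm{Prob}(C = 1) = \mathrm{Prob}(A = 1 \text{ and } B = 1) = \mathrm{Prob}(A = 1)\,\mathrm{Prob}(B = 1) = ab$$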
A two-input AND gate, therefore, performs multiplication in stochastic computing. Similarly, a 2:1 multiplexer performs scaled addition, as depicted in Figure 1. The stochastic implementations of addition and multiplication, therefore, require far fewer hardware resources than a conventional implementation. The simple illustrations likewise extend to more complex arithmetic operations in image processing [8, 5] and digital communications [11].

The random bitstream representation of numbers, moreover, is extremely robust against bit errors. Consider an M-bit weighted binary number system, representing 2^M fixed-point numbers, 0.b1 b2 ... bM. An error of magnitude 1/2 results when the most significant bit b1 of a fixed-point number is corrupted by noise. In contrast, the random bitstream representation is unweighted. Specifically, a random bitstream of length L, [b1 b2 ... bL], represents the number
$$\frac{1}{L}\sum_{i=1}^{L} b_i$$
Therefore, a smaller error of magnitude 1/L results if a bit in the random bitstream is erroneous.
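As an illustration of the AND-gate multiplication and multiplexer-based scaled addition described above, the following is a minimal software sketch. The function names, the stream length, the seed, and the convention that the multiplexer passes its first input when the select bit is '1' are assumptions made for this example; they are not taken from Figure 1.

```python
import random


def bitstream(p, length, rng):
    """Return a list of bits, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def value(bits):
    """Decode an unweighted bitstream as the fraction of 1s."""
    return sum(bits) / len(bits)


def stochastic_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams; estimates a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def stochastic_scaled_add(a_bits, b_bits, s_bits):
    """2:1 multiplexer; estimates s * a + (1 - s) * b.

    Assumed convention: the output copies a when the select bit is 1.
    """
    return [a if s else b for a, b, s in zip(a_bits, b_bits, s_bits)]


rng = random.Random(0)   # arbitrary fixed seed for repeatability
LENGTH = 100_000         # long stream so the estimates are tight

A = bitstream(0.6, LENGTH, rng)
B = bitstream(0.5, LENGTH, rng)
S = bitstream(0.5, LENGTH, rng)

product = value(stochastic_multiply(A, B))          # estimates 0.6 * 0.5
scaled_sum = value(stochastic_scaled_add(A, B, S))  # estimates 0.5*0.6 + 0.5*0.5
```

The estimates fluctuate around 0.30 and 0.55 respectively, with a spread that shrinks as the stream length grows.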
The high fault tolerance and low hardware overhead have fueled a growing interest in stochastic computing with an increasing number of successful stochastic implementations of artificial neural networks [4] , digital filters [6] , and image processing systems [8, 5] . We present the design of stochastic computing systems based on sequential logic to realize arbitrary polynomial functions.
PRIOR RESEARCH
Polynomials are an important class of functions that frequently arise in numerous scientific disciplines, and are often employed to approximate other complicated functions such as the trigonometric functions. Specifically, a polynomial function f(x) of degree n is an expression of the form,
$$f(x) = \sum_{i=0}^{n} a_i x^i \tag{4}$$
where x is known as the input variable and ai are the (n + 1) polynomial coefficients.
A previous stochastic implementation of polynomial functions was presented in [9] , where the authors described a method to realize arbitrary polynomial functions using purely combinational logic circuits. However, a serious drawback of their proposed architecture (Figure 2 ) is the requirement of multiple independent random bitstreams for the input variable x. Concretely, their architecture requires at least n independent random bitstreams representing the input variable x, in order to implement a polynomial function of degree n. The linear dependence of the input interface complexity on the polynomial degree presents a significant hardware overhead as multiple independent random bitstreams ought to be generated corresponding to the input variable x, thereby restricting the practical applicability of the approach to low degree polynomials.
In contrast, we propose sequential logic circuits to implement polynomial functions in stochastic computing. Specifically, we present finite state machines (FSMs) in the form of binary counters that realize arbitrary polynomial functions. Our proposed approach requires an FSM containing at least (n + 1) states, and a single random bitstream for the input variable x, in order to implement a polynomial function of degree n. With a binary encoding of the FSM states, the linear dependence of the number of states on the polynomial degree translates into only a logarithmic dependence in the number of memory elements. As an example, a 7th-degree polynomial function is realized by an 8-state FSM in our approach that only requires 3 flip-flops. However, of greater significance is the massive simplification in the input interface, and the corresponding reduction in the auxiliary circuitry that is required to generate the multiple independent input random bitstreams in the purely combinational solution. To summarize, our proposed approach realizes an arbitrary polynomial function of degree n using a single random bitstream for the input variable x that drives an FSM containing at least (n + 1) states.
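The logarithmic growth in memory elements noted above can be checked with a short calculation. The sketch below assumes the minimum FSM size of (n + 1) states and a plain binary state encoding; the function name is hypothetical.

```python
def flip_flops_for_degree(n):
    """Flip-flops needed to binary-encode an FSM with n + 1 states."""
    states = n + 1
    # (states - 1).bit_length() equals ceil(log2(states)) for states >= 2
    return max(1, (states - 1).bit_length())


# Degree-7 polynomial: 8 states, hence 3 flip-flops, as in the text.
degree_7_flip_flops = flip_flops_for_degree(7)
```

Doubling the degree adds roughly one flip-flop, whereas the combinational approach would need twice as many independent input bitstreams.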
BERNSTEIN POLYNOMIALS
The theory of Bernstein polynomials [7] forms the bedrock of the stochastic implementations of polynomial functions. In particular, the Bernstein basis polynomials of degree n are a family of (n + 1) polynomials given by,
$$B_{i,n}(x) = \binom{n}{i} x^i (1 - x)^{n-i}, \quad 0 \le i \le n \tag{5}$$
and a linear combination of the Bernstein basis polynomials, gn(x), is called a Bernstein polynomial of degree n,
$$g_n(x) = \sum_{i=0}^{n} b_i B_{i,n}(x) \tag{6}$$
where bi are known as the (n + 1) Bernstein coefficients.

Figure 2: A stochastic implementation of a degree n polynomial using combinational logic [9].
In general, a power-form polynomial of degree n, as depicted in Equation (4), has an equivalent representation as a Bernstein polynomial of degree n, where the polynomial coefficients ai are related to the Bernstein coefficients bi as,
$$b_i = \sum_{j=0}^{i} \frac{\binom{i}{j}}{\binom{n}{j}}\, a_j, \quad 0 \le i \le n$$
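The conversion from power-form coefficients to Bernstein coefficients of the same degree can be sketched in a few lines using the standard identity b_i = Σ_{j≤i} [C(i, j) / C(n, j)] a_j. The function names and the cubic used as input are illustrative choices for this example, and exact rational arithmetic is used so that the result can be compared without rounding.

```python
from fractions import Fraction
from math import comb


def power_to_bernstein(a):
    """Convert power-form coefficients a[0..n] to Bernstein coefficients."""
    n = len(a) - 1
    return [
        sum(Fraction(comb(i, j), comb(n, j)) * a[j] for j in range(i + 1))
        for i in range(n + 1)
    ]


def bernstein_eval(b, x):
    """Evaluate the Bernstein polynomial with coefficients b at x."""
    n = len(b) - 1
    return sum(b[i] * comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(n + 1))


def power_eval(a, x):
    """Evaluate the power-form polynomial with coefficients a at x."""
    return sum(c * x**i for i, c in enumerate(a))


# Illustrative cubic: f(x) = 1/4 + (9/8) x - (15/8) x^2 + (5/4) x^3
a = [Fraction(1, 4), Fraction(9, 8), Fraction(-15, 8), Fraction(5, 4)]
b = power_to_bernstein(a)   # [1/4, 5/8, 3/8, 3/4], all inside [0, 1]
```

For this particular cubic every Bernstein coefficient already lies in the unit interval, so no further processing would be needed before a stochastic implementation.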
The Bernstein coefficients for an arbitrary polynomial, however, seldom lie inside the unit interval [0, 1] as required for a stochastic implementation. Fortunately, by elevating the degree of the Bernstein polynomial representation, we can express a power-form polynomial of degree n as a Bernstein polynomial of degree q (q ≥ n) with the Bernstein coefficients bi ∈ [0, 1]. In fact, due to a theorem [10], a power-form polynomial f(x) of degree n can be converted into a Bernstein polynomial of degree q (q ≥ n) with the Bernstein coefficients inside the unit interval [0, 1] if f is identically 0 or identically 1, or if,
$$0 < f(x) < 1 \ \ \forall x \in (0, 1), \qquad 0 \le f(0),\, f(1) \le 1$$
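The details of the conversion method are provided in [9]. Following a successful conversion of a power-form polynomial to a Bernstein polynomial, we proceed with the stochastic implementation of the Bernstein representation by generating the Bernstein basis polynomials (Equation 5) and forming a linear combination of the basis polynomials with the Bernstein coefficients (Equation 6).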
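To make the degree elevation step concrete, the sketch below applies the standard degree-elevation identity for Bernstein coefficients, b'_i = (i / (m + 1)) b_{i-1} + (1 - i / (m + 1)) b_i, until every coefficient falls in [0, 1]. This is one textbook way to carry out the elevation and is not claimed to be the exact procedure of [9]; the function names, the degree cap, and the example coefficients are assumptions.

```python
from fractions import Fraction


def elevate_degree(b):
    """Raise a degree-m Bernstein representation to degree m + 1."""
    m = len(b) - 1
    elevated = []
    for i in range(m + 2):
        left = b[i - 1] if i >= 1 else Fraction(0)
        right = b[i] if i <= m else Fraction(0)
        w = Fraction(i, m + 1)
        elevated.append(w * left + (1 - w) * right)
    return elevated


def elevate_until_unit(b, max_degree=64):
    """Elevate repeatedly until all coefficients lie in [0, 1]."""
    while not all(0 <= c <= 1 for c in b):
        if len(b) - 1 >= max_degree:
            raise ValueError("coefficients still outside [0, 1] at the degree cap")
        b = elevate_degree(b)
    return b


# Degree-3 Bernstein coefficients of f(x) = 3x - 8x^2 + 6x^3 (illustrative);
# the third coefficient is negative, so elevation is required.
b3 = [Fraction(0), Fraction(1), Fraction(-2, 3), Fraction(1)]
b_unit = elevate_until_unit(b3)   # degree 5: [0, 3/5, 2/5, 0, 0, 1]
```

The elevated representation describes the same function, only over a larger basis, so the FSM grows from 4 to 6 states in this example.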
MARKOV CHAINS
Before describing our stochastic implementation of polynomial functions, we present a brief review of Markov chains. Markov chains are crucial to the understanding of sequential logic based stochastic computing systems because an FSM with random bitstream inputs is mathematically equivalent to a Markov chain.
A Markov chain is a random process that models probabilistic transitions between states on a state space. We restrict our attention to finite-state Markov chains since only finite memory systems are physically realizable.
Consider a finite-state Markov chain with a state space S containing (n + 1) states,
$$S = \{0, 1, \ldots, n\}$$
Let Xt denote the state of the Markov chain at time instant t. The next state of the Markov chain, Xt+1, depends only on Xt, and is independent of the past system history,
$$\Pr(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = \Pr(X_{t+1} = j \mid X_t = i) = p_{i,j}, \quad \forall t,\ \forall i, j \in S$$
The probability that a Markov chain transitions from a state i to another state j in the state space is known as the transition probability, and is denoted as pi,j.
In contrast to the transition probabilities pi,j, ∀i, j ∈ S, which capture the probabilistic transitions between the states, the stationary probabilities πi, ∀i ∈ S of a Markov chain determine the probability of observing the states at equilibrium. Concretely, starting from a random initial state, a Markov chain reaches equilibrium after making a number of state transitions, at which point the probability of observing the Markov chain in a particular state i becomes a constant πi, independent of the initial state. Because the chain occupies exactly one state at any time, the stationary probabilities sum to one,
$$\sum_{i=0}^{n} \pi_i = 1 \tag{12}$$
The relationship between the stationary probabilities πi and the transition probabilities pi,j of a Markov chain, in general, is complicated. However, by restricting a Markov chain to a linear arrangement of states with only nearest neighbor transitions, as illustrated in Figure 3, a powerful set of detailed balance conditions becomes available. Such a chain is known as a Reversible Markov chain. The detailed balance conditions relate the stationary probabilities of neighboring states to the transition probabilities between the states. Specifically,
$$\pi_i\, p_{i,i+1} = \pi_{i+1}\, p_{i+1,i}, \quad 0 \le i \le n - 1 \tag{14}$$
We notice that the stationary probabilities of neighboring states in a Reversible Markov chain only depend on the ratio of the transition probabilities between the states. To summarize, the stationary probabilities of a Reversible Markov chain are conveniently obtained by applying Equation 12 and the detailed balance conditions listed in Equation 14. We next describe a specific Reversible Markov chain construct where the stationary probabilities equal the Bernstein basis polynomials. 
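The procedure summarized above, chaining the detailed balance conditions and then normalizing so that the stationary probabilities sum to one, can be sketched as follows. The function name and the small three-state example are hypothetical, and exact fractions are used so the result is easy to verify by hand.

```python
from fractions import Fraction


def stationary_from_detailed_balance(up, down):
    """Stationary probabilities of a nearest-neighbor chain.

    up[i]   is the transition probability from state i to i + 1,
    down[i] is the transition probability from state i + 1 to i,
    for i = 0 .. n - 1.
    """
    # Unnormalized weights relative to state 0: w[i+1] = w[i] * up[i] / down[i]
    weights = [Fraction(1)]
    for u, d in zip(up, down):
        weights.append(weights[-1] * Fraction(u) / Fraction(d))
    total = sum(weights)   # normalization: probabilities must sum to one
    return [w / total for w in weights]


# Three states; each upward transition is twice as likely as the reverse one.
pi = stationary_from_detailed_balance(
    up=[Fraction(1, 2), Fraction(1, 2)],
    down=[Fraction(1, 4), Fraction(1, 4)],
)   # [1/7, 2/7, 4/7]
```

Only the ratios up[i] / down[i] enter the result, which is the property exploited in the next section.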
THE STOCHASTIC INTEGRATOR
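Consider the (n + 1)-state Reversible Markov chain depicted in Figure 4, whose nearest neighbor transition probabilities are pi,i+1 = x(n − i)/n and pi,i−1 = (1 − x)i/n. Using Equation 14, the stationary probabilities of the Reversible Markov chain are related to the transition probabilities as,
$$\frac{\pi_{i+1}}{\pi_i} = \frac{p_{i,i+1}}{p_{i+1,i}} = \frac{n - i}{i + 1} \cdot \frac{x}{1 - x}, \quad 0 \le i \le n - 1$$
Let us select π0 as the reference stationary probability.
$$\pi_1 = \binom{n}{1} \frac{x}{1 - x}\, \pi_0, \quad \pi_2 = \binom{n}{2} \left(\frac{x}{1 - x}\right)^{2} \pi_0, \quad \ldots, \quad \pi_n = \binom{n}{n} \left(\frac{x}{1 - x}\right)^{n} \pi_0$$
Invoking Equation 12,
$$\sum_{i=0}^{n} \pi_i = \pi_0 \sum_{i=0}^{n} \binom{n}{i} \left(\frac{x}{1 - x}\right)^{i} = \pi_0 \left(1 + \frac{x}{1 - x}\right)^{n} = \frac{\pi_0}{(1 - x)^{n}} = 1$$
and solving for the reference stationary probability π0,
$$\pi_0 = (1 - x)^{n}$$
Therefore, the stationary probabilities of the Reversible Markov chain in Figure 4 are given as,
$$\pi_i = \binom{n}{i} x^{i} (1 - x)^{n - i}, \quad 0 \le i \le n$$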
Comparing with Equation 5, we conclude that the stationary probabilities of the Reversible Markov chain in Figure 4 form the Bernstein basis polynomials of degree n.
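This conclusion can be checked numerically. The sketch below builds the transition matrix of an (n + 1)-state nearest-neighbor chain under the assumption that the upward and downward transition probabilities are x(n − i)/n and (1 − x)i/n, which are the values consistent with the comparator description of the stochastic integrator, and verifies in exact arithmetic that the Bernstein basis polynomials are left unchanged by one step of the chain. All names and the test point are illustrative.

```python
from fractions import Fraction
from math import comb


def bernstein_basis(n, x):
    """Values of the n + 1 Bernstein basis polynomials of degree n at x."""
    return [comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(n + 1)]


def transition_matrix(n, x):
    """Transition matrix with assumed nearest-neighbor probabilities."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        up = x * Fraction(n - i, n)        # i -> i + 1 (zero at i = n)
        down = (1 - x) * Fraction(i, n)    # i -> i - 1 (zero at i = 0)
        if i < n:
            P[i][i + 1] = up
        if i > 0:
            P[i][i - 1] = down
        P[i][i] = 1 - up - down            # remain in state i
    return P


def is_stationary(pi, P):
    """True if pi * P == pi, i.e. pi is a stationary distribution of P."""
    size = len(pi)
    return all(
        sum(pi[i] * P[i][j] for i in range(size)) == pi[j] for j in range(size)
    )


n, x = 4, Fraction(1, 3)
pi = bernstein_basis(n, x)
P = transition_matrix(n, x)
check = is_stationary(pi, P)   # exact comparison, no rounding involved
```

Because the arithmetic is rational, the check is an identity test rather than a numerical approximation.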
A hardware realization of the Reversible Markov chain in Figure 4 is a stochastic integrator [3] with parameter n. As illustrated in Figure 5, a stochastic integrator consists of an up/down counter, a binary comparator, and a source of random numbers. The input to the stochastic integrator is a random bitstream X representing the input variable x.
The up/down counter of a stochastic integrator with parameter n stores binary numbers from 0 through n in unit integer increments, and saturates at the maximum or minimum count. A summary of the up/down counter operation is presented in Table 1 .
The binary comparator compares the present counter state with a random number sampled from a uniform distribution on the integers {1, 2, ..., n}, and produces a logic '1' bit in its output random bitstream Z if the counter state equals or exceeds the random number. Thus, if the present counter state is i, 0 ≤ i ≤ n, the probability of the comparator output to be logic '1' is given by,
$$\Pr(Z = 1 \mid cnt = i) = \frac{i}{n}$$
Consequently, the probability that the up/down counter increments to i + 1 at the next clock becomes,
$$p_{i,i+1} = \Pr(X = 1)\,\Pr(Z = 0) = x \left(1 - \frac{i}{n}\right)$$
Similarly, the probability of the counter state to decrement to i − 1 is given by,
$$p_{i,i-1} = \Pr(X = 0)\,\Pr(Z = 1) = (1 - x)\,\frac{i}{n}$$
where we assume that the random bitstreams X and Z are independent. Comparing with the transition probabilities of the Reversible Markov chain in Figure 4, we ascertain that a stochastic integrator with parameter n implements the Bernstein basis polynomials of degree n. Specifically, the probability of observing counter state i at equilibrium equals the Bernstein basis polynomial Bi,n(x) (Equation 5).
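A behavioral simulation gives a quick sanity check of this claim. The sketch below models the counter and comparator in software, assuming that the counter increments when X = 1 and the comparator output is 0, decrements when X = 0 and the comparator output is 1, and holds otherwise. That update rule is inferred from the transition probabilities discussed above rather than copied from Table 1, and the function name, seed, and run length are arbitrary.

```python
import random
from math import comb


def simulate_integrator(n, x, steps, seed=0):
    """Return the fraction of clock cycles spent in each counter state."""
    rng = random.Random(seed)
    count = 0
    visits = [0] * (n + 1)
    for _ in range(steps):
        visits[count] += 1
        x_bit = rng.random() < x          # input bitstream X
        r = rng.randint(1, n)             # uniform on {1, ..., n}
        z_bit = count >= r                # comparator output Z
        if x_bit and not z_bit:
            count += 1                    # never fires at count == n
        elif z_bit and not x_bit:
            count -= 1                    # never fires at count == 0
    return [v / steps for v in visits]


n, x = 3, 0.5
occupancy = simulate_integrator(n, x, steps=200_000)
expected = [comb(n, i) * x**i * (1 - x) ** (n - i) for i in range(n + 1)]
```

With this rule the saturation at 0 and n needs no special handling, since the comparator output is always 0 in state 0 and always 1 in state n.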
HARDWARE ARCHITECTURE
As previously mentioned, a stochastic implementation of a Bernstein polynomial representation consists of,
1. generating the Bernstein basis polynomials (Equation 5), and
2. forming a linear combination of the Bernstein basis polynomials with the Bernstein coefficients (Equation 6).
We have demonstrated in the previous section that a stochastic integrator with parameter n generates the Bernstein basis polynomials of degree n. The remaining task of forming the linear combination is performed by an (n + 1):1 multiplexer. The inputs to the multiplexer are random bitstreams Bi, 0 ≤ i ≤ n representing the (n + 1) Bernstein coefficients bi, while the multiplexer select input is driven by the count, cnt, of the up/down counter of the stochastic integrator. As the counter states are mutually exclusive and exhaustive, the probability of the output random bitstream Y of the multiplexer to be logic '1' is obtained by invoking the law of total probability as,
$$\Pr(Y = 1) = \sum_{i=0}^{n} \Pr(Y = 1 \mid cnt = i)\,\Pr(cnt = i) = \sum_{i=0}^{n} b_i\, B_{i,n}(x)$$
Comparing with Equation 6, we conclude that our proposed architecture, depicted in Figure 6, realizes an arbitrary polynomial of degree n expressed as a Bernstein polynomial. Furthermore, our stochastic implementation of polynomial functions (Figure 6) using sequential logic requires a single random bitstream for the input variable x, compared to the combinational implementation (Figure 2) that relies on multiple independent random bitstreams.
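The complete datapath, a stochastic integrator whose count selects one of the Bernstein coefficient bitstreams, can be modeled behaviorally as below. This is a software sketch under stated assumptions rather than a description of the circuit in Figure 6: the counter update rule (increment when X = 1 and the comparator output is 0, decrement in the opposite case) is inferred, only the selected coefficient bit is generated on each cycle, and the coefficient values are an illustrative cubic with Bernstein coefficients 2/8, 5/8, 3/8, and 6/8.

```python
import random


def stochastic_polynomial(b, x, length, seed=0):
    """Estimate the Bernstein polynomial with coefficients b at x.

    b[i] must lie in [0, 1]; length is the number of clock cycles.
    """
    n = len(b) - 1
    rng = random.Random(seed)
    count = 0
    ones = 0
    for _ in range(length):
        # Multiplexer: output the coefficient bitstream selected by count.
        if rng.random() < b[count]:
            ones += 1
        # Stochastic integrator update (assumed rule, see lead-in).
        x_bit = rng.random() < x
        z_bit = count >= rng.randint(1, n)
        if x_bit and not z_bit:
            count += 1
        elif z_bit and not x_bit:
            count -= 1
    return ones / length


# Bernstein form of f(x) = 1/4 + (9/8) x - (15/8) x^2 + (5/4) x^3
coefficients = [0.25, 0.625, 0.375, 0.75]
estimate_mid = stochastic_polynomial(coefficients, 0.5, 200_000)   # f(0.5) = 0.5
estimate_one = stochastic_polynomial(coefficients, 1.0, 200_000)   # f(1.0) = 0.75
```

A single input bitstream for x drives the whole computation; the only other randomness comes from the comparator and the coefficient streams.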
EXPERIMENTAL RESULTS
Finally, we compare the fault tolerance and the hardware costs of our proposed stochastic implementation of polynomial functions using sequential logic with the combinational implementation proposed in [9], and a conventional architecture based on the weighted binary number system.

Figure 6: The proposed stochastic implementation of a degree n polynomial using sequential logic.
Fault Tolerance
In order to compare the fault tolerance of the different polynomial implementations, we consider the gamma correction function. Gamma correction is a nonlinear operation used to efficiently encode image luminance by applying a power-law function to the input intensity. We construct a conventional implementation of the gamma correction function using binary multipliers and adders performing arithmetic on 8-bit data, and generating 8-bit results. The stochastic implementations, in contrast, were realized using random bitstreams of length 256, as the 8-bit weighted binary representation is equivalent to a random bitstream representation of length 256.

We performed simulations on a test image by adding random noise to the input data and measuring the average output error. Specifically, we randomly corrupted a given percentage of the input data bits, computed the gamma correction function using the conventional and the stochastic implementations, and measured the average percent error between the result and the correct output in the absence of noise. Figure 7 presents the results of the simulation experiments. The results clearly indicate that the stochastic implementations outperform the conventional implementation in terms of fault tolerance. The stochastic implementations do produce small errors in the absence of noise (0% input error) as a result of the random nature of the computation. However, unlike the conventional implementation, the average output error of the stochastic implementations grows gradually as the input error increases, confirming the robustness of the unweighted random bitstream representation over the weighted binary number system.

The simulation results also reveal that between the two stochastic implementations, the combinational implementation [9] is slightly better at tolerating random noise than our proposed sequential implementation. The lower fault tolerance of the sequential implementation is a result of the correlation between successive counter states of the stochastic integrator. Specifically, the counter state at time t is not independent of the state at time t − 1. Consequently, unlike the combinational implementation (Figure 2), successive bits in the output random bitstream Y of the sequential implementation (Figure 6) exhibit correlation. The correlation in the random bitstream Y leads to a higher variance in the real-valued output gn(x). However, the multiplexer and the Bernstein coefficient random bitstreams Bi mitigate the impact of correlation in the counter states, as evidenced by the small difference in the fault tolerance of the combinational and the sequential implementations. The average output error of the sequential implementation is within 1% of the combinational implementation (Figure 7). Moreover, the gamma correction results at an input error of 15% in Figure 8 reinforce our assertion, as the output images of the combinational and sequential implementations are virtually indistinguishable, whereas the output image of the conventional implementation is barely recognizable.
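The difference between weighted and unweighted representations under bit corruption can be illustrated with a much simpler experiment than the one reported above. The sketch below corrupts the stored representation of a number directly, flipping each bit independently with a given probability, and compares the resulting average absolute error for an 8-bit binary word and a 256-bit stream. It does not model gamma correction or any of the three implementations, and every name and parameter in it is an assumption made for this example.

```python
import random


def flip_bits(bits, rate, rng):
    """Flip each bit independently with probability rate."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def binary_value(bits):
    """Weighted decoding: bits[0] is the MSB with weight 1/2."""
    return sum(b / 2 ** (k + 1) for k, b in enumerate(bits))


def stream_value(bits):
    """Unweighted decoding: the fraction of 1s."""
    return sum(bits) / len(bits)


def average_errors(rate, trials=4000, m=8, seed=0):
    """Mean absolute error of both representations at a given flip rate."""
    rng = random.Random(seed)
    length = 2**m
    binary_total = 0.0
    stream_total = 0.0
    for _ in range(trials):
        k = rng.randrange(length)                      # encodes k / 2^m
        word = [(k >> (m - 1 - j)) & 1 for j in range(m)]
        stream = [1] * k + [0] * (length - k)          # bit order is irrelevant
        binary_total += abs(binary_value(flip_bits(word, rate, rng)) - binary_value(word))
        stream_total += abs(stream_value(flip_bits(stream, rate, rng)) - stream_value(stream))
    return binary_total / trials, stream_total / trials


binary_err, stochastic_err = average_errors(0.10)
```

In this simplified setting the unweighted stream shows the smaller average error, since no single bit carries more than 1/256 of the value.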
Hardware Costs
To compare the hardware costs of the different polynomial implementations, we follow the approach outlined in [9]. We synthesize and optimize our proposed architecture in Figure 6 using ABC [1], and map the optimized implementation to a circuit consisting of only inverters, and 2-input NAND, NOR, AND, OR, XOR, and XNOR logic gates. We assume that each logic gate requires a unit area and contributes a unit delay to the circuit. Moreover, each memory element is composed of six 2-input NAND logic gates. Table 2 compares the area and delay of our proposed sequential implementation with the combinational implementation [9]. The presence of memory elements does increase the area of our sequential implementations, but reduces the critical path delay, resulting in a better area-delay product as the Bernstein polynomial degree n increases (n = 5 and n = 6 in Table 2).

Table 2: Area, delay, and area-delay product of the combinational implementation [9] and the proposed sequential implementation.

      combinational implementation [9]      proposed sequential implementation
n     area     delay     product            area     delay     product
3     22       10        220                45       7         315
4     40       17        680                75       10        750
5     49       20        980                79       9         711
6     58       20        1160               85       9         765

Finally, we compare the different polynomial implementations using the area-delay-latency product metric. The latency of a stochastic implementation corresponding to an M-bit weighted conventional implementation is 2^M, since a random bitstream of length 2^M is equivalent to an M-bit binary number. Table 3 presents the area-delay-latency products of the different polynomial implementations. The ratios of the area-delay-latency products in the final two columns of Table 3 indicate that our proposed sequential implementation better competes with the conventional implementation as the Bernstein polynomial degree n and the data resolution M increase.
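The area-delay products in Table 2 and the latency factor used in the area-delay-latency metric follow from simple arithmetic, sketched below. The area and delay values are those listed in Table 2; the variable and function names are hypothetical, and no Table 3 values are reproduced.

```python
# (n, combinational area, combinational delay, sequential area, sequential delay)
TABLE_2 = [
    (3, 22, 10, 45, 7),
    (4, 40, 17, 75, 10),
    (5, 49, 20, 79, 9),
    (6, 58, 20, 85, 9),
]


def area_delay_products(rows):
    """Map each degree n to (combinational product, sequential product)."""
    return {n: (ca * cd, sa * sd) for n, ca, cd, sa, sd in rows}


def stochastic_latency(m_bits):
    """Bitstream length matching the resolution of an M-bit binary number."""
    return 2**m_bits


products = area_delay_products(TABLE_2)
# Degrees at which the sequential implementation has the smaller product.
sequential_wins = [n for n, (comb_adp, seq_adp) in products.items() if seq_adp < comb_adp]
```

Multiplying either product by stochastic_latency(M) gives the area-delay-latency figure for a stochastic implementation at M-bit resolution.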
To conclude, we have presented a stochastic implementation of polynomial functions using sequential logic. Our proposed approach is able to realize arbitrary polynomials with a better area-delay-latency product than the previously described combinational implementation [9] . Moreover, our architecture requires a single random bitstream for the input variable, unlike the combinational implementation that relies on multiple independent random bitstreams. The fault tolerance of our sequential implementation, albeit slightly lower than the combinational implementation, is still better than the conventional implementation. Overall, our proposed sequential stochastic computing system is a competitive solution to realize arbitrary polynomial functions.
ACKNOWLEDGMENTS
This material is based on work supported by the National Science Foundation under grant CCF-1408123.
