Design of a Time Delay Reservoir Using Stochastic Logic: A Feasibility Study
Cory Merkel
Information Directorate
Air Force Research Laboratory
Rome, NY 13441
Email: cory.merkel.1@us.af.mil
Abstract—This paper presents a stochastic logic time delay
reservoir design. The reservoir is analyzed using several
metrics, including kernel quality, generalization rank, and
performance on simple benchmarks, and is also compared to a
deterministic design. A novel re-seeding method is introduced to
reduce the adverse effects of stochastic noise; the method may
also be applied to other stochastic logic reservoir computing
designs, such as echo state networks. Benchmark results indicate
that the proposed design performs well on noise-tolerant
classification problems, but more work is needed to improve the
stochastic logic time delay reservoir’s robustness for regression
problems.
Index Terms—Reservoir computing, time delay reservoir,
stochastic logic, artificial neural networks.
I. INTRODUCTION
Reservoir computing (RC) is proving to be a powerful
machine learning technique for regression, classification, and
forecasting of time series data. Introduced in the early 2000s
by Jaeger [1] and Maass [2], RC is a type of neural network
with an untrained recurrent hidden layer called a reservoir. A
major computational advantage of RC is that the output of the
network can be trained on the reservoir states using simple
regression techniques, without the need for backpropagation.
In the last decade and a half, RC has been successful in a
number of wide-ranging application domains such as image
classification [3], biosignal processing [4], and optimal control
[5]. In some domains, RC has outperformed state-of-the-art
techniques and is often easier to implement than methods such
as Kalman filtering or long short-term memory. Beyond its
computational advantages, one of the main attractions of RC
is that it can be implemented efficiently in hardware with low
area and power overheads.
Today, there are three major categories of RC. The first is
echo state networks (ESNs) [1], where reservoirs are imple-
mented using a recurrent network of continuous (e.g. logistic
sigmoid) neurons. The second category, referred to as liquid
state machines (LSMs) [2], utilizes recurrent connections of
spiking (e.g. leaky integrate and fire) neurons. A challenge
in both of these categories is routing. A reservoir with $H$
neurons will have up to $H^2$ connections, potentially creating
a large area and power overhead. A third category of RC,
called time delay reservoirs (TDRs) [6], avoids this overhead
by time multiplexing resources. TDRs utilize a single neuron
and a delayed feedback to create reservoirs with either a chain
topology or even full connectivity (see Supplemental Material
of [6]).
Besides a reduction in routing overhead, TDRs have two
key advantages over ESNs and LSMs. First, adding additional
neurons to the reservoir is trivial and amounts to increasing
the delay in the feedback loop. Second, TDRs can use any
dynamical system to implement their activation function and
can easily be modeled via delay differential equations. This
second point is particularly useful since it means that TDRs
can be implemented using a variety of technologies. For ex-
ample, in [6], Appeltant et al. used a Mackey-Glass oscillator,
which models a number of physiological processes (e.g. blood
circulation), as the non-linear node in a TDR. In [7], a TDR
is demonstrated using a coherently driven passive optical cavity.
A TDR has also been implemented using a single XOR gate
with delayed feedback [8]. A common thread among all of
these implementations is that they are analog and some, such
as the photonic implementation, are still large prototypes that
have yet to be integrated into a chip. Aside from the higher
cost and design effort for analog implementations, they are
much more susceptible to noise, especially in RC, where the
system operates on the edge of chaos.
Digital RC designs, and digital circuits in general, have
much better noise immunity compared to analog implemen-
tations. There have been a number of digital designs proposed
for ESNs and LSMs, such as [9], but digital TDR designs
are presently scarce. One example is given in [10], where the
authors have implemented a Mackey-Glass-type TDR on an
FPGA. One of the challenges with digital implementations
is that the area cost can be high due to the requirement of
multipliers for input weighting and implementation of the
activation function. This is especially true if high precision is
required. However, not all applications require high precision.
An alternative design approach to conventional digital logic
is stochastic logic, where values are represented as stochastic
bit streams and characterized by probabilities. Stochastic logic
has previously been used to implement ESNs [11, 12]. In this
work, we explore the feasibility of implementing TDRs with
stochastic logic. To the best of the author’s knowledge, this is
the first paper discussing TDR implementation with stochastic
logic, and hopefully it will serve as a foundation for future
research in this area.
arXiv:1702.04265v1 [stat.ML] 13 Feb 2017
The rest of this paper proceeds as follows: Section II
provides background information on TDRs and stochastic
logic. Section III presents the stochastic logic TDR designed in
this work and discusses tuning of design parameters. Section
IV discusses the performance of the proposed TDR design on
two benchmark tasks: NARMA10 (regression) and sine/square
wave discrimination (classification). Section V concludes this
work.
II. BACKGROUND
A. Time Delay Reservoirs
RC makes use of a random recurrent neural network in order
to regress, forecast, or classify time series data. The basic
structure of an RC is shown in Figure 1. Time series inputs
$\mathbf{u}^{(p)}$ in the input layer are multiplied by a random
weight matrix $\mathbf{W}^{in}$ to give
$\mathbf{s}^{(p)} = \mathbf{W}^{in}\mathbf{u}^{(p)}$, which is then
used as the input to the reservoir layer. Here, the index $p$
denotes a discrete timestep. Within the reservoir layer, a number
of neurons (circles) are connected with each other through a
random weight matrix $\mathbf{W}^{res}$, so that the reservoir
state evolves as
$\mathbf{x}^{(p+1)} = f\left(\mathbf{W}^{res}\mathbf{x}^{(p)} + \mathbf{s}^{(p)}\right)$,
where $f$ is an activation function. The state of the reservoir
$\mathbf{x}^{(p)}$ is then connected to an output layer via a
third weight matrix $\mathbf{W}^{out}$, giving
$\hat{\mathbf{y}}^{(p)} = c\left(\mathbf{W}^{out}\mathbf{x}^{(p)}\right)$,
where $c$ is the output function. In this work, $c$ is an identity
function, such that the output is given directly by the product
of the output weight matrix and the reservoir state. The output
layer is trained such that the reservoir performs a particular
function (e.g. regression, classification) of the inputs as
$$\mathbf{W}^{out*} = \arg\min_{\mathbf{W}^{out}} \frac{1}{2}\sum_{p=1}^{m_{train}} \left(y^{(p)} - \hat{y}^{(p)}\right)^2, \qquad (1)$$
where $y^{(p)}$ is the expected output at timestep $p$ and $m_{train}$
is the size of the training set. In this work, this optimization
problem is solved via regularized least squares:
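$$\mathbf{W}^{out*} = \left(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I}\right)^{-1}\mathbf{X}^T\mathbf{Y}, \qquad (2)$$
where $\lambda$ is the regularization parameter, $\mathbf{I}$ is the
identity matrix, and $\mathbf{X}$ and $\mathbf{Y}$ are the matrices
composed of the reservoir states and the expected outputs
corresponding to the training set $\mathbf{U}_{train}$,
respectively. Note that the only parameters that are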
modified during training are the output weights Wout. The
random input and reservoir layers serve to randomly project
the input into a high-dimensional space which increases the
linear separability of inputs. In addition, the recurrent connec-
tions of the reservoir provide a short-term memory that allows
inputs to be integrated over time. This is critical to analyzing
data based on its behavior over multiple timesteps.
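The regularized least-squares readout in (2) can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: it assumes a scalar output, and the helper names `solve_linear`, `train_readout`, and `readout`, as well as the toy data, are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def train_readout(states, targets, lam):
    """Return W_out = (X^T X + lam*I)^-1 X^T Y for a scalar output.

    states  -- list of reservoir state vectors, one row of X per timestep
    targets -- list of expected outputs y^(p), one per timestep
    """
    h = len(states[0])
    xtx = [[sum(row[i] * row[j] for row in states) + (lam if i == j else 0.0)
            for j in range(h)] for i in range(h)]
    xty = [sum(row[i] * y for row, y in zip(states, targets)) for i in range(h)]
    return solve_linear(xtx, xty)


def readout(w_out, state):
    """Identity output function c: y_hat = W_out . x."""
    return sum(w * x for w, x in zip(w_out, state))


# Toy check: targets generated by a known linear map of 2-element states.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
Y = [2.0 * a - 3.0 * b for a, b in X]
w = train_readout(X, Y, lam=1e-9)
```

With a very small regularization parameter the recovered weights are close to the generating weights (2, -3); larger values of `lam` shrink them toward zero.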
A TDR (Figure 2) is a special type of RC that shares
resources in time to reduce routing overhead. A single
non-linear node (shown as an opaque black circle) provides the
activation function, analogous to the sigmoid and spiking
functions in the ESN and LSM, respectively. The activation
function can be any polynomial or transcendental function and
is sometimes governed by a delay differential equation (DDE). At
each timestep $p$, the reservoir’s input is sampled and held for
$H$ smaller timesteps, each of duration $\Omega$. For each
of the $H$ timesteps, the held input is multiplied by an input
weight $w^{in}_i$ and then added to a bias term $\theta_i$. The
weighted and biased input is added to the delayed state of the
reservoir node, $x^{(p-1)}_{\tau-H}$, where $\tau \geq H$ is the
delay of the feedback. In this work, $\tau = H + 1$. The sum is
then fed back into the activation function. In this way, $H$
components make up the reservoir’s state corresponding to each
sampled and held input. This approach is attractive because the
hardware implementation usually consists of a simple circuit and
a delay line, without the routing overhead associated with ESNs
and LSMs. However, TDRs are more restricted in terms of their
connectivity patterns and may require numerical simulation of
DDEs when implemented in software. For an instantaneous
activation function (e.g. $f$ settles within $\Omega$) and
$\tau = H + 1$, a TDR has a unidirectional ring topology.

Fig. 1. Overview of RC. Basic structure of an RC design, showing the
three layers: input, reservoir, and output.

Fig. 2. Overview of a TDR, showing the use of a delay line to create multiple
virtual reservoir nodes from one physical node.
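The sample-and-hold and delayed-feedback behavior described above can be illustrated with a short behavioral sketch. It assumes an instantaneous activation function and a delay line that starts at zero; the function name `run_tdr`, the example weights, and the choice of `sin(2s)` as the activation are illustrative assumptions rather than details taken from this work.

```python
import math
from collections import deque


def run_tdr(inputs, weights, biases, f, tau=None):
    """Simulate a TDR with one instantaneous node f and a feedback delay line.

    Returns one state vector (length H) per input sample.
    """
    h = len(weights)
    tau = h + 1 if tau is None else tau          # tau = H + 1, as in the text
    delay_line = deque([0.0] * tau, maxlen=tau)  # node outputs, oldest first
    states = []
    for u in inputs:                             # input is sampled and held
        state = []
        for i in range(h):                       # H virtual nodes per held input
            delayed = delay_line[0]              # output from tau sub-steps ago
            s = weights[i] * u + biases[i] + delayed
            x = f(s)
            delay_line.append(x)                 # maxlen drops the oldest entry
            state.append(x)
        states.append(state)
    return states


# Example with three virtual nodes and a hypothetical sin(2s) activation.
example = run_tdr([0.1, 0.2, 0.3],
                  weights=[0.5, -0.5, 0.25],
                  biases=[0.0, 0.0, 0.0],
                  f=lambda s: math.sin(2.0 * s))
```

Because the delay line is one element longer than the number of virtual nodes, node i at timestep p receives the output of node i-1 from timestep p-1, which is the unidirectional ring mentioned above.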
B. Stochastic Logic
The next section will present an efficient hardware imple-
mentation of a TDR using stochastic computing techniques.
Stochastic logic was pioneered by von Neumann [13] half a
century ago and was later adopted by the machine learning
community to reduce hardware complexity, power, and unre-
liability [14, 15]. At the heart of stochastic computing is the
stochastic representation of binary numbers. In a single-line
bipolar [15] stochastic representation, a $q$-bit 2’s complement
number $z \in \{-2^{q-1}, \ldots, -1, 0, 1, \ldots, 2^{q-1} - 1\}$ is
mapped to a Bernoulli process $Z$ of length $L$ [16, 17]:
$$\bar{z} \equiv \Pr(Z_r = 1) = \frac{z + 2^{q-1}}{2^q - 1} = \frac{\tilde{z} + 1}{2}, \qquad (3)$$
where the terms $\bar{z} \in [0, 1]$ and $\tilde{z} \in [-1, 1]$ are
defined for convenience and $r = 1, 2, \ldots, L$ is an index into
the stochastic bit stream. Converting from the binary to the
stochastic representation can be achieved using a random
number generator such as a linear feedback shift register
(LFSR) and a comparator. If the random number is less than
or equal to the value held in the register, then a logic 1 value
is produced on the output. Otherwise, the comparator output is
logic 0 [16]. As $L$ becomes large, the fraction of 1’s in the
stream approaches the probability in (3). Converting from a
stochastic representation back to a digital number can be
achieved by counting the number of 1’s and 0’s in the stream. By
initializing the counter to zero, adding ‘1’ every time a ‘1’ is
encountered in the stream, and subtracting ‘1’ every time a ‘0’
is encountered, the final counter value will be the 2’s
complement binary representation of the value encoded by the
bit stream.

Fig. 3. Basic logic gates implementing mathematical operations on single-line
bipolar stochastic bit streams.
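The two conversions can be sketched behaviorally as follows. This is an illustration under stated assumptions: Python's software PRNG stands in for the LFSR, the value is passed directly as $\tilde{z} \in [-1, 1]$ rather than as a $q$-bit integer, and the names `b2s` and `s2b` are chosen here for convenience.

```python
import random


def b2s(z_tilde, length, rng):
    """Encode z_tilde in [-1, 1] as a bipolar stochastic bit stream."""
    p_one = (z_tilde + 1.0) / 2.0         # Pr(Z_r = 1), right side of (3)
    # Comparator: emit 1 when the random draw falls below the threshold.
    return [1 if rng.random() < p_one else 0 for _ in range(length)]


def s2b(bits):
    """Decode with an up/down counter, then normalize to [-1, 1]."""
    counter = 0
    for bit in bits:
        counter += 1 if bit else -1       # +1 for a '1', -1 for a '0'
    return counter / len(bits)


rng = random.Random(0)                    # fixed seed for repeatability
stream = b2s(0.25, 10000, rng)
estimate = s2b(stream)                    # approaches 0.25 as length grows
```

The raw counter value is the encoded value scaled by the stream length, so dividing by the length recovers an estimate of $\tilde{z}$.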
One advantage of a stochastic representation is that several
mathematical operations become trivial to implement in
hardware. For example, consider the logic function
$Y_i = g(I_1, I_2, \ldots, I_n)$, where each input $I_j$ to the
function is mapped to a stochastic bit stream $Z_j$ and, therefore,
the output is also a stochastic bit stream. The probability that
the function evaluates to ‘1’ is given by
$$\Pr(Y_i = 1) = \sum_{I_1, I_2, \ldots} g(I_1, I_2, \ldots) \prod_{k=1}^{n} \Pr(Z_{ki} = I_k), \qquad (4)$$
which is a multivariate polynomial in $z_1, z_2, \ldots, z_n$ with
integer coefficients and powers no greater than 1. Implementations
for a number of stochastic logic operations are shown in Figure
3. Note that, in general, the implementation of stochastic logic
operations will be different for unipolar representations. For
example, in the case of a bipolar representation, multiplication
is implemented using an XNOR gate (see Figure 3). However,
for a unipolar representation, the same operation uses an AND
gate. Other basic operations such as negation and weighted
averaging are achieved using inverters and multiplexers,
respectively.
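The three bipolar operations named above (XNOR multiplication, inverter negation, and MUX weighted averaging) can be checked numerically with a small sketch. The helper names and the example values are assumptions for illustration, and a software PRNG replaces the hardware random source.

```python
import random


def encode(value, length, rng):
    """Bipolar encoding: Pr(bit = 1) = (value + 1) / 2 for value in [-1, 1]."""
    return [1 if rng.random() < (value + 1.0) / 2.0 else 0 for _ in range(length)]


def decode(bits):
    """Bipolar decoding: value = 2 * (fraction of ones) - 1."""
    return 2.0 * sum(bits) / len(bits) - 1.0


def xnor_multiply(a_bits, b_bits):
    """XNOR of two independent bipolar streams encodes the product."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]


def invert(bits):
    """An inverter negates the bipolar value."""
    return [1 - b for b in bits]


def mux_average(a_bits, b_bits, select_bits):
    """2:1 MUX: output a when select is 1, else b.

    If Pr(select = 1) = w, the output encodes w*a + (1 - w)*b.
    """
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


rng = random.Random(1)
L = 20000
a = encode(0.5, L, rng)
b = encode(-0.4, L, rng)
sel = [1 if rng.random() < 0.25 else 0 for _ in range(L)]  # unipolar weight

product = decode(xnor_multiply(a, b))       # close to 0.5 * -0.4 = -0.2
negated = decode(invert(a))                 # close to -0.5
blended = decode(mux_average(a, b, sel))    # close to 0.25*0.5 + 0.75*(-0.4)
```

The results are only approximate because each stream has finite length; the estimates tighten as `L` grows, and the product is only correct when the two input streams are statistically independent.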
In addition to the simple mathematical operations shown in
Figure 3, it will also be necessary for the stochastic logic RC
to implement a non-linear activation function. As indicated by
(4), this is trivial if the activation function is a polynomial.
However, if the activation function is not a polynomial (e.g. it
is a transcendental function), then one way to implement it is to
approximate it with a polynomial. Bernstein polynomials are a
good choice, since they can approximate any continuous function
on the unit interval (or any other closed interval) with arbitrary
precision, which was shown by Bernstein as part of a proof of the
Weierstrass approximation theorem [18, 19]. The Bernstein basis
polynomials of degree $n$ are defined as [19]
$$b_{k,n}(z) \equiv \binom{n}{k} z^k (1 - z)^{n-k}, \quad k = 0, 1, \ldots, n. \qquad (5)$$
A Bernstein polynomial of degree $n$ is defined as a linear
combination of the $n$th-degree Bernstein basis polynomials:
$$B_n(z) \equiv \sum_{k=0}^{n} \beta_k b_{k,n}(z). \qquad (6)$$
The coefficients $\beta_k$ are called the Bernstein coefficients.
Furthermore, the $n$th-degree Bernstein polynomial for a function
$f(z)$ is defined as
$$B_n(f; z) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) b_{k,n}(z). \qquad (7)$$
Bernstein showed that $B_n(f; z)$ approaches $f(z)$ uniformly on
$[0, 1]$ as $n$ approaches infinity.
It can also be shown that the set of Bernstein basis
polynomials $\{b_{k,n}\}$ of degree $n$ forms a basis for the space
of power-form polynomials with real coefficients and degree no
more than $n$. In other words, the power-form polynomial
$$p(z) = \sum_{i=0}^{n} a_i z^i \qquad (8)$$
can be written in the form of (6). In [20], it is shown that the
Bernstein coefficients can be obtained from the power-form
coefficients as
$$\beta_k = \sum_{i=0}^{k} \binom{k}{i} \binom{n}{i}^{-1} a_i. \qquad (9)$$
It is important to note that if $f$ in (7) maps the unit interval
to the unit interval, then $f(k/n)$ is also in the unit interval.
Similarly, Qian et al. [20] have proven that if $p$ in (8) maps the
unit interval to the unit interval, then Bernstein coefficients in
the unit interval can be obtained, possibly after raising the
degree of the Bernstein polynomial above $n$. Coefficients in the
unit interval are important because they can be represented
stochastically.
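As a worked instance of the power-form to Bernstein conversion in (9), the following sketch converts a small polynomial and evaluates the result using (5) and (6). The example polynomial and the function names are illustrative choices.

```python
from math import comb


def power_to_bernstein(a):
    """Convert power-form coefficients a_0..a_n to Bernstein coefficients (9)."""
    n = len(a) - 1
    return [sum(comb(k, i) / comb(n, i) * a[i] for i in range(k + 1))
            for k in range(n + 1)]


def bernstein_eval(beta, z):
    """Evaluate B_n(z) = sum_k beta_k * b_{k,n}(z), equations (5) and (6)."""
    n = len(beta) - 1
    return sum(b * comb(n, k) * z**k * (1.0 - z)**(n - k)
               for k, b in enumerate(beta))


a = [0.25, 0.0, 0.5]              # p(z) = 0.25 + 0.5 z^2, maps [0,1] into [0,1]
beta = power_to_bernstein(a)      # [0.25, 0.25, 0.75]
```

For this polynomial all three coefficients already lie in the unit interval at degree 2, so each one can be encoded directly as a stochastic bit stream.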
In summary, any non-polynomial (polynomial) function that
maps the unit interval to the unit interval can be approximated
by (written in the form of) a Bernstein polynomial with
coefficients that are also in the unit interval. To find the
coefficients for non-polynomial functions, one may form the
constrained optimization problem [17]:
$$\begin{aligned}
\underset{\{\beta_0 \ldots \beta_n\}}{\text{minimize}} \quad & \int_0^1 \left(f(z) - \sum_{k=0}^{n} \beta_k b_{k,n}(z)\right)^2 dz \\
\text{subject to} \quad & \beta_k \in [0, 1] \;\; \forall k = 0, 1, \ldots, n,
\end{aligned} \qquad (10)$$
which can be solved using numerical techniques.
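One possible numerical technique for (10) is projected gradient descent on a sampled version of the integral. The sketch below is an assumption about how the problem could be solved, not the method used in this work or in [17]; the sample count, step size, iteration count, and the test function `target` are all hypothetical choices.

```python
from math import comb, sin


def basis(n, z):
    """Bernstein basis values b_{0,n}(z) .. b_{n,n}(z)."""
    return [comb(n, k) * z**k * (1.0 - z)**(n - k) for k in range(n + 1)]


def mean_sq_error(beta, rows, targets):
    """Midpoint-rule estimate of the integral in (10)."""
    return sum((t - sum(b * r for b, r in zip(beta, row)))**2
               for row, t in zip(rows, targets)) / len(rows)


def fit_bernstein(f, n, samples=100, steps=500, rate=0.5):
    """Projected gradient descent on (10), clipping each beta_k to [0, 1]."""
    zs = [(j + 0.5) / samples for j in range(samples)]
    rows = [basis(n, z) for z in zs]
    targets = [f(z) for z in zs]
    beta = [min(1.0, max(0.0, f(k / n))) for k in range(n + 1)]  # start from (7)
    for _ in range(steps):
        grad = [0.0] * (n + 1)
        for row, t in zip(rows, targets):
            resid = sum(b * r for b, r in zip(beta, row)) - t
            for k in range(n + 1):
                grad[k] += 2.0 * resid * row[k] / samples
        # Gradient step followed by projection onto the unit box.
        beta = [min(1.0, max(0.0, b - rate * g)) for b, g in zip(beta, grad)]
    return beta, rows, targets


def target(z):
    """Hypothetical test function mapping [0, 1] into [0, 1]."""
    return 0.5 * (sin(2.0 * (2.0 * z - 1.0)) + 1.0)


beta, rows, targets = fit_bernstein(target, n=5)
start = [target(k / 5) for k in range(6)]
```

Starting from the coefficients $f(k/n)$ of (7) keeps the initial point feasible, and the clipping step keeps every iterate inside the unit interval, so the result can always be represented stochastically.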
Bernstein polynomials can be implemented in stochastic
logic using only an adder and a MUX [17]. Consider an adder
with $n$ inputs, each one a stochastic bit stream $Z_1 \ldots Z_n$.
Furthermore, let the bit streams be independent and identically
distributed, such that $z \equiv z_i = z_j \; \forall i, j$. Then, at
a particular time $t$, the adder will be adding $n$ bits, each one
being ‘1’ with probability $z$. Therefore, for their sum $V$ to be
a particular value $k$ requires that $k$ bits are ‘1’ and $n - k$
bits are ‘0’. The probability of this occurring is
$\binom{n}{k} z^k (1 - z)^{n-k}$. Now, connecting the adder’s output
to the select line of a MUX, with inputs equal to stochastic bit
streams encoding the Bernstein coefficients, results in the
Bernstein polynomial in (6).
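The adder-plus-MUX construction can be simulated bit by bit to confirm that the output probability matches (6). This is a behavioral sketch with assumed names and example coefficients; it uses unipolar probabilities and a software PRNG in place of hardware random sources.

```python
import random
from math import comb


def stochastic_bernstein(z, beta, length, rng):
    """Estimate B_n(z) with an n-input adder driving an (n+1):1 MUX.

    z and every beta_k are unipolar probabilities in [0, 1].
    """
    n = len(beta) - 1
    ones = 0
    for _ in range(length):
        # Adder: sum of n independent bits, each '1' with probability z.
        v = sum(1 for _ in range(n) if rng.random() < z)
        # MUX: the sum selects the stream that encodes coefficient beta_v.
        # Only the selected stream's bit is drawn, which is equivalent to
        # drawing all n + 1 coefficient bits and discarding the rest.
        out = 1 if rng.random() < beta[v] else 0
        ones += out
    return ones / length


def exact_bernstein(z, beta):
    """Direct evaluation of (6) for comparison."""
    n = len(beta) - 1
    return sum(b * comb(n, k) * z**k * (1.0 - z)**(n - k)
               for k, b in enumerate(beta))


rng = random.Random(2)
beta = [0.25, 0.25, 0.75]           # example coefficients in the unit interval
estimate = stochastic_bernstein(0.6, beta, 20000, rng)
exact = exact_bernstein(0.6, beta)  # 0.25 + 0.5 * 0.6**2 = 0.43
```

The estimate fluctuates around the exact value with a spread that shrinks roughly as the inverse square root of the stream length.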
III. STOCHASTIC LOGIC TDR DESIGN
The stochastic logic TDR designed in this work is shown
in Figure 4. The design is composed of three parts, which provide
input weighting, compute the non-linear activation function, and
hold the reservoir state. The input weighting stage takes in an
analog signal, converts it to a digital signal using an
analog-to-digital (A2D) converter, and converts that to a
stochastic bit stream using a binary-to-stochastic (B2S)
converter. In this design, the number of bits in the LFSRs in
each B2S is equal to the number of bits in the binary
representation of the input, $q$. The stochastic representation
of the input is then multiplied by the input weight using an XNOR
gate, as discussed in the last section. Then, the signal is mixed
with the delayed reservoir state using a mixing parameter
$\alpha$ and added to the input bias using MUXes. The non-linear
node estimates the non-linear activation function $f(s)$ using
Bernstein polynomials. Shift registers are used to delay the
non-linear node’s input in order to create multiple statistically
independent copies of the same stochastic bit stream. In this
design, the activation function implemented is
$$f(\tilde{s}) = \sin(\gamma\tilde{s}), \qquad (11)$$
where $\gamma$ is a frequency term. However, recall from the
discussion in Section II-B that the Bernstein polynomials used
here map the unit interval to the unit interval. By definition,
$\tilde{s}$ ranges from -1 to +1, and the sin function also ranges
from -1 to +1. In general, any function to be implemented by a
Bernstein polynomial has to be shifted and scaled so that the
portion of it that is used lies entirely in the unit square. In
the case of $\sin(\gamma\tilde{s})$, this is achieved by
transforming the function as
$$\bar{f}(\bar{s}) = \frac{f\left(\Delta s\,[\bar{s} - 0.5]\right) - f_{min}}{f_{max} - f_{min}}, \qquad (12)$$
where $\Delta s$ is the width of the domain of interest, $f_{max}$
is the maximum of the function on
$[-\frac{\Delta s}{2}, \frac{\Delta s}{2}]$, and $f_{min}$ is the
minimum of the function on the same interval. For the function in
(11), this results in the dotted curve in Figure 5. Also shown is
the stochastic logic approximation using Bernstein polynomials
with $L = 1000$ and $n = 5$. Notice that the features of the curve
around $\bar{s} = 0$ and $\bar{s} = 1$ are not reproduced well by
this approximation but could be by increasing $n$. However, it was
found that the approximation, which is similar to a logistic
sigmoid function, works well for the benchmarks explored in this
work.
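The shift-and-scale transform in (12) can be sketched as a small higher-order function. The grid-based estimate of the extrema and the name `shift_and_scale` are assumptions made for this illustration; the values $\gamma = 2$ and a domain width of 2 are example settings.

```python
import math


def shift_and_scale(f, delta_s, grid=1001):
    """Return f_bar from (12), mapping the unit interval into the unit interval.

    f_min and f_max are estimated on a uniform grid over
    [-delta_s/2, delta_s/2]; the grid size is an arbitrary choice.
    """
    points = [delta_s * (j / (grid - 1) - 0.5) for j in range(grid)]
    values = [f(s) for s in points]
    f_min, f_max = min(values), max(values)

    def f_bar(s_bar):
        return (f(delta_s * (s_bar - 0.5)) - f_min) / (f_max - f_min)

    return f_bar


gamma = 2.0
f_bar = shift_and_scale(lambda s: math.sin(gamma * s), delta_s=2.0)
midpoint = f_bar(0.5)     # sin(0) sits halfway between f_min and f_max
```

The returned function takes and produces values in the unit interval, so it is in the form required for fitting Bernstein coefficients in the unit interval.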
After the activation function is computed, the reservoir node
$x_i$ is converted back to a binary number using a
stochastic-to-binary (S2B) converter and then placed in a shift
register which holds the entire reservoir state $\mathbf{x}$.
Although the states could be stored in their stochastic
representation, storing them as binary values is more
area-efficient since $L \gg q$. Note that a control block is also
included in Figure 4 to emphasize that the sample-and-hold circuit
(inside the A2D), the B2S, and the S2B require a state machine
when implemented in hardware. However, all of the simulations in
this work are behavioral and implemented in MATLAB, so this block
was not explicitly required. The shift register serves as the
delay line shown in Figure 2. Note that training was not the focus
of this work and was performed using a non-stochastic
implementation of (2).
Next is the task of choosing the design parameters $\alpha$ and
$\gamma$. One way is to look at application-dependent metrics such
as accuracy, specificity, sensitivity, mean squared error, etc.,
and see how they vary over the parameter space via, e.g., a grid
search. Another way is to use metrics that capture features
such as the reservoir’s short-term memory capacity, ability
to linearly separate input data, capability of mapping similar
inputs to similar reservoir states (generalization), and
capability of mapping different inputs to distant reservoir states
(separation). These types of application-independent metrics
provide more insight into the effects of different parameter
choices on the TDR’s computing power than metrics like accuracy.
This work makes use of two such metrics: kernel quality (KQ) and
generalization rank (GR) [21]. KQ is calculated by driving the
reservoir with $H$ random input streams of length $m$. At the end
of each sequence, the reservoir’s final state vector is inserted
as a column into an $H \times H$ state matrix $\mathbf{X}$. Then,
KQ is equal to the rank of $\mathbf{X}$. It is desired to have
$\mathbf{X}$ be full rank, meaning that different inputs map to
linearly independent reservoir states. However, note that the
number of training patterns is usually much larger than $H$, so
$\mathrm{rank}(\mathbf{X}) = H$ does not mean that any training
dataset can be fit exactly. In fact, exact fitting, or
overfitting, is generally bad, since it means that the TDR (or any
machine learning algorithm) will not generalize well to novel
inputs. Therefore, another metric, GR, is used to measure the
TDR’s generalization capability. GR is calculated in a similar
way, except that all of the input vectors are identical except for
some random noise. GR is an estimate of the Vapnik-Chervonenkis
dimension [21], which is a measure of learning capacity. A
low GR is desirable. Therefore, to achieve good performance,
the difference KQ-GR should be maximized.
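The KQ and GR procedure can be sketched for a noise-free TDR as follows. Several details are assumptions made only for this illustration: the uniform input and weight distributions, the Gaussian input noise used for GR, zero biases, the input/feedback mixing rule, and a rank computed by Gaussian elimination with a fixed tolerance.

```python
import math
import random
from collections import deque


def final_state(inputs, weights, alpha, gamma):
    """Final state of a noise-free TDR with a sin(gamma*s) node and tau = H + 1."""
    h = len(weights)
    delay_line = deque([0.0] * (h + 1), maxlen=h + 1)
    state = [0.0] * h
    for u in inputs:
        for i in range(h):
            s = alpha * weights[i] * u + (1.0 - alpha) * delay_line[0]
            state[i] = math.sin(gamma * s)
            delay_line.append(state[i])
    return list(state)


def matrix_rank(rows, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]),
                    default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, len(m[0])):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def kq_and_gr(h, m, alpha, gamma, noise, rng):
    """Normalized kernel quality and generalization rank for one random TDR."""
    weights = [rng.uniform(-1.0, 1.0) for _ in range(h)]
    # KQ: H unrelated random input streams.
    kq_states = [final_state([rng.uniform(-1.0, 1.0) for _ in range(m)],
                             weights, alpha, gamma) for _ in range(h)]
    # GR: H noisy copies of one input stream.
    base = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    gr_states = [final_state([u + rng.gauss(0.0, noise) for u in base],
                             weights, alpha, gamma) for _ in range(h)]
    return matrix_rank(kq_states) / h, matrix_rank(gr_states) / h


rng = random.Random(3)
kq_off, gr_off = kq_and_gr(h=10, m=20, alpha=0.0, gamma=2.0, noise=0.01, rng=rng)
kq_on, gr_on = kq_and_gr(h=10, m=20, alpha=0.5, gamma=2.0, noise=0.01, rng=rng)
```

The state vectors are stored as rows here rather than columns, which does not change the rank. The measured rank depends on the chosen tolerance, so values from this sketch are only indicative.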
Figure 6 shows the KQ and GR metrics for the stochastic
logic TDR with $L = n = \infty$. The values are normalized to
$H$ and are averaged over 10 runs. In each subplot, the size
of the reservoir is $H = 50$, and $m = 50$. KQ is close to 0
for $\alpha = 0$. When $\alpha = 0$, the TDR does not accept any
new inputs, so if the initial TDR state is all zeros, then the
final state matrix will be a zero matrix, which has a rank of
zero. For larger $\alpha$ values, KQ becomes non-zero. When
$\gamma$ is small, the activation function is approximately
linear, which leads to a smaller KQ. In fact, it is likely that
the TDR operates in the deterministic phase for $\gamma < 1$. As
$\gamma$ becomes larger, KQ becomes equal to $H$. This is because
the non-linearity of the activation function increases with
$\gamma$, and results in the TDR operating within the chaotic
regime. The GR metric (Figure 6(b)) has similar behavior. When the
difference KQ-GR is taken (Figure 6(c)), a small region of optimal
$\alpha$ and $\gamma$ is observed. Values of $\alpha$ should be
somewhere between 0 and 1. If $\alpha$ is too large, then the TDR
will have no memory, and if $\alpha$ is too small, then it will
ignore inputs. Furthermore, if $\gamma$ is too large, then the TDR
will overfit the training data, and if $\gamma$ is too small, then
the TDR will not have enough dynamic behavior. Also note that the
optimal parameter values will have some application dependence. In
this work, parameter values of $\alpha = 0.2$ and $\gamma = 2$
were determined empirically to be the best for the studied
benchmarks. However, from Figure 6(c), it appears that this choice
is suboptimal with respect to KQ-GR. Therefore, one should be
cautious when using metrics such as GR and KQ and always consider
the behavior of the RC for a chosen set of applications.

Fig. 4. Stochastic logic TDR designed in this work.
[Block diagram with input weighting, non-linear node, reservoir state, and
control blocks. The input to the non-linear node is labeled
$s_i^{(p)} = \frac{\alpha w_i u^{(p)} + (1 - \alpha) x_{i^*}^{(p)} + \theta_i}{2}$.]

Fig. 5. Activation function and stochastic approximation implemented using
Bernstein polynomials ($n = 5$, $L = 1000$).
Studied next was the effect of the stochastic bit stream
length $L$ on the TDR metrics. Intuitively, one would expect
that a small value of $L$ would lead to both a large KQ and a
large GR, since the variance introduced by the stochastic
computation is $\propto 1/L$. Indeed, this is true. Figure 7 shows
the KQ and GR metrics for two cases. In the first case (no seed),
each LFSR was seeded only at the beginning of the simulation. This
resulted in KQ-GR equal to zero over all $L$ values, except
$L = 1$, where the noise was not large enough to modify the
stochastic representation. From the previous discussion, we
see that KQ-GR will eventually become non-zero as $L \to \infty$.
However, that would mean that the TDR may have to wait
an impractical number of clock cycles for each calculation.
Instead, the approach used in this work is to re-seed each
pseudo-random number generator (PRNG) for every reservoir node,
with a unique seed for that node. Although this does not eliminate
stochastic noise, it does keep the effect of the noise
approximately constant for each node. With re-seeding (Figure 7),
KQ-GR is non-zero for reasonable $L$ values such as 100 and 1000.
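The effect of re-seeding can be illustrated with a short sketch: when the generator is reset to a node-specific seed before each conversion, the same value at the same node always produces the same bit stream, so the noise pattern is fixed per node. The seeding rule `base_seed + node` and the function names are hypothetical, and a software PRNG stands in for the LFSR.

```python
import random


def encode(value, length, rng):
    """Bipolar bit stream for value in [-1, 1] using the supplied PRNG."""
    return [1 if rng.random() < (value + 1.0) / 2.0 else 0 for _ in range(length)]


def encode_node(value, node, length, base_seed=1234):
    """Re-seed the PRNG with a seed unique to this node before encoding."""
    rng = random.Random(base_seed + node)   # hypothetical seeding rule
    return encode(value, length, rng)


L = 100
# With re-seeding, node 3 sees the same noise pattern at every timestep.
first = encode_node(0.3, node=3, length=L)
second = encode_node(0.3, node=3, length=L)

# Without re-seeding, one free-running PRNG gives different noise each time.
free_rng = random.Random(1234)
free_first = encode(0.3, L, free_rng)
free_second = encode(0.3, L, free_rng)
```

In the re-seeded case the conversion error for a given node and value is repeatable, which is the sense in which the noise is kept approximately constant for each node.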
Fig. 6. Computational capability of the TDR for $L = \infty$. (a) KQ versus $\alpha$ and $\gamma$. (b) GR vs. $\alpha$ and $\gamma$. (c) KQ-GR vs. $\alpha$ and $\gamma$. In all cases, $H = 50$.

Fig. 7. KQ and GR vs. $L$ in the stochastic logic TDR. Results are shown for
the cases where the PRNG isn’t (no seed) and is (with seed) re-seeded for each
reservoir node.

IV. BENCHMARK RESULTS AND ANALYSIS
The stochastic logic TDR proposed in this work was
tested on two simple benchmarks: NARMA10 (regression) and
sine/square wave discrimination (classification). NARMA10 is
a standard benchmark used in RC research. Given a random
vector $\mathbf{u} \in [0, 0.5]^m$, the goal is to train the RC to
compute
$$y^{(p+1)} = 0.3y^{(p)} + 0.05y^{(p)}\sum_{k=0}^{9} y^{(p-k)} + 1.5u^{(p)}u^{(p-9)} + 0.1. \qquad (13)$$
In this work, the TDR was trained on a set of 1000 datapoints
and tested on an additional 1000. Figure 8 shows the normalized
mean square error (NMSE) of the TDR on the test data.
The NMSE is calculated as
$$\mathrm{NMSE} = \frac{\sum_p \left(y^{(p)} - \hat{y}^{(p)}\right)^2}{\sum_p \left(y^{(p)} - \langle \mathbf{y}_{train} \rangle\right)^2}, \qquad (14)$$
where $\langle\cdot\rangle$ is the arithmetic mean, and
$\mathbf{y}_{train}$ is the vector of the entire training sequence.
The plot shows an average over 10 runs, with error bars indicating
the standard deviation. It is observed that the NMSE is much
larger than the “ideal” case ($L = \infty$) until $L$ becomes very
large (e.g. $1 \times 10^4$). Such a large value of $L$ would give
the TDR a prohibitively large latency and may only be feasible if
the time constant of the input signal is very large (i.e. the
input changes slowly). It is possible to improve the NMSE by
adding more reservoir nodes, which can be observed as $H$ changes
from 50 to 100. However, in a digital implementation, each new
node requires additional hardware to store the additional
component of the reservoir state. This could become costly if $H$
is very large.
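The benchmark data in (13) and the error measure in (14) can be generated with a few lines. This sketch assumes that the first ten outputs are initialized to zero and that the denominator of the NMSE is a sum of squares; the function names and the seed are illustrative.

```python
import random


def narma10(u):
    """Generate the NARMA10 target sequence of (13) for input sequence u.

    The first ten outputs are set to zero, which is an assumed initial condition.
    """
    y = [0.0] * len(u)
    for p in range(9, len(u) - 1):
        y[p + 1] = (0.3 * y[p]
                    + 0.05 * y[p] * sum(y[p - k] for k in range(10))
                    + 1.5 * u[p] * u[p - 9]
                    + 0.1)
    return y


def nmse(y_true, y_pred, y_train):
    """Normalized mean square error of (14)."""
    mean_train = sum(y_train) / len(y_train)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean_train) ** 2 for t in y_true)
    return num / den


rng = random.Random(4)
u = [rng.uniform(0.0, 0.5) for _ in range(2000)]
y = narma10(u)
y_train, y_test = y[:1000], y[1000:]

# Baseline: predicting the training mean everywhere makes the numerator
# and denominator of (14) equal, which is a useful reference point.
baseline = [sum(y_train) / len(y_train)] * len(y_test)
score = nmse(y_test, baseline, y_train)
```

A trained readout is doing useful work only when its NMSE falls well below that of the constant baseline.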
To test its performance on classification tasks, the stochastic
logic TDR was trained to discriminate between sine and
square wave signals. As with the NARMA10 benchmark, 1000 training
and 1000 test cases were used. The training and test sequences
were created by randomly interposing segments of sine waves into
a square wave such that there was a 50% chance that any point in
the sequence was part of a sine (or square) wave. The results are
shown in Figure 9. As expected, at very small values of $L$, the
TDR gives classification accuracies that are close to random
chance. However, when $L$ is equal to 100, which is fairly small,
the TDR performs approximately as well as the deterministic
TDR. In fact, for $H = 50$, the accuracy of the stochastic TDR
surpasses that of the deterministic design. At first, this seems
counterintuitive, since smaller values of $L$ result in more
noise. However, the noise is actually acting as a regularizer in
this case, reducing the stochastic TDR’s ability to overfit.

Fig. 8. NARMA10 benchmark NMSE vs. $L$ using stochastic logic TDR for
reservoir sizes of 50 and 100. Lines marked $L = \infty$ correspond to ideal
implementations with no stochastic noise.

Fig. 9. Sine/square wave discrimination benchmark accuracy [%] vs. $L$ using
stochastic logic TDR for reservoir sizes of 50 and 100. Lines marked
$L = \infty$ correspond to ideal implementations with no stochastic noise.
V. CONCLUSIONS AND FUTURE WORK
This work studied a novel TDR design that uses stochastic
logic to perform weighting, biasing, and activation function
operations. The design is more flexible than previous approaches,
as it allows any activation function to be implemented
after it is properly shifted and scaled. Design parameters were
studied using kernel quality and generalization rank, and a method
for reducing the effect of stochastic noise using re-seeding
was proposed. The design was tested using the NARMA10 and
sine/square wave discrimination benchmarks. Results indicate
that the stochastic TDR does not perform well on high-precision
benchmarks, such as NARMA10, due to random noise.
However, for classification benchmarks, the stochastic TDR
is more area efficient than previous design approaches. This
paper provides a foundation for future research directions
related to stochastic logic TDRs. Some potentially fruitful
avenues include investigation of other activation functions
(e.g. Mackey-Glass), methods for reducing the complexity
of B2S converters (i.e. removing expensive comparators and
LFSRs), exploring methods for reducing stochastic noise,
and investigation of emerging memory technologies for more
efficient storage of reservoir states.
ACKNOWLEDGMENTS
The author is grateful to Nathan McDonald, Clare
Thiem, Lisa Loomis, and Ashley Prater for proofreading the
manuscript and providing helpful discussions.
The material and results presented in this paper have been
CLEARED (Distribution A) for public release, unlimited
distribution by AFRL, case number 88ABW-2016-6393. Any
opinions, findings and conclusions or recommendations ex-
pressed in this material are those of the author and do not
necessarily reflect the views of AFRL or its contractors.
REFERENCES
[1] H. Jaeger, “The echo state approach to analysing and
training recurrent neural networks-with an erratum note,”
Bonn, Germany: German National Research Center for
Information Technology GMD Technical Report, vol.
148, p. 34, 2001.
[2] W. Maass, T. Natschläger, and H. Markram, “Real-time
computing without stable states: A new framework
for neural computation based on perturbations,” Neural
computation, vol. 14, no. 11, pp. 2531–2560, 2002.
[3] A. Woodward and T. Ikegami, “A reservoir computing
approach to image classification using coupled echo state
and back-propagation neural networks,” in Proc. of 26th
Int. Conf. on Image and Vision Computing, Auckland,
New Zealand, November, 2011, pp. 543–458.
[4] D. Kudithipudi, Q. Saleh, C. Merkel, J. Thesing, and
B. Wysocki, “Design and analysis of a neuromemristive
reservoir computing architecture for biosignal process-
ing,” Frontiers in neuroscience, vol. 9, 2015.
[5] C.-Y. Tsai, X. Dutoit, K.-T. Song, H. Van Brussel,
and M. Nuttin, “Robust face tracking control of a
mobile robot using self-tuning Kalman filter and echo
state network,” Asian Journal of Control, vol. 12,
no. 4, pp. 488–509, 2010. [Online]. Available:
http://doi.wiley.com/10.1002/asjc.204
[6] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danck-
aert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso,
and I. Fischer, “Information processing using a single
dynamical node as complex system,” Nature communi-
cations, vol. 2, p. 468, 2011.
[7] Q. Vinckier, F. Duport, A. Smerieri, K. Vandoorne,
P. Bienstman, M. Haelterman, and S. Massar, “High-
performance photonic reservoir computer based on a
coherently driven passive cavity,” Optica, vol. 2, no. 5,
pp. 438–446, 2015.
[8] N. D. Haynes, M. C. Soriano, D. P. Rosin, I. Fischer, and
D. J. Gauthier, “Reservoir computing with a single time-
delay autonomous Boolean node,” Physical Review E -
Statistical, Nonlinear, and Soft Matter Physics, vol. 91,
no. 2, pp. 1–5, 2015.
[9] Y. Jin, Y. Liu, and P. Li, “SSO-LSM: A sparse and
self-organizing architecture for liquid state machine
based neural processors,” in Nanoscale Architectures
(NANOARCH), 2016 IEEE/ACM International Sympo-
sium on. IEEE, 2016, pp. 55–60.
[10] M. L. Alomar, M. C. Soriano, M. Escalona-Morán,
V. Canals, I. Fischer, C. R. Mirasso, and J. L. Rosselló,
“Digital implementation of a single dynamical node
reservoir computer,” IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 62, no. 10, pp. 977–981,
2015.
[11] M. L. Alomar, V. Canals, N. Perez-Mora, V. Martínez-Moll,
and J. L. Rosselló, “FPGA-Based Stochastic Echo
State Networks for Time-Series Forecasting,” Computational
Intelligence and Neuroscience, vol. 2016, no. ND,
pp. 1–14, 2016.
[12] D. Verstraeten, B. Schrauwen, and D. Stroobandt, “Reser-
voir computing with stochastic bitstream neurons,” in
Proceedings of the 16th annual Prorisc workshop, 2005,
pp. 454–459.
[13] J. von Neumann, “Probabilistic logics and the synthe-
sis of reliable organisms from unreliable components,”
Automata Studies, vol. 34, pp. 43–99, 1956.
[14] B. Gaines, “Stochastic computing systems,” in Advances
in Information Systems Science, J. Tou, Ed. Plenum,
1969, ch. 2, pp. 37–172.
[15] B. Brown and H. Card, “Stochastic neural computation.
I. Computational elements,” IEEE Transactions on Com-
puters, vol. 50, no. 9, pp. 891–905, Sep. 2001.
[16] S. Toral, J. Quero, and L. Franquelo, “Stochastic pulse
coded arithmetic,” in IEEE International Symposium on
Circuits and Systems, ser. ISCAS 2000, vol. 1, 2000, pp.
599–602.
[17] W. Qian, X. Li, M. Riedel, K. Bazargan, and D. Lilja,
“An architecture for fault-tolerant computation with
stochastic logic,” IEEE Transactions on Computers,
vol. 60, no. 1, pp. 93–105, Jan. 2011.
[18] K. M. Levasseur, “Bernstein polynomials,” The American
Mathematical Monthly, vol. 91, no. 4, pp. 249–250, 1984.
[19] G. G. Lorentz, Bernstein Polynomials, 2nd ed. Chelsea
Publishing Company, 1986.
[20] W. Qian, M. D. Riedel, and I. Rosenberg, “Uniform
approximation and Bernstein polynomials with coefficients
in the unit interval,” European Journal of Combinatorics,
vol. 32, no. 3, pp. 448–463, 2011.
[21] R. Legenstein and W. Maass, “Edge of chaos and pre-
diction of computational performance for neural circuit
models,” Neural Networks, vol. 20, no. 3, pp. 323–334,
2007.
