A New Stochastic Inner Product Core Design for Digital FIR Filters by Wong, Ming Ming et al.
A New Stochastic Inner Product Core Design for Digital FIR Filters
Ming Ming Wong1, M. L. Dennis Wong2, Cishen Zhang3, and Ismat Hijazin3
1Faculty of Engineering, Computing and Science, Swinburne University of Technology, Sarawak Campus, Malaysia.
2Heriot Watt University Malaysia, Wilayah Persekutuan Putrajaya, Malaysia.
3School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn VIC 3122, Australia.
Abstract. Stochastic computing (SC) is a computational technique whose operations are governed by probability instead of arithmetic rules. It has recently found promising applications in digital and image processing and has attracted the attention of researchers. In this paper, a new stochastic inner product (multiply and accumulate) core with an improved scaling scheme is presented to improve the accuracy and fault tolerance of SC-based finite impulse response (FIR) digital filters. The proposed inner product core is designed using tree-structured multiplexers, which reduce the critical path and fault propagation in the stochastic circuitry. The core enables the construction of lightweight, multiplierless SC-based FIR digital filters. As a result, an SC-based FIR digital filter is implemented on an Altera Cyclone V FPGA and operates on stochastic sequences 256 bits long (8-bit precision level). Experimental results show that the developed filter has lower hardware cost, better accuracy and higher fault tolerance than other stochastic implementations.
1 Introduction
Stochastic computing (SC) [1] is a computational technique with operations based on probability instead of arithmetic rules [2–4]. It can replace mathematical functions that are computationally demanding in binary computation with simple logic operations and reduced hardware. It is robust against noise, and it has a progressive precision characteristic: the precision of stochastic numbers (bit-streams) increases as computation proceeds [2]. These advantages have enabled SC's recent applications in signal and image processing, in particular in the realization and implementation of digital filters [5, 6].
In this paper, a new stochastic inner product core with an improved scaling scheme is presented to improve the accuracy and fault tolerance of SC-based finite impulse response (FIR) digital filters. The proposed inner product core is designed using tree-structured multiplexers, which reduce the critical path and fault propagation in the stochastic circuitry. The core enables the construction of lightweight, multiplierless SC-based FIR digital filters.
The performance of the proposed SC-based FIR filter is tested via hardware implementation on an FPGA using a case study on a 6th-order FIR digital filter. With the filter's order varying from 4th to 8th and cutoff frequencies ranging from 0.2π to 0.8π, empirical analysis is performed to evaluate the proposed FIR filter's accuracy, fault tolerance and associated hardware costs. The obtained results show that our proposed SC filter design outperforms other existing higher precision stochastic FIR filter designs.

Corresponding author: wmingming7@gmail.com
The rest of the paper is organized as follows. Section 2 reviews the basic computational elements of SC. The motivation and problem statement of the proposed SC-based FIR filter design are presented in Section 3. Next, an improved stochastic inner product is presented in Section 4. The proposed function is then employed to design a new SC digital FIR filter, and the case study is reported in Section 5. The experimental results (accuracy and fault tolerance analysis) as well as the hardware implementation are reported and discussed in Section 6. Finally, concluding remarks are drawn in Section 7.
2 Basic Theory of Stochastic Computation
The basic rule of SC is that the computational data are represented as stochastic sequences (bit-streams) and are processed in the form of digitized probabilities [3]. Naturally, the representations and all the computations involved always lie within the real-number unit interval [0, 1]. Stochastic representation can be coded in two formats: SC-unipolar and SC-bipolar [1].
In SC-unipolar format, the input s is a real number within the unit interval, i.e. 0 ≤ s ≤ 1. As an example, a 2's complement binary input {0100}2 is represented by a stochastic bit-stream S containing 4 bits of '1' out of 2^4 = 16 bits (the remaining bits are zeros). This stochastic bit-stream S is also interpreted as p = P(S = 1) = 4/16. On the other hand, in SC-bipolar format, the range of the real number input, s, is extended to −1 ≤ s ≤ 1. Consider a 2's complement binary input {1100}2, i.e. s = −4/16. In an SC-bipolar bit-stream S, this deterministic value is mapped to p = P(S = 1) = (s + 1)/2 = 12/(16 × 2) = 6/16.

MATEC Web of Conferences 125, 05006 (2017), CSCC 2017. DOI: 10.1051/matecconf/201712505006
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).
In other words, stochastic representation encodes a value as the probability of observing a 1 at an arbitrary bit position in S. This representation is the main reason for the high fault tolerance of SC. A single bit-flip in a long bit-stream causes only a minor change in the original value. In contrast, a single bit-flip in conventional 2's complement computation can result in a large error, especially if the bit-flip occurs on a higher-order bit.
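The contrast between the two number systems can be sketched in a few lines of Python. This is an illustration only: the helper names (`encode_bipolar`, `decode_bipolar`, `twos_complement_value`), the seeded software random number generator and the 8-bit word are assumptions made for the example, not part of the design described here.

```python
import random

def encode_bipolar(s, length, rng):
    """Bernoulli bit-stream with P(bit = 1) = (s + 1) / 2 for -1 <= s <= 1."""
    p = (s + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode_bipolar(bits):
    """Recover the value s = 2 * P(bit = 1) - 1 from the fraction of 1s."""
    return 2 * sum(bits) / len(bits) - 1

def twos_complement_value(word, width=8):
    """Read an unsigned integer as a 2's complement fraction in [-1, 1)."""
    if word >= 1 << (width - 1):
        word -= 1 << width
    return word / (1 << (width - 1))

rng = random.Random(0)                     # fixed seed, illustration only
stream = encode_bipolar(-0.25, 256, rng)   # 256-bit bipolar stream
flipped = stream.copy()
flipped[0] ^= 1                            # single bit-flip in the stream
sc_error = abs(decode_bipolar(flipped) - decode_bipolar(stream))

word = 0b11100000                          # -0.25 as an 8-bit 2's complement word
binary_error = abs(twos_complement_value(word ^ 0b10000000)
                   - twos_complement_value(word))
```

Under these assumptions, one flipped bit moves the decoded stochastic value by 2/256, whereas flipping the most significant bit of the binary word moves its value by 1.0, half of the representable range.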
Multiplication of two input streams, which is computationally intensive in conventional signed binary computing, can be performed using a single logic gate in SC. Consider two stochastic input bit-streams, X1 and X2. The output of their multiplication, Y, is derived as

$$y = P(Y = 1) = P(X_1 = 1)P(X_2 = 1) + (1 - P(X_1 = 1))(1 - P(X_2 = 1))$$

Stochastic multiplication in bipolar format is therefore a logical XNOR operation between the input bit-streams X1 and X2 in a digital circuit. For unipolar format, the multiplication is performed using a logical AND operation instead. The stochastic multipliers for both unipolar and bipolar formats are depicted in Figure 1.
Figure 1. Stochastic multiplier for (i) SC-unipolar and (ii) SC-bipolar formats. Each gate takes inputs Px1 and Px2 and produces output Py.
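A minimal software sketch of the two multipliers is given below. The function names and the seeded pseudo-random bit-streams are assumptions for illustration; a hardware design would draw its streams from SNG modules instead.

```python
import random

def bernoulli_stream(p, length, rng):
    """Bit-stream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def unipolar_multiply(x1, x2):
    """Unipolar product: bitwise AND of two independent streams."""
    return [a & b for a, b in zip(x1, x2)]

def bipolar_multiply(x1, x2):
    """Bipolar product: bitwise XNOR of two independent streams."""
    return [1 - (a ^ b) for a, b in zip(x1, x2)]

rng = random.Random(1)   # fixed seed, illustration only
n = 1 << 16              # long streams keep the sampling noise small

# Unipolar: values are the probabilities themselves, 0.5 * 0.25 = 0.125.
u1 = bernoulli_stream(0.5, n, rng)
u2 = bernoulli_stream(0.25, n, rng)
unipolar_product = sum(unipolar_multiply(u1, u2)) / n

# Bipolar: value s is encoded as p = (s + 1) / 2, and 0.5 * (-0.5) = -0.25.
b1 = bernoulli_stream((0.5 + 1) / 2, n, rng)
b2 = bernoulli_stream((-0.5 + 1) / 2, n, rng)
bipolar_product = 2 * sum(bipolar_multiply(b1, b2)) / n - 1
```

With independent input streams the decoded outputs approach 0.125 and −0.25 respectively; the residual deviation is sampling noise that shrinks as the stream length grows.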
Addition in SC is performed using a special operation, termed scaled addition. The addition is scaled such that the value always lies within the probability interval [0, 1]. With S being a constant scaling stream, the sum of two independent stochastic bit-streams X1 and X2 produces Y, defined as

$$y = P(Y = 1) = P(S = 1)P(X_1 = 1) + (1 - P(S = 1))P(X_2 = 1) = S X_1 + (1 - S) X_2$$

Thus, a multiplexer whose select line S is set to P(S = 1) = 1/2 can be used to realize the scaled addition of two stochastic bit-streams in a digital circuit. Subtraction in SC is similar to addition, except that the stochastic scaled subtractor requires an additional inverter and is only feasible in SC-bipolar format. Both the stochastic scaled adder and the scaled subtractor are illustrated in Figure 2.
Figure 2. Stochastic (i) scaled adder and (ii) scaled subtractor. Each is a multiplexer (MUX) with inputs Px1 and Px2, select line Ps = 0.5 and output Py.
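The multiplexer-based adder and subtractor can be sketched as follows. The function names, the seeded random streams and the example values 0.6 and −0.2 are assumptions chosen for illustration, and the inverter is assumed to sit on the second input.

```python
import random

def bipolar_stream(value, length, rng):
    """Bit-stream with P(bit = 1) = (value + 1) / 2."""
    p = (value + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(length)]

def bipolar_value(bits):
    """Decode a bipolar stream back to a value in [-1, 1]."""
    return 2 * sum(bits) / len(bits) - 1

def scaled_add(x1, x2, select):
    """2-to-1 multiplexer: output x1 where select is 1, otherwise x2."""
    return [a if s else b for a, b, s in zip(x1, x2, select)]

def scaled_subtract(x1, x2, select):
    """Scaled adder with an inverter on x2 (bipolar format only)."""
    return scaled_add(x1, [1 - b for b in x2], select)

rng = random.Random(2)   # fixed seed, illustration only
n = 1 << 16
x1 = bipolar_stream(0.6, n, rng)
x2 = bipolar_stream(-0.2, n, rng)
select = [1 if rng.random() < 0.5 else 0 for _ in range(n)]   # P(S = 1) = 1/2

sum_value = bipolar_value(scaled_add(x1, x2, select))         # near (0.6 - 0.2) / 2
diff_value = bipolar_value(scaled_subtract(x1, x2, select))   # near (0.6 + 0.2) / 2
```

Because each output bit is a copy of exactly one input bit, the result can never leave the representable range, at the price of the built-in factor of 1/2.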
3 Problem Statement
FIR filters are widely used in classical DSP applications because they offer filtering stability and a linear-phase property. They are generally characterized by their impulse response coefficients, which are multiplied with the input signals and summed, i.e. the inner product.
Alternatively, these computationally expensive operations can be well approximated through SC. To be precise, the sum of products between the input vector {X0, X1} and the filter coefficients {a0, a1} can be derived using a single stochastic operation, the stochastic scaled addition, i.e. $\frac{1}{2}(a_0 X_0 + a_1 X_1)$.
Through SC, the computational data are represented as stochastic bit-streams of 2^n bits (n is the precision level) and are processed in the form of digitized probabilities [2]. In terms of hardware, a stochastic scaled addition can be realized using a multiplexer whose select line S is set to the scaling factor [2].
Unfortunately, when repetitive computations are involved, the implicit scaling of 1/2 in stochastic scaled addition severely degrades the filter's output accuracy [5]. An alternative stochastic inner product architecture was reported in [5, 6], where the scaling is performed using uneven weightings. However, significant accuracy degradation is observed as the filter's order increases. Therefore, to address this issue, a new scaling scheme for the stochastic inner product is required.
4 An Improved Stochastic Inner Product
In this work, an improved stochastic inner product is designed using a new scaling scheme that considers the weight distribution of the filter's coefficients. Under this scheme, coefficients of equal (or nearly equal) weight are paired together and form the scaling factor of the stochastic scaled addition. For instance, coefficients {a0, a1} are paired when a0 ≈ a1, and this produces the scaling factor $\frac{|a_0|}{|a_0| + |a_1|}$.
A causal discrete-time FIR filter of order N (length M = N + 1) is generally described as $y[n] = \sum_{k=0}^{N} a[k]\,x[n-k]$, with y[n] as the output signal, x[n] as the input signal and {a0, a1, a2, a3, . . . , aN} as the filter coefficients. All four types of linear-phase FIR filters have coefficients that are symmetric in absolute value. Therefore, the distribution of the absolute values of the filter's impulse response coefficients resembles a bell curve. The largest value lies at the center of the distribution, and the values decrease gradually towards the first and the last coefficients, i.e. $a_0 \approx a_N < a_1 \approx a_{N-1} < a_2 \approx a_{N-2} < \cdots < a_{\frac{N-1}{2}}$. Hence, the scaled additions are performed according to pairs of the FIR filter coefficients arranged as follows.
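• $P_0 = \{a_0, a_N\}$, $P_1 = \{a_1, a_{N-1}\}$, $P_2 = \{a_2, a_{N-2}\}$, . . . and
• $P_{N'} = \{a_{\frac{N-1}{2}}, a_{\frac{N+1}{2}}\}$ with $N' = \frac{N-1}{2}$, for even length (such as FIR Types II and IV),
• $P_{N'} = \{a_{\frac{N}{2}}\}$ with $N' = \frac{N}{2}$, for odd length (such as FIR Types I and III).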
Furthermore, note that $P_0 < P_1 < P_2 < \cdots < P_{N'}$, with $N' = \frac{N-1}{2}$ for even filter length and $N' = \frac{N}{2}$ for odd filter length. With that, the next round of scaled additions is performed following the pairing shown below.
• $P'_0 = \{P_0, P_1\}$, $P'_1 = \{P_2, P_3\}$, . . . and
• $P'_{N''} = \{P_{N'-1}, P_{N'}\}$ with $N'' = \frac{N'-1}{2}$, for odd $N'$,
• $P'_{N''} = \{P_{N'}\}$ with $N'' = \frac{N'}{2}$, for even $N'$.
A similar addition process is repeated until the inner product computation is completed. An example of the resulting stochastic inner product using the proposed approach is shown in (7).
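The pairing rounds described above can be sketched at the level of expected values. The sketch below is an interpretation rather than the authors' implementation: it assumes each leaf carries the weight |a_k| and an input value with the coefficient sign folded in, that an unpaired node passes to the next round unchanged, and that no pair has zero total weight. All function names and the example coefficients are hypothetical.

```python
def combine(left, right):
    """Scaled addition of two nodes, each given as (weight, expected value)."""
    (w1, v1), (w2, v2) = left, right
    select = w1 / (w1 + w2)   # multiplexer select probability
    return (w1 + w2, select * v1 + (1 - select) * v2), select

def reduce_round(nodes):
    """Pair neighbouring nodes; an unpaired last node passes through."""
    merged, selects = [], []
    for i in range(0, len(nodes) - 1, 2):
        node, select = combine(nodes[i], nodes[i + 1])
        merged.append(node)
        selects.append(select)
    if len(nodes) % 2:
        merged.append(nodes[-1])
    return merged, selects

def inner_product_tree(coeffs, inputs):
    """Return (tree output, total weight, select probabilities per round)."""
    # Leaves: weight |a_k|, value x_k with the coefficient sign folded in.
    leaves = [(abs(a), x if a >= 0 else -x) for a, x in zip(coeffs, inputs)]
    m = len(leaves)
    # The first round pairs tap k with tap N - k, so make them neighbours.
    ordered = []
    for k in range(m // 2):
        ordered += [leaves[k], leaves[m - 1 - k]]
    if m % 2:
        ordered.append(leaves[m // 2])   # centre tap stays unpaired
    nodes, rounds = ordered, []
    while len(nodes) > 1:
        nodes, selects = reduce_round(nodes)
        rounds.append(selects)
    weight, value = nodes[0]
    return value, weight, rounds

coeffs = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]     # hypothetical symmetric taps
inputs = [0.5, -0.25, 0.75, 0.1, -0.6, 0.3, 0.9]   # hypothetical bipolar values
value, weight, rounds = inner_product_tree(coeffs, inputs)
direct = sum(a * x for a, x in zip(coeffs, inputs))
```

For these seven taps the first round uses select probabilities of 1/2 throughout, the second round uses 1/3 and 4/7, and the last uses 0.3. Multiplying the tree output by the total weight recovers the direct inner product, so under these assumptions the tree scales the result by the sum of the coefficient magnitudes rather than by repeated factors of 1/2.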
5 Case Study of New SC FIR Filter Design
Consider a 6th-order linear-phase Type I FIR filter with its tap coefficients labeled {a0, a1, a2, a3, a4, a5, a6} and its input vector given as {X0, X1, X2, X3, X4, X5, X6}. The inner product of the filter is y = a0X0 + a1X1 + a2X2 + a3X3 + a4X4 + a5X5 + a6X6.
Using the proposed stochastic inner product, the final computation is described in (7) and illustrated in Figure 3. Note that both the input vectors and the filter coefficients are first converted into stochastic bit-streams using SNG modules [1], which are not shown in the figure.
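$$Y_{11} = \left(\frac{|a_0|}{|a_0| + |a_6|}\right) X_0 + \left(\frac{|a_6|}{|a_0| + |a_6|}\right) X_6 \tag{1}$$

$$Y_{12} = \left(\frac{|a_1|}{|a_1| + |a_5|}\right) X_1 + \left(\frac{|a_5|}{|a_1| + |a_5|}\right) X_5 \tag{2}$$

$$Y_{13} = \left(\frac{|a_2|}{|a_2| + |a_4|}\right) X_2 + \left(\frac{|a_4|}{|a_2| + |a_4|}\right) X_4 \tag{3}$$

$$Y_{14} = \left(|a_3| X_3\right) \tag{4}$$

$$Y_{21} = \left(\frac{|a_0| + |a_6|}{|a_0| + |a_1| + |a_5| + |a_6|}\right) Y_{11} + \left(\frac{|a_1| + |a_5|}{|a_0| + |a_1| + |a_5| + |a_6|}\right) Y_{12} \tag{5}$$

$$Y_{22} = \left(\frac{|a_2| + |a_4|}{|a_2| + |a_3| + |a_4|}\right) Y_{13} + \left(\frac{|a_3|}{|a_2| + |a_3| + |a_4|}\right) Y_{14} \tag{6}$$

$$Y = \left(\frac{|a_0| + |a_1| + |a_5| + |a_6|}{|a_0| + |a_1| + |a_2| + |a_3| + |a_4| + |a_5| + |a_6|}\right) Y_{21} + \left(\frac{|a_2| + |a_3| + |a_4|}{|a_0| + |a_1| + |a_2| + |a_3| + |a_4| + |a_5| + |a_6|}\right) Y_{22} \tag{7}$$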
With such filter coefficients, a0 ≈ a6, a1 ≈ a5 and a2 ≈ a4, the scaled additions in (1), (2) and (3) can be performed using a fixed scaling factor of 1/2. Therefore, the select lines of the corresponding multiplexers (whose probabilities are determined by the scaling factor) can share the same SNG module, which reduces hardware cost. The savings will be more prominent in higher-order filters, where there is a large number of identical coefficients. In addition, our SC FIR filter is designed with a precision level of 8 bits, whereby the computations are performed using only 2^8 = 256 bits. The filter designs in [5, 6] are computed using 2^10 = 1024 bits instead.
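A bit-level sketch of the 6th-order case is given below as a rough illustration. It rests on several assumptions that are not stated in the text: the coefficients are exactly symmetric, so one select stream of probability 1/2 is shared by (1) to (3); a negative coefficient is handled by inverting its bipolar input stream; and the centre tap X3 enters the tree unscaled, whereas (4) as printed also carries the factor |a3|. The function names, seeds and example values are hypothetical.

```python
import random

def stream(p, length, rng):
    """Bernoulli bit-stream with P(bit = 1) = p (software stand-in for an SNG)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def mux(x1, x2, select):
    """2-to-1 multiplexer: output x1 where select is 1, otherwise x2."""
    return [a if s else b for a, b, s in zip(x1, x2, select)]

def sc_fir6(coeffs, inputs, length, rng):
    """Bit-level estimate of sum(a_k x_k) / sum(|a_k|) for 7 symmetric taps."""
    w = [abs(a) for a in coeffs]
    # Bipolar input streams; a negative coefficient inverts its stream.
    x = []
    for a, value in zip(coeffs, inputs):
        bits = stream((value + 1) / 2, length, rng)
        x.append([1 - b for b in bits] if a < 0 else bits)
    half = stream(0.5, length, rng)   # one select stream shared by (1) to (3)
    y11 = mux(x[0], x[6], half)
    y12 = mux(x[1], x[5], half)
    y13 = mux(x[2], x[4], half)
    y14 = x[3]                        # assumption: centre tap enters unscaled
    outer = w[0] + w[1] + w[5] + w[6]
    inner = w[2] + w[3] + w[4]
    y21 = mux(y11, y12, stream((w[0] + w[6]) / outer, length, rng))   # as in (5)
    y22 = mux(y13, y14, stream((w[2] + w[4]) / inner, length, rng))   # as in (6)
    y = mux(y21, y22, stream(outer / (outer + inner), length, rng))   # as in (7)
    return 2 * sum(y) / length - 1    # decode the bipolar output

coeffs = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]     # hypothetical symmetric taps
inputs = [0.5, -0.25, 0.75, 0.1, -0.6, 0.3, 0.9]   # hypothetical bipolar values
expected = sum(a * v for a, v in zip(coeffs, inputs)) / sum(abs(a) for a in coeffs)
estimate_256 = sc_fir6(coeffs, inputs, 256, random.Random(3))
estimate_long = sc_fir6(coeffs, inputs, 1 << 16, random.Random(4))
```

Sharing the 1/2 select stream does not bias the result in this sketch, because every output bit is copied from exactly one input stream and the shared stream is consulted only once along any path through the tree. The 256-bit estimate is noisier than the long-stream estimate, which reflects the progressive precision of stochastic numbers.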
6 Experimental Results
Several simulations were performed to test the effectiveness and the efficiency of the proposed SC FIR filter. The metrics included the output accuracy (error-to-signal power ratio), the fault tolerance, and the hardware requirement and performance in FPGA implementation.
6.1 Accuracy Analysis
The new SC low-pass FIR filters, implemented in three different orders and each having four different cutoff frequencies, are evaluated for their accuracy levels. A total of 256 samples of the input test signal is used in the test simulation. The test signal consists of a mixture of four sinusoidal waves of different frequencies padded with white noise. The accuracies of the proposed filters are measured in terms of the error-to-signal power ratio and are benchmarked against the work reported in [5]. These results are summarized in Table 1.
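The text does not spell out the formula for the error-to-signal power ratio. A minimal sketch follows, assuming the common definition in which the power of the difference between a reference output and the filter output is divided by the power of the reference output; the function name and sample values are hypothetical.

```python
def error_to_signal_power_ratio(reference, output):
    """Assumed definition: sum((ref - out)^2) / sum(ref^2)."""
    error_power = sum((r - o) ** 2 for r, o in zip(reference, output))
    signal_power = sum(r ** 2 for r in reference)
    return error_power / signal_power

reference = [3.0, 4.0]   # hypothetical reference (ideal filter) samples
output = [3.0, 3.0]      # hypothetical SC filter samples
ratio = error_to_signal_power_ratio(reference, output)   # (0 + 1) / (9 + 16)
```

Under this definition a ratio of 0 means the two outputs coincide, and smaller values indicate better accuracy.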
The results from [5] show that the error ratio increases with higher filter order. In contrast, our SC FIR filter presents a consistently lower error ratio regardless of the filter order. Further evidence of accuracy can be obtained by comparing the frequency response and the power spectral density (PSD) of the output signals from our SC filter and from the ideal filter (see Figure 4). It is observed that the spectrum of our SC filter is very close to that of the ideal filter.
6.2 Fault Tolerance Analysis
Apart from low hardware cost, SC is well recognized for being robust against faults, as opposed to conventional binary computing. Fault tolerance testing is conducted on our proposed SC 6th-order FIR filter with cutoff frequency at 0.4π. The test is performed by randomly injecting various percentages of bit-flipping error into the input signals, and the corresponding error-to-signal power ratio is measured and summarized in Table 2.
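One way to inject such faults in simulation is sketched below. The text does not specify the injection procedure, so the choices here are assumptions: faults are applied to a stochastic bit-stream, and the given percentage of bits is flipped at distinct random positions. The function name, seed and example stream are hypothetical.

```python
import random

def inject_bit_flips(bits, fraction, rng):
    """Flip round(fraction * len(bits)) bits at distinct random positions."""
    flipped = list(bits)
    for i in rng.sample(range(len(bits)), round(fraction * len(bits))):
        flipped[i] ^= 1
    return flipped

rng = random.Random(5)   # fixed seed, illustration only
clean = [1 if rng.random() < 0.7 else 0 for _ in range(256)]
faulty = inject_bit_flips(clean, 0.03, rng)   # 3.0% of 256 bits, rounded to 8

changed = sum(a != b for a, b in zip(clean, faulty))
# Each flipped bit moves the decoded bipolar value by 2/256.
shift = abs(2 * sum(faulty) / 256 - 2 * sum(clean) / 256)
```

In this sketch the 3.0% case flips 8 of the 256 bits, so the decoded bipolar value of the stream can move by at most 16/256.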
The results show that random bit-flipping errors ranging from 0.5% to 3.0% have minimal
    
 
 
Figure 3. The new SC 6th-order FIR filter, with input x(n) and output y(n). Legend: D, delay; XOR gates labeled sign(a0) to sign(a6); mux, 2-to-1 multiplexer. The multiplexer select lines are labeled with the scaling factors of (1) to (3) and (5) to (7).
Table 1. Accuracy test results of (i) our proposed design (precision level of 8 bits) compared with (ii) [5] (precision level of 10 bits).

Filter    Cutoff 0.2π        Cutoff 0.4π        Cutoff 0.6π        Cutoff 0.8π
order     (i)      (ii)      (i)      (ii)      (i)      (ii)      (i)      (ii)
2         0.0050   0.0037    0.0046   0.0025    0.0058   0.0013    0.0069   0.0004
4         0.0192   0.0597    0.0123   0.0314    0.0120   0.0465    0.0070   0.0145
6         0.0136   0.0648    0.0075   0.0462    0.0080   0.0637    0.0048   0.0626
8         0.0095   -         0.0107   -         0.0114   -         0.0076   -
Table 2. Error-to-signal power ratio resulting from various percentages of random bit-flipping error in the 6th-order FIR filter with cutoff frequency at 0.4π.

Filter implementation   Percentage of bit-flipping
                        0.5%     1.0%     1.5%     2.0%     2.5%     3.0%
Our Work                0.0076   0.0160   0.0209   0.0292   0.0355   0.0514
Conventional Filter     0.1180   0.1874   0.3033   0.3351   0.4409   0.4843
Direct Form [6]         0.0488   0.0540   -        -        -        -
Lattice Form [6]        0.0563   0.0820   -        -        -        -
Table 3. Hardware review for the FPGA implementation of the SC 6th-order FIR filter with cutoff frequency at 0.4π.

Hardware requirement / performance        SC FIR filter   Inner product core   SNG module
Combinational ALUTs (112,960)             128             4                    26
Memory ALUTs (56,480)                     0               0                    0
Dedicated logic registers (225,920)       159             1                    33
Total registers (225,920)                 159             1                    33
Fmax (MHz)                                306.0           0                    429.92
Dynamic thermal power dissipation (mW)    2.11            0.04                 0.94
    
 
 
Figure 4. Output spectrum and PSD obtained using the ideal FIR filter and our SC FIR filter. Both filters are 6th-order low-pass filters with cutoff frequency at 0.4π.
impact on the output accuracy of our proposed SC FIR filter. In contrast, the conventional FIR filter shows significant accuracy degradation as the injected bit-flipping error increases in steps of 0.5%. These results are further benchmarked against the work reported in [6], whose authors presented two SC 7th-order FIR filters with cutoff frequency at 0.1π using direct form and lattice form. Both of their filters also exhibited higher errors in comparison to our work.
The multiplexers in our proposed inner product core are arranged in a tree structure to limit the error propagation that tends to occur along a long critical path. With its short critical path, the presented SC FIR filter therefore has higher fault tolerance than the conventional FIR filter as well as the existing SC FIR filters.
6.3 Hardware Complexity
The proposed SC FIR filter is implemented on a Cyclone V 5CGXFC7D6F31C6 device using Quartus II 11.1. The full hardware synthesis results of the filter and of its core units, the inner product core and the SNG module, are summarized in Table 3.
7 Conclusion
A case study of a new SC FIR filter design using an improved stochastic inner product core was presented in this paper. Without the use of multipliers, the inner product core employed stochastic scaled addition with a new scaling scheme that paired the filter's coefficients according to their weights. The computation was realized using multiplexers arranged in a tree structure, which in turn reduces the critical path as well as the fault propagation in the stochastic circuitry. This design enhanced the computational accuracy and offered high fault tolerance in the SC filter system. For hardware evaluation, a new SC 6th-order FIR filter with cutoff frequency at 0.4π was implemented and tested on an FPGA platform. Experimental results have shown that the presented SC FIR filter outperforms the conventional filter and the existing works in both metrics and also has low hardware cost.
References
[1] B. R. Gaines, 'Stochastic computing', Proceedings of the Spring Joint Computer Conference, New York, NY, USA, pp. 149-156, (1967).
[2] A. Alaghi and J. P. Hayes, 'Survey of Stochastic Computing', ACM Trans. Embed. Comput. Syst., vol. 12, no. 2, pp. 19, (2013).
[3] W. Qian, X. Li, M. D. Riedel, K. Bazargan, and D. J. Lilja, 'An architecture for fault-tolerant computation with stochastic logic', IEEE Transactions on Computers, vol. 60, no. 1, pp. 93-105, (2011).
[4] B. Moons and M. Verhelst, 'Energy-efficiency and accuracy of stochastic computing circuits in emerging technologies', IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 4, no. 4, pp. 475-486, (2014).
[5] Y. N. Chang and K. K. Parhi, 'Architectures for digital filters using stochastic computing', 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2697-2701, (2013).
[6] Y. Liu and K. K. Parhi, 'Lattice FIR digital filter architectures using stochastic computing', 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1027-1031, (2015).
    
 
 
