40 INTERNATIONAL JOURNAL OF APPLIED BIOMEDICAL ENGINEERING VOL.6, NO.1 2013
Concept and Microarchitecture of a Streaming Processor Specialized for Biomeditronic and Adaptronic Applications
Faizal Arya Samman¹ and Pongyupinpanich Surapong²

Manuscript received on June 27, 2013; revised on October 20, 2013.
¹ Faizal A. Samman is with Universitas Hasanuddin, Fakultas Teknik, Jurusan Teknik Elektro, Jl. Perintis Kemerdekaan Km. 10, Tamalanrea, Makassar 90245, Indonesia, email: faizalas@unhas.ac.id.
² Pongyupinpanich Surapong is with Ramkhamhaeng University, Faculty of Engineering, Department of Computer Engineering, Ramkhamhaeng Road, Hua Mark, Bangkapi, Bangkok 10240, Thailand, email: surapong@riees.org.

ABSTRACT
This paper presents a streaming processor specifically designed for adaptronic and biomedical engineering applications. The main characteristic of the streaming processor is its flexibility in implementing the floating-point scientific computations commonly performed in digital signal processing applications. The floating-point operators are connected to dual-port memories through three separate operand buses and two result buses. Synthesized with 130-nm technology, the processor (called SPECTRON) can be clocked at 485 MHz. The processor can perform four parallel streaming/pipelined floating-point operations using its FPMAC and CORDIC cores, resulting in a performance of about 4 × 485 MHz = 1.94 GFlops (giga floating-point operations per second), which is suitable for high-performance image processing in biomedical electronic engineering applications.
Keywords: Reconfigurable Streaming Processor, Adaptive Signal Processing, Floating-Point Arithmetic, CORDIC Algorithm, Biomedical Electronic Engineering, Adaptronic
1. INTRODUCTION
Streaming processors are commonly used to accelerate the computation of scientific formulas in many engineering and information technology application areas. Such scientific computations are required, for example, to solve chemical and physical problems, signal and image processing problems, and other problems in civil and mechanical engineering. These computations are often very complex and sometimes must also be completed within a very short time (hard real-time constraints). A general-purpose computer system can solve such problems functionally. However, because of the complexity of the computations, it often cannot meet their real-time requirements. Therefore, a specific streaming processor is proposed to solve the hard real-time problem. So far, streaming processors have been developed by several research groups [13], [17], [2], either for a specific application or for a wider range of applications.
Streaming processors have been widely developed for many applications. Some of them are implemented as application-specific streaming processors, for example for fluid dynamics computations based on the lattice Boltzmann method [13] and for accelerating ray traversal algorithms in graphics rendering [17]. A streaming processor can also be implemented as a reconfigurable streaming processor that can be reconfigured for several scientific computing applications. The RSVP streaming processor presented in [2] can be programmed by describing the shape and location of vector data streams in memory and by describing the computation on the streaming data as a data-flow graph. However, the RSVP streaming processor does not support floating-point arithmetic operations.
A few floating-point processors that also support streaming computations have been developed so far. Intel Corporation has designed a 6.2-GFlops Floating-Point Multiply-Accumulator (FPMAC) [18]. The FPMAC units are dedicated to a Teraflops multicore system, which can execute tera floating-point operations per second by interconnecting 80 FPMAC-based cores through an 8 × 10 2D-mesh on-chip network. Another floating-point processor that supports streaming computations is the Synergistic Processing Element (SPE) [6] from IBM Corporation.
This paper presents a reconfigurable streaming processor supporting pipelined floating-point operations, called SPECTRON (Streaming Processor SpECific for BiomediTRONic and AdapTRONic Applications). SPECTRON is designed to implement several complex adaptive signal processing algorithms in order to solve general hard real-time digital control system problems. In particular, SPECTRON is dedicated to adaptronic and biomedical electronic engineering (biomeditronic) applications. In a special case, it can be used to cope with common network-induced delay problems in large distributed control systems.

Fig.1: The microarchitecture of the SPECTRON processor. [Figure: three dual-port RAMs (1, 2, 3), a shift register, and the FPMAC and CORDIC cores behind a core interface, all attached through tri-state buffers (TSB) to operand buses A, B, C and result buses D, E; a Central Control Unit (CCU) drives the control paths and the memory signals eR, eW, rAdd, wAdd; an I/O or network port connects the processor to the outside.]
2. MOTIVATIONS
An adaptronic system is a new technology trend related to the development of smart structures, in which sensors and actuators become integral parts of the structure and are integrated into it during the manufacturing process, resulting in a new multifunctional material [9]. Smart structures are developed to sense and react actively and adaptively to changes in the structural environment and to disturbances applied to the structure. To acquire such smart characteristics, a real-time adaptive signal processor is required.
Most problems in adaptronic systems are strongly related to general control engineering problems such as active noise control (ANC), active structural acoustic control (ASAC) and active vibration control (AVC). Adaptronics is also related to life-cycle engineering topics such as structural health monitoring (SHM), structural health control (SHC) [12], and active crack control or active structural damage control.
Problems in biomeditronic applications can also generally be solved by adaptive signal processing systems such as adaptive least-mean-square (LMS) filters and discrete Fourier transforms. Biomedical engineering applications, such as filtering and frequency analysis of electrocardiograms, speech processing and hearing aid systems, medical image registration and image segmentation [7], require basic signal processing computations such as noise filtering, parameter estimation and spectrum computation. These computations principally need common operators such as addition, multiplication, division and accumulation.
In hearing aid applications, digital signal processors (DSPs) are expected to offer more benefits in the future, especially owing to shrinking transistor feature sizes [5]. Circuit designers gain more flexibility to implement more complex signal control algorithms in digital hearing aid products. Despite its lower flexibility and higher noise sensitivity, the analog approach has lower power dissipation than the digital one. The hearing-aid chip presented by Serra-Graells et al. is a true example of a programmable analog signal processor [14]. Its analog core processes the audio signal, while a digital core programs the analog core. However, the digital programming core can only program the parameters of the analog core; the structure of the analog processor core itself cannot be reconfigured.
3. PROCESSOR ARCHITECTURE
The proposed reconfigurable SPECTRON streaming processor consists of a set of floating-point arithmetic units and memory units. Fig. 1 shows the architecture of the streaming processor. As shown in the figure, there are three operand buses (Bus A, Bus B and Bus C) and two result buses (Bus D and Bus E). The data format used in SPECTRON follows the IEEE standard for binary floating-point arithmetic (see Fig. 2) [15, 16]. An additional valid bit guides the arithmetic and memory units to access the data correctly as it flows through the pipeline. All units are controlled by a single Central Control Unit (CCU) to guarantee synchronization of the streaming computation.
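To make the word layout of Fig. 2 concrete, the following Python sketch packs a value into a 33-bit SPECTRON word (valid bit V at bit 32, above the 32-bit IEEE 754 single-precision pattern) and splits it back into its fields. Representing the word as a Python integer, and the function names `to_spectron_word` and `from_spectron_word`, are illustrative assumptions rather than part of the processor description.

```python
import struct


def to_spectron_word(value: float, valid: bool = True) -> int:
    """Pack a value as a 33-bit word: bit 32 = valid bit V, bits 31..0 = IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]  # rounds to single precision
    return (int(valid) << 32) | bits


def from_spectron_word(word: int):
    """Split a 33-bit word into (valid, sign, biased exponent, mantissa field, value)."""
    valid = bool((word >> 32) & 1)
    bits = word & 0xFFFFFFFF
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30..23, bias 127
    mantissa = bits & 0x7FFFFF         # bits 22..0, hidden leading one not stored
    value = struct.unpack(">f", struct.pack(">I", bits))[0]
    return valid, sign, exponent, mantissa, value


word = to_spectron_word(-6.5)
print(hex(word), from_spectron_word(word))  # 0x1c0d00000 (True, 1, 129, 5242880, -6.5)
```

Clearing the valid bit (for example `to_spectron_word(x, valid=False)`) marks a pipeline slot that carries no usable data, which is how the text describes the role of the data guide.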
3.1 Arithmetic Units
The arithmetic units consist of two main components, i.e. an FPMAC (Floating-Point Multiply-Accumulator) core and a CORDIC (Coordinate Rotation Digital Computer) core. The operations of the arithmetic units are controlled by the CCU through a set of control paths. The architecture of the FPMAC core is shown in Fig. 3.

Fig.2: Data format of the streaming processor. [Figure: 33-bit word; bit 32 is the valid bit V (data guide); bits 31 to 0 hold 32-bit single-precision data with sign S at bit 31 (MSB side), exponent E at bits 30 to 23 and mantissa M at bits 22 to 0 (LSB side).]

Fig.3: Architecture of the reconfigurable FPMAC core. [Figure: ADD/SUB and MUL units; operand inputs a, b, c are taken from Buses A, B, C through multiplexers, and results d, e (with feedback paths d1, e1) are driven onto Buses D and E through tri-state buffers (TSB; z = high-impedance state); multiplexer and buffer select signals f1 to f13, some of them 2 bits wide.]
3.1.1 FPMAC Core
The floating-point adder used in this work has a 3-stage pipeline architecture, shown in Fig. 4. In the first stage, three sub-operations are performed: sign identification, exponent difference computation and mantissa swap. Mantissa alignment, leading-one detection (LOD) and mantissa addition/subtraction are performed in the second stage. The last stage packs the operation result into the standard IEEE binary floating-point format; it includes overflow and underflow detection, rounding logic and exponent correction. The floating-point multiplier also has a 3-stage pipeline architecture, shown in Fig. 5.
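The stage split described above can be illustrated with a bit-level functional model in Python. This is a sketch, not the authors' RTL: it assumes normal (nonzero, finite) operands, omits overflow/underflow handling, and truncates instead of rounding, so it is exact only when no significant bits are shifted out. The mapping of the multiplier sub-operations onto the three stages is also an assumption, since the text lists only the adder stages.

```python
import struct


def unpack(x: float):
    """Split an IEEE 754 single into (sign, biased exponent, 24-bit significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Assumption: normal operands only, so the hidden leading one is always present.
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | (1 << 23)


def pack(sign: int, exp: int, sig: int) -> float:
    """Assemble sign, biased exponent and significand (hidden bit dropped)."""
    bits = (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_add(a: float, b: float) -> float:
    # Stage 1: sign identification, exponent difference, mantissa swap
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    if (eb, mb) > (ea, ma):            # make operand A the larger magnitude
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    diff = ea - eb
    # Stage 2: mantissa alignment, add/sub, leading-one detection
    mb >>= diff                        # truncating alignment (no guard/sticky bits)
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0.0
    lod = m.bit_length() - 1           # position of the leading one
    # Stage 3: exponent correction, normalization and packing
    exp = ea + (lod - 23)
    m = m >> (lod - 23) if lod > 23 else m << (23 - lod)  # truncation, no rounding
    return pack(sa, exp, m)


def fp_mul(a: float, b: float) -> float:
    # Stage 1 (assumed): sign XOR and exponent addition with one bias removed
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    sign, exp = sa ^ sb, ea + eb - 127
    # Stage 2 (assumed): unsigned 24 x 24-bit mantissa multiplication
    m = ma * mb                        # 48-bit product
    # Stage 3 (assumed): leading-one detection, exponent correction and packing
    lod = m.bit_length() - 1           # 46 or 47 for normal operands
    exp += lod - 46
    return pack(sign, exp, m >> (lod - 23))  # truncation, no rounding


print(fp_add(1.5, 2.25), fp_mul(1.5, 2.25))  # 3.75 3.375
```

Keeping each stage's work separate in the code mirrors why a 3-stage split is natural: each stage only needs the outputs of the previous one, so a new operand pair can enter every cycle.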
3.1.2 CORDIC Core
The CORDIC (Coordinate Rotation Digital Computer) algorithm was first introduced in 1959 [19], [3]. CORDIC is a powerful iterative algorithm for implementing trigonometric functions as well as other functions such as division. The main advantage of the algorithm is that its hardware requires only simple operators, i.e. a sign operator, an adder/subtractor and a shift operator.
Fig.4: Three-stage pipeline floating-point adder. [Figure: inputs Sign_a, Sign_b, Exp_a, Exp_b, Mantissa_a, Mantissa_b; stage 1: sign identification, exponent difference and mantissa swap; stage 2: mantissa alignment, add/sub and leading-one detection (LOD); stage 3: packing, overflow and underflow detection, rounding logic and exponent correction, producing the output.]

Fig.5: Three-stage pipeline floating-point multiplier. [Figure: inputs Sign_a, Sign_b, Exp_a, Exp_b, Mantissa_a, Mantissa_b; blocks for sign XOR, exponent add/sub, unsigned mantissa multiplication, leading-one detection (LOD), and packing with overflow and underflow detection, rounding logic and exponent correction.]
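Equ. (1) shows the general radix-2 CORDIC iterative algorithm.

$$\begin{aligned} x_{k+1} &:= x_k - m \cdot S_k \cdot 2^{-k} \cdot y_k \\ y_{k+1} &:= y_k + S_k \cdot 2^{-k} \cdot x_k \\ z_{k+1} &:= z_k - S_k \cdot T_k \end{aligned} \qquad (1)$$

where m ∈ {−1, 0, 1} determines the mode of the CORDIC algorithm (hyperbolic, linear and circular, respectively), and T_k is the elementary rotation angle of iteration k, i.e. T_k = artanh(2^{−k}), 2^{−k} and arctan(2^{−k}) for m = −1, 0 and 1, respectively. S_k is a sign function described in Equ. (2).

$$S_k = \begin{cases} -1, & z_k \le 0 \\ +1, & z_k > 0 \end{cases} \qquad (2)$$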
CORDIC is a powerful algorithm because many functions, such as division, sine, cosine, tangent, exponential and logarithm, can be implemented in a single CORDIC core using a special reconfiguration technique. More details on the CORDIC algorithm can be found in [11] and [10].
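As an illustration of Equ. (1) and (2), the following Python sketch runs the circular mode (m = 1) in rotation mode to compute sine and cosine, the two functions implemented by the current SPECTRON CORDIC core. It uses ordinary floating-point numbers where hardware would use shifts and additions on fixed-width data, and the iteration count of 32 is an arbitrary assumption.

```python
import math

M = 1  # circular mode of Equ. (1); linear and hyperbolic modes are not sketched


def cordic_sin_cos(theta: float, n_iter: int = 32):
    """Radix-2 CORDIC in rotation mode, Equ. (1) with m = 1 and S_k from Equ. (2).

    Returns (sin(theta), cos(theta)); valid for |theta| up to about 1.74 rad,
    the sum of all elementary angles T_k = atan(2**-k).
    """
    t = [math.atan(2.0 ** -k) for k in range(n_iter)]
    # Pre-scale x_0 by 1/K, where K is the accumulated CORDIC gain.
    gain = 1.0
    for k in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * k))
    x, y, z = 1.0 / gain, 0.0, theta
    for k in range(n_iter):
        s = -1.0 if z <= 0.0 else 1.0      # S_k, Equ. (2)
        shift = 2.0 ** -k                  # a right shift by k bits in hardware
        x, y, z = x - M * s * shift * y, y + s * shift * x, z - s * t[k]
    return y, x


print(cordic_sin_cos(0.5))  # approximately (0.4794, 0.8776)
```

Each iteration uses only a sign decision, shifted operands and additions, plus one table entry T_k, which is the hardware advantage the text attributes to CORDIC.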
3.2 Memory Units
The memory units consist of three Dual-Port RAMs (DPR1, DPR2 and DPR3) and a Shift Register (SREG). All memory units are connected to the five bus systems, and each input and output port of the memory units can be multiplexed and demultiplexed to obtain a flexible interconnect configuration.
3.2.1 Dual-Port RAM
Three Dual-Port RAMs DPRj, j ∈ {1, 2, 3}, are used in the architecture, allowing three input data streams to be processed concurrently. Two result buses enable parallel streaming output and storage of results from the FPMAC and CORDIC cores. For example, when the CORDIC core is used, its sine and cosine outputs can be stored concurrently in two different memory units. DPR1, DPR2 and DPR3 are controlled through the sets of control paths m_j, n_j and p_j, respectively, where j ∈ {1, 2, ..., 7}. The read/write operation modes of each Dual-Port RAM are controlled by the CCU through the eR, eW, rAdd and wAdd paths for read enable, write enable, read address and write address, respectively.
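The role of the eR, eW, rAdd and wAdd paths can be sketched with a cycle-level behavioral model in Python. This is a simplified model under stated assumptions: one read and one write per clock edge, and read-before-write behavior when both ports address the same word in the same cycle, which the paper does not specify.

```python
class DualPortRAM:
    """Cycle-level behavioral model of one DPR with independent read and write ports.

    Assumption: a read and a write to the same address in the same cycle return
    the old contents (read-before-write); the paper does not specify this.
    """

    def __init__(self, depth: int):
        self.mem = [0.0] * depth

    def clock(self, eR: bool, rAdd: int, eW: bool, wAdd: int, wData: float = 0.0):
        """Apply one clock edge; returns the read data, or None when eR is low."""
        rData = self.mem[rAdd] if eR else None
        if eW:
            self.mem[wAdd] = wData
        return rData


# Fill addresses 0..3, then stream them out while overwriting them with new results.
ram = DualPortRAM(depth=8)
for i in range(4):
    ram.clock(eR=False, rAdd=0, eW=True, wAdd=i, wData=float(i))
out = [ram.clock(eR=True, rAdd=i, eW=True, wAdd=i, wData=10.0 + i) for i in range(4)]
print(out, ram.mem[:4])  # [0.0, 1.0, 2.0, 3.0] [10.0, 11.0, 12.0, 13.0]
```

Because the two ports are independent, one stream can be read onto an operand bus while a result stream is written back in the same cycle, which is what allows in-place updates such as the new w_j values in Table 1.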
3.2.2 Shift Register
The Shift Register (SREG) stores the history of the sampled data. In many control and signal processing applications, the history of a signal is used for time-based computations (time-series analysis) and for frequency-based computations (spectrum analysis of the signal). Therefore, for efficiency, a shift register is also implemented in the processor. The SREG is controlled by the CCU through a set of control paths r_j, j ∈ {1, 2, ..., 7}, and through the control paths eR, eW and rAdd for read enable, write enable and read address, respectively.
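A minimal Python sketch of the sample-history behavior is shown below, assuming that each write shifts the newest sample into tap 0 so that tap j holds x(k − j), the indexing used for x_j in Table 1. The class and method names are hypothetical.

```python
from collections import deque


class ShiftRegister:
    """Sample-history buffer: after shifting in x(k), read(j) returns x(k - j)."""

    def __init__(self, depth: int):
        # Taps start at zero, i.e. x(k - j) = 0 before any sample has arrived.
        self.taps = deque([0.0] * depth, maxlen=depth)

    def shift_in(self, sample: float) -> None:
        # The newest sample enters tap 0; the oldest one drops off the far end.
        self.taps.appendleft(sample)

    def read(self, j: int) -> float:
        return self.taps[j]


sreg = ShiftRegister(depth=3)
for sample in [1.0, 2.0, 3.0, 4.0]:
    sreg.shift_in(sample)
print([sreg.read(j) for j in range(3)])  # [4.0, 3.0, 2.0]
```

Since a write always goes to the head of the register, only a read address is needed, which matches the eR, eW and rAdd control paths listed above.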
4. COMPUTATIONAL MODEL AND CORE CONFIGURATION
In general, the SPECTRON streaming processor targets high-performance computation for modern real-time control algorithms. Some modern control algorithms, such as the model reference adaptive control algorithm, and the parameter identification and controller parameter calculation used in self-tuning adaptive control systems [1], are very complex. Therefore, high-performance computation is required to meet their real-time requirements.
This section shows one example of a computational model used in an adaptive Finite Impulse Response (FIR) filter, in which the adaptive Least Mean Square (LMS) algorithm adjusts the filter parameters. Section 4.2 then shows in detail how the streaming processor is configured to perform the adaptive LMS algorithm for the adaptive FIR filter.
4.1 Adaptive Signal Processor Computational Model
In this paper, an adaptive finite impulse response (FIR) filter architecture is used for both application domains (adaptronic and biomeditronic). Given the set of tap delays Ψ = {0, 1, ..., N_tap − 1}, the filter output can be modeled mathematically as in Equ. (3).
After the filter output is computed, the error signal between the desired signal d(k) and the filter output is computed as described in Equ. (4). The parameters of the adaptive filter are then updated using the least-mean-square (LMS) algorithm as shown in Equ. (5), where β is the adaptation gain. The LMS algorithm was first introduced by Widrow and Hoff [20] in 1960. The adaptation gain must be chosen carefully to guarantee the stability of the filter: the higher the adaptation gain, the faster the filter converges to a steady-state condition, but the lower its stability margin. More information about adaptive filter theory can be found in Haykin's book [8]. The use of adaptive LMS filters for active sound and vibration control, which are general active adaptive control problems in adaptronic applications, is described in Elliott's book [4].
$$y(k) = \sum_{j \in \Psi} w_j(k)\, x(k-j) \qquad (3)$$

$$e(k) = d(k) - y(k) \qquad (4)$$

$$w_j(k+1) = w_j(k) + \beta\, x(k-j)\, e(k), \quad \forall j \in \Psi \qquad (5)$$
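Equations (3) to (5) can be read directly as one loop iteration per sample. The following plain-Python sketch (no hardware scheduling, and zero initial conditions assumed) applies them to identify a hypothetical unknown 3-tap system; the system coefficients, the input signal and β = 0.05 are illustrative choices, not values from the paper.

```python
import random


def lms_fir(x, d, n_tap, beta):
    """Adaptive FIR filter with LMS coefficient update, Equ. (3)-(5).

    Returns the final weights w_j (j = 0 .. n_tap-1) and the error sequence e(k).
    Samples before k = 0 are taken as zero.
    """
    w = [0.0] * n_tap
    errors = []
    for k in range(len(x)):
        taps = [x[k - j] if k - j >= 0 else 0.0 for j in range(n_tap)]  # x(k - j)
        y = sum(wj * xj for wj, xj in zip(w, taps))                     # Equ. (3)
        e = d[k] - y                                                     # Equ. (4)
        w = [wj + beta * xj * e for wj, xj in zip(w, taps)]              # Equ. (5)
        errors.append(e)
    return w, errors


# Identify a hypothetical unknown 3-tap system from noise-free input/output data.
rng = random.Random(1)
w_true = [0.5, -0.3, 0.2]
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(w_true[j] * (x[k - j] if k - j >= 0 else 0.0) for j in range(3))
     for k in range(len(x))]
w, errors = lms_fir(x, d, n_tap=3, beta=0.05)
print([round(wj, 4) for wj in w])  # approximately [0.5, -0.3, 0.2]
```

Increasing `beta` speeds up convergence until the update becomes unstable, which is the trade-off between adaptation gain and stability described in the text.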
4.2 Core Configuration
The SPECTRON processor can be programmed and reconfigured by changing the contents of an instruction vector memory (implemented as part of the CCU). SPECTRON uses a very long instruction word (VLIW) format, in which one instruction line represents one streaming computation. Each of the six computations listed in Table 1 is executed by one instruction line. The instruction vector consists of bit fields that enable tri-state buffers (TSBs), multiplexers and demultiplexers, as well as bit fields that control the flow of streaming operations in the arithmetic units and the streaming read/write operations of the memory units. Owing to the parallel buses and the multiple arithmetic and memory cores, more than one floating-point streaming computation can be performed concurrently.
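How bit fields combine into one instruction line can be sketched in Python using the FPMAC control values given in Fig. 6 for computation 1 of Table 1. The field order, the MSB-first packing and the restriction to the FPMAC fields are assumptions, and the widths are only inferred from the example values ('1' as 1 bit, '10' as 2 bits); the paper does not specify the actual instruction word layout.

```python
# Hypothetical layout: FPMAC control fields f1..f13 with widths inferred from Fig. 6.
FPMAC_FIELDS = [("f1", 1), ("f2", 1), ("f3", 1), ("f4", 1), ("f5", 1),
                ("f6", 2), ("f7", 2), ("f8", 2), ("f9", 2), ("f10", 1),
                ("f11", 2), ("f12", 1), ("f13", 1)]


def encode(fields, values):
    """Concatenate bit-string fields; the first field ends up in the most significant bits."""
    word = 0
    for name, width in fields:
        v = int(values[name], 2)
        assert v < (1 << width), f"{name} does not fit in {width} bits"
        word = (word << width) | v
    return word


def decode(fields, word):
    """Inverse of encode: recover each field as a zero-padded bit string."""
    values = {}
    for name, width in reversed(fields):
        values[name] = format(word & ((1 << width) - 1), f"0{width}b")
        word >>= width
    return values


# FPMAC settings for computation 1 of Table 1, as read from Fig. 6
fig6 = {"f1": "1", "f2": "0", "f3": "1", "f4": "0", "f5": "1", "f6": "10",
        "f7": "00", "f8": "10", "f9": "01", "f10": "0", "f11": "11",
        "f12": "1", "f13": "0"}
word = encode(FPMAC_FIELDS, fig6)
print(format(word, "018b"))
```

A full instruction line would append the SREG (r1 to r7) and DPR (m, n, p) control fields and the address/enable fields in the same way; reconfiguring the processor then amounts to loading different words into the instruction vector memory.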
Table 1: List of floating-point operations for the adaptive LMS signal processing.
No. | Computation | Flop. unit | Bus A (from) | Bus B (from) | Bus C (from) | Bus D (stored in) | Bus E (stored in)
1 | $y = \sum_j w_j x_j$, $j \in \Psi$ | FPMAC (MUL, ADD) | $x_j$ (SREG) | n.a. (none) | $w_j$ (DPR1) | n.a. (none) | $a_j$ (DPR2)
1.a | $y_1 = a_{S-3} + 0$ | FPMAC (ADD) | $a_{S-3}$ (DPR2) | 0 (DPR3) | n.a. (none) | n.a. (none) | $y_1$ (DPR3)
1.b | $y_2 = a_{S-2} + y_1$ | FPMAC (ADD) | $a_{S-2}$ (DPR2) | $y_1$ (DPR3) | n.a. (none) | n.a. (none) | $y_2$ (DPR3)
1.c | $y = a_{S-1} + y_2$ | FPMAC (ADD) | $a_{S-1}$ (DPR2) | $y_2$ (DPR3) | n.a. (none) | n.a. (none) | $y$ (DPR3)
2 | $e\beta = (d - y)\beta$ | FPMAC (SUB, MUL) | $d$ (DPR1) | $y$ (DPR2) | $\beta$ (DPR3) | n.a. (none) | $e\beta$ (DPR2)
3 | $w_j = w_j + e\beta x_j$, $j \in \Psi$ | FPMAC (MUL, ADD) | $w_j$ (DPR1) | $e\beta$ (DPR2) | $x_j$ (SREG) | new $w_j$ (DPR1) | n.a. (none)
Note: $x_j = x(k - j)$; n.a. = not applicable.
Fig.6: Example of core configuration for the streaming computation number 1 according to Table 1. [Figure: the FPMAC core (ADD/SUB and MUL) connected via Buses A to E to the SREG (holding x_0, x_1, ..., x_{S−1}), DPR1 (holding w_0, w_1, ..., w_{S−1}) and DPR2 (receiving a_0, a_1, ..., a_{S−1}). Control settings: FPMAC f1='1', f2='0', f3='1', f4='0', f5='1', f6='10', f7='00', f8='10', f9='01', f10='0', f11='11', f12='1', f13='0'; SREG r1='1', r2='0', r3='0', r4='0', r5='0', r6='100', r7='0'; DPR1 m1='0', m2='0', m3='1', m4='0', m5='0', m6='001', m7='0'; DPR2 n1='0', n2='0', n3='0', n4='0', n5='1', n6='100', n7='1'.]
Because of the feedback datapath used in MAC (Multiply-ACcumulate) computations, streaming MAC computation requires special attention. For the 3-stage pipeline adder used in SPECTRON, the result of a streaming MAC computation is obtained by taking the last three accumulation results and combining them with three consecutive addition operations, as shown in Table 1. Other, non-feedback streaming computations can be performed in SPECTRON without such extra steps. The following theorem and its short proof describe the solution to this problem.
Theorem 1: If a streaming MAC (Multiply-ACcumulate) computation $y = \sum_{j=0}^{S-1} x_j w_j$ is made, and the S pipelined multiplication results $m_j = x_j w_j$, $\forall j \in \{0, 1, 2, \ldots, S-1\}$, are accumulated in a pipelined way through an N-stage pipeline adder (with $S \ge N$), then the MAC result is $y = \sum_{j=S-N}^{S-1} a_j$, where $a_j$ is the accumulation result produced at streaming step j.

Proof: Since there is no feedback in the S streaming multiplications, each j-th multiplication result is simply $m_j = x_j w_j$, $\forall j \in \{0, 1, \ldots, S-1\}$, as stated in the theorem. The pipelined accumulation, however, has an output feedback that is delayed by N cycles because of the N-stage pipeline adder. Hence, the accumulation results can be formally written as

$$a_j = \begin{cases} m_j, & 0 \le j < N \\ m_j + a_{j-N}, & N \le j < S \end{cases} \qquad (6)$$

Expanding Equ. (6) recursively gives the following formulation.

$$a_{kN+i} = \sum_{l=0}^{k} m_{lN+i}, \quad 0 \le i < N,\ kN + i < S \qquad (7)$$

Writing Equ. (7) out for the indices from the largest down to the smallest gives the following equations.

$$a_{kN+i} = m_i + m_{N+i} + m_{2N+i} + \cdots + m_{kN+i} \qquad (8)$$
$$\cdots$$
$$a_{kN+2} = m_2 + m_{N+2} + m_{2N+2} + \cdots + m_{kN+2} \qquad (9)$$
$$a_{kN+1} = m_1 + m_{N+1} + m_{2N+1} + \cdots + m_{kN+1} \qquad (10)$$
$$a_{kN} = m_0 + m_N + m_{2N} + \cdots + m_{kN} \qquad (11)$$
$$\cdots$$
$$a_{(k-1)N+3} = m_3 + m_{N+3} + \cdots + m_{(k-1)N+3} \qquad (12)$$
$$a_{(k-1)N+2} = m_2 + m_{N+2} + \cdots + m_{(k-1)N+2} \qquad (13)$$
$$a_{(k-1)N+1} = m_1 + m_{N+1} + \cdots + m_{(k-1)N+1} \qquad (14)$$
$$a_{(k-1)N} = m_0 + m_N + \cdots + m_{(k-1)N} \qquad (15)$$
$$\cdots$$
$$a_{2N+1} = m_1 + m_{N+1} + m_{2N+1} \qquad (16)$$
$$a_{2N} = m_0 + m_N + m_{2N} \qquad (17)$$
$$\cdots$$
$$a_{N+1} = m_1 + m_{N+1} \qquad (18)$$
$$a_N = m_0 + m_N \qquad (19)$$
$$\cdots$$
$$a_1 = m_1, \quad a_0 = m_0 \qquad (20)$$

The last N equations, i.e. those for $a_{S-N}, \ldots, a_{S-1}$, cover each residue class of the index modulo N exactly once, and each of them already contains every product $m_j$ of its class with $j \le S-1$. Hence, if only these last N accumulation results are added, we obtain $y = \sum_{j=0}^{S-1} m_j$, which is in accordance with Theorem 1, i.e.

$$y = \sum_{j=S-N}^{S-1} a_j = \sum_{j=0}^{S-1} m_j = \sum_{j=0}^{S-1} x_j w_j. \qquad (21)$$
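The theorem can be checked numerically with a small functional model of the pipelined accumulation. The sketch below is not cycle-accurate: it simply applies the recurrence of Equ. (6), treating the feedback operand as zero while the pipeline is still filling, and then compares the sum of the last N results with the direct MAC result. The input values are arbitrary test data.

```python
def pipelined_accumulate(m, n_stages):
    """Streaming accumulation through an N-stage pipeline adder, Equ. (6).

    The adder output re-enters its input N cycles later, so a_j = m_j + a_{j-N};
    for j < N the feedback operand is still empty and is treated as 0.
    """
    a = []
    for j, mj in enumerate(m):
        feedback = a[j - n_stages] if j >= n_stages else 0.0
        a.append(mj + feedback)
    return a


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
w = [0.5, 0.25, 2.0, 1.0, 0.5, 4.0, 1.5]
m = [xj * wj for xj, wj in zip(x, w)]     # non-feedback streaming MUL
a = pipelined_accumulate(m, n_stages=3)   # 3-stage ADD, as in SPECTRON
y = sum(a[-3:])                           # Equ. (21): a_{S-3} + a_{S-2} + a_{S-1}
print(a[-1], y, sum(m))                   # 15.0 48.0 48.0
```

The last pipeline output alone (15.0 here) holds only one residue class of the products, which is why the three final partial sums must be added.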
Hardware realization based on the theorem: As shown in Fig. 6 and described in Table 1, the stream variables $x_j$ and $w_j$ are stored in SREG and DPR1, respectively, and the accumulation results $a_j$ are stored in DPR2. Because the ADD unit is a 3-stage pipeline adder (N = 3), the last three accumulation results ($a_{S-3}$, $a_{S-2}$ and $a_{S-1}$) stored in DPR2 must be added to obtain y. Computations 1.a, 1.b and 1.c in Table 1 show in detail how this is done.
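To connect Table 1 with the LMS equations, the following Python sketch performs one adaptive-LMS iteration as the sequence of streaming computations 1, 1.a to 1.c, 2 and 3. It is a functional model only: the memory placement is indicated in comments, bus timing is ignored, and the function name `table1_lms_step` is hypothetical.

```python
def table1_lms_step(x_hist, w, d, beta, n_stages=3):
    """One adaptive-LMS iteration scheduled as the streaming computations of Table 1.

    x_hist[j] = x(k - j) (from SREG), w[j] = w_j (from DPR1);
    n_stages is the adder pipeline depth. Assumes len(w) >= n_stages.
    """
    # Computation 1: stream MUL x_j * w_j into the pipelined ADD; a_j -> DPR2
    a = []
    for j, (xj, wj) in enumerate(zip(x_hist, w)):
        feedback = a[j - n_stages] if j >= n_stages else 0.0
        a.append(xj * wj + feedback)
    # Computations 1.a-1.c: y1 = a_{S-3} + 0, y2 = a_{S-2} + y1, y = a_{S-1} + y2
    y = 0.0
    for aj in a[-n_stages:]:
        y = aj + y
    # Computation 2: e*beta = (d - y) * beta (SUB, then MUL)
    e_beta = (d - y) * beta
    # Computation 3: w_j <- w_j + e*beta * x_j, written back to DPR1
    new_w = [wj + e_beta * xj for wj, xj in zip(w, x_hist)]
    return y, new_w


y, new_w = table1_lms_step([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], d=6.0, beta=0.1)
print(y, [round(v, 6) for v in new_w])  # 5.0 [0.6, 0.7, 0.8, 0.9]
```

The result matches Equ. (3) to (5) for the same inputs: y = 0.5 + 1.0 + 1.5 + 2.0 = 5.0, e = 1.0 and each weight grows by 0.1 x(k − j).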
5. SYNTHESIS USING CMOS STANDARD-CELL AND FPGA TECHNOLOGY
The synthesis results of the SPECTRON processor are shown in Table 2 and Table 3. In the current processor implementation, we implement a CORDIC core that computes floating-point sine and cosine functions. This core also occupies the largest logic cell area. Because of its large area, a reconfigurable CORDIC core will be designed in the future to implement various trigonometric and logarithmic functions.
Table 2: Synthesis result using 130-nm CMOS standard-cell technology from Faraday Technology Corporation with a target frequency of 485 MHz.
Measured component | Synthesis result
Total logic cell area | 1.5543386 mm²
Slack time (critical path) | 1.96 ns
Switching power (1.32 V) | 68.305 mW
Internal power (1.32 V) | 249.276 mW
Table 3: Logic cell area of the components and subcomponents.
Component | Cell area (µm²) | Percentage (%)
FPMAC | 48888 | 3.1
  ADD/SUB | 15466 | 1.0
  MUL | 29513 | 1.6
CORDIC (sine, cosine) | 968340 | 58.5
DP-RAM 1 | 162158 | 9.2
DP-RAM 2 | 164179 | 9.2
DP-RAM 3 | 161556 | 9.2
SREG | 134792 | 7.1
CCU | 16422 | 1.0
Other combinational logic | | 0.2
Table 4 also shows the synthesis result of the processor on a Virtex-5 FPGA device from Xilinx Corporation. On the FPGA device, SPECTRON can be clocked at up to 186.108 MHz, which is slower than the result obtained with the CMOS standard-cell technology.

Table 4: Synthesis result using a Virtex-5 FPGA device (5vlx110ff1153-3) from Xilinx Corporation.
Resource | Utilization | % of total
Number of slice registers | 14234 out of 69120 | 20%
Number of slice LUTs | 27541 out of 69120 | 39%
Number of BRAMs | 3 out of 128 | 2%
Minimum delay | 5.373 ns |
Maximum frequency | 186.108 MHz |

6. CONCLUSIONS AND FUTURE WORK
A streaming processor called SPECTRON has been presented in this paper. SPECTRON can be programmed and reconfigured for many embedded real-time control engineering and embedded high-performance signal processing problems. SPECTRON can be clocked at 485 MHz when synthesized using 130-nm CMOS technology. The processor can perform four parallel streaming/pipelined floating-point operations using its FPMAC and CORDIC cores, resulting in a performance of about 4 × 485 MHz = 1.94 GFlops (giga floating-point operations per second). In the future, SPECTRON will be synthesized using the newest and fastest CMOS technologies to target 5 GFlops performance, and multiple SPECTRON cores will be interconnected through an on-chip network to target TeraFlops performance for future multicore high-performance streaming computer applications.
References
[1] K. J. Åström and B. Wittenmark. Adaptive Control. 2nd Edition, Addison-Wesley Publishing Company, 1995.
[2] S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris, M. Schuette, and A. Saidi. "The Reconfigurable Streaming Vector Processor (RSVP)". In Proc. 36th Annual IEEE/ACM Int'l Symp. on Microarchitecture (MICRO-36), pages 141–150, 2003.
[3] D. H. Daggett. "Decimal-Binary Conversions in CORDIC". IRE Trans. on Electronic Computers, EC-8(3):335–339, Sep. 1959.
[4] S. Elliott. Signal Processing for Active Control. Academic Press, London, 2001.
[5] A. Engebretson. "Benefits of digital hearing aids". IEEE Engineering in Medicine and Biology Magazine, 13(2):238–248, April-May 1994.
[6] B. Flachs, S. Asano, S. H. Dhong, H. P. Hofstee, G. Gervais, R. Kim, T. Le, P. Liu, J. Leenstra, J. Liberty, B. Michael, H.-J. Oh, S. M. Mueller, O. Takahashi, A. Hatakeyama, Y. Watanabe, N. Yano, D. A. Brokenshire, M. Peyravian, V. To, and E. Iwata. "The Microarchitecture of the Synergistic Processor for a Cell Processor". IEEE Journal of Solid-State Circuits, 41(1):63–70, Jan. 2006.
[7] J. Greenberg, B. Delgutte, and M. Gray. "Hands-on learning in biomedical signal processing". IEEE Engineering in Medicine and Biology Magazine, 22(4):71–79, July-Aug. 2003.
[8] S. Haykin. Adaptive Filter Theory. Prentice-Hall, New Jersey, 1996.
[9] H. Janocha. Adaptronics and Smart Structures: Basics, Materials, Design and Applications. 2nd ed. Springer, 2007.
[10] B. Lakshmi and A. S. Dhar. "CORDIC Architectures: A Survey". VLSI Design, Hindawi Publishing Corporation, 2010:1–19, 2010.
[11] P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna. "50 Years of CORDIC: Algorithms, Architectures, and Applications". IEEE Trans. Circuits and Systems-I: Regular Papers, 56(9):1893–1907, Sep. 2009.
[12] T. Melz, H. Hanselka, and M. Matthias. "Adaptronische Systeme für automotive Anwendungen am Beispiel eines modularen, aktiven Strukturinterfaces" (in German). Oldenbourg Wissenschaftsverlag, at-Automatisierungstechnik, 54(6):284–293, 2006.
[13] K. Sano, O. Pell, W. Luk, and S. Yamamoto. "FPGA-based Streaming Computation for Lattice Boltzmann Method". In Proc. Int'l Conf. on Field-Programmable Technology (ICFPT 2007), pages 233–236, 2007.
[14] F. Serra-Graells, L. Gomez, and J. Huertas. "A true-1-V 300-µW CMOS-subthreshold log-domain hearing-aid-on-chip". IEEE Journal of Solid-State Circuits, 39(8):1271–1281, Aug. 2004.
[15] Standards Committee of the IEEE Computer Society. "IEEE Standard for Binary Floating-Point Arithmetic". ANSI/IEEE Std. 754-1985.
[16] Standards Committee of the IEEE Computer Society. "IEEE Standard for Floating-Point Arithmetic". ANSI/IEEE Std. 754-2008.
[17] M. Steffen and J. Zambreno. "A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors". In Proc. IEEE 8th Symp. on Application Specific Processors (SASP'10), pages 22–29, 2010.
[18] S. R. Vangal, Y. V. Hoskote, N. Y. Borkar, and A. Alvandpour. "A 6.2-GFlops Floating-Point Multiply-Accumulator With Conditional Normalization". IEEE Journal of Solid-State Circuits, 41(10):2314–2323, Oct. 2006.
[19] J. E. Volder. "The CORDIC Trigonometric Computing Technique". IRE Trans. on Electronic Computers, EC-8(3):330–334, Sep. 1959.
[20] B. Widrow and M. E. Hoff. "Adaptive switching circuits". IRE WESCON Convention Record, Part 4 (Los Angeles):96–104, 1960.
Faizal Arya Samman was born in Makassar, Indonesia. He received his Bachelor of Engineering degree in Electrical Engineering from Universitas Gadjah Mada (UGM), Yogyakarta, in 1999 and his Master of Engineering degree from Institut Teknologi Bandung (ITB) in 2002. Since 2002 he has been a member of the research and teaching staff at Universitas Hasanuddin in Makassar, Indonesia. He received his PhD degree in 2010 from Technische Universität Darmstadt, Germany, with a scholarship award (2006-2010) from the Deutscher Akademischer Austauschdienst (DAAD, German Academic Exchange Service). From 2010 until 2012, he was a postdoctoral fellow in a research project at the LOEWE-Zentrum AdRIA (Adaptronik-Research, Innovation, Application) within the research cooperation framework between Technische Universität Darmstadt and Fraunhofer Institut LBF in Darmstadt. He is now a lecturer and research fellow at the Department of Electrical Engineering, Faculty of Engineering, Universitas Hasanuddin, in Makassar, Indonesia. His research interests include network-on-chip (NoC) microarchitecture, NoC-based multiprocessor systems-on-chip, design and implementation of analog and digital electronic circuits for control system applications on FPGA/ASIC, as well as energy harvesting systems and wireless sensor networks.
Pongyupinpanich Surapong was born in Prachinburi, Thailand. He received his Bachelor and Master of Engineering degrees in Electrical Engineering from King Mongkut's Institute of Technology Ladkrabang (KMITL), Thailand, in 1998 and 2002, respectively. He received his PhD degree in 2012 from Technische Universität Darmstadt, Germany. He is currently a lecturer and research fellow at the Department of Computer Engineering, Faculty of Engineering, Ramkhamhaeng University, in Bangkok, Thailand. His research interests include computer-aided VLSI design, hardware modeling, design optimization algorithms, circuit simulation, digital signal processing, systems-on-chip, and biosensors and transducers, all in the context of field-programmable gate-array devices and VLSI technology.
