Characterization and synthesis of a 32-bit asynchronous microprocessor in synchronous reconfigurable devices  by Pedroza de la Crúz, Adrian et al.
Available online at www.sciencedirect.com
Journal of Applied Research and Technology
www.jart.ccadet.unam.mx
Journal of Applied Research and Technology 13 (2015) 483–497
Original
Characterization and synthesis of a 32-bit asynchronous microprocessor
in synchronous reconfigurable devices
Adrian Pedroza de la Crúz a, José Roberto Reyes Barón b, Susana Ortega Cisneros a,∗,
Juan José Raygoza Panduro b, Miguel Ángel Carrazco Díaz a, José Raúl Loo Yau a
a Centro de Investigación y Estudios Avanzados, del Instituto Politécnico Nacional, Unidad Guadalajara, Zapopan, Jalisco, México
b Centro Universitario de Ciencias Exactas, e Ingenierías, Universidad de Guadalajara, Guadalajara, Jalisco, México
Received 3 October 2014; accepted 25 August 2015
Available online 26 October 2015
Abstract
This paper presents the design, implementation, and experimental results of a 32-bit asynchronous microprocessor developed in a synchronous reconfigurable device (FPGA), taking advantage of a hard macro. It supports floating point operations, such as addition, subtraction, and multiplication, and is based on the IEEE 754-2008 standard with 32-bit single precision. This work describes the different blocks of the microprocessor, such as the delay modules needed to implement a Self-Timed (ST) protocol in a synchronous system, and the operational analysis of the asynchronous central unit, according to the developed occupations and speeds. The ST control is based on a micropipeline used as a centralized generator of activation signals that permit the performance of the operations in the microprocessor without the need of a global clock. This work compares the asynchronous microprocessor with a synchronous version. The parameters evaluated are power consumption, area, and speed. Both circuits were designed and implemented in a Virtex 5 FPGA. The performance obtained was 4 MIPS for the asynchronous microprocessor against 1.6 MIPS for the synchronous one.
All Rights Reserved © 2015 Universidad Nacional Autónoma de México, Centro de Ciencias Aplicadas y Desarrollo Tecnológico. This is an open access item distributed under the Creative Commons CC License BY-NC-ND 4.0.
Keywords: Asynchronous; Microprocessor; Floating point; FPGA delay macro; Real time
∗ Corresponding author.
E-mail address: sortega@gdl.cinvestav.mx (S. Ortega Cisneros).
Peer Review under the responsibility of Universidad Nacional Autónoma de México.
http://dx.doi.org/10.1016/j.jart.2015.10.004
1665-6423/All Rights Reserved © 2015 Universidad Nacional Autónoma de México, Centro de Ciencias Aplicadas y Desarrollo Tecnológico. This is an open access item distributed under the Creative Commons CC License BY-NC-ND 4.0.
1. Introduction
Nowadays, most successful implementations of asynchronous microprocessors have been developed at the ASIC level. Asynchronous design has been used since the beginning of the computer age, even before VLSI technology was possible. With the introduction and advance of integrated circuits, the synchronous design paradigm became popular and came to be the dominant design style (Chu & Lo, 2013). However, in recent years, asynchronous design has had a comeback in ASIC implementations (Beerel, 2002; Lavagno & Singh, 2011; Smith, Al-Assadi, & Di, 2010).
Programmable devices are an excellent option for developing cheaper and faster digital circuit prototypes, due to their great integration capability and flexibility. In that context, asynchronous design can be performed using FPGA devices. To make this platform practical and useful for asynchronous design, some Self-Timed (ST) control block techniques and steady/latch delays are required. These allow us to build the ST synchronization circuits. Most microprocessors are built with a global clock synchronization system, in which all or part of the circuit is subject to a single pulse line that distributes and synchronizes data transfer. In addition, synchronous microprocessors that use a single clock can run into various problems due to the high processing demand. To overcome these problems, asynchronous systems are proposed: in an ST synchronization system, the control of data transfer between blocks is regulated through local signaling lines that indicate the request and data transfer between contiguous blocks. Since these types of systems do not depend on a global clock, they take full advantage of the speed and energy consumption when implemented in programmable devices. Asynchronous systems are relatively new, but they present better performance than their homologous synchronous systems. Moreover, microprocessors with asynchronous systems can be easily implemented in FPGAs (Ortega-Cisneros, Raygoza-Panduro, & de la Mora-Gálvez, 2007; Tranchero & Reyneri, 2008).
This paper presents the design, implementation, and experimental results of an asynchronous 32-bit microprocessor implemented in a Xilinx Virtex 5 FPGA, a platform designed exclusively for synchronous circuits (Xilinx Inc., 2015). FPGAs use synchronous components, such as the DCM (digital clock manager) and the DLL (delay-locked loop), which the software tools rely on to synthesize a design. The asynchronous implementation is made possible by an ST pipeline used as an activation signal generator block, together with the hard macro needed to generate the delay time for the ST asynchronous protocol.
Table 1
Asynchronous microprocessors.
| Microprocessor | Architecture | Technology | Performance |
|---|---|---|---|
| Caltech (Martin, Burns, Lee, Borkovic, & Hazewindus, 1989) | 4-phase, dual rail, 5-stage pipeline, 16-bit RISC | 20,000 transistors, 1.6 µm | 18 MIPS |
| NSR (Brunvand, 1993) | 2-phase, single rail, 5-stage pipeline, 16-bit RISC | FPGA Actel | 1.3 MIPS |
| AMULET1 (Furber, Day, Garside, Paver, & Woods, 1994) | 2-phase, single rail, 5-stage pipeline, based on a 32-bit ARM | 60,000 transistors, 1.0 µm | 9k Dhrystones |
| TITAC 1 (Murata, 1989) | 2-phase, dual rail, 2-step non-pipeline, 32-bit RISC | 22,000 transistors, 1.0 µm | 11.2 MIPS |
| FRED (Richardson & Brunvand, 1996) | 2-phase, single rail, multifunctional pipeline, based on a 16-bit 88100 | Defined in VHDL | 120 MIPS |
| 80C51 (van Gageldonk et al., 1998) | 4-phase, single rail, CPU and peripherals, 8-bit CISC | 27,4820 transistors, 1.6 µm | 2.10 MIPS |
| AMULET2 (Furber et al., 1999) | 4-phase, single rail, forwarding pipeline, based on a 32-bit ARM | 450,000 transistors, 0.5 µm | 42 MIPS |
| TITAC 2 (Takamura et al., 1998) | 2-phase, dual rail, 5-stage pipeline, based on a 32-bit MIPS R3000 | 496,000 transistors, 0.5 µm | 52.3 VAX MIPS |
| AMULET3 (Furber, Edwards, & Garside, 2000) | 4-phase, single rail, forwarding pipeline, based on a 32-bit ARM | 113,000 transistors, 0.35 µm | 120 MIPS |
| BitSNAP (Ekanayake, Nelly, & Manohar, 2005) | 4-phase, dual rail, based on 16, 32, and 64-bit SNAP ISAs | 0.18 µm CMOS | 6–54 MIPS |
| NCTUAC18S (Hung-Yue, Wei-Min, Yuan-Teng, Chang-Jiu, & Fu-Chiung, 2011) | 4-phase, dual rail, 5-stage pipeline, based on an 8-bit PIC18 ISA | 0.13 µm TSMC | n/a |
2. Background of Self-Timed circuits
The potential benefits of asynchronous logic have caused a resurgence of interest in the design methodology of these systems, which have received an important boost in recent years (Edwars & Toms, 2003; Geer, 2005). Recent initiatives in the industrial field include smart cards from Philips (Yoshida, 2003), Sun (Johnson, 2001), and Sharp (Terada, Miyata, & Iwata, 1999). There are many important research groups specializing in asynchronous microprocessors (Werner & Akella, 1997). This section describes the architecture and design style of some of these. Table 1 summarizes the main features.
The microprocessors described in Table 1 can be broadly divided into two categories:
1. Those constructed using a conservative timing model, suitable for formal synthesis or verification, but with a simple architecture: TITAC.
2. Those constructed with less strict timing models and an informal design approach, but with a more ambitious architecture: AMULET, NSR, FRED.
Another consideration for evaluating the implementation of asynchronous circuits in the area of microprocessors is the type of application and implementation:
1. Microprocessors used in commercial applications: AMULET, Philips 80C51.
2. Those implemented in full-custom: Caltech, TITAC.
3. Those that have only been proposed: FRED.
Fig. 1 shows a power consumption graph, where the power ranges between 9 mW and 2 W.
Fig. 1. Power consumption for different asynchronous microprocessors. [Bar chart. Vertical axis: power (mW), 0 to 2500. Horizontal axis: asynchronous microprocessors (Amulet 1, Amulet 2, Amulet 3, TITAC 1, TITAC 2, 80C51).]
3. Self-Timed microprocessor architecture
The characteristics and components of the ST microprocessor are defined in this section. The fundamental parts are the delay block and the control unit, which activates all the microprocessor blocks and sequences the execution of each instruction. Fig. 2 shows the ST microprocessor diagram.
Fig. 2. Asynchronous microprocessor. [Block diagram: control unit; memory (RAM, ROM); OPR (operation register); MAR (memory access register); I/O ports; PC (program counter); GPR (general purpose register); ALU (arithmetic logic unit).]
3.1. Asynchronous microprocessor structure
The ST control unit is based on FIFOs that contain micropipelines of asynchronous control blocks (ACB) using the 4-phase single rail protocol (Jung-Lin, Hsu-Ching, Chia-Ming, & Sung-Min, 2006; Ortega, Gurrola, Raygoza, Pedroza, & Terrazas, 2009). This protocol is used because, on FPGAs, its communication is more effective than that of the 2-phase single rail protocol, and it fulfills the necessary characteristics for substituting synchronous FIFOs. The 2-phase protocol uses fewer transitions and consumes less energy than the 4-phase protocol. However, the latter ensures a stable asynchronous communication and is better adapted to the requirements of circuits that use latches. The 4-phase protocol also occupies less area than the 2-phase protocol, because its ACB implementation requires fewer components. In addition, single rail requires less hardware than dual rail protocols.
The ST microprocessor developed in this work is a general purpose design. It is controlled by an asynchronous block that orders the data flow through all the logic components. The ST microprocessor is based on micropipeline structures that generate activation pulses toward the different modules, using the 4-phase protocol. The first element described is the control unit, shown in Fig. 3. It uses asynchronous control blocks along with delays to adjust the time required for the request signal between each ACB. Whereas synchronous controllers must set the clock period according to the slowest process, the asynchronous version reduces the delay of each individual process to the minimum possible, which shortens the program execution time.
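The request/acknowledge exchange described in this subsection can be illustrated with a small behavioral model. The following Python sketch is not the authors' ACB circuit; the class `Link` and the two generator functions are hypothetical names introduced here for illustration. It only lists the signal levels seen on one link during a transfer, which shows why a 4-phase transfer costs four transitions while a 2-phase transfer costs two.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One request/acknowledge pair between two neighbouring control blocks."""
    req: int = 0
    ack: int = 0


def four_phase_transfer(link):
    """Yield the (req, ack) levels after each step of one 4-phase transfer."""
    link.req = 1              # 1. sender raises request
    yield (link.req, link.ack)
    link.ack = 1              # 2. receiver raises acknowledge
    yield (link.req, link.ack)
    link.req = 0              # 3. sender withdraws request
    yield (link.req, link.ack)
    link.ack = 0              # 4. receiver withdraws acknowledge, link is idle
    yield (link.req, link.ack)


def two_phase_transfer(link):
    """Yield the (req, ack) levels after each step of one 2-phase transfer."""
    link.req ^= 1             # 1. any transition on req is a request event
    yield (link.req, link.ack)
    link.ack ^= 1             # 2. any transition on ack is an acknowledge event
    yield (link.req, link.ack)


if __name__ == "__main__":
    print(list(four_phase_transfer(Link())))
    shared = Link()
    print(list(two_phase_transfer(shared)), list(two_phase_transfer(shared)))
```

In the 4-phase case the link always returns to the idle levels (0, 0) after a transfer, whereas in the 2-phase case the idle levels alternate between transfers. This is consistent with the trade-off stated above: fewer transitions for 2-phase, simpler level-based control for 4-phase.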
3.2. Asynchronous control unit
This subsection explains how the synchronous FIFOs are replaced by asynchronous ones. Flip-flops (FF) are replaced by ACBs. Thus, instead of having a clock signal that activates each FF, a request signal is sent sequentially to all ACBs.
Fig. 4 shows the asynchronous FIFO that corresponds to the fetch cycle. The time for the request to go through all X signals is determined by a hard macro delay block implemented with the Xilinx FPGA Editor tool. The delay consists of a single Look-Up Table (LUT) assigned as a buffer (Ortega, Raygoza, & Boemo, 2005). With this macro, it is possible to implement Self-Timed designs on synchronous FPGAs. The FIFO uses 4 ACBs in a micropipeline to generate the signals that activate the fetch cycle.
One important problem emerges when the designer uses exactly the number of delay macros needed to achieve a specific time according to a specific delay graph. When the design is implemented in the FPGA, the delay produced by the automatic routing of the software may differ on every run. As a consequence, the design may not work properly, since the delay macro time could be lower than the expected value. To avoid this problem, a place and route constraint is recommended to ensure the same delay time in each design synthesis. In addition, the sum of the logic delay and the track delay must be greater than the delay of the processing logic function implemented, in order to ensure that the output of the synchronizing circuit is stable before a new input is applied. Fig. 5 exhibits the linear logic delays and non-linear track delays generated by the macros.
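The timing condition stated above (logic delay plus track delay must exceed the delay of the processing function) can be written as a simple check. The sketch below is only an illustration: the function names are hypothetical, the delays are expressed as integer picoseconds, and the constant per-macro delay is a simplifying assumption, since the text notes that the track delay is non-linear. The numeric values are invented for the example and are not measurements from the paper.

```python
def delay_is_safe(logic_delay_ps, track_delay_ps, function_delay_ps):
    """Return True when the matched delay strictly exceeds the function delay."""
    return logic_delay_ps + track_delay_ps > function_delay_ps


def macros_needed(function_delay_ps, per_macro_ps, margin_ps=0):
    """Smallest number of chained macros whose total delay strictly exceeds
    the function delay plus a safety margin.

    Assumes every macro adds the same delay (hypothetical simplification).
    """
    return (function_delay_ps + margin_ps) // per_macro_ps + 1


if __name__ == "__main__":
    # Hypothetical values: a 10 ns function and 1.25 ns per macro.
    n = macros_needed(10_000, 1_250)
    print(n, delay_is_safe(n * 1_250, 0, 10_000))
```

Adding a margin on top of the minimum count reflects the recommendation above: using exactly the number of macros read from a delay graph leaves no slack for routing differences between implementation runs.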
The selector shown in Fig. 6 is a decoder that indicates which of the 5 FIFO blocks of the execution cycle is going to perform the instruction. This decoder sends a request to the chosen FIFO and awaits the acknowledgment signal; it also transmits the request to the fetch cycle when the execution cycle ends. In the asynchronous controller, this selector also performs the same process as in the synchronous controller, i.e., loading a datum and an instruction from port Sum (LPS). After the instruction LPS has been performed, the selector returns a request to the execution cycle in order to be able to execute another instruction. The instruction selectors of the execution-cycle FIFOs are the same as in the synchronous controller.
Fig. 3. Asynchronous control unit. [Block diagram: Start; fetch cycle; selector driven by the instruction (OPR output); execution cycle with FIFO 1 (2 ACBs), FIFO 2 (2 ACBs), FIFO 3 (6 ACBs), FIFO 4 (8 ACBs), and FIFO 5 (10 ACBs), each with its own microinstruction selector and request/acknowledge (Rec/Ack) signals.]
3.3. Signals that activate the FIFOs of the asynchronous controller
Since asynchronous FIFOs can wait the necessary time to deliver the next request, they may be optimized. In the synchronous version, some instructions require skipping one clock pulse because of the waiting time, i.e., the extra period that complex operations need to finish their processes. The execution FIFOs have different numbers of ACBs (or FFs for synchronous FIFOs), depending on the complexity of the instructions and the number of activation signals. Each FIFO is used to execute several different instructions in order to save FPGA area. Fig. 3 shows a microinstruction selector for each FIFO block.
Table 2 shows three examples of instructions with their corresponding FIFOs. From left to right: the mnemonic of the instruction (MNE), the FIFO that can execute the instruction, the number of the ACB that activates each signal, and the activated signals (microinstructions). The instructions presented are: accumulator complement (NAC), load direct memory to accumulator (LDA), and subroutine call (CSR). In order to know the time each fetch or execution cycle takes, it is necessary either to implement the design on the FPGA and get a timing analysis or to simulate a program in the microprocessor. This is noteworthy, since in the synchronous design only the clock frequency must be known.
Fig. 4. Micropipeline with 4-phase ACBs. [Diagram: the request I_Req propagates through the delays Δ1, Δ2, Δ3 + ... + Δn–1 to produce the activation signals X1, X2, X3, ..., Xn.]
Fig. 5. Delay macros on FPGA Virtex 5. [Plot. Horizontal axis: hard macros, 1 to 100. Vertical axis: time (ns), 0 to 140. Series: total delay, track delay, logical delay.]
Table 2
Signals that activate asynchronous instructions.
| MNE | FIFO | N° ACB | Signals |
|---|---|---|---|
| NAC | 1 | 1 | Compl acc |
| | | 2 | Acc clk |
| LDA | 4 | 1 | Gpr mar |
| | | 2 | Mar clk |
| | | 3 | Ram clk |
| | | 4 | M gpr |
| | | 5 | Gpr clk |
| | | 6 | Time to process |
| | | 7 | Load gpr |
| | | 8 | Acc clk |
| CSR | 5 | 1 | Gpr mar |
| | | 2 | Mar clk |
| | | 3 | Pc gpr |
| | | 4 | Gpr clk |
| | | 5 | Mar pc |
| | | 6 | Pc clk |
| | | 7 | W ram |
| | | 8 | Ram clk |
| | | 9 | Inc pc |
| | | 10 | Pc clk |
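The mapping in Table 2 can be read as a small microprogram per instruction: each ACB of the selected FIFO fires one activation signal, in order. The following Python sketch encodes the three examples from Table 2 as data. The dictionary name `MICROPROGRAMS` and the function `activation_sequence` are hypothetical and only illustrate the ordering; they do not model timing or the handshake itself.

```python
# Instruction mnemonic -> (execution FIFO, activation signals in ACB order).
# The signal names are copied from Table 2.
MICROPROGRAMS = {
    "NAC": (1, ["Compl acc", "Acc clk"]),
    "LDA": (4, ["Gpr mar", "Mar clk", "Ram clk", "M gpr", "Gpr clk",
                "Time to process", "Load gpr", "Acc clk"]),
    "CSR": (5, ["Gpr mar", "Mar clk", "Pc gpr", "Gpr clk", "Mar pc",
                "Pc clk", "W ram", "Ram clk", "Inc pc", "Pc clk"]),
}


def activation_sequence(mnemonic):
    """Return (fifo, acb_number, signal) tuples in the order the ACBs fire."""
    fifo, signals = MICROPROGRAMS[mnemonic]
    return [(fifo, acb, signal) for acb, signal in enumerate(signals, start=1)]


if __name__ == "__main__":
    for step in activation_sequence("LDA"):
        print(step)
```

The number of steps per instruction matches the ACB counts of the corresponding FIFOs (2, 8, and 10), which is why the execution time differs from one FIFO to another.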
4. Microprocessor instructions
4.1. Arithmetic Logic Unit
The Arithmetic Logic Unit (ALU) is an important part of the microprocessor, as it carries out all the operations between data. These operations are logical, arithmetic, port and register access, floating point arithmetic, and bit shifting. The operations are performed in parallel, and a selector chooses the result of the desired operation; the output of this selector is stored in a register called the accumulator (Acc). The ALU uses 32-bit data to perform the operations. Data may be received from the General Purpose Register (GPR), the input port (Port in), and the internal registers. The latter are used to store the information to be processed immediately. Also, the ALU contains a one-bit flag (register F), which stores the carry generated by arithmetic and shift operations. The block diagram of the ALU is shown in Fig. 7.
The ALU was designed to perform 29 operations and the general reset. The operations are shown in Table 3. A brief explanation of the floating point operations that the ALU performs is presented later.
Fig. 6. Asynchronous controller's main selector. [Block diagram: Start and Reset inputs; fetch cycle FIFO; FIFOs selector (execution cycle) driving FIFO 1 to FIFO 5; selector controlled by the instruction (OPR out); multiplexer (fetch cycle); signals Rec cb, Ack cb, Rec fifos, and Ack fifos.]
Fig. 7. Arithmetic Logic Unit. [Block diagram: GPR input and Port in; operation groups (registers, arithmetic, floating point, shift, logic); accumulator selector with inputs 1 to 30; register F; outputs Port out and ACC output.]
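A few of the integer operations listed in Table 3 can be sketched in software to make the role of the 32-bit accumulator and the one-bit flag F concrete. This is a minimal Python sketch, assuming unsigned 32-bit operands (the signedness is not stated in the text); the function names mirror the signal names of Table 3 but are otherwise hypothetical, and the hardware evaluates all operations in parallel rather than by function call.

```python
MASK32 = 0xFFFFFFFF  # the ALU operates on 32-bit data


def add_gpr(acc, gpr):
    """Add gpr: Acc = Acc + GPR, F = carry. Returns (new_acc, flag_f)."""
    total = acc + gpr
    return total & MASK32, total >> 32


def compl_acc(acc):
    """Compl acc: Acc = ~Acc, truncated to 32 bits."""
    return ~acc & MASK32


def multp_gpr(acc, gpr):
    """Multp gpr: {RegHi, RegLo} = Acc * GPR. Returns (reg_hi, reg_lo)."""
    product = acc * gpr
    return (product >> 32) & MASK32, product & MASK32


if __name__ == "__main__":
    print(add_gpr(0xFFFFFFFF, 1))          # wraps around and sets the carry
    print(hex(compl_acc(0)))
    print(multp_gpr(0xFFFFFFFF, 2))        # 64-bit product split in two halves
```

The split of the integer product into RegHi and RegLo is the reason Table 3 provides separate operations to load each half into the accumulator.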
4.2. Floating point arithmetic operations
The floating point operations that the ALU performs are addition, subtraction, and multiplication. These operations are based on the IEEE 754-2008 Standard for Floating-Point Arithmetic (IEEE, 2008) with 32-bit single precision.
The adder–subtractor design can be seen in Fig. 8 (Raygoza, Ortega, Carrazco, & Pedroza, 2009). The first step is to identify the type of data present at the input, as proposed in Table 4. The second step is to send the data to one of the 4 blocks that perform the addition, depending on the type of data. Then, the exponents are aligned to the same value. After that, depending on the sign of the data, addition or subtraction of the mantissas is executed. In the special case in which the data do not represent an ordinary number, such as infinities, zero, and NaN, the symbolic operation block is employed. The final step is to choose the correct output with the multiplexer.
The design and steps of the floating point multiplier are shown in Fig. 9 (Ortega, Raygoza, Pedroza, Carrazco, & Loo-Yau, 2010). The first step is to identify the data type. In the second step, the mantissa multiplication and the exponent addition are performed. When both inputs are infinities, zeros, or NaNs, the multiplication is performed by the symbolic operation block. In the last step, the output may be normalized; this adjustment is made to obtain a normal number as a result.
Unlike the arithmetic multiplier, whose product is split between RegHi and RegLo (Table 3), the floating point multiplier delivers a single 32-bit result. The results of the adder–subtractor and the multiplier go directly to the selector. From there, the accumulator can choose them.
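The first step of both floating point units, identifying the data type, can be sketched from the IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits). The Python sketch below returns the type name together with the 3-bit identification code listed in Table 4. The classification rule is the standard IEEE 754 field test; how the hardware derives the code bits is not described in the text, so the function `classify` is only an illustrative assumption.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE 754 single precision pattern of x as an int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify(bits):
    """Return (type name, Table 4 identification code) for a 32-bit pattern."""
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF          # 23-bit fraction field
    if exponent == 0:
        return ("Zero", "000") if mantissa == 0 else ("Subnormal", "001")
    if exponent == 0xFF:
        return ("Infinite", "110") if mantissa == 0 else ("NaN", "111")
    return ("Normal", "10X")            # X is a don't-care bit in Table 4


if __name__ == "__main__":
    for value in (0.0, 1.0, float("inf"), float("nan")):
        print(value, classify(float_to_bits(value)))
```

Only the exponent and mantissa fields are inspected; the sign bit does not affect the data type, so positive and negative zero are classified identically.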
Fig. 8. Floating point adder–subtractor. [Block diagram: Acc and GPR inputs split into sign (S), exponent (Exp), and mantissa; data-type selector; normal adder, subnormal adder, normal and subnormal adder, and symbolic operations blocks; output multiplexer producing S, Exp, and mantissa.]
Fig. 9. Floating point multiplier. [Block diagram: Acc and GPR inputs split into sign, exponent, and mantissa; data-type selector; exponents adder, mantissas multiplier, exponent adjust, sign, and symbolic operations blocks; output multiplexer producing S, Exp, and mantissa.]
5. Implementation results
This section compares the occupation, power consumption, and components of the asynchronous and synchronous microprocessors on a Virtex 5 FPGA (ML501). Simulations are performed and tested in real time. The times that the fetch and execution cycles take for each microprocessor are obtained. Performance results of both microprocessors are presented with a test program.
5.1. Occupation
Table 5 reports the occupation of the control units only, for the asynchronous and synchronous systems. The occupation of the other microprocessor components, common to both designs, is shown in Table 6. The DSP blocks are used to accelerate processes such as the floating point arithmetic operations in the ALU. The embedded RAM memories of the Virtex 5 are used to implement the main memory of both microprocessors. If the main memory were designed using LUTs (distributed RAM), the FPGA resource usage would increase considerably. As mentioned above, the PC, GPR, OPR, ALU blocks, and the memory are shared by both processors; consequently, their occupations are reported only once.
Table 7 shows the occupation of all elements of the asynchronous and synchronous microprocessors. Note that the differences in the final occupations are considerably smaller than those between the control units alone.
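To put the totals reported in Table 7 in proportion, the used resources can be expressed as a percentage of what the device offers. The short Python sketch below does this for the LUT, slice, and register counts of Table 7; the dictionary and function names are hypothetical, and the percentages are derived here rather than quoted from the paper.

```python
# Resource counts copied from Table 7 (complete microprocessors).
AVAILABLE = {"LUT": 28800, "Slices": 7200, "Regs": 28800}
SYNC = {"LUT": 3622, "Slices": 1454, "Regs": 301}
ASYNC = {"LUT": 3529, "Slices": 1598, "Regs": 266}


def utilisation(used, available=AVAILABLE):
    """Return the percentage of each available resource that is used."""
    return {name: round(100.0 * used[name] / available[name], 2)
            for name in used}


if __name__ == "__main__":
    print("Synchronous :", utilisation(SYNC))
    print("Asynchronous:", utilisation(ASYNC))
```

With these figures both designs use roughly an eighth of the LUTs and about a fifth of the slices, so the two versions differ by only a couple of percentage points at most in any resource.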
Table 3
Arithmetic Logic Unit operations.
| N° | Signal | Process |
|---|---|---|
| 1 | Regx clk | RegX = 0, Register X clock |
| 2 | Regy clk | RegY = 0, Register Y clock |
| 3 | RegHi clk | RegHi = 0, Register High clock |
| 4 | RegLo clk | RegLo = 0, Register Low clock |
| 5 | Dec acc | Acc = Acc − 1 |
| 6 | Add gpr | Acc = Acc + GPR, F = Carry |
| 7 | Load gpr | Acc = GPR |
| 8 | Rotate Right | Acc = {Acc[n-1:0], Acc[n]} |
| 9 | Rotate Left | Acc = {Acc[0], Acc[n:1]} |
| 10 | Compl acc | Acc = ~Acc |
| 11 | Shift Right | Acc = {Acc[n-1:0], 1'b0}, F = Acc[n] |
| 12 | Comp regx | Acc = Acc = <RegX |
| 13 | Inc acc | Acc = Acc + 1, F = Carry |
| 14 | Load regx | Acc = RegX |
| 15 | Load regy | Acc = RegY |
| 16 | And xy | Acc = RegX And RegY |
| 17 | Or xy | Acc = RegX Or RegY |
| 18 | Load Pin | Acc = Port in |
| 19 | Subt gpr | Acc = Acc − GPR |
| 20 | Multp gpr | {RegHi, RegLo} = Acc * GPR |
| 21 | Shift Left | Acc = {1'b0, Acc[n:1]}, F = Acc[0] |
| 22 | MultpHi gpr | Acc = RegHi |
| 23 | MultpLo gpr | Acc = RegLo |
| 24 | Addpf gpr | Acc = Acc + GPR, floating point |
| 25 | Subtpf gpr | Acc = Acc − GPR, floating point |
| 26 | Multppf gpr | Acc = Acc * GPR, floating point |
| 27 | Acc clk | Acc = 0, Accumulator clock |
| 28 | Pout clk | Port out = Acc |
| 29 | F clk | RegF = 0, Register F clock |
| 30 | Reset | General Reset |
Table 4
Identification of data type in floating point.
| Data type | Identification |
|---|---|
| Zero | 3'b000 |
| Subnormal | 3'b001 |
| Normal | 3'b10X |
| Infinite | 3'b110 |
| NaN | 3'b111 |
Table 5
FPGA occupation for asynchronous and synchronous control units.
| Component | LUT | Slices | Regs. | Macros |
|---|---|---|---|---|
| Available | 28,800 | 7200 | 28,800 | 28,800 |
| Synchronous | 117 | 30 | 46 | 0 |
| Asynchronous | 197 | 50 | 11 | 163 |
Regs. (Registers).
Table 6
Occupation of common components on Virtex 5 FPGA.
| Cmp. | LUTs | Slices | Regs. | DSP | RAM |
|---|---|---|---|---|---|
| Free | 28,800 | 7200 | 28,800 | 48 | 48 |
| PC | 34 | 9 | 32 | 0 | 0 |
| GPR | 40 | 10 | 33 | 0 | 0 |
| MAR | 34 | 9 | 32 | 0 | 0 |
| OPR | 6 | 2 | 6 | 0 | 0 |
| Memory | 0 | 0 | 0 | 0 | 2 |
| ALU | 3410 | 853 | 169 | 6 | 0 |
Cmp. (Component), Regs. (Registers).
Table 7
FPGA occupation for asynchronous and synchronous microprocessors.
| Cmp. | LUT | Slices | Regs. | I/O | Macr. |
|---|---|---|---|---|---|
| Free | 28,800 | 7200 | 28,800 | 440 | 28,800 |
| Sync. | 3622 | 1454 | 301 | 105 | 0 |
| Async. | 3529 | 1598 | 266 | 104 | 163 |
Cmp. (Component), Regs. (Registers), Macr. (Macros).
5.2. Simulation
This subsection shows some simulations of both microprocessors on the Virtex 5 FPGA. In order to measure the timing of each execution FIFO and the fetch FIFO, the signals c_fetch and c_execution were implemented to mark the start point of each stage. In these simulations, the OPR, PC, Acc, and the input and output ports were monitored, along with the fetch and execution start signals.
Fig. 10 shows a simulation that includes the measured fetch cycle for the synchronous microprocessor working at 50 MHz. The fetch cycle was 38.988 ns. The process performed in this simulation is as follows: first, a datum was loaded from the output port into the accumulator (instruction 0E); then, the accumulator was complemented (instruction 01); and finally, the result was placed in the output port (instruction 0F).
Fig. 11 shows a simulation that includes the measured fetch cycle for the asynchronous microprocessor. The fetch cycle was 25.648 ns, which is 13.34 ns less than in the synchronous version. The process performed is similar to that in Fig. 10.
Fig. 12 presents a synchronous microprocessor simulation in which an instruction carried out by FIFO 5 was used; the corresponding execution cycle took 100.671 ns. The process this simulation performs is as follows: the data were loaded from the output port into the accumulator (instruction 0E); afterwards, using the memory content (zero), an indirect floating point addition was performed (instruction 28); finally, the result was moved to the output port (instruction 0F).
Fig. 13 shows an asynchronous microprocessor simulation. FIFO 5 takes 56.275 ns to perform the corresponding execution cycle. The process performed is similar to that in Fig. 12. A complete graph with all FIFOs is shown later.
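The simulated cycle times quoted in this subsection can be compared directly. The following Python sketch computes the time saved and the ratio between the synchronous and asynchronous cycles; the names `SIMULATED_NS` and `compare` are hypothetical, and the ratios are derived here from the reported times rather than taken from the paper.

```python
# Simulated cycle times in ns: (synchronous, asynchronous), from Figs. 10 to 13.
SIMULATED_NS = {
    "fetch": (38.988, 25.648),
    "execution (FIFO 5)": (100.671, 56.275),
}


def compare(sync_ns, async_ns):
    """Return (time saved in ns, synchronous/asynchronous ratio)."""
    return round(sync_ns - async_ns, 3), round(sync_ns / async_ns, 2)


if __name__ == "__main__":
    for cycle, (sync_ns, async_ns) in SIMULATED_NS.items():
        saved, ratio = compare(sync_ns, async_ns)
        print(f"{cycle}: saved {saved} ns, ratio {ratio}")
```

The fetch difference reproduces the 13.34 ns stated in the text, and the larger absolute saving appears in the FIFO 5 execution cycle, which is the longest of the two.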
Fig. 10. Synchronous microprocessor simulation: fetch cycle. [Simulator waveform: clk, reset, ini, pto_in, f, z, out_pc, out_acc, pto_out, AV, out_opr, c_fetch, and c_execution; cursors mark the start and end of the fetch cycle, 38.988 ns apart.]
5.3. Real time implementation
This subsection analyzes the fetch and execution cycles of both microprocessors when implemented in real time on the Virtex 5. The processes and instructions tested are the same as those used in the simulations of Figs. 10 and 11. In real time, only the last 8 bits are shown (in hexadecimal) for each of the monitored signals, since the ML501 card has only 32 user pins, and the rest are used to connect several peripherals to the FPGA.
Figs. 14 and 15 present the processes in real time for each microprocessor. They report the timing of the fetch cycles: 40 ns for the synchronous microprocessor and 16 ns for the asynchronous one.
Fig. 16 presents a graph with the simulation times for each cycle of both microprocessors, as well as the times measured in real time. It is worth noting that for the asynchronous microprocessor there is a wider difference between the simulation and the real time, while for the synchronous microprocessor there are no considerable differences, since the simulations and the real time implementation work at the same clock speed (50 MHz).
The measurements obtained in real time are the most precise, since they were obtained directly from the FPGA. The performance measures that follow are based on the timing obtained from the real time implementation.
Fig. 11. Asynchronous microprocessor simulation: fetch cycle. [Simulator waveform: reset, ini, pto_in, f, z, out_pc, out_acc, pto_out, AV, out_opr, c_fetch, and c_execution; cursors mark the start and end of the fetch cycle, 25.648 ns apart.]
Fig. 12. Synchronous microprocessor simulation: execution cycle (FIFO 5). [Simulator waveform with the same signals plus clk; a cursor marks the start of the execution cycle.]
Fig. 13. Asynchronous microprocessor simulation: execution cycle (FIFO 5). [Simulator waveform; a cursor marks the start of the execution cycle, which lasts 56.275 ns.]
5.4. Power consumption
Table 8 reports the power consumption of the microprocessors in the FPGA (Hasan & Zafar, 2012). The measurements were performed with the XPower Analyzer of Xilinx, which delivers measurements of the FPGA in stable state. Table 8 reports a lower consumption for the asynchronous microprocessor. However, this difference is not significant, as the occupations of both microprocessors in the FPGA are similar. The XPower Analyzer tool presents the maximum power consumption.
In order to evaluate the power consumption in real time, an instrumentation and measurement workstation is set up to obtain a better comparison between both microprocessors. The evaluation includes the whole ML501 board, and not only the FPGA as in the XPower Analyzer case, so the values obtained fall in different ranges. However, the difference in consumption between the two microprocessor versions can be seen in real time.
Table 8
Microprocessors power consumption (mW).
| Measure | Synchronous | Asynchronous |
|---|---|---|
| Clocks | 11.43 | 7.15 |
| Logic | 0.07 | 0 |
| Signals | 1.27 | 1.26 |
| IOs | 2.71 | 0.63 |
| Total idle | 422.53 | 422.44 |
| Total dynamic | 15.47 | 9.04 |
| Total power | 438.01 | 431.48 |
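The figures in Table 8 can be cross-checked with a few lines of arithmetic: the dynamic power should be close to the sum of the clock, logic, signal, and I/O contributions, and the relative saving can be computed for both the dynamic and the total power. The Python sketch below performs these checks; the names are hypothetical and the percentages are derived here from Table 8 rather than quoted from the paper.

```python
# Power figures in mW, copied from Table 8.
TABLE_8_MW = {
    "Synchronous": {"Clocks": 11.43, "Logic": 0.07, "Signals": 1.27,
                    "IOs": 2.71, "Total idle": 422.53,
                    "Total dynamic": 15.47, "Total power": 438.01},
    "Asynchronous": {"Clocks": 7.15, "Logic": 0.0, "Signals": 1.26,
                     "IOs": 0.63, "Total idle": 422.44,
                     "Total dynamic": 9.04, "Total power": 431.48},
}


def dynamic_sum(row):
    """Sum the four dynamic contributions of one design."""
    return row["Clocks"] + row["Logic"] + row["Signals"] + row["IOs"]


def saving_percent(measure):
    """Relative saving of the asynchronous design for one measure, in percent."""
    sync = TABLE_8_MW["Synchronous"][measure]
    asyn = TABLE_8_MW["Asynchronous"][measure]
    return 100.0 * (sync - asyn) / sync


if __name__ == "__main__":
    for name, row in TABLE_8_MW.items():
        print(name, round(dynamic_sum(row), 2), row["Total dynamic"])
    print(round(saving_percent("Total dynamic"), 2),
          round(saving_percent("Total power"), 2))
```

The component sum for the synchronous design is 15.48 mW against a reported 15.47 mW, a gap consistent with rounding. The saving is large in dynamic power but small in total power because the idle power dominates, which agrees with the remark that the overall difference is not significant.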
The power behavior of the circuits implemented in the FPGA is monitored with the current probe, and a data graph is stored. Circuit measurements are performed with the following criteria:
• Circuit activity is observed through the current behavior in the main power line of the evaluation card, with a current probe and an ammeter.
• The capture of instantaneous current measurements is synchronized with a digital oscilloscope, taking into account the initial trigger generated each time a program is executed.
A connection diagram with the current probe and the evaluation board is shown in Fig. 17. The probe is electrically isolated from the circuit, i.e., the instrument detects the current variations indirectly through magnetic field changes in the power line.
Fig. 14. Synchronous microprocessor in real time: fetch cycle. [Logic analyzer waveform: c_fetch, c_execution, f, z, pc, opr, acc, and pto_out; the interval from Cursor 1 to Cursor 2, between the start and end of the fetch cycle, is 40 ns.]
Fig. 15. Asynchronous microprocessor in real time: fetch cycle. [Logic analyzer waveform with the same signals; the interval from Cursor 1 to Cursor 2, between the start and end of the fetch cycle, is 16 ns.]
Fig. 16. Real time implementation versus simulation graph. [Chart. Vertical axis: time (ns). Horizontal axis: fetch cycle and execution cycles of FIFO 1 to FIFO 5. Series: Synchronous (Simulation), Synchronous (Real time), Asynchronous (Simulation), Asynchronous (Real time).]
Fig. 17. Connection diagram with the current probe. [Diagram: power source, voltmeter, ammeter, current probe, and Virtex 5 evaluation board.]
The current behavior in the evaluation board with the FPGA presents three distinctive levels:
1. The level without programming the FPGA.
2. The average level of power consumption when the device is configured.
3. The level when the microprocessors are working.
Fig. 18 shows a measurement graph, which indicates the three operating levels with the numbers 1, 2, and 3.
Fig. 19 shows the behavior of the asynchronous and synchronous microprocessor currents. In the latter, the activity depends on a global clock and is more uniform throughout the circuit. In addition, it does not present changes as large as those of its asynchronous counterpart, i.e., once the synchronous microprocessor executes the program, the trigger levels reach their peak and then the current level falls slightly and remains permanently at a high level. If the regions under both microprocessor lines are compared, it is seen that the area under the asynchronous microprocessor line is smaller than that under the synchronous one.
In the case of the ST microprocessor, the activation levels do not depend on a global line; they tend to be more localized and appear only when a program is executed. This quality allows more controlled and optimized levels of activity, thereby enabling the reduction of power consumption.
The average current level was 600 mA when the FPGA was not programmed, and 680 mA when the device was configured and inactive. The level when a program was executed was 890 mA in both microprocessors. When the asynchronous microprocessor finished the task, the current consumption dropped to 680 mA, whereas in the synchronous version it dropped only to 850 mA.
Fig. 18. Current level measurements. (Oscilloscope capture, CH 10:1, 20.0 mV/div, DC; operating levels marked 1, 2, and 3.)
Fig. 19. Current level measurements with an executed program. (Axes: power consumption (mA) versus time (ns); traces: asynchronous microprocessor and synchronous microprocessor; operating levels marked 1, 2, and 3.)
Table 9
Test program for the microprocessor.
N°   Address   Instruction        FIFO
1    000       0 → acc            1
2    001       pto_in → acc       1
3    002       acc → pto_out      1
4    003       acc shift left     1
5    004       acc → pto_out      1
6    005       acc shift left     1
7    006       acc → pto_out      1
8    007       acc shift left     1
9    008       acc → pto_out      1
10   009       acc shift left     1
11   00A       acc → pto_out      1
12   00B       acc shift left     1
13   00C       acc → pto_out      1
14   00D       acc shift left     1
15   00E       acc → pto_out      1
16   00F       acc shift left     1
17   010       acc → pto_out      1

Fig. 20. Current level area with an executed program. (Axes: power consumption (mA) versus time (ns); traces: asynchronous microprocessor and synchronous microprocessor, with the areas under each trace marked.)

Fig. 21. Real time synchronous microprocessor program. (Waveform signals: c_fetch, c_execution, pc, opr, acc, pto_out, f, z, Ops; Cursor 1 to Cursor 2: Δt = 1.02 µs.)

From the current behavior of both circuits, and considering that each microprocessor has a representative current consumption area, as shown in Fig. 20, it can be assumed that the area $A_{st}$ belongs to the asynchronous version and $B_s$ represents the synchronous area; therefore, Eq. (1) gives the consumption difference.

$$C = B_s - A_{st} \tag{1}$$

This represents the power saved by the asynchronous microprocessor.
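As a rough illustration of Eq. (1), the two areas could be approximated numerically from sampled current traces with the trapezoidal rule. The sketch below is only an example under stated assumptions: the sample values are invented around the levels reported above (680 mA, 850 mA, 890 mA) and are not measured data, and the function names `trapezoid_area` and `consumption_difference` are hypothetical, not part of the authors' measurement setup.

```python
def trapezoid_area(times_ns, currents_ma):
    """Approximate the area under a sampled current trace (in mA*ns)
    using the trapezoidal rule."""
    if len(times_ns) != len(currents_ma):
        raise ValueError("times and currents must have the same length")
    area = 0.0
    for i in range(1, len(times_ns)):
        dt = times_ns[i] - times_ns[i - 1]
        area += 0.5 * (currents_ma[i] + currents_ma[i - 1]) * dt
    return area


def consumption_difference(times_ns, sync_ma, async_ma):
    """Eq. (1): C = Bs - Ast, with both areas taken over the same window."""
    b_s = trapezoid_area(times_ns, sync_ma)    # synchronous area, Bs
    a_st = trapezoid_area(times_ns, async_ma)  # asynchronous area, Ast
    return b_s - a_st


# Illustrative samples only (assumed, not measured): both traces rise to
# 890 mA while the program runs; afterwards the synchronous trace stays
# at 850 mA and the asynchronous trace returns to 680 mA.
times_ns = [0, 1000, 2000, 3000, 4000]
sync_ma = [680, 890, 850, 850, 850]
async_ma = [680, 890, 680, 680, 680]

c = consumption_difference(times_ns, sync_ma, async_ma)
print(f"C = {c:.0f} mA*ns")
```

A positive C in such a comparison corresponds to the asynchronous trace enclosing less area than the synchronous one over the same time window.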
5.5. Test programs for the synchronous microprocessor
One method to calculate the microprocessor performance is to measure the time that a program takes to be executed on it. For the evaluation, performance test programs (benchmarks) are used. The evaluation then continues with a program that consists of several FIFO 1 operations. Table 9 shows the test program instructions, which perform the following steps: first, the accumulator is cleared; then, a datum is loaded from the input port; finally, the accumulator is shifted left seven times and the results are sent to the output port.
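The instruction sequence described above can be replayed in software to check the instruction count and the expected port values. This is a minimal behavioral sketch, not the authors' hardware description: the function name `run_test_program` is hypothetical, the input value 0x01 is an assumption chosen for illustration, and the 32-bit mask simply reflects the 32-bit width of the microprocessor.

```python
def run_test_program(port_in, shifts=7):
    """Replay the test sequence: clear, load, output, then shift/output pairs.

    Returns the list of values written to the output port and the number
    of instructions executed.
    """
    outputs = []
    executed = 0

    acc = 0                  # 0 -> acc
    executed += 1
    acc = port_in            # pto_in -> acc
    executed += 1
    outputs.append(acc)      # acc -> pto_out
    executed += 1

    for _ in range(shifts):
        acc = (acc << 1) & 0xFFFFFFFF  # acc shift left (32-bit width assumed)
        executed += 1
        outputs.append(acc)            # acc -> pto_out
        executed += 1

    return outputs, executed


outputs, executed = run_test_program(0x01)
print([f"{value:02X}" for value in outputs])
print("instructions executed:", executed)
```

With seven shift/output pairs after the three initial instructions, the sequence amounts to 17 instructions, all of which use FIFO 1.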
Fig. 22. Real time asynchronous microprocessor program. (Waveform signals: c_fetch, c_execution, pc, opr, acc, pto_out, f, z, Ops; Cursor 1 to Cursor 2: Δt = 424 ns.)
The following equations were used to evaluate the performance of both microprocessors (Hennessy & Patterson, 2011, Ch. 1).

$$\mathrm{CPI} = \frac{\sum_{i=1}^{n} (\mathrm{CPI}_i \times I_i)}{\text{N° Instructions}} \tag{2}$$

For the synchronous test program, the cycles per instruction ($\mathrm{CPI}_i$) of each instruction ($I_i$) in Eq. (2) are the cycles that FIFO 1 takes to execute the instruction (one cycle) plus the fetch cycle (two cycles). Applying Eq. (2), the CPI was 3.

$$T_p = NI \times \mathrm{CPI} \times T \tag{3}$$

Eq. (3) was used to find the program time ($T_p$) for the synchronous test program. $T$ is the clock period (20 ns for a 50 MHz clock) and $NI$ is the number of instructions. The $T_p$ obtained was 1.020 µs.

$$\mathrm{MIPS} = \frac{\text{N° Instructions}}{T_p \times 10^{6}} \tag{4}$$

Eq. (4) calculates the MIPS (millions of instructions per second), used to compare the performance of both microprocessors running the same test program. The MIPS for the asynchronous and synchronous microprocessors were 4.009 and 1.666, respectively.

The program in Table 9 was run in real time on the synchronous microprocessor implemented on Virtex 5, as shown in Fig. 21. Note that the program time ($T_p$) was 1020 ns, the same value obtained by Eq. (3). Fig. 22 shows the same program of Table 9, now run on the asynchronous microprocessor prototyped on Virtex 5. In this case, the $T_p$ was 424 ns.
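The arithmetic behind Eqs. (2) and (3) can be checked with a short script. The sketch below is illustrative only: the function names `average_cpi` and `program_time_ns` are hypothetical, the instruction count of 17 assumes the full test program of Table 9, and the speed-up ratio is simply derived here from the two reported program times.

```python
def average_cpi(instruction_mix):
    """Eq. (2): weighted mean of cycles per instruction.

    instruction_mix is a list of (cpi_i, count_i) pairs, one per
    instruction class.
    """
    total_instructions = sum(count for _, count in instruction_mix)
    total_cycles = sum(cpi * count for cpi, count in instruction_mix)
    return total_cycles / total_instructions


def program_time_ns(n_instructions, cpi, clock_period_ns):
    """Eq. (3): Tp = NI * CPI * T."""
    return n_instructions * cpi * clock_period_ns


N_INSTRUCTIONS = 17        # instructions in the test program (assumed from Table 9)
FETCH_CYCLES = 2           # fetch cycle
EXECUTE_CYCLES_FIFO1 = 1   # FIFO 1 execution
CLOCK_PERIOD_NS = 20       # 50 MHz clock

# Every instruction in the test program uses FIFO 1, so there is one class.
cpi = average_cpi([(FETCH_CYCLES + EXECUTE_CYCLES_FIFO1, N_INSTRUCTIONS)])
tp_sync_ns = program_time_ns(N_INSTRUCTIONS, cpi, CLOCK_PERIOD_NS)

TP_ASYNC_NS = 424          # program time reported for the asynchronous version
speedup = tp_sync_ns / TP_ASYNC_NS

print(f"CPI = {cpi}, Tp (synchronous) = {tp_sync_ns} ns")
print(f"Tp ratio (synchronous / asynchronous) = {speedup:.2f}")
```

The computed synchronous program time matches the 1020 ns observed in real time, and the ratio of the two program times agrees with the ratio of the reported MIPS figures.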
6. Conclusions
This work presents a Self-Timed microprocessor design compared with a synchronous version. Experimental results demonstrated that asynchronous circuits can be implemented in FPGAs, even though design tools for FPGAs are focused on synchronous synthesis, as is the case with the ISE software (from Xilinx). The FPGA editor simplified the asynchronous implementation on FPGAs. With this tool, delay macros can be implemented, which are useful for the asynchronous protocol signals required in order to correctly transfer the data. Moreover, FPGA editor scripts can help in delay designs.
The microprocessor occupation of slices on Virtex 5 was 20.19% for the synchronous version and 22.76% (including delay macros) for the asynchronous version. Regarding inputs and outputs, the asynchronous microprocessor used 23.64% of FPGA pins versus 23.86% in the synchronous one (due to the clock pin). Occupation of registers was lower in the asynchronous microprocessor (0.92% versus 1.05%). As for memory blocks and DSPs, the occupation was the same for both: 4.17% in RAMs and 12.50% in DSPs. Fetch and execution cycle times were reduced considerably in the asynchronous microprocessor compared with the synchronous microprocessor in real time. The time was reduced from 40 ns to 16 ns in the fetch cycle and from 100 ns to 38 ns in the execution cycle of the FIFO with the longest delay steps (FIFO 5).
The power measurements were taken with the XPower Analyzer tool, which indicates 431.48 mW of power in the asynchronous microprocessor and 438.01 mW in the synchronous one. In real time, the power consumption for the ST microprocessor was lower than that for the synchronous one, because when the asynchronous microprocessor finished processing, the current consumption returned to a low operation level (680 mA), while the synchronous one continued at a high level (850 mA).
The asynchronous microprocessor implemented on Virtex 5 finished with a 4 MIPS performance, which outstrips the synchronous version at 1.6 MIPS with the same characteristics. We can conclude that, despite the lack of design tools for asynchronous circuits, it is possible to use the tools for synchronous circuits in order to design asynchronous circuits on FPGAs. This can reduce the process time as well as the power consumption, meaning better performance and lower cost for electronic circuits.
Conflict of interest
The authors have no conflicts of interest to declare.
Acknowledgement
This work was supported by CONACYT, México, grant 322016.
References
Beerel, P. A. (2002, August). Asynchronous circuits: An increasingly practical design solution. In Proceedings of the international symposium on quality electronic design (ISQED) (pp. 367–372).
Brunvand, E. (1993). The NSR processor. In Proceedings of the twenty-sixth Hawaii international conference on system sciences (Vol. 1). IEEE.
Chu, S. L., & Lo, M. J. (2013). A new design methodology for composing complex digital systems. Journal of Applied Research and Technology, 11(April (2)), 195–205.
Edwards, D. A., & Toms, W. B. (2003, February). The Status of Asynchronous Design in Industry. Information Society Technologies (IST) Programme (2nd ed.).
Ekanayake, V. N., Kelly, C., IV, & Manohar, R. (2005). BitSNAP: Dynamic significance compression for a low-energy sensor network asynchronous processor. In Proceedings of the 11th IEEE international symposium on asynchronous circuits and systems (ASYNC), March 14–16 (pp. 144–154).
Furber, S. B., Day, P., Garside, J. D., Paver, N. C., & Woods, J. V. (1994, March). AMULET1: A micropipelined ARM. In Compcon Spring '94, Digest of Papers (pp. 476–485). IEEE.
Furber, S. B., Edwards, D. A., & Garside, J. D. (2000). AMULET3: A 100 MIPS asynchronous embedded processor. In Proceedings of the international symposium on advanced research in asynchronous circuits and systems (pp. 329–334).
Furber, S. B., Garside, J. D., Riocreux, P., Temple, S., Day, P., Liu, J., et al. (1999). AMULET2e: An asynchronous embedded controller. Proceedings of the IEEE, 87(February), 243–256.
Geer, D. (2005). Is it time for clockless chips. IEEE Computer Society, (March), 18–21.
Hasan, L., & Zafar, H. (2012). Performance versus power analysis for bioinformatics sequence alignment. Journal of Applied Research and Technology, 10(December (6)), 920–928.
Hennessy, J. L., & Patterson, D. A. (2011). Computer architecture: A quantitative approach (5th ed.). Elsevier.
Hung-Yue, T., Wei-Min, C., Yuan-Teng, C., Chang-Jiu, C., & Fu-Chiung, C. (2011). A self-timed dual-rail processor core implementation for microcontrollers. In International conference on electronic devices, systems and applications (ICEDSA), April 25–27 (pp. 39–44).
Institute of Electrical and Electronics Engineers, Inc. (2008, August). IEEE Standard for Floating-Point Arithmetic. IEEE Std 754-2008.
Johnson, C. (2001). Scrap system clock Sun exec tells Async. EE Times, March 19.
Jung-Lin, Y., Hsu-Ching, T., Chia-Ming, H., & Sung-Min, L. (2006). High-level synthesis for self-timed systems. In IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), December 4–7 (pp. 1410–1413).
Lavagno, L., & Singh, M. (2011). Guest Editors' Introduction: Asynchronous design is here to stay (and is more mainstream than you thought). Design & Test of Computers IEEE, 28(September–October (5)), 4–6.
Martin, A. J., Burns, S. M., Lee, T. K., Borkovic, D., & Hazewindus, P. J. (1989). The design of an asynchronous microprocessor. In Proceedings of the decennial Caltech conference on VLSI on advance research in VLSI (pp. 351–373). Cambridge: MIT Press.
Murata, T. (1989). Petri nets: Properties, analysis and applications. Proceedings of the IEEE, 77(April (4)), 541–580.
Ortega, S., Gurrola, M. A., Raygoza, J. J., Pedroza, A., & Terrazas, G. (2009, October). Implementación de estructuras ASIC Self-Timed aplicando el conjunto de herramientas Alliance. In Proceedings of the SOMI XXIV.
Ortega, S., Raygoza, J., & Boemo, E. (2005). Diseño e implementación de módulos de control con protocolos de comunicación Self-Timed en FPGAs. V Jornadas de Computación Reconfigurable y Aplicaciones CEDI.
Ortega-Cisneros, S., Raygoza-Panduro, J. J., & de la Mora-Gálvez, A. (2007). Design and implementation of the AMCC self-timed microprocessor in FPGAs. Journal of Universal Computer Science, 13(May (3)), 377–387.
Ortega, S., Raygoza, J., Pedroza, A., Carrazco, M., & Loo-Yau, J. R. (2010). Design and implementation of self timed and synchronous floating-point multipliers. In The 1st international congress on instrumentation and applied sciences conference, implemented in reconfigurable devices, October.
Raygoza, J., Ortega, S., Carrazco, M., & Pedroza, A. (2009). Implementación en hardware de un sumador de punto flotante basado en el estándar IEEE 754-2008. Digital Technological Journal, October.
Richardson, W. F., & Brunvand, E. (1996). Fred: An architecture for a self-timed decoupled computer. In Proceedings of the second international symposium on advanced research in asynchronous circuits and systems (pp. 60–68). March 18.
Smith, S. C., Al-Assadi, W. K., & Di, J. (2010). Integrating asynchronous digital design into the computer engineering curriculum. IEEE Transactions on Education, 53(August (3)), 349–357.
Takamura, A., Imai, M., Ozawa, M., Fukasaku, I., Fujii, T., Kuwako, M., et al. (1998). TITAC-2: An asynchronous 32-bit microprocessor. Proceedings of the IEEE, (November), 319–320.
Terada, H., Miyata, S., & Iwata, M. (1999). DDMPs: Self-timed super-pipelined data-driven multimedia processors. Proceedings of the IEEE, 87(February (2)), 282–295.
Tranchero, M., & Reyneri, L. M. (2008). Implementation of self-timed circuits onto FPGAs using commercial tools. In 11th EUROMICRO conference on digital system design (DSD), architectures, methods and tools, September 3 (pp. 373–380).
van Gageldonk, H., Baumann, D., van Berkel, K., Gloor, D., Peeters, A., & Stegmann, G. (1998). An asynchronous low-power 80c51 microcontroller. In Proceedings of the international symposium advanced research in asynchronous circuits and systems (pp. 96–107).
Werner, T., & Akella, V. (1997). Asynchronous processor survey. Proceedings of the IEEE, (November), 67–77.
Xilinx Inc. (2015). Virtex-5 Family Overview. DS100 (v5.1), August 21. Available at: www.xilinx.com
Yoshida, J. (2003). Philips gambit: Self-timing's time is here. EE Times, March 31.
