Design of Self-Timed Reconfigurable Controllers for Parallel Synchronization via Wagging by Guido JS & Yakovlev A
 Newcastle University ePrints 
 
Guido JS, Yakovlev A. Design of Self-Timed Reconfigurable Controllers for 
Parallel Synchronization via Wagging. IEEE Transactions on Very Large Scale 
Integrated (VLSI) Systems 2014. In Press. 
 
Copyright: 
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all 
other uses, in any current or future media, including reprinting/republishing this material for advertising 
or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or 
reuse of any copyrighted component of this work in other works.  
  
http://dx.doi.org/10.1109/TVLSI.2014.2306176 
Further information on publisher website: http://www.tandfonline.com/ 
Date deposited:  19th February 2014 
Version of article:  Author final 
 
 
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License 
 ePrints – Newcastle University ePrints 
http://eprint.ncl.ac.uk 
 
IEEE-TVLSI-00063-2013, VOL. X, NO. X, FEB. 2014 1
Design of Self-Timed Reconfigurable Controllers
for Parallel Synchronization via Wagging
James S. Guido, Graduate Student Member, IEEE, and Alexandre Yakovlev, Senior Member, IEEE
Abstract—Synchronization is an important issue in modern
system design as Systems-on-Chip (SoCs) integrate more diverse
technologies, operating voltages, and clock frequencies on a single
substrate. This work presents a methodology for the design
and implementation of a self-timed reconfigurable control device
suitable for a parallel cascaded flip-flop synchronizer based on
a principle known as wagging, through the application of dis-
tributed feedback graphs. By modifying the endpoint adjacency
of a common behavior graph via one-hot codes, several config-
urable modes can be implemented in a single design specification,
thereby facilitating direct control over the synchronization time
and the mean-time between failures (MTBF) of the parallel
master-slave latches in the synchronizer. As a consequence, the
resulting implementation is resistant to process non-idealities
which are present in physical design layouts.
This study includes a discussion of the reconfiguration proto-
col, and implementations of both a sequential token ring control
device, and an interrupt subsystem necessary for reconfiguration,
all simulated in UMC 90 nm technology. The interrupt subsystem
demonstrates operating frequencies between 505 and 818 MHz
per module, with average power consumptions between 70.7 and
90.0 µW in the typical-typical case under a corner analysis.
Index Terms—Asynchronous, Combinational, Controllers, Dig-
ital Circuit Design, Reconfigurable, Self-timed, Sequential, Syn-
chronization
I. INTRODUCTION
As technology trends lead modern systems-on-chip (SoCs) to incorporate designs of increasing complexity,
the reliable transmission of data items across a chip remains
an issue of paramount concern. It becomes difficult to design
a single global clock that is capable of regulating data transac-
tions throughout the system, due to the large number of design
constraints required to guarantee reliable operation [1]. In the
presence of multiple (possibly dynamic) voltage and frequency
operating points, the single clock paradigm becomes even
less tractable as a design solution. It is simpler to construct
a SoC with several different voltage/frequency islands (each
with their own local clock) and then synchronize the data items
between regions. This principle forms the basis of the globally
asynchronous locally synchronous (GALS) signaling paradigm.
GALS requires the presence of an asynchronous wrapper in
order to reliably pass data between two clock regions, which
must then be synchronized at each end of the transfer [2].
Manuscript received Jan 25, 2013; revised Feb 11, 2014.
J. S. Guido is with the Department of Electrical and Electronic En-
gineering, University of Newcastle, Newcastle NE1 7RU, U.K. (e-mail:
james.guido@ncl.ac.uk)
A. V. Yakovlev is with the Department of Electrical and Electronic
Engineering, University of Newcastle, Newcastle NE1 7RU, U.K. (e-mail:
alex.yakovlev@ncl.ac.uk)
This work was supported by the EPSRC grant for Globally Asynchronous
Elastic Logic Synthesis (GAELS) - EP/I038551/1.
Another related concept, known as a network-on-chip (NoC),
accomplishes a similar purpose by transmitting data items
as packets along the wires of a homogeneous interconnection
network [3]. In a NoC, synchronization is required at the
endpoints of the network interface [4]. In both cases, the
design constraints of the clock distribution network in SoCs
which incorporate either GALS or NoCs are easier to satisfy,
though a full discussion of either is beyond the scope of this
paper. Because synchronization is required at some juncture in
SoCs incorporating either GALS or a NoC, it remains a
relevant issue in the design of modern digital systems.
The synchronizer circuits present in the network interface
of a NoC serve as a useful example for the purposes of
this introduction. Several approaches exist for constructing
synchronizer circuits suitable for such an interface, each
with benefits and tradeoffs. The relevant synchronizer circuit
designs of interest are based on cascaded master-slave flip-
flops, with the key points of difference being whether or not
the circuits incorporate a first-in first-out (FIFO) buffer, and/or
a (possibly variable) number of parallel master-slave flip-flops
[5][6][7].
In a basic cascaded flip-flop synchronizer (incorporating
neither a FIFO nor parallelism), synchronization is performed
using two or more flip-flops in a master-slave configuration at
each end of a transmitter/receiver (Tx/Rx) device pair. When
the master device is transparent, the slave is opaque, and vice
versa. Voltages in the master latch may be indeterminate when
the latch becomes opaque and the signal is sampled; however,
the value must be resolved to a clear logical high or logical
low value by the time the slave device becomes opaque and
the master latch once again becomes transparent. Unfortunately, the
throughput of data is governed by the roundtrip delay through
the transmitter and receiver ends of the synchronizer, which
increases with the number of serial flip-flop elements in the
chain [5]. Fortunately, incorporating a FIFO buffer into the
Tx/Rx pair decouples the reading and writing operations from
each other, allowing both operations to be done as soon as
the system is ready [5]. The read operation can take place as
long as there is valid data stored in the FIFO, while the write
operation can be performed as long as the FIFO has space
available in the buffer. However, the FIFO will incur penalties
due to overflow/underflow if the sender and receiver clocks
are not well matched [6].
To further exacerbate the problem, instances of synchroniza-
tion failures (i.e. when the voltages in the master-slave flip-
flops of a synchronizer fail to resolve within an allotted time)
are increased in modern technologies due to both increased
data rates, and reduced voltages [8]. While many solutions
exist to resolve the issues of data synchronization, the solutions
which employ parallelism (i.e. using many components to
perform one specific task) remain of particular interest to this
work. Employing parallelism in the context of a synchronizer
circuit has two useful properties:
1) With appropriate scheduling of parallel tasks, mis-
matches between the transmitter and receiver ends of
a synchronizer can be minimized.
2) The mean-time between synchronization failures
(MTBF) can be manipulated by adjusting the degree of
parallelism employed.
These two properties have appeared in other bodies of litera-
ture in one form or another. Of specific interest is prior work
by C.H. van Berkel, on handshaking circuits where he defined
a concept known as wagging (i.e. employing parallelism, in
tandem with the scheduling of tasks via time division) [9].
Ebergen also experimented with the scheduling of tasks in
parallel compositions of finite state machines in his own work
[10]. More recently, Brej used the concept of wagging to
compose a system of parallel logic wherein he used the phrase
wagging level to denote the number of data copies in his
system. Each copy was then given a unique slice and slice
number in order to schedule its tasks [11].
On the topic of parallel synchronizer circuits, Jex and Dike
put forth the idea of using parallel interconnects to improve the
throughput and metastability characteristics of a synchronizer
circuit [7]. Alshaikh et al. also presented a synchronizer design
which employed wagging [12]. However, prior literature on
parallel synchronizer circuits remains deficient on two key
points:
1) Prior controller designs have been static in nature. The
manipulation of the MTBF in parallel synchronizer
circuits via the process of reconfiguration has remained
unexplored.
2) The construction of a sequential token ring controller
suitable for such a task has also remained obscure.
To that end, this paper presents a brief overview of a parallel
synchronizer design based on the principles of wagging, and
then uses that framework in order to specify a reconfigurable
controller based on sequential logic and embedded graphs
which is suitable for the manipulation of the same. The
work is organized as follows. Section II presents an overview
of the wagging principle as well as its applications to the
design of synchronizers, along with the top-level architecture
of the controller design. Section III presents the theoretical
underpinnings of a reconfigurable control device based on
token shifting and targeted for use in parallel synchronizers.
Section IV compares the benefits and tradeoffs of various
token ring designs and presents some experimental results.
Section V gives similar treatment to the interrupt subsystem.
Finally, conclusions are drawn in Section VI concerning
future research directions.
II. WAGGING OVERVIEW
A design incorporating wagging will always contain two
properties, regardless of whether or not the design is controlled
via synchronous control signals or asynchronous communication
signals called handshakes:
1) Usage of parallel components to share the workload of
a task.
2) Scheduling of said tasks via time division.
Fig. 1. Annotated diagram of a 2-way wagging buffer.
To understand the concept of wagging we must look to prior
literature on the subject. In 1992, C.H. Van Berkel presented
work on asynchronous handshaking circuits, including work
on a two-way buffer which he referred to as a wagging buffer,
reproduced in Fig. 1 [9]. Active ports in Fig. 1 are indicated
by black dots, while passive ports are indicated by white dots.
The data flow aspect of the buffer is comprised of mixers
(|), transferrers (T ), and variables x, and y which function
as memory. Transferrers are components which pass values
through their active ports when triggered along their passive
ports, while mixers are components which pass handshakes
from their passive ports to their active ports, and can either
act as demultiplexers (DEMUX) or multiplexers (MUX). The
operation of the circuit is as follows.
1) x← a @t1 (x is written with the value of a at time t1).
2) c← x (i.e. a @t1 is passed to the output of the buffer),
and y ← a @t2 (y gets the value of a at time t2).
3) c← y (i.e. a @t2), and x← a @t3
Step 1 only occurs at the startup of the circuit, in order to
place valid data on x prior to its read out during the next step.
Thereafter, only steps 2 and 3 are executed. Thus, even though
x and y in Fig. 1 are placed in parallel, functionally they act as
if they were placed in series due to the scheduling of tasks[9].
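The three steps above can be sketched in software as follows. This is a behavioral illustration only, not a circuit description; the function name `wagging_buffer` and the use of a Python list for the input stream are assumptions made for the example.

```python
def wagging_buffer(stream):
    """Behavioral sketch of a 2-way wagging buffer (hypothetical helper).

    x and y are the two storage variables. After the start-up write to x,
    each step reads one variable out to c while the other is written.
    """
    it = iter(stream)
    outputs = []
    try:
        x = next(it)          # step 1: x <- a @ t1 (start-up only)
    except StopIteration:
        return outputs        # nothing was ever written
    y = None
    read_x = True
    for a in it:
        if read_x:
            outputs.append(x) # step 2: c <- x, y <- a
            y = a
        else:
            outputs.append(y) # step 3: c <- y, x <- a
            x = a
        read_x = not read_x
    return outputs


if __name__ == "__main__":
    print(wagging_buffer([10, 20, 30, 40]))
```

The outputs appear in the same order as the inputs, which is the sense in which the two parallel variables behave as if they were placed in series. The last value written remains stored and is not yet read out.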
A. Applications to Synchronizers
Let us take a look at a FIFO synchronizer which incor-
porates the principles of the prior subsection, as in Fig. 2.
Central to this design are two major points. First, is the
maximization of the synchronization performance from the
input data channel to the input of the FIFO by pooling the
gain-bandwidth products of the parallel master-slave flip-flops,
as illustrated in prior work by Horstmann [13]. Second is
the manipulation of the data flow mismatch between the
transmitter (Tx) and receiver (Rx) ends of the synchronizer.
As the focus of this work centers on the construction of a
control device used to schedule the tasks in the synchronizer
from the input data channel to the input of the FIFO, only
the first point is relevant in this work. A detailed study and
analysis of the second point is left for future work, though
Fig. 2 provides a general overview of the entire system.
Fig. 2. Top view of a FIFO synchronizer incorporating wagging at the transmitter and receiver. (Figure annotations: write clock fclk(Tx) = j(f0); read clock fclk(Rx) = i(f0); CTR(j) (Tx) = fclk(Tx)/j = f0; CTR(i) (Rx) = f0; assumed j > i and j < 2i.)
First, let us assume the FIFO is asynchronous, and that reads
and writes occur independently of each other, where fclk(Tx)
and fclk(Rx) represent the local data rates of the transmitter
and receiver regions of the synchronizer, respectively. Let us
further assume that fclk(Tx) = j × f0 and fclk(Rx) = i × f0 are
multiples of a common base frequency f0, and that i < j < 2i
(i.e. the transmitter is faster than the receiver, but not by more
than a factor of 2).
With those assumptions in hand we can step through the
operation of Fig. 2 as follows. The serial input data and
write validation signals (DATA, WRITE) from the transmitter
arrive at the splitter module at a rate of j × f0 where they
are both split into j identical signals using mixers. These
signals are then broken into j tasks (slices), through sampling
via the use of j parallel master-slave flip-flops which act as
synchronization (SYNC) elements. Thereafter, these SYNC
elements are triggered using control signals which all operate
at a base frequency of f0, but are offset from each other by
j divisions as shown in Fig. 3 (though only DATA is shown).
Finally, the DATA and WRITE signals arrive at the FIFO input
in j parallel lines operating at rate of f0. Subsection B will
illustrate the impact of this process on the synchronizer MTBF.
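The slicing step described above can be illustrated with a short behavioral sketch. It assumes that slices are assigned to the j SYNC elements in round-robin order and that the CTR signals are spaced one j-th of the base period apart; both helper names (`slice_stream`, `ctr_offsets`) are hypothetical and do not come from the paper.

```python
def slice_stream(samples, j):
    """Distribute a serial stream arriving at j*f0 over j lanes running at f0."""
    if j < 1:
        raise ValueError("j must be a positive integer")
    lanes = [[] for _ in range(j)]
    for n, sample in enumerate(samples):
        lanes[n % j].append(sample)   # lane index plays the role of the slice number
    return lanes


def ctr_offsets(j, f0):
    """Assumed time offset of each CTR signal within one base period 1/f0."""
    period = 1.0 / f0
    return [k * period / j for k in range(j)]


if __name__ == "__main__":
    print(slice_stream(list(range(6)), 3))
    print(ctr_offsets(3, 1.0e9))
```

Each lane sees only every j-th sample, so it operates at the base frequency f0 even though the serial input runs at j × f0.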
Continuing forward, we can now examine the operation of
the synchronizer from the output of the FIFO to the input of
the receiver. Because i ≠ j, and i < j < 2i, the FIFO is
still subject to data accumulation. In order to minimize such
accumulation, the receiver needs to be designed to allow for
the data being read out from the FIFO to temporarily exceed
the amount of data being written in when certain conditions
are met. This is accomplished by allowing the read operations
to be done either serially, or in parallel.
Both read operations can use the same hardware. During
a serial read, i parallel data lines from the FIFO are sent to
a mixer that recombines them along dataprime at a rate of
i × f0, in a manner which is identical to the recombination
of the signals in the wagging buffer discussed earlier. A
parallel read operation functions similarly, except that data is
simultaneously read out along both dataprime and datasec.
Whether or not a serial or parallel data read is necessary
depends on the present memory differential between the trans-
mitter and receiver ends of the FIFO synchronizer, (j − i)cur
as defined by (1).
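\[
\begin{aligned}
&\text{If } (j-i) + \sum (j-i)_{cur} < i, \text{ then increase } \sum (j-i)_{cur} \text{ by } (j-i) && \text{(serial read)} \\
&\text{If } (j-i) + \sum (j-i)_{cur} \ge i, \text{ then decrease } \sum (j-i)_{cur} \text{ by } i-(j-i) && \text{(parallel read)}
\end{aligned}
\tag{1}
\]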
When the conditions for a parallel read are met, the system
continues parallel read operations until the memory differential
reaches 0, whereafter it resumes serial operation. It should
also be noted that the read acknowledgement signal (READ) is
processed in vectors of length i or 2i depending on whether
a serial or parallel read operation was last performed.
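A minimal sketch of the read rule in (1) is given below. It applies the two conditions once per read and returns the updated memory differential together with the selected mode. It does not model the behavior of staying in parallel mode until the differential reaches 0, and the function name `read_step` is an assumption made for illustration.

```python
def read_step(i, j, diff):
    """Apply rule (1) once.

    i, j : receiver and transmitter multiples of f0 (requires i < j < 2i)
    diff : accumulated memory differential before this read
    Returns (new_diff, mode) where mode is "serial" or "parallel".
    """
    if not (i < j < 2 * i):
        raise ValueError("rule (1) assumes i < j < 2i")
    surplus = j - i                       # data written beyond what a serial read drains
    if surplus + diff < i:
        return diff + surplus, "serial"   # differential grows by (j - i)
    return diff - (i - surplus), "parallel"  # differential shrinks by i - (j - i)


if __name__ == "__main__":
    diff = 0
    for _ in range(6):
        diff, mode = read_step(4, 6, diff)
        print(mode, diff)
```

With i = 4 and j = 6, the differential alternates between 2 after a serial read and 0 after a parallel read, so the accumulation in the FIFO stays bounded.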
However, it is worth re-iterating that the focus of this work
is on the design and implementation of the reconfigurable
control device suitable for this architecture, and that the
material above merely forms crucial context.
B. Impacts on Synchronizer MTBF
In general, the MTBF of a synchronizer is characterized in
terms of 3 parameters, τ , Tw, and ∆tin. ∆tin is defined as the
region of vulnerability where concurrent transitions between
the clock and data signals will lead to longer than normal
resolution times in the system, possibly causing failure if they
occur sufficiently close together. In the case of a master-slave
flip-flop synchronizer, this region is defined by (2), where Tw
is a circuit parameter known as the metastability window, τ
is the resolution time constant, and t is the synchronization
time allotted for recovery from metastability (also referred to
in this paper as tMSR) [5].
\[ \Delta t_{in} = T_w e^{-t/(2\tau)} \tag{2} \]
Fig. 3. Overview of Wagging Synchronization. (a) Data flow from the transmitter end of a wagging synchronizer to the input of the FIFO in Fig. 2 with j=3 and a 50% duty cycle. (b) Top view of a reconfigurable controller suitable for a wagging synchronizer.
The presence of metastable behavior may lead to a failure
in the synchronizer depending on the length of time the master
latch remains unresolved. As a linear increase in τ results in
an exponential decrease in the MTBF of a cascaded flip-flop
synchronizer, maintaining or improving the value of τ (i.e.
causing it to become smaller) remains an important design
concern. The MTBF of a master-slave flip-flop synchronizer
is defined in (3), where fc and fd are the rates of the clock
and data signals, respectively [5].
\[ \mathrm{MTBF} = \frac{e^{t/\tau}}{f_d f_c T_w} \cdot \frac{\tau}{T_w} \tag{3} \]
From the information above, we can modify the failure
equation for a master-slave flip-flop synchronizer shown in
(3) to account for the effects of increased parallelism. Par-
allelism functions to linearly increase the value of t (i.e.
the synchronization time) by splitting the synchronization
“workload” across j devices. Thus, j affects the numerator in
the exponential portion of (3) as shown in (4), and illustrated
by Fig. 3(a), where tsample is the sampling time of the master
latch in the master-slave pair.
\[ \mathrm{MTBF}_{(parallel)} = \frac{e^{jt/\tau}}{f_d f_c T_w} \cdot \frac{\tau}{T_w} \tag{4} \]
A linear increase in the synchronization time results in
an exponential improvement in the MTBF of each paral-
lel master-slave flip-flop in the synchronizer. Consequently,
these components are less likely to suffer a synchronization
failure. However, the denominator in the exponential portion
of (4) (i.e. τ) increases exponentially with a linear decrease
in voltage [8]. Thus, while the impact of parallelism on the
exponential portion of (4) is linear, the impact of voltage on
the exponential portion of (4) is exponential (at low voltages).
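The effect of j on the MTBF can be checked numerically with the sketch below. It assumes that (3) and (4) have the form e^{jt/τ}/(fd fc Tw) multiplied by τ/Tw, with j = 1 recovering (3). The function name `mtbf` and all numerical values are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
import math


def mtbf(t, tau, t_w, f_c, f_d, j=1):
    """MTBF of one master-slave flip-flop under the assumed form of (3)/(4).

    t   : synchronization time (s)      tau : resolution time constant (s)
    t_w : metastability window (s)      f_c, f_d : clock and data rates (Hz)
    j   : degree of parallelism (j = 1 gives the non-parallel case)
    """
    return math.exp(j * t / tau) / (f_d * f_c * t_w) * (tau / t_w)


if __name__ == "__main__":
    # Illustrative values only.
    args = dict(t=1e-9, tau=20e-12, t_w=20e-12, f_c=1e9, f_d=1e8)
    base = mtbf(**args)
    wagged = mtbf(j=2, **args)
    print(f"improvement factor for j=2: {wagged / base:.3e}")
```

Doubling j multiplies the MTBF by e^{t/τ}, which is the exponential improvement for a linear increase in synchronization time described above.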
C. Top-Level Control Circuit Architecture
The control circuit is divided into two parts, one being a
token ring composed of several embedded cycles which selects
its cycle length based on a one-hot control code (RDATA)
generated by the interrupt subsystem. The ring determines
the number of parallel master-slave flip-flops present in the
synchronizer, and outputs the delayed clock/enable (CTR)
signals to each of the flip-flops in the synchronizer, thereby
partitioning the input data into slices [11]. The second part is
an interrupt subsystem responsible for halting the operation
of the token ring (and by proxy the synchronizer) while
the system undergoes reconfiguration, ensuring the functional
correctness of the synchronizer by preventing loss of the
control token.
It should be noted that the impetus for reconfiguration needs
to be specified at a higher level of abstraction. Generally,
this requires some a priori knowledge about the system itself
(i.e. operating voltages, temperature, relative phase/frequency
relationships between two clock domains). Regardless, once
the need for reconfiguration has been identified, the system
should issue a reconfiguration request (RQ) to the interrupt
subsystem of the controller bundled with the relevant one-hot
control data (UDATA) generated external to the controller. The
request is then fed to a mutual exclusion element (MUTEX)
that determines whether or not the token ring can be safely
halted, issuing a grant when successful. Once the request has
been granted, reconfiguration proceeds. The subsystem then
alters the one-hot code being sent to the embedded token
ring, issuing an acknowledgement (ACK) signal when the
process is complete, whereupon the MUTEX is released and
the synchronizer resumes its operation. Due to the bundled
data assumption (i.e. the causality of asynchronous signals can
be enforced at the physical level during circuit layout), the RQ
and UDATA signals are prevented from becoming metastable
due to a hazard (i.e. a race between two asynchronous signals)
[14].
Fig. 4. Behavior graphs of a token ring: (a) with a single cycle of 8 vertices; (b) with several cycles and a maximum cycle length of 8 vertices; (c) with several cycles, a maximum adjacency equal to 4, and a maximum cycle length of 8 vertices.
Fig. 5. Distributed Feedback Algorithm Overview: (a) basic linked list figure; (b) annotated diagram outlining the methodology for creating abstract distributed feedback loops. (Figure annotations: L = 8; Y = 7; C = Y+(L/2)-1; M = (C+L/2)/2; even jump target (L-1) - k, shown at k = 7; odd jump target (C+L/2) - k, shown at k = 10.)
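The request, grant, update, and acknowledge sequence described above can be summarized in a small behavioral sketch. It is a sequential software model under simplifying assumptions: the MUTEX decision is reduced to a boolean flag, and the class and attribute names (`InterruptSubsystem`, `ring_can_halt`) are hypothetical rather than taken from the implementation.

```python
class InterruptSubsystem:
    """Behavioral sketch of the reconfiguration handshake (not a circuit model)."""

    def __init__(self, rdata):
        self.rdata = rdata          # one-hot code currently sent to the token ring
        self.ring_can_halt = True   # stand-in for the MUTEX decision (assumption)

    def reconfigure(self, udata):
        """Handle one RQ bundled with UDATA; return True on ACK, False if denied."""
        if udata <= 0 or udata & (udata - 1):
            raise ValueError("UDATA must be a one-hot code")
        if not self.ring_can_halt:
            return False            # RQ denied, RDATA left unchanged
        self.rdata = udata          # ring is halted while the code is replaced
        return True                 # ACK issued, MUTEX released, ring resumes


if __name__ == "__main__":
    ctrl = InterruptSubsystem(rdata=0b0001)
    print(ctrl.reconfigure(0b0100), bin(ctrl.rdata))
```

The one-hot check mirrors the requirement that exactly one configurable mode is selected at a time.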
III. DESIGN METHODOLOGY
With the prior discussion in hand, we can now move on to
the theoretical underpinnings of the control circuit. Concepts
such as the topology and cyclic behavior of token rings and
linked lists serve as useful stepping stones in understanding
how the final circuit behaves and will be discussed in
subsections A and B, respectively. Subsection C will relate the cycle
length of the token to the synchronization time allotted for
recovery from metastability. Finally, subsection D will discuss
the reconfiguration process in detail.
A. Topology of Token Rings
Explaining the functionality of the reconfigurable control
device explored in this paper requires understanding a few
key concepts. The first of these is that of a behavior graph.
Such a graph is composed of vertices and edges. Vertices are
defined as the corner points of the behavior graph which are
formed by the intersection of edges. Edges refer to the set of
unordered pairs (i.e. lines) which link together the vertices
within the graph. If the edges are ordered (i.e. directed) they
tend to be referred to as arcs.
Fig. 4(a) illustrates a token ring comprised of eight vertices,
each containing two arcs (one input and one output). The
connectivity and directionality of the system is defined by the
adjacency of the vertices. In Fig. 4(a), vertices 1 and 2 are
adjacent since the token travels from 1 to 2 as it proceeds
around the loop. The path from vertex 1 through vertex 8 and
back to vertex 1 constitutes a cycle. However, the token ring in
Fig. 4(a) contains only a single cycle. If a single token ring contains
more than one cycle, the cycles are said to be embedded within
the token ring. Fig. 4(b) shows the same token ring as Fig. 4(a),
but in addition to the cycle of length 8, it contains embedded
cycles of length 3, 4, 5, 6, and 7 (see Table I). Multiple cycles are
necessary to implement reconfiguration. However, the token ring
of Fig. 4(b) is still inadequate because its connections are
not distributed.
Embedded token rings are of primary interest in this
work, as they constitute a useful abstraction for defining the
operating modes of a reconfigurable control device. Each
embedded cycle corresponds to one configurable mode in the
underlying control device. However, while this is sufficient
to construct a specification that covers the set of all possible
states, ensuring that differing cycles within the token ring
remain “reachable” from each other requires the integration
of an interrupt subsystem into the controller which halts the
operation of the device during switching to guarantee that the
token is preserved.
B. Cyclic Behavior of Token Rings
Using the definitions above, Fig. 4(b) contains several cycles
where the average adjacency over all vertices is equal to 3.
Unfortunately, the topology of this token ring also contains a
single vertex which accumulates a number of edges propor-
tional to the number of embedded cycles in the specification.
Though this vertex only contains 7 connections in Fig. 4(b),
such clustering is an acute problem, as the edges in this
example are a literal representation of the wiring connections
in the controller. A systematic increase in the number of
edges (wires) about a single vertex (device) can lead to a
system failure as the number of configurations increases.
The situation is analogous to a wire becoming increasingly
capacitive in proportion to the number of connections added
to it, as supported in prior work by Sutherland [15]. In order
to ameliorate this problem, the edges must be distributed.
An embedded graph utilizing distributed feedback loops can
be modeled by taking a basic linked list of L objects, as
shown in Fig. 5(a), and adding a few parameters. The objects
are data structures which contain the following 3 pieces of
information: an integer index, k, used to order the entries of
the list; a pointer, tailnormal, containing the initial path to the
next object in the list; and a pointer, tailjump, containing the
secondary path to the next object in the list.
Initially, the objects in the list from 0 to L-1 are created
and linked using the tailnormal pointers. When the final entry
is created, the tailnormal of this entry will be given a pointer
to the head of the list, as in Fig. 5(a), where L = 8. Using this
linked list as a basis, a new list can be constructed which
models the embedded token rings used in this work. The
embedded list has the following noteworthy properties.
1) The distributed list (ring) is divided into sets of odd and
even cycles, as depicted by the dotted and solid feedback
arcs in Fig. 5(b), while the normal one, as in Fig. 4(b), is not.
2) The odd and even cycles in the distributed list (ring)
each have their own sets of shared vertices.
3) The set of objects which are common to all cycles in
the distributed list (ring) is smaller than in the normal
one (possibly zero).
Next, the tailjump pointers for the even cycles are statically
assigned, as shown in Fig. 5(b). We assume that the list starts
at object 0, that L is even, and that L > Y, where Y is the
cycle length of the largest valid configurable mode whose parity
differs from that of L. As object (L/2) is located in the middle of
the graph and the jump target is located a symmetric distance
away on the other side of the midpoint (between 0 and L-1),
the target index, T(even), at any given juncture can be
calculated as T(even) = (L-1) - k, where k represents the current
position of the list pointer.
Afterward, the remaining objects between L and C are
created and linked together via their tailnormal pointers. As M
is located in the middle of the graph of the odd cycles, and the
jump target is located a symmetric distance away on the other
side of the midpoint (between L/2 and C), the target index,
T(odd), at any given moment can be calculated as T(odd) = (C
+ L/2) - k. Distributing the edges within the linked
list (or token ring), while preserving the number of embedded
cycles, incurs an overhead of (Y+L/2)-L additional
vertices.
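The jump-target formulas above can be exercised with the following sketch, which uses the 0-based object indices of Fig. 5 and the annotations C = Y+(L/2)-1 and M = (C+L/2)/2. The function name `distributed_cycles` is hypothetical, and dropping cycles shorter than 3 vertices is an assumption based on the smallest mode listed in the text.

```python
def distributed_cycles(L, Y, min_length=3):
    """Enumerate embedded cycles created by the distributed tailjump pointers.

    Returns (cycles, overhead) where cycles maps cycle length to the pair
    (jump target index, jump source index) and overhead is (Y + L/2) - L.
    """
    if L % 2 or Y % 2 == 0 or not L > Y:
        raise ValueError("expects even L, odd Y, and L > Y")
    half = L // 2
    C = Y + half - 1            # index of the last object in the extended list
    M = (C + half) // 2         # midpoint of the odd cycles
    cycles = {}
    for k in range(half, L):    # even cycles: T(even) = (L-1) - k
        target = (L - 1) - k
        cycles[k - target + 1] = (target, k)
    for k in range(M, C + 1):   # odd cycles: T(odd) = (C + L/2) - k
        target = (C + half) - k
        cycles[k - target + 1] = (target, k)
    valid = {n: span for n, span in cycles.items() if n >= min_length}
    overhead = (Y + half) - L
    return valid, overhead


if __name__ == "__main__":
    modes, extra = distributed_cycles(L=8, Y=7)
    for length in sorted(modes):
        print(length, modes[length])
    print("additional vertices:", extra)
```

For L = 8 and Y = 7 the sketch yields cycle lengths 3 through 8 with an overhead of 3 additional objects, and the even and odd jump sources fall on different objects rather than clustering on one.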
Fig. 4(c) represents a token ring which is identical in behav-
ior to the token ring of Fig. 4(b), but both the total number of
vertices and the cycle lists of the two figures are different as
characterized in Table I. Additionally, it can be inferred that
the odd and even cycles will become mathematically disjoint
from each other as the length of the token ring increases. This
will have implications on the reconfiguration protocol which
will be discussed later in subsection D.
C. Variable Control Parameter (Synchronization Time)
TABLE I
DIFFERENCES IN THE CYCLE LISTS OF A MULTI-CYCLE TOKEN RING WITH DISTRIBUTED EDGES VS. AN UNDISTRIBUTED RING (MAXIMUM CYCLE LENGTH = 8 VERTICES)
                 | Normal                       | Distributed
Cycles           | 1, 2, 3 → 3                  | Odd cycles:  6, 7, 8 → 3
                 | 1, 2, 3, 4 → 4               |              5, 6, 7, 8, 9 → 5
                 | 1, 2, 3, 4, 5 → 5            |              4, 5, 6, 7, 8, 9, 10 → 7
                 | 1, 2, 3, 4, 5, 6 → 6         | Even cycles: 3, 4, 5, 6 → 4
                 | 1, 2, 3, 4, 5, 6, 7 → 7      |              2, 3, 4, 5, 6, 7 → 6
                 | 1, 2, 3, 4, 5, 6, 7, 8 → 8   |              1, 2, 3, 4, 5, 6, 7, 8 → 8
Shared vertices  | 1, 2, 3                      | 6
We may now explore the links between the number of
vertices in a given cycle and its effect on the synchronization
time of the system. Synchronization time, tMSR, as it pertains
to a master-slave latch configuration, is defined as the
maximum time interval over which the master latch may remain
unresolved before metastability propagates to the slave latch.
It is defined by the duty cycle as in (5):
tMSR = tcycle − tsample − tsetup(slave) (5)
where tcycle is the cycle time of the system, tsample is
the sampling time of the system, and where tsetup(slave) is
the setup time of the slave latch in a master-slave flip-flop
synchronizer. The setup time constraint is included to account
for errors in the simple failure models of master-slave flip-
flop synchronizer equations presented earlier, due to what is
referred to in other literature as the back edge effect. The back
edge effect refers to the phenomenon where the MTBF curve
changes (is displaced) when the master latch resolves from
metastability near the ∆tin of the slave, which is beyond the
scope of this paper but has been previously documented [5].
Varying the number of vertices in the system modifies the
cycle time, tcycle(j), as in (6):
tcycle(j) = ncycle(j) ∗ td(vertex) (6)
in which ncycle(j) is the cycle length of a valid configurable
mode, as previously discussed, where 0 ≤ j < L, and
td(vertex) is the average delay across vertices in the underlying
physical implementation. The index value j in the terms ncycle
and tcycle denotes that the number of vertices present within
the system is variable, and L is defined as the index value (i.e.
cardinality) of the valid configurable mode which contains the
highest number of vertices. Thus, the number of vertices in
a given configuration affects the synchronization time of the
final control circuit via a direct link.
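Equations (5) and (6) combine into a simple calculation, sketched below. The function names and the numerical values (given in picoseconds) are hypothetical and serve only to show how the cycle length feeds into the synchronization time.

```python
def t_cycle(n_cycle, td_vertex):
    """Cycle time from (6): number of vertices times the average vertex delay."""
    return n_cycle * td_vertex


def t_msr(n_cycle, td_vertex, t_sample, t_setup_slave):
    """Synchronization time from (5), using the cycle time from (6)."""
    return t_cycle(n_cycle, td_vertex) - t_sample - t_setup_slave


if __name__ == "__main__":
    # Illustrative values in picoseconds, assuming a 50% duty cycle.
    for n in (4, 8):
        cycle = t_cycle(n, 100)
        print(n, t_msr(n, 100, t_sample=cycle // 2, t_setup_slave=20))
```

Selecting a mode with more vertices lengthens the cycle and therefore the time available for the master latch to resolve.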
D. Reconfiguration Protocol
We can now move on to an examination of the protocols governing
the reconfiguration process. The process is contingent
on the ability of the circuit to direct the flow of a control token
as it passes between the graphs of two separate configurable
modes, and to ensure that the operation occurs without failure.
Fig. 6 depicts the graph of an 8-way reconfigurable controller
with several cycles, as in Fig. 4(c). The vertices of the graph
are represented by the control (CTR) signals, while the token
request (RQ) and token acknowledge (ACK) signals represent
the edges.

Fig. 6. Behavior graph of an 8-way reconfigurable controller with several
cycles. Vertices: CTR0 to CTR9. Edges: RQ/ACK0 to RQ/ACK8. Feedback paths:
(A) RQ/ACK5, (B) RQ/ACK8, (C) RQ/ACK6, (D) RQ/ACK9, (E) RQ/ACK7. Invalid
path: (X) RQ/ACK7. Interrupt devices: odd tail node, even tail node, odd
head node. Labeled cycles: cycle(max)(odd) (feedback), cycle(min)(even),
cycle(min)(odd).

The RQ signals of the token ring move clockwise
around the graph, while the ACK signals move in the opposite
direction. The paths denoted A, B, C, D, and E represent
the valid feedback paths in the system, while X denotes an
invalid path. The reasons for this will be discussed in Section
IV-B. Unfortunately, the exact position of the control token
is unknown at the time of reconfiguration. If the system
reconfigures when the token is located in a region of the
present configuration which is not covered by the graph of
the subsequent configuration, then the token will be lost.
Arbitration is therefore necessary to ensure that the token is
passed between configurations without incident. This is similar
to prior work used to arbitrate between configurations in a
system with pausible clocks [16]. It also shares commonalities
with prior work on lazy-ring arbiters [17]. However, while
a lazy-ring arbiter stalls the system and sends the token
backward to the point of an initial request signal, the proposed
design stalls the control circuit and waits for the token to
traverse forward through the ring of vertices until it arrives at
the request point. In short, the control token can only flow in
one direction. Stalling the control device requires the use of a
MUTEX element, as shown in Fig. 7(a). A MUTEX has the
property that if access is granted to a single request, then all
other requests are disabled until the operation is completed
and the resource is released. Therefore, if such a device is
inserted into the token ring at a location which is common
to both the old configuration and the new one, then the token
can be stalled until the reconfiguration operation has finished
and then released along the new configuration without error.
The protocol itself functions as follows, denoted by the dotted
portion of Fig. 7(b):
1) The reconfiguration request arrives at R1 (R1 goes high),
and the MUTEX resource is reserved (G1 goes high).
2) The control token arrives at the MUTEX element along
R2, and is halted from continuing.
3) System reconfiguration is performed (RC_EN1 goes
high).
4) An acknowledge signal is generated by the interrupt
hardware* (RC_ACK1 goes high).
5) The reconfiguration acknowledgement arrives, the MUTEX
resource is released (R1 goes low), and the token is
allowed to continue along G2 (all other signals go
low thereafter).
* If necessary, intermediate reconfiguration is used to pass
the token to the correct configuration of interest.
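The five steps above can be traced with a small behavioral model. The sketch below is illustrative only: the `Mutex` class and the `reconfigure` function are hypothetical names, the model is purely logical (it ignores the metastability resolution that a physical MUTEX performs when both requests arrive together), and it assumes a single interrupt module with signals R1, R2, G1, and G2 as in the protocol.

```python
class Mutex:
    """Two-request mutual exclusion element: at most one grant is high."""

    def __init__(self):
        self.r1 = False  # reconfiguration request
        self.r2 = False  # token request
        self.g1 = False  # grant for the reconfiguration side
        self.g2 = False  # grant for the token side

    def settle(self):
        # A grant falls as soon as its request is withdrawn.
        if not self.r1:
            self.g1 = False
        if not self.r2:
            self.g2 = False
        # A pending request is granted only if the other side is not granted.
        if self.r1 and not self.g2:
            self.g1 = True
        elif self.r2 and not self.g1:
            self.g2 = True


def reconfigure(mutex, apply_new_config):
    """Run steps 1-5 and return a trace of (event, G1, G2) tuples."""
    trace = []
    mutex.r1 = True                      # 1) request arrives, G1 is reserved
    mutex.settle()
    trace.append(("R1+", mutex.g1, mutex.g2))
    mutex.r2 = True                      # 2) token arrives on R2 and is held
    mutex.settle()
    trace.append(("R2+", mutex.g1, mutex.g2))
    apply_new_config()                   # 3) RC_EN1: configuration is updated
    trace.append(("RC_ACK1+", mutex.g1, mutex.g2))  # 4) acknowledge generated
    mutex.r1 = False                     # 5) R1 released, token leaves on G2
    mutex.settle()
    trace.append(("R1-", mutex.g1, mutex.g2))
    return trace


config = {"mode": "A"}
trace = reconfigure(Mutex(), lambda: config.update(mode="C"))
```

In the resulting trace G2 stays low until R1 is released, which is the property that keeps the token stalled while the configuration changes.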
Intermediate reconfiguration is not required if the protocol
is implemented across systems which contain either an exclu-
sively odd or an exclusively even number of vertices, as defined
by the embedded behavior graph of the token ring (similar to
the one shown in Fig. 4(c) and Fig. 6). Assume cycle(g)(even)
is a valid configuration (set of vertices) where g is even. The
embedded nature of our ring construct is such that for all valid
configurations with an even number of vertices the following
holds:
$$cycle_{(g)(even)} \subset cycle_{(h)(even)} \tag{7}$$
where g < h. As a consequence, only one interrupt device
is required in the above case, which is to be placed at the tail
vertex of the minimal even (cycle(min)(even)) configuration
of vertices. The argument follows similarly for the valid
configurations in the system with an odd number of vertices.
Conversely, intermediate reconfiguration is mandatory if any
two valid configurations are mathematically disjoint from each
other. If (8) holds
$$num_{config(odd)} \geq \left| cycle_{(min)(even)} \right|, \ \text{or} \quad num_{config(even)} \geq \left| cycle_{(min)(odd)} \right| \tag{8}$$
where numconfig(odd) (numconfig(even)) is equal to the
total number of valid odd (even) configurations in the behavior
graph, then the intersection of the sets
cycle(min)(even) and cycle(min)(odd) is equal
to the empty set (i.e. the two configurations are disjoint).
Thus, a series of intermediate steps is required to link these
configurations together.
In order to minimize the hardware overhead in the final
control device, care must be taken when placing the interrupt
devices in the system. Interrupt devices should be placed at the
tail vertices of the minimal odd and even configurations, while
an interrupt device must also be placed at the head vertex of
either the maximal length odd or even configuration, as shown
in Fig. 6. Mathematically, (9) must be true
$$cycle_{(max)(odd)} \cap cycle_{(min)(even)} \neq \{\}, \quad cycle_{(max)(even)} \cap cycle_{(min)(odd)} \neq \{\} \tag{9}$$
where cycle(max)(odd) (cycle(max)(even)) is the valid odd (even)
configuration which contains the maximum number of
elements, and cycle(min)(odd) (cycle(min)(even)) represents
the valid odd (even) configuration which contains the minimum number of
elements. Thus, the vertices within the graph of the
largest odd (even) configuration and the smallest even (odd)
configuration are not mutually exclusive, as shown in Fig. 4(c)
and Fig. 6. Because of (9) and the containment property in
(7), the control token can now be successfully passed between
any valid configuration (set of vertices) in the behavior graph
through the use of only three interrupt devices.

Fig. 7. Abstract representations of a reconfigurable control device with 3 interrupt modules:
(a) block diagram, (b) signal transition graph. Each interrupt module pairs a MUTEX with a
RECFG CTRL block attached to the token ring. Module 1: R1, R2, G1, G2, RC_EN1, RC_ACK1.
Module 2: R3, R4, G3, G4, RC_EN2, RC_ACK2. Module 3: R5, R6, G5, G6, RC_EN3, RC_ACK3.

In this manner,
the control token is contained when switching between even
and odd configurations. This will be true for control circuits
with an arbitrary number of vertices (devices). Therefore, it is
possible to create an interrupt system which covers a complete
range of cases with only three interrupt devices.
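The containment property (7) and the intersection property (9) can be checked mechanically for a concrete graph. The sketch below uses the distributed vertex sets listed in Table I (1-indexed vertices) as its data; the helper names `is_nested` and `satisfies_eq9` are hypothetical and introduced only for illustration.

```python
# Distributed cycles from Table I, ordered from smallest to largest.
odd_cycles = [set(range(6, 9)), set(range(5, 10)), set(range(4, 11))]  # 3, 5, 7
even_cycles = [set(range(3, 7)), set(range(2, 8)), set(range(1, 9))]   # 4, 6, 8


def is_nested(cycles):
    """Property (7): every configuration is a proper subset of the next one."""
    return all(small < large for small, large in zip(cycles, cycles[1:]))


def satisfies_eq9(odd, even):
    """Property (9): max odd meets min even, and max even meets min odd."""
    return bool(odd[-1] & even[0]) and bool(even[-1] & odd[0])


nested_ok = is_nested(odd_cycles) and is_nested(even_cycles)
eq9_ok = satisfies_eq9(odd_cycles, even_cycles)
shared_by_all = set.intersection(*odd_cycles, *even_cycles)
```

For this data both properties hold, and the only vertex common to every cycle is vertex 6, which agrees with the "Shared Vertices" row of Table I.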
An effective way to represent the reconfiguration protocol
is via the use of a signal transition graph (STG) [18] [19].
An STG is composed of transition arcs, each of which must
be enabled in order to fire, and initialized with a set of
markings which correspond to the initial state of the signals
in the system. Every STG must conform to the conditions
of boundedness, consistency, output persistency, and complete
state coding. If these four properties are satisfied, then the
STG can be implemented as a speed-independent circuit. Such
graphs are useful for characterizing the behavior of protocols
where several concurrent events can occur. Tools, such as
Petrify and Workcraft, can then take these graphs and derive
Boolean equations which implement the desired behavior [20]
[21]. Fig. 7(a) shows the block diagram which defines a system
of three interrupt modules. This diagram helps to understand
the meaning of the lines in the STG of Fig. 7(b).
Special attention should be given to signals CSC1, CSC2,
and CSC3 of Fig. 7(b), highlighted in solid black boxes. These
signals are going to be referred to in the remainder of
the paper as CSC threads. By inserting these CSC threads
between the sections of the STG which define the interrupt
devices (indicated by the dotted boxes numbered 1, 2, and
3) it is possible to guarantee that the modules present in the
subsystem fire in a specific order (i.e. 1, 2, 3) despite each
interrupt module being a separate entity. The STG shows that
the CSC signals CSC1, CSC2, and CSC3, whose positive
transitions are generated in loops 1, 2, and 3, respectively
are interleaved (i.e. threaded) in such a manner as to enforce
the directionality of the subsystems. Thus, it is clear that
the system functions correctly. Interestingly enough, the CSC
equations derived in this work are similar to those presented
in the top-level control specification of an asynchronous A/D
converter [14].
However, the STG of Fig. 7(b) only assumes operation in
the forward direction, while the implementation in Section
V allows for operation in the reverse direction as well.
The modifications to the STG necessary to characterize this
additional behavior are documented in Table II. It should be
noted that transitions which contain the symbol “(M)” indicate
that a token is present at those locations during initialization.
The portion of the STG not encapsulated within the dotted
boxes corresponds to the external token ring which passes the
control token through the system once the interrupt request
signals (i.e. R1, R3, R5) have been de-asserted. It is also worth
noting that the CSC signals of the inner loop (A+/−, B+/−,
and C+/−) are necessary when implementing the token ring
using sequential logic, but have no physical meaning if the
token ring is memoryless (i.e. combinational).
TABLE II
TRANSITION AND MARKING VARIATIONS BETWEEN THE FORWARD AND
REVERSE CONFIGURATIONS OF A SYSTEM COMPOSED OF THREE
INTERRUPT DEVICES
Forward Direction (FWD)    Reverse Direction (REV)
CSC2− → R1+ (M)            CSC3− → R1+
R1− → R3+                  R1− → R5+ (M)
R1− → CSC3−                R1− → CSC2−
CSC3− → R3+                CSC1− → R3+
R3− → R5+                  R3− → R1+
R3− → CSC1−                R3− → CSC3−
CSC1− → R5+                CSC2− → R5+ (M)
R5− → R1+ (M)              R5− → R3+
R5− → CSC2−                R5− → CSC1−
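The entries of Table II follow a regular pattern: each request falls, triggers the next module's request and the previous module's CSC fall, and that CSC fall also enables the next request, with the initial markings on the arcs that enter the first request to fire. The sketch below regenerates both columns from that pattern. It is an observation about the table rather than the synthesis flow used in the paper, `table2_arcs` is a hypothetical helper, and ASCII `+`/`-` stand in for the transition signs.

```python
MODULE_REQUEST = {1: "R1", 2: "R3", 3: "R5"}  # request signal of each module


def table2_arcs(direction):
    """Return the set of (source, target, marked) arcs for 'FWD' or 'REV'."""
    order = [1, 2, 3] if direction == "FWD" else [3, 2, 1]
    start = order[0]  # module whose request fires first after initialization
    arcs = set()
    for pos, module in enumerate(order):
        nxt = order[(pos + 1) % 3]  # module that fires after this one
        prv = order[(pos - 1) % 3]  # module that fired before this one
        rq = MODULE_REQUEST[module]
        rq_next = MODULE_REQUEST[nxt]
        marked = nxt == start       # arcs entering the first request hold a token
        arcs.add((f"{rq}-", f"{rq_next}+", marked))
        arcs.add((f"{rq}-", f"CSC{prv}-", False))
        arcs.add((f"CSC{prv}-", f"{rq_next}+", marked))
    return arcs


fwd = table2_arcs("FWD")
rev = table2_arcs("REV")
```

Reversing the module order is the only change between the two columns, which is consistent with the single FLAG line used in Section V to switch direction.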
IV. TOKEN RING (IMPLEMENTATION & RESULTS)
A. Control Token Ring Implementation
With the above theory in hand, we can now examine
implementations of embedded token rings suitable for use
in the control device presented in Fig. 3(b), which follows
the reconfiguration protocol as defined in Section III-D. Three
separate implementation styles are presented here, one combi-
national and two sequential, based on a ring oscillator, a Muller
pipeline, and a chain of fast David Cells (DCs), respectively
[22] [23] [24]. Each of these design styles for the token ring
carries various benefits and tradeoffs.
When utilizing a token ring implementation based on com-
binational logic, the outputs of the ring are a function of
the present inputs only, and as a consequence the resulting
circuit is easier to design than in the sequential case. The
most common example of this is a ring oscillator, which is
constructed from an odd number of inverting elements. It has
a cycle time, tcycle , which is governed by (10):
$$t_{cycle} = \left( \frac{t_{pLH} + t_{pHL}}{2} \right) \ast n_{gate} \tag{10}$$
in which tpLH and tpHL are the low-to-high and high-to-
low gate response times, respectively, and ngate refers to the
number of gates in the inverter chain. The duty cycle of this
design is 50%, which allows only half of the clock cycle time
to be used for synchronization. However, the area cost of this
implementation is smaller than in a sequential implementation,
due to the simplicity of the circuit.
By contrast, sequential designs are governed by both the
present and prior outputs of the system. While such systems
are more difficult to design and verify, they also offer improved
synchronization times over their combinational counterparts.
Primarily, this is due to the sampling time of the master latch,
tsample , being governed by the mark to space ratio (i.e. the
ratio of time spent at logic “1” versus logic “0”) as defined
by the underlying STG. As a consequence, the sampling time
of the system, tsample, remains invariant to the number of
vertices (devices) in the chain. Consequently, the time allotted
to recover from metastability (i.e. the synchronization time)
increases as the number of devices in the chain increases.
Two sequential implementation styles lend themselves to the
design of an embedded token ring suitable for our purposes.
One of them is based on a Muller Pipeline, and the other is
based on fast David cells (i.e. cross-coupled NAND gates).
A Muller Pipeline is formed from a chain of C-elements (i.e.
gates which change their value only when all the inputs match)
and has a “mark” which varies between 3 to 5 gate delays,
eventually converging to an average of 4 gate delays over
several cycles of operation. Similarly, a chain of devices based
on fast David Cells also has a “mark” which is 4 gate delays
long. However, the latter implementation requires certain rel-
ative timing constraints in order to guarantee correct circuit
operation. Relative timing constraints can be understood as a
set of rules inherent to a signal transition graph which limit
the reachability of certain firing patterns thereby making sure
the conditions of complete state coding are easier to satisfy
[14] [25]. As a result, systems incorporating relative timing
constraints are simpler in design than those which make no
assumptions at all. However, care must be taken to ensure
that such assumptions are reasonable. Utilizing relative timing
constraints in the above manner results in an implementation
that is simpler and faster than the design for the token ring
based on a Muller Pipeline, but at the cost of being harder to
modify.
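The difference between the two families can be made concrete with (5) and (6). The sketch below compares a combinational ring, whose sampling time is half of the cycle (50% duty cycle), with a sequential ring whose "mark" is fixed at about four gate delays. All numeric values (gate delay, vertex delay, setup time) are hypothetical placeholders chosen for illustration and are not measurements from this work.

```python
def t_msr(n_cycle, t_d_vertex, t_sample, t_setup):
    """Equations (5) and (6): resolution time for a ring of n_cycle vertices."""
    return n_cycle * t_d_vertex - t_sample - t_setup


def t_msr_combinational(n_cycle, t_d_vertex, t_setup):
    # 50% duty cycle: half of every cycle is spent sampling.
    t_cycle = n_cycle * t_d_vertex
    return t_msr(n_cycle, t_d_vertex, 0.5 * t_cycle, t_setup)


def t_msr_sequential(n_cycle, t_d_vertex, t_setup, t_gate, mark_gates=4):
    # Fixed mark of about four gate delays, independent of ring length.
    return t_msr(n_cycle, t_d_vertex, mark_gates * t_gate, t_setup)


# Hypothetical timing values in picoseconds.
T_GATE, T_D_VERTEX, T_SETUP = 20.0, 40.0, 30.0

comb = {n: t_msr_combinational(n, T_D_VERTEX, T_SETUP) for n in (5, 7)}
seq = {n: t_msr_sequential(n, T_D_VERTEX, T_SETUP, T_GATE) for n in (5, 7)}
```

With these placeholder numbers, growing the ring from 5 to 7 vertices adds 40 ps of resolution time in the combinational case and 80 ps in the sequential case, because only the sequential ring keeps its sampling time constant.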
B. Implementation (Reconfigurable Token Ring)
Given the prior discussion on token ring topologies, let us
now examine a practical example of how an embedded token
ring controller, depicted by Fig. 8, changes its synchronization
parameters based on application of various one-hot codes. Fig.
9 depicts one output (Out4/CTR4) of an 8-way sequential
token ring based on fast DCs, with 5 possible configurations.
As the reconfiguration data signal RDATA, comprised of the
one-hot vector combination of [A, B, C, D, E], is applied
to the corresponding multiplexer (MUX) and demultiplexer
(DEMUX) inputs in Fig. 8, the number of DCs traversed
changes, with A representing the shortest path [CTR2, 3, 4,
5] and synchronization time, and E representing the longest
[CTR0, 1, 2, 3, 4, 5, 6, 7]. The synchronization time is depicted
as transient time between the falling edge of Out4 and its next
successive rising edge (tMSR(4), tMSR(5), tMSR(6), etc.).
Naturally, the methodology for the selection of the CTR
signals matches that shown in Fig. 5, with the sole
exception of the path containing 3 elements [CTR5, 6, 7].
This path does not exist because token rings composed of fast DCs
require a minimum of 4 elements to function (i.e. satisfy the
relative timing constraints of their STG) [25]. Only two other
points bear mentioning. First, the signal RST_Bar functioned
as a reset signal to the token ring (i.e. control token flush), as
it was tested independently from the corresponding interrupt
device. Second, token rings composed of combinational
elements have negligible behavioral differences from their
sequential counterparts, aside from requiring that the ring
contain a strictly odd number of elements.
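The mapping from a one-hot code to the active ring can be sketched as a lookup. Paths A and E below are the ones stated in the text; paths B, C, and D are an assumption, inferred from the distributed cycles of Table I and the feedback labels of Fig. 6, and the function name `decode_rdata` is hypothetical.

```python
# Index ranges of the CTR cells on each ring. A and E come from the text;
# B, C and D are inferred from Table I and Fig. 6 (assumption).
PATHS = {
    "A": range(2, 6),   # CTR2..CTR5, 4 cells
    "B": range(4, 9),   # CTR4..CTR8, 5 cells
    "C": range(1, 7),   # CTR1..CTR6, 6 cells
    "D": range(3, 10),  # CTR3..CTR9, 7 cells
    "E": range(0, 8),   # CTR0..CTR7, 8 cells
}


def decode_rdata(rdata):
    """Map a one-hot [A, B, C, D, E] vector to the CTR names on the active ring."""
    bits = list(rdata)
    if len(bits) != 5 or sorted(bits) != [0, 0, 0, 0, 1]:
        raise ValueError("RDATA must be a 5-bit one-hot vector")
    line = "ABCDE"[bits.index(1)]
    return [f"CTR{i}" for i in PATHS[line]]


shortest = decode_rdata([1, 0, 0, 0, 0])
longest = decode_rdata([0, 0, 0, 0, 1])
```

Under this assumed mapping CTR4 lies on all five rings, which would explain why Out4/CTR4 is the output observed in Fig. 9.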
V. INTERRUPT SUBSYSTEM (IMPLEMENTATION &
RESULTS)
A. Reconfigurable Interrupt Overview
Continuing forward from the prior discussion, the under-
lying implementation of the interrupt subsystem in this work
can now be explored.

Fig. 8. 8-way reconfigurable controller based on sequential logic (5 possible configurations)

Fig. 9. Annotated transient response of the self-timed reconfigurable token ring control circuit based on DCs illustrating the effect of increased parallelism
on synchronization time (Output: CTR4).

As stated previously, the reconfiguration
system acts as a set of individual modules linked together via
CSC threads. Each interrupt module uses a MUTEX to stall the
token ring while reconfiguration is performed, in accordance
with the protocol of Section III-D. For each module added
to the system, a delay equal to the propagation time across
a single MUTEX element is added to the critical path of the
embedded token ring.
When an external reconfiguration request arrives at the controller
bundled with reconfiguration data, the interrupt subsystem
issues a resource request to the relevant MUTEX element, and
waits for a grant. The subsystem also waits for the head of
the control token to arrive at the other MUTEX input (indicating
that the embedded token ring has been halted). Once both
conditions are met, the reconfiguration process begins. Old
reconfiguration data is then flushed from the interrupt hardware
and updated with new data, which is then fed to the
multiplexer (MUX) inputs of the embedded token ring. When
the new data has finished settling, an acknowledgement signal
is generated, indicating that the subsystem is free to proceed
to the next stage of the reconfiguration process (if any exists).
B. Limitation: Reconfiguration Data Merging
However, as the interrupt modules are tied together via CSC
threads and each module generates its own output signals
based on the reconfiguration data provided by its bundled
data lines, it is necessary to both select the appropriate output
signals during the reconfiguration process and also ensure that
the final configuration persists once the interrupt device has
powered down. To accomplish this, a different multiplexer
element, hereafter referred to as INTMUX, has to be constructed
which not only provides a single output signal to the
MUX elements in the token ring but also produces separate
acknowledgement signals specific to each interrupt module (in
the event of multiple configuration steps). Thus, the INTMUX
must be programmed with the control signal combinations
which correspond to each respective configuration so that
acknowledgement signals are generated which enforce the
behavior specified in Fig. 7 and Fig. 8. Furthermore, the
INTMUX must also contain redundant configurations that
store the last known system state and persist once the system
operation has ceased.

Fig. 10. Top view of three 16-bit interrupt modules tied together via CSC threads. Solid lines represent the portions of the system which were implemented.
Grayed out portions and dotted lines represent the connections and blocks which were unimplemented at the time of this study.

Thus, the INTMUX module also requires
memory which allows the device to “remember” the interrupt
module which contains the last known output signals. In this
way the behavior outlined by Fig. 7 and Fig. 8 is maintained,
and the system does not lose its output signals when the
interrupt module completes its operation.
C. Implementation (Interrupt Subsystem)
Having covered the architecture of the token ring previously,
the implementation of the reconfigurable interrupt modules
now merits examination. The lack of the INTMUX module
changes certain properties of the STG of Fig. 7(b). First,
all of the transition arcs derived from the INTMUX module
(MUX_ACK1+/−, MUX_ACK2+/−, and MUX_ACK3+/−)
disappear from the STG (they become straight lines). Second,
the transition arcs related to the acknowledgement subsystem
of the interrupt modules (RC_ACK1+/−, RC_ACK2+/−, and
RC_ACK3+/−) inherit the initial markings which previously
belonged to the transition arcs of the INTMUX module.
Furthermore, the signals CSC1, CSC2, and CSC3 in Fig. 10
correspond to the CSC threads of the same name in Fig. 7,
while the signals RQ1, RQ2, and RQ3 correspond to signals
R1, R3, and R5, respectively. The signals CSC_INJ(1,2,3),
RQ_INJ(1,2,3), and MEM_INJ_D in Fig. 10 are responsible
for injecting the initial markings for the CSC, request, and
memory blocks of the STG, while the signals RQ+CSC_EN
and MEM_INJ_EN act as their active-high enable signals.
The following example documents the operation of three 16-
way reconfigurable interrupt devices linked together via CSC
threading as discussed previously, albeit with some notable
differences. MUX elements have been inserted into the CSC
threads, which make it possible to reverse the directionality of
the reconfiguration with a single signal line, FLAG, as shown
in Fig. 10. When the FLAG signal is high, the token traverses
the interrupts in order from module 1 to 3 (FWD direction),
and vice versa when FLAG is low (REV direction). If the
FLAG and MEM signals have initial logic values which are
different, then the token traverses all three interrupt devices
(multi-mode operation), whereas only one interrupt module
is activated if the initial values are the same (single-mode
operation). Perhaps one of the most interesting points in Fig.
10 deals with the activation and termination of the interrupt
subsystem. The STG of Fig. 7(b) is cyclic (i.e. the first
and final states are identical), which means that forcing the
interrupt modules to halt their operation and terminate is
problematic. In order to force the system to stop operating,
a specific CSC line must be de-asserted (CSC3 in the FWD
mode, and CSC1 in REV mode).

Fig. 11. Internal view of an individual 16-bit interrupt module (i.e. module #1 in the example).
In Fig. 10, RC_ACK2 acts as an enable to the FLAG
MEMORY block, and copies the current value of the FLAG
into MEM if the initial values of the two signals were different.
Once the current value of FLAG has been copied, an XOR
gate is triggered which causes either the CSC3 (FWD) or
CSC1 (REV) signal observed at interrupt module 2 to be
forced low, shutting down the system. The only signals which
persist after the reconfiguration process is complete are the
RDATA outputs and the FLAG signal. It should be noted that
the FLAG signal (whether ‘0’ or ‘1’) must persist after the
reconfiguration operation is complete in order to guarantee
that the system remains off.
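The roles of FLAG and MEM described above reduce to a few Boolean relations, sketched below. The function names are hypothetical, and the model only captures the decisions stated in the text: direction from FLAG, single-mode or multi-mode operation from the comparison of FLAG and MEM, the CSC line that is forced low to stop the system, and the copy of FLAG into MEM that is enabled by RC_ACK2.

```python
def interrupt_mode(flag, mem):
    """Return (direction, mode, stop_line) for the initial FLAG and MEM values."""
    direction = "FWD" if flag else "REV"
    mode = "multi" if flag != mem else "single"  # XOR of FLAG and MEM
    stop_line = "CSC3" if flag else "CSC1"       # line de-asserted to halt
    return direction, mode, stop_line


def latch_flag(flag, mem, rc_ack2):
    """FLAG MEMORY block: MEM takes the value of FLAG while RC_ACK2 is high."""
    return flag if rc_ack2 else mem


# Case FWD3 from the text: FLAG = 1, MEM = 0.
fwd3 = interrupt_mode(1, 0)
mem_after = latch_flag(1, 0, rc_ack2=1)
xor_after = 1 ^ mem_after  # 0 once FLAG has been copied, which triggers shutdown
```

Once FLAG and MEM agree, the XOR output falls and the selected CSC line is forced low, so FLAG has to hold its value for the subsystem to stay off.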
Fig. 11 depicts the circuit of an individual interrupt module.
In the following example, it is responsible for the signals
CSC1, RQ1, and the reconfiguration data, RDATA, associated
with the leftmost interrupt module in Fig. 10. The UDATA
arrow represents the unencoded 16-bit reconfiguration input
data, while the CSC_OUT (CSC1) signal depicts the CSC
thread which is generated inside the module by the CSC
GENERATE block. It is a function of itself, CSC_FB, the
request signal specific to that module, RQ_FB (RQ1), the
external CSC thread, CSC_EXT, and the external request line,
RQ_EXT, as defined by Fig. 10 (which are selected via the
FLAG signal). The output of the CSC generation module for
the signal CSC1 is defined in (11).
$$\begin{aligned}
CSC1 &= (CSC1 \ast CSC2) + RQ1 + RQ2 && (FWD) \\
CSC1 &= (CSC1 \ast CSC3) + RQ1 + RQ3 && (REV)
\end{aligned} \tag{11}$$
The request signal specific to the module, RQ_OUT (RQ1),
is generated from the REQUEST block. The block uses
the acknowledgement signal generated from the output of
the interrupt module (RC_ACK1), the internally generated
CSC signal CSC_OUT (CSC1), the externally generated CSC
signal used above (CSC_EXT), and its own feedback signal
(RQ_FB). The output of the request module for the signal
RQ1 is characterized in (12).
$$\begin{aligned}
RQ1 &= RC\_ACK1 \, (CSC1 \ast CSC2 + RQ1) + (RQ1 \ast CSC1) && (FWD) \\
RQ1 &= RC\_ACK1 \, (CSC1 \ast CSC3 + RQ1) + (RQ1 \ast CSC1) && (REV)
\end{aligned} \tag{12}$$
The rest of the system follows the reconfiguration protocol
defined in Section III-D. The request and grant signals
RQ_PTH1 and GR_PTH1 represent the path of the control token
taken during reconfiguration, while the signals RQ_PTH2
and GR_PTH2 define the path taken during normal system
operation. The signal RCMOD16_EN (RC_EN1) represents
the enable signal used to update the 16-bit reconfiguration
data, RDATA, of the module, while RC_ACK (RC_ACK1)
represents the acknowledgement signal used to indicate when the
update operation has completed. Interrupt modules 2 and 3 are
constructed similarly.
Fig. 12 shows the transient response of the control signals
in the circuit when the three interrupt modules are con-
nected together in a ring of 15 inverting elements, via their
TKN_IN/TKN_OUT ports, with each interrupt module being
placed a uniform distance of 5 inverting elements apart from
each other. It illustrates how the request and acknowledgement
signals in the interrupt devices can be controlled to both fire
in a specific order and then terminate their operations, as
discussed previously. The duration of the simulation is 33
ns, testing 4 cases. The initialization time is 2 ns for the
first configuration and 1 ns for each configuration thereafter.
Cases FWD3 (FLAG = ‘1’, CSC = ‘110’, MEM = ‘0’) and
REV3 (FLAG = ‘0’, CSC = ‘011’) simulate the operation
of the system in the forward and reverse directions where
intermediate reconfiguration is necessary. Similarly, cases
FWD1 (FLAG = ‘1’, CSC = ‘011’) and REV1 (FLAG = ‘0’,
CSC = ‘110’) simulate the operation of the system in the
forward and reverse directions where it is not. The duration
of each reconfiguration operation is characterized in Table III,
while the average power consumed is given in Table IV.
Fig. 12. Transient response of the control signals for the subsystem of three interrupt devices, which demonstrates the firing behavior of the design.
Reconfiguration data, CSC, FLAG, and MEM signals have been omitted.
TABLE III
DURATION OF THE ACTIVE REGION OF THE INTERRUPT SUBSYSTEM FOR
DIFFERENT CONFIGURABLE MODES AT A TN-TP CORNER WITH VDD =
1.0 V
Temp. (°C)   FWD3 (ns)   FWD1 (ns)   REV3 (ns)   REV1 (ns)
0            3.669       1.391       4.822       1.686
27           3.817       1.479       5.009       1.775
100          4.068       1.657       5.473       1.982
TABLE IV
AVERAGE POWER CONSUMPTION OF THE ACTIVE REGION OF THE
INTERRUPT SUBSYSTEM FOR DIFFERENT CONFIGURABLE MODES AT A
TN-TP CORNER WITH VDD = 1.0 V
Temp. (°C)   FWD3 (µW)   FWD1 (µW)   REV3 (µW)   REV1 (µW)
0            291.93      267.64      232.19      223.72
27           290.39      260.12      227.88      214.10
100          281.04      245.86      226.40      211.82
As shown in Table III, the duration of the reconfiguration
operation varies based on the number of interrupt devices
required to complete the reconfiguration process as well as
whether the directionality of the control token opposes the
flow of the CSC threading in the system. Table III assumes
that the forward (i.e. FWD1, FWD3) direction corresponds to
when the flow of both the CSC threads and the control token
operate in the same direction, and vice versa in the reverse
(i.e. REV1, REV3) direction. Consequently, the control token
always “hits” in the forward direction (the token arrives after
the request has been issued to the next interrupt device), while
in the opposite case it always “misses” (it arrives before),
resulting in higher latencies in the reverse direction.
Table IV characterizes the average power consumption of
the system during the active portions of the reconfiguration
process across varying temperature parameters. The system
consumes an average power which varies between 212 and 292
µW with a maximum variance of 72.5% when tested using a
corner analysis. The worst case corner was the fast-fast case at
a temperature of 100◦C. Durations similar to those supplied
in Table III were used in order to adjust the intervals over
which the average power consumption was measured in order
to ensure the fairness of the testing procedure. However, these
results are confounded in the following way.
The interrupt subsystem used in the test bench of Fig. 12
requires the interaction of three separate interrupt devices.
Therefore, the power consumption of the subsystem is spread
across every interrupt device, even in single mode operation.
Dividing by three, the average power consumption of each
interrupt module during active operation is found to range
from 70.7 to 97.3 µW across process corners at a nominal
temperature of 27◦C.
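The entries of Tables III and IV can be combined into per-module rates and per-operation energies. The sketch below is an interpretation rather than the authors' procedure: it assumes that the FWD3 and REV3 cases traverse three interrupt modules while FWD1 and REV1 traverse one, and that the averaging interval for power equals the Table III duration, so that energy is the product of the two entries.

```python
MODULES_FIRED = {"FWD3": 3, "FWD1": 1, "REV3": 3, "REV1": 1}  # assumption
DURATION_NS = {  # Table III, columns ordered as 0, 27, 100 degrees C
    "FWD3": (3.669, 3.817, 4.068),
    "FWD1": (1.391, 1.479, 1.657),
    "REV3": (4.822, 5.009, 5.473),
    "REV1": (1.686, 1.775, 1.982),
}
POWER_UW = {  # Table IV, same column order
    "FWD3": (291.93, 290.39, 281.04),
    "FWD1": (267.64, 260.12, 245.86),
    "REV3": (232.19, 227.88, 226.40),
    "REV1": (223.72, 214.10, 211.82),
}


def rate_mhz(case, col):
    """Interrupt modules completed per microsecond, i.e. a rate in MHz."""
    return MODULES_FIRED[case] / DURATION_NS[case][col] * 1e3


def energy_fj(case, col):
    """Energy of one reconfiguration operation: ns times uW gives fJ."""
    return DURATION_NS[case][col] * POWER_UW[case][col]


rates = [rate_mhz(case, col) for case in DURATION_NS for col in range(3)]
slowest, fastest = min(rates), max(rates)
fwd3_energy_27c = energy_fj("FWD3", 1)
```

Under these assumptions the rates span roughly 505 MHz (REV1 at 100°C) to 818 MHz (FWD3 at 0°C), which matches the per-module speed range quoted in Section VI, and a three-module forward reconfiguration at 27°C costs about 1.1 pJ.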
When the system is inactive (i.e. not receiving requests) the
interrupt devices only consume an average power equal to the
cumulative summation of the standby leakage currents across
the individual transistors in the module multiplied by the
supply voltage applied to them. Thus, the power consumption
of the interrupt subsystem will vary in proportion to the
frequency of reconfiguration requests received.
VI. CONCLUSION
In closing, a self-timed reconfigurable controller for a parallel
synchronizer has been proposed, which allows the designer
to manipulate the MTBF of the synchronizer via the application of
one-hot control codes. Relevant topics regarding the principles
of adjacency within the context of an embedded graph have
been covered. The effects of one-hot codes on the synchronization
time allotted by the controller have been discussed.
Lastly, an interrupt based protocol for reconfiguration has been
presented, focusing on how CSC threads can be used to control
cyclic graphs.
A sequential implementation for an embedded token ring
controller has been analyzed. A reconfigurable interrupt mod-
ule has been studied, with regard to operating frequency
and energy consumption per operation. The device demon-
strated speeds ranging from 505 to 818 MHz per module at
temperatures ranging from 0 to 100◦C, and average power
consumptions per operation ranging from 70.7 to 97.3 µW at
a nominal temperature of 27◦C across all process corners.
Future studies will focus on further exploring the properties
of controlling digital systems via the use of cyclic graphs
and on remedying the limitations previously discussed in this
work. They will also extend the discussion to incorporate
topics such as robustness, and further integrate the
proposed design into the current literature on adaptive
synchronization methods.
REFERENCES
[1] E. G. Friedman, “Clock distribution networks in synchronous digital
integrated circuits,” Proceedings of the IEEE, vol. 89, no. 5, pp. 665–
692, 2001.
[2] D. S. Bormann and P. Y. Cheung, “Asynchronous wrapper for het-
erogeneous systems,” in Computer Design: VLSI in Computers and
Processors, 1997. ICCD’97. Proceedings., 1997 IEEE International
Conference on. IEEE, 1997, pp. 307–314.
[3] W. J. Dally and B. Towles, “Route packets, not wires: On-chip
interconnection networks,” in Design Automation Conference, 2001.
Proceedings. IEEE, 2001, pp. 684–689.
[4] E. Beigne´, F. Clermidy, P. Vivet, A. Clouard, and M. Renaudin, “An
asynchronous noc architecture providing low latency service and its
multi-level design framework,” in Asynchronous Circuits and Systems,
2005. ASYNC 2005. Proceedings. 11th IEEE International Symposium
on. IEEE, 2005, pp. 54–63.
[5] D. J. Kinniment, Synchronization & Arbitration in Digital Systems.
Wiley Online Library, 2007.
[6] T. Chelcea and S. M. Nowick, “Robust interfaces for mixed-timing sys-
tems,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions
on, vol. 12, no. 8, pp. 857–873, 2004.
[7] J. Jex and C. Dike, “A fast resolving binmos synchronizer for parallel
processor interconnect,” Solid-State Circuits, IEEE Journal of, vol. 30,
no. 2, pp. 133–139, 1995.
[8] J. Zhou, D. Kinniment, G. Russell, and A. Yakovlev, “A robust synchro-
nizer,” in Emerging VLSI Technologies and Architectures, 2006. IEEE
Computer Society Annual Symposium on. IEEE, 2006, pp. 442–443.
[9] C. van Berkel, “Handshake circuits: An intermediary between com-
municating processes and VLSI,” Ph.D. dissertation, Technische Univ.,
Eindhoven (Netherlands)., 1992.
[10] J. Ebergen, “Squaring the fifo in gasp,” in Asynchronous Circuits and
Systems, 2001. ASYNC 2001. Seventh International Symposium on.
IEEE, 2001, pp. 194–205.
[11] C. Brej, “Wagging logic: Implicit parallelism extraction using asyn-
chronous methodologies,” in Application of Concurrency to System
Design (ACSD), 2010 10th International Conference on. IEEE, 2010,
pp. 35–44.
[12] M. Alshaikh, D. Kinniment, and A. Yakovlev, “A synchronizer design
based on wagging,” in Microelectronics (ICM), 2010 International
Conference on. IEEE, 2010, pp. 415–418.
[13] J. U. Horstmann, H. W. Eichel, and R. L. Coates, “Metastability behavior
of cmos asic flip-flops in theory and test,” Solid-State Circuits, IEEE
Journal of, vol. 24, no. 1, pp. 146–157, 1989.
[14] J. Cortadella, Logic Synthesis for Asynchronous Controllers and Inter-
faces. Springer, 2002, vol. 8.
[15] I. E. Sutherland and R. F. Sproull, “Logical effort: designing for speed
on the back of an envelope,” in Proceedings of the 1991 University of
California/Santa Cruz conference on Advanced research in VLSI. MIT
Press, 1991, pp. 1–16.
[16] U. Frank, T. Kapshitz, and R. Ginosar, “A predictive synchronizer for
periodic clock domains,” Formal Methods in System Design, vol. 28,
no. 2, pp. 171–186, 2006.
[17] A. Martin, “Synthesis of asynchronous vlsi circuits,” Formal Methods
for VLSI Design, pp. 237–283, 1990.
[18] T.-A. Chu, “Synthesis of self-timed vlsi circuits from graph-theoretic
specifications,” Ph.D. dissertation, Massachusetts Institute of Technol-
ogy, Dept. of Electrical Engineering and Computer Science, 1987.
[19] A. Kondratyev, J. Cortadella, M. Kishinevsky, E. Pastor, O. Roig,
and A. Yakovlev, “Checking signal transition graph implementability
by symbolic bdd traversal,” in Proceedings of the 1995 European
conference on Design and Test. IEEE Computer Society, 1995, pp.
325–332.
[20] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev, “Petrify: a tool for manipulating concurrent specifications
and synthesis of asynchronous controllers,” IEICE Transactions on
information and Systems, vol. 80, no. 3, pp. 315–325, 1997.
[21] I. Poliakov, D. Sokolov, and A. Mokhov, “Workcraft: a static data flow
structure editing, visualisation and analysis tool,” in Petri Nets and Other
Models of Concurrency–ICATPN 2007. Springer, 2007, pp. 505–514.
[22] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital integrated
circuits: A Design Perspective. Prentice Hall Upper Saddle River, NJ,
2003.
[23] D. E. Muller and W. S. Bartky, “A theory of asynchronous circuits,” in
Theory of Switching, Proceedings. International Symposium on, 1959,
pp. 204–243.
[24] D. J. Kinniment, A. Bystrov, and A. V. Yakovlev, “Synchronization
circuit performance,” Solid-State Circuits, IEEE Journal of, vol. 37,
no. 2, pp. 202–209, 2002.
[25] D. Sokolov, “Automated synthesis of asynchronous circuits using direct
mapping for control and data paths,” Ph.D. dissertation, University of
Newcastle upon Tyne, 2006.
James S. Guido is a postgraduate researcher in the
Microelectronics System Design research group at
Newcastle University in the United Kingdom and is
currently pursuing a Ph.D. in electrical, electronic,
and computer engineering. He received a Bachelor
of Science (B.Sc.) in electrical and computer engineering
from the University of Rochester in 2006,
and a Master of Science (M.Sc.) in electrical engineering
from the University of Rochester in 2008.
His current research interests include the design
of robust interfaces utilizing asynchronous circuits,
with special interest in the reliable synchronization of such interfaces across
different interconnect technologies.
Professor Alexandre (Alex) Yakovlev is a Dream
Fellow of the Engineering and Physical Sciences Research
Council (EPSRC), United Kingdom, investigating
different aspects of energy-modulated computing.
He received a D.Sc. from Newcastle University
in 2006, and an M.Sc. and a Ph.D. from St. Petersburg
Electrical Engineering Institute in 1979 and 1982,
respectively, where he worked in the area of asynchronous
and concurrent systems from 1980, and in
the period between 1982 and 1990 held positions of
assistant and associate professor in the Computing
Science department.
Since 1991 he has been at Newcastle University, where he holds a
professorial position and heads the Microelectronic Systems Design research group
at the School of Electrical and Electronic Engineering. His current interests
and publications are in the field of modelling and design of asynchronous,
concurrent, real-time and dependable systems on a chip.
He has published four monographs and more than 300 papers in academic
journals and conferences, and has managed over 25 research contracts. He has
chaired program committees of several international conferences, including the
IEEE Int. Symposium on Asynchronous Circuits and Systems (ASYNC), Petri
nets (ICATPN), Applications of Concurrency to Systems Design (ACSD),
and he has been Chairman of the Steering committee of the Conference on
Application of Concurrency to System Design since 2001. He is a Senior
Member of the IEEE and a Member of the IET.
