A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip by Bjerregaard, Tobias & Sparsø, Jens
A Scheduling Discipline for Latency and Bandwidth Guarantees
in Asynchronous Network-on-Chip
Tobias Bjerregaard and Jens Sparsø
Technical University of Denmark (DTU)
Informatics and Mathematical Modelling
2800 Lyngby, Denmark
Email: {tob, jsp}@imm.dtu.dk
Abstract
Guaranteed services (GS) are important in that they pro-
vide predictability in the complex dynamics of shared com-
munication structures. This paper discusses the implemen-
tation of GS in asynchronous Network-on-Chip. We present
a novel scheduling discipline called Asynchronous Latency
Guarantee (ALG) scheduling, which provides latency and
bandwidth guarantees in accessing a shared media, e.g. a
physical link shared between a number of virtual channels.
ALG overcomes the drawbacks of existing scheduling disci-
plines, in particular the coupling between latency and band-
width guarantees. A 0.12 µm CMOS standard cell imple-
mentation of an ALG link has been simulated. The opera-
tion speed of the design was 702 MDI/s.
1. Introduction
Physical issues of deep submicron technologies, as well
as design complexity issues of large scale chip designs,
make current poorly scalable solutions for global on-chip
communication such as busses, unsuited for future system-
on-chip (SoC). There is a general consensus that the com-
munication requirements, as well as the design ﬂow, of bil-
lion transistor SoC are best accommodated by shared, seg-
mented interconnection networks [3][13][8]. Recent years
have seen the development of such dedicated SoC com-
munication structures, known as Network-on-Chip (NoC).
NoC facilitates a truly modular and scalable design ap-
proach, allowing the easy integration of a variety of cores
in a SoC.
Chip-wide synchrony is becoming prohibitively difﬁcult
to achieve in large chips [1]. Possible solutions lie in the
concept of global asynchronous locally synchronous sys-
tems (GALS) [7][16]. With GALS, the use of fully asyn-
chronous circuits for implementing NoC seem an obvious
possibility. The problem of distributing a global clock can
be entirely avoided, and the integration of cores with dif-
ferent timing speciﬁcations becomes an integral part of the
design ﬂow. Only the network has a global span, and ben-
eﬁts from advantages of asynchronous circuits such as dis-
tributed control and zero dynamic idle power.
Traditionally multicomputer networks support best-
effort (BE) routing, for which no performance guarantees
are given. Much NoC research builds on this by im-
proving BE routing efﬁciency under the constraints of
single-chip system design [17][15]. In [18] the authors ar-
gue for the necessity of a combination of BE routing as
well as guaranteed service (GS) routing in NoC. Basi-
cally BE improves the average resource utilization while
GS incur predictability, a quality which is often desir-
able, in particular in real-time systems.
Previously published NoCs which provide GS are
ÆTHEREAL [18][9] and NOSTRUM [14]. Both are syn-
chronous and employ variants of time division multiplexing
(TDM) for providing per connection bandwidth (BW) guar-
antees. TDM has the drawback of the connection latency
being inversely proportional to the BW, thus connec-
tions with low BW and low latency requirements, e.g.
interrupts, are not supported. Also, TDM is not possi-
ble in asynchronous systems, which have no explicit notion
of time. In [12], an asynchronous NoC providing differen-
tiated services by prioritizing VCs was presented. Though
this approach delivers improved latency on prioritized con-
nections, no hard guarantees are given.
This work presents a novel scheduling discipline called
Asynchronous Latency Guarantee (ALG) scheduling, that
provides hard per connection latency and BW guaran-
tees. The guarantees are not inversely dependent on each
other, thus ALG overcomes the limitations of BW al-
location schemes based on TDM, and supports a wide
range of trafﬁc types characterized by different GS re-
quirements. At opposite ends of the GS spectrum; ALG
supports both latency critical, low BW trafﬁc such as inter-
rupts, but also streaming data, which does not have strin-
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
Figure 1. An asynchronous network connect-
ing independently clocked cores facilitates
modularity in large scale SoC designs.
gent latency requirements but requires GS in terms of
BW. In addition, ALG operates in a completely asyn-
chronous environment. We demonstrate with a low area
0.12 µm CMOS standard cell implementation.
The rest of the paper is organized as follows. In Section 2
we introduce our asynchronous NoC MANGO (Message-
passing Asynchronous Network-on-chip providing Guaran-
teed services over OCP interfaces) for which ALG will con-
stitute the scheduling discipline. Section 3 looks at the back-
ground of GS, addressing current solutions, and deﬁning the
requirements for an optimal solution in NoC. Section 4 ex-
plains the concepts of ALG scheduling and provides proof
of its functionality, and Section 5 extends the proof given in
Section 4 to account for buffer limitations as well. In Sec-
tion 6 we describe our implementation of an on-chip ALG
link, and in Section 7 we present some simulation results.
Finally Section 8 provides a conclusion.
2. MANGO Overview
This section provides a brief overview of MANGO. As
shown in Figure 1, a MANGO NoC consists of network
adapters (NA), routers and links. Each core is connected to
the network through an NA, which provides high level com-
munication services, in the form of OCP (Open Core Proto-
col) transactions [2], on the basis of primitive packet rout-
ing services implemented by the network. Each NA is con-
nected to a router, and also performs the synchronization be-
tween the clocked core and the asynchronous network. The
routers are connected by links in a grid-type structure, ei-
ther homogeneous or heterogeneous.
A link in MANGO, as illustrated in Figure 2, implements
a number of independently buffered, logically separated vir-
Figure 2. Basic link in which virtual channels
A to D share a physical link.
tual channels (VCs). These VCs contend for access to the
shared physical link. VC control ensures that no ﬂit (ﬂow
control unit) stalls while making use of the shared media,
causing ﬂits from other VCs to be blocked. The implemen-
tation of VCs in asynchronous systems was the topic of our
work in [4] and will be further discussed in Section 6.
In [5] details of the MANGO router architecture were
presented. The MANGO router provides connectionless BE
routing as well as connection-oriented GS routing. A GS
connection is a logical point-to-point circuit between two
different NAs in the network. Such a virtual circuit is es-
tablished by reserving a sequence of VCs. A connection is
thus in effect a logical FIFO, distributed across the network.
This is a key feature of MANGO, and helps simplify the use
of connections, in particular with regard to end-to-end ﬂow
control. Connections are established using BE packets.
To enable modularity in instantiating the MANGO
router, BE and GS routing are implemented by two sep-
arate modules. The GS router is implemented by a
switching fabric which provides non-blocking switch-
ing between the input ports and the output buffers; any ﬂit
arriving at any input port will be routed to the appropri-
ate VC output buffer, immediately and without conges-
tion. Since the router is non-blocking, the only point at
which congestion between different connections can oc-
cur is during link access. Therefore GS can be real-
ized purely on the basis of link access arbitration. This is
a key concept of MANGO. The fact that MANGO imple-
ments output buffers furthermore makes the link access
arbitration circuits simple, and facilitates a modular ap-
proach to the implementation of GS; new GS schemes
can be instantiated simply by plugging a new link ar-
biter module into the router. Please refer to [5] for further
details.
3. Guaranteed Services
In the following we ﬁrst discuss network performance
parameters and establish a taxonomy. We then argue for
the need for connection-oriented routing and discuss GS
schemes used in current NoC and in macro networks, and ﬁ-
nally we propose a set of requirements for GS in NoC.
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
3.1. Performance Parameters
Service guarantees are quantiﬁed in terms of one or more
performance parameters. Aspects of network performance
analysis are discussed in detail in [10], the basic parame-
ters being BW and latency. In order to appropriately spec-
ify service bounds, both BW and latency must be indicated.
A latency guarantee is useless, if the throughput which can
be sustained is too small, likewise a BW guarantee is use-
less without bounds on the latency incurred.
While the BW bound of a stream of ﬂits is determined
by the bottleneck in its path, the total latency of a ﬂit in a
network is characterized by the sum of latencies encoun-
tered. These include the network admission latency tadmit,
during which the required network connection is accessed,
and a number of hop latencies, a hop being the ﬂit move-
ment from the buffer in one routing node, across a link and
into the buffer in the neighboring routing node. The hop la-
tency consists of an access latency taccess, the time it takes
for the ﬂit to be granted access to the shared routing re-
sources, e.g. the link, plus a transmission latency tlink , the
time it takes to transmit the ﬂit to the buffer in the next rout-
ing node, once access has been granted. The total latency,
of a ﬂit traversing a path which is X hops long, is thus
ttotal = tadmit+ taccess1+ tlink1+ . . .+ taccessX + tlinkX .
3.2. Connection-Oriented GS
In order to provide hard service guarantees, connection-
oriented routing is absolutely essential. In connection-less
routing, all data travels on the same logical network, and
any transmission can potentially stall another. GS trafﬁc
must be logically independent of other trafﬁc in the net-
work. In MANGO (see Section 2) we use VCs to estab-
lish connections on logically independent virtual circuits.
Hard bounds on service guarantees are beneﬁcial from
a system-level point of view in that they promote a mod-
ular design ﬂow. Without such guarantees, a change in the
system may require extensive top-level re-veriﬁcation. Thus
GS in NoC holds the potential to reduce turn-around-time
of large SoC designs. Also, while formal veriﬁcation of the
performance of BE routing networks is often not possible –
as desirable in critical real-time systems – GS makes it so.
3.3. GS Schemes
An overview of scheduling disciplines for GS in packet-
switched networks is given in [19]. The basic solution to
providing BW guarantees is based on fair ﬂuid queuing
(FFQ). FFQ is a general form of head-of-line processor
sharing (HOL-PS), which implements separate queues for
each connection. The heads of the queues are serviced in
a manner such as to provide fair-share access to the shared
media, e.g. a link.
In an asynchronousNoC, FFQ-type access schemes have
the unpleasant drawback of a very high worst case latency.
A tree-arbiter can approximate FFQ, however since noth-
ing can be said concerning the timing of the inputs with re-
spect to each other, any packet arriving on a given chan-
nel, potentially has to wait for all other inputs before be-
ing serviced. Thus the worst case latency accumulated in an
asynchronous NoC implementing this link access scheme is
very high. Also, the access time is inversely proportional to
the BW reservation. To get low latency, a large portion of
the BW must be reserved. The TDM-based GS solutions of
ÆTHEREAL and NOSTRUM are also examples of FFQ-
type schemes. Since these NoCs are globally synchronous,
the latency through the network can be guaranteed to one
clock cycle per hop, however the latency in accessing a con-
nection at the source is still inversely proportional to the re-
served BW. Also, in order to realize such a low per-hop la-
tency, explicit end-to-end ﬂow control mechanisms are re-
quired, as the connections are not independently buffered.
To provide better bounds on the latency, decoupled from
the BW guarantees, a different scheme is needed.
Macro networks are of a globally asynchronous nature,
since it is obviously not possible to implement clock level
synchronization among network nodes in a wide area net-
work. This makes them somewhat similar to asynchronous
NoC. In employing FFQ-type solutions to GS, latency prob-
lems as described above for asynchronous NoC are a well
known drawback. In [20] a service discipline called rate-
controlled static-priority (RCSP) queueing was introduced
to overcome these drawbacks. An admission controller as-
signs an eligible transmission time to all incoming pack-
ets. When this point of time arrives, the packets are queued
in a static priority queue (SPQ). This way not only BW
guarantees but also latency guarantees can be provided,
independently of each other. The admission control how-
ever requires that the node has a local notion of time. This
makes it unsuitable for implementation in an asynchronous
NoC. Another drawback of the method is that it is non-
work-conserving, meaning that the router may be idle, even
though there are packets in the channel queues, waiting for
their eligible transmission time. This reduces the efﬁciency
of using the available network resources, and even if the la-
tency bounds are respected, the average connection latency
and the link utilization are reduced. A work-conserving ver-
sion was also proposed in [20], but this introduces the over-
head of an extra stand-by queue, which is a BE queue work-
ing in parallel with the RCSP queues.
3.4. Requirements for GS in NoC
Our proposal to the requirements of a solution for GS
in a NoC is that it (i) is simple in order to facilitate high
operation speed and low hardware overhead, (ii) is work-
conserving in order to make efﬁcient use of network re-
sources, and (iii) can provide bounds on latency and BW
which are decoupled, or at least not inversely dependent on
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
Figure 3. A complete ALG link. The static priority queue (SPQ) prioritizes access to the link, provid-
ing latency guarantees, the admission control makes sure that the ﬂow of ﬂits adheres to the condi-
tions required by the SPQ, and the VC control ensures non-blocking behaviour.
each other. Additionally a requirement to a solution for GS
in an asynchronous NoC is that it (iv) does not require a no-
tion of time, neither local nor global. ALG conforms to all
of these requirements, and is thus a valid solution to provid-
ing GS in both synchronous and asynchronous NoC.
In Section 4 we explain the ALG scheduling discipline,
demonstrating its use in providing latency and BW guar-
antees on a shared link. As described in Section 2, in the
MANGO router architecture, link access guarantees are suf-
ﬁcient to provide end-to-end guarantees for a connection.
Note however that ALG-based access can be applied to any
shared media. Also, though our implementation is based on
asynchronous circuits, ALG is not restricted to such. How-
ever, the fact that ALG does not require a notion of time
makes it particularly suitable for asynchronous systems.
4. ALG Scheduling
In this section we explain the ALG scheduling discipline.
We ﬁrst provide an intuitive understanding of its workings,
and thereafter prove formally that it works. All indications
of time in the following are quantized using the time unit
ﬂit-time. A ﬂit-time is deﬁned as the time it takes to com-
plete one handshake cycle on the physical link. VC con-
trol measures ensure that no ﬂits will stall on the link, thus
the duration of such a handshake is well deﬁned. The cir-
cuits being asynchronous, naturally the ﬂit-time is not con-
stant throughout the entire network. However we assume
the ﬂit-time to be fairly uniform.
Figure 3 shows the complete ALG link. The ALG admis-
sion control and the static priority queue (SPQ) implement
the ALG scheduler. The VC control wraps around these.
How these three sub-systems work together to provide la-
tency and BW guarantees across multiple VCs sharing a
physical link will become clear in the following. The princi-
ples of ALG scheduling are best understood from the inside
out. The SPQ prioritizes VCs, providing latency guarantees
accordingly, but only under certain conditions. The admis-
sion control makes sure that these conditions are met. The
VC control mechanism ensures that ﬂits are transmitted on
the shared link only if there is free buffer space at the re-
ceiving end, thus preventing ﬂits from stalling on the link
and invalidating the latency and BW guarantees.
4.1. Prioritized Channels
In order to provide a latency guarantee, it is necessary
to provide bounds on the link access time. Looking at Fig-
ure 2, envision ﬂits arriving on channels A to D at random
but large intervals. Now consider the channels being ser-
viced by priority, A having the highest priority. Flits arriving
on A will always be serviced immediately, thus it is guaran-
teed that the maximum link access time is one ﬂit-time, i.e.
the time it would take to ﬁnish a potentially on-going trans-
mission. Since we – at this point – make the simplifying
assumption that there is a large interval between ﬂits arriv-
ing on A, ﬂits arriving on B will wait for A no more than
once before they are serviced. Thus ﬂits on B will be de-
layed a maximum of two ﬂit-times, since they will maxi-
mally wait for an on-going transmission to ﬁnish and for
a transmission on A. Likewise, C will wait a maximum of
three ﬂit-times, etc. As a result, the maximum link access
time is proportional to the priority of the channel. This –
the basic foundation of ALG – is the functionality that is
implemented by the SPQ in Figure 3.
4.2. Admission Control
The discipline explained above requires a large ﬂit inter-
val. This is not always possible to guarantee, in particular
in an asynchronous network with distributed routing con-
trol. Even if a speciﬁc ﬂit interval is provided at the source,
the network may introduce jitter to the data stream, caus-
ing the interval requirement to be invalidated somewhere in-
side the network [19]. This necessitates an admission con-
trol stage, which regulates admission to the SPQ. In Fig-
ure 3, the ALG admission control is illustrated as boxes in
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
front of the SPQ. This is similar to RCSP used in macro net-
works, in that it also implements an admission control stage
and an SPQ. In RCSP however, admission is based on the
local timing of the channels. This is not possible in a fully
asynchronous system, which has no notion of time at all.
The condition, to be implemented by the ALG admis-
sion control for the latency bounds of the SPQ not to be in-
validated, is that a ﬂit on a given (higher priority) VC can
stall a ﬂit in another (lower priority) VC only once. This
can be achieved by looking at the ﬂits waiting in the SPQ
when a ﬂit is contending for access on a given VC. In order
not to invalidate the latency guarantee of ﬂits on lower pri-
ority VCs, all ﬂits that have been waiting while the preced-
ing ﬂit on the given VC was being prioritized in the SPQ,
must be serviced before a new ﬂit is admitted. This is en-
sured by sampling the occupation of the SPQ when a ﬂit is
being transmitted on the link. Once all the ﬂits waiting in
the SPQ at this point of time have been serviced, a new ﬂit
on the same VC can be admitted. Thus ﬂits on VCs of lower
priority will be stalled a maximum of one ﬂit-time by ﬂits
on each higher priority VC. Note that when a given ﬂit is
granted access to the link there will only be ﬂits waiting on
lower priority VCs, since by deﬁnition of the SPQ function-
ality, all ﬂits on higher priority VCs will have already been
serviced.
Figure 4 illustrates ALG by example. It is seen how the
latency guarantee of the B and C queues are being met. The
A queue has too many ﬂits coming in, and is thus being in-
hibited by the admission control. The reason for the burst on
A might be found at an earlier point in the network, due to
A ﬂits being transmitted very quickly (faster than the guar-
anteed latency bound) on previous links.
4.3. Latency and Bandwidth Guarantees of ALG
In this section we will state the latency and BW guar-
antees provided by an ALG connection. The results will be
deducted formally in Section 4.4.
The service guarantees of ALG are characterized by
the priority level of each VC reserved for the connection,
as well as the total number of VCs on each link. Con-
sider a connection for which the VCs with priority lev-
els Q1, Q2, . . . , QX have been reserved on a sequence of
ALG links 1, 2, .., X . Each link implements N VCs. The
links provide a bound of Q1, Q2, . . . , QX ﬂit-times on the
link access time. This is so, under the condition that a ﬂit
interval of tinterval ≥ N + Qmax − 1 ﬂit-times is re-
spected at the source, Qmax being the maximum Q value
on the sequence of VCs. This is the so called interval con-
dition, which will be derived in Section 4.4. The interval
condition is also an access rate guarantee for the connec-
tion, and as such characterizes the BW guarantee of the
connection, in terms of a fraction of the full link capacity:
BWmin = BWmin[Qmax] = BWlink/(N + Qmax − 1).
Figure 4. ALG operational example.
Note that the sum of the BW guarantees, on each of the
N VCs on a given link, results in less than 100% of the to-
tal link capacity. For a link implementing 8 VCs, a maxi-
mum of BWmin[1] + BWmin[2] + .. + BWmin[8] = 73%
of the total link BW can be reserved. This is acceptable
since most networks will need to allocate BW for the sup-
port of BE trafﬁc, alongside the GS connections. The im-
portant point is to note the fact that the latency guarantee
provided by ALG is decoupled from the BW guarantee. In-
creasing N , the number of VCs on a link, the BW guaran-
tee can be made arbitrarily small while still maintaining a
link access time of down to one ﬂit-time. Thus latency criti-
cal connections with low BW needs, e.g. interrupts, are sup-
ported without the need to over-allocate BW.
Although existing (synchronous) GS disciplines for NoC
based on TDM-type BW allocation realize a one ﬂit-time
per hop latency, the initial connection access latency still
causes the total end-to-end latency to be inversely propor-
tional to the BW guarantee. ALG provides instant access
to the GS connection, as long as the interval condition is
met. Also one may note that the forward latency per stage
in an asynchronous network can be made very small, much
less than a clock cycle of a comparable synchronous cir-
cuit. Thus while ALG guarantees a bound on the latency, an
asynchronous NoC also potentially has a much lower min-
imum latency. In this lies a major advantage of implement-
ing NoC using asynchronous versus synchronous circuits.
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
4.4. Proof
In the following we will prove that if the interval con-
dition is respected at the source, a bound on the end-to-
end connection latency can be made. The admission con-
trol might hold a ﬂit, but only if the ﬂit is ahead of its global
schedule, causing the ﬂit intervals observed locally to be
shortened. The proof consists of two parts. In the ﬁrst part
we prove that the ALG discipline works for a single link.
We ﬁrst show that the ﬁrst ﬂit transmitted on a connection
meets its latency requirements, or makes its deadline. Then
we show that any ﬂit following a ﬂit that made its deadline,
and which adheres to the interval condition, will also make
its deadline, and from this we reach the value of the inter-
val. By induction, all ﬂits adhering to the interval condition
make their deadlines. In the second part of the proof, we
prove that for a sequence of ALG links a ﬂit will make its
deadline at each link, if the interval condition is respected
at the source. Thus the end-to-end latency for the connec-
tion is bounded by the sum of the latency guarantees on
each link, regardless of the interval condition being invali-
dated inside the network.
The Single Link Theorem: On an ALG link implement-
ing N VCs, all ﬂits on VC Q will be guaranteed a maximum
link access time of Q ﬂit-times under the ﬂit interval condi-
tion of tinterval ≥ N + Q− 1 ﬂit-times.
Proof: Take a given link implementing N VCs each cor-
responding to a priority level 1, 2, 3, . . . , N in the SPQ. The
ﬁrst ﬂit arriving on a given VC Q ∈ {1, . . . ,N} will be
granted access to the SPQ immediately. In the SPQ it will
wait for a maximum of Q ﬂit-times before being granted
access to the link. Thus it makes its deadline, which is
bounded by a maximum link access time of Q ﬂit-times.
Now consider on Q, a ﬂit A which was granted access to
the SPQ immediately thus making its deadline, and a ﬂit B
following ﬂit A, also on Q. Flit B arrives tinterval ﬂit-times
after ﬂit A. Flit A was waiting 0 ﬂit-times for access to the
SPQ, and a maximum of Q ﬂit-times in the SPQ. A max-
imum of N − Q ﬂits, the number of VCs of lower prior-
ity than Q, were waiting in the SPQ when ﬂit A was granted
access to the link. According to the ALG discipline, these
must all be transmitted before the next ﬂit on Q is granted
access to the SPQ. In a worst case scenario, a maximum of
Q−1 ﬂits, the number of VCs of higher priority than Q, can
take priority over the N−Q ﬂits that must be transmitted be-
fore the admission control of Q admits ﬂit B to the SPQ. The
sum of these partial delays indicate the maximum time that
can pass between one ﬂit on Q and the following being ad-
mitted to the SPQ, Q + (N −Q) + (Q− 1) = N + Q− 1.
This means that if tinterval ≥ N +Q−1 ﬂit-times, then the
ﬂit B will be sure to be granted access to the SPQ immedi-
ately, and waiting a maximum of Q ﬂit-times in the SPQ, it
too will make its deadline.
Thus, under the interval condition, since any ﬂit follow-
ing a ﬂit which made its deadline will itself make its dead-
line, and since the ﬁrst ﬂit makes its deadline, by induction
all ﬂits will make their deadlines. 
We now show that, for a sequence of ALG links, even if
the interval condition is invalidated locally, due to jitter be-
ing introduced in the network, ALG ensures that all ﬂits
make their deadlines at each link. Thus the end-to-end la-
tency bound is the sum of the latency bounds at each link. At
this point, we are still assuming that there is always enough
buffer space in the nodes. In Section 5 we strengthen the
proof, calculating the buffer requirement.
The Sequence of Links Corollary: Under the assump-
tion that there is always enough buffer space, for a given
connection having reserved VCs Q1, Q2, . . . , QX on a se-
quence of X ALG links each implementing N VCs, the
latency bound is the sum of the latency bounds at each
link, under the condition that a ﬂit interval of tinterval ≥
N + Qmax − 1 ﬂit-times is respected at the source. Here,
Qmax is the maximum of {Q1, Q2, . . . , QX}.
Proof: Consider a link on the connection in question, on
which VC Q ∈ {Q1, Q2, . . . , QX} has been reserved, and
a ﬂit A on the connection, which has made its deadline on
that link. Since the ﬂit made its deadline, according to the
proof of the Single Link Theorem above, the admission con-
trol will open for admission to the SPQ of a proceeding ﬂit
B, on the same VC, a maximum of N +Q− 1 ﬂit-times af-
ter ﬂit A was granted access to the SPQ. Since ﬂit A made
its deadline, it was on or ahead of its schedule. If ﬂit B is fur-
ther ahead of its schedule than ﬂit A, it will arrive less than
N + Q − 1 ﬂit-times after ﬂit A was granted access to the
SPQ, and the admission control might not grant it access to
the SPQ immediately. At the latest N +Q− 1 ﬂit-times af-
ter ﬂit A was granted access, ﬂit B will be sure to be granted
access. Their separation at the source was N + Qmax − 1
ﬂit-times. It will be this or less now, so ﬂit B will be at least
as far ahead of its schedule as ﬂit A. Thus it will also make
its deadline. If ﬂit B on the other hand is less ahead of its
schedule than ﬂit A – due to congestion at an earlier stage
in the network – it will arrive more than N + Qmax − 1
ﬂit-times after ﬂit A was granted access to the SPQ. It will
thus be granted access immediately, and make its deadline.
The ﬁrst ﬂit transmitted on a connection makes its dead-
line, since it is not stalled in the admission control of any
link. Since any ﬂit, following a ﬂit making its deadline, will
itself make its deadline, by induction all ﬂits on the connec-
tion will make their deadlines, at all links. 
The minimum sustainable BW follows from this:
The Minimum Bandwidth Corollary: On a given
connection which has reserved VCs Q1, Q2, . . . , QX
on a sequence of X ALG links each providing a to-
tal bandwidth of BWlink , and each implementing
N VCs, the minimum bandwidth sustained will be
BWmin = BWlink/(N + Qmax − 1). Here, Qmax is the
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
Figure 5. Model of a section of a sequence of VCs reserved by a connection.
maximum of {Q1, Q2, . . . , QX}.
Proof: According to the Sequence of Links Corollary, all
ﬂits on an ALG connection, adhering to the interval condi-
tion of tinterval ≥ N +Qmax−1 ﬂit-times, have a bounded
latency. Thus a stream of ﬂits can be transmitted at a ﬂit
rate of at least 1/(N + Qmax − 1) of the total ﬂit rate
supported by a link, without causing congestion. From this
follows directly that the sustainable bandwidth is at least:
BWmin = BWlink/(N + Qmax − 1). 
5. Buffers
In the previous section, we have assumed that ﬂits ﬂow
freely in the network, constrained only by the ALG link ac-
cess scheduling discipline. Since this work targets lossless
networks, in which ﬂits are never dropped, each link must
also implement back pressure ﬂow control, ensuring that a
ﬂit can only be transmitted on a VC if the receiving end
has free buffer space. This introduces an extra layer of ad-
mission control, the VC control shown in Figure 3. The VC
control wraps around the ALG admission control and the
SPQ, only letting ﬂits through if the receiving VC buffer in-
dicates that it has free space. A ﬂit must only be presented to
the ALG admission control if it can move freely to the re-
ceiving end of the link. Otherwise the latency guarantees
provided by the ALG discipline may be invalidated, by ﬂits
stalling on the link. On the other hand, a ﬂit must not be un-
duly delayed by the VC control, so that it is caused to miss
its deadline, again invalidating the ALG latency guarantees.
In this work we employ share-based VC control, which
we have described in [4]. The scheme, illustrated in Fig-
ure 6, uses a single wire per VC to implement non-blocking
access to a shared media, e.g. a link. After admitting a ﬂit
the sharebox locks, not allowing further ﬂits to pass. The ﬂit
passes across the media, to the unsharebox at the far side.
The unsharebox implements a latch, into which the ﬂit is ac-
cepted. When the ﬂit in turn leaves the unsharebox, the un-
lock control wire toggles. This unlocks the sharebox, admit-
ting another ﬂit to the media. As long as the media is dead-
lock free, no ﬂit will stall within it.
As illustrated in Figure 5, we model a connection as a
sequence of ALG links, with a direct circuit between the
input port and the VC buffer reserved for the connection.
As explained in Section 2, this assumption is valid for the
MANGO router architecture in which the GS router imple-
ments output buffers and non-blocking switching [5]. The
VC buffers in the ﬁgure implement unshare- and shareboxes
on their inputs and outputs respectively. Latencies involved
are the link access latency taccess which is the time it takes
for a ﬂit to be granted access to the link, the link forward-
ing latency tlink which is the latency of a ﬂit across the
link, through the GS router and into the next VC buffer,
once link access has been granted, and the unlock latency
tunlock which is the time it takes for the unlock signal to
travel back to the sharebox in the previous VC buffer, indi-
cating that another ﬂit can be granted access to the link. All
latencies apart from the link access latency are constant, as
no congestion occurs.
The end-to-end latency bound of a connection con-
sisting of a sequence of X ALG links, each imple-
menting N VCs, is similar to ttotal introduced in Sec-
tion 3: tend2end =taccess1 + tlink1 + taccess2 + tlink2 +
. . . + taccessX + tlinkX . For simplicity N is herein con-
sidered to be the same on all links. The link access time is
determined by the priority, Q1, Q2, . . . , QX , of the VC re-
served at each link: taccess1 = Q1 ﬂit-times, taccess2 = Q2
ﬂit-times, etc. The maximum Q on the connection, Qmax,
dictates the BW guarantee of the connection, accord-
ing to Section 4.3, since this is the bottleneck of the path:
BWmin = BWlink/(N + Qmax − 1).
We now need to determine the requirements for the
Figure 6. Share-based VC control.
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
Figure 7. Share- and unsharebox schematic.
sharebox to always be unlocked when a ﬂit matures for ac-
cess to the SPQ, i.e. when it is 0 time ahead of its schedule.
If this is so, the ﬂit will be presented to the ALG admis-
sion control, and according to the ALG discipline, it will
make its deadline. In the following, we will prove that un-
der the ﬂit interval condition and the link cycle condition
that tlink + tunlock < N −1 ﬂit-times, a single element VC
buffer is enough to allow the ALG scheduling discipline to
function properly.
The Single Buffer Theorem: Under the ﬂit interval con-
dition tinterval ≥ N+Qmax−1 ﬂit-times and the link cycle
condition tlink + tunlock < N − 1 ﬂit-times, a single ele-
ment ﬂit buffer, for each VC in each node, is enough to en-
sure the validity of the Sequence of Links Corollary.
Proof: As illustrated in Figure 5, consider a section of
a connection having reserved VCs (.., Qi, Qj, ..) on a se-
quence of ALG links, each implementing N VCs. The VC
buffers VCbufi and VCbufj each have buffer space for one
ﬂit. At reset they are empty, thus the ﬁrst ﬂit transmitted on
the connection is not limited by VC control, and will ac-
cording to the ALG discipline make its deadline.
Now consider a ﬂit B following a ﬂit A which is making
its deadlines. Since ﬂit A is making its deadlines, it will gain
access to SPQj latest at a time 0 which corresponds to it be-
ing 0 time ahead of its schedule. At this time VCbufj will
signal VCbufi that it is ready to accept another ﬂit. Thus
VCbufi will open its output for the next ﬂit, ﬂit B, at a time
tunlock later. Flit A must have left SPQi no later than at time
0− tlink , thus adm1 will allow ﬂit B to enter SPQi no later
than 0−tlink+N−1 = N−1−tlink ﬂit-times. If this time
is later than the time VCbufi lets ﬂit B through, then VCbufi
will not be the limiting agent of the ﬂow. The requirement
for the VC control not to be the limiting agent in the system
is thus: N−1−tlink > tunlock => tlink+tunlock < N−1
ﬂit-times. This constitutes the link cycle condition. If the
link cycle condition holds, ﬂit B will arrive at VCbufj at
a time: Q1 + tlink + N − 1 − tlink = Q1 + N − 1 ﬂit-
times, which is less or equal to the required ﬂit interval of
Qmax+N−1 ﬂit-times. Thus ﬂit B made its schedule in ar-
riving at admj .
Under the interval condition of a minimum ﬂit interval of
Qmax+N−1 ﬂit-times at the source, and under the link cy-
cle condition that tlink + tunlock < N − 1 ﬂit-times, any
ﬂit following a ﬂit which made its deadline will also make
its deadline. Since the ﬁrst ﬂit makes its deadline, by induc-
tion all ﬂits on the connection make their deadlines. 
6. Implementation
According to Figure 3, an ALG link consists of three ba-
sic subsystems: VC control, admission control and SPQ.
6.1. VC Control
The functionality of the VC control scheme that we em-
ploy, which we ﬁrst presented in [4], was described in Sec-
tion 5. Figure 7 shows the schematic for our implementa-
tion of the share- and unshareboxes of one VC. The sin-
gle wire unlock signal functions as a 2-phase acknowledge.
The pulse generated by pulse gen must be long enough to
reset the C-element c lock. The output decouple circuit at
the output of the sharebox decouples the shared media from
the VC. Thus a free ﬂow of ﬂits is ensured, regardless of in-
dividual VCs being slow.
6.2. Admission Control
The novelty of ALG scheduling lies in the admission
control stage, which controls the ﬂow of ﬂits, allowing the
SPQ to provide appropriate latency bounds.
Each channel of the admission control implements a sta-
tus register of one bit for each channel of lower priority.
When one or more of the status bits of a given channel are
set, the admission control stops admission of ﬂits, to the
SPQ, on that channel. When a ﬂit on the channel is granted
access to the link, the status bits are set according to a snap-
shot of the occupancy of the SPQ. The occupancy indicates
which channels that have ﬂits waiting in the SPQ, while the
given channel is granted access to the link (being priori-
tized). The status bits are subsequently reset as these wait-
ing ﬂits are granted access to the link. When all have been
transmitted, the status bits are all clear, and the admission
control admits another ﬂit on the given channel.
Figure 8 shows the schematic of the admission control
for channel n. The status bit registers [n − 1..0], one for
each lower priority channel, are implemented as RS-latches.
Consider channel n as being the highest priority channel
contending for access to the link at a given time. The SPQ
generates the occupancy vector, and its value is stable while
the acknowledge of n is high. The set inputs of a status bit
registers is a logical AND of n’s acknowledge and the oc-
cupancy vector. This way the appropriate status bits are set
according to the occupancy of the SPQ when the channel
is granted access to the link (indicated by its acknowledge
going high). The reset inputs of the status bit registers are
simply connected to the acknowledge signals of the corre-
sponding channels. When a channel is granted access to the
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
Figure 8. Admission control schematic.
link, its acknowledge goes high, and accordingly the sta-
tus bit corresponding to this channel, in the admission con-
trol of each of the higher priority channels, is reset. When
all status bits are low, the input request is allowed to prop-
agate to the output. Since the local acknowledge causes the
status bits to be set, a C-element – rather than an AND-gate
– is needed in the request path. This ensures that the out-
put request is not lowered until the input request is lowered.
Note that set and reset of the status bit registers are mu-
tually exclusive, since only one channel can gain access to
the link at a given time.
6.3. Static Priority Queue
The SPQ prioritizes the VC queues. At any time, the in-
put with the highest priority is serviced before any queue of
lower priority. Our SPQ is based on the SPQ in [11], which
is based on ideas ﬁrst presented in [6].
7. Results
We have implemented a 16-bit, 8-VC ALG link us-
ing commercially available 0.12 µm CMOS standard cells.
Applying typical timing parameters, the design simulates
at a speed of 702 MDI/s, corresponding to a ﬂit-time of
1.42 ns. The shared, physical link implements a 3-stage
pipeline. The cell area, i.e. prelayout, of the entire link was
0.012 mm2, the core of the ALG scheduler (the admission
control and the SPQ) taking only 0.004 mm2. This is less
than the area used in the simple fair BW allocation scheme
presented in [4], implemented using an arbiter tree. This
shows that the beneﬁts of ALG are not at all costly in terms
of area. But due to the priority arbiter there is a slight per-
formance degradation (702 MDI/s as opposed to 795 MDI/s
in an improved version of the solution presented in [4]).
A test setup emulating connections on a sequence of
three ALG links was simulated. Two connections were ob-
served; a fast connection reserving high priority VCs, and
a slow connection reserving low priority VCs. The laten-
cies of ﬂits were recorded while random background trafﬁc
was induced on all other VCs. Figures 9 shows the distri-
bution of ﬂit latencies for different network loads, recorded
Figure 9. Flit latency distribution versus net-
work load: (a) slow path, (b) fast path.
over 10000 ﬂits. It is seen how even at 100% network load,
the ﬂits on the connections make their deadlines. As the
network load is increased, the latency distribution graph
pushes up against the latency bound, but never crosses it.
Forward latency bounds from 3.6 ns/hop upwards, in in-
crements of 1.42 ns (one ﬂit-time), are obtainable. This in-
cludes the ALG access latency and the constant forward la-
tency across the link (sharebox, merge, pipeline, split and
unsharebox – app. 2.2 ns). The BW guarantee on the fast
connection was 1/8 ∗ 702 MDI/s = 88 MDI/s, while it
was 1/15 ∗ 702 MDI/s = 47 MDI/s on the slow one.
Table 1 compares the guarantees of ALG to those of ex-
isting scheduling schemes used in NoC. In the table, N is
the number of VCs on each link and h is the number of
hops spanned by a given connection. TDM is used in syn-
chronous NoCs, and provides bounds on the connection la-
tency down to N + h, given that some sort of end-to-end
ﬂowcontrol is implemented. If not, the latency bound is re-
duced to the level of asynchronous fair-share, i.e. (N+1)∗h.
The table shows that ALG provides far better bounds on la-
tency, and that it is generally more ﬂexible in terms of vari-
ety of types of connections that can be instantiated.
8. Conclusion
We have presented a new ﬂit scheduling discipline
called Asynchronous Latency Guarantee (ALG) schedul-
ing. ALG allows the reservation of GS connections in both
synchronous and asynchronous networks, for which la-
tency and bandwidth guarantees that are not inversely de-
pendent on each other can be provided. It thus overcomes
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
Synchronous Asynchronous
TDM fair-share ALG fast path ALG slow path
tadmit N 0 0 0
taccess 1 N 1 N
tlink 1 1 1 1
Latency N + h (N + 1) ∗ h h (N + 1) ∗ h
Bandwidth 1/N 1/N 1/N 1/(2N − 1)
Table 1. Latency and bandwidth guarantees of different GS schemes.
the drawbacks of present solutions based on fair-ﬂuid queu-
ing (FFQ): time division multiplexing (TDM) in syn-
chronous and fair-share in asynchronous networks. A
fully asynchronous 16-bit, 8-VC ALG link was imple-
mented using commercial 0.12 µm CMOS standard cells.
Simulations show the correct functionality of the cir-
cuit, in accordance with the theoretical ground work.
The results clearly indicate that the ALG discipline is a
valid scheme for providing hard latency and BW guarantees
in shared interconnection networks.
Acknowledgment
The authors thank Tiberiu Chelcea for a standard cell
mutex, Thomas Bolander for reviewing the proofs and sev-
eral anonymous reviewers for helpful comments.
References
[1] International technology roadmap for semiconductors
(ITRS) 2001. Technical report, International Technol-
ogy Roadmap for Semiconductors, 2001.
[2] Open Core Protocol Speciﬁcation, Release 2.0.
www.ocpip.org, 2003.
[3] L. Benini and G. D. Micheli. Networks on chips: A new SoC
paradigm. IEEE Computer, 35(1):70 – 78, January 2002.
[4] T. Bjerregaard and J. Sparsø. Virtual channel designs for
guaranteeing bandwidth in asynchronous network-on-chip.
In Proceedings of the IEEE Norchip Conference, 2004.
[5] T. Bjerregaard and J. Sparsø. A router architecture for
connection-oriented service guarantees in the MANGO
clockless network-on-chip. In Proceedings of Design, Au-
tomation and Testing in Europe Conference (DATE05), 2005.
[6] A. Bystrov, D. J. Kinniment, and A. Yakovlev. Priority ar-
biters. In Proceedings of the Sixth International Symposium
on Advanced Research in Asynchronous Circuits and Sys-
tems, pages 128–137. IEEE Comput. Soc., 2000.
[7] D. Chapiro. Globally-Asynchronous Locally-Synchronous
Systems. PhD thesis, Stanford University, 1984.
[8] W. J. Dally and B. Towles. Route packets, not wires: On-
chip interconnection networks. In Proceedings of the 38th
Design Automation Conference, pages 684–689, June 2001.
[9] J. Dielissen, A. Radulescu, K. Goossens, and E. Rijpkema.
Concepts and implementation of the phillips network-on-
chip. In Proceedings of the IPSOC’03, 2003.
[10] J. Duato, S. Yalamanchili, and L. Ni. Interconnection Net-
works - an Engineering Approach, chapter 9, pages 475–558.
Morgan Kaufmann, 2003.
[11] T. Felicijan, J. Bainbridge, and S. Furber. An asynchronous
low latency arbiter for quality of service (QoS) applications.
In Proceedings of the 15th International Conference on Mi-
croelectronics, pages 123–126. IEEE, December 2003.
[12] T. Felicijan and S. B. Furber. An asynchronous on-chip net-
work router with quality-of-service (QoS) support. In Pro-
ceedings IEEE International SOC Conference, pages 274–
277. IEEE, 2004.
[13] A. Jantsch and H. Tenhunen. Networks on Chip. Kluwer
Academic Publishers, 2003.
[14] M. MillBerg, E. Nilsson, R. Thid, and A. Jantsch. Guaran-
teed bandwidth using looped containers in temporally dis-
joint networks within the Nostrum network on chip. In Pro-
ceedings of the Design, Automation and Testing in Europe
Conference (DATE’04). IEEE, 2004.
[15] R. Mullins, A. West, and S. Moore. Low-latency virtual-
channel routers for on-chip networks. In Proceedings
of the International Symposium on Computer Architecture
(ISCA2004). IEEE, 2004.
[16] J. Muttersbach, T. Villiger, K. Kaeslin, N. Felber, and
W. Fichtner. Globally-Asynchronous Locally-Synchronous
Architectures to Simplify the Design of On-CHIP Systems.
In Proc. 12th International ASIC/SOC Conference, pages
317–321, Sept. 1999.
[17] L.-S. Peh and W. J. Dally. Flit-reservation ﬂow con-
trol. In Proceedings of HPCA: 6th International Symposium
on High-Performance Computer Architecutre, pages 73–84.
IEEE Computer Soc., 1999.
[18] E. Rijpkema, K. G. W. Goossens, A. Radulescu, J. Dielissen,
J. V. Meerbergen, P. Wielage, and E. Waterlander. Trade offs
in the design of a router with both guaranteed and best-effort
services for networks on chip. In Proceedings of the De-
sign, Automation and Test in Europe Conference (DATE’03),
pages 350–355. IEEE, 2003.
[19] H. Zhang. Service disciplines for guaranteed performance
service in packet-switching networks. Proceedings of the
IEEE, 83:1374–1396, 1995.
[20] H. Zhang and D. Ferrari. Rate-controlled static-priority
queueing. In Proceedings of the Twelfth Annual Joint Con-
ference of the IEEE Computer and Communications Soci-
eties, INFOCOM’93, pages 227–236. IEEE Comnput. Soc.
Press, 1993.
Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC’05) 
1522-8681/05 $20.00 © 2005 IEEE 
