Evaluation of Multiple-Valued Packet Multiplexing Scheme for
Network-on-Chip Architecture
Haque Mohammad Munirul†, Tomoaki Hasegawa† and Michitaka Kameyama‡
Graduate School of Information Sciences, Tohoku University
Aoba-yama 6-6-05, Aoba-ku, Sendai 980-8579, Japan
†{topusumi,hase}@kameyama.ecei.tohoku.ac.jp, ‡kameyama@ecei.tohoku.ac.jp
Abstract
This paper presents an evaluation of a multiple-valued packet
multiplexing scheme for a Network-on-Chip (NoC) architecture. In the
NoC architecture, data is transferred from one Processing Element (PE)
to another through routers in the form of a packet. A router suitable
for both binary and multiple-valued packets is constructed using
Multiple-Valued Source-Coupled Logic circuits. A packet is composed of
flag, destination PE address and data fields. In the NoC architecture,
packets are generated by microprogram control. In the proposed scheme,
two binary packets are multiplexed if their destination PE addresses
are the same. Based on address matching, packets are transferred from
a source PE to a destination PE autonomously. As a result, the total
number of packets can be reduced. The router is designed using a
0.18 μm CMOS design rule. HSPICE simulation results show that the
delay of the router is small enough for high-speed packet transfer.
The reduction of microprogram control storage is remarkable in the
proposed scheme, because the data transfer is done autonomously. The
advantage is evaluated by a simple analysis and compared with a
conventional pipelined bus architecture.
1 Introduction
System-on-Chip (SoC) platforms, integrating a large number of
computational logic and storage blocks on a single chip, already
exist [1]. Because of the on-chip physical interconnection complexity
of such a complex system, the on-chip bus architecture has evolved
into a network architecture, namely Network-on-Chip (NoC) [2].
Researchers are mainly focusing on the NoC to meet the distinctive
challenges of providing functionally correct, reliable operation of
the interacting SoC macro modules.
As the components, i.e., the macro modules of the SoC, are also
becoming complex, the interconnection topology between Processing
Elements (PEs) is likely to face similar challenges. In this paper,
we present a multiple-valued packet multiplexing scheme for a NoC
architecture consisting of a micronetwork. It has double transmission
lines and routers, where each PE is connected to a router. Two binary
packets are multiplexed into a single multiple-valued (MV) packet if
their destination PE addresses are the same. The multiplexed packet is
transferred between the PEs using a single transmission line. Thus,
the total number of packets in the micronetwork can be reduced and the
throughput can be increased. A simple routing protocol is used to keep
the router circuit simple. The router is designed using
Multiple-Valued Source-Coupled Logic (MVSCL) circuits [3].
This paper describes the VLSI implementation and evaluation of the
router using a 0.18 μm CMOS standard design rule. The contribution of
the proposed scheme to reducing the size of the microprogram control
storage is discussed using this evaluation result. The condition for
area reduction in comparison with a conventional pipelined bus
architecture is derived mathematically, and different cases are
analyzed. Comparison results show that the size of the microprogram
control storage can be reduced remarkably in all the cases using the
proposed scheme when a large number of PEs are used.
2 NoC architecture
In the proposed NoC architecture [4] the micronetwork
is constructed using double transmission lines (left→right
and right→left) and routers, as shown in Fig. 1. Each
router is directly connected to a Processing Element (PE).
To achieve a simple router design, we introduce a linearly
ordered node number according to the layout distance for
each node on the micronetwork. Each PE and the directly
connected router have the same address. Data is transferred
between the PEs in the form of a packet.
Proceedings of the 36th International Symposium on Multiple-Valued Logic (ISMVL’06)
0-7695-2532-6/06 $20.00 © 2006 IEEE
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on February 5, 2009 at 21:53 from IEEE Xplore.  Restrictions apply.

Figure 1. NoC architecture (micronetwork of routers R1, R2, ..., RN,
each connected to a processing element PE1, PE2, ..., PEN; P = packet,
R = router, PE = processing element; inset: CDFG with nodes A, B, C
and edges)

Figure 2. Block diagram of the microprogram control unit (the
Microprogram Memory (MM) feeds a control circuit that generates
Control Signal 1 to Control Signal N for PE1 to PEN; in the NoC
architecture the MM stores header information, i.e. a 2-bit flag and
a log2 N-bit address; in the pipelined bus architecture it stores the
switch control bits for the control signals C1 to C4)

The processing sequence is given by a Control/Data
Flow Graph (CDFG) as shown in Fig. 1, where each node of the CDFG
corresponds to an arithmetic operation and each edge corresponds to a
node-to-node packet transfer. For a given CDFG, let us assume that
scheduling and allocation are done in advance in order to avoid packet
collisions. Each node of the CDFG is allocated to a PE. Node-to-node,
i.e., PE-to-PE, packet transfer is done through the micronetwork.
In a source PE, packets are generated by microprogram
control. The microprogram control unit is composed of
a Microprogram Memory (MM) (for control signal stor-
age) and a control circuit (for generating control signals)
as shown in Fig. 2. The packet information is stored in the
MM. Once a PE sends a packet in the micronetwork, the
direction of the packet transfer is determined autonomously
by magnitude comparison of the addresses. If the addresses
of the packet and the router match, then the packet is re-
ceived by the corresponding PE or else the packet is trans-
ferred to an adjacent router in a pipelined manner.
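The routing rule described above can be illustrated with a minimal Python sketch. It assumes that node numbers increase from left to right and that a packet moves one router per pipeline step; the function names `route` and `path` are hypothetical and are not part of the router design.

```python
def route(router_addr: int, packet_addr: int) -> str:
    """Decide what a router does with a packet (illustrative only)."""
    if packet_addr == router_addr:
        return "deliver"   # addresses match: hand the packet to the attached PE
    if packet_addr > router_addr:
        return "right"     # assumed: larger node numbers lie to the right
    return "left"


def path(src: int, dst: int) -> list[int]:
    """Routers visited from src to dst, one hop per pipeline step."""
    hops = [src]
    while route(hops[-1], dst) != "deliver":
        step = 1 if route(hops[-1], dst) == "right" else -1
        hops.append(hops[-1] + step)
    return hops


print(path(1, 4))  # [1, 2, 3, 4]
print(path(4, 2))  # [4, 3, 2]
```

No per-step control word is consulted in this sketch: once the destination address is in the header, every decision is local to the router, which is the property the MM area comparison relies on.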
Figure 3. Packet format (header: 2-bit flag and log2 N-bit address,
followed by the data field)

Flag information:
Value   Packet type
0       Invalid Packet
1       Component Packet1
2       Component Packet2
3       Multiplexed Packet
3 Multiple-valued packet multiplexing
In order to avoid packet collisions, two packets are scheduled such
that they do not reach a router simultaneously. However, if the
destination PE addresses of the packets are the same, they are
multiplexed in the router. The packet that reaches the router earlier
is scheduled to wait inside the router for the other packet. The
multiplexing is done by linear summation of the packets, and as a
result the multiplexed packet becomes a multiple-valued (MV) one. The
MV packet is transferred using only one transmission line. In
current-mode logic, linear summation is implemented just by wiring,
without any active devices. However, the linear summation of the two
packets must preserve the arithmetic and logic information needed
in the destination PE.
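As a rough illustration of why linear summation can preserve arithmetic information, the following Python sketch sums two binary words wire by wire. It assumes equal-width data fields and unit signal levels, which the text does not specify; the helper names are hypothetical. Each multiplexed digit takes a value in {0, 1, 2}, and reading the digits with binary weights gives the arithmetic sum of the two operands.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Little-endian list of binary digits (0 or 1)."""
    return [(value >> i) & 1 for i in range(width)]


def multiplex(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Wire-by-wire linear summation: each digit becomes 0, 1 or 2."""
    return [a + b for a, b in zip(a_bits, b_bits)]


def weighted_value(digits: list[int]) -> int:
    """Interpret the digits with binary weights 2**i."""
    return sum(d << i for i, d in enumerate(digits))


mv = multiplex(to_bits(11, 4), to_bits(6, 4))
print(mv)                  # [1, 2, 1, 1]
print(weighted_value(mv))  # 17, i.e. 11 + 6
```

This only shows that a sum such as the additions in an FIR filter survives the summation; the individual operands cannot in general be recovered from the multiplexed word.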
3.1 Packet format
The packet consists of data and header fields. The header contains
flag and destination address information. The flag determines the type
of the packet as shown in Fig. 3. The Component Packet1, having the
flag value 1, is scheduled to wait inside the router to be
multiplexed. The Component Packet2, having the flag value 2, is
scheduled to be transferred to the router where the Component Packet1
is waiting. If the packets are multiplexed, the flag value is changed
to 3. The simple packet format leads to a simple router design. A
total of (log2 N + 2) bits are required to generate the header field,
where N is the total number of PEs.
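The header layout can be sketched in Python as follows. The flag values come from Fig. 3; the field order (flag above address), the zero-based address range and the rounding up of log2 N for values of N that are not powers of two are assumptions made for this illustration, and the names `Flag`, `header_bits`, `pack_header` and `unpack_header` are hypothetical.

```python
from enum import IntEnum


class Flag(IntEnum):
    INVALID = 0
    COMPONENT_1 = 1   # waits inside the router to be multiplexed
    COMPONENT_2 = 2   # travels to the router where Component Packet1 waits
    MULTIPLEXED = 3


def header_bits(n_pes: int) -> int:
    """Header width: address bits (log2 N, rounded up) plus 2 flag bits."""
    return (n_pes - 1).bit_length() + 2


def pack_header(flag: Flag, address: int, n_pes: int) -> int:
    """Place the flag above the address bits (field order is assumed)."""
    addr_bits = header_bits(n_pes) - 2
    if not 0 <= address < (1 << addr_bits):
        raise ValueError("address does not fit in the address field")
    return (int(flag) << addr_bits) | address


def unpack_header(header: int, n_pes: int) -> tuple[Flag, int]:
    """Split a header word back into (flag, address)."""
    addr_bits = header_bits(n_pes) - 2
    return Flag(header >> addr_bits), header & ((1 << addr_bits) - 1)


print(header_bits(16))  # 6
```

For N = 16 this gives a 6-bit header, 2 bits for the flag and 4 bits for the address.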
3.2 Example of packet multiplexing for FIR filter
Figure 4. Example of parallel processing based on packet multiplexing.
(a) CDFG of the FIR filter, with the operations O1 and O2. (b) FIR
CDFG mapping example on PE1 to PE4 over Step1 to Step12: no
multiplexing (left; transmission lines must be increased) and
multiplexing (right; packet multiplexing).

As an example of parallel processing, let us consider a parallel FIR
filter operation. The CDFG of an FIR filter is shown in Fig. 4(a).
Let us consider two operations, O1 and O2, that are to be done in
parallel, where O1 and O2 are denoted by white nodes and black nodes,
respectively. We assume that the PEs are arranged as shown in
Fig. 4(b) and that each operation is scheduled to be
performed within 10 steps. The left side of Fig. 4(b) shows a mapping
example of the FIR filter onto the PEs under the time constraint for
a non-multiplexing scheme. The dotted circles in the left figure
represent simultaneous data transfers, which are necessary to satisfy
the time constraint. However, this kind of mapping is impossible
without increasing the number of transmission lines. Any increase in
the timing constraint or in the number of transmission lines leads
directly to an increase in the MM area. On the other hand, the right
side of Fig. 4(b) shows a mapping example of the FIR filter onto the
PEs under the same time constraint for the packet multiplexing scheme.
The dotted circles in the right figure show that two packets having
the same destination are multiplexed, and thus the timing constraint
is not violated. As no extra transmission line is required, the MM
area becomes smaller than that of a non-multiplexing scheme.
3.3 Router implementation
A router suitable for both binary and MV packets is shown in the
block diagram of Fig. 5. The circuit designs of the Multiple-Valued
Latch (ML), the Multiple-Valued Pass Switch (MPS) and the Functional
Pass Switch (FPS) of Fig. 5 are described in [3]. A comparison with a
binary router based on HSPICE simulation is also discussed in [3].
The router is implemented using a 0.18 μm CMOS standard design rule.
The packet header size is 6 bits (2 bits for the flag field and
4 bits for the address field). Figure 6 shows the layout of the
router. The router area is 95 μm × 130 μm, which is used for
evaluation in the following section. As a packet is composed of data
and header fields, both data-related and header-related circuits are
provided in the router. The Header-Related Circuits (HRC) are the
address and flag comparators, latches and switches. The designed
layout shows that the HRC area is 102M, where M is the area of a
binary SRAM. Figure 7 shows the simulated waveforms of the router. In
the figure, the router input is one bit of the address and the output
is the address comparison result.

Figure 5. Router circuit block diagram (router i exchanges packets
with router i-1, router i+1 and its own PE under the control of two
controllers; AC = address comparator, FC = flag comparator,
ML = multiple-valued latch, MPS = multiple-valued pass switch,
FPS = functional pass switch; the flag comparators are labelled with
the conditions f > 0.5, f > 1.5, f < 1.5, f = 2 and f > 2.5)
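The flag comparators in Fig. 5 are labelled with conditions such as f > 0.5, f > 1.5 and f > 2.5. As a rough illustration only, and assuming that these thresholds separate the four flag values 0 to 3 on a normalized signal scale where the logic value k sits at level k, a threshold decoder could be sketched in Python as follows. The name `decode_flag` is hypothetical and the sketch does not model the MVSCL circuit itself.

```python
FLAG_THRESHOLDS = (0.5, 1.5, 2.5)  # taken from the labels in Fig. 5


def decode_flag(level: float) -> int:
    """Count how many thresholds the level exceeds, giving a flag in 0..3."""
    return sum(level > t for t in FLAG_THRESHOLDS)


for level in (0.1, 1.0, 2.2, 3.0):
    print(level, decode_flag(level))
```

Placing the thresholds midway between adjacent levels leaves a margin of half a level on either side of each nominal value.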
Figure 6. Layout of the router (flag: 2 bits; address: 4 bits; data:
16 bits; router size: 95 μm × 130 μm; the HRC region is marked)

Figure 7. Simulated waveforms of the router (router input and router
output; voltage axis from 0.8 V to 1.5 V, time axis from 10 ns to
11.5 ns; annotation: 1.02 ns)

4 Comparison and advantages
This section discusses the advantage of the NoC architecture based on
multiple-valued packet multiplexing over a pipelined bus architecture,
using the above evaluation result.
4.1 Pipelined bus architecture
Conventionally, a pipelined bus architecture is used to
implement a parallel VLSI processor. Let us consider the
architecture shown in Fig. 8. Each PE is directly connected to a
Switch Block (SB), which consists of switches and pipeline latches.
The SBs are arranged in a linear array. There are two transmission
lines, which are used for left→right and right→left data
transmission. Data is transferred according to microprogram control
in a bit-parallel manner. All the pipeline latches are synchronized
with a system clock. Scheduling and allocation are determined in
advance so that data collisions can be avoided.
4.2 Area comparison between the router
and the SB
The block diagram of the SB, composed of 4 switches (SW1 to SW4) and
2 pipeline latches, is shown in Fig. 8. The switches are controlled
by the control signals C1, C2, C3 and C4, which are stored in the
Microprogram Memory (MM) as shown in Fig. 2.
Figure 8. Pipelined bus architecture (switch blocks SB1, SB2, ..., SBN
on the pipelined bus, each connected to a PE; each SB contains the
switches SW1 to SW4, driven by the switch control signals C1 to C4,
and two latches)

In the NoC architecture, a router is used instead of an SB. The
router has switches, which are controlled by the address and flag
comparators. The router is larger than an SB because of the HRC. If
the total number of PEs increases, the HRC area also increases. For a
total of N PEs, the total area of the header-related circuits,
$A_{\mathrm{HRC\,total}}$, is given by the following equation:
$$A_{\mathrm{HRC\,total}} = N \times \log_4 N \times 102M \qquad (1)$$
4.3 MM area comparison
Let the total area of the MM in the pipelined bus architecture be
$A_{\mathrm{data}}$, and let the total area of the MM in the NoC
architecture with packet multiplexing be $A_{\mathrm{PM}}$. Overall
area reduction is possible if the following condition is satisfied:
$$A_{\mathrm{HRC\,total}} + A_{\mathrm{PM}} \le A_{\mathrm{data}} \qquad (2)$$
Figure 9 shows a single packet transfer between the PEs. In step 1, a
single packet from PE1 is transferred to the router R1. Initialization
timing control is required in this step. From the next step onwards,
no further timing control is required. Once the packet is in the
micronetwork, the direction of the packet transfer is controlled by
magnitude comparison of the addresses. The packet is transmitted
towards the left or the right in a pipelined manner. A
$(\log_2 N + 2) \times N$-bit control signal must be stored in the MM
for this purpose. We assume that a 1-bit control signal is stored in
a binary SRAM. If the area of the SRAM is denoted by M,
$A_{\mathrm{PM}}$ is given by the following equation:
$$A_{\mathrm{PM}} = MN(\log_2 N + 2) \qquad (3)$$
Figure 9. Single packet transfer (steps 1 to N over PE1/R1 to PEN/RN;
initialization timing control is required in step 1, and no timing
control is required in the following steps)

On the other hand, timing control is required for every single step
in the pipelined bus architecture. A 4-bit control signal must be
stored in the MM for each PE to control the 4 switches of Fig. 8. In
the worst case, data is transferred from PE1 to PEN in a pipelined
manner, so 4N control bits are needed in each of N steps. Thus,
$A_{\mathrm{data}}$ is given by the following equation:
$$A_{\mathrm{data}} = 4MN^2 \qquad (4)$$
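To get a feel for the magnitudes involved, Eqs. (1), (3) and (4) can be evaluated numerically. The following Python sketch expresses all areas in units of M (i.e. M = 1 by default); the function names and the sample values of N are chosen here for illustration and do not come from the evaluation.

```python
import math

HRC_AREA_FACTOR = 102  # HRC area from the layout evaluation, in units of M


def a_hrc_total(n: int, m: float = 1.0) -> float:
    """Eq. (1): N * log4(N) * 102M."""
    return n * (math.log2(n) / 2) * HRC_AREA_FACTOR * m


def a_pm(n: int, m: float = 1.0) -> float:
    """Eq. (3): M * N * (log2(N) + 2)."""
    return m * n * (math.log2(n) + 2)


def a_data(n: int, m: float = 1.0) -> float:
    """Eq. (4): 4 * M * N**2."""
    return 4 * m * n ** 2


for n in (16, 64, 256):
    print(n, a_hrc_total(n) + a_pm(n), a_data(n))
```

The router-side terms grow as N log N while the pipelined bus term grows as N squared, so the comparison in Eq. (2) favours the NoC architecture only once N is sufficiently large.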
4.4 Effect of packet multiplexing
In the packet multiplexing scheme, two binary packets can be
multiplexed if their destination PE addresses are the same. Let x
denote the ratio of such packets to all the packets in the
micronetwork, where 0 ≤ x ≤ 1. Multiplexing reduces the number of
packets in the micronetwork and increases the throughput. In other
words, the provided transmission lines can carry more packets. In the
best case, when x = 1, the capacity of the transmission lines is
doubled.
4.5 Condition for area reduction
Let P and D denote the throughputs of the micronetwork in the NoC
architecture and of the pipelined bus, respectively. In the Pipelined
Bus (PB) architecture, the number of data on the bus is D. If there
is no packet to be multiplexed in the micronetwork (x = 0), then P is
equal to D. In the best case, when x = 1, P is twice D. Thus, the
relation between P and D is given by the following equation:
$$P = (1 + x) \times D \qquad (5)$$
Under normalized throughput, the condition for area reduction of the
MM is given using Eqs. (1) to (5) as follows:
$$\Rightarrow\; 102MN\log_4 N + (\log_2 N + 2)MN \le 4MN^2 \times \frac{P}{D}$$
$$\Rightarrow\; 4(1+x)MN - 104M\log_4 N - 2M \ge 0, \quad N \ge 2$$
The second line follows by substituting P/D = 1 + x from Eq. (5),
using log2 N = 2 log4 N, and dividing both sides by N.

Figure 10. MM area comparisons (vertical axis: Area Proposed / Area
PB, from 0 to 1; horizontal axis: N = 125, 195, 255; series: x = 0
(no multiplexed packet), x = 0.5, x = 0.75 and x = 1 (all are
multiplexed packets); N = number of PEs, PB = pipelined bus)
Figure 10 shows the MM area comparisons for different values of x. It
is clear from the figure that significant area reduction is possible
using the proposed scheme, especially when N is large. Even if there
is no packet to be multiplexed in the micronetwork (x = 0),
significant area reduction is still possible, as shown in the figure.
Table 1 shows the minimum number of PEs in each case in order to
satisfy the above condition.
Table 1: Minimum number of the PEs to satisfy the area reduction condition.
Case     x      Minimum number of PEs
Case1    0.25   70
Case2    0.50   55
Case3    0.75   45
Case4    1.00   40
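The area reduction condition 4(1+x)MN - 104M log4 N - 2M ≥ 0 is easy to check numerically. The following Python sketch evaluates its left-hand side divided by M and confirms that each (x, N) pair listed in Table 1 satisfies it. The function name `area_margin` is hypothetical, and the sketch makes no attempt to reproduce how the minima in the table were obtained.

```python
import math


def area_margin(n: int, x: float) -> float:
    """Left-hand side of the condition divided by M:
    4(1 + x)N - 104*log4(N) - 2. Non-negative means the condition holds."""
    return 4 * (1 + x) * n - 104 * (math.log2(n) / 2) - 2


# (x, N) pairs as listed in Table 1
for x, n in [(0.25, 70), (0.50, 55), (0.75, 45), (1.00, 40)]:
    print(x, n, area_margin(n, x) >= 0)
```

For a small network such as N = 16 with x = 0 the margin is negative, which agrees with the observation that the scheme pays off when N is large.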
5 Conclusion
Network-on-Chip (NoC) is emerging as a viable interconnection
architecture for SoC platforms. Our target is to extend the NoC
concept to a lower level of granularity, such as the functional unit
level. We believe that in the near future the interconnection
complexity within the functional units will become a serious
bottleneck. Therefore, as a starting point, we consider a very simple
intra-chip micronetwork, where the PEs are horizontally arranged in a
linear array. In this paper, the evaluation of a multiple-valued
packet multiplexing scheme for the proposed NoC architecture is
presented. The proposed scheme has a significant advantage in
reducing the area of the microprogram control storage and thereby
increasing parallelism. In the future, we shall extend the proposed
concept to more complex micronetworks such as a mesh array or an
octagon. We believe that in the coming billion-transistor era, the
packet multiplexing scheme will open up a new paradigm in SoC design.
References
[1] P. Magarshack and P. G. Paulin, “System-on-Chip beyond the
Nanometer Wall”, Proc. Design Automation Conf. (DAC), pp. 419-423
(2003).
[2] L. Benini and G. De Micheli, “Networks on Chips: A New SoC
Paradigm”, IEEE Computer, vol. 35, no. 1, pp. 70-80 (2002).
[3] T. Hasegawa, Y. Homma and M. Kameyama, “Multiple-Valued VLSI
Architecture for Intra-Chip Packet Data Transfer”, Proc. of the 35th
IEEE Intl. Symp. on Multiple-Valued Logic, pp. 114-119 (2005).
[4] Y. Homma, M. Kameyama, Y. Fujioka and N. Tomabechi, “VLSI
Architecture Based on Packet Data Transfer Scheme and Its
Application”, 2005 IEEE International Symposium on Circuits and
Systems, pp. 1786-1789 (2005).
