High-Performance, Scalable Optical Network-On-Chip Architectures by Tan, Xianfang
UNLV Theses, Dissertations, Professional Papers, and Capstones
8-1-2013
High-Performance, Scalable Optical Network-On-Chip Architectures
Xianfang Tan
University of Nevada, Las Vegas, yanshu08@gmail.com
Part of the Computer and Systems Architecture Commons, Electrical and Computer Engineering
Commons, and the Optics Commons
Repository Citation
Tan, Xianfang, "High-Performance, Scalable Optical Network-On-Chip Architectures" (2013). UNLV Theses, Dissertations, Professional
Papers, and Capstones. 1955.
http://digitalscholarship.unlv.edu/thesesdissertations/1955
  
HIGH-PERFORMANCE, SCALABLE OPTICAL  
NETWORK-ON-CHIP ARCHITECTURES 
 
By 
 
 
Xianfang Tan 
 
 
Bachelor of Science in Electrical Engineering 
Yanshan University, China 
1997 
 
 
Master of Science in Electrical Engineering 
Tianjin University, China 
2003 
 
 
Master of Science in Mechanical Engineering 
University of Nevada, Las Vegas (UNLV) 
2005 
 
 
A dissertation submitted in partial fulfillment of the requirements for the 
 
 
Doctor of Philosophy in Electrical Engineering 
 
 
Department of Electrical and Computer Engineering 
Howard R. Hughes College of Engineering 
Graduate College 
 
 
University of Nevada, Las Vegas 
August 2013 
 
 
 
 
THE GRADUATE COLLEGE 
 
 
We recommend the dissertation prepared under our supervision by 
 
Xianfang Tan  
 
entitled 
 
High-Performance, Scalable Optical Network-On-Chip Architectures 
 
 
be accepted in partial fulfillment of the requirements for the degree of 
 
 
 
Doctor of Philosophy in Electrical Engineering 
Department of Electrical and Computer Engineering 
 
Mei Yang, Ph.D., Committee Chair 
 
Yingtao Jiang, Ph.D., Committee Member 
 
Shahram Latifi, Ph.D., Committee Member 
 
Amei Amei, Ph.D., Graduate College Representative 
 
Kathryn Hausbeck Korgan, Ph.D., Interim Dean of the Graduate College 
 
August 2013 
 
ABSTRACT 
 
By 
 
Xianfang Tan 
 
Dr. Mei Yang, Examination Committee Chair 
 
Associate Professor of Electrical and Computer Engineering 
University of Nevada, Las Vegas 
 
The rapid advance of technology enables a large number of processing cores to be 
integrated into a single chip, which is called a Chip Multiprocessor (CMP) or a 
Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, 
which is the communication infrastructure for these processing cores, plays a central role 
in a many-core system. With the continuously increasing complexity of many-core 
systems, traditional metallic wired electronic networks-on-chip (NoC) have become a 
bottleneck because of the unbearable latency in data transmission and extremely high 
on-chip energy consumption. Optical networks-on-chip (ONoC) have been proposed as a 
promising alternative to electronic NoC, with the benefits of optical signaling 
such as extremely high bandwidth, negligible latency, and low power 
consumption. This dissertation focuses on the design of high-performance and scalable 
ONoC architectures, and its contributions are highlighted as follows: 
• A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router 
(GWOR) is proposed, together with a method for developing GWORs of any size. 
GWOR is a scalable non-blocking ONoC architecture with a simple structure, low cost, 
and high power efficiency compared to existing ONoC designs. 
• To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant 
GWOR architecture is designed by cascading different types of GWORs into one 
network. 
• A redundant GWOR built with MRR-based comb switches is proposed. Replacing the 
general MRRs with comb switches expands the bandwidth while keeping the topology 
of the GWOR unchanged. 
• A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is 
developed in which GWORs are used for global communication and electronic 
routers are used for local communication. The proposed HONoC uses fewer 
electronic routers and links than its electronic BFT-based NoC counterpart. It takes 
advantage of GWOR in optical communication and of BFT in non-uniform traffic 
communication and three-dimensional (3D) implementation. 
• A cycle-accurate NoC simulator is developed to evaluate the performance of the 
proposed HONoC architectures. It is a comprehensive platform that can simulate both 
electronic and optical NoCs. HONoC architectures of different sizes are evaluated in 
terms of throughput, latency, and energy dissipation. Simulation results confirm that 
HONoC achieves good network performance with lower power consumption. 
TABLE OF CONTENTS
ABSTRACT 
TABLE OF CONTENTS 
LIST OF TABLES 
LIST OF FIGURES 
CHAPTER 1. INTRODUCTION 
1.1 Why Optical Interconnects? 
1.2 Overview of Microring Resonators (MRR)-based Optical NoC 
1.2.1 Microring Resonator (MRR)-based Optical Switches 
1.2.2 Comb Switch 
1.2.3 MRR-based Optical Interconnect Network 
1.3 Optical Networks-on-Chip 
1.4 Outline of This Dissertation Research 
CHAPTER 2. A GENERIC OPTICAL ROUTER DESIGN FOR OPTICAL NOC 
2.1 Router Architecture of 4×4 GWOR 
2.2 Generalization of GWOR 
2.2.1 Generalization of GWOR to Even-Numbered Inputs/Outputs 
2.2.2 Generalization of GWOR to Odd-Numbered Inputs/Outputs 
2.3 Comparison and Analysis 
2.4 Summary 
CHAPTER 3. REDUNDANT GWOR AND COMB SWITCHING GWOR ARCHITECTURES 
3.1 Redundant GWOR Structure 
3.2 ONoC Built with Comb Switches 
3.3 Determination of Comb MRR Size 
3.4 Summary 
CHAPTER 4. HYBRID OPTOELECTRONIC NETWORK-ON-CHIP ARCHITECTURE 
4.1 HONoC Architecture 
4.1.1 Architecture of 16-Core BFT-GWOR-Based HONoC 
4.1.2 Architecture of 64-Core BFT-GWOR-Based HONoC 
4.1.3 Architecture of 256-Core BFT-GWOR-Based HONoC 
4.1.4 Properties of HONoC Architecture 
4.2 HONoC Routers 
4.2.1 Electronic Router 
4.2.2 Hybrid Electrical-Optical Router 
4.3 Routing Algorithms 
4.3.1 Routing Algorithm of Electronic Router 
4.3.2 Routing Algorithm of Hybrid Router 
4.4 Summary 
CHAPTER 5. SIMULATION AND PERFORMANCE EVALUATION 
5.1 Simulator Development 
5.2 Simulation Setup 
5.2.1 Topologies and Layouts 
5.2.2 Simulation Parameters 
5.3 Performance Evaluation 
5.3.1 Throughput 
5.3.2 Latency 
5.3.3 Energy Dissipation 
5.4 Summary 
CHAPTER 6. CONCLUSION AND FUTURE WORK 
6.1 Conclusion 
6.2 Future Work 
REFERENCES 
BIBLIOGRAPHY 
CURRICULUM VITAE 
  
LIST OF TABLES 
Tab. 1. Routing wavelength assignment of the 4×4 GWOR 
Tab. 2. Wavelength assignment of 8×8 GWOR 
Tab. 3. Wavelength assignment of MRR-based 5×5 GWOR router 
Tab. 4. Number of MRRs used in different size routers 
Tab. 5. Power loss estimation of different size routers 
Tab. 6. Routing wavelength assignment of the 4×4 4-RGWORs 
Tab. 7. Routing wavelength assignment of the 5×5 4-RGWORs 
Tab. 8. Wavelength assignment of 4×4 general MRR-based GWOR 
Tab. 9. Wavelength assignment of 4×4 4-GWOR 
Tab. 10. Example of resonance wavelength set for selected reference frequency 
Tab. 11. Number of routers and links used by HONoCs 
Tab. 12. Comparison of electronic routers and electronic links in different NoCs 
Tab. 13. Comparison of average number of hops in different NoCs 
Tab. 14. Comparison of node degree of routers in different NoCs 
Tab. 15. Parameters of electronic routers 
Tab. 16. Parameters of serializers/deserializers 
Tab. 17. Modulator and photodetector parameters 
Tab. 18. Energy dissipation in routers 
  
LIST OF FIGURES 
Fig. 1. An on-chip optical signal path (from the source to the destination PEs). 
Fig. 2. (a) Basic operation of an MRR, and (b) a 2×2 optical switch. 
Fig. 3. MRR-based comb switch. 
Fig. 4. 4×4 hitless optical router. 
Fig. 5. WRON (a) 4×4; (b) 5×5. 
Fig. 6. Basic 4×4 GWOR structures. 
Fig. 7. Type I N×N GWOR built on Type I (N-2)×(N-2) GWOR. 
Fig. 8. Structure of 8×8 GWORs. 
Fig. 9. Structure of Type I 5×5 GWOR. 
Fig. 10. Schematic of an N×N M-RGWOR structure. 
Fig. 11. Structure of the 4×4 4-RGWOR. 
Fig. 12. Structures of basic/WDM-enabled GWOR. 
Fig. 13. Butterfly fat tree structure. 
Fig. 14. Architecture of 16-core HONoC. 
Fig. 15. Architecture of 64-core HONoC. 
Fig. 16. Architecture of 256-core HONoC. 
Fig. 17. Head flit format. 
Fig. 18. Six bidirectional port electrical router architecture. 
Fig. 19. Four electrical three optical bidirectional hybrid router architecture. 
Fig. 20. Simulator structure. 
Fig. 21. Simulation flow. 
Fig. 22. Mesh and CMesh topologies. 
Fig. 23. Layout of 16-core hybrid GWOR-BFT-based NoC. 
Fig. 24. Layout of 64-core hybrid GWOR-BFT-based NoC. 
Fig. 25. Average throughput of 16-core NoCs. 
Fig. 26. Average throughput of 64-core NoCs. 
Fig. 27. Average throughput of 256-core NoCs (75% localized traffic pattern). 
Fig. 28. Latency of 16-core architectures (uniform traffic pattern). 
Fig. 29. Latency of received packets in 64-core NoCs. 
Fig. 30. Latency of received packets in 256-core NoCs (non-uniform traffic). 
Fig. 31. Normalized energy dissipation of 16-core NoCs (uniform traffic). 
Fig. 32. Normalized energy dissipation of 64-core NoCs (uniform traffic). 
 
CHAPTER 1. INTRODUCTION 
1.1 Why Optical Interconnects? 
The rapid advance of technology continues to push up transistor integration capacity, 
which has enabled a large number of processing cores to be integrated into a Chip 
Multiprocessor (CMP) or a Multiprocessor System-on-Chip (MPSoC) design. The on-chip 
interconnection network, which is the communication infrastructure of these many 
cores, plays a central role in the performance of a many-core system [1]. NoC designs 
face the ever-increasing challenge of meeting large bandwidth and stringent 
latency requirements without exceeding tight power budgets [2]. Continuously 
shrinking feature sizes, higher clock frequencies, and the simultaneous growth in 
complexity have made it a formidable task for electronic networks-on-chip (NoC) to 
provide scalable and power-efficient on-chip communication. 
Optical interconnects on silicon have long been considered a promising candidate 
to overcome the limitations of electrical interconnects. Recent advances in nanoscale 
silicon photonics and the development of silicon photonic devices, such as low-loss 
waveguides [3, 4], high-speed, low-power modulators with up to 10 Gb/s 
speed [5-7], hybrid-integrated evanescent lasers [8, 9] [10], and gigahertz-bandwidth 
SiGe photodetectors [2] [11], have made optical NoC a promising and viable solution to 
meet the ever-increasing chip-level interconnect challenges. 
Integrating optical NoC into future communication-intensive, many-core-based CMP 
and MPSoC designs can transform the way chips are designed and 
internally interconnected, as it brings unparalleled benefits, summarized as 
follows: 
• Architecture advantage: Optical NoC could deliver performance-per-watt scaling by 
offering terabits-per-second (Tbps) scale bandwidth with near speed-of-light 
transmission. In addition, low-loss off-chip optical interconnects enable the 
optical communication infrastructure to scale seamlessly to 
multi/many-chip systems. 
• Design and manufacturing simplification: In optical NoC, there is no concern about 
signal timing skew, the predictability of signal timing, or the precision of 
clock signal timing. Photonic devices/modules can thus be designed and 
manufactured separately from the processing elements, and integrated with the 
electronics at a later appropriate manufacturing stage. 
• Other physical benefits: Optical interconnections relax constraints on thermal 
dissipation and sensitivity, signal interference, and distortion. 
1.2 Overview of Microring Resonators (MRR)-based Optical NoC  
A diagram of an on-chip optical NoC system is shown in Fig. 1. The system consists 
of four primary optical elements: a light source (either an off-chip or on-chip laser, or 
an LED), an optical modulator and its driver for electrical-to-optical (E/O) conversion, a 
photonic interconnection network, and a photodetector and its associated 
transimpedance amplifier (TIA) for optical-to-electrical (O/E) conversion. 
Light generated by the light source is coupled into an on-chip waveguide. The 
waveguide lets the light pass a series of modulators located at the source node, each 
modulating data onto the optical signal. The modulated light passes through the on-chip 
photonic interconnection network before it arrives at the destination node. There the light 
is dropped from the waveguide by a filter and received by a series of photodetectors. 
[Figure: block diagram of the signal path: laser, source processing element (PE), optical modulator + driver, on-chip photonic interconnection network (optical waveguides + optical switches), photodetector + TIA, destination PE.]
Fig. 1. An on-chip optical signal path (from the source to the destination PEs). 
The heart of a photonic NoC is an on-chip photonic interconnection network, which is 
composed of silicon waveguides and optical routers [12]. An optical router, as its name 
suggests, optically routes data packets between a set of input and output ports. It is 
generally built upon waveguides and optical switches. Of the many available optical 
switches, micro-ring resonator (MRR)-based optical switches are typically preferred due 
to their ultra-compact size (3-10 µm diameter), simple-mode resonances, and ease of 
phase-matching between an MRR and its coupling waveguides [5]. 
1.2.1 Microring Resonator (MRR)-based Optical Switches 
The basic operation of an MRR is shown in Fig. 2(a). The input light signal is 
coupled to the drop port only if the input wavelength $\lambda_i$ matches one of the resonance 
wavelengths of the MRR, say $\lambda_r$, that is, when $\lambda_i$ satisfies the following equation: 
$$m \cdot \lambda_i = n_{\mathrm{eff}} \cdot L \qquad (1)$$
where $m$ is an integer, $n_{\mathrm{eff}}$ is the effective index of the optical mode, and $L$ is the length 
of the resonating cavity [12]. Otherwise, the input signal will simply pass to the through 
port. 
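The resonance condition of Eqn. (1) can be illustrated with a minimal Python sketch. The function names, the 5 µm ring diameter, and the constant effective index of 2.5 are assumptions chosen only for illustration; they are not values from this dissertation, and dispersion of the effective index is ignored.

```python
import math

def resonance_wavelengths(n_eff, cavity_length_um, lo_um, hi_um):
    """List (m, wavelength_um) pairs inside [lo_um, hi_um] that satisfy
    m * wavelength = n_eff * cavity_length, i.e. Eqn. (1)."""
    optical_path = n_eff * cavity_length_um
    m_min = math.ceil(optical_path / hi_um)   # longest wavelength, smallest m
    m_max = math.floor(optical_path / lo_um)  # shortest wavelength, largest m
    return [(m, optical_path / m) for m in range(m_min, m_max + 1)]

def is_dropped(input_um, n_eff, cavity_length_um, tol_um=1e-4):
    """True if the input wavelength matches a resonance (goes to the drop port);
    False means the signal continues to the through port."""
    optical_path = n_eff * cavity_length_um
    m = round(optical_path / input_um)        # nearest resonance order
    return m >= 1 and abs(optical_path / m - input_um) <= tol_um

# Hypothetical ring: 5 um diameter, n_eff = 2.5 (illustrative values only).
ring_length_um = math.pi * 5.0
peaks = resonance_wavelengths(2.5, ring_length_um, 1.50, 1.60)
for m, wavelength in peaks:
    print(f"m = {m}: resonance at {wavelength:.4f} um")
```

Because a small ring has a short cavity length, consecutive resonance orders are widely spaced, so only a few resonances fall inside a given band.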
Fig. 2(b) shows a 2×2 optical switch constructed from two identical MRRs. If the input 
wavelength $\lambda_i \neq \lambda_r$, $\lambda_i$ passes through the switch in the straight direction; if $\lambda_i = \lambda_r$, the 
signal passes through the switch in the cross direction. By using MRRs of different 
sizes or tuning the refractive index through either the thermo-optic (TO) [13] or the 
electro-optic (EO) effect [12], an incoming optical signal can be switched to the destined 
output port solely based on the signal wavelength. 
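[Figure: (a) an MRR with resonance wavelength λr and input, through, and drop ports; (b) a 2×2 switch built from two MRRs with resonance wavelength λr, with straight and cross output directions selected by whether λi = λr or λi ≠ λr.]
Fig. 2.  (a) Basic operation of an MRR, and (b) a 2×2 optical switch. 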
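The switching behavior of the 2×2 switch in Fig. 2(b) can be sketched as a small Python function. The sketch assumes that a non-resonant wavelength leaves in the straight direction and a resonant wavelength leaves in the cross direction; the port numbering (0 and 1) and the function name are hypothetical.

```python
def switch_2x2(input_port, wavelength_nm, resonance_nm, tol_nm=1e-6):
    """Return the output port (0 or 1) of a 2x2 MRR-based switch.

    Assumed convention: 'straight' keeps the port index, 'cross' swaps it.
    """
    if input_port not in (0, 1):
        raise ValueError("a 2x2 switch only has input ports 0 and 1")
    on_resonance = abs(wavelength_nm - resonance_nm) <= tol_nm
    return 1 - input_port if on_resonance else input_port

# A signal at the resonance wavelength crosses; any other wavelength goes straight.
print(switch_2x2(0, 1550.0, 1550.0))  # cross: leaves on port 1
print(switch_2x2(0, 1551.6, 1550.0))  # straight: leaves on port 0
```

No control input appears in the function: the output port depends only on the wavelength, which is what allows passive routers to fix their routing pattern at design time.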
1.2.2 Comb Switch  
For a given MRR of size $L$, there may exist a set of resonance wavelengths ($\lambda_0$, $\lambda_1$, 
$\lambda_2$, …) corresponding to different integers $m_i$ ($i$ = 0, 1, 2, …) that satisfy the resonance 
condition of Eqn. (1), i.e., 
$$\Lambda = \{\lambda_i \mid m_i \cdot \lambda_i = n_{\mathrm{eff}} \cdot L,\ i = 0, 1, 2, \ldots\} \qquad (2)$$
An MRR-based comb switch (as shown in Fig. 3) is a special MRR that satisfies 
Eqn. (2), where $\Lambda$ is the set of resonance wavelengths. Compared to a basic MRR (as 
shown in Fig. 2(a)), an MRR-based comb switch usually has a larger size and higher power 
loss. The size of the MRR in a comb switch must be carefully chosen so that the desired 
set of wavelengths can be dropped. A broadband 1×2 optical comb switch with a diameter 
of 200 μm for 40 channels in the C-band is presented in [14]. However, it is not clear how to 
determine the size of a comb switch for a given number of wavelengths. 
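[Figure: an MRR comb switch with resonance set Λ; an input wavelength λi is routed to the drop port if λi ∈ Λ and to the through port if λi ∉ Λ.]
Fig. 3. MRR-based comb switch. 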
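The dependence of the resonance set on the ring size can be sketched in Python as follows. The C-band limits of 1530 to 1565 nm, the constant effective index of 2.5, and the helper names are assumptions made for illustration; a real design would use the group index and measured dispersion, so the printed counts are not expected to reproduce the 40 channels reported in [14].

```python
import math

C_BAND_NM = (1530.0, 1565.0)  # assumed conventional C-band limits

def comb_resonances(diameter_um, n_eff, band_nm=C_BAND_NM):
    """Map each resonance order m to its wavelength (nm) inside band_nm,
    using m * wavelength = n_eff * L with L = pi * diameter."""
    optical_path_nm = n_eff * math.pi * diameter_um * 1000.0
    m_min = math.ceil(optical_path_nm / band_nm[1])
    m_max = math.floor(optical_path_nm / band_nm[0])
    return {m: optical_path_nm / m for m in range(m_min, m_max + 1)}

def comb_drops(wavelength_nm, resonances, tol_nm=0.01):
    """True if the wavelength belongs to the resonance set (drop port)."""
    return any(abs(wavelength_nm - r) <= tol_nm for r in resonances.values())

small = comb_resonances(10.0, 2.5)    # basic-MRR-sized ring
large = comb_resonances(200.0, 2.5)   # comb-switch-sized ring
print(len(small), "resonance(s) in band for the 10 um ring")
print(len(large), "resonance(s) in band for the 200 um ring")
```

The larger ring yields a much denser comb, which is consistent with the observation that comb switches are larger than basic MRRs.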
1.2.3 MRR-based Optical Interconnect Network 
According to the type of MRR used in switches, existing optical interconnection 
networks can be classified into two classes [12]: i) active networks using TO- or 
EO-tunable MRRs, and ii) passive networks using MRRs with fixed wavelength assignment. 
Among existing on-chip active optical networks, the first fabricated MRR-based 8×8 
optical router [12] is a TO-tuned active router made of glass, developed for the 
sole purpose of experimental demonstration. A 5×5 EO-tuned silicon photonic crossbar 
reported in [12] is an improved crossbar optical router that eliminates the self-routing of 
the traditional crossbar structures reported in [15]. An optimized 5×5 TO-tuned optical 
crossbar router proposed in [16] aligns the ports to the corresponding directions as 
required in regular NoC topologies (e.g., mesh/torus): four waveguides are bent 
and the internal structure of the traditional crossbar is reoriented to reduce the number of 
waveguide crossings. However, the power loss on each routing path is elevated due 
to the extra waveguide bending. A 4×4 TO-tuned bidirectional hitless router (shown in Fig. 4) 
is fabricated and reported in [13]; however, it does not scale to larger-sized 
routers. 
 
Fig. 4. 4×4 hitless optical router. 
As data routing in active routers is realized by tuning MRRs, extra circuitry is required 
for controlling and tuning each MRR by either the TO or the EO effect. In terms of 
effectiveness, TO-based tuning can achieve a wavelength tuning range of 20 nm [5], at the 
cost of extra heating/cooling time (in microseconds) and higher power consumption 
(0.25 nm/mW) [13], while EO-based tuning can currently reach a wavelength tuning range 
of only 2 nm [17], with driving power ranging from 18 to 105 µW [18]. 
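To give a sense of scale for the TO figures quoted above, the following Python sketch converts a wavelength shift into heater power. It assumes, purely for illustration, that the 0.25 nm/mW figure is a linear tuning efficiency and that it applies across the full 20 nm range, although the two numbers come from different references; the function name is hypothetical.

```python
def to_tuning_power_mw(shift_nm, efficiency_nm_per_mw=0.25):
    """Heater power (mW) needed for a given wavelength shift, assuming a
    linear thermo-optic tuning efficiency in nm per mW."""
    if shift_nm < 0:
        raise ValueError("shift_nm must be non-negative")
    return shift_nm / efficiency_nm_per_mw

# Shifting one MRR across the whole 20 nm range under these assumptions.
full_range_power_mw = to_tuning_power_mw(20.0)
print(f"{full_range_power_mw:.0f} mW per MRR")
```

Under these assumptions a single MRR would need tens of milliwatts for a full-range shift, orders of magnitude above the microwatt-level EO driving power, which illustrates why TO tuning is listed as the more power-hungry option.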
As for passive routers, wavelength-based routing is used and the routing patterns are 
determined at design time. The λ-router [19], the first passive router ever proposed for 
NoC, is constructed by cascading MRR-based switches. However, the λ-router is only 
applicable to networks with even-numbered input/output ports. This parity problem is 
solved by a more general wavelength-routed optical network (WRON) router reported in 
[20] [21]. An N×N WRON router (shown in Fig. 5) is composed of N stages of 2×2 MRR-based 
switches with distinctive pre-assigned resonant wavelengths. Another passive optical 
router is the oblivious bidirectional 5×5 cross-grid wavelength-router presented in [17], 
which is compact in size, but its high design complexity makes it difficult to extend 
to a larger-sized network. In [22], a U-shape passive optical router is introduced, 
which in fact implements a wavelength-routed crossbar structure. The passive optical 
routers listed above are non-blocking. In [23], another cascaded N×N wavelength-routed 
network is proposed, which, unfortunately, is an incomplete crossbar, as not every input 
signal can be routed to every output (e.g., for the given 3×3 router, there is no 
routing path between I2 and O3). A 4×4 64-wavelength optical crossbar is proposed in 
[24], which is a blocking network. For instance, simultaneous data transmissions 
I1→O0 and I3→O1 will cause a routing conflict. 
[Figure: (a) a 4×4 WRON with source nodes S1-S4, destination nodes D1-D4, and four stages of 2×2 MRR-based switches with resonant wavelengths w1-w4; (b) a 5×5 WRON with source nodes S1-S5, destination nodes D1-D5, and five stages of switches with resonant wavelengths w1-w5.]
Fig. 5. WRON (a) 4×4; (b) 5×5. 
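The difference between a complete, conflict-free wavelength-routed router and the incomplete or blocking designs mentioned above can be sketched as a check on a routing table. The cyclic assignment below, in which input i reaches output j on wavelength index (i + j) mod N, is a hypothetical example and is not the assignment used by the λ-router, WRON, or any cited design; the function names are likewise illustrative.

```python
from itertools import product

def cyclic_assignment(n):
    """Hypothetical table: (input, output) -> wavelength index (i + j) % n."""
    return {(i, j): (i + j) % n for i, j in product(range(n), repeat=2)}

def is_non_blocking(table, n):
    """Check a wavelength-routing table of an n x n passive router.

    The table passes if every input can reach every output, each input uses
    a different wavelength per output, and signals arriving at the same
    output from different inputs use different wavelengths.
    """
    ports = range(n)
    if any((i, j) not in table for i, j in product(ports, repeat=2)):
        return False  # incomplete crossbar: some path is missing
    rows_ok = all(len({table[(i, j)] for j in ports}) == n for i in ports)
    cols_ok = all(len({table[(i, j)] for i in ports}) == n for j in ports)
    return rows_ok and cols_ok

table = cyclic_assignment(4)
print(is_non_blocking(table, 4))      # complete and conflict-free

incomplete = dict(table)
del incomplete[(2, 3)]                # no path from input 2 to output 3
print(is_non_blocking(incomplete, 4))
```

Since the table is fixed at design time, such a check runs once per design rather than per packet, which is why passive routers need no run-time arbitration.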
Active optical networks in general provide higher bandwidth [13, 25], but require 
high-speed electrical control circuits to be integrated with the photonic devices, which 
causes extra power consumption. Furthermore, this integration is very difficult to 
realize with current technology. Passive networks, on the other hand, route data 
with fixed wavelengths, do not require extra control circuits, and tend to have better 
power performance. However, poor scalability and high design complexity are common 
problems of existing passive optical network designs. 
1.3 Optical Networks-on-Chip 
From the point of view of signaling, optical networks-on-chip can be classified as 
all-optical NoC and hybrid optoelectronic NoC. All-optical NoCs include two types of 
topologies: 1) direct networks, such as mesh [16] and torus [17, 26], where each processing 
core is connected to a dedicated optical router; 2) indirect networks, such as optical 
crossbar and Clos [22] networks, where the number of MRRs used grows gradually with the 
network size. All-optical NoC is not suitable for supporting a large number of cores 
considering the cost and limitations of available lasers [13] [27]. In addition, 
communication locality is poorly supported by these networks. Hybrid optoelectronic NoC 
provides a more practical solution by using optical signaling for long-distance 
communication and electrical signaling for local communication. As such, it can take 
advantage of optical signaling in bandwidth and power consumption while keeping the low 
cost and flexibility of electrical signaling. 
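The division of labor in a hybrid optoelectronic NoC can be sketched with a small Python function. The sketch assumes that cores are numbered consecutively and grouped into equally sized clusters, and that "local" means "within the same cluster"; both the grouping rule and the function name are hypothetical and do not describe any specific architecture cited here.

```python
def choose_signaling(src_core, dst_core, cores_per_cluster):
    """Pick the signaling domain for a packet in a hypothetical hybrid NoC:
    electrical inside a cluster, optical between clusters."""
    if cores_per_cluster <= 0:
        raise ValueError("cores_per_cluster must be positive")
    same_cluster = (src_core // cores_per_cluster
                    == dst_core // cores_per_cluster)
    return "electrical" if same_cluster else "optical"

# With 4 cores per cluster, cores 0-3 share a cluster and core 4 does not.
print(choose_signaling(0, 3, 4))  # local traffic stays electrical
print(choose_signaling(0, 4, 4))  # long-distance traffic goes optical
```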
Many hybrid NoC architectures have been proposed in the past few years. They can 
be classified into two major categories according to the optical network topologies used: 
i) hybrid direct networks, including mesh-based [24], torus-based [28], 2-D folded 
torus-based [29], and 3-D mesh-based [30]; ii) hybrid indirect networks, including 
crossbar-based, tree-based, and Clos-based. They can also be classified by switching 
methodology as circuit switching, packet switching, or wormhole switching. 
The hybrid mesh/torus optical NoCs [24, 28] employ an optical mesh/torus network 
on top of the electronic cluster routers. The folded-torus network [29] uses an optical 
torus for transmitting data messages and a companion electronic torus for transmitting 
control messages. The 3-D mesh optical NoC is a through-silicon via (TSV)-based 
two-layer architecture, in which optical routers are located on the top layer while the 
electrical control network is on the bottom layer. This is a packet switching design, and 
an optical path has to be set up before data is sent. A 
reconfigurable circuit switching hybrid optical NoC is proposed in [31]. Another circuit 
switching NoC with wavelength-selective spatial routing (WSST) is proposed in [32]. All 
these direct NoCs adopt circuit switching, which is not suitable for frequent transmissions 
of small-sized packets (e.g., a 64-byte cache line). 
Several hybrid NoCs employ large-size optical crossbars. For example, Corona [33] 
uses a 64×64 optical crossbar in a U-shape for a 64-cluster, 256-core network. However, 
this architecture needs optical tokens for routing and arbitration to resolve conflicts on 
the optical crossbar. A large-size crossbar is also used by the group 
optoelectronic network to interconnect processing cores and DRAM in [34]. Firefly [35] 
is a hybrid hierarchical network consisting of clusters of processing cores that are 
connected by electronic concentrated mesh (CMesh) networks, while inter-cluster 
communication is handled by multiple optical crossbars connecting to all CMesh routers. 
To reduce the size of crossbars, a 3D clustered multi-optical-layer hybrid NoC is proposed 
in [36], in which each optical layer is a smaller-size crossbar (16×16) dedicated to 
interconnecting a group of clusters in the electronic network. Another 3D stacked optical 
NoC, proposed in [37], has a similar structure but adds configurability to better support 
inter-group communication. All these networks need arbitration to resolve conflicts on 
optical crossbars. Large-size crossbars also suffer from significant power loss on the 
waveguide (around 9 cm for a 20×20 mm die [38]) and micro-ring scattering loss. In 
addition, their scalability and footprint are questionable due to the large number of 
micro-rings needed in crossbars. 
The Clos NoC [22], which adopts a 3-stage optoelectronic Clos network in a U-shape 
layout, uses many more waveguides but fewer MRRs than its global crossbar counterparts. 
In [24], a hierarchical NoC is proposed by dividing a large-size network into smaller 
groups, with each group connected by an electronic mesh network and inter-group 
communication handled by multiple 4×4 optical crossbars. For a 256-core network, 
each group is an 8×8 mesh and 64 4×4 optical routers are needed. Similarly, a fat-tree 
based optical NoC is proposed in [38] that uses multiple layers of optical routers to 
interconnect clusters of cores. This NoC architecture adopts circuit switching, which 
requires the setup of an optical circuit before data transmission. 
With smaller crossbar routers, the above hybrid NoCs have better scalability
than their large crossbar counterparts. However, they neglect possible difficulties in
implementation, such as the integration of optoelectronic components, circuit setup
latency, etc. [39]. How to efficiently combine an optical network and an electrical
network into one design remains an open question. In this dissertation, we explore more
practical and flexible optical NoC architectures.
1.4 Outline of This Dissertation Research 
This dissertation is focused on the study of high-performance and scalable optical 
NoC architectures and their performance evaluation. As reviewed in Section 1.3, passive
optical interconnection networks (OINs) do not require extra control circuits and thus have
an advantage in power consumption and implementation. We focus on designing
efficient passive OINs and developing hybrid optical NoCs based on these OINs. The
major work is divided into four parts, which are organized in four chapters, respectively.
In Chapter 2, we propose the Generic Wavelength-Routed Optical Router (GWOR) to
address the scalability and complexity of existing passive topologies. We show that GWOR
is a non-blocking, scalable optical router with the least complexity among existing
passive optical routers. The wavelength assignment scheme for GWOR is also devised.
In Chapter 3, to further expand the bandwidth and improve the fault tolerance of the
proposed GWOR, a redundant GWOR architecture is proposed by cascading different
types of GWORs. We also propose MRR-based comb switches to reduce the complexity
of this architecture. A solution is also presented to choose the set of resonance
wavelengths and optimize the comb switch size.
In Chapter 4, we propose the hybrid optoelectronic NoC (HONoC) architectures 
based on the butterfly fat tree (BFT) topology [40] using the GWORs. In the proposed 
HONoCs, GWORs are used on the top levels for global inter-group communication, 
which simplifies the BFT topology by reducing the number of electronic routers and 
metallic links and thus improves the network and power performance. 
Chapter 5 is devoted to the performance evaluation of the proposed HONoC
architectures and to implementation issues. To evaluate the performance of the proposed optical
and hybrid NoCs, a cycle-accurate NoC simulation platform is developed. The 
performance of the proposed HONoC architectures is evaluated in terms of throughput, 
latency and energy dissipation and compared against two major NoC architectures. 
CHAPTER 2. A GENERIC OPTICAL ROUTER DESIGN FOR OPTICAL NOC  
In this chapter, we propose a micro-ring-resonator (MRR)-based, scalable, and non-
blocking passive optical router design, namely the generic wavelength-routed optical 
router (GWOR). We first introduce the four 4×4 GWOR router structures and then show 
how to construct GWORs of larger sizes by using the proposed 4×4 GWORs as the 
primitive building blocks. The number of MRRs used in the proposed GWOR is the least 
among the existing passive optical router designs for the same network size. The power 
loss experienced on GWORs is also lower than other comparative designs.  
2.1 Router Architecture of 4×4 GWOR
In this section, the 4×4 GWOR is first introduced as the primitive building block used to
build GWORs of larger sizes. As shown in Fig. 6, a 4×4 GWOR consists of two
horizontal and two vertical waveguides. Each waveguide has one and only one
intersection with each of the waveguides in the orthogonal direction. As such, the four
waveguides together form a primitive checkerboard-shaped cell. The following
conventions are followed when labeling the ports of the 4×4 GWOR:
• The two ports located at each of the four directions (i.e., north (N), west (W), south
(S), and east (E)) of the primitive checkerboard-shaped cell are grouped and labeled
as one bidirectional port Pi(Ii,Oi), where I (O) represents an input (output)
port. In a 4×4 GWOR, the input and output are labeled in the same order for each
bidirectional port, either clockwise or counterclockwise.
• Each waveguide is dedicated to the direct connection between an input-output pair,
denoted as Ii→O3-i (i=0, 1, 2, 3).
Following the labeling conventions above, one can see that a total of four types of 4×4
GWOR can be formed. Fig. 6(a) shows the structure of the Type I 4×4 GWOR. The
ports are labeled, starting from the north side and traversing in the counterclockwise
direction, as P0(I0,O0) and P1(I1,O1). As a consequence of the direct connections of ports, the
ports at the south and east sides are labeled as P3(I3,O3) and P2(I2,O2), respectively. At
each direction, the input and output ports are always ordered in counterclockwise manner.
Fig. 6(b), (c) and (d) show the structures of the Type II, III and IV 4×4 GWORs, 
respectively. The distinction between the Type I and II 4×4 GWORs is that in Type II 
GWOR, the ports are numbered starting from the south side as P0(I0, O0), P1(I1, O1), 
P3(I3, O3), and P2(I2, O2) in clockwise manner. The Type III 4×4 GWOR has the same 
order of ports in directions as the Type I 4×4 GWOR but differs from Type I in that the 
input and output ports at each direction are ordered in clockwise manner instead. 
Similarly, Type IV 4×4 GWOR has the same order of ports in directions as the Type II 
4×4 GWOR but differs from Type II in that the input and output ports at each direction 
are ordered in clockwise manner instead. 
[Figure 6 panels: (a) Type I, (b) Type II, (c) Type III, (d) Type IV. Each panel shows ports I0 to I3 and O0 to O3 on the N, W, S and E sides, with MRRs labeled λ1 and λ2.]
Fig. 6. Basic 4×4 GWOR structures. 
For GWORs, non-blocking routing is realized by assigning MRRs to the appropriate 
corners of the intersections of waveguides so that all the input light signals can be 
directed to their designated output ports. Assuming no self-communication is allowed 
(i.e., input signal from Ii will not go to Oi), there exist 12 possible input-output pairs in a 
4×4 GWOR. As four pairs (Ii→O3-i) are directly connected by the four waveguides, at
least eight MRRs are needed to realize the routing for the remaining eight pairs.
To implement wavelength-based routing, MRRs should be assigned different
sizes (or resonant wavelengths). The principle of wavelength assignment is to assign the
minimal number of wavelengths to the input-output pairs so that any input-output
communication can be realized without causing a conflict. Given the input-output pair
(Ii→Oj), the input wavelength C(i, j) is determined by

$$
C(i,j)=\begin{cases}
\lambda_{3}, & i+j=3\\
\lambda_{\mathrm{mod}(2j,\,3)}, & i=3,\ 0<j<3\\
\lambda_{\mathrm{mod}(3-2i,\,3)}, & j=0,\ 0<i<3\\
\lambda_{\mathrm{mod}(j-i,\,3)}, & \text{others}
\end{cases}
\qquad (i\neq j) \tag{3}
$$
Tab. 1 lists the wavelength assigned to each input-output pair of 4×4 GWORs. To
minimize the number of wavelengths, wavelengths are shared whenever possible, as can be
seen from Tab. 1, i.e., C(i, j)=C(3-i, 3-j). Note that the same set of
wavelengths covers every row and every column of Tab. 1.
Tab. 1. Routing wavelength assignment of the 4×4 GWOR 
 O0 O1 O2 O3 
I0 -  λ1 λ2 λ3 
I1 λ1 - λ3 λ2 
I2 λ2 λ3 - λ1 
I3 λ3 λ2 λ1 - 
Note: "-" stands for not applicable. 
According to the wavelength assignment in Tab. 1, to route the light signals from Ii
to Oj and from I3-i to O3-j (i, j =0, 1, 2 or 3, i≠j, and i+j≠3), two identical MRRs
corresponding to the assigned input wavelength are placed at the corners of the
intersection where the light signals make a turn to reach their designated output ports.
As shown in Fig. 6, only two types of MRRs with a total of 8 MRRs are needed for each
type of the 4×4 GWORs.
The four types of 4×4 GWORs shown in Fig. 6 are isomorphic in terms of routing 
since they share the same routing wavelength assignment. It is easy to verify that the 4×4 
GWOR is non-blocking for any unicast communication using the input wavelength 
assignment as tabulated in Table I. In addition, multicasting and broadcasting can also be 
supported by simply multiplexing multiple input wavelengths at any input port.  
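As an illustration of multicasting by wavelength multiplexing, the following minimal Python sketch looks up the wavelengths in Tab. 1. The dictionary TAB1 is copied from Tab. 1; the function names multicast_wavelengths and outputs_reached are hypothetical helpers introduced only for this example and are not part of the original design.

```python
# Routing wavelengths of the 4x4 GWOR, copied from Tab. 1.
# TAB1[i][j] = k means that input Ii reaches output Oj on wavelength λk.
TAB1 = {
    0: {1: 1, 2: 2, 3: 3},
    1: {0: 1, 2: 3, 3: 2},
    2: {0: 2, 1: 3, 3: 1},
    3: {0: 3, 1: 2, 2: 1},
}


def multicast_wavelengths(src, dests):
    """Wavelength indices to multiplex at input `src` to reach all `dests`."""
    dests = set(dests)
    if src in dests:
        raise ValueError("self-communication is not allowed")
    return sorted(TAB1[src][d] for d in dests)


def outputs_reached(wavelength):
    """For one wavelength, map every input to the output it is routed to."""
    return {
        src: next(dst for dst, w in row.items() if w == wavelength)
        for src, row in TAB1.items()
    }


print(multicast_wavelengths(0, [1, 2]))     # multicast from I0 to O1 and O2
print(multicast_wavelengths(2, [0, 1, 3]))  # broadcast from I2
print(outputs_reached(1))                   # all four inputs sending on λ1
```

In this sketch, a multicast from I0 to O1 and O2 multiplexes λ1 and λ2, and a broadcast from any input uses all three wavelengths. The last call shows that four inputs using the same wavelength reach four different outputs, which is the property the non-blocking argument relies on.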
2.2 Generalization of GWOR
2.2.1 Generalization of GWOR to Even-Numbered Inputs/Outputs 
The four types of 4×4 GWORs introduced in Section 2.1 are used as the primitive 
building blocks to create larger GWORs. Correspondingly, four types of N×N
GWORs (where N>4) can be built based on each type of 4×4 GWOR. In this section, we
consider the even-numbered N×N GWORs (where N=2n and n>2). In an N×N
GWOR, N waveguides are used and each waveguide is dedicated to the direct
connection of one pair of input-output ports, denoted as Ii→Oj (where i, j= 0, 1, ..., N-1,
and i+j=N-1). The N waveguides are partitioned into N/2 groups and each group consists
of two parallel waveguides. Taking the Type I N×N GWOR as an example, the following
rules are followed to interconnect the N/2 groups of waveguides:
1) The first two groups of waveguides are laid out as in the 4×4 GWOR (Fig. 6(a)). The
vertical group consists of two parallel waveguides I0→ON-1 and IN-1→O0, while the
horizontal group consists of I1→ON-2 and IN-2→O1. The intersection of the two
waveguide groups forms a primitive 4×4 GWOR. Assume the waveguides in the 4×4
GWOR have equal length l and let i=2.
2) Extend all the waveguides of the router created so far by 2l/3. For the horizontal
waveguide group, bend the east end of each waveguide so that it continues to go
south and keep the two bent waveguides in parallel. Then the bent waveguides are
extended until they align themselves with the vertical waveguides.
3) Add another group of waveguides horizontally to the extended router after
completing Step 2), and have the newly added waveguide group intersect with all the
existing waveguide groups. The newly added waveguides are labeled as Ii→ON-1-i
and IN-1-i→Oi.
4) If there are more waveguide groups to be added, let i=i+1, and add them one at a time
following Steps 2) and 3).
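The grouping rule above can be sketched in a few lines of Python. This is only a combinatorial illustration under stated assumptions: it models which waveguides belong to which group and which pairs of groups cross, not the physical layout. The function names waveguide_groups and primitive_cells are hypothetical, and the figure of 8 MRRs per primitive cell is taken from Section 2.1.

```python
from itertools import combinations


def waveguide_groups(n_ports):
    """Group g holds the two parallel waveguides
    I_g -> O_{N-1-g} and I_{N-1-g} -> O_g, each written as (input, output)."""
    if n_ports < 4 or n_ports % 2:
        raise ValueError("this sketch covers even N >= 4 only")
    last = n_ports - 1
    return [((g, last - g), (last - g, g)) for g in range(n_ports // 2)]


def primitive_cells(n_ports):
    """Every pair of waveguide groups crosses once and forms one primitive cell."""
    return list(combinations(range(n_ports // 2), 2))


groups = waveguide_groups(8)
cells = primitive_cells(8)
# Number of groups, number of primitive cells, and MRRs at 8 MRRs per cell.
print(len(groups), len(cells), 8 * len(cells))  # prints: 4 6 48
```

For N=8 the sketch yields 4 groups, 6 primitive cells and 48 MRRs, which agrees with N(N-2)/8 cells and N(N-2) MRRs.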
[Figure 7: port labels I0/O0, I1/O1, ..., IN/2-1/ON/2-1 and IN/2/ON/2, ..., IN-1/ON-1 around the extended waveguide layout.]
Fig. 7. Type I N×N GWOR built on Type I (N-2)×(N-2) GWOR. 
Fig. 7 shows how Type I N×N GWOR is constructed from Type I (N-2)×(N-2) 
GWOR, where both the extended sections of vertical waveguides and the newly added 
waveguides are colored in blue. One can see that for any two groups of waveguides, their
intersections form one and only one primitive 4×4 GWOR. Hence, there is one and only
one intersection between any two orthogonal waveguides. The N ports are labeled, from
the top ends of the vertical waveguides in counterclockwise manner, as P0(I0,O0),
P1(I1,O1), …, PN/2-1(IN/2-1,ON/2-1), and their corresponding directly connected ports are
PN-1(IN-1,ON-1), PN-2(IN-2,ON-2), …, PN/2(IN/2,ON/2). Similar to the Type I 4×4 GWOR, in the
Type I N×N GWOR, the input and output of each bidirectional port are labeled in
counterclockwise direction.
To build a Type II N×N GWOR, rules similar to those used in the construction of the Type I
N×N GWOR are applied, except that: i) in Step 1, the Type II 4×4 GWOR (Fig. 6(b)) is used,
and ii) in Step 2, the vertical waveguides are extended to the north and the horizontal
waveguides are bent to the north direction. Note that the N ports are labeled from the
bottom ends in clockwise manner as P0(I0, O0), P1(I1, O1), …, PN/2-1(IN/2-1, ON/2-1), and
their corresponding directly connected ports as PN-1(IN-1, ON-1), PN-2(IN-2, ON-2), …,
PN/2(IN/2, ON/2).
The Type III and IV N×N GWORs can be built following rules similar to the ones
used when constructing Types I and II N×N GWORs, except that in Step 1, the Type III 4×4
GWOR is used for the Type III N×N GWOR and the Type IV 4×4 GWOR for the Type IV N×N
GWOR.
Similar to that of 4×4 GWORs, routing of N×N GWORs is realized by placing MRRs 
to the appropriate corners of the waveguide intersections based on the wavelength 
assignment scheme derived below. Given the input-output pair Ii→Oj, the input
wavelength C(i, j) is determined by

$$
C(i,j)=\begin{cases}
\lambda_{N-1}, & i+j=N-1\\
\lambda_{\mathrm{mod}(2j,\,N-1)}, & i=N-1,\ 0<j<N-1\\
\lambda_{\mathrm{mod}(N-1-2i,\,N-1)}, & j=0,\ 0<i<N-1\\
\lambda_{\mathrm{mod}(j-i,\,N-1)}, & \text{others}
\end{cases}
\qquad (i\neq j) \tag{4}
$$
Tab. 2 tabulates the wavelength assigned to each input-output pair of 8×8 GWORs.
To route light signals from Ii to Oj (i, j= 0, 1, ... , N-1), an MRR with the assigned
wavelength is placed at the intersection of the waveguides connecting Ii and Oj, where the
values of i and j have to satisfy the conditions i≠j and i+j≠N-1. It can be seen from
Tab. 2 that the input-output pairs with direct connections are assigned the same
wavelength (e.g., λ7 in the 8×8 GWOR), which ensures that the minimum number of different
types of MRRs is used in a GWOR. Fig. 8(a) and (b) show the constructed Type I and II
8×8 GWOR structures, respectively, where only 6 types of MRRs are needed for each
type.
Tab. 2. Wavelength assignment of 8×8 GWOR 
 O0 O1 O2 O3 O4 O5 O6 O7 
I0 - λ1 λ2 λ3 λ4 λ5 λ6 λ7 
I1 λ5 - λ1 λ2 λ3 λ4 λ7 λ6 
I2 λ3 λ6 - λ1 λ2 λ7 λ4 λ5 
I3 λ1 λ5 λ6 - λ7 λ2 λ3 λ4 
I4 λ6 λ4 λ5 λ7 - λ1 λ2 λ3 
I5 λ4 λ3 λ7 λ5 λ6 - λ1 λ2 
I6 λ2 λ7 λ3 λ4 λ5 λ6 - λ1 
I7 λ7 λ2 λ4 λ6 λ1 λ3 λ5 - 
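The assignment for even N can be checked with a short Python sketch. The case split used here (direct pairs, last input row, first output column, all others) is a reading of the assignment that reproduces Tab. 1 and Tab. 2 entry by entry; the function names are hypothetical and returning None for i=j is a convention of this sketch only.

```python
def gwor_even_wavelength(i, j, n_ports):
    """Index k of the wavelength λk for the pair Ii -> Oj (even N)."""
    m = n_ports - 1
    if i == j:
        return None              # no self-communication
    if i + j == m:
        return m                 # directly connected pair
    if i == m:
        return (2 * j) % m       # last input, 0 < j < N-1
    if j == 0:
        return (m - 2 * i) % m   # first output, 0 < i < N-1
    return (j - i) % m           # all other pairs


def gwor_even_table(n_ports):
    """Full N x N assignment table, rows are inputs and columns are outputs."""
    return [[gwor_even_wavelength(i, j, n_ports) for j in range(n_ports)]
            for i in range(n_ports)]


def is_conflict_free(table):
    """True if every row and every column uses N-1 distinct wavelengths."""
    n = len(table)
    for line in list(table) + [list(col) for col in zip(*table)]:
        used = [w for w in line if w is not None]
        if len(used) != n - 1 or len(set(used)) != n - 1:
            return False
    return True


for row in gwor_even_table(8):
    print(row)
print(is_conflict_free(gwor_even_table(8)))
```

Distinct wavelengths along each row mean that an input can address every output separately; distinct wavelengths along each column mean that no two inputs reach the same output on the same wavelength.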
[Figure 8 panels: (a) Type I, (b) Type II. Each panel shows ports I0 to I7 and O0 to O7, with MRRs labeled λ1 to λ6.]
Fig. 8. Structure of 8×8 GWORs. 
Type III 8×8 GWOR differs from its Type I counterpart only in its labeling order of 
the input and output for each bidirectional port since both types follow the same 
wavelength assignment. Similarly, Type IV GWOR differs from its Type II counterpart in 
the labeling order of the input and output for each bidirectional port. Both Types II and 
IV 8×8 GWORs follow the same wavelength assignment as their Types I and III 
counterparts. As such, Types I, II, III and IV N×N GWORs are isomorphic in terms of 
routing. 
The properties of an N×N GWOR (N=2n and n≥2) are given below.  
Lemma 1. For an N×N GWOR, the intersections of N/2 waveguide groups form
N(N-2)/8 primitive checkerboard-shaped cells.
Proof: As shown in the construction rules of N×N GWOR, every two groups form a 
primitive checkerboard-shaped cell as in Fig. 7. Hence the lemma holds. ■ 
Proposition 1. The total number of MRRs in an N×N GWOR is N(N-2), which is the 
minimum number for an N×N router. 
Proof: As in Section 2.1, 8 MRRs are assigned to each primitive checkerboard-shaped 
cell. Thus, based on Lemma 1, totally N(N-2)/8×8= N(N-2) MRRs will be used. 
As no self-communication is allowed in N×N GWORs (i.e., no input signal from Ii
will go to Oi), there exist N(N-1) possible input-output pairs in an N×N GWOR. As N
pairs (Ii→ON-1-i) are directly connected by N waveguides, at least N(N-1)-N=N(N-2)
MRRs are needed to realize the routing of the remaining N(N-2) pairs. Hence the
proposition holds. ■
Lemma 2. The number of different MRR types for an N×N GWOR is N-2, and their 
corresponding wavelengths are λ1, λ2, …, λN-2. 
Proof: From Eqn. (4), N-1 input wavelengths are needed. Hence N-2 different types of 
MRRs are needed since no MRR is used for a directly connected input-output pair. ■ 
Proposition 2. The N×N GWORs are non-blocking. 
Proof: According to the rules of constructing N×N GWORs, for any type of N×N 
GWOR, each waveguide has one and only one intersection with each waveguide in the 
other (N-2)/2 groups. Therefore, there are totally (N-2) intersections between one 
waveguide and the other waveguides. Eqn. (4) ensures that for each input, a distinct input
wavelength is assigned to each output. Accordingly, at the intersection of the
waveguides connected to Ii and Oj (i≠j, i+j≠N-1), the MRR corresponding to the input
wavelength of Ii→Oj is assigned. As a result, along each waveguide, N-2 different types
of MRRs are assigned and two identical MRRs only appear at the diagonal corners of a
waveguide intersection.
Consequently, on the N×N GWOR, for an input-output pair Ii→Oj (i≠j, i+j≠N-1), the
light signal from Ii with the assigned wavelength (governed by Eqn. (4)) will be dropped
to Oj at the exact intersection with the corresponding MRR. For a directly connected
input-output pair Ii→Oj (i≠j, i+j=N-1), the input signal with the assigned wavelength will
simply go straight to reach Oj. That is, all the N-1 output ports can be reached from any
input port. In addition, inside any one waveguide, the light signals travelling between
different input-output pairs through the same waveguide will not interfere with each other
because they use different wavelengths. Hence the proposition holds. ■
2.2.2 Generalization of GWOR to Odd-Numbered Inputs/Outputs  
The method introduced for even-numbered GWORs in the previous subsection can be
adapted to build an N×N GWOR, where N=2n+1 and n≥2. Here, the N waveguides are
divided into (n+1) groups, among which the first n groups are the same as in the
even-numbered case, but the (n+1)th group contains only one waveguide (In→On). The
construction procedure introduced in Section 2.2.1 is followed to interconnect the N
waveguides. The only difference is that at the last step, each intersection formed by the
(n+1)th group with any other waveguide group will only contain the two upper crossings
of a primitive checkerboard-shaped cell, namely a reduced primitive checkerboard-shaped
cell. As before, four types of N×N GWORs can be built based on the four types
of 4×4 GWORs. For odd-numbered GWORs, given the input-output pair Ii→Oj, the input
wavelength assignment C(i, j) is determined by

$$
C(i,j)=\begin{cases}
\lambda_{\mathrm{mod}(j-i,\,N)}, & i\neq j\\
\text{not applicable}, & \text{others}
\end{cases} \tag{5}
$$
Tab. 3 shows the wavelength assignment of the 5×5 GWOR. An MRR with the assigned
wavelength is placed at the waveguide intersections to direct light signals from Ii to Oj
when i and j satisfy i≠j and i+j≠N-1. Based on the wavelength assignment
tabulated in Tab. 3, a 5×5 GWOR is shown in Fig. 9.
Tab. 3. Wavelength assignment of MRR-based 5×5 GWOR router 
 O0 O1 O2 O3 O4 
I0 - λ1 λ2 λ3 λ4 
I1 λ4 - λ1 λ2 λ3 
I2 λ3 λ4 - λ1 λ2 
I3 λ2 λ3 λ4 - λ1 
I4 λ1 λ2 λ3 λ4 - 
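The odd-numbered assignment is simple enough to check directly. The Python sketch below assumes that Eqn. (5) reduces to λ with index mod(j-i, N) for i≠j, which reproduces Tab. 3; the helper needs_mrr encodes the placement condition i≠j and i+j≠N-1 stated above. Both function names are hypothetical.

```python
def gwor_odd_wavelength(i, j, n_ports):
    """Index k of the wavelength λk for the pair Ii -> Oj (odd N)."""
    if i == j:
        return None  # no self-communication
    return (j - i) % n_ports


def needs_mrr(i, j, n_ports):
    """An MRR is placed only for pairs that are not directly connected."""
    return i != j and i + j != n_ports - 1


N = 5
table = [[gwor_odd_wavelength(i, j, N) for j in range(N)] for i in range(N)]
mrr_pairs = sum(needs_mrr(i, j, N) for i in range(N) for j in range(N))
for row in table:
    print(row)
print(mrr_pairs)  # number of input-output pairs that need an MRR
```

For N=5 the count of pairs that need an MRR is 16, i.e., (N-1)², which is the MRR count stated for odd-numbered GWORs.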
[Figure 9: ports I0 to I4 and O0 to O4, with MRRs labeled λ1 to λ4.]
Fig. 9. Structure of Type I 5×5 GWOR. 
The properties of an N×N GWOR (where N=2n+1 and n≥2) are summarized below.
Lemma 3. For an N×N GWOR, the intersections of the (n+1) groups of
waveguides form n(n-1)/2 primitive checkerboard-shaped cells and n reduced
primitive checkerboard-shaped cells.
Proof: As shown in the construction rules of the N×N GWOR, and according to Lemma 1, the
intersections of the first n groups of waveguides form
$C_n^2 = C_{(N-1)/2}^2 = n(n-1)/2$ primitive
checkerboard-shaped cells, and the intersections of the (n+1)th group with the first n
groups form n reduced primitive checkerboard-shaped cells. ■
Proposition 3. The total number of MRRs used in an N×N GWOR (N=2n+1) is
(N-1)², and this is the minimum number for an N×N router.
Proof: As in Section 2.1, 8 MRRs are assigned to each primitive checkerboard-shaped
cell and 4 MRRs are assigned to each reduced primitive checkerboard-shaped cell. Based
on Lemma 3, a total of n(n-1)/2×8 + n×4 = 4n² = (N-1)² MRRs will be used.
Similar to the proof of Proposition 1, no self-communication is allowed in the
N×N GWOR, so there exist N(N-1) possible input-output pairs in it, among which
(N-1) pairs (Ii→ON-1-i, i≠(N-1)/2) are directly connected by waveguides. Hence, at least
N(N-1)-(N-1) = (N-1)² MRRs are needed to realize the routing of the remaining
(N-1)² indirectly connected pairs. Hence the proposition holds. ■
Lemma 4. The total number of different types of MRRs for an N×N GWOR is N-1,
and their corresponding wavelengths are λ1,…, λN-1.
Proof: Based on Eqn. (5), N-1 input wavelengths are needed for odd-numbered cases.
However, unlike the even-numbered cases, N-1 different types of MRRs are needed
on the intersections formed by the waveguide (In→On) with the other waveguides. ■
Proposition 4. The constructed N×N GWOR is non-blocking.
Proof: The proof is similar to that for Proposition 2. ■
2.3 Comparison and Analysis 
In this section, we compare the number of MRRs used and estimated optical power 
loss in GWORs against several existing typical router designs, including the matrix-based 
crossbar [41], the reduced crossbar [12], the hitless router [13], the WRON [42] [21], and 
the λ-router [19].  
The number of MRRs used in an N×N crossbar is N², which can be reduced to N(N-1)
in a reduced crossbar, WRON or λ-router (with even-numbered sizes only). For an N×N
GWOR, the number of MRRs used is N(N-2) (by Proposition 1 for N=2n and n≥2) or
(N-1)² (by Proposition 3 for N=2n+1 and n≥2). Tab. 4 shows the number of MRRs used in
GWORs and other routers of the same sizes. It can be seen that GWOR uses the fewest
MRRs among all the routers. For 4×4 routers, both the hitless router and GWOR use the
smallest number of MRRs (8 MRRs).
To evaluate the optical power loss experienced in all the routers listed in Tab. 4, the
power loss parameters given in [34] are adopted here: each MRR has a drop-loss of
1.5dB and a through-loss of 0.01dB, and the crossing loss and bending loss of 
waveguides are 0.05dB and 0.013dB, respectively. Therefore, for a given input-output 
pair (IiOj, where i, j= 0, 1, ..., N-1, and i≠j), the total power loss on the routing path can 
be estimated by:  
Ploss(i,j)=1.5×Ndrop+0.01×Nthrough+0.05×Ncrossing+0.013×Nbending  (6) 
where (i) Ndrop denotes the number of drop-losses, and it is determined by the number of
resonances made on the routing path, (ii) Nthrough denotes the number of through-losses,
and it is determined by the number of MRRs passed by the light signal, (iii) Ncrossing denotes
the number of waveguide crossings, and (iv) Nbending is the number of waveguide
bends. The average power loss is the arithmetic mean of all possible Ploss(i, j) in the
given router.
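Eqn. (6) can be evaluated with a few lines of Python. This is a minimal sketch: the four loss parameters are the ones quoted above from [34], while the path counts in the usage lines are made-up illustrative numbers and do not correspond to a specific path of any router in Tab. 5.

```python
# Loss parameters in dB, as adopted in the text from [34].
DROP_DB = 1.5        # per resonance (drop) in an MRR
THROUGH_DB = 0.01    # per MRR passed without resonance
CROSSING_DB = 0.05   # per waveguide crossing
BENDING_DB = 0.013   # per waveguide bend


def path_loss_db(n_drop, n_through, n_crossing, n_bending):
    """Estimated power loss of one routing path, following Eqn. (6)."""
    return (DROP_DB * n_drop + THROUGH_DB * n_through
            + CROSSING_DB * n_crossing + BENDING_DB * n_bending)


def max_and_average(losses):
    """Maximum and arithmetic-mean loss over a collection of paths."""
    losses = list(losses)
    return max(losses), sum(losses) / len(losses)


# Hypothetical paths: (drops, MRRs passed, crossings, bends).
example_paths = [(1, 2, 3, 0), (0, 4, 1, 0), (1, 1, 2, 1)]
print(max_and_average(path_loss_db(*p) for p in example_paths))
```

Because the drop-loss is two orders of magnitude larger than the other terms, a path is dominated by whether the signal is dropped by an MRR at all; directly connected pairs avoid that term entirely.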
Tab. 4. Number of MRRs used in different size routers
MRRs   Crossbar   Reduced Crossbar   Hitless Router   WRON   λ-router   GWOR
4×4    16         12                 8                12     12         8
5×5    25         20                 -                20     -          16
6×6    36         30                 -                30     30         24
7×7    49         42                 -                42     -          36
8×8    64         56                 -                56     56         48
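The entries of Tab. 4 follow from the closed-form counts given in the text, as the following Python sketch shows. The function name mrr_counts is hypothetical, and the hitless router is left out because the text gives only its 4×4 value and no general expression.

```python
def mrr_counts(n):
    """MRR counts of N x N routers from the closed forms quoted in the text."""
    counts = {
        "crossbar": n * n,
        "reduced crossbar": n * (n - 1),
        "WRON": n * (n - 1),
    }
    if n % 2 == 0:
        counts["lambda-router"] = n * (n - 1)  # even-numbered sizes only
    counts["GWOR"] = n * (n - 2) if n % 2 == 0 else (n - 1) ** 2
    return counts


for n in range(4, 9):
    c = mrr_counts(n)
    saved = c["reduced crossbar"] - c["GWOR"]
    print(f"{n}x{n}: {c}  (GWOR saves {saved} MRRs vs. reduced crossbar)")
```

Relative to a reduced crossbar, the saving works out to N MRRs for even N and N-1 MRRs for odd N, which is the number of pairs served by the direct waveguides.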
 
Tab. 5 compares the maximum and average power loss experienced by a light signal
travelling from an input port to an output port of these routers. GWOR has the lowest
average power loss for every router size listed. Its maximum power loss is the lowest for
the 4×4 router; for the larger sizes, WRON (and the λ-router, where available) has a
slightly lower maximum power loss.
Theoretically speaking, no tuning circuit is needed for the passive MRRs used in
GWORs. In practical applications, however, the resonance wavelength of fabricated MRRs
may be shifted from the desired values due to fabrication misalignment or ambient thermal
variation [23]. To compensate for this inherent fabrication imperfection of MRRs,
different methods can be used, including post-fabrication trimming techniques such as
electron beam trimming [25] and ultraviolet (UV) trimming [23], and doping the desired
waveguide regions with p-type or n-type dopants [25]. As pointed out in [43], MRRs are
extremely sensitive to temperature variation. A 1°C temperature change can shift
the resonance wavelength by as much as ~0.1 nm [43]. The thermal sensitivity of
MRRs can be reduced by proper design of waveguides and MRRs. In [43], a slotted
MRR upper-clad with polymethyl methacrylate (PMMA, which has the opposite thermo-optic
(TO) coefficient compared to silicon) has been introduced. The experimental
results in [43] show that the temperature dependence of a PMMA-slotted MRR is only 27
pm/°C compared with 91 pm/°C for a regular MRR. It is shown by simulation in [43] that
a zero thermal sensitivity condition can be achieved with a careful design in the future.
Tab. 5. Power loss estimation of different size routers
Power       Crossbar      Red. Crossbar  Hitless Router  WRON          λ-router      GWOR
Loss (dB)   Max   Avg     Max   Avg      Max   Avg       Max   Avg     Max   Avg     Max   Avg
4×4         1.86  1.68    1.84  1.67     1.73  1.16      1.71  1.29    1.71  1.30    1.64  1.09
5×5         1.98  1.74    1.98  1.73     -     -         1.78  1.43    -     -       1.79  1.37
6×6         2.10  1.80    2.10  1.79     -     -         1.85  1.55    1.85  1.55    1.93  1.40
7×7         2.32  1.86    3.32  1.85     -     -         1.92  1.71    -     -       2.07  1.59
8×8         2.44  1.92    2.44  1.91     -     -         1.99  1.81    1.99  1.81    2.21  1.65
 
In practice, TO or EO effect-based (p-i-n diode) tuning circuits are typically used to
compensate for the resonance shift caused by thermal variation. When TO tuning is
employed, the power consumption caused by tuning is analyzed below. In GWORs, MRRs of
different sizes are used and the channel spacing can be set large enough to avoid the overlap of
adjacent channels under the maximum temperature variation. As such, for a given level of
thermal variation, on each routing path of a GWOR, only those MRRs with the resonance
wavelength corresponding to the input wavelength (based on the routing wavelength
assignment) need to be tuned or detuned. For instance, in the 4×4 GWOR (Fig. 6), the
maximum and average numbers of MRRs that need to be tuned/detuned are 2 (for
example, on path I1→O3 in Fig. 6) and 4/3, respectively. Assuming a thermal tuning
efficiency of 0.91 nm/mW as reported in [14], one can see that the total tuning power needed
is very small.
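A rough order-of-magnitude estimate can be made from the two figures quoted above. The Python sketch below assumes a linear model in which the whole thermal shift (about 0.1 nm per 1°C [43]) is compensated at the tuning efficiency of 0.91 nm/mW [14]; the 10°C temperature excursion is an arbitrary illustrative value, not a figure from this work, and the function name is hypothetical.

```python
SHIFT_NM_PER_DEGC = 0.1   # resonance shift per 1 °C, upper figure quoted from [43]
TUNING_NM_PER_MW = 0.91   # thermal tuning efficiency quoted from [14]


def tuning_power_mw(delta_t_degc, n_mrrs):
    """Power needed to pull `n_mrrs` rings back by the thermally induced shift.

    Assumes the shift grows linearly with temperature and is fully compensated.
    """
    shift_nm = SHIFT_NM_PER_DEGC * abs(delta_t_degc)
    return n_mrrs * shift_nm / TUNING_NM_PER_MW


# Worst-case 4x4 GWOR path (2 MRRs) under an assumed 10 °C excursion.
print(round(tuning_power_mw(10, 2), 2), "mW")
```

Under these assumptions the worst-case path of a 4×4 GWOR needs on the order of a couple of milliwatts, and the estimate scales linearly with both the temperature excursion and the number of MRRs on the path.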
2.4 Summary 
In this chapter, a generic passive non-blocking router architecture is proposed, which 
can be scaled up from the basic 4×4 to any larger size. Routing in GWORs is realized by 
adopting different input wavelengths and MRRs with different geometries set by the pre-
assigned wavelengths. The wavelength assignment schemes are derived for GWORs with 
both even and odd numbered input/output ports. In essence, an N×N GWOR needs N-1
input wavelengths, and N(N-2) (for N=2n) or (N-1)² (for N=2n+1) MRRs for routing.
Compared with the existing non-blocking router designs, GWOR uses the fewest
MRRs and causes the lowest average power loss for a light signal traveling from an input
port to an output port. In addition, the passive nature of GWORs excludes the TO/EO tuning
circuits and associated power consumption needed by active optical routers. As
such, the proposed GWOR can serve as a building block for future high-performance,
power-efficient optical NoCs.
CHAPTER 3. REDUNDANT GWOR AND COMB SWITCHING GWOR
ARCHITECTURES
In this chapter, redundant GWOR (RGWOR) and comb-switching GWOR are
proposed to improve the bandwidth of GWOR. RGWOR can provide multiple routing paths
between each pair of input-output ports by cascading different types of GWORs. It also 
improves the fault tolerance of GWORs. 
Comb switches improve the bandwidth of optical network without changing the 
complexity of network topology. Comb switching GWOR is built by replacing the 
general MRRs with MRR-based comb switches. A method is proposed to design a 
minimal size comb MRR with given number of resonance wavelengths and effectively 
determine the resonance wavelength set.  
3.1 Redundant GWOR Structure 
As shown in Chapter 2, an N×N GWOR is capable of routing any combination of N
input-output pairs given N-1 input wavelengths. However, in this case, there is only
a single routing path between any input-output pair. Based on the basic GWOR structures,
in this chapter, we first propose a redundant GWOR (RGWOR) structure so that a set of
different wavelengths can be used to route data on multiple paths between any
input-output pair.
To construct an RGWOR that provides M (M≥2) routing paths between any indirectly
connected source-destination pair, Type I and Type II GWORs are used as the basic
building blocks (similarly, Type III and Type IV GWORs can be used when applicable).
Fig. 10 shows the structure of an N×N RGWOR with M routing paths between each
indirectly connected input-output pair, and this structure is designated as the N×N
M-RGWOR. The N×N M-RGWOR is constructed by overlaying M basic N×N GWORs; that
is, there are M different stages (referred to as stages 0, 1, 2, …, and M-1) in the N×N
M-RGWOR, among which the even stages are made of Type I N×N GWORs, while the odd
stages are made of Type II N×N GWORs. Alternatively, the N×N M-RGWOR can also be
built with Type II N×N GWORs at even stages and Type I N×N GWORs at odd stages.
As shown in Fig. 10, the N pairs of ports in an N×N GWOR are evenly divided into 
two groups. When N=2n, the first group contains ports (I0, O0), (I1, O1),…, (IN/2-1, ON/2-1), 
and the second group contains (IN/2, ON/2), (IN/2+1, ON/2+1), …, (IN-1, ON-1); when N=2n+1, 
the first group contains ports (I0, O0), (I1, O1),…, (I(N-1)/2-1, O(N-1)/2-1) and I(N-1)/2, while the 
second group contains O(N-1)/2, (I(N+1)/2, O(N+1)/2), (I(N+1)/2+1, O(N+1)/2+1), …, (IN-1, ON-1). For 
each GWOR, at stage k (0≤k≤M-1) of the N×N M-RGWOR, denote Ik,i as the i-th input port
and Ok,j as the j-th output port (0≤i≤N-1, 0≤j≤N-1). The adjacent GWORs are connected
according to the following rules:
1) For the GWOR at stage k (0<k<M-1), connect its first port group to the second port
group of the GWOR at stage k-1 using the waveguides Ik,i→Ok-1,j and Ok,i→Ik-1,j (where
j=N-1-i), and connect its second port group to the first port group of the GWOR at stage
k+1 using the waveguides Ik,i→Ok+1,j and Ok,i→Ik+1,j (where j=N-1-i). No intersections
of waveguides are allowed to form when connecting two adjacent GWORs.
2) For the GWOR at stage 0, only connect its second port group to the first port group of
the GWOR at stage 1. For the GWOR at stage M-1, only connect its first port group
to the second port group of the GWOR at stage M-2 in the same manner as described
in Rule 1).
[Figure 10: a chain of N×N GWORs from Stage 0 to Stage M-1, alternating Type I (Stage 0, Stage 2, ...) and Type II (Stage 1, ..., Stage M-1), with ports I0/O0 to IN-1/ON-1.]
Fig. 10. Schematic of an N×N M-RGWOR structure. 
Following the above connection rules, the waveguides connecting the adjacent 
GWORs will not intersect with other waveguides. This property can be seen from an 
example of a 4×4 4-RGWOR as shown in Fig. 11. 
[Figure 11: four 4×4 GWORs, Stage 0 to Stage 3, with ports I0 to I3 and O0 to O3 and MRRs labeled λ1/λ2, λ4/λ5, λ7/λ8 and λ10/λ11.]
Fig. 11. Structure of the 4×4 4-RGWOR. 
The wavelength assignment of an RGWOR is determined by three factors: the input
port number, the output port number, and the stage that the GWOR belongs to. For an
N×N M-RGWOR, given the input-output pair Ii→Oj, the input wavelength which
makes a turn in the GWOR at stage k (0≤k≤M-1) is determined (in terms of wavelength
indices) by:
$$C_k(i,j) = k(N-1) + C_0(i,j) \tag{6}$$
where k is the stage of the GWOR, and C0(i,j) is the input wavelength assigned in the N×N
GWOR given Ii→Oj (by Eqn. (4) when N is even or Eqn. (5) when N is odd). The wavelength
assignments of the 4×4 4-RGWOR and 5×5 4-RGWOR are shown in Tab. 6 and Tab. 7,
respectively.
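The stage offset rule can be illustrated with a short Python sketch. It treats wavelengths by their index, takes the base index C0(i,j) of the single-stage GWOR as an input, and applies the offset k(N-1) for each stage; the function name rgwor_wavelengths is hypothetical.

```python
def rgwor_wavelengths(base_index, n_ports, n_stages):
    """Wavelength indices usable for one input-output pair in an N x N M-RGWOR.

    base_index is C0(i, j) of the basic GWOR; stage k adds k * (N - 1).
    """
    return [k * (n_ports - 1) + base_index for k in range(n_stages)]


# 4x4 4-RGWOR: the pair whose base wavelength is λ1.
print(rgwor_wavelengths(1, 4, 4))
# 5x5 4-RGWOR: the pair whose base wavelength is λ4.
print(rgwor_wavelengths(4, 5, 4))
```

For the 4×4 4-RGWOR, a base wavelength λ1 expands to λ1, λ4, λ7 and λ10, and for the 5×5 4-RGWOR a base wavelength λ4 expands to λ4, λ8, λ12 and λ16, so an N×N M-RGWOR uses M(N-1) wavelengths in total.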
Tab. 6. Routing wavelength assignment of the 4×4 4-RGWORs 
 O0 O1 O2 O3 
I0 -  λ1, λ4, λ7, λ10 λ2, λ5,λ8, λ11 λ3, λ6, λ9, λ12 
I1 λ1, λ4, λ7, λ10 - λ3, λ6, λ9, λ12 λ2, λ5,λ8, λ11 
I2 λ2, λ5,λ8, λ11 λ3, λ6, λ9, λ12 - λ1, λ4, λ7, λ10 
I3 λ3, λ6, λ9, λ12 λ2, λ5,λ8, λ11 λ1, λ4, λ7, λ10 - 
Tab. 7. Routing wavelength assignment of the 5×5 4-RGWORs 
 O0 O1 O2 O3 O4 
I0 - λ1, λ5, λ9, λ13 λ2, λ6, λ10, λ14 λ3, λ7, λ11, λ15 λ4, λ8, λ12, λ16 
I1 λ4, λ8, λ12, λ16 - λ1, λ5, λ9, λ13 λ2, λ6, λ10, λ14 λ3, λ7, λ11, λ15 
I2 λ3, λ7, λ11, λ15 λ4, λ8, λ12, λ16 - λ1, λ5, λ9, λ13 λ2, λ6, λ10, λ14 
I3 λ2, λ6, λ10, λ14 λ3, λ7, λ11, λ15 λ4, λ8, λ12, λ16 - λ1, λ5, λ9, λ13 
I4 λ1, λ5, λ9, λ13 λ2, λ6, λ10, λ14 λ3, λ7, λ11, λ15 λ4, λ8, λ12, λ16 - 
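As an illustration of Eqn. (6), the following Python sketch regenerates the entries of Tab. 6, reading the equation as C_k(i,j) = k(N-1) + C_0(i,j). The base assignment C_0 is copied from the table of the general 4×4 GWOR (Tab. 8) rather than computed from Eqns. (3) and (4); the names `C0_4X4` and `rgwor_wavelengths` are illustrative and do not come from the dissertation.

```python
# Base (single-stage) assignment C0(i, j) of a 4x4 GWOR, copied from Tab. 8.
# None marks the diagonal (no self-communication).
C0_4X4 = [
    [None, 1, 2, 3],
    [1, None, 3, 2],
    [2, 3, None, 1],
    [3, 2, 1, None],
]


def rgwor_wavelengths(c0, num_stages):
    """For every pair (i, j), list the wavelength indices used at stages 0..M-1."""
    n = len(c0)
    table = []
    for i in range(n):
        row = []
        for j in range(n):
            if c0[i][j] is None:
                row.append(None)
            else:
                # Assumed reading of Eqn. (6): C_k = k * (N - 1) + C_0
                row.append([k * (n - 1) + c0[i][j] for k in range(num_stages)])
        table.append(row)
    return table


if __name__ == "__main__":
    for i, row in enumerate(rgwor_wavelengths(C0_4X4, 4)):
        print(f"I{i}:", row)
```

Each inner list holds the M wavelength indices of one input-output pair, one per stage, which is the redundancy that Lemma 5 relies on.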
 
Properties of an 𝑁 × 𝑁 M-RGWOR are summarized below. 
Lemma 5. For an N×N M-RGWOR, given the input-output pair IiOj (where i≠j,
and i+j≠N-1), there exist M routing paths when using M different input
wavelengths.
Proof: According to the construction rules of an N×N M-RGWOR, there is no
waveguide intersection between any two adjacent GWORs. As such, in the GWOR at
each stage of an N×N M-RGWOR, each waveguide has one and only one intersection
with each of the waveguides in the other waveguide groups. Therefore, in total M intersections
are formed between one waveguide and any waveguide in the other waveguide groups. At
each such intersection, the MRR corresponding to the input wavelength determined by
Eqn. (6) is assigned to a given input-output pair IiOj (where i≠j, and i+j≠N-1). In total,
there are M distinct MRRs assigned, and each of these MRRs introduces a routing path
from Ii to Oj. Note that the M logical routing paths may share some waveguides. Hence,
the Lemma holds. ■
Proposition 5. The N×N M-RGWOR is non-blocking. 
Proof: The basic N×N GWOR is non-blocking (propositions 2 and 4). According to 
Lemma 5, from any input port of an N×N M-RGWOR, all the other (N-1) outputs are 
reachable with M routing paths (assuming no self-communication is allowed). Following 
the wavelength assignment governed by Eqn. (6), one can see that the light signals 
travelling between different input-output pairs through the same waveguide do not 
conflict with each other because these signals use different wavelengths. Hence the 
proposition holds. ■ 
Compared with GWOR, RGWOR can support higher bandwidth by having more than
one input wavelength available for routing light signals between any single input-output
pair. Another distinct advantage of RGWOR is its fault tolerance, which comes from providing
multiple routing paths between any input-output pair. When one of the routing paths
between an input-output pair fails due to the malfunction of an MRR, the remaining
paths can still maintain the connectivity between the input and output ports.
3.2 ONoC Built with Comb Switches 
Redundant GWOR expands the bandwidth of GWOR by cascading GWORs of different
types, which, however, at the same time increases the complexity of the architecture
and the cost of implementation. Comb-switching technology provides another way of
expanding the bandwidth of an ONoC by utilizing comb switch MRRs.
Without loss of generality, we will use the Generic Wavelength-routed Optical Router
(GWOR) [6] as an example to describe how to use comb switches to construct a WDM-enabled
wavelength-routed ONoC. Fig. 12 shows a 4×4 comb switch MRR-based
GWOR and its counterpart with general MRRs. To improve the bandwidth of an N×N
GWOR by M times, we can replace the MRRs with comb MRRs, each of which
resonates at exactly the M wavelengths in a set Λi, where |Λi|=M and Λi∩Λj=Ø for i≠j. We
denote this type of GWOR as M-GWOR. Fig. 12(b) shows the 4×4 4-GWOR structure.
The routing wavelength assignments of the general 4×4 GWOR and the comb switch 4×4
4-GWOR are given in Tab. 8 and Tab. 9, respectively. It can be seen that the wavelength set
of the comb switch GWOR is a superset of that of the general GWOR.
[Figure 12 diagrams: (a) 4×4 GWOR with MRRs tuned to λ1 and λ2; (b) 4×4 4-GWOR with comb MRRs resonating the sets Λ1 and Λ2.]
      (a)  4×4 General GWOR            (b)  Comb-switched 4×4 4-GWOR 
Fig. 12. Structures of basic/WDM-enabled GWOR. 
Tab. 8. Wavelength assignment of 4×4 general MRR-based GWOR 
 O0 O1 O2 O3 
I0 -  λ1 λ2 λ3 
I1 λ1 - λ3 λ2 
I2 λ2 λ3 - λ1 
I3 λ3 λ2 λ1 - 
Note: "-" stands for not applicable. 
Tab. 9. Wavelength assignment of 4×4 4-GWOR 
 O0 O1 O2 O3 
I0 -  λ1, λ4, λ7, λ10 λ2, λ5,λ8, λ11 λ3, λ6, λ9, λ12 
I1 λ1, λ4, λ7, λ10 - λ3, λ6, λ9, λ12 λ2, λ5,λ8, λ11 
I2 λ2, λ5,λ8, λ11 λ3, λ6, λ9, λ12 - λ1, λ4, λ7, λ10 
I3 λ3, λ6, λ9, λ12 λ2, λ5,λ8, λ11 λ1, λ4, λ7, λ10 - 
 
Noticeably, with the same routing wavelength assignment but much simpler structure, 
an M-GWOR can achieve the same bandwidth improvement as the same size redundant 
GWOR (RGWOR) does. In general, the number of MRRs used in an N×N M-GWOR is 
reduced by (M-1)/M compared with that of an N×N M-RGWOR. Similarly, using comb 
switches, the bandwidth of other wavelength-routed ONoC can be improved without 
changing the network structure.  
3.3 Determination of Comb MRR Size 
In designing an ONoC with MRR-based comb switches, it is preferred to minimize 
the size of comb switches [12]. Next we will show how to determine the minimal size 
comb MRR for a given number of resonance wavelengths.  
Assume the spectrum range of input wavelengths λ0, λ1, …, λW-1 is [γ0, γW-1], where γi= 
γ0+i∆γ, 0≤i≤W-1, W is the total number of available channels, and Δγ is the constant 
channel spacing. Assume each comb MRR is capable of resonating M (M≤W) different 
wavelengths in the given spectrum range. The problem is to find out such a set of M 
resonance wavelengths with a minimal size comb MRR (i. e. with minimal cavity L). A 
heuristic method is proposed to solve this problem. 
Based on Eqn. (2), the M wavelengths satisfy:
  $\gamma = \frac{c}{\lambda} = \frac{m c}{n_{eff} L}$    (7)
where c is the velocity of light and m is a positive integer. Let γr be the reference
frequency, i.e. the smallest frequency in the set of comb MRR resonance wavelengths, and
let mr be the corresponding integer satisfying Eqn. (7), i.e. $L = m_r c / (n_{eff} \gamma_r)$,
which shows that once γr is determined, the comb MRR size is determined by the integer mr
only. The smaller mr is, the smaller the MRR size L is. As such, choosing the smallest
feasible integer mr minimizes the size of the comb MRR (L).
For any other frequency γi in the resonance wavelength set of the comb switch, let γi =
γr+kΔγ and let the corresponding integer satisfying Eqn. (7) be mi=mr+n, where Δγ is the
channel spacing and k, n are positive integers (k≤W, n≥1). As both <γi, mi> and <γr, mr>
satisfy Eqn. (7), dividing the two instances gives (γr+kΔγ)/γr = (mr+n)/mr, from which we
can derive that
  $\frac{\gamma_r}{\Delta\gamma} = \frac{m_r k}{n}$    (8)
where γr/Δγ is an integer whose value is determined once γr is chosen. Let the integer S=
γr/Δγ; from Eqn. (8), we know that mr and k/n must be divisors of S. As k≤W and n≥1, we
have S/mr = k/n ≤ W and thus mr ≥ S/W. The following procedure is developed to choose
values for mr, k and n:
 
Find (Ʌ, L); // find the routing wavelength set Ʌ and the corresponding minimal size of the comb MRR
Input: (1) γ0; // lowest frequency of the given spectrum range
(2) Δγ; // channel spacing of the given spectrum range
(3) W; // total number of channels of the given spectrum range
(4) M; // number of resonance wavelengths of the comb MRR
Output: (1) Ʌ={γr, …, γr+M-1}; // a set of M routing wavelengths of the comb MRR
(2) L; // comb MRR size
Procedure body:
{
  Let γr=γ0, Ʌ=Ø, L=0, S=γr/Δγ;
  do {
    while (S is a prime) {
      γr=γr+Δγ;
      S=γr/Δγ;
    }
    Find the set of factors (ɸ) of S: ɸ={a0, a1, …, ap-1}, where integer p≥1, 1<ai<S,
    and ai<aj for 0≤i<j<p;
    Let amin=min{ai | ai≥S(M-1)/(W-1), ai ϵ ɸ, 0≤i<p};
    if (amin exists) {
      Let mr=amin;
      Ʌ={γr}; // the reference frequency is the first member of the set
      Let n=1, num=1;
      do {
        k=nS/mr;
        γ=γr+kΔγ;
        Ʌ=Ʌ U {γ};
        num=num+1;
        n=n+1;
      } while (num<M); // M-1 further wavelengths are added
      Calculate L=c·mr/(neff·γr);
      Return (Ʌ, L);
    } else {
      γr=γr+Δγ;
      S=γr/Δγ;
    }
  } while (Ʌ=Ø);
}

Example: Determine the size of a comb switch for a 4×4 4-GWOR (i.e., M=4) in the
spectrum range γ: 192.10~195.90 THz (corresponding to the near-infrared wavelength band λ:
1530~1560 nm, with ∆γ=100 GHz and W=39).
Step 1: Let γr=192.6 THz, then S=γr/∆γ=1926=2×3×3×107, and ɸ={2, 3, 6, 9, 18, 107, 214,
321, 642, 963};
Step 2: Calculate (M-1)S/(W-1)≈152; the smallest factor of S that is greater than
(M-1)S/(W-1) is 214.
Step 3: Let mr=214, so that k/n=9. For n=1, 2, 3, 4, …, k=9, 18, 27, 36, …, which
shows that M-1=3 pairs of <k, n> (<9, 1>, <18, 2>, and <27, 3>) can be found, and thus
M-1 more wavelengths can be found in the given band in addition to the reference
wavelength, so a minimal size comb MRR can be created for the selected wavelength
set. (The next pair, <36, 4>, corresponds to 196.2 THz, which lies above the 195.90 THz
band edge.)
Step 4: The resonance wavelengths are calculated and listed in Tab. 10. Then the comb
MRR size (L) can be determined by Eqn. (7).
Tab. 10. Example of resonance wavelength set for selected reference frequency. 
i mi ni ki γi (THz) λi(nm) 
0 214 - - 192.6 1556.55 
1 215 1 9 193.5 1549.32 
2 216 2 18 194.4 1542.14 
3 217 3 27 195.3 1535.04 
4 218 4 36 196.2 1527.99 
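The worked example can be checked with a short Python sketch of the selection step for one given reference frequency. Frequencies are kept as integers in GHz to avoid floating-point rounding. The function names are hypothetical, and the sketch covers only the factor search and the construction of the wavelength set, not the outer loop over reference frequencies or the computation of L, which needs n_eff.

```python
from fractions import Fraction

C_M_PER_S = 299_792_458  # speed of light in vacuum (m/s)


def proper_factors(s):
    """Factors a of s with 1 < a < s, in ascending order."""
    return [a for a in range(2, s) if s % a == 0]


def comb_set(gamma_r_ghz, delta_ghz, m_wavelengths, w_channels):
    """Return (m_r, frequencies in GHz) for one reference frequency, or None."""
    s, remainder = divmod(gamma_r_ghz, delta_ghz)
    if remainder:
        raise ValueError("gamma_r must be a multiple of the channel spacing")
    # Lower bound on m_r used in the procedure: S(M-1)/(W-1)
    threshold = Fraction(s * (m_wavelengths - 1), w_channels - 1)
    candidates = [a for a in proper_factors(s) if a >= threshold]
    if not candidates:
        return None  # the procedure would try the next reference frequency
    m_r = min(candidates)
    k_step = s // m_r  # the ratio k/n
    # Reference frequency plus M-1 further resonances, k_step channels apart.
    freqs = [gamma_r_ghz + n * k_step * delta_ghz for n in range(m_wavelengths)]
    return m_r, freqs


def wavelength_nm(freq_ghz):
    """Vacuum wavelength in nm: (m/s) / GHz equals nm."""
    return C_M_PER_S / freq_ghz


if __name__ == "__main__":
    m_r, freqs = comb_set(192_600, 100, 4, 39)
    print(m_r, [(f / 1000, round(wavelength_nm(f), 2)) for f in freqs])
```

With γr = 192.6 THz the sketch selects mr = 214 and the frequencies 192.6, 193.5, 194.4, and 195.3 THz, matching rows 0 to 3 of Tab. 10. Like the written procedure, it does not check the upper band edge explicitly.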
 
3.4 Summary 
In this chapter, redundant GWOR is introduced to improve the bandwidth and fault
tolerance of basic GWOR. Comb switches can be used to improve the bandwidth of any
MRR-based wavelength-routed optical NoC without changing the basic network
structure. Comb MRR-based GWOR is introduced as an example of comb switching
optical NoCs. Given the number of resonance wavelengths, the minimal size comb MRR
can be determined with the proposed method.
CHAPTER 4. HYBRID OPTOELECTRONIC NETWORK-ON-CHIP 
ARCHITECTURE 
As stated in Chapter 1, hybrid optoelectronic NoC (HONoC) holds promise for
next generation many-core systems. In this chapter, a hybrid optoelectronic NoC
architecture is proposed to efficiently utilize the data bandwidth offered by the proposed
optical network architecture for global communication while taking advantage of the
flexibility of electronic networks for local communication. As evaluated in [45], a
hierarchical NoC is advantageous in supporting localized traffic patterns, which are
commonly observed in real applications [39] [46]. In particular, we consider tree-based
networks because the 3D layout of a tree-based network does not require a change of
node degrees compared to its 2D counterpart, whereas other topologies such as mesh-based
networks need to increase the node degrees when moving from a 2D to a 3D architecture [39].
The proposed hybrid NoC is built on a butterfly fat tree structure [40, 46], where GWORs are
used as the root nodes while electronic routers are used at the intermediate level nodes.
4.1 HONoC Architecture 
Compared with the generic fat tree (also known as SPIN [39]), a butterfly fat tree
uses fewer parent nodes. Fig. 13 (a) and (b) illustrate a 16-core
[46] and a 64-core [40] 4-ary electronic butterfly fat tree (BFT) NoC structure,
respectively. In these BFT networks, there are two types of nodes: 1) processing cores,
which are the leaves of the BFT placed at level 0, and 2) routers, which are placed at the
intermediate and root levels. For 4-ary BFTs, each parent node has four child nodes while
each child node has two parent nodes. Each pair of a parent node and a child node is
connected by one link. Starting from the leaf level to the root level, the levels are labeled
as level 0, level 1, …, and so on. For an N-core BFT network, the number of routers located
at the ith level is thus given by $N/2^{i+1}$ ($0 < i \le \log_2 N^{1/2}$).
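A minimal Python sketch of this count, assuming N is a power of 4 so that log2(N)/2 is an integer; the function name is illustrative. The totals it produces (6, 28, and 120 routers) agree with the BFT column of Tab. 12.

```python
def bft_routers_per_level(n_cores):
    """Routers at levels 1..log2(N)/2 of a 4-ary BFT: N / 2**(i+1) at level i."""
    top_level = (n_cores.bit_length() - 1) // 2  # log2(N)/2 for N a power of 4
    return [n_cores // 2 ** (i + 1) for i in range(1, top_level + 1)]


if __name__ == "__main__":
    for n in (16, 64, 256):
        per_level = bft_routers_per_level(n)
        print(n, per_level, sum(per_level))
```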
(a) 16-core (levels 0 to 2)          (b) 64-core (levels 0 to 3)
Fig. 13. Butterfly fat tree structure. 
In the proposed hybrid BFT-based optoelectronic NoC, GWORs are used as the root
level routers and electronic routers are used at the intermediate levels.
Considering the limitation of available on-chip lasers, only 4×4 GWORs are considered.
The details of different hybrid NoCs are presented in the following sections.
4.1.1 Architecture of 16-Core BFT-GWOR-Based HONoC  
The architecture of the 16-core BFT-based hybrid optoelectronic NoC is shown in 
Fig. 14, which is built from the 16-core BFT topology shown in Fig. 13(a). The 16 cores 
are divided into 4 clusters and each cluster is connected by a cluster router as in a 
concentrated mesh (CMesh) [46]. Instead of using two electronic routers, all four clusters 
are connected by one 4×4 GWOR. In this hybrid NoC, the inter-cluster communication is 
done through the GWOR-based optical network, while the local, intra-cluster 
communication is handled by the electronic routers. 
[Figure 14 diagram: four clusters, each with a 7×7 hybrid router, interconnected by one 4×4 GWOR.]
Fig. 14. Architecture of 16-core HONoC. 
Consider its electronic counterpart, where each cluster router at level 1 has six links:
four down to its child nodes and two up to its parent nodes. In comparison, the top two
levels of the hybrid BFT-based HONoC are simpler because: 1) only one GWOR is used at the
top level and 2) each cluster router connects to the GWOR through one optical link,
though it still has four links down to its child nodes. Each cluster router thus functions as
the interface (referred to as the hybrid router) between the optical layer and the electronic
layer. As such, the number of electronic routers is reduced to 4 and the number of
electrical links is also reduced. From Chapter 3, we know that using the WDM
technology, each optical link of a 4×4 GWOR can support three bidirectional optical
channels (each on a different wavelength) for each cluster. Assuming that each optical
channel operates at the same bandwidth as an electronic link, the total bandwidth
supported by one 4×4 GWOR (4×3=12 channels) is 1.5 times that of the root level of its
electronic counterpart in Fig. 13(a) (8 links between level 1 and level 2).
4.1.2 Architecture of 64-Core BFT-GWOR-Based HONoC  
Fig. 15 shows the architecture of the 64-core BFT-based HONoC. Compared with the
electronic BFT in Fig. 13(b), the nodes at level 0 and level 1 of the 64-core HONoC are
the same as those in its electronic counterpart. What makes the HONoC different is that:
1) two 4×4 GWORs are used to replace the four routers at the root level of the BFT; 2) at
level 2, the eight electronic routers are replaced by eight hybrid routers like those used in
the 16-core HONoC. Compared with its electronic counterpart, in the 64-core BFT-based
HONoC, the number of electronic routers and the number of links are both reduced while the
total bandwidth supported at the root level is 1.5 times that of the electronic BFT.
[Figure 15 diagram: two 4×4 GWORs at the root level above four groups; legend: 7×7 hybrid router, 6×6 electronic router, processing core.]
Fig. 15. Architecture of 64-core HONoC. 
We define each electrically connected part as one group. In total there are four groups,
and each group is like a 16-core BFT. The intra-group communication is handled by the
electronic routers inside each group. The inter-group communication (also referred to as
global communication) is handled by the two GWORs.
4.1.3 Architecture of 256-Core BFT-GWOR-Based HONoC  
Following the rules described in [40], a 256-core BFT needs one more level and eight
more routers than the 64-core BFT. In the hybrid 256-core HONoC, four 4×4 GWORs
are used to replace the eight routers at level 4 of the electronic BFT while keeping the
structure from level 0 to level 2 unchanged. Routers at level 3 are the hybrid routers used in
the 16-core and 64-core HONoCs. Fig. 16 illustrates the 256-core BFT-GWOR-based
HONoC architecture.
Similarly, the 256 cores are divided into four groups and each group is a 64-core
electronically connected BFT. Intra-group communication is handled by the electronic
routers inside each group. Inter-group communication is handled by the GWORs at the
root level. In each group, the 64 cores are further divided into 4 sections; each section
consists of a directly connected 16-core BFT.
[Figure 16 diagram: four 4×4 GWORs at the root level above four groups, each divided into sections; legend: 7×7 hybrid router, 6×6 electronic router, processing core.]
Fig. 16. Architecture of 256-core HONoC. 
Following the methods described above, a BFT-GWOR-based HONoC can be constructed for
any $N = 2^n$ ($n \ge 4$) cores. In the resulting N-core HONoC, the number of levels is
(L+1) where L=┌log4N┐, the number of routers at each intermediate level i is
$N/2^{i+1}$ ($0 < i < L$), and the number of GWORs used at the root level is $N/2^{L+2}$.
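These expressions can be evaluated with a few lines of Python. The sketch below assumes the reading N/2^(i+1) routers at intermediate level i (0<i<L) and N/2^(L+2) GWORs at the root; the helper names are hypothetical. For 16, 64, and 256 cores it gives 4, 24, and 112 intermediate-level routers and 1, 2, and 4 GWORs, in agreement with Tab. 11.

```python
def ceil_log4(n):
    """Smallest integer L with 4**L >= n, computed without floating point."""
    level = 0
    while 4 ** level < n:
        level += 1
    return level


def honoc_counts(n_cores):
    """Levels, routers per intermediate level, and root GWORs of an N-core HONoC."""
    top = ceil_log4(n_cores)  # L, the index of the root level
    routers = [n_cores // 2 ** (i + 1) for i in range(1, top)]  # levels 1..L-1
    gwors = n_cores // 2 ** (top + 2)
    return {"levels": top + 1, "routers_per_level": routers, "gwors": gwors}


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, honoc_counts(n))
```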
4.1.4 Properties of HONoC Architecture  
The properties of the proposed HONoC architectures are summarized and compared
with those of electronic BFT, Mesh and CMesh architectures. Tab. 11 lists the number of
routers and links used in HONoCs of different sizes. Tab. 12 compares
HONoC with other architectures. It can be seen that HONoC simplifies the electronic
BFT architectures by reducing the number of electronic routers and the number of
electronic links. Tree-based NoCs use fewer routers and
links than their Mesh counterparts, but more than their CMesh counterparts. Tab. 13 compares
the average number of hops of HONoCs and other architectures. It can be seen that in larger
networks, data traverse fewer hops on average in tree-based NoCs than in their Mesh
and CMesh counterparts. HONoC has the lowest average number of electronic hops among
these topologies. Tab. 14 lists the node degrees of the four types of topologies.
Tab. 11. Number of routers and links used by HONoCs
            Number of Routers         Number of Links
            Electronic  Photonic      Electronic  Photonic
16-core     4           1             16          4
64-core     24          2             96          8
256-core    112         4             448         16
Tab. 12. Comparison of electronic routers and electronic links in different NoCs
            Number of Routers              Number of Links
            Mesh  CMesh  BFT  HONoC        Mesh  CMesh  BFT  HONoC
16-core     16    4      6    4            40    20     24   16
64-core     64    16     28   24           176   88     112  96
256-core    256   64     120  112          736   368    480  448
Tab. 13. Comparison of average number of hops in different NoCs
            Mesh    CMesh   BFT    HONoC (electronic)  HONoC (photonic)
16-core     4.2     2.07    2.6    1.8                 0.8
64-core     8.11    4.05    4.4    3.67                0.76
256-core    16.05   8.027   6.36   5.54                0.75
Tab. 14. Comparison of node degree of routers in different NoCs  
 
Mesh CMesh BFT HONoC 
16-core 5 6 6 7 
64-core 5 6 or 8 6 6 or 7 
256-core 5 6 or 8 6 6 or 7 
 
4.2 HONoC Routers  
In the above proposed hybrid NoCs, there are three types of routers: 1) 4-port optical 
router (i.e., 4×4 GWOR); 2) 6-port electronic router; 3) 7-port hybrid router. In the 
following, we will describe the architectures of the 6-port electronic and 7-port hybrid 
routers.  
Wormhole switching is adopted for all routers. Each packet is divided into fixed 
length flow control units, called flits. There are three types of flits: head, body, and tail. 
The head flit of a packet contains the routing information (source id, destination id etc.), 
as shown in Fig. 17. Once receiving the head flit of a packet, routers check the routing 
information and compute the data path for the packet. The router will keep the path for 
the packet until its tail flit leaves the router. 
Head_Flit
Type Src_ID Dst_ID Time_Stamp Data
 
Fig. 17. Head flit format. 
4.2.1 Electronic Router  
Fig. 18 shows the architecture of the electronic router used in the HONoC. All ports
are bidirectional (with input ports denoted as I0~I5 and output ports O0~O5). The first four
ports I0/O0~I3/O3 are connected with its four child nodes and the remaining two ports
I4/O4~I5/O5 are connected with its parent nodes. FIFO buffers are used in the input
ports for data buffering and virtual channels (VCs) are used to improve channel
efficiency. Each input port has a VC allocator to allocate incoming data and to arbitrate
which VC can send data. The switch allocator is used to resolve possible conflicts at the
output side: for each output port, if there are multiple simultaneous requests from
different input ports, the switch allocator arbitrates which one can use the port. In each
cycle, only one request can be granted for each output port. A 6×6 crossbar is used for
switching flits from input ports to output ports.
[Figure 18 block diagram: input ports I0 to I5, each with routing computation and a VC allocator; a switch allocator; a 6×6 crossbar; output ports O0 to O5.]
Fig. 18. Six bidirectional port electrical router architecture. 
4.2.2 Hybrid Electrical-Optical Router  
Fig. 19 shows the architecture of the hybrid routers. Hybrid routers are the interface
between the electronic network and the optical network of the HONoC. Each hybrid router has
four electronic links connecting to its child nodes (electronic routers or processing cores)
and one optical link connecting to a GWOR. As three optical channels share the
optical link, three ports are needed for routing the optical data (ports 4 to 6).
[Figure 19 block diagram: electronic input ports I0 to I3 and optical input ports I4 to I6 (GWOR link, DEMUX, O/E, S/P, on λ1 to λ3), each with buffer, routing computation, and VC allocator; a switch allocator; a 7×7 crossbar; electronic output ports O0 to O3 and optical output ports O4 to O6 (P/S, E/O, MUX, GWOR link, on λ1 to λ3).]
Fig. 19. Four electrical three optical bidirectional hybrid router architecture. 
At the hybrid router, data in electronic signals need to be converted into optical
signals (E/O) and routed by the GWOR to the destination group, where the optical signals
are converted back to electronic signals (O/E). As three optical channels share the
optical link, three separate ports are used for routing purposes. Thus there are
seven ports in total. Each optical channel is associated with a set of interface circuits,
i.e. the O/E circuit for converting optical signals into electronic signals (including a
DEMUX for wavelength decoding and a deserializer for serial to parallel (S/P) conversion)
at the input side, and the E/O circuit for converting electronic signals into optical signals
(including a MUX for coupling different wavelength signals and a serializer for parallel to
serial (P/S) conversion) at the output side. A 7×7 crossbar is used as the switching component.
Similar to the electronic routers, VCs are used at each input port for data buffering. The
routing strategy is different for electronic input channels and optical input channels: data
from each electronic input channel may be routed to any one of the 7 output channels, while
data from each optical channel can only be routed to one of the four electronic output
channels.
4.3 Routing Algorithms  
An N-core HONoC ($N = 2^n$ and $n \ge 4$) is an (L+1)-level hierarchical structure,
where L=┌log4N┐. The levels are labeled as level 0, level 1, …, level
L from leaves (lowest) to root (top), and the labeling of nodes is introduced as follows:
• The N cores (processing elements, abbreviated as PE) located at level 0 are
labeled as PE0, …, PEN-1 from left to right.
• At the intermediate levels, there are $N/2^{i+1}$ routers at level i (0<i≤L-1). These
routers are labeled as Ri,m ($m = 0, 1, \ldots, N/2^{i+1}-1$ from left to right). Except for
level (L-1), the routers on intermediate levels are all electronic routers as shown in Fig. 18.
In HONoCs, level (L-1) is the interface between the electronic network and the optical
network. Routers used at this level are hybrid routers (HRs) as shown in Fig. 19. In some
cases there are no electronic routers; for example, in the 16-core HONoC (Fig. 14), the only
intermediate level is the interface between the electronic network and the optical network,
so only hybrid routers are used in this structure.
• At the root level L, there are $M = N/2^{L+2}$ optical routers (GWORs), which are
labeled as OR0, …, ORM-1 from left to right.
As the GWOR routing algorithm has been introduced in Chapter 2, only the algorithms for
the intermediate level routers are presented here.
4.3.1 Routing Algorithm of Electronic Router 
The 6×6 electronic routers are used at the intermediate levels from level 1 up to, but not
including, the hybrid router level (L-1). To forward a new packet, an electronic router needs
to find out through which output port/channel the packet can reach its destination. If there
are multiple paths, arbitration is needed. As the routing information of the packet is
contained in the head flit, routing is performed only for head flits and the router keeps the
routing path for the packet until its tail flit leaves the router. The algorithm for this type
of router is:
 
Find(Oindex): // find the output port index for a given head flit
Input: (1) N; // number of cores, N=2^n
(2) R(i, m); // current router: i is the index of the level where the router is located and
m is the index of the router at level i, counted from left to right
(3) flit; // a head flit to be routed
Output: Oindex; // the output port index that the packet will be forwarded to
Define constants:
(1) num_local=4; // number of child nodes, index 0~3
(2) num_external=2; // number of parent nodes, index 4~5
(3) Total number of groups=4;
Procedure body:
{
  let L=┌log4N┐; // total number of levels of router nodes in the N-core network
  let Si=N/4^i; // total number of sections at level i
  let Mi=(N/2^(i+1))/Si; // total number of routers in each section at the current level i
  let curr_sect_id=└m/Mi┘; // index of the section that the current router belongs to
  let dst_sect_id=└flit.dst_id/4^i┘; // index of the ancestor section of dst_id at level i
  if (dst_sect_id == curr_sect_id) {
    Oindex=└flit.dst_id/4^(i-1)┘ mod 4; // one of the child nodes of the current router
  }
  else {
    Oindex=num_local+randInt(0, num_external-1); // one of the parent nodes of the current router
  }
}

4.3.2 Routing Algorithm of Hybrid Router
The routing algorithm of the hybrid router is similar to that of the electronic router. The
difference is that in hybrid routers, the packets received from optical channels are only
forwarded to the child nodes of the hybrid router.
Find(Oindex): // find the output port index for a given head flit
Input: (1) N; // number of cores, N=2^n
(2) R(L-1, m); // current hybrid router: m is the index of the hybrid router at level L-1,
counted from left to right
(3) flit; // a head flit to be routed
Output: Oindex; // the output port index that the packet will be forwarded to
Define constants:
(1) num_elec=4; // total number of electrical ports, index 0~3
(2) num_opt=3; // total number of optical ports, index 4~6
(3) Total number of groups=4;
Procedure body:
{
  let L=┌log4N┐; // number of levels of router nodes
  let S=N/4^(L-1); // total number of sections at level L-1
  let M=(N/2^L)/S; // total number of routers in each section at level L-1
  let dst_sect_id=└flit.dst_id/4^(L-1)┘; // index of the ancestor section of dst_id at level L-1
  let curr_sect_id=└m/M┘; // index of the section that the current router belongs to
  if (dst_sect_id == curr_sect_id) {
    Oindex=└flit.dst_id/4^(L-2)┘ mod 4; // index of the electrical channel of the current router
  }
  else if (dst_sect_id > curr_sect_id) {
    Oindex=num_elec+dst_sect_id-1; // index of the optical channel of the current router
  }
  else { // dst_sect_id < curr_sect_id
    Oindex=num_elec+dst_sect_id; // index of the optical channel of the current router
  }
}

4.4 Summary
This chapter introduces the architecture of the hybrid BFT-GWOR-based NoC (HONoC).
The microarchitecture of its routers and the routing algorithms are introduced as well.
HONoCs simplify the architecture of the BFT by reducing the number of routers and
electronic links. Compared with its electronic network counterparts, HONoC has the
lowest average number of electronic hops. The routing algorithms in HONoC are simple
and naturally deadlock-free.
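The two routing procedures of Section 4.3 (for the 6×6 electronic router and the 7×7 hybrid router) can be sketched in Python as follows. This is a direct transcription under the assumption that the flattened exponents in the procedures read 4^i, 2^(i+1), 4^(L-1), and so on; the function names are hypothetical, and `rng` stands in for the `randInt` call of the electronic procedure.

```python
import random

NUM_CHILD_PORTS = 4   # ports 0..3 lead to child nodes
NUM_PARENT_PORTS = 2  # ports 4..5 of an electronic router lead to parent nodes


def ceil_log4(n):
    """Smallest integer L with 4**L >= n."""
    level = 0
    while 4 ** level < n:
        level += 1
    return level


def electronic_out_port(n_cores, level, m, dst_id, rng=random):
    """Output port of electronic router R(level, m) for a head flit to dst_id."""
    sections = n_cores // 4 ** level
    routers_per_section = (n_cores // 2 ** (level + 1)) // sections
    curr_sect = m // routers_per_section
    dst_sect = dst_id // 4 ** level
    if dst_sect == curr_sect:
        # Destination is below this router: pick the matching child port.
        return (dst_id // 4 ** (level - 1)) % 4
    # Otherwise go up through a randomly chosen parent port.
    return NUM_CHILD_PORTS + rng.randint(0, NUM_PARENT_PORTS - 1)


def hybrid_out_port(n_cores, m, dst_id):
    """Output port of hybrid router R(L-1, m) for a head flit to dst_id."""
    top = ceil_log4(n_cores)
    level = top - 1
    sections = n_cores // 4 ** level
    routers_per_section = (n_cores // 2 ** top) // sections
    curr_sect = m // routers_per_section
    dst_sect = dst_id // 4 ** level
    if dst_sect == curr_sect:
        return (dst_id // 4 ** (level - 1)) % 4  # electronic ports 0..3
    if dst_sect > curr_sect:
        return NUM_CHILD_PORTS + dst_sect - 1    # optical ports 4..6
    return NUM_CHILD_PORTS + dst_sect


if __name__ == "__main__":
    print(hybrid_out_port(16, 2, 9), hybrid_out_port(16, 2, 13))
    print(electronic_out_port(64, 1, 5, 22))
```

In the 16-core HONoC, for example, hybrid router 2 sends a flit for PE9 down through port 1 and a flit for PE13 (group 3) through optical port 6. The three optical ports always map to the three other groups in ascending order.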
CHAPTER 5. SIMULATION AND PERFORMANCE EVALUATION 
This chapter introduces the simulator that we have developed for evaluating
optical/hybrid NoC performance. The statistical simulation results of the proposed hybrid
NoC architectures are presented in terms of network performance and energy dissipation on
data transmission. Comparisons with typical NoC designs are presented, too.
5.1 Simulator Development 
Currently there is no commercial simulator available for the evaluation of optical or
hybrid optoelectronic NoCs. To evaluate the performance of the proposed hybrid NoCs, a
cycle-accurate NoC simulator has been developed based on Noxim [47] with SystemC.
The structure of the simulation platform is shown in Fig. 20. The major parts of the 
simulator include: 
1) Three libraries: SystemC library for simulation interfaces, Noxim library for mesh-
based NoC architectures, and the library we built for optoelectronic NoCs; 
2) Parameters configurator for simulation parameters setup; 
3) Traffic generator which generates data and triggers data transmission; 
4) Traffic receptor for collecting simulation results; 
5) Topology descriptor.  
With this simulator, we can simulate traditional NoC (Mesh, CMesh, BFT, etc.), 
hybrid optoelectronic NoC (HONoC, etc.), or optical NoC (GWOR, etc.) in different 
sizes. The performance evaluation metrics include: throughput, latency, and energy 
dissipation. The workflow of simulation is shown in Fig. 21. 
[Figure 20 block diagram: SystemC library (module, channel, and procedure); Noxim library (5×5 electronic router, processing element (PE)); library of optoelectronic NoC (6×6 electronic router, cluster for 4-way CMesh, cluster for BFT, 7×7 optoelectronic router, 4×4 optical router GWOR); parameters configurator; traffic generator; traffic receptor; topology description; NoC under test (Mesh, CMesh, HONoC, etc.).]
Fig. 20. Simulator structure. 
[Figure 21 flowchart: NoC topology description; build HONoC with NoC libraries; simulation setup (choose traffic pattern, set parameters like simulation time, warm-up time, etc.); simulation; collect data and output results.]
Fig. 21. Simulation Flow. 
5.2 Simulation setup 
5.2.1 Topologies and Layouts 
We are going to evaluate the network and power performance of 16-core, 64-core, 
and 256-core HONoCs. For each size HONoC, the same sized Mesh-based (Fig. 22 (a)) 
and CMesh-based (Fig. 22(b) and (c)) NoC are also simulated for comparison. 
Performance metrics chosen for the evaluation include average packet latency, average 
throughput, and average energy dissipation on received packets. 
     (a)  4×4 Mesh                      (b)  2×2 4-way CMesh          (c)  4×4 4-way CMesh 
Fig. 22. Mesh and CMesh topologies. 
The layout of the proposed HONoC architectures will directly impact their
performance. 3D IC technology makes the integration of hybrid signaling communication
a viable solution for large size NoC designs. Fig. 23 and Fig. 24 illustrate the layout of
the 16-core and 64-core BFT-based HONoCs in two layers, in which the optical network
is laid on the top layer while the electronic network is laid on the bottom layer. The
electronic layer is composed of tiles. Each tile consists of four cores and one electronic
router/hybrid router connected like a 4-way CMesh. The vertical links between the two layers
are much shorter than regular electrical links (vertical link length of 20μm with a 2.5×2.5
mm2 tile area [39]) and thus the data transmission latency and energy consumption on
vertical links can be neglected.
Fig. 23. Layout of 16-core hybrid GWOR-BFT-based NoC 
 
Fig. 24. Layout of 64-core hybrid GWOR-BFT-based NoC 
5.2.2 Simulation Parameters 
The proposed HONoC architectures have an electronic network and an optical
network. The first step is to ensure that both networks run with balanced bandwidth, that is,
the input and output bandwidth at the interfaces of each hybrid router must be
balanced. For a hybrid router, the total bandwidth of all its electronic channels has to
match that of all its optical channels. Assume the maximum frequency of the
processing core is $f_e$ Hz and the channel width is $n$ bits (equal to the flit size in
our design). Then, for a hybrid router, the total data bandwidth of its four electronic input
channels may reach $4 n f_e$ bits per second. The bandwidth of the three optical channels
should match this bandwidth; otherwise, data will be jammed at the interface and
communication will be blocked. Assume the frequency of the optical signal is $f_o$ Hz,
so that each optical channel has a bandwidth of $f_o$ bps. For balanced bandwidth,
the following equation has to be satisfied:
$4 n f_e = 3 f_o$    (9)
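As a quick numerical check of the balance condition, the sketch below solves for the per-channel optical rate, assuming the condition reads 4·n·f_e = 3·f_o (four electronic input channels, three optical channels) and using the 32-bit channel width and 250 MHz router frequency listed in Tab. 15. The function name and the default arguments are illustrative choices, not part of the simulator.

```python
def required_optical_rate(n_bits: int, f_e_hz: float,
                          n_electronic: int = 4, n_optical: int = 3) -> float:
    """Per-channel optical bit rate f_o (bit/s) that balances the interface.

    Solves n_electronic * n_bits * f_e = n_optical * f_o for f_o.
    """
    return n_electronic * n_bits * f_e_hz / n_optical


# Values from Tab. 15: 32-bit channels, 250 MHz routers.
f_o = required_optical_rate(n_bits=32, f_e_hz=250e6)
print(f"required optical rate: {f_o / 1e9:.2f} Gb/s")

# Data-rate range of the 32:1 serializer in Tab. 16 (8.5 to 11.3 Gb/s).
SERDES_MIN, SERDES_MAX = 8.5e9, 11.3e9
print("within serializer range:", SERDES_MIN <= f_o <= SERDES_MAX)
```

Under these assumptions the required rate is about 10.67 Gb/s per optical channel, which falls inside the serializer data-rate range given in Tab. 16.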
With the bandwidth determined, the parameters chosen for our simulation are listed
as follows. Tab. 15 lists the parameters of electronic routers. Tab. 16 lists the parameters of the 32:1
serializers and 1:32 deserializers [48] that are used for parallel-to-serial (P/S) and
serial-to-parallel (S/P) data conversions. Tab. 17 lists the parameters of the modulators and photo-detectors
used for E/O and O/E conversion [23], respectively.
Tab. 15. Parameters of electronic routers
Parameter                                    Value
Frequency                                    250 MHz
Buffer size                                  4 flits
Number of virtual channels per input port    4
Packet size                                  4 flits
Flit size                                    32 bits
Channel width                                32 bits
Router pipeline stages                       3
Simulation warm-up time                      1000 cycles
Data collection time                         At least 10000 cycles
 
  
Tab. 16. Parameters of serializers/deserializers
Parameter            Serializer       Deserializer
Type                 32:1             1:32
Data rate            8.5~11.3 Gb/s    8.5~11.3 Gb/s
Power consumption    110 mW           90 mW
Technology           65 nm            65 nm
Latency              1 cycle          1 cycle
Tab. 17. Modulator and photo-detector parameters
Parameter            Modulator        Photo-detector
Power consumption    2.5 pJ/bit       2.5 pJ/bit
Technology           65 nm            65 nm
Latency              Negligible       Negligible
 
Tab. 18 lists the energy consumed by one flit traversing one router, calculated with
Orion 2.0 [49] under the assumption of a 2×2 mm² tile size. It can be seen that larger
routers consume more energy (both dynamic and static) and that dynamic energy is the
major part of the overall energy consumption.
Tab. 18. Energy dissipation in routers (J/flit)
Router size        5×5                   6×6                   7×7                   8×8
                   dynamic   static      dynamic   static      dynamic   static      dynamic   static
90nm Technology    1.89E-09  4.54E-10    2.48E-09  6.26E-10    3.15E-09  8.25E-10    3.52E-09  9.37E-10
65nm Technology    6.04E-10  2.68E-10    8.08E-10  3.79E-10    1.04E-09  5.10E-10    1.30E-09  6.60E-10
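The two observations drawn from Tab. 18 (larger routers consume more energy, and dynamic energy dominates) can be checked directly from the tabulated values. The sketch below does this; the dictionary layout and function names are illustrative only.

```python
# Energy per flit in joules, copied from Tab. 18:
# {technology: {router size: (dynamic, static)}}
ROUTER_ENERGY = {
    "90nm": {"5x5": (1.89e-9, 4.54e-10), "6x6": (2.48e-9, 6.26e-10),
             "7x7": (3.15e-9, 8.25e-10), "8x8": (3.52e-9, 9.37e-10)},
    "65nm": {"5x5": (6.04e-10, 2.68e-10), "6x6": (8.08e-10, 3.79e-10),
             "7x7": (1.04e-9, 5.10e-10), "8x8": (1.30e-9, 6.60e-10)},
}


def total_energy(tech: str, size: str) -> float:
    """Dynamic plus static energy per flit for one router."""
    dynamic, static = ROUTER_ENERGY[tech][size]
    return dynamic + static


def dynamic_share(tech: str, size: str) -> float:
    """Fraction of the per-flit energy that is dynamic."""
    dynamic, _ = ROUTER_ENERGY[tech][size]
    return dynamic / total_energy(tech, size)


for tech, sizes in ROUTER_ENERGY.items():
    for size in sizes:
        print(f"{tech} {size}: total = {total_energy(tech, size):.3e} J/flit, "
              f"dynamic share = {dynamic_share(tech, size):.2f}")
```

For every entry the dynamic share exceeds one half, and the total grows monotonically with router size at both technology nodes.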
 
5.3 Performance Evaluation 
In this work, the simulation of each NoC has a 1000-cycle warm-up time, and
data are collected for no fewer than 10000 cycles after the system reaches a stable state. Each
statistical result is obtained as the average of 100 simulations. The results for two
synthetic traffic patterns, i.e., uniform random traffic and non-uniform/localized traffic
[45], are presented and discussed below. For the non-uniform traffic pattern, we consider a
distribution with 75% local traffic, in which the destination is one hop away from the source,
and the remaining 25% of the traffic uniformly distributed to the non-local cores.
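The localized pattern described above can be sketched as a destination generator. This is a minimal illustration, not the simulator's code: the set of cores that counts as "one hop away" depends on the topology, so the sketch simply assumes (hypothetically) that the other cores of the same 4-core cluster are the local ones.

```python
import random

NUM_CORES = 16
# Hypothetical notion of "one hop away": the other cores in the same 4-core cluster.
NEIGHBORS = {c: [d for d in range(NUM_CORES) if d // 4 == c // 4 and d != c]
             for c in range(NUM_CORES)}


def pick_destination(src: int, rng: random.Random, local_fraction: float = 0.75) -> int:
    """Pick a destination: local with probability local_fraction, else uniform non-local."""
    local = NEIGHBORS[src]
    if rng.random() < local_fraction:
        return rng.choice(local)
    non_local = [c for c in range(NUM_CORES) if c != src and c not in local]
    return rng.choice(non_local)


def local_share(samples: int, seed: int = 0) -> float:
    """Empirical fraction of generated packets whose destination is local."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        src = rng.randrange(NUM_CORES)
        dst = pick_destination(src, rng)
        hits += dst in NEIGHBORS[src]
    return hits / samples


print(f"observed local share: {local_share(20_000):.3f}")
```

With enough samples the observed local share should approach the configured 75%; setting `local_fraction` to the share of local cores would recover an approximately uniform pattern.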
5.3.1 Throughput 
Throughput is measured as the average number of flits arriving at a processing core
per cycle, with a maximum value of 1 flit/core/cycle (given a maximum injection
rate of 1 flit/core/cycle, or 0.25 packets/core/cycle). It measures how much
data can be transferred from one core to another in a given amount of time. Once a
network is saturated, its throughput no longer increases with the injection rate. The
maximum throughput of a system is directly related to the peak data rate that the system
can sustain.
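The throughput metric and the packet-to-flit conversion used above amount to two small formulas, sketched here. The function names and the sample counts are hypothetical; the 4-flit packet size comes from Tab. 15.

```python
FLITS_PER_PACKET = 4  # packet size from Tab. 15


def average_throughput(flits_received: int, num_cores: int, cycles: int) -> float:
    """Average number of flits arriving per core per cycle."""
    return flits_received / (num_cores * cycles)


def packets_to_flits(rate_packets: float) -> float:
    """Convert an injection rate in packets/core/cycle to flits/core/cycle."""
    return rate_packets * FLITS_PER_PACKET


# Hypothetical counts: 48000 flits received by 16 cores over 10000 cycles.
print(average_throughput(48_000, 16, 10_000))
# The maximum injection rate of 0.25 packets/core/cycle expressed in flits.
print(packets_to_flits(0.25))
```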
Fig. 25(a) and Fig. 26(a) illustrate the average throughput of 16-core and 64-core
NoC architectures under the uniform traffic pattern, respectively. It can be seen that the
16-core HONoC has almost the same throughput as the 4×4 Mesh-based NoC, which is well
known as the NoC architecture with the best throughput. They both saturate at a maximum
throughput of 0.3 flits/core/cycle. The 4-way 2×2 CMesh-based NoC saturates at a
maximum throughput of 0.27 flits/core/cycle. As the network size increases from 16 to
64 cores, the maximum throughput of HONoC drops from 0.3 to 0.215 flits/core/cycle,
which is higher than that of the 4-way 4×4 CMesh-based NoC (maximum
throughput of 0.15 flits/core/cycle) but lower than that of the 8×8 Mesh-based NoC
(maximum throughput of 0.29 flits/core/cycle).
  
Fig. 25. Average throughput of 16-core NoCs: (a) uniform traffic; (b) non-uniform traffic (with 75% localized traffic). [Plots: average throughput (flits/core/cycle) versus injection rate (flits/core/cycle) for Mesh, CMesh, and HONoC.]
Fig. 26. Average throughput of 64-core NoCs: (a) uniform traffic; (b) non-uniform traffic (with 75% local traffic). [Plots: average throughput (flits/core/cycle) versus injection rate (flits/core/cycle) for Mesh, CMesh, and HONoC.]
 
Fig. 27. Average throughput of 256-core NoCs (75% localized traffic pattern). [Plot: average throughput versus injection rate (flits/core/cycle) for Mesh, CMesh, and HONoC.]
Fig. 25(b), Fig. 26(b), and Fig. 27 show the throughput of 16-core, 64-core, and
256-core NoCs under the non-uniform traffic pattern, respectively. It can be seen that all of these
networks have a higher throughput and a higher saturation injection rate for
non-uniform traffic than for uniform traffic. As the network size increases,
CMesh-based architectures benefit the most in throughput from the localized communication.
5.3.2 Latency 
Latency is measured as the time (in cycles) from when a packet's header flit
enters the NoC until its tail flit leaves it. In NoC architectures, the overall latency is governed
by the pipeline stages of routers. In our simulations, a flit takes three cycles to traverse one
electronic/hybrid router: one cycle for buffering, one cycle for routing and
switching, and one cycle for traveling through the electronic link. The delays on optical
channels and vertical inter-layer electronic links are negligible compared to those on regular
electrical channels. In our HONoC simulations, however, we assume that the delay for a flit
passing through E/O and P/S conversion is one cycle, and similarly one cycle for O/E and
S/P conversion.
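The per-hop delays above suggest a rough contention-free latency estimate, sketched below. Two points are assumptions that the text does not state: that the remaining flits of a packet follow the header one per cycle (wormhole-style pipelining), and that queuing and arbitration delays are ignored. The estimate is therefore a lower bound on the measured latency.

```python
ROUTER_CYCLES = 3   # buffering + routing/switching + electronic link traversal
EO_CYCLES = 1       # E/O and P/S conversion
OE_CYCLES = 1       # O/E and S/P conversion
PACKET_FLITS = 4    # packet size from Tab. 15


def zero_load_latency(router_hops: int, uses_optical: bool,
                      packet_flits: int = PACKET_FLITS) -> int:
    """Cycles from header entry to tail exit, ignoring contention.

    Assumes the tail flit trails the header by (packet_flits - 1) cycles.
    """
    header_cycles = router_hops * ROUTER_CYCLES
    if uses_optical:
        header_cycles += EO_CYCLES + OE_CYCLES
    return header_cycles + (packet_flits - 1)


# Hypothetical paths: two router hops, without and with an optical segment.
print(zero_load_latency(router_hops=2, uses_optical=False))
print(zero_load_latency(router_hops=2, uses_optical=True))
```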
Fig. 28(a) and Fig. 29(a) illustrate the average latency of received packets in 16-core
and 64-core architectures under the uniform traffic pattern, respectively. They show that the
average latency increases with the injection rate. The first inflection point of
the latency curve occurs when a network is close to saturation, for example, the point at
0.4 flits/core/cycle injection rate on the 16-core CMesh latency curve (Fig. 28(a)). They
also show that as the network size increases, the average latency increases while
the saturation injection rate becomes lower. For example, the 16-core HONoC saturates at
an injection rate of 0.72 flits/core/cycle, where the average latency is about 23
cycles, while the 64-core HONoC saturates at an injection rate of 0.28 flits/core/cycle, where the
latency is about 30 cycles. Among the three types of architectures, HONoC
has the lowest latency and CMesh has the highest. HONoC has a larger latency advantage
in 64-core architectures than in 16-core architectures. It can be anticipated that as
the network size increases, the advantages of HONoC architectures will become more
pronounced. This is because the average number of hops that a packet travels in HONoC
is the smallest among the
three types of architectures, and the difference in average hops becomes bigger in larger
networks, as shown in Tab. 13.
Fig. 28(b), Fig. 29(b) and Fig. 30 illustrate the average latency in 16-core, 64-core, 
and 256-core architectures under non-uniform traffic pattern, respectively. They show 
that localized traffic lowers the average latency in general because the average number of 
hops for packets is reduced. CMesh architectures benefit the most among the three types 
of architectures from localized traffic distributions. HONoC still has the lowest latency 
among them. 
 
Fig. 28. Latency of 16-core architectures: (a) uniform traffic; (b) non-uniform traffic (with 75% localized traffic). [Plots: average latency (cycles) versus injection rate (flits/core/cycle) for Mesh, CMesh, and HONoC.]
Fig. 29. Latency of received packets in 64-core NoCs: (a) uniform traffic; (b) non-uniform traffic (with 75% localized traffic). [Plots: average latency (cycles) versus injection rate (flits/core/cycle) for Mesh, CMesh, and HONoC.]
Fig. 30. Latency of received packets in 256-core NoCs (non-uniform traffic). [Plot: average latency (cycles) versus injection rate (flits/core/cycle) for Mesh, CMesh, and HONoC.]
5.3.3 Energy Dissipation  
When a packet is transported across a network, the energy consumed by
the packet consists of two parts: dynamic energy caused by switching activities and static
energy due to leakage power. The dynamic energy to transfer a flit/packet from its source
to its destination is the sum of the dynamic energy consumed by all routers that the
flit/packet traverses. Static energy is the share of the total leakage energy consumed by all routers during
the simulation period that is attributed to that packet. The average energy dissipation on
received packets is expressed as follows:
$E_{\text{avg-pkt}} = E_{\text{dyn\_pkt}} + E_{\text{sta\_pkt}}$    (10)
$E_{\text{dyn\_pkt}} = E_{\text{er\_dyn}} \times \eta_{\text{e-hops}} + E_{\text{hr\_dyn}} \times \eta_{\text{h-hops}} + E_{o} \times \eta_{\text{o-hops}}$    (11)
$E_{\text{sta\_pkt}} = (T_{\text{sim}} \times \sum E_{\text{r\_sta}}) / N_{\text{pkt\_r}}$    (12)
where $E_{\text{avg-pkt}}$ represents the average energy dissipation on received packets in the
whole network, including the average dynamic energy $E_{\text{dyn\_pkt}}$ and the average static energy
$E_{\text{sta\_pkt}}$. The packet average dynamic energy $E_{\text{dyn\_pkt}}$ consists of the dynamic energy consumed by
all routers that the packet traverses, including electrical routers, hybrid routers, and optical routers
(the three terms of Eq. (11), each weighted by the corresponding hop count).
The packet average static energy $E_{\text{sta\_pkt}}$ is the total static energy consumed by the network
during the simulation time divided by the total number of received packets.
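Eqs. (10) to (12) translate directly into three small functions, sketched below. The argument names mirror the symbols in the equations, and all numeric inputs in the example are hypothetical placeholders chosen only to exercise the formulas; they are not results from this work.

```python
from typing import Sequence


def dynamic_energy_per_packet(e_er_dyn: float, n_e_hops: float,
                              e_hr_dyn: float, n_h_hops: float,
                              e_o: float, n_o_hops: float) -> float:
    """Eq. (11): per-hop dynamic energies weighted by the hop counts."""
    return e_er_dyn * n_e_hops + e_hr_dyn * n_h_hops + e_o * n_o_hops


def static_energy_per_packet(t_sim: float, router_static: Sequence[float],
                             packets_received: int) -> float:
    """Eq. (12): static energy over the whole run shared by the received packets."""
    return t_sim * sum(router_static) / packets_received


def average_energy_per_packet(e_dyn_pkt: float, e_sta_pkt: float) -> float:
    """Eq. (10): dynamic plus static energy per received packet."""
    return e_dyn_pkt + e_sta_pkt


# Hypothetical inputs for illustration only.
e_dyn = dynamic_energy_per_packet(
    e_er_dyn=6.0e-10, n_e_hops=2,   # assumed electronic-router energy and hops
    e_hr_dyn=8.0e-10, n_h_hops=2,   # assumed hybrid-router energy and hops
    e_o=0.0, n_o_hops=1,            # optical hop treated as negligible
)
e_sta = static_energy_per_packet(
    t_sim=10_000,                   # simulated cycles
    router_static=[1.0e-12] * 8,    # assumed static energy per router per cycle
    packets_received=4_000,
)
e_avg = average_energy_per_packet(e_dyn, e_sta)
print(f"E_dyn_pkt = {e_dyn:.3e} J, E_sta_pkt = {e_sta:.3e} J, E_avg-pkt = {e_avg:.3e} J")
```

When real values are substituted, the units of the per-hop energies, the static terms, and the simulation time need to be kept consistent (for example, per-flit versus per-packet energy).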
Fig. 31 and Fig. 32 illustrate the average energy dissipation of received packets in
16-core and 64-core NoC architectures under the uniform traffic pattern, respectively. The
results are normalized against the dynamic energy consumption of the Mesh-based
architecture at 90nm technology. It is clear that as the technology scales down, the
power consumption of each architecture is reduced significantly; as a result, the average
energy dissipated on each received packet in the network is reduced.
The NoC topology affects the energy dissipation in two ways: 1) different topologies
use routers of different sizes, and 2) different routing algorithms result in different hop counts for
each packet. It can be seen that among the 16-core NoCs, the CMesh-based NoC consumes the least
energy while the Mesh-based NoC consumes the most (Fig. 31). The reason is that the 2×2
CMesh-based NoC uses 6×6 routers and, on average, data travels the fewest hops among the
three topologies (shown in Tab. 13). The Mesh-based NoC uses 5×5 routers, but its average
hop count is higher than those of the CMesh-based NoC and HONoC. The 16-core HONoC
has slightly higher power consumption than the CMesh-based NoC, but as the
network size increases, the CMesh-based NoC uses 8×8 routers and its average number of hops
increases faster than that of HONoC. It can be seen from Fig. 32 that HONoC has the least
energy dissipation among the three topologies for 64-core NoCs. This tendency will
continue for larger networks, because the use of GWORs simplifies the network and
reduces electrical links, while optical signals consume negligible energy. For the same
topology, as the network size increases, the energy dissipation on each packet
also increases because the average number of hops that the packet traverses increases in
larger networks.
 
Fig. 31. Normalized energy dissipation of 16-core NoCs (uniform traffic), at an injection rate of 0.6 flits/core/cycle. [Bar chart: normalized energy dissipation on packet, split into static and dynamic parts, for Mesh, CMesh, and HONoC at 90nm and 65nm.]

Fig. 32. Normalized energy dissipation of 64-core NoCs (uniform traffic), at an injection rate of 0.4 flits/core/cycle. [Bar chart: normalized energy dissipation on packet, split into static and dynamic parts, for Mesh, CMesh, and HONoC at 90nm and 65nm.]
5.4 Summary 
A cycle-accurate NoC simulation platform has been developed to evaluate the
performance of the proposed HONoC architectures. The simulation results show that
HONoCs have higher throughput, lower latency, and better power efficiency than their
Mesh-based and CMesh-based counterparts in large NoCs. The benefits brought by
GWOR become more obvious as the network size increases. This confirms that the
hybrid optoelectronic NoC is a viable on-chip solution for
interconnecting a large number of processing cores.
  
CHAPTER 6. CONCLUSION AND FUTURE WORK 
Optical NoC (ONoC) is considered an alternative interconnection solution for future large
many-core MPSoCs. The benefits that ONoC brings include ultra-high
data bandwidth, negligible latency, and low energy consumption.
This dissertation focuses on the study of high-performance, scalable ONoC
architecture design. The major contributions and future work are summarized below.
6.1 Conclusion 
In this dissertation, MRR-based GWOR ONoC architectures and BFT-GWOR-based
hybrid optoelectronic NoC architectures are proposed. The major research results are
highlighted below:
• Four 4×4 GWOR structures are proposed that can be used as the building blocks of
larger GWORs.
• A method for constructing an ONoC of any size with 4×4 GWORs is proposed, and a
wavelength assignment scheme for an ONoC of any size is also derived.
• A redundant GWOR architecture built with multi-stage GWORs is proposed to
improve the bandwidth and fault tolerance of GWOR. We also propose comb MRR-based
GWORs to improve bandwidth without changing the network structure.
• Butterfly fat tree (BFT)-based HONoCs using GWORs are proposed to take
advantage of the high data bandwidth of optical networks and the flexibility of
electronic networks.
• A cycle-accurate simulator is developed to evaluate the performance of the proposed
HONoC architectures.
• Simulation results are presented and confirm that HONoC achieves comparable
network performance with the least power consumption compared with Mesh-based
and CMesh-based architectures. The superiority of HONoC becomes more prominent as
the network size increases.
6.2 Future Work 
Future research on the proposed HONoC includes:
• Performance evaluation of the proposed HONoCs under real application traces.
• Exploration of various routing strategies to improve the throughput of HONoCs.
• Exploration of other hierarchical hybrid NoC architectures to improve the
performance of larger HONoCs (256 cores or above).
REFERENCES
[1] D. Yeh, L. Peh, S. Borkar, J. Darringer, A. Agarwal, and W. Hwu, "Thousand-Core Chips," Design 
& Test of Computers, IEEE, vol. 25, pp. 272-278, 2008. 
[2] S. Assefa, F. Xia, and Y. A. Vlasov, "Reinventing germanium avalanche photodetector for 
nanophotonic on-chip optical interconnects," Nature, vol. 464, pp. 80-84, 2010. 
[3] M. Lipson, "Guiding, modulating, and emitting light on Silicon-challenges and opportunities," 
Lightwave Technology, Journal of, vol. 23, pp. 4222-4238, 2005. 
[4] F. Xia, M. Rooks, L. Sekaric, and Y. Vlasov, "Ultra-compact high order ring resonator filters using 
submicron silicon photonic wires for on-chip optical interconnects," Optics Express, vol. 15, pp. 
11934-11941, 2007. 
[5] M. Lipson, "Compact Electro-Optic Modulators on a Silicon Chip," IEEE Journal of Selected 
Topics in Quantum Electronics, vol. 12, pp. 1520-1526, 2006. 
[6] A. Liu, "Announcing the world's first 40G silicon laser modulator," presented at the 
http://blogs.intel.com/research/, 2007. 
[7] Q. Xu, B. Schmidt, J. Shakya, and M. Lipson, "Cascaded silicon micro-ring modulators for WDM 
optical interconnection," Optics Express, vol. 4, pp. 9431-9435, 2006. 
[8] A. W. Fang, B. R. Koch, K.-G. Gan, H. Park, R. Jones, O. Cohen, et al., "A racetrack mode-locked 
silicon evanescent laser," Optics Express, vol. 16, pp. 1393-1398, 2008. 
[9] H. Park, A. W. Fang, R. Jones, O. Cohen, O. Raday, M. N. Sysak, et al., "A hybrid AlGaInAs-
silicon evanescent waveguide photodetector," Opt. Express, vol. 15, pp. 6044-6052, 2007. 
[10] L. Schares, C. L. Schow, S. J. Koester, G. Dehlinger, R. John, and F. E. Doany, "A 17-Gb/s low-
power optical receiver using a Ge-on-SOI photodiode with a 0.13um CMOS IC," in Optical Fiber 
Communication Conference, 2006 and the 2006 National Fiber Optic Engineers Conference. OFC 
2006, 2006, pp. 1-3. 
[11] Y. Tao, R. Cohen, M. M. Morse, G. Sarid, Y. Chetrit, D. Rubin, et al., "40Gb/s Ge-on-SOI 
waveguide photodetectors by selective Ge growth," in Optical Fiber communication/National 
Fiber Optic Engineers Conference, 2008. OFC/NFOEC 2008. Conference on, 2008, pp. 1-3. 
[12] A. W. Poon, L. Xianshu, X. Fang, and C. Hui, "Cascaded microresonator-based matrix switch for 
silicon on-chip optical interconnection," Proceedings of the IEEE, vol. 97, pp. 1216-1238, 2009. 
[13] N. Sherwood-Droz, H. Wang, L. Chen, G. B. Lee, A. Biberman, K. Bergman, et al., "Optical 4x4 
hitless Silicon router for optical Networks-on-Chip (NoC)," Optics Express, vol. 16, pp. 19395-
19395, 2008. 
[14] P. Dong, S. F. P. Preble, and M. Lipson, "All-optical compact silicon comb switch," Optics 
Express, vol. 15, pp. 9600-9605, 2007. 
[15] A. W. Poon, F. Xu, and X. Luo, "Cascaded active silicon microresonator array cross-connect 
circuits for WDM networks-on-chip," pp. 689812-689812, 2008. 
[16] H. Gu, K. H. Mo, J. Xu, and W. Zhang, "A Low-power Low-cost Optical Router for Optical 
Networks-on-Chip in Multiprocessor Systems-on-Chip," in IEEE Computer Society Annual 
Symposium on VLSI, 2009. ISVLSI '09, 2009, pp. 19-24. 
[17] N. Kirman, M. Kirman, R. K. Dokania, J. F. Martinez, A. B. Apsel, M. A. Watkins, et al., 
"Leveraging Optical Technology in Future Bus-based Chip Multiprocessors," in 
Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, 2006, 
pp. 492-503. 
[18] A. Shacham, K. Bergman, and L. P. Carloni, "Photonic Networks-on-Chip for Future Generations 
of Chip Multiprocessors," Computers, IEEE Transactions on, vol. 57, pp. 1246-1260, 2008. 
[19] M. Briere, B. Girodias, Y. Bouchebaba, G. Nicolescu, F. Mieyeville, F. Gaffiot, et al., "System 
level assessment of an optical NoC in an MPSoC platform," in Design, Automation & Test in 
Europe Conference & Exhibition, 2007. DATE '07, 2007, pp. 1-6. 
[20] L. Zhang, E. Regentova, and X. Tan, "A Generalized Packet Switching Optical Network-on-Chip 
Architecture," submitted to Special Issue on On-Chip and Off-Chip Network Architectures - ACM 
Transactions on Embedded Computing Systems, Feb. 2011. 
[21] L. Zhang, M. Yang, Y. Jiang, E. E. Regentova, and E. Lu, "Generalized Wavelength Routed 
Optical Micronetwork in Network-on-Chip," in Parallel and Distributed Computing and Systems, 
Proceeding of 18th IASTED International Conference, 2006, pp. 698-703. 
[22] A. Joshi, C. Batten, K. Yong-Jin, S. Beamer, I. Shamim, K. Asanovic, et al., "Silicon-photonic clos 
networks for global on-chip communication," in Networks-on-Chip, 2009. NoCS 2009. 3rd 
ACM/IEEE International Symposium on, 2009, pp. 124-133. 
[23] L. Zhou, S. Djordjevic, R. Proietti, D. Ding, S. Yoo, R. Amirtharajah, et al., "Design and 
evaluation of an arbitration-free passive optical crossbar for on-chip interconnection networks," 
Applied Physics A: Materials Science & Processing, vol. 95, pp. 1111-1118, 2009. 
[24] R. Morris and A. K. Kodi, "Exploring the design of 64- and 256-core power efficient 
nanophotonic interconnect," Selected Topics in Quantum Electronics, IEEE Journal of, vol. 16, pp. 
1386-1393, 2010. 
[25] R. Soref and B. Bennett, "Electrooptical effects in silicon," IEEE Journal of Quantum Electronics, 
vol. 23, pp. 123-129, 1987. 
[26] L. Zhang, E. Regentova, and X. Tan, "Packet Switching Optical Network-on-Chip Architectures," 
submitted to Special Issue on Emerging Computing Architectures and Systems of the 
Journal Computer and Electrical Engineering, Mar. 2011. 
[27] N. Kirman and J. F. Martinez, "A power-efficient all-optical on-chip interconnect using 
wavelength-based oblivious routing," in Proceedings of the fifteenth edition of ASPLOS on 
Architectural support for programming languages and operating systems, Pittsburgh, 
Pennsylvania, USA, 2010, pp. 15-28. 
[28] T. T. Ye, "On-chip multiprocessor communication network design and analysis," Ph.D., 
Department of Electrical Engineering, Stanford University, 2003. 
[29] A. Shacham, B. G. Lee, A. Biberman, K. Bergman, and L. P. Carloni, "Photonic NoC for DMA 
Communications in Chip Multiprocessors," in High-Performance Interconnects, 2007. HOTI 
2007. 15th Annual IEEE Symposium, 2007, pp. 29-38. 
[30] Y. Yaoyao, D. Lian, X. Jiang, O. Jin, H. Mo Kwai, and X. Yuan, "3D optical networks-on-chip 
(NoC) for multiprocessor systems-on-chip (MPSoC)," in 3D System Integration, 2009. 3DIC 
2009. IEEE International Conference on, 2009, pp. 1-6. 
[31] H. G. L. Bai, "A cluster-based reconfigurable optical network on chip design," in Photonics and 
Optoelectronics (SOPO), 2012 Symposium on, Shanghai, May 21-23, 2012, pp. 1-4. 
[32] J. Chan and K. Bergman, "Photonic interconnection network architectures using wavelength-
selective spatial routing for chip-scale communications," Optical Communications and 
Networking, IEEE/OSA Journal of, vol. 4, pp. 189-201, 2012. 
[33] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, et al., 
"Corona: System Implications of Emerging Nanophotonic Technology," in Computer Architecture, 
2008. ISCA '08. 35th International Symposium on, 2008, pp. 153-164. 
[34] C. Batten, A. Joshi, J. Orcutt, A. Khilo, B. Moss, C. Holzwarth, et al., "Building manycore 
processor-to-DRAM networks with monolithic silicon photonics," in High Performance 
Interconnects, 2008. HOTI '08. 16th IEEE Symposium on, 2008, pp. 21-30. 
[35] Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang, and A. Choudhary, "Firefly: illuminating future 
network-on-chip with nanophotonics," in IEEE/ACM Intl. Symp. on Computer Architecture 
(ISCA), 2009, pp. 429-440. 
[36] X. Zhang and A. Louri, "A multilayer nanophotonic interconnection network for on-chip many-
core communications," in Design Automation Conference (DAC), 2010 47th ACM/IEEE, 2010, pp. 
156-161. 
[37] J. Wang, L. Li, Y. Zhang, H. Pan, S. He, and R. Zhang, "A hybird hierarchical architecture for 3D 
multi-cluster NoC," in Computer Science & Education (ICCSE), 2011 6th International 
Conference on, 2011, pp. 512-516. 
[38] G. Huaxi, X. Jiang, and Z. Wei, "A low-power fat tree-based optical Network-On-Chip for 
multiprocessor system-on-chip," in Design, Automation & Test in Europe Conference & 
Exhibition, 2009. DATE '09., 2009, pp. 3-8. 
[39] B. S. Feero and P. P. Pande, "Networks-on-Chip in a Three-Dimensional Environment: A 
Performance Evaluation," Computers, IEEE Transactions on, vol. 58, pp. 32-45, 2009. 
[40] P. P. Pande, C. Grecu, A. Ivanov, and R. Saleh, "Design of a switch for network on chip 
applications," in Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International 
Symposium on, 2003, pp. V-217-V-220 vol.5. 
[41] A. W. Poon, F. X. Xu, and X. Luo, "Cascaded active silicon microresonator array cross-connect 
circuits for WDM networks-on-chip," in Proc. SPIE, 2008, p. 689812. 
[42] L. Zhang, M. Yang, Y. Jiang, and E. Regentova, "Architectures and routing schemes for optical 
network-on-chips," Computers and Electrical Engineering, vol. 35, pp. 856-877, 2009. 
[43] Z. Linjie, K. Okamoto, and S. J. B. Yoo, "Athermalizing and Trimming of Slotted Silicon 
Microring Resonators With UV-Sensitive PMMA Upper-Cladding," Photonics Technology 
Letters, IEEE, vol. 21, pp. 1175-1177, 2009. 
[44] Y. Onishi, N. Saga, K. Koyama, H. Doi, T. Ishizuka, T. Yamada, et al., "Long-Wavelength 
GaInNAs Vertical-Cavity Surface-Emitting Laser With Buried Tunnel Junction," IEEE Journal of 
Selected Topics in Quantum Electronics, vol. 15, pp. 838-843, 2009. 
[45] R. Das, S. Eachempati, A. K. Mishra, V. Narayanan, and C. R. Das, "Design and evaluation of a 
hierarchical on-chip interconnect for next-generation CMPs," in High Performance Computer 
Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on, 2009, pp. 175-186. 
[46] P. Partha Pratim, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design 
trade-offs for network-on-chip interconnect architectures," Computers, IEEE Transactions on, vol. 
54, pp. 1025-1040, 2005. 
[47] M. Palesi, D. Patti, and F. Fazzino. http://noxim.sourceforge.net/.  
[48] Available: http://www.pacificmicrochip.com/ 
[49] A. B. Kahng, L. Bin, P. Li-Shiuan, and K. Samadi, "ORION 2.0: A Power-Area Simulator for 
Interconnection Networks," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 
vol. 20, pp. 191-196, 2012. 
 
BIBLIOGRAPHY 
Xianfang Tan received the B.S. degree in electrical engineering from Yanshan 
University, Qinhuangdao, China in 1997; the M.S. degree in Electrical Engineering from 
Tianjin University, Tianjin, China in 2003 and the M.S. degree in Mechanical 
Engineering from the University of Nevada, Las Vegas (UNLV) in 2005. Currently, she is 
working toward the Ph.D. degree in the Department of Electrical and Computer 
Engineering, University of Nevada, Las Vegas. 
  
CURRICULUM VITAE
Graduate College 
University of Nevada, Las Vegas 
 
Xianfang Tan 
 
Degrees: 
Bachelor of Science in Electrical Engineering, 1997 
Yanshan University, China 
 
Master of Science in Electrical Engineering, 2003 
Tianjin University, China 
 
Master of Science in Mechanical Engineering, 2005 
University of Nevada, Las Vegas, USA 
 
Special Honors and Awards: 
  
2nd Place Winner of Best Dissertation Competition of the Howard R. Hughes College 
of Engineering, 2013 
 
Publications: 
1. X. Tan, M. Yang, L. Zhang, Y. Jiang, J. Yang, “Wavelength-routed optical networks-
on-chip built with comb switches,” submitted to IEEE Photonic Society Annual 
Meeting, 2013.  
2. X. Tan, M. Yang, L. Zhang, Y. Jiang, J. Yang, “A Generic Optical Router Design for 
Photonic Network-on-chips,” IEEE/OSA Journal of Lightwave Technology, vol. 30, 
no. 3, pp. 368-376, 2012. 
3. L. Zhang, E. Regentova, X. Tan, “Packet Switching Optical Network-on-Chip 
Architectures,” Journal Computers and Electrical Engineering, 2012. 
4. L. Zhang, X. Tan, Mei Yang, Y. Jiang, P. Liu, J. Yang, "Circuit-Switched On-Chip 
Photonic Interconnection Network," IEEE 9th International Conference on Group IV 
Photonics (GFP), Aug. 2012, San Diego, California. 
5. L. Zhang, X. Tan, M. Yang, M. Qi, T. Hu, J. Yang, "On-Chip Wavelength-Routed 
Photonic Networks with Comb Switches," IEEE 9th International Conference on 
Group IV Photonics (GFP), Aug. 2012, San Diego, California. 
6. X. Tan, L. Zhang, M. Yang, Y. Jiang, “On a Scalable, Non-Blocking Optical Router
for Photonic Networks-on-Chip Designs,” IEEE International Symposium on
Photonics and Optoelectronics (SOPO), Wuhan, China, May 2011.
7. L. Zhang, E. Regentova, X. Tan, “A 2D-Torus Based Packet Switching Optical 
Network-on-Chip Architecture,” IEEE International Symposium on Photonics and 
Optoelectronics (SOPO), pp. 1-4, Wuhan, China, May 2011. 
8. X. Tan, L. Zhang, S. Neelkrishnan, M. Yang, Y. Jiang and Y. Yang, "Scalable and 
Fault-tolerant Network-on-chip Design Using the Quartered Recursive Diagonal 
Torus Topology," Proc. ACM Great Lakes Symposium on VLSI, pp. 309-314, 
Orlando, FL., USA, May 2008. 
 
Dissertation Title:  
High-Performance, Scalable Optical Network-on-Chip Architectures 
 
Dissertation Examination Committee: 
Committee Chair, Mei Yang, Ph.D. 
Committee Member, Yingtao Jiang, Ph.D. 
Committee Member, Shahram Latifi, Ph.D. 
Graduate Faculty Representative, Amei Amei, Ph.D. 
