Xtreme-NoC: Extreme Gradient Boosting Based Latency Model for Network-on-Chip Architectures by Sheriff, Ilma
Minnesota State University, Mankato 
Cornerstone: A Collection of Scholarly 
and Creative Works for Minnesota 
State University, Mankato 
All Graduate Theses, Dissertations, and Other 
Capstone Projects 
Graduate Theses, Dissertations, and Other 
Capstone Projects 
2021 
Xtreme-NoC: Extreme Gradient Boosting Based Latency Model for 
Network-on-Chip Architectures 
Ilma Sheriff 
Minnesota State University, Mankato 
Recommended Citation 
Sheriff, I. (2021). Xtreme-NoC: Extreme gradient boosting based latency model for Network-on-Chip 
architectures [Master’s thesis, Minnesota State University, Mankato]. Cornerstone: A Collection of 
Scholarly and Creative Works for Minnesota State University, Mankato. https://cornerstone.lib.mnsu.edu/
etds/1127/ 
This Thesis is brought to you for free and open access by the Graduate Theses, Dissertations, and Other Capstone 
Projects at Cornerstone: A Collection of Scholarly and Creative Works for Minnesota State University, Mankato. It 
has been accepted for inclusion in All Graduate Theses, Dissertations, and Other Capstone Projects by an 













A Thesis Submitted in Partial Fulfillment of the  
Requirements for the Degree of 













April 9, 2021 
 
Xtreme-NoC: Extreme Gradient Boosting based latency model for Network-on-Chip 
Architectures. 
 




Advisor (Dr. Naseef Mansoor) 
 
________________________________ 
Committee Member (Dr. Rajeev Bukralia) 
 
________________________________ 

















Thanks are due first to my advisor, Dr. Naseef Mansoor, for the guidance and technical advice 
he provided throughout my master’s thesis. His dedication, passion, and wisdom were invaluable 
during my research. He instilled the value of high-quality research in my mind; it would not have 
been possible to finish my master’s without his vast expertise and all the resources that I 
borrowed from him. I learned a great deal from him.  
I would also like to thank my qualifying, departmental and final examining committee members: 
Dr. Mezbahur Rahman and Dr. Rajeev Bukralia for being part of my committee and giving 
valuable comments and feedback on my thesis.  
I am very grateful to Dr. Mahbubur Syed for his valuable suggestions and advice; without him, 
my journey through graduate school would have been very different.  
Finally, I am very grateful for the continuous support, encouragement, and patience of my parents, 
throughout all my years of education and for their unconditional love. I will be forever in debt for 




















ABSTRACT
Chapter 1
Introduction
1.1 Network-on-Chip
1.2 Design Space Exploration Problem for NoCs
1.3 Research Questions
1.4 Contributions
1.5 Thesis Organization
Chapter 2
Background and Related Work
2.1 Network-on-Chip Architectures and simulators
2.2 Queuing Theory based NoC Latency Models
Chapter 3
Design Space Exploration Dataset
3.1 Dataset generation using Booksim simulator
3.2 Data Preprocessing
3.2.1 Data Cleaning
3.2.2 Feature Selection
3.2.3 Feature Scaling
3.2.4 Encoding Categorical Data
Chapter 4
Machine Learning
4.1 Regression Models
4.1.1 Linear Regression
4.1.2 Bayesian Regression
4.1.3 Support Vector Regressor (SVR)
4.1.4 Random Forest Regression
4.1.6 Extreme Gradient Boosting
4.2 Model creation
4.2.1 Training
4.2.2 Hyperparameter Tuning using Grid search
Chapter 5
Experimental Results
5.1 Metrics for Evaluation
5.1.1 Mean Absolute Error (MAE)
5.1.2 Mean Squared Error (MSE)
5.1.3 Root Mean Squared Error (RMSE)
5.1.4 Coefficient of determination R²
5.2 Performance of the Machine Learning Models
5.2.1 Linear Regression
5.2.2 Bayesian Regression
5.2.3 Support Vector Regression
5.2.4 Multilayer Perceptron Regression
5.2.5 Random Forest Regression
5.2.6 XGBoost Regression
5.3 Comparative Performance Analysis
5.4 Speedup in Average Latency Computation
Chapter 6
Conclusion









Multiprocessor Systems-on-Chip (MPSoCs) integrating heterogeneous processing elements (CPUs, 
GPUs, accelerators, memory, I/O modules, etc.) are the de facto design choice to meet the 
ever-increasing performance/Watt requirements of modern computing machines. Although at the 
consumer level the number of processing elements (PEs) is limited to 8-16, for high-end servers 
the number of PEs can scale up to hundreds. A Network-on-Chip (NoC) is a microscale network 
that facilitates packetized communication among the PEs in such complex computational 
systems. Due to the heterogeneous integration of the cores, the execution of diverse (serial and 
parallel) applications on the PEs, application mapping strategies, and many other factors, the 
design of such NoCs plays a crucial role in ensuring optimum performance of these systems. 
Designing such an optimal NoC architecture poses a performance optimization problem with 
constraints on power and area. These optimal network configurations are determined by guided 
(heuristic) or unguided (exhaustive search) algorithms that explore the NoC design space. 
At each step of this design space exploration, a network configuration is simulated for 
performance, area, and power for a wide range of applications. System-level modeling is required 
to conduct these simulations so that they accurately capture the timing behavior, energy 
profile, and area requirements of the network. Depending on the accuracy of the network model, 
the network configuration, and the application running on the system, these simulations can be 
extremely slow. For example, running an open source NoC simulator like Booksim 2.0 for a small 
system containing 8 cores takes around 43.45 seconds on a machine with a 2.5 GHz dual-core 
Intel Core i5 and 8 GB of 1600 MHz DDR3 memory. An alternative to such network simulation is 
to use analytical network models 





M/G/1/N, or G/G/1 queue. Such analytical models provide a good estimate of network 
performance metrics such as latency only under certain assumptions, namely a Poisson process for 
the network traffic, an exponential packet service time, and an exponential distribution for 
packet length. Unfortunately, these assumptions are not guaranteed to hold for real 
application-based traffic patterns, and the accuracy of the analytical models is disputable. 
Hence, an accurate NoC performance model with accelerated runtime is required to ameliorate the 
slow design space exploration process of NoC architectures. 
To accelerate the design space exploration, in this thesis we propose Xtreme-NoC, an extreme 
gradient boosting based NoC latency model. To design this model, we use an accurate 
system-level simulator (Booksim 2.0) to generate a dataset of NoC latency. To contrast our 
proposed model with existing machine learning algorithms, we present a comparative study among 
different regression models for predicting the latency of NoC architectures. We also compare 
the results of the proposed NoC model against the latency obtained from system-level 
simulations. Based on our study, we conclude the following: 
1. Our proposed Xtreme-NoC outperforms other machine learning regression models, such 
as the linear regressor, Support Vector Regressor, and deep neural network, for predicting the 
latency of NoC architectures. 
2. The Xtreme-NoC model can predict the latency of a NoC architecture with a root mean 
square error of 5.077 cycles and an R² value of 96.16%. 









Multi-core processors are deemed to be the de facto design choice for meeting the 
performance/Watt needs of computationally demanding applications from fields such as 
astrophysics, deep learning, and biology. As projected by the ITRS road map [1], this demand is 
going to increase 300 times by the end of this decade, which will necessitate the integration 
of thousands of cores on the same processor chip. Modern consumer and server processor 
chips are moving in that direction. For example, many of the Intel 9000 series Xeon 
processors contain 28 computing cores, and AMD EPYC 7002 series processors contain 64 cores [4]. 
To ensure high performance from these processors, chip designers not only need to overcome the 
challenges in core design but also need to investigate novel opportunities for designing an 
on-chip interconnect fabric that enables efficient and low-power communication between the cores.  
This chapter provides an overview of the transition from uni-core processors to multicore 
processors, the Network-on-Chip (NoC) interconnection paradigm, the NoC design space 
exploration problem, and the contributions of this thesis.  
1.1 Network-on-Chip 
Improving processor performance by increasing the operating frequency and decreasing the 
transistor size following Moore’s law is no longer feasible due to the higher power dissipation. 
This resulted in a paradigm shift to multicore processors, where multiple smaller processing 
cores operating at lower frequencies are integrated on the same die [2]. Intel's 80-core Polaris 
[3] and 48-core Single-chip Cloud Computer (SCC) [4], and Tilera's 64-core Tile 64 [5] are some notable 





on the same die introduces the requirement for a communication infrastructure that enables the 
threads of parallel applications running on these cores to exchange information (data or control 
messages). Traditional bus-based interconnection fabrics like ARM AMBA [7] and IBM 
CoreConnect [8] are suitable only for a small number of cores and do not scale. This has led 
chip designers to explore new global interconnection architectures, giving rise to the 
Network-on-Chip (NoC) paradigm.  
Due to the scalable nature of NoCs, integrating hundreds of cores on the same die is feasible 
without significantly compromising application performance due to communication. Here the 
term core is used as an abstraction for CPU, accelerator, memory, I/O, and ASIC blocks. 
Furthermore, a NoC is a plug-and-play interconnection fabric, which decouples the design of 
the computational cores from the interconnection network. Traditional NoC design requires the 
cores to be equipped with a Network Interface (NI), which packetizes the byte streams arriving 
from the cores and performs the reverse conversion for traffic delivered to the cores. Once the 
packets are injected into such a microscale network, they traverse to the destination cores 
through a series of NoC routers that are interconnected via short global interconnects 
following a specific topology. To keep buffer overheads at the NoC routers low, wormhole 
switching is adopted, which breaks a packet down into fixed-length flow control units, 
or flits. The size of the flits also defines the number of parallel interconnects between the NoC 
 





switches for data communication. Only the first flit, also known as the header, contains the 
routing information that helps to establish a path from the source to the destination, which 
is subsequently followed by all the other payload flits. This enables pipelined communication 
between neighboring NoC routers and reduces the overall communication delay. Among different 
topologies, the mesh topology is widely adopted for industrial NoCs [54] due to its modular 
design and ease of testing. Figure 1 demonstrates the mesh topology, where each NoC router is 
connected to a processing core and, apart from the routers on the edges, to four other NoC 
routers in the NSEW (North, South, East, West) directions.  
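The mesh connectivity described above can be illustrated with a minimal sketch. The function names and the choice of dimension-order (XY) routing are assumptions made for illustration only; the text above does not specify a routing algorithm at this point.

```python
def mesh_neighbors(x, y, k):
    """Return the routers adjacent to router (x, y) in a k x k mesh, keyed by direction.

    Edge and corner routers have fewer than four neighbors.
    """
    candidates = {
        "N": (x, y + 1),
        "S": (x, y - 1),
        "E": (x + 1, y),
        "W": (x - 1, y),
    }
    return {
        direction: (nx, ny)
        for direction, (nx, ny) in candidates.items()
        if 0 <= nx < k and 0 <= ny < k
    }


def xy_route(src, dst):
    """Routers visited by a header flit under XY routing (assumed here):
    travel along the X dimension first, then along the Y dimension."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


k = 4  # hypothetical 4 x 4 mesh
print(mesh_neighbors(0, 0, k))      # corner router
print(mesh_neighbors(1, 1, k))      # interior router
route = xy_route((0, 0), (2, 1))
print(route, "hops:", len(route) - 1)
```

The hop count returned by such a route is one of the reasons a mesh is described as a multi-hop topology: the distance between two routers grows with their separation in each dimension.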
1.2 Design Space Exploration Problem for NoCs  
In any architectural design, the first step involves determining the architectural parameters 
that ensure the required performance, energy, and area goals. Although manufacturing parameters 
(e.g., technology parameters, floor planning) affect the performance of the overall chip, many 
of these parameters are unknown during the architecture definition phase. Computer architects 
depend on their intuition, together with system-level simulation models, to determine the 
values of the different architectural parameters that meet the target performance, energy, and 
area. These architectural parameters together constitute the design space for that system. 
Design space exploration (DSE) refers to the systematic analysis and pruning of unwanted design 
points based on parameters of interest.  
To ensure optimum performance (e.g., communication latency) of a NoC while meeting the design 
constraints (e.g., area and power), computer architects carry out accurate system-level modelling 
and simulation. As these systems are extremely complex and their behavior is very dynamic, an 
exhaustive design space exploration for the NoC architecture is required after modelling. The 





increases exponentially with the number of architectural parameters, due to the slow 
cycle-accurate simulation. For example, some of the major architectural parameters for a NoC 
are summarized in Table 1. 
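To make the growth of the design space concrete, the following sketch counts the design points produced by a handful of parameters and estimates the time for an exhaustive sweep. The parameter names and candidate values are hypothetical, and the per-run time simply reuses the 43.45 seconds quoted in the abstract for a small 8-core system, so the result is only a rough illustration.

```python
from itertools import product
from math import prod

# Hypothetical candidate values for a few NoC parameters (illustration only).
design_space = {
    "topology": ["mesh", "torus"],
    "num_vcs": [2, 4, 8],
    "buffers_per_vc": [4, 8, 16],
    "routing": ["deterministic", "adaptive"],
    "traffic": ["uniform", "transpose", "hotspot"],
    "flits_per_packet": [4, 8, 16],
}

# Per-simulation time quoted in the abstract for a small 8-core system.
SECONDS_PER_RUN = 43.45


def count_design_points(space):
    """Number of configurations in an exhaustive sweep (product of option counts)."""
    return prod(len(values) for values in space.values())


def enumerate_design_points(space):
    """Yield every configuration as a dict mapping parameter name to value."""
    keys = list(space)
    for combo in product(*(space[key] for key in keys)):
        yield dict(zip(keys, combo))


n_points = count_design_points(design_space)
hours = n_points * SECONDS_PER_RUN / 3600
print(f"{n_points} design points, about {hours:.1f} hours of simulation")
```

Adding one more parameter with three candidate values triples the count, which is why exhaustive simulation quickly becomes impractical.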
An alternative to the simulation-based approach is to design performance models using 
classical queuing models, where the NoC router is modelled as an M/M/1, M/G/1/N, or G/G/1 
queue. These models provide good estimates when the following assumptions hold: 1) the packet 
length follows an exponential distribution, and therefore the packet service time in the 
router is exponentially distributed as well; 2) the traffic follows a Poisson distribution 
at all traffic sources. However, in real applications these assumptions may not hold, and the 
accuracy of the analytical model is compromised. Another approach is to utilize machine 
learning models, where some of these constraints can be relaxed. Generating data for a 
machine learning model can be very challenging. However, the system can be evaluated 
at a few important points, and the rest of the points in the design space can then be generated 
using generative models. Furthermore, once the model is trained, evaluation is significantly 
faster than with a simulation-based model. In this thesis, we present the design of such a 
machine learning based model for evaluating the performance of NoC architectures to speed up 
the DSE process.  
Table 1: NoC DSE Parameters 
Category       Parameter 
Interconnect   Type of interconnect 
               Bandwidth 
Application    Communication Pattern 
               Communication Probability 
               Flits per packet 
Router         Number of Ports 
               Number of VCs per Port 
               Number of Buffers per VC 
               Routing Algorithm 
               Congestion Awareness 
               Buffer Width (bits) 
1.3 Research Questions  
The overarching research question of this work is whether machine learning based techniques 
can be used to determine the performance parameters of a NoC architecture. Because the 
design space for NoC architectures is massive due to the large number of parameters, in this 
work we generate a dataset by considering some important parameters from the different 
categories described in Table 1. These parameters are then used to generate the data to train 
and test different predictive models. The specific research questions are given below: 
1. How can we efficiently generate a dataset that represents the design space of NoC 
architectures and captures the variations in NoC performance? 
2. Which machine learning technique is the best fit for designing NoC performance 
models? 
3. How much improvement in execution time do we get with such models compared to 
system-level simulations? 
1.4 Contributions 
The main contributions of this thesis are summarized below:  
1. Dataset generation for NoC performance with design space parameters: Due to the 
massiveness of the design space, it is difficult to produce an exhaustive dataset that 
represents the design space and captures the variation in performance. In this thesis, we 
present an approach to generate such a representative dataset.  
2. Comparative study between machine learning models to predict NoC performance: To 
predict the NoC performance, we evaluate several regression models: Linear Regression, 
Support Vector Regression, Neural Network, Random Forest Regression, and the XGBoost 
regressor. Based on this comparative study, we propose an XGBoost based machine learning 
model for NoC performance evaluation. 
3. Comparative study between the machine learning model and simulation models: To 
study the efficiency of the proposed model, we compare its predictions with cycle-accurate 
simulation. We also determine the performance benefit (in terms of execution 
time) of using such a model over simulations. 
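For readers unfamiliar with boosting, the sketch below illustrates the core idea behind gradient boosted regression for squared error: each new weak learner is fitted to the residuals of the current ensemble. This is a deliberately simplified pure-Python illustration using one-split decision stumps; it is not the XGBoost library, which adds regularization, second-order gradients, and deeper trees. The toy features (number of VCs, injection rate) and latency values are invented for the example.

```python
def fit_stump(X, residuals):
    """Find the single (feature, threshold) split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # not a real split
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split: all feature values are identical")
    return best[1:]


def stump_predict(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Start from the mean and repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Invented toy data: [number of VCs, injection rate] -> average latency in cycles.
X_toy = [[2, 0.1], [2, 0.3], [4, 0.1], [4, 0.3], [8, 0.1], [8, 0.3]]
y_toy = [30.0, 55.0, 25.0, 40.0, 22.0, 33.0]

model = fit_boosted(X_toy, y_toy)
print(round(predict(model, [4, 0.3]), 2))
```

With a learning rate between 0 and 1, each round can only reduce the training squared error, which is the property that lets many weak learners combine into an accurate regressor.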
1.5 Thesis Organization 
The thesis is organized in the following order: 
Chapter 1: In chapter 1, we introduce the NoC paradigm and discuss the design space of 
NoCs. We also present the specific research questions that we address in this research as well 
as the contributions of this thesis. 
Chapter 2: In chapter 2, we provide a brief discussion of the related research on NoC 
architectures and NoC simulators. We also discuss the different queueing theory based analytical 
models proposed for evaluating the performance of NoC architectures. 
Chapter 3: In chapter 3, we discuss the dataset generation process for the machine learning 
model. We also discuss analyzing this data, cleaning the data, encoding the 





Chapter 4: In chapter 4, we present the background on different machine learning algorithms for 
regression problems. We also discuss the process for model tuning and model selection. 
Chapter 5: In chapter 5, we present the comparative study of the different predictive models. 
We use the RMSE and R² to compare the models. We also show the speedup in the NoC DSE process 
achievable through such machine learning based NoC performance models. 
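As a reminder of how the two comparison metrics named above are computed, the following minimal sketch implements RMSE and the coefficient of determination R² from their standard definitions. The latency values are made up for illustration and are not results from this thesis.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up simulated vs. predicted latencies (cycles), for illustration only.
simulated = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"RMSE = {rmse(simulated, predicted):.3f} cycles")
print(f"R^2  = {r_squared(simulated, predicted):.3f}")
```

RMSE is expressed in the same unit as the target (cycles), while R² is unitless and reports the fraction of the variance in the simulated latency that the model explains.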








Background and Related Work 
In this chapter, we first discuss different NoC architectures and simulators to provide an 
overview of the numerous design options and of how the performance of these architectures is 
determined. Following this, we discuss the different NoC latency models existing in the 
literature. Based on the literature review, these models can be classified into two broad 
categories: 1. queueing theory based models and 2. machine learning based models. In the 
following subsections, we discuss these models in detail.  
2.1 Network-on-Chip Architectures and simulators 
Several multicore processors have been designed and taped out by both academia and industry. 
The MIT RAW [45] processor has 16 processing tiles, where each tile contains an 8-stage in-order 
single-issue processing pipeline, a 4-stage FPU, cache, and router fabric. Tilera’s Tile 64 [46] 
is another example of such multicore processors; it contains 64 processing tiles connected 
using a mesh topology. Traditionally, a mesh [47] based NoC architecture is used for the ease of 
modular design, fabrication, and testing. In a mesh architecture, each NoC router is connected 
to a core tile and to other routers in the cardinal directions (North, South, East, and West). 
However, due to the multi-hop nature of this architecture, alternative architectures like 
Torus, Folded Torus, and BFT have also been proposed [48]. Figure 2 demonstrates these 
topologies. In the figure, each black block represents a NoC router, each white block 
represents a processing element, and the links represent the connections between the routers. 
In [49], the authors proposed an irregular NoC architecture containing many short and a few 
long links, resembling small-world graphs. However, introduction of long links to 





due to the pipelined NoC links. To ameliorate this, architectures utilizing novel interconnect 
technologies like RF interconnect [50], wireless interconnect [51] [52], photonic interconnect 
[53], and 3-D integration [54] have been proposed in research. The performance of these 
architectures varies with the topology, router microarchitecture (i.e., router pipeline, number 
of ports, number of VCs per port), routing mechanism, switching mechanism, interconnect 
bandwidth, etc. Moreover, the performance of the same architecture also varies with the 
application traffic. To evaluate the performance (latency, power, overhead) of these 
architectures, several simulators have been proposed. The simulators model each component of a 
NoC and, using cycle-accurate or event-triggered simulation, determine the performance of a NoC 
architecture for different traffic patterns. Furthermore, the traffic patterns are highly application dependent and 
Although many of the simulators can evaluate the performance of simple wired NoC architectures, 
some of them are specific to particular interconnect technologies and to architectures designed 
around a particular interconnect technology. Booksim [55] was one of the first NoC simulators to 
provide a wide range of configuration options, such as network topology, network size, buffer 
size, routing algorithm, flow control mechanism, and traffic pattern, to evaluate the 
performance of NoC architectures using cycle-accurate simulation. To capture the ongoing 
progress in NoC architecture, this simulator was updated to Booksim 2.0 [56]. Similar to 
Booksim, Noxim [57] is another NoC simulator, implemented in SystemC. However, Noxim also provided support for 





Recent developments have shown the possibility of leveraging silicon nanophotonic technologies 
for chip-scale interconnection fabrics that deliver high-bandwidth and power-efficient 
communication both on and off chip. Since optical devices are fundamentally different from 
conventional electronic interconnect technologies, new design methodologies and tools are 
required to exploit the potential performance benefits in a manner that accurately incorporates 
the physically different behavior of photonics. PhoenixSim is a simulation environment 
for modeling computer systems that incorporate silicon nanophotonic devices as interconnection 
building blocks. It was developed as a cross-discipline platform for studying 
photonic interconnects at the physical-layer level as well as at the architectural and system 
levels. The broad scope at which modeled systems can be analyzed with PhoenixSim provides users with 














system performance. The PhoenixSim authors describe the implementation and methodology of the 
simulator and present two case studies of silicon nanophotonic-based networks-on-chip. 
SICOSYS [5] is a general-purpose interconnection network simulator. It models message routers 
using the traffic pattern, applied load, and message length as input parameters. Orion 
[10] is a set of early-stage architectural power models for on-chip interconnection routers 
that can be used to estimate Network-on-Chip area and power consumption accurately in early 
design phases. It was originally proposed and released in 2002 and has since been fairly widely 
used in academia and incorporated into industry toolchains.  
2.2 Queuing Theory based NoC Latency Models 
Table 2: Comparison between the proposed latency model and previous latency models derived 
using queuing theory. 
Queue type   Arrival   Service       Buffer size   Reference 
M/M/1        Poisson   Exponential   Infinite      [44] 
M/G/1        Poisson   General       Infinite      [45] 
G/G/1        General   General       Infinite      [46] 
M/G/1/K      Poisson   General       Finite        [47] 
G/G/1/K      General   General       Finite        [48] 
M/M/1/K      Poisson   Exponential   Finite        Proposed in this work 
Many latency models have been proposed in recent years to estimate the latency of NoC 
architectures. In [49], the authors proposed a machine learning based NoC latency model called 
SVR-NoC. Although this work is unique and shows promising results, the large training set 
required to precisely calculate the latency for different NoC architectures and traffic 
patterns is difficult to generate. Consequently, many of the NoC latency models are based on 
queuing theory. These works can be broadly classified into two categories: i) infinite buffer 
capacity queuing systems and ii) finite buffer capacity queuing systems. For example, in [44], 
[45], and [46], the authors proposed NoC latency models considering M/M/1, M/G/1, and G/G/1 
queues, respectively, with infinite buffer capacity. However, the number 





constraints. On the other hand, the authors in [47] and [48] proposed M/G/1/K and G/G/1/K 
queue-based latency models with finite buffer capacity. However, in the NoC routers, the arrival 
and departure of packets follow a Poisson distribution [50]. Consequently, the service time in 
the NoC routers should follow an exponential distribution, as the time intervals between 
Poisson events are exponentially distributed. Hence, assuming a general distribution for the 
service time limits the accuracy of the latency model. Therefore, to accurately capture the NoC 
latency, in this work we propose an analytical model based on M/M/1/K queuing systems. A 
comparison between the latency model proposed in this work and existing works is 
presented in Table 2. Our proposed latency model is formulated based on the following: a Poisson 
distribution of the flit arrival rate, an exponential distribution of the router service rate, 
and a finite buffer size. Then, using this latency model, we propose a framework for evaluating 
the speedup of multi-core systems. 
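As a point of reference for the queue types discussed in this section, the sketch below evaluates the textbook steady-state results for a single M/M/1/K queue (Poisson arrivals, exponential service, finite capacity). It is not the latency model of any of the cited works; the function name, the example rates, and the convention that K counts all flits in the system, including the one in service, are assumptions made for illustration.

```python
def mm1k_metrics(arrival_rate, service_rate, capacity):
    """Steady-state metrics of an M/M/1/K queue.

    capacity (K) is the maximum number of customers in the system,
    including the one being served (assumed convention).
    """
    rho = arrival_rate / service_rate
    # P(n customers in system) is proportional to rho**n for n = 0..K.
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean_in_system = sum(n * p for n, p in enumerate(probs))
    blocking_prob = probs[capacity]          # arrivals that find the queue full
    effective_rate = arrival_rate * (1.0 - blocking_prob)
    mean_latency = mean_in_system / effective_rate  # Little's law
    return {
        "blocking_prob": blocking_prob,
        "mean_in_system": mean_in_system,
        "mean_latency": mean_latency,
    }


# Hypothetical router: 0.5 flits/cycle arriving, 1 flit/cycle served, room for 4 flits.
metrics = mm1k_metrics(arrival_rate=0.5, service_rate=1.0, capacity=4)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a very large capacity and an arrival rate below the service rate, the mean latency approaches the infinite-buffer M/M/1 value 1/(service_rate - arrival_rate), which provides a quick sanity check of the finite-buffer result.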
2.3 Machine Learning model based NoC Models 
The learning algorithms in NoC performance modeling are focused on optimizing model accuracy. 
A NoC router power and area regression model was developed using the multivariate adaptive 
regression splines technique in [58]. Compared to ORION 2.0 [59], the learning-based 
model increases prediction accuracy over a wide range of NoC implementations. In [59], the 
authors used SVR and ANN for evaluating performance parameters of Network-on-Chip architectures, 
and the resulting ML model predicted the performance parameters with an accuracy of 90-95 
percent. In terms of execution time, the suggested system demonstrated a minimum speedup of 
1500x over the Booksim simulator [60]. To increase the overall efficiency of three-dimensional 
NoC architectures, a robust design optimization approach was suggested in [61]. The advantages of small-world networks and 





average, the configured 3D small-world NoC reduced EDP by 35% relative to a traditional 3D mesh 
[61]. Considering all of the related work, studies have been carried out on various aspects of 
NoCs using ML, and the majority of the work has focused on linear regression, multivariate 
adaptive regression splines, and support vector regression. Models such as boosting algorithms, other 








Design Space Exploration Dataset  
To build a machine learning model that predicts the latency of different NoC architectures, a
representative dataset is required that captures the latency variation across these
architectures. This chapter describes the process for generating the dataset used for training the
machine learning model. An overview of the data generation process is outlined in Figure 3. We
adopted the cycle-accurate Booksim simulator to evaluate the performance of NoC architectures
and to generate the training data. The following sections discuss the detailed flow for dataset
generation and the preprocessing steps used in this work.
3.1 Dataset generation using Booksim simulator 
In this work, a cycle-accurate NoC simulator, Booksim [63], has been used to evaluate the
performance (i.e., latency) of the NoC architectures. This simulator was used to generate the
performance graphs in the textbook “Principles and Practices of Interconnection Networks” by
Dally and Towles [62]. BookSim has been used not only for simulating NoC architectures but
also for studying many different aspects of network design, such as router microarchitecture,
routing algorithms, flow control, and quality-of-service (QoS). Due to its widespread usage in NoC
research, full-system simulators like GPGPU-sim [64] use Booksim to model the on-chip
interconnection network. Although its primary purpose is modelling on-chip interconnection,
BookSim is a generic network simulator that has also been adopted for simulating networks found in
large-scale supercomputers and many-core processors. Versatile adoption of this simulator is





rapid sweeps of the network design space. Moreover, Booksim is an open-source simulator [63],
which facilitates rapid modelling of new NoC components and evaluation of their performance benefit
by integrating them with existing NoC components. The Booksim simulator
requires a configuration file to simulate a NoC architecture. This configuration file captures
different parameters of a NoC architecture, such as topology, routing algorithm, and flow control
mechanism, as well as simulation parameters such as traffic pattern, injection rate, and performance metric.
A sample config file is shown in Figure 4. By generating different configuration files and
running a simulation for each of them, different NoC architectures can be evaluated
for performance. Once the simulation is complete, the simulation data and detailed performance
metrics are displayed in the terminal window.
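A configuration sweep of this kind could be scripted roughly as follows. This is a minimal sketch, not the script used in this work: the key names are modelled on BookSim's example configuration files, while the sweep values, the fixed settings, the file naming scheme, and the function names are all assumptions made for illustration.

```python
import itertools
from pathlib import Path

# Hypothetical sweep values (assumption); the real ranges are those listed
# in the configuration parameter table of this chapter.
SWEEP = {
    "topology": ["mesh", "torus"],
    "k": [4, 8],
    "routing_function": ["dim_order", "romm"],
    "num_vcs": [2, 4],
    "vc_buf_size": [4, 8],
    "traffic": ["uniform", "transpose"],
    "injection_rate": [0.05, 0.10],
}
# Settings held constant for every run (assumption).
FIXED = {"n": 2, "sim_type": "latency", "sample_period": 1000}


def render_config(params):
    """Render one configuration as 'key = value;' lines."""
    return "\n".join(f"{key} = {value};" for key, value in params.items()) + "\n"


def config_name(params):
    """Encode the swept values in the file name so they can be recovered later."""
    return "-".join(str(params[key]) for key in SWEEP) + ".cfg"


def generate_configs(out_dir):
    """Write one configuration file per point of the design space."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for values in itertools.product(*SWEEP.values()):
        params = {**FIXED, **dict(zip(SWEEP, values))}
        path = out_dir / config_name(params)
        path.write_text(render_config(params))
        paths.append(path)
    return paths
```

Encoding the swept values in the file name mirrors the naming idea described in this section, so that a later script can rebuild the predictor columns without re-reading the configuration.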
In this work, we adopted a three-step process for generating the dataset for the machine learning
models. In the first step, we generate different configuration files for network simulation using a
Python script. A dataset that captures the entire NoC design space would require latency values for
every possible NoC configuration. However, generating such a dataset by varying this large number of
parameters is infeasible. Hence, in this work we limit our design space to the five most important NoC
parameters, which are then evaluated for various traffic patterns and injection loads. By varying the
 
 
























































these configurations are used by the Booksim simulator to provide the latency value for different
NoC configurations. The NoC parameters along with the values for the simulation parameters used
in this work are listed in Table 2. In the second step, a Python script is used to run the simulations
for the configuration files generated in step 1. Each simulation has three basic phases:
warm up, measurement, and drain. The lengths of the warm-up and measurement phases are multiples
of a basic sample period. The overall throughput is determined by the lowest throughput of all the
destinations in the network, but the average throughput is also displayed. After the warm-up period
has passed, the simulator prints the "Warmed up" message and resets all the simulation statistics.
Once the sample period has passed, all the measurement packets are drained from the network
Table 2: Configuration Parameters used in this work
Parameter | Values
Topology Type | Mesh and Torus
Network Size | 2 × 2, 4 × 4, 8 × 8, 16 × 16
Traffic Pattern | Uniform; Bit complement; Random permutation; Transpose
Buffer Size | 2, 4, 8, 16
Number of Virtual Channels | 2, 4, 8, 16
Routing Algorithm | Dimension Order Routing; Valiant’s Randomized Routing; Minimal Adaptive Routing; Randomized, Oblivious, Multi-phase, Minimal (ROMM) Routing
Sample Period | 1000 cycles
Injection Rate | 0.001 to 0.81 with increment of 0.001






before the final latency is reported. These simulation results are dumped into an output file, and
all the configuration parameter values are used to name this file so that they can be later used to
 
Figure 5: Sample Output file for the Booksim simulator 
 %==================================== 
% Average latency = 52.4137 
% Accepted packets = 0.12 at node 4 (avg = 0.286031) 
% latency change    = 1 
% throughput change = 1 
% Average latency = 53.2074 
% Accepted packets = 0.141 at node 4 (avg = 0.293992) 
% latency change    = 0.0149163 
% throughput change = 0.0270787 
% Warmed up ... 
%================================= 
% Average latency = 53.9576 
% Accepted packets = 0.14 at node 39 (avg = 0.299078) 
% latency change    = 0.0139033 
% throughput change = 0.0170054 
%================================= 
% Average latency = 55.3827
% Accepted packets = 0.165 at node 52 (avg = 0.299703) 
% latency change    = 0.0257317 
% throughput change = 0.0020854 
%================================= 
% Average latency = 55.2587 
% Accepted packets = 0.201 at node 52 (avg = 0.296036) 
% latency change    = 0.00224375 
% throughput change = 0.0123859 
%================================= 
% Average latency = 55.4723 
% Accepted packets = 0.225 at node 36 (avg = 0.29493) 
% latency change    = 0.00385129 
% throughput change = 0.00375266 
%================================= 
% Average latency = 54.9989 
% Accepted packets = 0.22 at node 52 (avg = 0.294925) 
% latency change    = 0.00860693 
% throughput change = 1.58939e-05 
%================================= 
% Average latency = 55.4756 
% Accepted packets = 0.226667 at node 36 (avg = 0.296484) 
% latency change    = 0.0085921 
% throughput change = 0.00525955 
… 
 
%Draining all recorded packets ... 
%Draining remaining packets ... 
 
====== Traffic class 0 ====== 
Overall average latency = 55.7986 (1 samples) 
Overall average accepted rate = 0.296895 (1 samples) 






produce the required dataset for the machine learning model. The content of an output file for a
simulation of a 64-core 2D Mesh is shown in Figure 5.
In step three, another Python script reads the output files and extracts the average latency
value from each of them. As the file name contains all the configuration parameters used to run
the simulation, the average latency value and these configuration parameter values are then
used to generate a CSV file for the dataset. Hence, in our generated dataset the parameters topology,
network size, network dimension, routing algorithm, buffers per VC, number of VCs per port,
Table 4: Description of the columns in Table 3
Features (used variable names) | Description
Topology (topology) | Mesh or Torus; the topology defines the structure of point-to-point connections through which the nodes are interconnected.
K (network_size) | The k-parameter determines the network’s radix.
n (size) | The n-parameter determines the network’s dimension.
Traffic | The communication pattern between the nodes in the network.
Injection rate (injection_rate) | Determines the rate at which packets are injected into the network by a node. The injection rate range varies for different traffic patterns.
VC (VC_buffers) | Number of virtual channels per physical channel.
Buffer Size (VC_buffer_size) | The depth of each virtual channel in flits.
Routing Algorithm (routing) | Denotes how packets are routed from one node to another.
Average Latency (Target Variable) | The average time for a packet to travel from one designated point to another.
 












mesh 8 2 0.3 neighbor 2 4 romm 55.7986 
mesh 2 2 0.48 uniform 8 2 dim_order 782.086 
mesh 4 2 0.172 uniform 16 16 dim_order 36.469 






traffic pattern and injection rate are used as predictors, and the performance metric average latency
is the target variable, y. A few rows of this generated dataset are shown in Table 3. The
variables in the dataset are described in Table 4.
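The extraction step can be sketched as below. This is an illustrative outline rather than the script used in this work: it assumes the simulator log contains a line of the form "Overall average latency = ..." as in the sample output, and the file naming scheme, the field order in FIELDS, and the function names are hypothetical.

```python
import re

# Matches the final summary line of a simulator log.
LATENCY_RE = re.compile(r"Overall average latency\s*=\s*([0-9.eE+-]+)")

# Hypothetical naming scheme (assumption): fields joined by '-' in this order.
FIELDS = ["topology", "k", "n", "injection_rate", "traffic",
          "num_vcs", "vc_buf_size", "routing"]


def extract_latency(output_text):
    """Return the last 'Overall average latency' value in a log, or None."""
    matches = LATENCY_RE.findall(output_text)
    return float(matches[-1]) if matches else None


def row_from_file(file_name, output_text):
    """Build one dataset row from a file name and the log it contains."""
    stem = file_name[:-4] if file_name.endswith(".out") else file_name
    row = dict(zip(FIELDS, stem.split("-")))
    row["average_latency"] = extract_latency(output_text)
    return row
```

Rows produced this way can be written out with the standard csv module. A run that never reports an overall latency yields None, which is what the later missing-value check would look for.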
3.2 Data Preprocessing  
Data preprocessing is required when implementing a machine learning algorithm, since
different models place different requirements on the predictors, and data preparation can affect
predictive performance. Data preprocessing aims to clean data to a point where the data contains
less bias and more variance. We also remove predictors that are highly correlated with one
another at this step. To reduce the skewness of the data, transformations such as Box-Cox or log
transformation are also performed during data preprocessing. In addition, categorical variables are
encoded before they can be used for training the machine learning models. Figure 6 shows an
overview of the data preprocessing steps used in this work. After collecting the data through NoC
simulations, we process the data to remove any outliers and missing values, select the important
features, and encode the categorical data. To perform these preprocessing tasks, we have used
the Python data manipulation library pandas and the machine learning library scikit-learn. In this section,
 
Figure 6: Overview of Data Preprocessing Steps
1. Raw Data (data collected through simulation)
2. Data Cleaning (handling missing values and outliers)
3. Feature Selection (select features that have high correlation with the target variable)
4. Feature Scaling (standardize the distribution of all features)






we discuss different data preprocessing steps performed in this work before the data is used to train 
the machine learning models.  
3.2.1 Data Cleaning 
Our initial dataset consisted of a total of 29750 instances with 8 predictors and one target variable. In
the first step of data cleaning, we check the average latency column in our dataset for missing
values. We remove the rows in the dataset that have missing average latency values. By
removing these rows, we lose some information about the NoC latency. However, methods like
median or mean replacement are not suitable in this case, as such replacement can significantly
bias the dataset and result in a high prediction error.
In the next step of data cleaning, we deal with the outliers. In our dataset, we have both qualitative 
and quantitative variables. Furthermore, we have both discrete and continuous types of quantitative 
variables in our dataset. The descriptive statistics for the dataset are provided in Figure 7. From 
this figure we can see that the maximum value for the average latency is very high compared to 
the mean resulting in a right skewed distribution for the average latency. This can be also observed 
from the boxplot showing the distribution of the average latency in Figure 8. Such high average 
latency values capture the network latency after saturation. After saturation, the network latency 
 





increases exponentially with increasing packet injection rate. When designing NoC architectures,
the intent is to explore parameters for architectures that operate below network
saturation. Hence, such high values for the average latency are treated as outliers in the
latency data. To detect the outlier values for average latency, we have adopted an Interquartile
Range (IQR) based outlier detection methodology in this work. In IQR-based outlier detection,
any value above the Q3 + 1.5 × IQR boundary is an outlier, where Q3 is the third quartile and
IQR = Q3 − Q1. After detecting the outliers, we remove those rows from the dataset.
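The two cleaning steps of this subsection can be illustrated with a small standard-library sketch. It is not the pandas code used in this work; the row layout (a list of dictionaries with an "average_latency" key), the function names, and the quartile interpolation method are assumptions.

```python
import statistics


def drop_missing(rows, key="average_latency"):
    """Remove rows whose target value is missing instead of imputing it."""
    return [row for row in rows if row.get(key) is not None]


def iqr_upper_fence(values, k=1.5):
    """Upper outlier boundary Q3 + k * IQR, with IQR = Q3 - Q1."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def drop_high_outliers(rows, key="average_latency"):
    """Keep only rows at or below the upper IQR fence (post-saturation rows go)."""
    fence = iqr_upper_fence([row[key] for row in rows])
    return [row for row in rows if row[key] <= fence]
```

Only the upper fence is applied here, matching the one-sided rule stated above, since the suspicious values are the very large post-saturation latencies.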
3.2.2 Feature Selection  
Feature selection identifies the important features to use when building the
machine learning model. It speeds up training, enhances interpretability, and
reduces over-fitting when there are many irrelevant features that provide no more useful
information than the current subset of variables. Irrelevant and redundant information in the
dataset may greatly affect the performance of the regression model. Feature selection can be
divided into three main categories: the filter model, the wrapper model, and the embedded model.
The filter model relies on a proxy measure (for example, mutual information, Spearman
 





correlation coefficient, or a significance test) to select features from the original variables without
fitting any additional learning model on the training dataset. In contrast, the wrapper model trains a
specified predictive model for each new subset and uses the error rate of the model as its score;
the subset with the best performance is selected. Since each subset is used to build a predictive
model, it is much more computationally intensive than the filter model [65]. As implied by its
name, the embedded model conducts feature selection as a part of the predictive modelling
process.
In this work, we have adopted the filter-based feature selection approach. We calculate the
Pearson’s correlation coefficient between each quantitative input variable and the target variable.
The heatmap for the correlation is shown in Figure 9. From the figure, it can be observed that
network size and injection rate have a positive linear relationship with the target variable, average
latency. These scores are used as the basis to filter the features. Another important step in a
multiple regression analysis is to ensure that the assumption of no multicollinearity has been met.
 





Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple
regression model are highly correlated. To check for multicollinearity, we calculate the Variance
Inflation Factor (VIF), which assesses how much the variance of an estimated regression coefficient
increases if the predictors are correlated. If no factors are correlated, the VIFs will all be 1; a
higher VIF value denotes collinearity with other factors. Table 5 shows the VIF scores of the
quantitative input variables. The VIF score for network dimension is high (about 10.1).
Hence, we remove this column from the dataset before using the data to train the machine learning
models.
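As a rough illustration of how a VIF is obtained, the sketch below regresses each predictor on the remaining ones and applies VIF_j = 1 / (1 − R_j²). It uses NumPy least squares rather than whatever routine was used in this work, and the function name is hypothetical. It assumes no column is constant or perfectly collinear with the others.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X using VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept term.
        others = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs
```

A predictor that is nearly a linear function of the others gets an R² close to 1 and therefore a very large VIF, which is the situation reported for network dimension.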
To filter the qualitative or categorical variables, such as topology, traffic, and routing algorithm, we
perform a one-way ANOVA test to determine the association of each variable with the target
variable. The p-values for the ANOVA test are shown in Table 6. In all cases the p-value is less
than the α value of 0.05. Hence, we use all of these categorical variables in our machine learning
model.
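The statistic behind a one-way ANOVA test can be sketched in plain Python as follows. The function name and input layout are assumptions for illustration; turning the F statistic into a p-value additionally requires the F distribution, for which a statistics library (for example, the f_oneway function in scipy.stats) would normally be used.

```python
from collections import defaultdict


def one_way_anova_f(categories, values):
    """Return (F statistic, between-group df, within-group df)."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)
    n_total = len(values)
    grand_mean = sum(values) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value means the mean latency differs between the levels of the categorical variable by much more than the spread within each level, which corresponds to a small p-value.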
Table 5: Variance Inflation Factor (VIF) for quantitative variables in the dataset
Feature | VIF
Network_size | 1.145020
Network Dimension | 10.099814
Injection_rate | 1.52119
VCs | 1.009420
Buffer size | 1.012964
 










3.2.3 Feature Scaling 
To transform the numerical feature variables to comparable scales, we standardize the variables to
equalize the range and/or variability. We used the Standard Scaler, which subtracts the
mean and then scales to unit variance, as shown in the following equation [66].
$$X' = \frac{X - \mu}{\sigma} \qquad (1)$$
where $\mu$ is the mean of the feature and $\sigma$ is the standard deviation of the feature values. This results in a
distribution with unit standard deviation and, as variance is the square of the standard deviation,
the variance of the distribution is also 1 [66].
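Equation (1) can be checked numerically with a few lines of standard-library Python. This is only an illustration of the formula, with a hypothetical function name, and it uses the population standard deviation as an assumption about the scaler's convention.

```python
import statistics


def standardize(values):
    """Apply (x - mean) / std to every value of one feature."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]
```

For example, standardizing the buffer sizes 2, 4, 8, and 16 yields values whose mean is 0 and whose standard deviation is 1, up to floating-point rounding.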
 





3.2.4 Encoding Categorical Data 
We convert the categorical features to numeric values. To improve the efficiency of the model,
we have used the One-Hot Encoding (OHE) technique for encoding. One-hot encoding is a process
where several new features are created based on the number of unique values of the categorical
variable. Each newly created feature takes the value 1 or 0,
representing the presence or absence of a particular value for that category. This is performed using
the OneHotEncoder API of scikit-learn. Figure 10 depicts this transformation of the columns after
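The idea of one-hot encoding can be shown without any library. The sketch below is a plain-Python stand-in for illustration only, not the OneHotEncoder call used in this work; the function name and the "column_value" naming of the new features are assumptions.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column by a 0/1 column per unique value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded
```

A row with topology "mesh" ends up with topology_mesh = 1 and topology_torus = 0, and the original topology column is dropped.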


















Machine learning is a field within computer science that enables computer systems to perform 
tasks by learning patterns in datasets. The computer system can then use the learned knowledge to 
perform the same task on the unseen data [67]. For the majority of machine learning problems, the
learning process can be roughly divided into two branches, namely supervised learning and
unsupervised learning [67]. In supervised learning, the data samples provided for training come
with the correct associated output, also denoted as labels. If the provided labels consist of discrete
values, the problem is denoted as a classification problem. Similarly, if the labels instead consist
of continuous values, the problem is denoted as a regression problem. In contrast, machine
learning algorithms that use unlabeled data sets are called unsupervised learning methods. In
unsupervised learning, the learning process relies entirely on the provided data, with no
external knowledge. Typical problems solved by unsupervised learning methods are clustering,
outlier detection, dimensionality reduction, and association. Machine learning algorithms are
generic and can be adapted to many different problem domains. Therefore, the choice of algorithm
depends heavily on the dataset being used, and there are several ways to alter an algorithm
during the learning process to achieve satisfactory performance [67].
In this study, we consider only supervised learning because the dataset used in this work
includes the ground truth, which can be used during training. Furthermore, as our target variable is a
continuous numeric variable, we experiment with different regression models and evaluate their
performance in predicting the target variable. We then compare the performance of these





variable. This whole flow is shown in Figure 11. The figure also shows the different machine
learning models used in this work. The following sections discuss the theory behind the various
regression models used in this work, followed by the model creation process.
4.1 Regression Models 
In this section, we discuss the various regression models used in this work. 
4.1.1 Linear Regression 
Linear regression finds the relationship between one variable X and a second variable y,
for example, whether y increases or decreases as X increases. Correlation is
another way to measure how two variables are related. Simple linear regression estimates
exactly how much y will change when X changes by a certain unit. With the correlation coefficient,
the variables X and y are interchangeable. With regression, the predicted value $\hat{y}_i$ is calculated
using the following equation [68],
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \qquad (2)$$
 




















Here $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$ is the slope for $x_i$. The general term coefficient is reserved for $\hat{\beta}_1$. The y
variable is known as the dependent variable or target since it depends on $x_i$. The $x_i$ variable is
known as the independent variable or feature vector.
Linear regression aims to find the best line to predict the response. The regression line is the
estimate that minimizes the residual sum of squares (RSS), given by the following
equation [68].
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (3)$$
Here $y_i$ is the actual value and $\hat{y}_i$ is the predicted value. An optimization algorithm such as gradient
descent is used to find the values for the parameters $\beta_0$ and $\beta_1$.
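For a single feature, the line that minimizes the RSS can also be written in closed form, which makes for a compact illustration. Note that this sketch swaps in the closed-form least-squares solution instead of the gradient descent mentioned above; the function names are hypothetical.

```python
def fit_simple_linear(x, y):
    """Closed-form least-squares estimates (intercept, slope) for one feature."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


def rss(x, y, beta0, beta1):
    """Residual sum of squares of the line beta0 + beta1 * x."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
```

Any other choice of intercept and slope gives an RSS at least as large as the one obtained from fit_simple_linear, which is what "best line" means in this context.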
4.1.2 Bayesian Regression 
Bayesian Regression formulates linear regression using probability distributions rather than point estimates. The
output of the model is obtained from a probability distribution. The model for Bayesian regression with the
response sampled from a normal distribution is $y \sim N(\beta^T X, \sigma^2 I)$. The target y is generated from
a normal distribution characterized by a mean and variance. The method aims to determine the posterior
distribution for the model parameters. Not only is the response generated from a probability
distribution, but the model parameters are assumed to come from a distribution as well. The
posterior probability of the model parameters is conditional upon the training inputs and outputs
and can be calculated using the following equation
$$P(\beta \mid y, X) = \frac{P(y \mid \beta, X) \, P(\beta \mid X)}{P(y \mid X)} \qquad (4)$$
$P(\beta \mid y, X)$ represents the posterior probability distribution of the model parameters $\beta$. This is equal
to the likelihood of the data, $P(y \mid \beta, X)$, multiplied by the prior probability of the parameters and





4.1.3 Support Vector Regressor (SVR) 
Support Vector Machines belong to the family of supervised learning methods and therefore need
labeled data in order to classify new unseen data. The basic approach to classifying the data starts by
trying to create a function that splits the data into corresponding labels with (a) the least possible
number of errors or (b) the largest possible margin. Support vectors are the data points closest
to the hyperplane. The method supports linear and nonlinear regression (SVR), classification, and outlier
detection. It aims to find a hyperplane in an n-dimensional space, where n is the number of features. In regression, it
tries to fit as many instances as possible on the street while limiting margin violations. The width
of the street is controlled by a hyperparameter epsilon ε. In the SVM algorithm, we are looking to
maximize the margin between the data points and the hyperplane. It is important to maximize the
margin to reduce the overall error and avoid overfitting. A regularization parameter is used to
balance margin maximization against margin violations [69].
In SVR, the prediction function is $f(x) = w \cdot x + b$, and training seeks the optimal values of w and b. The tolerance $\epsilon$ is allowed
because the linearity constraints must be relaxed for non-linear data [70]. The variable C
determines the trade-off between the optimization of the cost function and the amount up to which
deviations larger than $\epsilon$ are tolerated. A larger C value implies a greater cost of error.
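The roles of ε and C can be made concrete with the ε-insensitive loss used in the standard linear SVR formulation. The sketch below only evaluates the objective for given w and b; it does not solve the optimization problem, and the function names are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(w, b, X, y, epsilon, C):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses for f(x) = w.x + b."""
    predictions = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
    penalty = sum(epsilon_insensitive_loss(yi, pi, epsilon)
                  for yi, pi in zip(y, predictions))
    return 0.5 * sum(wj * wj for wj in w) + C * penalty
```

Errors smaller than ε contribute nothing to the objective, while C scales how strongly larger deviations are penalized relative to the size of the weights.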
4.1.4 Random Forest Regression
The Random Forest is an ensemble of decision trees that produces more accurate predictions than a single
decision tree. Each individual decision tree is built from a different bootstrap sample of the
original data and a set of randomly selected features. Bootstrap sampling is a statistical data
resampling method that draws random samples with replacement from the training subset
to train each estimator. Randomly chosen sub-samples and features reduce the correlation





estimators has made a robust model minimizing the generalization error in comparison with a
single estimator. RF uses randomness in the regression process by selecting a random subset of
variables to determine the split at each node [71]. Each tree predicts the data that
were not used to grow it, known as the Out of Bag (OOB) data. By calculating the difference in the mean square errors between the OOB
data and the data used to grow the regression trees, the RF algorithm gives an error of
prediction called the OOB error of estimate for each variable [71]. This error produces a measure
of the importance of the variables by comparing how much the OOB error of estimate increases when
a variable is permuted whilst all others are left unchanged [71]. The forward selection procedure
adds variables to the model one by one; at each step, a variable that is not yet in the model is tried
for inclusion based on a probability threshold.
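Bootstrap sampling and the resulting out-of-bag set can be illustrated as follows. This is a minimal sketch of the sampling step only, with a hypothetical function name; it does not grow any trees.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag
```

Because sampling is done with replacement, some rows appear several times in a tree's training sample and others not at all. The rows left out form that tree's OOB data, on which its prediction error can be measured without a separate validation set.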
4.1.5 Artificial Neural Networks – Multi-Layer Perceptron Regression
Artificial neural networks (ANN) are computational models inspired by the nervous system of 
humans. The architecture of an artificial neural network defines how its neurons are arranged in 
relation to each other. Neurons are computational units in the network that have weighted input 
signals and produce an output signal using an activation function [72]. The main architectural 
features of ANN are the input layer, hidden layers, and the output layer. The input layer handles 
data input from the external environment. The hidden layers are composed of several neurons 
which are responsible for extracting patterns. Lastly, the output layer is responsible for providing 
the final output depending on the computations made before [72]. The simplest case of a
feed-forward ANN is the Single-Layer Perceptron, a feed-forward network consisting of only an input





product of each input weight and a bias. Single-Layer Perceptrons are linear classifiers and are thus only
capable of finding patterns that are linearly separable. To learn non-linear functions, more hidden
layers must be added [73]. An architecture of an ANN is shown in Figure 12.
A Multi-Layer Perceptron (MLP) is composed of one input layer, one or more layers of Threshold
Logic Units (TLUs) called hidden layers, and one final layer of TLUs called the output layer. The layers close to the input layer are usually called
the lower layers. Every layer except the output layer includes a bias neuron and is fully connected
to the next layer [74]. Designing a particular ANN model requires tuning several different
hyperparameters. A summary of the tunable hyperparameters for the ANN model is given in Table 7.
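The forward pass of a small fully connected network can be sketched in a few lines. The layer sizes, weights, and the ReLU activation below are arbitrary choices for illustration and are not taken from this work.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: weighted sum plus bias for each neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def mlp_forward(x, layers):
    """Propagate x through a list of (weights, biases, activation) layers."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x
```

For regression, the output layer is typically left without an activation so that the network can produce any real-valued latency estimate.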
4.1.6 Extreme Gradient Boosting
Extreme Gradient Boosting (XGBoost) is a variant of the gradient tree boosting proposed by
Friedman [75]. Boosting is an approach for creating an ensemble of models, commonly used with
decision trees. It fits multiple models in series, with each successive model trained to minimize the error
 
Figure 12: Artificial Neural Network 
Table 7: Hyperparameters for ANN
Hyperparameter | Value
Input neurons | One per feature
Hidden layers | Typically 1 - 5
Neurons per hidden layer | Typically 10 - 100
Loss function | MSE/MAE






of the previous models. The most widely used boosting algorithm is XGBoost, an implementation of stochastic
gradient boosting. The stochastic gradient boosting algorithm incorporates resampling of records and
columns in each round. XGBoost provides several parameters that can be tuned to avoid
overfitting. It comprises a sequence of decision trees and uses a gradient descent algorithm to
minimize the error of weak estimators; the objective function consists of a training loss and a
regularization term. The objective function, $\mathcal{L}(\phi)$, is calculated using the following equation.
$$\mathcal{L}(\phi) = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k) \qquad (5)$$
In equation (5), $\sum_i l(y_i, \hat{y}_i)$ indicates the training loss function that measures the difference
between the predicted output and the actual observations. The training loss can be
measured using different types of error, such as Mean Squared Error (MSE) and Logistic Loss,
calculated using the following equations,
$$MSE = \sum_i (y_i - \hat{y}_i)^2 \qquad (6)$$
$$Logistic\ Loss = \sum_i \left[ y_i \ln\left(1 + e^{-\hat{y}_i}\right) + (1 - y_i) \ln\left(1 + e^{\hat{y}_i}\right) \right] \qquad (7)$$
In equation (5), $\sum_k \Omega(f_k)$ represents the regularization term, which penalizes the complexity of the
model to avoid overfitting. For each tree it is equal to $\gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$, where T is the number of leaves in the tree and w is the vector of leaf weights. During training, the model is trained
additively, by optimizing one tree at a time. XGBoost utilizes the same regularization strategy
as Regularized Greedy Forest [75]. The trees are trained one at a time, which improves the
performance of the algorithm in terms of its run time. Furthermore, XGBoost also supports row
subsampling and column subsampling, two techniques used to control bias and variance in
Random Forest. XGBoost uses two additional techniques besides regularization to improve the
performance of the model. The first technique is shrinkage of weights, which is done by scaling
newly added weights with a parameter η, also known as the learning rate. This reduces the influence





is the other technique used to improve the model. It works as bagging does in the random forest
algorithm, by selecting sub-samples of features for each tree. This is done to decorrelate features,
reduce bias, and prevent overfitting of the ensemble model. Furthermore, the XGBoost algorithm
has many computational advantages compared to other ensemble models, such as a
block structure for parallel learning, cache-aware settings, and out-of-core computations [].
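The structure of the objective in equation (5), a training loss plus a per-tree complexity penalty, can be evaluated directly for a toy ensemble. The sketch below assumes the squared-error loss of equation (6) and represents each tree only by its list of leaf weights; the function names are hypothetical and this is not how the XGBoost library computes its objective internally.

```python
def tree_penalty(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lam * ||w||^2 for one tree with T leaves."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def regularized_objective(y_true, y_pred, trees, gamma, lam):
    """Squared-error training loss plus the penalty of every tree."""
    training_loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return training_loss + sum(tree_penalty(t, gamma, lam) for t in trees)
```

Raising gamma makes every additional leaf more expensive, while raising lam penalizes large leaf weights, so both push the ensemble towards simpler trees.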
4.2 Model creation 
In this section, we discuss the details on model creation. After preprocessing the dataset, we split 
the data into training and test set. We use the training data to determine the parameters for the 
models discussed in the previous section. Then using the test set we evaluate our models. 
4.2.1 Training 
The goal of a machine learning model is to learn from the training examples in such a way that it is capable of
generalizing what it has learned to new instances. To evaluate the performance of the model, the model
is usually trained on a subset of the whole dataset and then tested on
the remaining data, which measures the model’s ability to generalize. In this work, we use 80% of the
data for training the models and 20% of the data for testing them. The datapoints in the
training and test sets were chosen randomly using a random number generator. To make sure we
train and evaluate all the models on the same data, this training and test split is kept constant across
all the models by using a fixed seed for the random number generator. During training, the model
parameters are determined. For example, if we use an MLP, the training phase determines the
values of the weight vectors of the MLP. However, we can change the number of internal layers,
nodes in the internal layers, kernel functions for the neurons, etc. to generate different architectures.





model, we also need to experiment with these hyperparameters. In the next subsection we discuss 
the process for determining these hyperparameters.  
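The seeded 80/20 split described in this subsection can be sketched with the standard library. The function name and the seed value are assumptions; the point of the sketch is only that a fixed seed reproduces the same split for every model.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle the row indices with a fixed seed and cut off the test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]
```

Calling the function twice with the same seed returns identical index lists, so every regression model is trained and evaluated on exactly the same rows.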
4.2.2 Hyperparameter Tuning using Grid search 
The process of searching for the ideal model architecture is referred to as hyperparameter tuning.
To design an optimal machine learning model architecture, we should be able to explore a range of
possibilities. For tuning the hyperparameters and thus generating an optimal model, we have
adopted a grid search methodology. This is a sequential hyperparameter tuning method to
determine the best set of hyperparameters. To ensure that our model is not overfitting, we
have also used 3-fold cross validation during grid search. In this technique, we predefine a
range of values for the hyperparameters. The grid search then generates a grid of all combinations of these
values. For each point in the hyperparameter search space,
we train an architecture and evaluate it. However, for training, we divide the training
Table 8: Hyperparameters for XGBoost
XGBoost Hyperparameter | Description
learning_rate | In each boosting step, this value shrinks the weight of new features, preventing overfitting. This value must be between 0 and 1.
max_depth | The maximum depth of a tree. The greater the depth, the greater the complexity of the model and the easier it is to overfit. This value must be an integer greater than 0.
n_estimators | The number of trees in our ensemble.
gamma | A regularization term; it is the minimum loss reduction necessary for a split to occur in a leaf. It can be any value greater than or equal to zero and has a default value of 0.
lambda | L2 regularization on the weights. This encourages smaller weights. Default is 1, but it can be any value.
alpha | L1 regularization on the weights. Like lambda, this also encourages smaller weights. Default is 0.






set from the previous subsection in 3 portions where the sample in each portion is chosen 
randomly. From these portions, we create all possible subsets of size 2. These subsets are then 
used as the training set for the specific architecture. The third portion is used for evaluating the 
architecture. Once we have the performance metric for all these subsets, we take the average value 
of the performance metric and that is considered to be the performance for that architecture. After 
evaluating models for all the points in the hyperparameter search space, the model architecture 
(i.e., values of the hyperparameter) providing the best performance is chosen. This whole process 
of hyperparameter is implemented using the GridSearchCV API of scikit learn. However, rather 
than creating a hyperparameter search space for all the hyperparameter at once, we can also divide 
the hyperparameters into disjoint subsets and determine the optimal values of the hyperparameters 
in the first subset using the GridSearchCV. Then, using these optimal values of the first subset, we 
can tune the second subset in the same process. For example, for the XGBoost regressor, first, a 
higher learning rate is used to make training faster. Then we determine the tree-specific 
hyperparameters. Finally, model is trained to get the optimal number of estimators. A brief 
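The cross-validated grid search described above can be sketched in a few lines of plain Python. This is only an illustration of the loop that GridSearchCV automates, not the code used in this work: the toy k-nearest-neighbor regressor, the synthetic data, and the grid values are all assumptions chosen to keep the example self-contained.

```python
import itertools
import random
import statistics


def knn_predict(train, x, k, weighted):
    """Toy 1-D regressor (hypothetical stand-in): combine the k nearest targets."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    if not weighted:
        return statistics.mean(y for _, y in nearest)
    # Inverse-distance weighting; the small constant avoids division by zero.
    weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def rmse(train, test, **params):
    """Root mean squared error of the toy model on the held-out portion."""
    squared = [(knn_predict(train, x, **params) - y) ** 2 for x, y in test]
    return statistics.mean(squared) ** 0.5


def grid_search_cv(data, grid, n_folds=3, seed=0):
    """Score every grid point with n-fold cross-validation (lower RMSE is better)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # the portions are chosen randomly
    folds = [rows[i::n_folds] for i in range(n_folds)]
    scores = {}
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid, values))
        fold_scores = []
        for held_out in range(n_folds):
            # Train on the other two portions, evaluate on the held-out one.
            train = [p for i, fold in enumerate(folds) if i != held_out for p in fold]
            fold_scores.append(rmse(train, folds[held_out], **params))
        scores[values] = statistics.mean(fold_scores)  # average over the folds
    best = min(scores, key=scores.get)
    return dict(zip(grid, best)), scores


# Synthetic data (assumption): a noisy quadratic, not the NoC dataset.
rng = random.Random(42)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(60)]
grid = {"k": [1, 3, 5, 9], "weighted": [False, True]}

best_params, scores = grid_search_cv(data, grid)
print("best hyperparameters:", best_params)
```

The staged tuning mentioned above would call such a search several times, each time with a grid over one subset of hyperparameters while the previously tuned values are held fixed.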

















This chapter provides a discussion on the experimental results for the machine learning models 
mentioned in chapter 4. First, we present the metrics that we have used in this work to evaluate the 
performance of the machine learning models. Then based on these metrics we evaluate the 
performance of each individual model and then present a comparative study to determine the model 
showing the best performance for predicting the average latency (i.e., target variable). After we 
determine the best model, we compare the runtime of the model to predict the average latency with 
the time it takes for the NoC simulator to determine the latency of an architecture. 
5.1 Metrics for Evaluation 
To evaluate the performance of the machine learning models, we need to measure how close the
predictions are to the actual values. Several metrics can be used to assess regression models,
such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and
R-squared (R²). These metrics are based on an analysis of the residuals and assess how well the
model fits the data. The following subsections present the definitions of these metrics.
5.1.1 Mean Absolute Error (MAE) 
MAE is one of the simplest ways to determine the performance of the regression models. MAE is 
defined as the absolute difference between the actual observations and predicted values averaged 
over the data. The following equation can be used to calculate the MAE 
$$MAE = \frac{1}{m} \sum_{i=1}^{m} \left| \hat{y}_i - y_i \right| \qquad (8)$$
Here $\hat{y}_i$ is the predicted value, $y_i$ is the actual value of the target variable, and $m$ is the number of
points in the dataset used to evaluate the model.
5.1.2 Mean Squared Error (MSE) 
Unlike MAE which only looks at the absolute difference, MSE is calculated as the average of 
squared differences between prediction and actual observation. The formula for calculating the 
MSE is given in equation (9) 
$$MSE = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 \qquad (9)$$
5.1.3 Root Mean Squared Error (RMSE)
RMSE is the square root of the average squared error in the predicted values $\hat{y}_i$ and is calculated
using the following equation,
$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2} \qquad (10)$$
5.1.4 Coefficient of Determination (R²)
The coefficient of determination, R², measures the proportion of the variance in the target
variable that is explained by the model. It indicates how closely the predictions follow the
actual observations: the higher the coefficient, the smaller the residuals are relative to the
overall spread of the data. For a model that predicts at least as well as the mean of the target
variable, R² lies between 0 and 1, and it is often reported as a percentage. A higher percentage
denotes a better model. R² can be calculated using the following equation,
$$R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2} \qquad (11)$$
Here $\hat{y}_i$ is the predicted value, and $y_i$ and $\bar{y}$ are the actual and average values of the target
variable, respectively.
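The four metrics defined in this section can be written directly from their formulas. The sketch below uses only the Python standard library; the function names and the four sample latencies are made up for illustration and are not data from this work.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |prediction - actual|."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (prediction - actual)^2."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up latencies in cycles (illustrative only).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

print(f"MAE={mae(actual, predicted):.3f}  MSE={mse(actual, predicted):.3f}  "
      f"RMSE={rmse(actual, predicted):.3f}  R2={r2(actual, predicted):.3f}")
```

For these sample values the residuals are 1, 0, 1, and 2 cycles, so MAE = 1.0, MSE = 1.5, RMSE is about 1.22, and R² = 1 - 6/20 = 0.7.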





5.2 Performance of the Machine Learning Models 
In this section, we discuss the performance of the regression models discussed in section 4.1. First, 
we discuss the performance of the models individually and then in the next section we present a 
comparative study among the models based on their performance.  
Model         MSE         RMSE      R²
Linear Reg.   403.94213   20.0983   41%











5.2.1 Linear Regression  













The performance of the linear regression model is presented in Figure 13. The linear regression
model has an RMSE value of 20.10 cycles and an R² value of 0.41. The low R² value denotes that
the model captures only 41 percent of the variation in the target variable. In the best cases
(Figure 13 b)), where the predicted latency is close to the actual latency, the residual is almost
zero. However, in cases where the predictions are extremely bad, the residual is as high as 90
cycles (Figure 13 c)).
5.2.2 Bayesian Regression 
The performance of the Bayesian regression model is presented in Figure 14. The Bayesian regression
model has an RMSE value of 20.08 cycles and an R² value of 0.42. These values show that the
Bayesian regressor performs similarly to the linear regression model. In the best cases (Figure
14 b)), the residual is almost zero. In cases where the predictions are extremely bad
(Figure 14 c)), the residual is comparable to that of the linear regression model.
5.2.3 Support Vector Regression 
Figure 15 depicts the performance of the support vector regressor. The support vector regressor
has an RMSE value of 7.76 cycles and an R² value of 0.90359. Hence, compared to linear
regression and Bayesian regression, the support vector regressor is more accurate in predicting the
average latency. In the best cases (Figure 15 b)), where the predicted latency is close to the
actual latency, the residual is almost zero. On the other hand, for cases where the regressor
performs poorly (Figure 15 c)), the residuals are higher than those of the linear regression and
Bayesian regression models. However, the overall predictions are much better than those of the
linear and Bayesian regression models, as shown by the lower RMSE value.
5.2.4 Multilayer Perceptron Regression 
The performance of the MLP regressor is presented in Figure 16. The MLP regressor has an RMSE
value of 5.9904 cycles and an R² value of 0.946. For cases where the regressor accurately predicts
the average latency (Figure 16 b)), the residual is almost zero. However, for cases where the





predicting the average latency poorly for a few samples, for most cases, the MLP regressor is able 
to accurately predict the average latency. 
5.2.5 Random Forest Regression 
The performance of the Random Forest regressor is presented in Figure 17. The Random Forest
regressor has an RMSE value of 5.47 cycles and an R² value of 0.958. For cases where the regressor
accurately predicts the average latency (Figure 17 b)), the residual is almost zero. However, for
cases where the regressor performs poorly (Figure 17 c)), the residual values are similar to those
of the MLP regressor.











5.2.6 XGBoost Regression 
Figure 18 shows the performance of the XGBoost regressor. For the XGBoost regressor, the maximum
depth was set to 30, with feature and observation subsampling ratios of 0.1, meaning that
10% of the features and 10% of the observations are included when growing each tree. The
XGBoost model was regularized with regularization parameter γ = 0.3 and used a learning rate η of
0.1. The RMSE and R² for the XGBoost model are 5.07206 cycles and 0.9618, respectively. Also, the
mean absolute error for this model is 1.50348 cycles. For cases where the regressor accurately
predicts the average latency (Figure 18 b)), the residual is almost zero. However, for the worst
performing cases (Figure 18 c)), the residual is similar to that of the linear and Bayesian regressors.
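As a rough sketch, the settings reported above could be collected in a parameter dictionary such as the one below. The mapping of the reported values onto the parameter names of XGBoost's scikit-learn style interface (in particular `subsample` for observations and `colsample_bytree` for features) is an assumption made for illustration, not a configuration taken from this work.

```python
# Hyperparameters reported in the text, keyed by xgboost parameter names.
# The choice of names is an assumption; only the values come from the text.
xgb_params = {
    "max_depth": 30,          # maximum tree depth
    "subsample": 0.1,         # fraction of observations sampled for each tree
    "colsample_bytree": 0.1,  # fraction of features sampled for each tree
    "gamma": 0.3,             # minimum loss reduction required for a split
    "learning_rate": 0.1,     # shrinkage factor (eta)
}

# With the xgboost package installed, the dictionary could be used as, e.g.:
#   from xgboost import XGBRegressor
#   model = XGBRegressor(**xgb_params)
for name, value in xgb_params.items():
    print(f"{name} = {value}")
```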
5.3 Comparative Performance Analysis 
In this section we compare the regression models based on RMSE and R². Figure 19 shows the
RMSE and R² for all the ML models considered in this work. From Figure 19 a) we can see that,
among all the models, the ensemble-based methods such as Random Forest and XGBoost perform
the best and have the highest R² values. On the other hand, simple models like linear regression
and Bayesian regression have much lower R² values than the other models. Similarly, from
Figure 19 b), we observe that the RMSE values for the ensemble methods are lower than those of all
other methods. Also, XGBoost has the lowest RMSE value. We attribute this to the regularized
objective optimization and gradient boosting mechanism used in the XGBoost algorithm. Hence,
from this comparison we select XGBoost as the best model for predicting the average latency.










ML Model        RMSE       R² (rounded)
Linear Reg.     20.0983    41%
Bayesian Reg.   20.07782   42%
SVR             7.764756   90%
MLP Reg.        5.9904     95%
RF Reg.         5.47026    96%
XGBoost         5.07206    96%
c)





5.4 Speedup in Average Latency Computation 
In this section, we compare the prediction time of the XGBoost model with the simulation time
for computing the average latency for different network sizes and traffic patterns. For this
comparison we use three traffic patterns and four different system sizes for evaluating the
performance of a NoC architecture. Table 9 shows the time required by the model for making a
prediction and the time required by the simulation. For both the prediction and the simulation, we
have used the same machine configuration. From this table we see that, on average, the time
required by the simulation-based approach is 35.11 seconds, whereas the average time taken by the
XGBoost model to predict the average latency is only 0.00412 seconds. For the NoC configurations
in this table, the lowest speedup is about 234 (uniform traffic, system size 2) and the highest
speedup is 38066.7 (neighbor traffic, system size 16). Furthermore, within each traffic pattern the
speedup is largest for the largest network size, and the ratio of the average simulation time to
the average prediction time gives an average speedup of 8513. This shows that such a machine
learning model will significantly reduce the time required























System size   Traffic pattern   Simulation time (s)   XGBoost prediction time (s)   Speedup
2             Neighbor          3.66                  0.00240016                    1524.898444
4             Neighbor          19.69                 0.007283926                   2703.212522
8             Neighbor          43.35                 0.00306201                    14157.36809
16            Neighbor          268.98                0.007066011                   38066.73718
2             Uniform           1.60                  0.006847143                   233.6740973
4             Uniform           6.95                  0.003551006                   1955.783623
8             Uniform           11.14                 0.001980066                   5626.07424
16            Uniform           24.70                 0.002777815                   8891.881281
2             Bitcomp           4.34                  0.003509045                   1236.80387
4             Bitcomp           3.13                  0.002321005                   1348.553828
8             Bitcomp           5.92                  0.00602293                    982.9102874
16            Bitcomp           27.85                 0.00266695                    10443.12185
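The speedup figures can be recomputed from the two time columns of Table 9, since each speedup is the simulation time divided by the prediction time. The short sketch below does this with the standard library only; the variable names are illustrative, and because the table prints rounded simulation times, recomputed per-row values can differ slightly from the printed speedup column.

```python
from statistics import mean

# (traffic pattern, system size, simulation time [s], XGBoost prediction time [s])
table9 = [
    ("Neighbor", 2, 3.66, 0.00240016),
    ("Neighbor", 4, 19.69, 0.007283926),
    ("Neighbor", 8, 43.35, 0.00306201),
    ("Neighbor", 16, 268.98, 0.007066011),
    ("Uniform", 2, 1.60, 0.006847143),
    ("Uniform", 4, 6.95, 0.003551006),
    ("Uniform", 8, 11.14, 0.001980066),
    ("Uniform", 16, 24.70, 0.002777815),
    ("Bitcomp", 2, 4.34, 0.003509045),
    ("Bitcomp", 4, 3.13, 0.002321005),
    ("Bitcomp", 8, 5.92, 0.00602293),
    ("Bitcomp", 16, 27.85, 0.00266695),
]

# Per-configuration speedup: simulation time / prediction time.
speedups = {(traffic, size): sim / pred for traffic, size, sim, pred in table9}
lowest = min(speedups, key=speedups.get)
highest = max(speedups, key=speedups.get)

# Two different ways to summarize the speedup over all configurations.
mean_sim = mean(sim for _, _, sim, _ in table9)
mean_pred = mean(pred for _, _, _, pred in table9)
ratio_of_means = mean_sim / mean_pred
mean_of_ratios = mean(speedups.values())

print(f"lowest:  {lowest} -> {speedups[lowest]:.1f}x")
print(f"highest: {highest} -> {speedups[highest]:.1f}x")
print(f"mean simulation time: {mean_sim:.2f} s, mean prediction time: {mean_pred:.5f} s")
print(f"ratio of mean times: {ratio_of_means:.0f}x, mean of per-row speedups: {mean_of_ratios:.0f}x")
```

The ratio of the mean times reproduces the average speedup of about 8513 quoted in the text, whereas averaging the per-row speedups gives a lower value (roughly 7260), so it is worth stating which summary is meant when reporting an average speedup.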











Machine learning, which allows a model to learn from data without being explicitly programmed, has
been increasingly used in many applications and businesses as artificial intelligence has evolved
over the years. In this work, representative supervised learning models are presented, including
Linear Regression, Bayesian Regression, Random Forest Regression (RF), Support Vector Regression
(SVR), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost). We
framed the research as determining the applicability of machine learning-based techniques
to predicting performance parameters of a NoC architecture, and a comparison of the representative
models is presented. Following that, data generation, data pre-processing, the choice of an
effective statistical analysis to identify correlation in the data, data normalization strategies,
hyperparameter tuning, and k-fold cross-validation are discussed, both in theory and in terms of
how they can be used to improve the trained model's performance. The evaluation metrics are
computed to show the poor predictive ability of the linear regression models. Given the
strong non-deterministic property of XGBoost, Artificial Neural Network, Support Vector
Regression, and Random Forest, these models are trained on normalized data and tuned to obtain
effective hyperparameters and the optimal models with the lowest RMSE and highest R². In the final
step, the performance of the built models compared to the BookSim simulator is analyzed
quantitatively and visually in depth. The quantitative and visual findings illustrate the
effectiveness of XGBoost for the given dataset. The accuracy of the artificial neural network MLP
regression (RMSE = 5.9904, R² = 95%), random forest (RMSE = 5.47026, R² = 96%), and support vector





process is more complex and more time consuming. Moreover, we find that the right choice
among different models is heavily influenced by the dataset as well as by the cross-validation,
the quantitative metrics of the models, and the computation time. From chapter 5, we conclude that
our proposed Xtreme-NoC outperforms other machine learning regression models, such as the linear
regressor, Support Vector Regressor, and deep neural network, for predicting the latency of NoC
architectures. The Xtreme-NoC model can predict the latency of a NoC architecture with a root
mean square error of 5.07 cycles and an R-squared value of 96.18%. The model improves the runtime
by 8513.29 times compared to simulation-based latency models. This shows that such a machine
learning model will significantly reduce the time required for the NoC DSE process. However,
this work has the following limitations:
1. In this work, a limited amount of data for the NoC design space has been used. This
can be extended, and a NoC design space with more architectural and technological
parameters can be explored. Furthermore, generative models such as the Generative Adversarial
Network (GAN) can be adopted to generate many more data points for the dataset, which can be an
exciting future direction.
2. This research does not describe the characteristics of the dataset that make XGBoost
perform well. Hence, the dataset can be analyzed and explored further to determine the
characteristics that make boosting algorithms like XGBoost work well on
the data.
3. Finally, this work does not explore the opportunities with deep learning model (e.g., Deep 









1. al., K. K. (2007). Carbon Nanotubes as Optical Antennae. Advanced Materials, 19, 421-
426. 
2. Bienia, C. (2011, January). Benchmarking modern multiprocessors. Ph.D. Dissertation, 
Princeton Univ.,. 
3. Binkert, N., Sardashti, S., Sen, R., Sewell, K., Shoaib, M., Vaish, N., . . . Krishna, T. (2011, 
August). The GEM5 Simulator. ACM SIGARCH Computer Architecture News, 39(2), 1-7. 
4. Borkar, S. (2000). Obeying Moore's law beyond 0.18 micron [microprocessor design]. 
Proceedings of 13th Annual IEEE International ASIC/SOC Conference, (pp. 26-31). 
5. Brière, M., Girodias, B., Bouchebaba, Y., Nicolescu, G., Mieyeville, F., Gaffiot, F., & 
O'Conner, I. (2007). System Level Assessment of an Optical NoC in MPSoC Platform. 
Proceedings of DATE.  
6. Burke, P., Burke, P., Li, S., & Yu, Z. (2006, July). Quantitative Theory of Nanowire and 
Nanotube Antenna Performance. IEEE Transactions on Nanotechnology, 5(4), 314-334. 
7. Chang, M., Cong, J., Kaplan, A., Naik, M., Reinman, G., Socher, E., . . . Socher, E. (2008). 
CMP Network-on-Chip Overlaid With Multi-Band RF-Interconnect. Proceedings of IEEE 
International Symposium on High-Performance Computer Architecture (HPCA).  
8. Chaparro, P., Chaparro, P., Gonzalez, J., Gonzalez, J., Magklis, G., Magklis, G., . . . 
Gonzalez, A. (2007). Understanding the Thermal Implications of Multi-Core 
Architectures. IEEE Transactions on Parallel and Distributed Systems, 18(8), 1055 - 1065. 
9. Cuesta, D., Ayala, J. H., Atienza, D., Acquaviva, A., & Macii, E. (2010). Adaptive task 





10. Cui, J., & Maskell, D. (2012, June). A Fast High-Level Event-Driven Thermal Estimator 
for Dynamic Thermal Aware Scheduling. IEEE Transactions on Computer-Aided Design 
of Integrated Circuits and Systems, 31(6), 904-917. 
11. Deb, S., Ganguly, A., Chang, K., Pande, P., Beizer, B., & Heo, D. (2010). Enhancing 
Performance of Network-on-Chip Architectures with Millimeter-Wave Wireless 
Interconnects. Proceedings of ASAP, (pp. 73-80). 
12. Deb, S., Ganguly, A., Pande, P., Belzer, B., & Heo, D. (2012). Wireless NoC as 
Interconnection Backbone for Multicore Chips: Promises and Challenges. IEEE Journal 
on Emerging Selective Topic Circuits Systems, 2(2), 228-239. 
13. DiTomaso, D., Kodi, A., Kaya, S., & Matolak, D. (2011). iWise: Inter-router wireless 
scalable express channels for Networks-on-Chips (NoCs) architecture. Proceedings of 
IEEE HOTI, (pp. 11-18). 
14. Duato, J., Yalamanchili, S., & Ni, L. (2002). Interconnection Networks-An Engineering 
Approach. Morgan Kaufmann. 
15. Floyd, B., Hung, C., & O, K. (2002, May). Intra-chip wireless interconnect for clock 
distribution implemented with integrated antennas, receivers, and transmitters. IEEE 
Journal of Solid-State Circuits, 37(5), 543-552. 
16. Flynn, D. (1997). AMBA: enabling reusable on-chip designs. IEEE Micro, 17(4), 20-27. 
17. Ganguly, A., Chang, K., Deb, S., Pande, P., Belzer, B., & Teuscher, C. (2010). Scalable 
Hybrid Wireless Network-on-Chip Architectures for Multi-Core Systems. IEEE 
Transaction on Computers. 
18. Ganguly, A., Wettin, P., Chang, K., & Pande, P. (2011). Complex Network Inspired Fault-





19. Ge, T., Malani, P., & Qui, Q. (2010). Distributed Task Migration for thermal management 
in many-core systems. Proceedings of DAC, (pp. 579-584). 
20. Hanumaiah, V., Vrudhula, S., & Chatha, K. (2009). Maximizing performance of thermally 
constrained multi-core processors by dynamic voltage and frequency control. Proceedings 
of the ICCAD, (pp. 310 - 313). 
21. Ho, R., Mai, K., & Horowitz, M. (2001). The Future of Wires. Proceedings of the IEEE, 
89(4), 490 - 504. 
22. Hofmann, R., & Drerup, B. (2002). Next generation CoreConnect processor local bus 
architecture. Proceedings of 15th Annual IEEE International ASIC/SOC Conference, (pp. 
221-225). 
23. Kapur, P., Chandra, G., McVittie, J., & Saraswat, K. (2002, April). Technology and 
reliability constrained future copper interconnects - Part II: Performance Implications. 
IEEE Transactions on Electronic Devices, 49(4), 598-604. 
24. Lee, S., Zhang, L., Cong, J., Tam, S., Pefkianakis, I., Lu, S., . . . Naik, M. (2009). A scalable 
micro wireless interconnect structure for CMPs. Proceedings of the 15th annual 
International Conference on Mobile Computing and Networking, (pp. 217-228). 
25. Li, S., Ahn, J., Strong, R., Brockman, J., Tullsen, D., & Jouppi, N. (2009). McPAT: An 
integrated power, area, and timing modeling framework for multicore and manycore 
architectures. Proceedings of the International Symposium on Computer Architecture, (pp. 
469-480). 
26. Lin, J., Wu, H., Su, Y., Gao, L., Sugavanam, A., Brewer, J., . . . O, K. (2007, August). 
Communication Using Antennas Fabricated in Silicon Integrated Circuits. IEEE Journal 





27. Lysne, O., Lab., S. R., Lysaker, N., Skeie, T., Reinemo, S.-A., & Theiss, I. (2006, January). 
Layered routing in irregular networks. IEEE Transactions on Parallel and Distributed 
Systems, 17(1), 51-65. 
28. Murray, J., Pande, P., & Shirazi, B. (2012). DVFS-enabled sustainable wireless NoC 
architecture. Proceedings of IEEE SOCC.  
29. National Technology Roadmap for Semiconductors. (1997). Semiconductor Industry 
Association. 
30. Ogras, U., & Marculescu, R. (2003, July). "It's a small world after all": NoC performance 
optimization via long-range link insertion. IEEE Transactions on Very Large Scale 
Integration (VLSI) Systems, 14(7), 693-706. 
31. Pande, P., Grecu, C., Jones, M., Ivanov, A., & Saleh, R. (2005). Performance evaluation 
and design trade-offs for network-on-chip interconnect architectures. IEEE Transactions 
on Computers, 54(8), 1025-1040. 
32. Pavlidis, V., Pavlidis, V., Friedman, E., & Friedman, E. (2007). 3-D Topologies for 
Networks-on-Chip. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 
15(10), 1081-1090. 
33. Petermann, T., & Rios, P. (n.d.). Spatial small-world networks: A wiring-cost perspective. 
arXiv preprint cond-mat/0501420 (2005).  
34. Shacham, A., Shacham, A., Bergman, K., & Carloni, L. (2008). Photonic Networks-on-
Chip for Future Generations of Chip Multiprocessors. IEEE Transactions on Computers, 
57(9), 1246 - 1260. 






36. Sheikh, H., Tam, H., Amhad, I., Ranka, S., & Phanisekhar, B. (2012). Energy-and 
performance-aware scheduling of tasks on parallel and distributed systems. ACM Journal 
on Emerging Technologies and Computing Systems (JETC), 8(4). 
37. Skadron, K., Stan, M. R., Huang, W., Velusamy, S., & Sankaranarayanan, K. (2003). 
Temperature-aware microarchitecture. Proceedings of ISCA, (pp. 2-16). 
38. Sylvester, D., & Keutzer, K. (2001, April). Impact of Small Process Geometries on 
Microarchitectures in Systems on a Chip. Proceedings of the IEEE, 89(4), 467-489. 
39. Watts, D., & Strogatz, S. (1998). Collective Dynamics of 'Small World' Networks. Nature, 
393, 440-442. 
40. Woo, S., Ohara, M., Torrie, E., Singh, J., & Gupta, A. (1995). The SPLASH-2 programs: 
characterization and methodological considerations. Proceedings of ISCA, (pp. 24-36). 
41. Yeo, I., Liu, C., & Kim, E. (2008). Predictive dynamic thermal management for multicore 
systems. Proceedings of DAC, (pp. 734-739). 
42. Zhao, D., & Wang, Y. (2008). SD-MAC: Design and Synthesis of a Hardware-Efficient 
Collision-Free QoS-Aware MAC Protocol for Wireless Network-on-Chip. IEEE 
Transactions on Computers, 57(9), 1230 - 1245. 




45. David, W., Patrick, G., Henry, H., (2009). On-Chip Interconnection Architecture of the 





46. W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection networks,” 
in Design Automation Conference, 2001. Proceedings. IEEE, 2001, pp. 684–689. 
47. J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection networks: an engineering 
approach. Morgan Kaufmann, 2003. 
48. U. Y. Ogras and R. Marculescu, "It's a small world after all": NoC performance 
optimization via long-range link insertion," in IEEE Transactions on Very Large Scale 
Integration (VLSI) Systems, vol. 14, no. 7, pp. 693-706, July 2006, doi: 
10.1109/TVLSI.2006.878263. 
49. J. Ko, J. Kim, Z. Xu, Q. Gu, C. Chien, and M. F. Chang, "An RF/Baseband FDMA-
Interconnect Transceiver for Reconfigurable Multiple Access Chip-to-Chip 
Communication," in 2005 IEEE International Solid-State Circuits Conference (ISSCC) 
Digest of Technical Papers, February 2005 
50. A. Ganguly, K. Chang, S. Deb, P. P. Pande, B. Belzer, and C. Teuscher, “Scalable hybrid 
wireless network-on-chip architectures for multicore systems,” IEEE Transactions on 
Computers, vol. 60, no. 10, pp. 1485–1502, 2011 
51. S. Abadal, E. Alarcón, A. Cabellos-Aparicio, M. C. Lemme and M. Nemirovsky, 
"Graphene-enabled wireless communication for massive multicore architectures," in IEEE 
Communications Magazine, vol. 51, no. 11, pp. 137-143, November 2013, doi: 
10.1109/MCOM.2013.6658665. 
52. A. Shacham, K. Bergman, and L. P. Carloni, “Photonic networks-on-chip for future 






53. William James Dally and Brian Patrick Towles. 2004. Principles and Practices of 
Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 
54. Pande, Partha & Grecu, Cristian & Ivanov, Andre & Saleh, Resve & Micheli, Giovanni. 
(2005). Design, Synthesis, and Test of Network on Chips. IEEE Design & Test of 
Computers. 22. 404-413. 10.1109/MDT.2005.108. 
55. William James Dally and Brian Patrick Towles. 2004. Principles and Practices of 
Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 
56. Nan Jiang, Daniel U. Becker, George Michelogiannakis, James Balfour, Brian Towles, 
John Kim and William J. Dally. A Detailed and Flexible Cycle-Accurate Network-on-Chip 
Simulator. In Proceedings of the 2013 IEEE International Symposium on Performance 
Analysis of Systems and Software, 2013. 
57. V. Catania, A. Mineo, S. Monteleone, M. Palesi and D. Patti, "Improving the energy 
efficiency of wireless Network on Chip architectures through online selective buffers and 
receivers shutdown," 2016 13th IEEE Annual Consumer Communications & Networking 
Conference (CCNC), Las Vegas, NV, 2016, pp. 668-673, doi: 
10.1109/CCNC.2016.7444860.  
58. Kumar, Anil & Talawar, Basavaraj. (2018). Machine Learning Based Framework to 
Predict Performance Evaluation of On-Chip Networks. 10.1109/IC3.2018.8530505. 
59. S. Das et al., “Optimizing 3d noc design for energy efficiency: A machine learning 
approach,” in Computer-Aided Design (ICCAD), 2015 IEEE/ACM International 





60. Z. Qian et al., “Svr-noc: A performance analysis tool for network- on-chips using 
learning-based support vector regression model,” in Proceedings of the Conference on 
Design, Automation and Test in Europe. EDA Consortium, 2013, pp. 354–357.  
61. [12] Z.-L. Qian et al., “A support vector regression (svr)-based latency model for 
network-on-chip (noc) architectures,” IEEE Transactions on Computer-Aided Design of 
Integrated Circuits and Systems, vol. 35, no. 3, pp. 471–484, 2016.  
62. W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks. San 
Francisco, CA: Morgan Kaufmann, 2004 
63. BookSim 1.0. https://github.com/booksim/booksim 
64. A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, “Analyzing CUDA 
workloads using a detailed GPU simulator,” in Proceedings of the IEEE Symposium on 
Performance Analysis of Systems and Software, 2009. 
65. Yu, L. and H. Liu. Feature selection for high-dimensional data: A fast correlation-based 
filter solution. in ICML. 2003. 
66. Jeff Hale Scale, “Standardize, or Normalize with Scikit-Learn “in towards data science, 
2019. 
67. Thiago Christiano Silva. Machine Learning in Complex Networks. 1st edition. 2016. 
ISBN: 3-319-17290-5. 
68. Bruce, Peter C., et al. Practical Statistics for Data Scientists: 50 Essential Concepts. 
O'Reilly, 2020.  
69. “A Top Machine Learning Algorithm Explained: Support Vector Machines (SVMs).” 






70. Alex J. S., and Bernhard S., “A tutorial on support vector regression.” 2003. 
71. Elfatih M. Abdel-Rahman, Fethi B. Ahmed & Riyad Ismail (2013) Random forest 
regression and spectral band selection for estimating sugarcane leaf nitrogen concentration 
using EO-1 Hyperion hyperspectral data, International Journal of Remote 
Sensing, 34:2, 712-728, DOI: 10.1080/01431161.2012.713142 
72. Marlies Rybnicek, Christoph Lang-Muhr, and Daniel Haslinger. “A roadmap to continuous 
biometric authentication on mobile devices”. In: IEEE, Aug. 2014, pp. 122–127. ISBN: 
978-1-4799-7324-8. 
73. Ivan Nunes da Silva. Artificial Neural Networks A Practical Course. 2017. ISBN: 3-319-
43162-5.  
74. Aurélien Géron., Hands-on machine learning with Scikit-Learn and TensorFlow concepts, 
tools, and techniques to build intelligent systems. O'Reilly, 2019. 
75. Jerome H. Friedman "Greedy function approximation: A gradient boosting machine.," The 
Annals of Statistics, Ann. Statist. 29(5), 1189-1232, (October 2001). 
 
 
 
 
 
