Speeding up the register-transfer level (RTL) simulation of a network-on-chip (NoC) is essential for design optimization under various use scenarios and parameters. One promising approach to RTL NoC speedup is high-level modeling. Conventional high-level modeling approaches, however, either sacrifice accuracy or demand considerable modeling effort, because they lack a modeling framework or require in-depth knowledge of the specific behaviors of the target NoC. To provide a cycle-accurate and formal high-level modeling framework, we propose a cellular automata (CA) modeling framework for RTL NoC. The CA abstract detailed RTL NoC dynamics into the proposed high-level state transitions, which support flit transmission among CA components through dynamically changing flit paths based on the target RTL routing and arbitration algorithms. To prevent the meaningless execution of stable CA, the CA are designed to be triggered by state-change events. The proposed simulation engine asynchronously invokes CA to update their states and to perform flit transmissions or flit-path changes based on the state-decision result. To reduce the modeling difficulty, we provide a test environment that generates the state-transition rules for CA by monitoring the relationships between high-level states and the actions they lead to under randomly injected packets during target RTL NoC simulations. Experiments demonstrate cycle-level functional equivalence between the RTL and the abstracted CA NoC models and a significant simulation speedup.
I. INTRODUCTION
Advances in nanoscale semiconductor technology enable the integration of a large number of intellectual property (IP) blocks on a single chip to meet application-specific high-performance requirements for system-on-chip (SoC). When many IP-to-IP communications are required to fulfill a given SoC application, designers can employ a network-on-chip (NoC) communication architecture instead of traditional on-chip bus architectures to achieve performance scalability.
In the NoC-based SoC, packets traverse one or multiple routers connecting IPs. Depending on the IP-to-router assignment and router parameter settings (i.e., the number of virtual channels and buffer capacity) subject to limited hardware (HW) resources, the communication performance can vary and affect the overall system performance. To find the optimal design parameters of target IPs, a large number of iterative NoC simulations is required to explore candidate design spaces.

The associate editor coordinating the review of this manuscript and approving it for publication was Giambattista Gruosso.
Designers commonly develop NoCs using HW description languages (HDLs) at the register-transfer level (RTL). This abstraction is useful for the automatic synthesis from the RTL model to the circuit- or layout-level HW implementation. However, the detailed simulation of RTL NoCs is slow, which can prevent the exploration of a sufficient number of design candidates. To improve simulation speed, high-level modeling approaches are typically used. The typical methods can be divided into two categories: 1) queueing-based statistical modeling and 2) architectural- and algorithm-level (AAL) modeling.
The queueing modeling approaches have focused on developing analytic equations that can evaluate the NoC performance in terms of average service throughput, flit-waiting time, and queue utilization [1] - [5] . The queueing models have the advantage of negligible computation overhead due to their statistical characteristics as compared with RTL simulations, which require iterative processing of events for signal-value transitions at every execution time step.
However, the equations and their parameter values (related to the service time, waiting time, and so on) are finely designed or tuned for a specific NoC design under a few packet-generation scenarios. When the NoC implementation or the IP combination (which can produce different packet-generation patterns) changes, the corresponding queueing model must be customized through reformulation or parameter resetting based on a large amount of newly collected simulation data. This makes model maintenance difficult. Moreover, the estimates of queueing models contain statistical errors, so the results must be re-validated against additional, redundant RTL NoC simulations.
The AAL NoC models are usually developed during early design stages using SystemC, Matlab, and C/C++ [6] - [11] . These models specify the structures and behaviors of NoC and help the validation of routing or arbitration algorithms. The AAL models can be realized at different levels of system abstraction depending on the designer's objectives, including accuracy and simulation speed through ignoring individual subordinated HW blocks and representing the operation at the algorithm level. Since the AAL models describe the target NoC using subcomponent models, when the target NoC is modified, the modified parts of RTL NoC can be reflected by revising the corresponding AAL block. The modularity guarantees easier model maintenance than queueing models.
Although AAL modeling has the advantage of allowing flexibility, it lacks a formal modeling framework, which in turn causes modeling efforts such as defining an abstraction level, describing the components' behaviors, and handling interactions between subcomponents. Moreover, AAL modeling typically requires in-depth knowledge of the detailed operating mechanisms of the target RTL.
As an AAL modeling approach, we propose a high-level and cycle-accurate NoC modeling framework and its simulation engine. The proposed modeling framework extends our previous approach [12] to generalize it to various types of NoC. The modeling framework is formally designed based on the cellular automata (CA) concept [13]-[15], which describes the target NoC using a finite number of CA components with a regular connectivity pattern; each component updates its high-level states using its own and its neighbors' states at each simulation time step. In the remainder of this paper, the term cell refers to an individual CA component.
The proposed approach also provides a test environment (which consists of RTL simulator and proposed test library) for the generation of cells' state-transition rules. The rule generation alleviates the modeling efforts and provides a guide to a modeler having insufficient knowledge about the target NoC.
The cells' high-level states are defined relative to flit exchanges and flit-path changes between cells. There are three types of cells: a buffer cell (BC) that holds flits, a coupling cell (CC) that can change the flit path between BCs, and a path-viability cell (PV) that symbolizes a path-connection (PC) capability to a receiver BC, as shown in Fig. 1. Among the high-level states, the action state of a BC or CC determines the subsequent action executed by the simulation engine. The action is either a flit movement between BCs or a flit-path change of a CC. For the development of each cell's state-transition function, the designer references the generated rules and the target RTL NoC's routing and arbitration algorithms. The routing algorithm is used to find the related neighbors. The arbitration algorithm is used for a CC's path-making decision among multiple path candidates during its state transition.
To prevent the unnecessary execution of stable CA, we extend the conventional CA concept to support an event-driven invocation to run only active CA (that are currently involved in packet transmissions). For that, we introduce event-generation functions to inform influencee cells (that depend on the influencer cells' state changes). Only the influenced cells are executed at each simulation time step.
The proposed formal modeling framework enables the derivation of action-state decision rules by probing changes of those RTL signals that relate to high-level states. Using the user-specified CA network and probing state-related signals, the test environment generates the rules through tracing each flit's movement and each cell's states during the runtime simulation of the target RTL NoC.
The rest of the paper is organized as follows. Sections II and III describe the modeling methods for the proposed time-step CA and its event-driven extension, respectively. Section IV details the simulation algorithm for the event-driven cycle-accurate CA-modeling method, and Section V shows the rule-generation method for the proposed modeling and simulation framework. Section VI applies the proposed method to open-source RTL NoCs. Section VII concludes the paper.
II. CELLULAR AUTOMATON MODELING OF RTL NOC
The high-level CA model targets RTL NoC designs that have the following characteristics: synchronous routers, virtual channels (VCs), flit-based transmission, and lookahead routing. The synchronous routers have a chain of pipelined operation stages, which comprise the FIFO buffer writing (BW), routing computation (RC), VC allocation (VCA), switch allocation (SA), switch traversal (ST), and link traversal (LT) stages. To reduce the transmission delay, some parts of stages are arranged in parallel. In typical NoC architectures [16] - [19] , the RC step (that calculates the destined output port based on the destination information in a header flit) is arranged with other steps to provide the lookahead routing.
Though the architectures and pipelines of NoC routers can vary, flows of flits have a common characteristic. They are propagated at each computing stage when they are allowed to proceed; otherwise, they are stored. To mimic the flit flows over channels, the proposed modeling framework represents the architecture of RTL NoCs as proposed CA components, which have abstracted states for a flit forwarding in the BW, ST, or LT stages and a flow-path change in the VCA or SA stages.
The high-level cells are divided into three types: BC, CC, and PV , as shown in Fig. 2 (a). Each BC represents an RTL block to store flits using flip-flops (FFs), such as FIFO queues or other FFs that contain traversed flits in the ST or LT steps. Each CC is an arbiter that decides a flit-flow path in VCA or SA stages and connects a pair of two adjacent preceding and succeeding components. The preceding (source) component requests a PC when there is an incoming or residing flit to be flowed to a succeeding (destination) cell. Each PV implies the PC availability of a succeeding component (BC or CC) for CCs. The CC uses the state of the PV to check a PC viability when making a PC.
A. HIGH-LEVEL STATES OF CA
Based on the component concepts described above, we define each cell's high-level states as follows.

[…] payload, tail, single}, $r_d$ is the destination address of a flit, and $vc$ is the allocated VC index; $\emptyset_f$: a null flit;
• $p_{c/n}$ is the current/next flit path, $p_{c/n} \in P_{c/n}$, $P_{c/n} = \{\emptyset_p\} \cup \{(c_p, c_n, vc)\}$, where $c_p$ is the source cell of the flit path, $c_n$ is the destination cell of the flit path, and $vc$ is the allocated VC index; $\emptyset_p$: the empty path state;
• […] $a_{c(1)}$: action state making a PC at the next clock trigger, $a_{d(1)}$: action state making a path disconnection at the next clock trigger;
• $s_c$ is the number of remaining credits (or flit slots) at the next time, $s_c \in S_c$, $S_c = \{0, 1, \ldots, max_c\}$; $max_c$: the maximum number of credits (or flit slots);
• […] $1_v$: the PC viability, $0_v$: the PC inviability.

Due to the symmetric placement of the NoC RTL blocks in a cellular structure, each cell has a unique position id. The blocks are strictly arranged in increasing order of their horizontal indexes ($id.d$) from the input ports. The vertical indexes ($id.v$) are related to the VC, but can take other values when the $id.v$ is already occupied by another cell, as shown in Fig. 2(a). The PVs whose $id.d$ is zero have a negative $id.v$ ($-2$ or $-1$) because of the BCs, which stand for specific VC flows.
Among the defined states of BC, $b_o$ represents the presence of any valid output flit in its associated RTL block, and $b_f$ represents whether the flit-holding FFs of the RTL block are fully occupied (see Fig. 2(b)(c)). The condition $(b_o^{-1b} = 1)$ represents the validity of an input flit from an accessible previous BC ($BC_{-1b}$), and the $a_{f(1)}$ decision of a current BC ($BC_0$) results in the flit-fetching action, which is executed by the simulation engine when the simulation time is updated (see Fig. 3). During the flit-fetching action, a flit in the internal buffer of $BC_{-1b}$ is moved to the buffer of $BC_0$; a BC whose $a_f$ is decided as $a_{f(1)}$ is called an active BC. After the engine executes flit-fetching actions, the $(s_{f0}, s_{f1})$ of a BC can be identified from the flits staying in the BC's buffer. When the number of flits in a buffer changes, the engine reflects the change in $(b_o, b_f)$ using a user-developed buffer-abstraction function based on the maximum buffer size and the number of flits in the buffer, as below.
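The buffer-abstraction step can be illustrated with a minimal Python sketch. This is not the authors' function: the name `abstract_buffer` and the 0/1 integer encoding of $b_o$ and $b_f$ are assumptions made for illustration, and a real target may define fullness differently (for example, with an almost-full threshold).

```python
def abstract_buffer(num_flits: int, max_size: int) -> tuple:
    """Map a buffer occupancy to the abstract pair (b_o, b_f).

    b_o = 1 when at least one flit is available as an output flit.
    b_f = 1 when every flit slot of the buffer is occupied.
    Hypothetical helper; the 0/1 encoding is an assumption.
    """
    if not 0 <= num_flits <= max_size:
        raise ValueError("occupancy must lie in [0, max_size]")
    b_o = 1 if num_flits > 0 else 0
    b_f = 1 if num_flits == max_size else 0
    return b_o, b_f
```

In such a sketch the engine would call the function only when a flit-fetching action changed the number of flits in the buffer, so stable buffers cost nothing.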
Depending on the target architecture, the structure of a CC's neighbors can be organized in various ways, as shown in Fig. 4(a)(b). A PC from a $CC_0$ to a next neighbor CC ($CC_1$) is also possible depending on the target NoC design (see Fig. 4(a)). Each CC can reference multiple PVs to check the next-time PC availability of receiver-side BCs. If the $s_v$ of a PV (for a terminal BC of a current path) changes from $1_v$ to $0_v$, then the path-established CC can change the path to prevent further flit flow.
As noted above, any cell's accessibility to its neighbors can change during the simulation runtime depending on the CCs' current paths ($p_c$). If two cells are adjacent across a CC, then one cell can access the other when the pair of cells is the $(c_p, c_n)$ of $CC.p_c$. In the case of Fig. 4(a), $BC_{(0,1)}$ (that is, a BC whose $(id.d, id.v)$ is $(0, 1)$) and $BC_{(3,0)}$ can access each other when the $(c_p, c_n)$ of $CC_{(1,0)}.p_c$ and $CC_{(2,0)}.p_c$ are $(BC_{(0,1)}, CC_{(2,0)})$ and $(CC_{(1,0)}, BC_{(3,0)})$, respectively.
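The path-dependent accessibility in the Fig. 4(a) example can be sketched as a walk over the CCs' current paths. The cell encoding `(kind, id_d, id_v)`, the dictionary `cc_paths`, and the function name `next_accessible` are hypothetical choices for this illustration; the VC index of each path is omitted for brevity.

```python
def next_accessible(src, cc_paths):
    """Return the non-CC cell reachable from src over established paths.

    cc_paths maps a CC to its current path p_c = (c_p, c_n), or to None
    when the CC holds the empty path. Returns None if the chain is broken.
    """
    # Find the CC whose current path starts at the source cell.
    cc = next((c for c, p in cc_paths.items()
               if p is not None and p[0] == src), None)
    for _ in range(len(cc_paths)):      # bounded walk, guards against cycles
        if cc is None:
            return None
        nxt = cc_paths[cc][1]
        if nxt[0] != "CC":              # reached a BC: this is the neighbor
            return nxt
        p = cc_paths.get(nxt)
        # The next CC must point back to the CC we came from.
        cc = nxt if (p is not None and p[0] == cc) else None
    return None


bc01, bc30 = ("BC", 0, 1), ("BC", 3, 0)
cc10, cc20 = ("CC", 1, 0), ("CC", 2, 0)
paths = {cc10: (bc01, cc20), cc20: (cc10, bc30)}
```

With both paths established, `next_accessible(bc01, paths)` yields `bc30`; clearing the path of either CC breaks the accessibility.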
During the state transition of a CC, its $p_n$ and $a_c$ can be newly updated. The active CC (whose $a_c$ is either $a_{c(1)}$ or $a_{d(1)}$) enables the engine to assign $p_n$ to $p_c$ for the flit-path change when the simulation time advances to the next clock cycle. In typical NoC designs, the SA and VCA computing steps are arranged in parallel with other ST or LT steps, so that the allocation step does not add a cycle of latency. If the allocation computation of the target NoC causes a single-cycle latency, then the $a_c$ of a CC can be extended by including $a_{c(2)}$, which represents that the CC will make a new flit path at the clock cycle after the next cycle.
The $s_v$ of a PV stands for the PC availability of the next BC of a CC. Depending on the flow-control mechanism, the $s_c$ of a PV stands for the number of next-time remaining credits (for credit-based flow control) or next-time remaining flit slots. During the PV's state transition, $s_c$ is updated by monitoring the flit-in and flit-out action states ($a_f$), which enables us to predict the number of credits (or flit slots) at the next clock cycle.
For example, if the target NoC employs an on/off flow control based on the buffer fullness (see Fig. 4(b)), then $PV_{(2,-2)}$ accesses the $a_f$ states of $BC_{(2,0)}$ and $BC_{(3,0)}$ to decide the next-time fullness of $BC_{(2,0)}$. When $BC_{(2,0)}$ has a single flit in its buffer, and $BC_{(2,0)}$ is not active but $BC_{(3,0)}$ is active, the $(s_c, s_v)$ of $PV_{(2,-2)}$ for $BC_{(2,0)}$ can be set to $(0, 1_v)$.
If an RTL block that returns a credit is located in an adjacent router, then a PV can be a neighbor of a BC (symbolizing the RTL block) in a different router, as shown in Fig. 4(a).
B. STATE-TRANSITION FUNCTIONS FOR BC, CC, AND PV COMPONENTS
Given the defined high-level states and the accessibility across the CC's flit-path, the following state-transition functions are defined.
Definition 3: The BC, CC, and PV have the $\delta_f$, $\delta_c$, and $\delta_v$ state-transition functions, respectively.
[…]$_f \times V^{1})^{n_n}$ or $(P_n^{1})^{n_n} \rightarrow (P_n \times A_c)^{0}$, the state-transition function for $(p_n, a_c)$ of $CC_0$; $-1$ and $1$ are the horizontal relative positions from $CC_0$, and $n_p$/$n_n$ is the number of accessible previous/next cells from $CC_0$;

[…] (receiver's) capability to receive a new flit. The receivability is highly related to the flit occupation ($BC_0.b_f$). Depending on the target RTL design, the receivability of $BC_0$ can be affected by $BC_{1b}.a_f$ ($a_f^{1b}$) because an input flit of $BC_0$ can be fetched to an occupied slot at the next clock trigger when a front flit of $BC_0$ is scheduled to leave (i.e., $a_f^{1b} = a_{f(1)}$). Thus, $\delta_f$ references the coupled states of the sender's flit readiness and the receivability, which are $(b_o^{-1b}, b_f^{0}, a_f^{1b})$, and its consequent $a_f^{0}$.

When a $CC_0$ has no flit path ($p_c^{0} = \emptyset_p$), there is an incoming flit to an active $BC_{-1b}$ (i.e., $(s_{f0}^{-2b}, a_f^{-1b}) = (\neg\emptyset_f, a_{f(1)})$), and a destination $BC_{1b}$ is receivable at the next time (the $PV.s_v$ for $BC_{1b}$ is $1_v$), $CC_0$ is required to be active for making a PC to $BC_{1b}$. When a new simulation time is updated, the actions to move the incoming flit ($s_{f0}^{-2b}$) to a $BC_{-1}$ and to make the path of $CC_0$ are performed so that $BC_{1b}$ can access its $BC_{-1b}$ during the $\delta_f$ function.
If a $CC_0$ is a neighbor of multiple previous cells at different ports, the CC determines whether a flit is directed to itself using the routing algorithm of the target NoC and the flit's destination. When multiple incoming flits are to cross a $CC_0$, the CC refers to the arbitration algorithm for the next-path ($p_n$) decision. In this manner, $\delta_c$ updates $(p_n^{0}, a_c^{0})$ using the target routing and arbitration algorithms, considering the receiver-side PC viabilities ($(s_v^{1})^{n_n}$) for the newly incoming flits ($(s_{f0}^{-2b})^{n_p}$) or waiting flits of the $BC_{-1}$ cells.
To make an empty or new path at the next clock cycle, $\delta_c$ updates $a_c$ to $a_{d(1)}$ or $a_{c(1)}$. Both $a_{d(1)}$ and $a_{c(1)}$ lead to a path disconnection (PD). The current path of a CC serves the flow of a front flit in the path-source $BC_{-1b}$ (that is, $CC_0.p_c.c_p$), and the PD decision prepares for the next flit flow.
For the PD decision, we classify the typical PD conditions (or reasons) of target NoC arbiters as follows:
• sender-side flit emptiness ($pd_e$)
• packet-level arbitration ($pd$)
• flit-level destination change in a shared channel ($pd_d$)
• no remaining credits or slots ($pd_v$)
• higher-priority PC request ($pd_p$)

If a path-source $BC_{-1b}$ of a $CC_0$ with a valid $pd_e$ has neither a next residing flit ($s_{f1}^{-1b} = \emptyset_f$) nor an incoming flit ($a_f^{-1b} = a_i$), then the path of $CC_0$ will be disconnected if the front flit is scheduled to flow ($a_f^{1b} = a_{f(1)}$). If a path-source $BC_{-1}$ of a CC with a valid $pd$ has a front flit that is a packet's last flit and is scheduled to leave (i.e., $(s_{f0}^{-1b}.h, a_f^{1b}) = (\text{tail/single}, a_{f(1)})$), the path will be disconnected at the next cycle.
If a $BC_{-1b}$ represents a shared channel where payload-type flits with different destinations (or VCs) can be mixed, as shown in Fig. 5, the CC's path needs to be changed depending on an incoming flit's destination. The PD in this situation is a $pd_d$-valid disconnection. If a CC with a valid $pd_p$ receives a PC request with a higher priority than the current path, the CC's path will be changed. If a path-terminal $BC_{1b}$ ($p_c.c_n$) of a $pd_v$-valid $CC_0$ is unable to receive additional flits at the next cycle because a credit (or flit slot) is consumed by the scheduled flow ($a_f^{1b} = a_{f(1)}$), then the path will be disconnected at the next cycle.
The test environment provides PD rules for CCs, which indicate whether each PD reason affects the PD decision and corresponding state-value requirements of the reason. The δ c of CC 0 checks whether any valid PD reasons are satisfied when the CC 0 has a path.
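A check of the valid PD reasons might be sketched as follows. The sketch covers only three of the five reasons and is an illustration under assumptions, not the authors' implementation: the `PathState` record, the label `"pd_pkt"` for the packet-level reason, and the string encodings of the action states are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

A_F1, A_I = "a_f(1)", "a_i"   # fetch at the next clock / idle


@dataclass
class PathState:
    src_second_flit: Optional[str]  # s_f1 of the path-source BC, None = null flit
    src_action: str                 # a_f of the path-source BC
    front_type: str                 # s_f0.h of the path-source BC
    dst_action: str                 # a_f of the path-terminal BC
    credits_after_flow: int         # s_c predicted for the next cycle


def pd_reasons(valid: set, st: PathState) -> set:
    """Return the subset of the valid PD reasons that is currently satisfied."""
    hit = set()
    if st.dst_action != A_F1:       # the front flit is not scheduled to flow
        return hit
    if "pd_e" in valid and st.src_second_flit is None and st.src_action == A_I:
        hit.add("pd_e")             # the sender becomes empty
    if "pd_pkt" in valid and st.front_type in ("tail", "single"):
        hit.add("pd_pkt")           # the last flit of a packet leaves
    if "pd_v" in valid and st.credits_after_flow == 0:
        hit.add("pd_v")             # no credit or slot remains at the receiver
    return hit
```

A $\delta_c$ built on such a helper would request a disconnection whenever the returned set is non-empty, with `valid` taken from the generated PD rules of the CC.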
Depending on the target NoC, consecutive CCs can be placed to represent separable allocators. Between CCs, a CC can request a PC to another CC or grant a request from it by checking the other's $p_n.c_p$ or $p_n.c_n$. The $\delta_c$ of the CCs updates $p_n$ based on the flits of previous BCs, PC requests from previous CCs, receiver-side PC viabilities, or a grant from a next CC. Depending on the target arbitration protocol between consecutive arbiters, the $\delta_c$ of a CC can be executed multiple times at the same simulation time, as shown in Fig. 6. In the example, the $\delta_c$ of $CC_{(1,0)}$ updates $p_n$ for an incoming flit. The $\delta_c$ of $CC_{(2,0)}$ detects the PC request by checking $CC_{(1,0)}.p_n.c_n$ and computes $p_n$ considering the next-time PC viability of $BC_{(3,0)}$. Then, the $\delta_c$ of $CC_{(1,0)}$ can confirm the grant by checking whether $CC_{(2,0)}.p_n.c_p$ is equal to the current cell, $CC_{(1,0)}$. If granted, the $\delta_c$ of $CC_{(1,0)}$ updates $a_c$ to $a_{c(1)}$.
During the execution of the PV's $\delta_v$, $s_v$ and $s_c$ are updated based on the flit-fetching states ($a_f^{1b}$, $a_f^{1c}$) associated with the next-time credit (or flit slot) consumption and return.
In Section III, we propose methods for the event-driven state transitions and their implementation using the rules generated by the proposed test environment.
III. EVENT-DRIVEN STATE-TRANSITION FUNCTIONS OF CELLS AND THEIR IMPLEMENTATIONS
To avoid executing every cell at each simulation time, the state-transition function of each cell is invoked only when state values of interest in the cell or its neighbors change. The state-value changes are communicated through event exchange, and the event is defined as follows.
Definition 4: An event $e$ is any state-value change that affects the state transitions of a cell. Every event is defined as $\langle c_s, c_d, tp, value, t_d \rangle$, where $c_s$ is the source cell of event $e$, $c_d$ is the event-receiver cell, $tp$ is the event type, $value$ carries additional information, and $t_d$ is the cycle-level delay (typically zero) between the source and destination cells. Note that an event is either an input (that can cause changes in the state of a receiver) or an output (that is generated depending on the receiver's interest).
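The event tuple of Definition 4 maps naturally onto a small record type. The following is a minimal sketch: identifying cells by strings, the class name `Event`, and the helper `due_cycle` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    c_s: str            # source cell of the event
    c_d: str            # event-receiver cell
    tp: str             # event type label
    value: Any = None   # optional additional information
    t_d: int = 0        # cycle-level delay, typically zero


def due_cycle(e: Event, now: int) -> int:
    """Cycle at which the receiver should process the event."""
    return now + e.t_d
```

An engine could keep pending events ordered by `due_cycle`, so that zero-delay events are consumed in the current cycle and delayed ones wake their receiver later.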
We will introduce the specific event types for the cells' asynchronous state transition in Section III-A.
[…] and $a_f^{1b}$. We denote the events for notifying the BCs' state-value changes as $e_{bo}$, $e_{bf}$, and $e_{af}$, respectively. If $BC_{-1b}$ or $BC_{1b}$ is changed due to a new PC or PD, the event for the path change (denoted as $e_{pc}$) is delivered to $BC_0$ for updating the changed neighbor's state value. The received events can be differentiated using their sources ($e.c_s$) and types ($e.tp$). The changed state values can be accessed through the event source ($e.c_s$).
The […] If an $a_f^{1b/1c}$ remains as $a_{f(1)}$, then $s_c$ needs to be updated at the next clock cycle. For example, if a PV has values of $(s_c^{0}, a_f^{1b}, a_f^{1c})$ as $(4, a_{f(1)}, a_i)$ after the current-time $\delta_v$ execution, then the difference between $a_f^{1b}$ and $a_f^{1c}$ requires an $s_c$ decrement at every following clock cycle until an $a_f$ change is reported.
For the next update to $PV.s_c$, we define an event $e_i$ that has a delay (i.e., $t_d > 0$) and is delivered to the PV itself (as a next-time influencee). The $e_i$ executes the PV's state-transition function at the next cycle for a new decision.
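The self-rescheduling behavior can be sketched for a credit-based PV. The function `delta_v`, the string encodings of the action states, and the rule that the PV stays viable while at least one credit remains are assumptions for this illustration rather than the authors' definition.

```python
A_F1, A_I = "a_f(1)", "a_i"   # fetch at the next clock / idle


def delta_v(s_c: int, a_1b: str, a_1c: str, max_c: int):
    """Return (next s_c, s_v, reschedule) for a credit-based PV.

    a_1b == A_F1 consumes one credit at the next clock;
    a_1c == A_F1 returns one credit at the next clock.
    reschedule is True when the two action states differ, meaning s_c
    keeps changing and a delayed self-event e_i (t_d = 1) is needed.
    """
    consumed = 1 if a_1b == A_F1 else 0
    returned = 1 if a_1c == A_F1 else 0
    nxt = max(0, min(max_c, s_c - consumed + returned))
    s_v = 1 if nxt > 0 else 0     # assumed viability rule
    return nxt, s_v, consumed != returned
```

Called once per cycle while `reschedule` is true, the sketch decrements the credit count on every invocation until one of the two action states changes, which mirrors the decrement described above.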
Depending on the target NoC, if a credit-returning $BC_{1c}$ is located in another router, then the propagation delay of the credit-return signal can be described using a delayed $e_{af}$ whose $t_d$ is not zero.
To detect a PC request or PD condition, the $\delta_c$ of a CC references various neighbor states, namely $s_{f0}$, $s_{f1}$, $a_f$, $p_n$, and $s_v$, as defined in Def. 3. If a CC received an event for the change of each individual input state, it could receive a large number of events from multiple neighbors.
To reduce the number of events, and thus alleviate the overhead of event processing and simplify the $\delta_c$ decision, we define events for CC from the perspective of the PC-change request. The event $e_{pr}$ requests or grants a PC to other CCs, $e_{wr}$ withdraws a previous PC request, and $e_{pd}$ requests a PD caused by satisfying a valid PD reason. Based on the events, we revise $\delta_c$ as follows.
Definition 5: The high-level states of CC are revised to $\langle id, (\langle c_{pr}, c_{pd} \rangle)^{n_{pr}}, (c_{vd})^{n_{vd}}, p_c, p_n, a_c \rangle$, where
• $(\langle c_{pr}, c_{pd} \rangle)^{n_{pr}}$ is the path-candidate list of the pairs of a PC-requested cell ($c_{pr}$) and related path terminal ($c_{pd}$); $n_{pr}$ is the number of PC-requested cells;
• $(c_{vd})^{n_{vd}}$ is the list of PC-viable next cells ($c_{vd}$); $n_{vd}$ is the number of PC-viable cells.

The $\delta_c$ of a CC inserts the pair $\langle c_{pr}, c_{pd} \rangle$ into its list based on the received $e_{pr}$. The $c_s$ of the $e_{pr}$ is $c_{pr}$, and its value is $c_{pd}$ (which is the destination of the flit of $c_{pr}$). The $c_{pd}$ can be derived using the target routing algorithm when a port change between $c_{pr}$ and $c_{pd}$ is required.
BCs can generate $e_{pr}$ for succeeding CCs under two main situations:
1) when the following path is not ready for an incoming flit, and
2) when the current path is to be disconnected, and a new path is required for an incoming or staying flit.

If any valid $pd$ condition of a $CC_0$ is met, then a path-source $BC_{-1b}$ can detect the scheduled PD by itself or by receiving $e_{pd}$ from other cells. The $e_{pd}$-received $BC_{-1b}$ generates $e_{pr}$ if a next flit ($flit_n$) exists, as shown in Fig. 7(a). The $flit_n$ is one of a residing second flit ($s_{f1}^{-1b}$ […].

Depending on the target NoC, when $CC_{-1}$ and $CC_0$ are located consecutively, a PC request from a $CC_{-1}$ can be withdrawn due to the changed PC request of $CC_{-1}$ or its PC establishment to another $CC_0$.
In this situation, $\delta_c$ removes the previous request from $(\langle c_{pr}, c_{pd} \rangle)^{n_{pr}}$ after receiving $e_{wr}$. Otherwise, a pair in $(\langle c_{pr}, c_{pd} \rangle)^{n_{pr}}$ is removed when the pair is approved for a new path. The $(c_{vd})^{n_{vd}}$ is managed by receiving $e_v$ events from succeeding PVs.
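The bookkeeping of the path-candidate list and the PC-viable list can be sketched as below. The class name `CouplingCell`, the assumption that an $e_v$ carries a `(cell, is_viable)` pair as its value, and the first-come first-served grant are hypothetical; a real $\delta_c$ would apply the arbitration algorithm of the target NoC instead.

```python
class CouplingCell:
    """Hypothetical CC that tracks PC requests and PC-viable next cells."""

    def __init__(self):
        self.candidates = []   # list of (c_pr, c_pd) pairs
        self.viable = set()    # PC-viable next cells c_vd

    def on_event(self, tp, c_s, value=None):
        if tp == "e_pr":                       # new PC request from c_s
            pair = (c_s, value)
            if pair not in self.candidates:
                self.candidates.append(pair)
        elif tp == "e_wr":                     # withdraw requests of c_s
            self.candidates = [p for p in self.candidates if p[0] != c_s]
        elif tp == "e_v":                      # viability update from a PV
            cell, is_viable = value
            if is_viable:
                self.viable.add(cell)
            else:
                self.viable.discard(cell)

    def grant(self):
        """Approve the oldest request whose terminal is PC-viable."""
        for pair in self.candidates:
            if pair[1] in self.viable:
                self.candidates.remove(pair)   # approved pairs leave the list
                return pair
        return None
```

Because the lists are updated only when request, withdrawal, or viability events arrive, the CC does not need to poll the individual states of its neighbors.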
Unlike $e_{pr}$, $e_{wr}$, and $e_v$, the $e_{pd}$ generation sometimes requires checking the states of two cells to confirm a PD condition such as $pd_e$. For this case, $e_{ps}$ is defined for a partial satisfaction of the PD condition, and the cell that receives the $e_{ps}$ checks the remaining condition to generate the final $e_{pd}$, as shown in Fig. 7(b).
In the example of the $pd_e$-valid $CC_{(1,0)}$, $BC_{1b}$ can recognize the partial $pd_e$ satisfaction by observing $(s_{f1}^{-1b}, a_f^{1b})$ as $(\emptyset_f, a_{f(1)})$ after executing its $\delta_f$ and accessing the $s_{f1}$ of $BC_{-1b}$. The condition $(s_{f1}^{-1b} = \emptyset_f)$ means that $BC_{-1b}$ has no extra flit in its buffer except a front flit that is about to leave, and the $s_{f1}^{-1b}$ value is fixed by the action-induced $\psi$ (see Def. 2) before the BCs' $\delta_f$ executions at the current time.
However, the fulfillment of $pd_e$ can be changed by the $a_f$ decision in the $\delta_f$ execution of $BC_{-1b}$ because the current-path requirement depends on the existence of an incoming flit (which can be confirmed by checking $(a_f^{-1b} = a_{f(1)})$). If the $a_f$ of the $e_{ps}$-received $BC_{-1b}$ is $a_i$, then $BC_{-1b}$ generates the $e_{pd}$ for the full satisfaction of the $pd_e$ condition.
We introduce a detailed mechanism of the event-generation method in Section III-B.
B. EVENT GENERATION USING INFLUENCE FUNCTIONS
At every simulation time, the high-level states of cells are updated after performing actions (through fetching flits or making new PCs) or after executing the cells' state-transition functions. Influencee cells are informed about the changed states through the event delivery. The events are generated using three types of influence functions: the action-influence function ($\eta_a$), the state-transition-influence function ($\eta_s$), and the $e_{ps/pd}$-influence function ($\eta_p$).
The influence functions are developed considering neighbor cells' dependency on the current cell's state (called the current-state influence). The state influence is identified using the decision rules, which will be described in Section III-C. The overall event deliveries at each clock cycle are illustrated in Fig. 8.
After performing actions for active BCs and CCs, the BCs' coupled states $(b_o, b_f, s_{f0}, s_{f1})$ are newly updated, and the CCs' $p_c$ values are changed for new paths. The events for the state changes caused by the actions are generated using $\eta_a$, defined below.
Definition 6: The $\eta_a$ of an active $BC_0$ notifies its influencee $BC_0$ or $BC_{1b}$ about its state changes, and the $\eta_a$ of an active $CC_0$ notifies influencee neighbor BCs about the PC changes, where
• $\eta_a$: […], $-/+$ represents the previous-/next-state values before/after the action, and $E_a^{b}$ is none or multiple $e_{bo}$, $e_{bf}$, $e_{ps}$, or $e_{pc}$ events.

The $\eta_a$ compares an active cell's state values before and after the action in order to detect the value changes. The $b_o$ and $b_f$ changes of active BCs provoke the generation of $e_{bo}$ and $e_{bf}$ to their dependents. If an active BC is a path terminal of a CC, then the $\eta_a$ can generate $e_{ps}$ to itself for a further valid PD-condition check, which will be discussed in Section III-C.
The $\eta_a$ of an active CC sends $e_{pc}$ to path-changed neighbor BCs, each of which relies on the state ($b_o$ or $a_f$) of the opposite-side cell across the previous and new paths. The $e_{pc}$ enables the receiver BC to update its neighbor's changed state. If a path is disconnected, then the $e_{pc}$-received $BC_0$ updates its $b_o^{-1b}$ to $0_o$ or its $a_f^{1b}$ to $a_i$ based on the relative position of the event sender ($e_{pc}.c_s$) from the receiver $BC_0$.
The $\eta_a$-generated events are delivered to a bag of destination cells, denoted as $bag_s$. All event-receiving cells are executed by the simulation engine for the state transition, and the events in $bag_s$ are consumed in their state-transition functions.
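The before/after comparison performed by the $\eta_a$ of an active BC can be sketched as follows. The function name `eta_a_bc`, the dictionary-based state snapshots, and the `dependents` map are hypothetical; the sketch covers only the $e_{bo}$ and $e_{bf}$ events and leaves out $e_{ps}$.

```python
def eta_a_bc(cell_id, before, after, dependents):
    """Compare a BC's abstract states before and after an action.

    before/after: dicts with the keys "b_o" and "b_f".
    dependents: dict mapping a state name to the cells that depend on it.
    Returns (event type, source, destination, new value) tuples.
    """
    event_type = {"b_o": "e_bo", "b_f": "e_bf"}
    events = []
    for state, tp in event_type.items():
        if before[state] != after[state]:          # value changed by the action
            for dest in dependents.get(state, ()):
                events.append((tp, cell_id, dest, after[state]))
    return events
```

Each returned tuple would then be placed into the $bag_s$ of its destination cell, so that only cells with a changed input are executed.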
After performing the state-transition functions, events for useful state changes are generated using $\eta_s$, as defined below.
Definition 7: Each $\eta_s$ of BC, CC, and PV notifies its influencee cells about the $a_f$ and $s_v$ changes, or requests a PC change, where
[…] is a previous-/next-state value before/after the execution of $\delta_{f/v/c}$, and $E_s^{b}$ is none or multiple events of $e_{af}$, $e_{pr}$, $e_{ps}$, $e_v$, $e_{wr}$, or $e_i$.

The $\eta_s$ of BC can generate $e_{af}$ for the changed $a_f$ to inform the influencee BC or PV(s). When a path for an incoming flit (which can be detected by $(a_f^{0} = a_{f(1)})$) is not ready, the $\eta_s$ of the BC generates $e_{pr}$ for a PC request. When the $\delta_f$ of a path-terminal $BC_{1b}$ updates $a_f$ to $a_{f(1)}$, and the $a_{f(1)}$ decision is a partial or full requirement of the path's PD reasons, the $\eta_s$ generates $e_{ps}$ to itself to call the $\eta_p$ function, which checks the PD condition in detail and is discussed in the following paragraphs.
The $\eta_s$ of PV can send $e_v$ to influencee CC(s) to refresh the CC's list of PC-viable cells. When a PV is related to a $pd_v$-valid path, the $\eta_s$ of the PV can generate $e_{pd}$ to the preceding BC and CC for the path disconnection. When a PV must be executed at the next cycle for a certain reason (such as the value difference between $a_f^{1b}$ and $a_f^{1c}$), the $\eta_s$ generates $e_i$ for the next-time $\delta_v$ invocation.
The $\eta_s$ of CC can generate $e_{pr}$, $e_{pd}$, or $e_{wr}$ to other CCs to represent the separate arbitration of serial CCs. The $e_{pd}$ generation by a CC means that the current path will change due to a higher-priority request ($pd_p$). If a CC needs to be called at the next clock cycle for a certain reason (e.g., an every-cycle path change), the $\eta_s$ can generate $e_i$ for the next-time path redecision.
Unlike other cells, BC has an additional influence function, $\eta_p$, to check valid PD conditions. For the $\eta_p$, BC has an extra event bag (denoted as $bag_p$) to collect the $e_{ps}$ and $e_{pd}$ events. $\eta_p$ is defined as follows.
Definition 8: The BC's $\eta_p$ responds to input events of $e_{ps}$ and $e_{pd}$ and generates new PC-change events to preceding cells, where
[…] is one or multiple events of $e_{ps}$ or $e_{pd}$, and $E_r^{b}$ is none or multiple events of $e_{ps}$, $e_{pr}$, or $e_{pd}$.
In bag p , an e ps from a current BC 0 invokes the BC 0 's η p to examine terminal-side valid conditions for the existing-path disconnection. An e ps from a succeeding BC 1b causes the BC 0 's η p to check source-side valid conditions for PD. When the BC −1b receives the e pd from another succeeding cell or an e ps -caused remaining PD condition is satisfied, the e pr of a flit n can be generated. The detailed PD condition and generation mechanism will be discussed in Section III-C.
Management of the extra bag and the influence function not only helps with the influence-function's development by separating the different event-generation objectives, but also prevents unnecessary δ s and η s executions when bag s is empty. After BC's δ s and η s execution, if there are events in bag p , then the BC executes the η p function. If events only remain in bag p and not bag s , then the η p is performed without the δ s and η s being executed.
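The two-bag execution order described above can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: the class name `BufferCell`, the method names, and the `log` field are hypothetical, and the function bodies only record that they were called.

```python
from dataclasses import dataclass, field

@dataclass
class BufferCell:
    """Hypothetical BC holding a state bag (bag_s) and a path-change bag (bag_p)."""
    bag_s: list = field(default_factory=list)
    bag_p: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def delta_s(self):
        self.log.append("delta_s")
        self.bag_s.clear()          # state events are consumed by the transition

    def eta_s(self):
        self.log.append("eta_s")

    def eta_p(self):
        self.log.append("eta_p")
        self.bag_p.clear()          # path-change events are consumed by eta_p

    def execute(self):
        if self.bag_s:              # skip delta_s/eta_s when bag_s is empty
            self.delta_s()
            self.eta_s()
        if self.bag_p:              # eta_p runs whenever bag_p holds events
            self.eta_p()

only_p = BufferCell(bag_p=["e_pd"])
only_p.execute()
both = BufferCell(bag_s=["e_af"], bag_p=["e_ps"])
both.execute()
```

In this sketch, `only_p` runs only `eta_p`, while `both` runs the three functions in order. A real η s may also place an e ps into the cell's own bag p before η p runs, which the sketch omits.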
C. IMPLEMENTATIONS FOR THE δ AND η FUNCTIONS USING DECISION RULES
The decision rules provided for each cell can be utilized to implement any state-transition function (δ) and influence function (η). The rules are generated by the proposed test environment and are shared by multiple cells representing the same RTL block.
The decision rules for BCs have a simple tabular structure, as shown in Fig. 9 . Each rule provides the relationships between the coupled states of (b −1b o , b 0 f , a 1b f ) and the consequent action state (a 0 f ). For an a f decision, the required value of b −1b o is denoted as pre-condition (cond p ) and the required paired values of (b 0 f , a 1b f ) are post-conditions (cond n s). If a state in the cond n has no effect on the a f decision, then the state value is denoted as a don't-care (D).
For making an a f (1) decision for a BC 0 , the BC 0 necessarily has an accessible previous BC that has any outgoing flit
, and this is a default cond p of the a f (1) decision. If the cond p is not satisfied, then the a f is always set to a i .
Based on the definition of BC's cond p , all BCs depend on the previous BC's b o in their a f decision. However, the b 0 f and a 1b f dependence can vary according to the target NoC. As the example rules in Fig. 9(a) indicate, if the a 1b f does not affect the a 0 f change, the BC 0 is independent with BC 1b in the a 0 f decision.
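As an illustration of the tabular rules with don't-care entries, the lookup can be sketched as below. The rule contents, the state labels (`"full"`, `"not_full"`), and the function name `decide_af` are assumptions made for this example and are not taken from Fig. 9.

```python
D = None                      # don't-care marker in a rule
IDLE, FETCH = "a_i", "a_f(1)"

def decide_af(prev_has_out_flit, b_f, a_f_next, rules):
    """Return the action state of a BC from (cond_n -> action) rules."""
    if not prev_has_out_flit:             # cond_p not satisfied: always idle
        return IDLE
    for (rule_bf, rule_af_next), action in rules:
        # A rule matches when every non-don't-care field equals the state.
        if rule_bf in (D, b_f) and rule_af_next in (D, a_f_next):
            return action
    return IDLE

# Hypothetical rules: fetch if the buffer has room, or if it is full but
# the next BC is fetching from it in the same cycle.
rules = [
    (("not_full", D), FETCH),
    (("full", FETCH), FETCH),
]
```

With these hypothetical rules, a full buffer accepts a flit only when the succeeding BC is fetching, which is one way the a 1b f dependence can appear in a target NoC.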
As discussed in Sec. III-B, the possible events for δ f are b o -related events (that is, e bo and e pc from previous cells), b f -related e bf , and a f -related events (that is, e af and e pc from succeeding cells). The η a of BC −1b/0 and CC −1/1 and η s of BC 1b are designed to produce these δ f -relevant events based on the influence of the neighbors' or the current cell's state changes on the a 0 f decision.
Based on the a f -decision rules, δ f is derived using the if-then construct shown in Fig. 9(a)(b) . Before the a f decision, the events in bag s are consumed to update the state values.
After the δ f execution, the η s of BC is performed to make an a f -change notification and request a PC change. The BC's η s is developed based on the following checkpoints, which refer to the neighbor cells' dependency on the current BC's a f change.
1) The a f change (a − f ≠ a + f ) for influencee cells.
2) A new path requirement of an incoming flit (flit n ).
3) Any a f change (to a f (1) ) that affects a PD.
The checkpoints lead to the generation of e af (to influencee BC −1b or PV s), e pr (to a next CC), and e ps (to itself), respectively. When a next path from an active BC 0 to a succeeding CC 1 is not ready for a flit n (s −1b f 0 ), the η s generates e pr to the CC 1 . When a BC requests PC, if the port (id.p) of the path-terminal cell can differ from the port of the BC, the routing algorithm of the target NoC is utilized to find the terminal cell.
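The three checkpoints can be sketched as a small event-generating function. The boolean inputs `path_ready` and `affects_pd` and the target labels are hypothetical simplifications; in an actual model they would be derived from the cell states and the PD-validity table.

```python
IDLE, FETCH = "a_i", "a_f(1)"

def eta_s(af_before, af_after, path_ready, affects_pd):
    """Return the (event, target) pairs a BC generates after its delta_f."""
    events = []
    changed = af_before != af_after
    if changed:                                   # checkpoint 1: a_f changed
        events.append(("e_af", "influencee"))
    if af_after == FETCH and not path_ready:      # checkpoint 2: path needed
        events.append(("e_pr", "next_CC"))
    if changed and af_after == FETCH and affects_pd:   # checkpoint 3: PD check
        events.append(("e_ps", "self"))
    return events
```

A cell whose a f stays unchanged and whose next path is ready generates no event, so its neighbors are not invoked.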
For the 3rd checkpoint, the modeler can reference each CC's table that shows the validity of PD reasons, as shown in Fig. 10 . The proposed test environment generates PD-validity tables for CC. Using the PD-validity table with a default condition table, which describes the required coupled-state values for the fulfillment of each PD reason, we can extend BC's η p as well as the η s/a (for the e ps generation).
To check the fulfillment of a valid PD reason, both cond p (if it exists) and cond n should be examined. The cond p is checked by BC −1b and the cond n by a succeeding cell. The succeeding cell can be a BC (if one of pd e , pd l , and pd d is valid) or a PV (if pd v is valid). The pd p -valid CC checks the PD based on its own NoC-specific custom rules.
When a CC 0 complies with one of pd e , pd l , or pd d reasons and the BC 1b of the CC 0 updates a f to a f (1) , the η s/a of the BC 1b generates the e ps to itself for the further PD-condition examination of η p . The e ps generation by η a enables some active BCs, which skip the δ f and η s execution due to the unchanged values of the δ f -related states, to examine the cond n fulfillment by η p .
During the η p execution, the cond n s of valid PD reasons are examined by accessing s −1 f 0 , s −1 f 1 or a 0 f . The cond n fulfillment creates e pd or e ps depending on the existence of a corresponding cond p .
When there is an e ps from a succeeding cell, the η p consumes the e ps to check the cond p of the e ps -involved PD condition. The η p should be designed to generate e pr for a flit n when an e ps -related cond p is satisfied or an e pd arrives.
The CC or PV has only η s to influence other cells, and all received events are consumed in δ. A prototypical δ c is shown in Fig. 11 . When the CC's path is disconnected or about to be disconnected (by e pd ), a next path is found among the pairs of c pr , c pd in the path-candidate list, considering the PC viability. In consecutive CCs, the PC-viable cell of a CC can be a PC-granted CC (that is notified by e pr ). For the path selection, the arbitration algorithm of the target arbiter should be referenced for the δ c development. If the CCs are positioned consecutively, the η s of the CCs is designed to generate e pr (for a PC request and grant) or e wr (for a request withdrawal) based on the arbitration protocol between the separated arbiters of the target RTL design.
The state conditions for pd p are defined according to the target NoC. Based on the custom pd p condition, the δ c of CC is designed to change the current path by itself without an external e pd . Then, the η s notifies a preceding cell of this pd p -induced disconnection through the e pd generation. Depending on the custom pd p condition, if a new path decision is required at the next clock cycle, the CC's η s is implemented to generate e i for the next-time redecision.
The δ v of PV needs to be developed using architecture-specific knowledge because credits are managed differently depending on the target NoC. If a credit-return signal is located in another router and the signal incurs a propagation delay, the delayed credit-return signal can be modeled using an e af whose t d is nonzero.
For the s c increment or decrement at each clock cycle, due to the mismatched a 1b f and a 1c f , the η s is designed to generate e c whose t d is 1 for the next-time δ v invocation. When a CC 0 complies with the pd v , the η s of the succeeding PV 1 is required to generate e pd when the s v is changed into 0 v .
IV. EVENT-BASED SIMULATION ALGORITHM
The proposed simulation engine executes NoC CA based on the procedures described in Alg. 1. Every simulation step in this algorithm executes the two phases defined below in the order given.
1) Perform the actions of the previously determined active cells. After the action, the ψ of active BCs and the η a of active cells are executed for the next step. The initial influenced cells are confirmed by η a .
2) Execute the δ of every influenced cell, which results in updating its action states and executing the η s/p to generate events.
The influenced cells are scheduled in their own lists, as shown in Fig. 12 . To avoid unnecessary execution of the state-transition functions δ, each δ should be called after the cell has received as many events as possible. When there are consecutive CC 0 and CC 1 in a flow path, it is advantageous to execute the PC-granting CC 1/0 later than the PC-requesting CC 0/1 . The priority between CC 0 and CC 1 can be adjusted by the modeler.
Based on the event flow direction, the execution priority is defined as follows:
Definition 9: The BC, PV , and CC have priorities 0, 1, and 2, respectively. A lower number means a higher priority. If there are consecutive CCs, CC can have multiple priorities (e.g., 2 or 3) that can be adjusted according to the target RTL NoC. Multiple lists for influenced cells are employed to prioritize the order of execution of the cells.
In the first phase in Alg. 1, the flit-fetching actions of the active BCs are executed, then the path-changing actions of the active CCs are performed to prepare new paths or remove unnecessary paths. During the flit-fetching action, the front flit of a BC −1b is moved to the active BC 0 's buffer.
After fetching a flit to each BC 0 , the ψ and η a of the BC −1b or BC 0 whose flits in the buffer are no longer moved are called. After updating a new path (p c ) of each CC 0 , the (a c , p n ) are initialized, then η a is called. The η a -generated events are delivered to the bag s/p s of their destination cells. The event-received cells are scheduled for their corresponding lists for the second-phase computation.
The previously generated events that have a delay (i.e., e.t d > 0) are stored in E f . As time advances, the t d of each event in E f is decremented, and the events whose t d reaches zero are delivered to their destinations.
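The handling of delayed events can be sketched as follows. The dictionary layout of an event (`name`, `t_d`) and the function name `advance_time` are assumptions for illustration only.

```python
def advance_time(future_events):
    """Decrement each event's delay by one cycle.

    Returns (due, pending): events whose delay reached zero, and events
    that must stay in the future-event bag.
    """
    due, pending = [], []
    for ev in future_events:
        ev = {**ev, "t_d": ev["t_d"] - 1}      # copy with decremented delay
        (due if ev["t_d"] == 0 else pending).append(ev)
    return due, pending

due, pending = advance_time([
    {"name": "e_i", "t_d": 1},     # fires at the next cycle
    {"name": "e_af", "t_d": 3},    # e.g., a delayed credit return
])
```

Here the e i event becomes due after one time advance, while the delayed e af remains pending with a t d of 2.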
For the second stage, the engine empties the S c set containing previous active CCs for new active CCs. Then, δ and η of influenced cells are called according to the priority orders, as discussed in Def. 9.
For each BC, the functions of δ, η s , and η p are executed. For each PV and CC, the functions of δ and η s are executed.
Depending on the δ f /c result, the BCs and CCs that become active are scheduled to S b/c . If a BC becomes inactive (which means its a f changes from a f (1) to a i ), the BC is removed from S b .
After the η s or η p execution, each event is delivered to another cell or stored in the E f bag (if the event has a delay). If an influenced cell has a lower priority (i.e., a larger priority number) than the cells in the current list (which is one of L bc , L pv , ...), then this lower-priority cell is scheduled to its associated list for deferred execution. If not, the influenced cell is scheduled to a list of imminent cells (L i ) for immediate execution (Lines 40 to 43 in Alg. 1).
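The scheduling decision can be sketched as below, under the assumption that "deferred" applies to cells whose priority number is larger than that of the list currently being processed. The names `schedule`, `deferred`, and `imminent` are hypothetical and do not come from Alg. 1.

```python
from collections import deque

PRIORITY = {"BC": 0, "PV": 1, "CC": 2}   # smaller number = higher priority

def schedule(cell, kind, current_level, deferred, imminent):
    """Place an influenced cell in a deferred list or the imminent list."""
    level = PRIORITY[kind]
    if level > current_level:
        # Lower priority than the list being processed: run it later,
        # after it has had the chance to collect more events.
        if cell not in deferred[level]:
            deferred[level].append(cell)
    else:
        imminent.append(cell)             # run right away

deferred = {0: [], 1: [], 2: []}
imminent = deque()
schedule("bc_a", "BC", 0, deferred, imminent)   # while processing the BC list
schedule("cc_a", "CC", 0, deferred, imminent)
schedule("cc_a", "CC", 0, deferred, imminent)   # duplicate is not re-added
```

Deferring the CC lets it accumulate events from several BCs before its single δ call, which is the motivation given for the priority lists.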
The imminent cell execution is important, especially for BC, to reduce the total number of BC executions when cascading influences occur among BCs. For example, in a situation where a BC 0 's δ f depends on the BC 1b 's a f decision and influences the BC −1b 's a f decision, the later execution of an e af -received BC 0 (from BC 1b ) can cause the BC −1b to be called again due to the following BC 0 's e af generation. If a L bc -staying BC −1b waits for its turn for the δ and η s/p execution but has already been executed as an imminent cell triggered by BC 0 , the cell is excluded from any further execution because its bag s/p is empty.
[Algorithm 1 excerpt: deliver each e in E 0 to e.c d , as in Lines 14 to 16.]
Unlike other cells, every BC has the additional bag p to handle the PC-change request events in the η p execution (Lines 38 to 39 in Alg. 1).
V. TEST ENVIRONMENT OF RTL NOC DESIGNS
The test environment is proposed to generate 1) the packet-arrival history (for the CA validation) and 2) BC's a f -decision rules and CC's valid PD reasons for a given RTL NoC model. Using the derived PD reasons and the default condition table describing the required coupled-state values of PD reasons (in Fig. 10 ), we can derive CC's PD-decision rules. The overall structure of the proposed test environment is shown in Fig. 13 . The test environment consists of target RTL NoC codes, their corresponding RTL simulator, and the proposed test library that can be loaded by the simulator. The test library utilizes a scenario file that contains the information of multiple packets (i.e., packet length, the source-destination pair, generation time, or initial VC) for random packet generation. The scenario file can be shared with the CA simulator to validate the functionality of CA models by comparing the packet-arrival results of the RTL and CA simulations. After completing the RTL simulation of the given scenario, the library produces the a f -decision rules for BCs and valid PD conditions for CCs.
For the test, the RTL NoC codes (consisting of NoC routers) are extended by adding 1) the packet generator and collector (PGC) blocks that represent local IP blocks, and 2) test statements to specify the RTL signals associated with cells' states and the CA network.
The PGC blocks generate packets based on the scenario. To implement the RTL PGC module and describe the test statements, the designer can utilize a collection of provided test interfaces, as shown in Table 1 . The test interfaces are developed using the Verilog simulator's programming language interface (PLI), which is defined in the IEEE 1364 standard [20] . The capabilities of RTL simulators can be extended using the PLI, which enables user-developed callbacks to be executed before or after the current-time RTL simulation and supports the reading of RTL signal values.
For the PGC development, the interfaces starting with 'get_pkt_' are utilized to get the scheduled packet information in the scenario file, as shown in Fig. 13 . After obtaining the information for each packet, calling set_pkt_sent is required to update a waiting packet from the current packet to the next packet. During the simulation, the test library traces the CA states via probing the flit transmissions, and the set_flit_arrive informs the library of the arrived flits to collect the packet-arrival times.
To describe each BC's and PV 's associated flit signals and the CA network, the test library supports the interfaces starting with 'add_' (see Table 1 ). The add_bc_in and add_bc_out are utilized to describe each BC's position (using its id) and related signals of input and output flits. The arguments for the interfaces are the BC.id, the signal-type name, and a related RTL signal, as shown in Fig. 14. The signal-type name is one of valid (flit validity), vc (VC index), pid (packet identifier), fid (flit identifier in a packet), head (indicator of header type), or tail (indicator of tail type). Each flit generated by the PGC blocks contains the pid, fid, head, and tail values, so the signals for these values can be extracted from the flit signals. The add_cc is utilized to describe a CC's location and adjacent BCs without any RTL information; thus, the test library derives the PC status by monitoring the output and input flits of previous and next BCs. The PC status of a CC is utilized to generate the valid PD reasons of the CC.
Regarding consecutive CCs, the PC status of the CCs is managed as a merged CC, which means that the generated PD reasons are shared among the series CCs. Thus, additional effort from the modeler is needed to distinguish each CC's PD reason using the merged PD reasons and the RTL implementation of the target arbiters.
To detect the PD by the pd v condition, the designer needs to add RTL signals related to the credit exhaustion or buffer fullness (i.e., s v = 0 v ) using the add_pv, which has the arguments of a receiver-side BC, VC index, and 0 v -related RTL signal, as shown in Fig. 14. To derive the BC's b f value, the number of flit slots should be specified using the set_bc_size interface.
At the start of a simulation, the test library constructs some data tables to trace each flit's move and each cell's states based on the arguments of test statements employing the interfaces 'add_' and set_bc_size. During the runtime, a test process is executed by the simulator through an iterative callback registration, as illustrated in Fig. 15 . At every low-to-high clock transition time, a first callback (whose type is cbAfterDelay [20] and was previously registered) is invoked before the current-time RTL simulation. The first callback registers a second callback (cbReadOnlySynch [20] ) to be invoked after the current-time RTL simulation. The second callback identifies flits' moves and cells' states by probing the specified RTL signals, then finds rules based on the states. Then, the current-time second callback registers the cbAfterDelay callback to be invoked after the clock-cycle time, which is specified by the set_cycle_time interface.
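The alternating callback registration can be mimicked with a small event queue. This is a pure-Python stand-in that does not call any PLI routine; the function name `run_callbacks` and the string labels are hypothetical, and only the ordering (first callback before the current-time evaluation, second callback after it, re-registration one cycle later) follows the description above.

```python
import heapq

def run_callbacks(cycle_time, n_cycles):
    """Return the (time, callback) order produced by iterative registration."""
    trace = []
    end_time = n_cycles * cycle_time
    queue = [(0, 0, "after_delay")]          # (time, order within time, kind)
    while queue:
        t, _, kind = heapq.heappop(queue)
        if kind == "after_delay":
            # First callback: runs before the current-time RTL evaluation
            # and registers the read-only callback for the same time.
            trace.append((t, "first"))
            heapq.heappush(queue, (t, 1, "read_only_synch"))
        else:
            # Second callback: signals are stable here, so states can be
            # probed; then the first callback is re-registered.
            trace.append((t, "second"))
            if t + cycle_time < end_time:
                heapq.heappush(queue, (t + cycle_time, 0, "after_delay"))
    return trace

trace = run_callbacks(cycle_time=10, n_cycles=2)
```

In this stand-in, each clock edge yields exactly one "first" and one "second" entry, in that order.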
The detailed processes of cell-state identification and rule derivation are introduced in Sections V-A and V-B.
A. CELL-STATE IDENTIFICATION IN TEST LIBRARY
At each simulation time step, during the second callback, the test library first records the flits' moves and the cells' states for the subsequent rule-derivation process. Each flit-move entry is a pair of BCs, one of which is the flit's current location and the other its expected arrival location. The flit-move data helps the library to update the cells' states and is stored with the flit's id ( pid, fid ) in the flit-move table, as shown in Fig. 15 .
For the cell-state trace, the library probes 1) each BC's states of (q, When updating the states, some state values are derived using the past values, so the library manages multiple tables at the current cycle time (t c ) as well as previous cycle time (t p ).
Each flit's move is traced until the RTL simulator makes the flit-arrival notification by executing the set_flit_arrive statement. When an output flit of a BC 0 is the input flit of BC 1b at t p , the library fills in the BC pair ((BC 0 .id, BC 1b .id)) for the flit row. After comparing all output and input flits of all BCs, some parts of the flit-move tables can be filled.
However, when a destination BC has multiple flit slots, some flits cannot be observed, such as the flit whose id is 1, 2 in Fig. 16 . In this situation, the position of the flit can be derived using the flit-move table at t p . If a flit cannot be found in the BCs' input flits, the flit's current position is changed to the flit's previous-time expected destination. If the destination is blank, the flit's current position is set to the previous position. After comparing the flit positions at t c and t p , we can derive the newly moved flits at t c ; the newly moved flits imply that the previous-time a f s (a f (t p )s) of the destination BCs were a f (1) (see Fig. 16 ). Thus, the a f (t p ) value is determined at t c . To detect the a f change for the BC-rule derivation (which will be discussed in Section V-B), the library manages an additional BC-state table at t pp for the comparison between the a f (t pp )s and the a f (t p )s.
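The position-derivation steps can be sketched as follows. The table layout (flit id mapped to a pair of current position and expected destination) and the name `derive_positions` are assumptions for this example; the real library keeps more state per flit.

```python
def derive_positions(prev_table, observed):
    """Derive flit positions at t_c from the table at t_p.

    prev_table / observed: flit_id -> (position, expected_destination or None).
    Returns the current table and the set of BCs that received a flit,
    i.e., BCs whose a_f at t_p must have been a_f(1).
    """
    cur, receivers = {}, set()
    for flit_id, (pos, dest) in prev_table.items():
        if flit_id in observed:
            cur[flit_id] = observed[flit_id]   # seen on the BC flit signals
        elif dest is not None:
            cur[flit_id] = (dest, None)        # unobserved: assume it arrived
        else:
            cur[flit_id] = (pos, None)         # no destination: it stayed
        if cur[flit_id][0] != pos:
            receivers.add(cur[flit_id][0])
    for flit_id, entry in observed.items():
        cur.setdefault(flit_id, entry)         # flits first seen at t_c
    return cur, receivers

prev = {(1, 1): ("BC1", None), (1, 2): ("BC0", "BC1"), (1, 3): ("BC0", None)}
seen = {(1, 3): ("BC0", "BC1")}
cur, receivers = derive_positions(prev, seen)
```

In this example, flit (1, 2) is not visible at t c but is inferred to have moved into BC1, so BC1 is reported as having fetched a flit at t p .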
For updating CC's p c at t c , the test library examines all BC pairs in the current-time flit-move table whether any paired BCs are tied to a CC, then confirms the specific CCs linking the paired BCs. The p c s of the CCs are newly set to the CCs' linking BCs.
However, a certain CC 0 can maintain its path (i.e., p c ≠ ∅ p ) even though the BC −1b does not have any flit to flow. If a CC 0 that previously had a path (p c (t p ) ≠ ∅ p ) does not link any paired BCs, the b o (t c ) of the previous-time path source (p c (t p ).c s ) needs to be checked to determine whether the p c (t p ).c s has any flit at t c . If the b o (t c ) of the p c (t p ).c s is 0 o , then the p c (t c ) of the CC 0 is set to unknown. During the subsequent rule-derivation process, if an unknown p c (t c ) is referenced, the process result is ignored.
B. DECISION-RULE DERIVATION IN TEST LIBRARY
To derive the BCs' decision rules, when the pre-condition
At every clock cycle, when a required condition is satisfied, the corresponding PD reason becomes valid. If the conditions of multiple PD reasons are met, the library ignores the PD occurrence because the decisive PD reason is ambiguous.
When a PD occurs in a CC, if the previous-time path-source BC consistently has no flit and no condition other than that of pd e is satisfied, then the pd e is set to 1 after the test. If PD always occurs after a last-flit transmission, then the pd l is set to 1. If a sender BC is for a shared channel, then the pd d is set to 1. If PD always happens when there is no credit (or flit slots), the pd v is set to 1. If a PD happens but none of the above conditions is met, then the pd p is set to 1.
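A simplified sketch of the PD-validity derivation is given below. It assumes each observed PD occurrence has already been reduced to the set of reason conditions satisfied at that cycle, and it does not model the "always" requirement stated for some reasons; the function name `pd_validity` and the reason labels are hypothetical.

```python
REASONS = ("pd_e", "pd_l", "pd_d", "pd_v", "pd_p")

def pd_validity(occurrences):
    """Build a PD-validity table from observed PD occurrences.

    occurrences: iterable of sets, each holding the reason conditions
    (other than pd_p) that were satisfied when a PD was observed.
    """
    valid = {reason: 0 for reason in REASONS}
    for met in occurrences:
        if len(met) > 1:
            continue                    # ambiguous decisive reason: ignore
        reason = next(iter(met)) if met else "pd_p"   # nothing met: custom
        valid[reason] = 1
    return valid

table = pd_validity([{"pd_e"}, {"pd_e", "pd_v"}, set()])
```

The second occurrence is skipped because two conditions hold at once, so pd v is not marked valid from that observation alone.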
VI. EXPERIMENTATION
We applied the proposed approach to two types of open-source Verilog NoCs: 1) CONNECT [21] and 2) a short-path (SP) NoC deploying pipeline bypassing and a combined arbitration using series arbiters [22] , [23] . In this experiment, we targeted the NoCs with a 4-by-4 router topology. Based on the positions of flit-staying FFs, the arbiter's connection with other blocks, and the pipeline-bypassing method (only for the SP NoC), we identified the CA networks of CONNECT and SP NoCs, as shown in Fig. 17 .
In CONNECT, there are three types of BCs and two types of CCs. Each BC type represents 1) an input queue, 2) intermediate FFs in which a flit waits for an arbitration grant, and 3) an output queue. The BCs have their own id.d as 0, 1, and 3, and are denoted as BC 0 , BC 1 , and BC 3 , respectively. The BCs are located symmetrically on each port and VC in each router.
There are two types of CCs that handle path-connection (PC) requests from 1) BCs on different input ports or 2) BCs on different VCs at an output port of an adjacent router. The CCs have their own id.d as 2 and 4. Each CC 2 makes a PC between a BC 1 and a BC 3 , and CC 4 establishes a PC between a BC 3 and a BC 0 in an adjacent router.
There are two types of PV s for a CC 2 and a CC 4 . The PV for CC 4 has id.d as 0 and receives e af from BC 0 and BC 1 to monitor flit-in and -out movements in BC 0 . The PV for CC 2 has id.d as 2, and receives e af from BC 3 and BC 0 in an adjacent router to probe the flit-in and -out moves of BC 3 .
The SP NoC has four types of BCs and three types of CCs. Among BC types, one type of a BC represents a queue at each VC and other types of BCs symbolize VC-independent common flit-staying FFs (shared channels). The BCs for shared channels are located at input ports, the destination nodes of switch, and output ports. The BC for a queue has its id.d as 1 and the BCs for shared channels have their id.ds as 0, 4, and 5.
There are three types of CCs, which have their own id.d as 1, 2, and 3. The CC 1 is for a flit bypassing to reduce the latency without staying in a queue, and the CC 2 handles PC requests from BC 1 s and CC 1 at each input port. The CC 3 handles PC requests from CC 2 s at different input ports. There is one type of PV (whose id.d is 4) for a CC 3 to check the receiver-side PC viability. The PV manages the credit consumption and return by receiving the flit-inflow or flit-outflow events (e af s) from a BC 4 in the same router and BC 4 s in adjacent routers.
To generate rules for the NoC CA, we supplemented test statements for BCs in the RTL modules that are listed in Fig. 18(a) . Those modules have incoming and outflowing flit signals of BCs. The flit signals have the values of the flit type, valid, pid, fid, and vc. The information is passed to the test environment using the proposed test interfaces (which are add_bc_in, add_bc_out, and set_bc_size), as in the example of Fig. 18(b) .
The test statements for CCs are also appended to specify the CC's position and neighbor BCs' id using the add_cc interface. Unlike CONNECT, the SP NoC has consecutive CCs. The proposed test environment does not support the diagnosis of PD conditions of individual CC 1 , CC 2 , and CC 3 in consecutive CCs. Thus, we specified a merged CC at each input port for a common PD-validity table, as shown in Fig. 18(c) .
The CONNECT NoC follows on/off-based flow control considering the buffer availability, and the SP NoC uses credit-based flow control. Using the add_pv interface, we specified the RTL signals related to the fullness of the buffers at input and output ports on each VC. For the SP NoC test, we described the RTL signals for the credit-exhausted state of each VC, which are the full_ovc signals in the rtr_fc_state module at each output port.
Under various packet-generation scenarios, the test environment produces the overall decision rules of BCs and PD-validity tables of CCs, as described in Fig. 18(d) .
Compared to the CONNECT, the boundary condition, cond b (described in Sec. V-B), of the SP NoC is not found, which means that BC 0 s can receive a flit when the pre-condition (defined in Sec. III-C) is met. In the SP NoC, flits can flow across the PC-established CCs without any suspension because CCs always maintain their paths when flit-arrival BCs are available.
Using the common PD-validity table of the SP NoC, we manually derived the PD conditions of each CC based on the RTL implementation. The CC 1 follows a bypass mechanism: if the allocated VC of an incoming flit is different from the VC of a current PC, CC 1 requests a new PC to a next CC 2 . When the CC 2 already has a flit path or the requested VC is already asked by another BC 1 , the CC 2 ignores the CC 1 's new request. If the CC 1 does not receive a grant, the CC 1 makes a PC between a previous cell (BC 0 ) and a VC-related BC 1 . Since CC 2 and CC 3 represent the same RTL arbiter module, the PD conditions of CC 2 and CC 3 are identical, as shown in Fig. 18(d) .
The CC 2/3 employs custom rules for the pd p -caused disconnection. The CC 2/3 manages the PC requests from preceding cells using two priority groups. Each PC request in the high-priority group represents a reconnection request of a previously disconnected path caused by a credit emptiness or another higher-priority request. The PC request in the low-priority group is for a new packet transmission.
The priorities among requests in each group are determined using the target round-robin algorithm. If there are requests in both the high- and low-priority groups, then one request in each group is granted. If a PC request in the high-priority group is granted, then the grant in the low-priority group is ignored. When there are multiple requests in the high-priority group and one of them is granted, the priority of the newly granted request becomes the lowest in the target NoC, so another request in the high-priority group can be granted at the next cycle. For the next-time path re-decision, the CC 2/3 can generate e i for the next-time trigger.
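The two-group arbitration can be sketched as below. This is an assumed, generic round-robin formulation, not the arbiter of the target RTL: the function names, the use of requester indices, and the "last granted" pointers are hypothetical.

```python
def round_robin(requests, last_granted, n):
    """Grant the first requester after last_granted, wrapping around n ports."""
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if candidate in requests:
            return candidate
    return None

def arbitrate(high, low, last_high, last_low, n):
    """Pick one grant per group, then let the high-priority grant win."""
    grant_high = round_robin(high, last_high, n)
    grant_low = round_robin(low, last_low, n)
    return grant_high if grant_high is not None else grant_low

# Two reconnection requests (ports 1 and 3) and one new-packet request (port 0).
first = arbitrate({1, 3}, {0}, last_high=1, last_low=3, n=4)
# Port 3 was just granted, so it now has the lowest priority in its group.
second = arbitrate({1}, {0}, last_high=first, last_low=3, n=4)
third = arbitrate(set(), {0}, last_high=second, last_low=3, n=4)
```

The low-priority request on port 0 is served only in the third call, once the high-priority group is empty.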
Based on the BC's a f -decision rules, CC's PD-validity tables, and the CC's default condition table (in Fig. 10 ), we developed cells' δ and η functions using the target XY-routing algorithm, the target round-robin arbitration algorithm, and specified events (which are described in Fig. 18(e) ).
In the CONNECT CA, the CC 2 receives e pd from its next BC 3 after checking the pd l condition. The CC 4 receives e pd from its previous BC 3 for the pd e condition or from the PV 0 on the adjacent router for the pd v condition. Both PV 0 and PV 3 update the PC viability (s v ) based on the number of the next-time available flit slots (s c ).
In the SP CA, the CC 1 receives e pd from its previous BC 0 after checking the pd d condition. The CC 2 (or CC 3 ) receives e pd s from its BC 0 (for the pd e condition), BC 4 s (for the pd l condition), PV 4 s (for the pd v condition), or the adjacent CC 3 (or CC 2 ) (for the pd p condition). The PV 4 decides the s v based on the remaining credits, the flit inflow (the a f of an associated BC 4 in the same router) and the flit outflow (the a f of a BC 4 in an adjacent router), and a PC re-establishment situation of a suspended packet transmission.
For the validation of CA models, we extended the CA simulator to read the packet transmission results of the target RTL NoC and confirm the matches of the packet arrivals between the CA and RTL models at each simulation time step. Under various randomly generated scenarios, we verified the CA models of the CONNECT and SP NoCs using the runtime assertion method, checking the packet-arrival difference at runtime. An example of the packet-arrival equivalence is illustrated with the signal waveforms of the PGC-arrived flits, as shown in Fig. 19 .
For the simulation speed comparison, we measured the execution time of the Verilog and CA NoC models while increasing the number of injected packets per node and keeping the average flit-injection rate at 0.4 per cycle to prevent packet loss caused by traffic congestion. We used an experimental machine with an Intel® Xeon® E-2176 CPU (2.7 GHz) and 64 GB of memory. The RTL simulator used for the test-library development and the speedup experiment is ModelSim 10.6d. The equivalent cell components and the proposed simulation engine are developed in C++.
The overall execution times of the RTL and CA NoC models are shown in Fig. 20(a)(b) . Based on the execution times, we obtained speedups from 14.9× to 53.2× for the CONNECT NoC and from 78.4× to 235× for the SP NoC, as shown in Fig. 20(c) . Based on the observed data, we derived linear-regression equations for the execution-time prediction regarding the increased number of transmission packets by 200 (tx 200 ), as shown in Table 2 . In the experiments, the tx 200 values are 1, 2, 3, 4 and 5.
The regression equations can be divided into a time constant for the simulation initialization and the rate of the execution-time increment regarding tx 200 . After comparing the constants and slope coefficients between the RTL and CA regression equations, we acquired the initialization and runtime speedup, as shown in Fig. 21(a)(b) .
In the equations of CA, the increase in the slope coefficients of tx 200 for the VC increment is almost negligible, which means that only active cells of the proposed CA simulator are asynchronously computed regardless of the size of the CA network. The asynchronous cell computation is accomplished by the exchange of state-change events between cells, and this supports scalable NoC simulation.
The initialization speedups are closely linked to the reduction in the number of model components from RTL blocks to CA cells. The runtime speedups are related to the reduction in the number of state-change events from RTL signal-value changes to cell-state changes. As the tx 200 becomes smaller, the initialization speedup is more dominant than the runtime speedup. If the tx 200 goes to infinity, the overall speedup converges to the runtime speedup. Based on this relationship, the overall speedups (shown in Fig. 20(c) ) can be explained.
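The relationship between the overall, initialization, and runtime speedups can be written out explicitly. The symbols below are introduced only for this note and are not the notation of Table 2: c denotes the regression constant, k the slope coefficient of tx 200 , and the subscripts R and C denote the RTL and CA models.

```latex
\begin{align}
T_{R}(tx_{200}) &= c_{R} + k_{R}\, tx_{200}, \qquad
T_{C}(tx_{200}) = c_{C} + k_{C}\, tx_{200}, \\
S(tx_{200}) &= \frac{T_{R}(tx_{200})}{T_{C}(tx_{200})}
             = \frac{c_{R} + k_{R}\, tx_{200}}{c_{C} + k_{C}\, tx_{200}}, \\
\lim_{tx_{200} \to 0} S(tx_{200}) &= \frac{c_{R}}{c_{C}}
  \quad \text{(initialization speedup)}, \\
\lim_{tx_{200} \to \infty} S(tx_{200}) &= \frac{k_{R}}{k_{C}}
  \quad \text{(runtime speedup)}.
\end{align}
```

Under this linear model, the overall speedup is a ratio of two linear functions, so it moves monotonically from the initialization speedup toward the runtime speedup as tx 200 grows.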
VII. CONCLUSION
We proposed a high-level NoC modeling framework and its simulation engine based on event-driven CA execution to abstract the detailed RTL operations and invoke active cells, rather than executing stable CA. The proposed CA represent the operations of the target RTL NoC as state transitions of three types of cells: BC (for the flit fetching and staying), CC (for the PC change), and PV (for the PC viability of receiver-side BC). Each cell in the CA notifies its changed state to the state-dependent neighbors through the event delivery, and the events are generated during the influence function execution. To alleviate the CA modeling efforts, we proposed a test environment that generates the BCs' decision rules and the CCs' PD conditions. Our evaluation showed various CA speedups (up to 235 times), varying the number of injected packets and VC with valid cycle-accurate transmission results.
