Abstract. Network-on-Chip (NoC) architecture has been proposed to solve the global communication problem of complex Systems-on-Chip (SoCs). However, NoC testing remains a major challenge. In this article, we propose a novel test architecture for NoC router testing. The proposed test architecture combines the advantages of the Software-Based Self-Testing (SBST) and Built-In Self-Testing (BIST) methodologies. In this methodology, we propose custom test instructions that fit the ISA of the NoC processors. These custom instructions are responsible for applying test patterns and collecting their responses. In the proposed approach, the processor cores manage the whole test operation of their corresponding routers. There is therefore no need for expensive Automated Test Equipment (ATE) to access the internal circuits of the NoC, the approach provides efficient at-speed testing, and no packets need to be transmitted between NoC nodes to test the communication blocks or the router. As a case study, the Heracles architecture is used, and the experimental results show the efficiency of the proposed test methodology over a functional test strategy in terms of test time and fault coverage. With only 7.2% hardware overhead in the router circuit, the proposed architecture achieves a 74% decrease in test time and a 4% increase in fault coverage.
Introduction
Unlimited market demand for new technologies has driven a remarkable evolution of integration capacity as gate dimensions continue to scale down. Thus, many cores can be integrated on a single die, making it possible for systems-on-chip to improve performance [1] [2] [3]. NoC has been proposed to solve the global communication problem of complex SoCs. As a high-performance and scalable communication mechanism, NoC is being increasingly investigated by researchers and designers to address the issues of interconnect complexity in Multiprocessor Systems-on-Chip (MPSoCs) [4]. A network on chip consists of routers, links, network interfaces (NIs), and cores, and can be defined as a set of structured routers and point-to-point channels interconnecting IP cores (see Figure 1).
Figure 1: A typical structure of an NoC
It is necessary that SoC designers consider a test method for their new SoC architectures. Like other SoCs, an NoC has to be tested for manufacturing defects. Thus, many studies have been done to deal with NoC test.
Some studies model various potential defects of a typical NoC at a high level and propose solutions to deal with them. For example, in [5] Karimi et al. proposed a system-level fault model based on the generic properties of NoC switch functionality. In that research, various forms of functional switch faults were considered, including dropped and corrupted data faults, direction faults, and faults resulting in multiple copies of packets in time and space. For each of these faults they proposed an error detection and diagnosis method.
Although high-level analysis of complicated SoCs shows the global functionality of NoCs in the presence of different potential defects, modeling at lower levels is more realistic. Considering the implementation of an NoC at lower levels, the controllability and observability of NoC blocks are relatively low, since the blocks are deeply embedded and spread across the chip [6]. Thus, some works [6] [7] [8] [9] used a Built-In Self-Testing (BIST) architecture to overcome this problem. BIST-based methodologies have a high chip area overhead, which causes performance degradation and excessive power dissipation. On the other hand, other works distributed the test structure of the NoC so that it cooperates with the communication architecture and decreases the hardware overhead. They inserted test-managing components into the network, and the basic strategy is to transmit test patterns to the circuits under test through already tested components [10]. This approach consists of two phases. The first phase tests the communication fabric: each router (switch) can be regarded as an IP core, and the same test patterns can be applied to all the routers. The second phase reuses the communication fabric as the test access mechanism to the resources and transfers test data packets through a particular test input. For example, in [11] one switch is chosen to be connected to the source of test data (named the Test Access Switch, TAS) and all the test sets are broadcast from the TAS. This strategy is based on broadcasting the switches' test patterns through the on-chip network and detecting faults by comparing the switches' responses. In [12] Sedghi et al. applied two TASs, one located at the lower left of the network and the other at the upper right. An advantage in test time was shown, but the location of the TASs cannot change and a specific routing algorithm is required. In addition, the parallelism of test data transfer needs to be considered [13].
Communication can be divided into unicast (point-to-point packet transfer) and multicast (one-to-many packet transfer). Of the two, multicast can efficiently improve the parallelism of data transmission, so how to apply the multicast transfer mode to testing is worth studying. Fang et al. [13] proposed a multicast path testing method which modifies the multicast communication protocol for test and improves the testing parallelism based on virtual channels. In [14] Zhang et al. proposed configurable TASs combined with the multicast transfer mode. This approach is highly adaptable for parallel testing of routers and resources.
In this paper, we propose a novel test architecture for NoC testing. In this test architecture, we use the processing capability of the NoC processors to manage the whole testing process. Using the processor itself to manage the whole test operation was first presented for embedded systems with a single processor and is known as SBST [15] [16] [17] [18] [19]. SBST is nonintrusive in nature, because it utilizes the processor's own resources and instructions to execute self-testing without any hardware overhead. SBST can potentially provide sufficient testing quality without impact on performance, area, or power consumption during normal operation. With the emergence of complex embedded systems with multiple processors, some works tried to reduce test time using the same SBST approach by employing different techniques of core-level parallelism [20, 21]. These SBST methods deal only with the faults associated with internal blocks of the processor (for example, the register file and the ALU). In this work, in contrast, we have modified the NoC architecture in such a way as to test the communication blocks (the router).
Our proposed hybrid SW/HW test architecture combines the advantages of the BIST and SBST methodologies for efficient testing of routers. The idea behind our proposed methodology relies on the fact that each core of the NoC can act as a message generating center which generates the test packets required for router testing. This paper is organized as follows. Section 2 discusses the proposed methodology. In Section 3, the proposed methodology is applied to our studied 2D-mesh NoC. In Section 4, experimental results on the selected case study are reported. Finally, the conclusion section summarizes our work.
Proposed Test Architecture
In the proposed test architecture, the main goal is testing NoC routers. This architecture takes advantage of both the BIST and SBST methodologies simultaneously. From the SBST point of view, the capability of the processor cores is employed for managing the testing operation; from the BIST point of view, built-in hardware is added to execute the test process. Thus, we can look at the proposed architecture from two different viewpoints (i.e. test software and test hardware). The test software is responsible for managing the test process, while the test hardware executes it.
In this article, we focus on 2D-mesh NoC topology. 2-D mesh topology is one of the most practical and extensively used network topologies [22] [23] [24] . Figure 2 shows a typical 2D-mesh NoC topology including our proposed test architecture.
This test architecture can be considered from two different viewpoints (test software and test hardware), which are explained in the following subsections. A special custom test instruction is added that can directly switch the NoC into test mode and generate the required test packets. After all test packets have been sent, it is necessary to return the test responses for fault analysis. Thus, we have also added other custom test instructions for this task.
Test Hardware. As shown in Figure 2, built-in hardware is added to the NoC architecture to make it more testable. The test hardware is composed of the following components:
• Instruction-controlled test packet generator
This module generates the router's test patterns and is controlled by the newly added custom test instruction.
• Signature generator
This module is responsible for compacting the register chain data and generating its signature.
• Instruction-controlled test response loader
This module returns the generated signature of the responses to the processor for fault analysis.
• MUXs
MUXs are used to switch the circuit between normal and test mode.
Test Process. In the previous section, we reviewed the big picture of our proposed test architecture for router testing. This architecture tests the router circuit using deterministic test packets at the instruction level. These deterministic test packets are generated based on prior knowledge of the internal structure of the router. • Crossbar Arbiter is the most important component of an NoC among the mentioned blocks, since it is at the center of the main pathway and does the main task of the router. In fact, the router is the milestone of an NoC and the arbiter is the heart of the router.
The proposed test pattern generation technique avoids two alternatives: generating test patterns for the entire router circuit, which results in low fault coverage, and generating test patterns for the different components individually, which leads to high hardware overhead. Instead, optimized test patterns are first generated for the arbiter and are then mapped into applicable test packets to test the entire router circuit.
Our proposed methodology for router testing process is done in three steps as shown in Figure 4 .
In the following subsections these steps are illustrated briefly.
Test patterns generation. Due to the complexity of arbiters, testing them with random test sets or with traditional ATPG tools is not efficient enough. To obtain an optimized test set, we propose a genetic-based algorithm. Although we have used this algorithm for the arbiter, it can be employed for any circuit, and it requires no knowledge of the internal structure of the circuit except its inputs and outputs. The main I/Os of a typical arbiter circuit are shown in Figure 5. The proposed algorithm for generating an optimized test set for the arbiter consists of 4 steps:
Assume k containers, each initialized with n random m-bit test patterns, where m is the number of input bits required by the arbiter and n is the number of test patterns per container.
(1) Sort containers. Compute the fault coverage of the CUT for each container and sort the containers according to their fault coverage.
(2) Copy the container with the best fault coverage to the remaining containers. Note that 10% of the containers always receive random values, to keep the test set from being trapped in a local optimum.
(3) Generate a new population of containers. In our proposed algorithm, new populations are generated by combining three different operations:
• Select random values for row and column and change the corresponding bit value.
• Select two random rows and interchange their values.
• Select a random row and fill it with a random value.
Note that each of these population generation techniques is applied n/3 times while generating the new population for each container.
(4) Repeat steps 1 to 3 until the fault coverage variation remains less than the threshold value.
It is worth noting that the arbiter has some constraints on its inputs which depend on the NoC architecture as well as its routing algorithm. These constraints have to be taken into account during the generation of test patterns. We investigate them in more detail in the case study section.
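The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: `toy_coverage` is a hypothetical stand-in for fault simulation of the CUT, the names `optimize`, `mutate`, `patience`, and `max_iters` are illustrative, the best container is kept unmodified between generations, and the loop stops only after several consecutive generations with an improvement below the threshold.

```python
import random


def toy_coverage(patterns, m):
    """Hypothetical stand-in for fault simulation: toy fault (i, j) counts as
    detected when some pattern has bit i set and bit j cleared."""
    detected = set()
    for p in patterns:
        for i in range(m):
            for j in range(m):
                if i != j and (p >> i) & 1 and not (p >> j) & 1:
                    detected.add((i, j))
    return len(detected) / (m * (m - 1))


def random_container(n, m, rng):
    """A container is a list of n random m-bit test patterns (rows)."""
    return [rng.getrandbits(m) for _ in range(n)]


def mutate(container, m, rng):
    """Apply each of the three operations n/3 times (at least once)."""
    n = len(container)
    for _ in range(max(1, n // 3)):
        # Flip the bit at a random row and column.
        container[rng.randrange(n)] ^= 1 << rng.randrange(m)
        # Interchange two random rows.
        i, j = rng.randrange(n), rng.randrange(n)
        container[i], container[j] = container[j], container[i]
        # Fill a random row with a random value.
        container[rng.randrange(n)] = rng.getrandbits(m)


def optimize(coverage, k, n, m, threshold=1e-3, patience=5, max_iters=200, seed=0):
    rng = random.Random(seed)
    containers = [random_container(n, m, rng) for _ in range(k)]
    previous, stall, best_cov = None, 0, 0.0
    for _ in range(max_iters):
        # Step 1: evaluate every container and sort by fault coverage.
        containers.sort(key=coverage, reverse=True)
        best_cov = coverage(containers[0])
        # Step 4: stop once the coverage variation stays below the threshold.
        stall = stall + 1 if previous is not None and best_cov - previous < threshold else 0
        if stall >= patience:
            break
        previous = best_cov
        # Steps 2 and 3: copy the best container, keep 10% random, then mutate.
        n_random = max(1, k // 10)
        for idx in range(1, k):
            if idx >= k - n_random:
                containers[idx] = random_container(n, m, rng)
            else:
                containers[idx] = list(containers[0])
                mutate(containers[idx], m, rng)
    return containers[0], best_cov
```

For example, `optimize(lambda c: toy_coverage(c, 8), k=20, n=4, m=8)` returns the best container found and its toy coverage; in a real flow the lambda would be replaced by a call into the fault simulator.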
When the optimized test set is ready for the arbiter component, it should be mapped into applicable test packets for the entire router. The mapping mechanism depends on the NoC architecture and is explained in the next section.
Test Packets generation. The test patterns generated in the previous sections must be delivered to the input ports of the router components. For this purpose, test packets that carry test data are generated. Through the message passing process of these packets, the corresponding router components are tested. The general structure of a message in the NoC architecture is as follows:
Figure 6: General structure of message in NoC
Each message consists of one or more packets, and every packet is composed of individual flits. The information about the destination and source cores, the data values, and the control signals is placed in these flits. Additionally, each flit includes one or more phits. The phit is the smallest unit of a message that can be transmitted in one clock cycle. Figure 6 shows the general structure of a message in an NoC.
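The message hierarchy described above can be written down as a small data structure. The sketch below is only an illustration: the class and field names, the flit widths, and the least-significant-phit-first ordering are assumptions and are not taken from the studied NoC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str      # "head", "body" or "tail" (assumed labels)
    payload: int   # flit contents as an unsigned integer
    width: int     # flit width in bits (assumed parameter)

    def phits(self, phit_width: int) -> List[int]:
        """Split the flit into phits, least significant phit first."""
        count = -(-self.width // phit_width)  # ceiling division
        mask = (1 << phit_width) - 1
        return [(self.payload >> (i * phit_width)) & mask for i in range(count)]


@dataclass
class Packet:
    flits: List[Flit]


@dataclass
class Message:
    packets: List[Packet]

    def transfer_cycles(self, phit_width: int) -> int:
        """One phit is transmitted per clock cycle."""
        return sum(len(f.phits(phit_width)) for p in self.packets for f in p.flits)
```

A 16-bit flit carried over an 8-bit link therefore takes two cycles, and a one-packet message with three such flits takes six.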
In order to obtain better results, the message passing flow inside the router should be considered during test packet generation. Considering this process from beginning to end, an incoming message passes through the steps shown in Figure 7 to reach the output ports.
Test application. After generating the test packets using the proposed methodology, we have to apply them to the entire router and analyze the fault responses. The test architecture presented earlier completely satisfies this requirement. In this architecture, we have introduced two custom test instruction formats for router testing: the first applies test packets and the second collects the test responses. Given the values of the different fields in the test message, mapping these values into a custom instruction is straightforward, as illustrated on our case study in the next section.
Case Study
In order to show the efficiency of the proposed methodology, a 2D-mesh NoC architecture is studied. The next subsection introduces the chosen case study and the steps for applying the proposed methodology to it.
NoC Architecture. This study uses the Heracles toolchain (a configurable NoC) to apply the proposed test architecture. Heracles is completely configurable for different NoC topologies. The NoC architecture specification in this case study is as follows.
• 32-bit MIPS processor cores
• 2D-mesh topology
• One packet in every message
• Buffered router input ports with two virtual channels for each buffer port
• Bufferless router output ports
• DOR XY as the routing algorithm
As shown in Figure 1, in a 2D-mesh NoC each core is connected to its 4 surrounding cores. The communication between the cores is done through the routers. Thus, the building blocks of a typical router in a 2D-mesh topology are as shown in the diagram in Figure 8.
2D-Mesh Heracles Router Testing. In this section, the application of the proposed method to router testing of the 2D-mesh Heracles NoC is illustrated.
Test patterns generation. As mentioned in the previous section, the proposed optimization technique requires no knowledge of the internal structure of the circuit and needs only I/O configuration.
Communication between the cores in the NoC is based on the routing algorithm. The routing algorithm directly affects the inputs of the router and its internal components. Thus, we have to consider the constraints on the arbiter input space from the software viewpoint. In the XY routing algorithm of our case study, a packet is first routed in the X direction and then in the Y direction, which makes it impossible for a packet that arrives on the N or S port to be routed to the W or E port. Under this constraint, the total number of arbiter test inputs that may appear in the normal functional mode of the NoC is 1350 in the 2D-mesh topology; the remaining input combinations are invalid and never occur in normal operation of the NoC. Thus, the faults associated with the remaining input combinations can be removed from the fault list.
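The turn restriction imposed by XY routing can be checked with a few lines of Python. This is a sketch under assumptions: coordinates are written as (column, row) with rows growing towards S, the function names are illustrative, and the code covers only the N/S to W/E restriction stated above; it does not attempt to reproduce the count of valid arbiter inputs.

```python
STEP = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}


def xy_output_port(current, destination):
    """Dimension-ordered XY routing: resolve the X offset first, then Y."""
    (cx, cy), (dx, dy) = current, destination
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "S"
    if dy < cy:
        return "N"
    return "LC"  # the packet has arrived and leaves through the local core port


def is_valid_turn(input_port, output_port):
    """A packet entering on N or S is already moving in Y and cannot turn to W or E."""
    return not (input_port in ("N", "S") and output_port in ("W", "E"))


def output_ports_along_path(source, destination):
    """Output port requested at every router on the way to the destination."""
    current, ports = source, []
    while True:
        port = xy_output_port(current, destination)
        ports.append(port)
        if port == "LC":
            return ports
        current = (current[0] + STEP[port][0], current[1] + STEP[port][1])
```

For a packet travelling from (0, 0) to (2, 3), the requested ports are E, E, S, S, S, LC: once the packet starts moving in Y, no W or E request appears again.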
As mentioned before, we have to generate test patterns for router components and then generate test packets based on these test patterns. For optimizing this test pattern generation, we propose a method which employs genetic algorithm.
The proposed method is applicable to any circuit (either combinational or sequential). To generate optimized test patterns, we first assume k containers, each of which contains n random m-bit test patterns (see Figure 9). Then we apply the test patterns of each container and select the best container as the parent for generating the next generation. We iterate these steps until the desired fault coverage is obtained.
We have examined our methodology with different values of k and n, as shown in Figure 10. Experimentally, the best coverage with the minimum number of test patterns is obtained with k = 20, n = 50.
After generating the optimized test patterns for the arbiter, the next step maps these patterns to applicable test packets.
Test packet generation. In this section, we aim to map the test patterns generated for the arbiter into suitable test messages (packets). To achieve this, we should analyze the format of the messages and determine the best values for each message.
Given the assumptions stated in the previous section, each message in our studied NoC architecture has the form shown in Figure 11, where (Src_row, Src_col) gives the row and column of the source core and (Des_row, Des_col) gives the row and column of the destination core. Flit type determines the type of the flit (header, body, or tail), and the VC bits indicate the virtual channel of the input buffer port in which the flit is stored.
The most crucial part of the message, the one that most affects the decision made by the router, is the destination core coordinate value. The remaining values can simply be replaced with 0 or 1. The destination core coordinate can be computed directly from the arbiter's test patterns.
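One way to picture the mapping from an arbiter test pattern to a destination coordinate is sketched below. It assumes that the pattern has already been decoded into the output port requested at a given router, that coordinates are (column, row) with rows growing towards S, and that the neighbouring node in the requested direction is chosen as the destination; the function name and the 4 × 4 default size are illustrative.

```python
OFFSETS = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1), "LC": (0, 0)}


def destination_for(router, requested_port, size=4):
    """Pick a destination that makes XY routing request `requested_port`.

    Returns None when the router sits on the mesh edge and has no
    neighbour in that direction.
    """
    x, y = router
    dx, dy = OFFSETS[requested_port]
    destination = (x + dx, y + dy)
    if not (0 <= destination[0] < size and 0 <= destination[1] < size):
        return None
    return destination
```

For the router at (1, 1), a request for the E port maps to destination (2, 1), and a request for LC maps to the router's own coordinate.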
Test application. In the previous sections, following our proposed test architecture, we prepared the hardware required for generating and applying test packets to the router circuit. In this section we explain the software aspect of our method, which manages the whole test process. We have introduced two kinds of custom instructions for this purpose. One instruction is used to apply test packets (the test-apply instruction) and two other instructions are used to collect the test responses (the test-gather instructions).
Figure 12: Test-apply instructions format
These custom instructions are defined according to the ISA of our case study, in which three formats are available (R-type, I-type, and J-type). In all instructions of the ISA, the first 6 bits are used as the opcode and the remaining bits depend on the instruction format. Thus, we also use the first 6 bits of the custom instructions as the opcode; the remaining bits are explained in the following.
(1) Test-apply instruction format
As the first step in designing the custom test instructions of the proposed methodology, the length of the instruction must be determined. This means that we have to investigate the NoC router architecture and obtain the number of bits required for test packet generation.
At most 5 input messages can enter the router inputs simultaneously. Thus, we have to generate these 5 messages with a single test instruction. For this reason, the test-apply instruction must be capable of determining:
• Which input port is valid to be stored in the virtual channels?
• Which virtual channel is used to store message?
• Which output port is requested by the message?
As mentioned earlier, in the 2D-mesh topology the numbers of bits required for the valid input ports, the virtual channel select signal, and the requested output ports are 5, 1, and 15 respectively. The requested output ports take 15 bits because at least 3 bits are needed to represent each output port (North-N, South-S, West-W, East-E, and Local Core-LC). Thus, only 28 bits are required to generate the test packets, and this number does not depend on the size of the NoC, as shown in Figure 12.
(2) Test-gather instructions format
After applying test packets using test-apply instructions, it is necessary to compare the test program response for fault analysis. To obtain the test response, we have proposed a simple two-stage accumulator-based signature generator (SG) circuit. In the first stage, the router outputs to the neighbor routers (W, E, N, S, LC) are compacted into a vector one fifth of the initial size, simply by adding these outputs. In the second stage, an accumulator is used to generate a signature of the responses during execution of the test program. Figure 13 shows the building blocks of the proposed SG.
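The test-apply fields listed above (valid input ports, virtual channel select, and requested output ports) can be packed into a 32-bit instruction word as in the following sketch. Only the 6-bit opcode in the top bits and the field widths 5, 1, and 15 come from the text; the opcode value, the field order, the bit positions, and the 3-bit port codes are assumptions made for illustration, and the actual layout is the one given in Figure 12.

```python
PORTS = ("N", "S", "W", "E", "LC")
PORT_CODE = {name: code for code, name in enumerate(PORTS)}  # assumed 3-bit codes
TEST_APPLY_OPCODE = 0b111111  # hypothetical opcode value


def encode_test_apply(valid_inputs, vc_select, requests):
    """Pack the three fields below a 6-bit opcode (assumed bit positions)."""
    valid_bits = 0
    request_bits = 0
    for index, port in enumerate(PORTS):
        if port in valid_inputs:
            valid_bits |= 1 << index                                   # 1 bit per input port
            request_bits |= PORT_CODE[requests[port]] << (3 * index)   # 3 bits per input port
    return (TEST_APPLY_OPCODE << 26) | (valid_bits << 21) | (vc_select << 20) | (request_bits << 5)


def decode_test_apply(word):
    """Inverse of encode_test_apply for the same assumed layout."""
    valid_bits = (word >> 21) & 0x1F
    vc_select = (word >> 20) & 0x1
    request_bits = (word >> 5) & 0x7FFF
    valid_inputs = {p for i, p in enumerate(PORTS) if (valid_bits >> i) & 1}
    requests = {p: PORTS[(request_bits >> (3 * i)) & 0x7]
                for i, p in enumerate(PORTS) if p in valid_inputs}
    return valid_inputs, vc_select, requests
```

For instance, encoding valid inputs W and LC on virtual channel 1, with W requesting E and LC requesting N, and then decoding the word returns the same three fields.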
After generating the signature of the test program response, we need to move the generated signature back into the register file. This task is done using two other custom test instructions, named test-gather instructions.
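The two-stage signature generator described above can be modelled behaviourally as follows. This is a sketch, not the circuit of Figure 13: the `width` parameter, the wrap-around (modular) addition in both stages, and the class and method names are assumptions.

```python
class SignatureGenerator:
    """Behavioural model of a two-stage accumulator-based compactor."""

    def __init__(self, width):
        self.width = width            # bits per router output (assumed parameter)
        self.mask = (1 << width) - 1
        self.signature = 0

    def compact(self, outputs):
        """Stage 1: add the five port outputs into one vector of the same width."""
        return sum(outputs) & self.mask

    def clock(self, outputs):
        """Stage 2: accumulate the compacted vector into the running signature."""
        self.signature = (self.signature + self.compact(outputs)) & self.mask
        return self.signature
```

Clocking the model once per test packet yields the final signature, which the test program then compares against the fault-free value; a fault that changes any router output in a way that survives the additions changes the signature.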
Under our assumptions for the studied NoC, the size of the signature after compaction is still greater than 32 bits. It therefore cannot be moved into a single 32-bit register, so we use 2 separate instructions to move the signature register into the register file. One instruction moves the higher 32 bits (named HI) and another 
Experimental results
This section presents the experiments carried out to validate the proposed test method in terms of test time and fault coverage.
As described in the case study section, Heracles is used in all of our experiments. Heracles is an open-source, functional, parameterized, synthesizable multi-core system toolkit. Such a multi/many-core design platform is a powerful and versatile research and teaching tool for architectural NoC exploration. It is designed with a high degree of modularity to support fast exploration of future multi-core processors with different topologies, routing schemes, processing elements (cores), and memory system organizations. Its hardware modules are implemented in synthesizable Verilog, where each core is a 32-bit 7-stage pipelined MIPS processor. The Heracles tool is freely available under the open-source MIT license [25]. We configured Heracles to meet the assumptions stated in the previous section.
In all of our experiments, we have considered the single stuck-at fault model for router testing. In each experiment, the RTL Verilog model of the router is first divided into its components. Then, each component is synthesized into its gate-level model for use in the fault simulation process. The gate counts of these components in a typical 4 × 4 2D-mesh NoC are shown in Table 1. The Mentor Graphics suite and Xilinx ISE are used for fault simulation and Verilog synthesis respectively.
According to the proposed test architecture shown in Figure 2, we have added some components (test hardware) to the router circuit that execute the test operation. These components and their gate counts are shown in Table 2. Although some modifications are made to the instruction decoder and datapath of each core's processor to decode the test instructions and prepare the required control signals in test mode, these modifications are negligible compared with the processor circuit.
Comparing the gate count of the original router with that of our added test hardware indicates about 7.2% hardware overhead of the modified router circuit with respect to the original one.
Gate counts by gate type (four component columns followed by the row total; the SUM row gives the column totals):
AND     10    452    55    52    569
NAND     0      0     0     0      0
OR       5    395    28     0    428
NOR      0      0     2     0      2
XOR      0    415     5     0    420
XNOR     0      0     0     0      0
NOT      5     42    55     0    102
BUF     20    254   203   105    582
DFF      0     42     0     0     42
SUM     40   1600   348   157   2145
Finally, we have developed a test program for testing the router according to the proposed test methodology. As discussed in the previous section, the test program is mainly composed of test-apply and test-gather instructions (test software). To evaluate the proposed test architecture in terms of single stuck-at (SA) fault coverage, we have used a simple serial fault simulation process: first, a single SA fault is injected into one of the sub-components, and then the whole test program is executed to determine whether the injected fault is detected by the test program. This process is repeated for all faults in order to obtain the fault coverage for each router component. All fault simulations are done in ModelSim using Verilog.
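The serial fault simulation loop described above can be illustrated on a toy circuit. The three-gate netlist below is a hypothetical example, not a router component, and the Python model merely stands in for the ModelSim flow: each single stuck-at fault is injected in turn, the whole pattern set is applied, and the fault counts as detected if any output differs from the fault-free response.

```python
from itertools import product

INPUTS = ("a", "b", "c")
# Hypothetical toy netlist in topological order: y = (a AND b) OR (NOT c).
NETLIST = [("n1", "AND", ("a", "b")), ("n2", "NOT", ("c",)), ("y", "OR", ("n1", "n2"))]


def simulate(pattern, fault=None):
    """Evaluate the netlist; `fault` is (net, stuck_value) or None."""
    values = {}

    def drive(net, value):
        # A stuck-at fault overrides whatever value the net would carry.
        if fault is not None and fault[0] == net:
            value = fault[1]
        values[net] = value

    for net in INPUTS:
        drive(net, pattern[net])
    for out, gate, ins in NETLIST:
        operands = [values[i] for i in ins]
        if gate == "AND":
            result = int(all(operands))
        elif gate == "OR":
            result = int(any(operands))
        else:  # NOT
            result = 1 - operands[0]
        drive(out, result)
    return values["y"]


def fault_coverage(patterns):
    """Serial fault simulation: one injected stuck-at fault per run."""
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    faults = [(net, stuck) for net in nets for stuck in (0, 1)]
    detected = sum(
        any(simulate(p) != simulate(p, f) for p in patterns) for f in faults
    )
    return detected / len(faults)
```

With all eight input patterns every one of the 12 toy faults is detected, whereas the single pattern a = b = c = 1 detects only 4 of them, which is the kind of gap that the optimized test set is meant to close with few patterns.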
Although the proposed test architecture is relatively independent of the size of the NoC, our experiments are done on a 4 × 4 2D-mesh topology. The fault coverage results of the router's components, together with their fault counts, are shown in Table 3. Note that the fault count is the number of faults after fault collapsing [26].
In order to analyze the effectiveness of the proposed methodology, we have also compared it with a functional test method using an ATPG tool. The router under test is synthesized to gate level, and a non-commercial ATPG tool [27] is then used to generate the test patterns. In this experiment the router's inputs are assumed to be controllable and the router's outputs are assumed to be observable by the ATE. Table 4 shows the results of the ATPG tool and of our proposed test architecture for the 4 × 4 2D-mesh topology. The obtained test results are compared with the results of functional test using an ATPG reported in [28].
Conclusion
We have proposed a test methodology for router testing in NoC architectures. Our test architecture combines the advantages of the SBST and BIST methodologies, with custom test instructions proposed to apply test patterns and collect the test responses. The proposed test architecture for router testing has the following advantages:
• Uses the processing capability of the cores in test mode (SBST).
• Performs the whole test process inside the chip (BIST) with acceptable added hardware.
• Generates optimized, genetic-based deterministic test patterns for the router, taking the internal blocks of the router into account during the test pattern generation process.
• Reaches high fault coverage with low test time.
• Requires no packet transmission between NoC nodes to test the router.
Our experiments are done on a real NoC hardware platform developed at MIT. Experimental results show that the proposed architecture for router testing, with only 7.2% hardware overhead, improves fault coverage by about 4% while reducing test time and test pattern size by 74% and 81% respectively compared to the ATPG tool.
