In optical Network-on-Chip (NoC), packet switching remains popular owing to its scalability and reliability. Since no mature optical buffering technology exists, deflection routing is preferred for resolving output port contention, and reducing the deflection probability is essential in a bufferless deflection-routed optical NoC. This letter proposes a new 5×5 router architecture, in particular a deflection-supported switching fabric. In addition, an ejection unit and an injection unit are designed to reduce deflections, and priority-based routing computation and port allocation algorithms are designed for the new switching fabric. Simulation results show that our proposal improves performance at an acceptable insertion loss.
Introduction
The arrival of the multiprocessor-on-chip era is a corollary of Moore's Law: as the transistor budget of a chip keeps doubling, the number of cores integrated on it keeps growing. While this trend addresses several problems, it also brings new challenges. As the number of processing cores increases, electrical interconnects will severely limit NoC performance in terms of bandwidth, latency, and power consumption. Combined with the Wavelength Division Multiplexing (WDM) technique and on-chip optical devices such as micro-ring resonators, waveguides, and photodetectors, silicon optical NoC offers an opportunity for high-bandwidth, low-latency, and low-power on-chip interconnection.
Circuit switching, on which existing optical NoCs rely, has relatively low resource utilization, which limits how much their performance can be improved. Owing to its flexibility, scalability, and high throughput, packet switching therefore remains popular in optical NoC. Because optical buffers are costly, bufferless deflection routing is preferred for resolving output contention in packet-switched optical NoC [1]. Under deflection routing, output ports are allocated according to packet priorities. In general, the priority of a packet is determined by its age, distance, number of deflections, Quality-of-Service, or other parameters [2]. High-priority packets obtain their desired output ports in the contention, while the remaining packets are sent to the other unoccupied output ports.
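The generic priority-based allocation described above can be sketched in Python as follows. This is a minimal illustration, not the allocator of any particular router: the `Packet` fields and the rule that each losing packet takes the lowest-numbered leftover port are assumptions. It relies on the usual property of deflection routers that a router never receives more packets than it has output ports, so a leftover port always exists.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    pid: str        # packet identifier (illustrative)
    priority: int   # larger value wins the contention
    desired: int    # requested output port


def allocate(packets: List[Packet], free_ports: List[int]) -> Dict[str, int]:
    """Grant desired ports in priority order; deflect losers to leftover ports."""
    free = list(free_ports)
    grants: Dict[str, int] = {}
    losers: List[Packet] = []
    for pkt in sorted(packets, key=lambda p: p.priority, reverse=True):
        if pkt.desired in free:
            free.remove(pkt.desired)
            grants[pkt.pid] = pkt.desired
        else:
            losers.append(pkt)
    # Bufferless: every losing packet must still leave on some unoccupied port.
    for pkt in losers:
        grants[pkt.pid] = free.pop(0)
    return grants
```

For example, with packets `a` (priority 1, wants port 0), `b` (priority 3, wants port 0), and `c` (priority 2, wants port 1), `b` and `c` get their desired ports and `a` is deflected to port 2.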
The maximum insertion loss is the maximum attenuation experienced by an optical signal, and it must be considered when designing the optical output power of the lasers and the sensitivity of the optical receivers [3]. The insertion loss of a signal is the sum of the losses it experiences in each router along its path, so the maximum insertion loss grows with both the maximum path length and the per-router loss. Therefore, reducing the deflection probability, and thereby the maximum path length, is important for a bufferless optical NoC using deflection routing.
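The relationship can be written out as follows, using notation introduced here only for illustration: $L_r(h)$ is the loss (in dB) in the router at hop $h$ of a path $P$, $|P|$ is the number of hops of $P$, and $H_{\max}$ is the maximum path length in hops.

```latex
IL(P) = \sum_{h=1}^{|P|} L_r(h), \qquad
IL_{\max} = \max_{P} IL(P) \le H_{\max} \cdot \max_{h} L_r(h)
```

Each deflection adds a hop to $|P|$, so fewer deflections lower $H_{\max}$ and with it the bound on $IL_{\max}$.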
This letter designs a bufferless deflection optical router architecture. An ejection unit and an injection unit are added to reduce deflections, and a deflection-supported optical switching fabric using parallel switching elements is designed. For the new switching fabric, this letter proposes priority-based routing computation and port allocation algorithms that reduce the number of ON-state resonators in each cycle.
Optical router architecture
Network performance, area, power consumption, maximum insertion loss, and crosstalk noise are the basic concerns in designing optical NoC routers. For the widely used mesh and torus topologies, a 5×5 router is needed to connect the E, W, N, S, and local ports. To increase network throughput, the optical switching fabric inside the router should be internally non-blocking. The number of microring resonators should also be kept reasonable to save area. Moreover, reducing the number of waveguide crossings helps lower the insertion loss and the crosstalk noise. Finally, to reduce power consumption, the router should turn on as few microring resonators as possible in each cycle.
Based on the requirements above, a 5×5 router is designed as shown in Fig. 1(a). The header of each input packet is split off into the control path, while the payload is forwarded to the data path. The control path contains an O/E converter, a routing unit, and an E/O converter. The data path includes delay lines, an ejection unit, the optical switching fabric, and an injection unit. Finally, the header and payload of the same packet are recombined via combiners. Solid arrows represent optical interconnections implemented with waveguides, and dashed arrows represent electrical interconnections that deliver control signals. While routing computation and port allocation are executed, the payloads wait in the delay lines. Each delay line is realized with 100 identical micro-ring resonators, all working in tune, to produce a large optical delay of 500 ps [4]. Existing NoCs can run at more than 1 GHz, so each cycle is shorter than 1 ns. As in electronic NoCs, routing computation and port allocation in our proposal each occupy one cycle, so the time required by the routing unit is estimated at no more than 2 ns. As a result, 4 delay lines are needed per non-local port in our design.
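The delay-line count follows from a short calculation, sketched below in Python. The function name and its integer-picosecond arguments are illustrative assumptions; the inputs used in the text are two pipeline cycles, a cycle time of at most 1 ns, and 500 ps per delay line.

```python
def delay_lines_needed(pipeline_cycles: int, cycle_ps: int, line_delay_ps: int) -> int:
    """Cascaded delay lines needed to hold a payload while its header is processed."""
    required_ps = pipeline_cycles * cycle_ps        # routing computation + port allocation
    return -(-required_ps // line_delay_ps)         # integer ceiling division
```

With `delay_lines_needed(2, 1000, 500)` the result is 4. A faster clock, e.g. an 800 ps cycle, still rounds up to 4, so sizing for a 1 ns cycle is conservative.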
To decrease the probability of deflection, an ejection unit, shown in Fig. 1(b), is added to avoid contention for the IP core. Packets that request ejection simultaneously are all sent to the E/O&O/E converter via the ejection unit. They are converted to electronic signals simultaneously in the E/O&O/E converter, and the electronic signals are then sent serially to the IP core via the single local port. Fig. 1(c) shows our non-blocking optical switching fabric in detail. First, unlike a conventional optical router, the switching fabric must permit U-turns, because deflection may send a packet back out through its original input port. As shown in Fig. 1(c), microring resonators 0-4 provide the U-turn for each port. Second, the use of parallel switching elements reduces the number of waveguide crossings, which lowers both the insertion loss experienced by packets passing off-resonance rings and the crosstalk noise. Moreover, the straight-through direction has lower insertion loss and power consumption, because it does not need to turn on any microring resonator. This feature is exploited by the algorithms in the routing unit.
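The behavior of the ejection unit can be modeled with the toy Python sketch below: every packet that requests ejection in a cycle is accepted at once, so none is deflected because the local port is busy, and the converted packets are then delivered to the IP core one at a time. The class name and the FIFO delivery order are assumptions for illustration; the text does not specify the order in which converted packets are serialized.

```python
from collections import deque
from typing import Iterable, Optional


class EjectionUnit:
    """Toy model: accept all ejecting packets in a cycle, deliver one per cycle."""

    def __init__(self) -> None:
        # Electronic buffer after O/E conversion (FIFO order is an assumption).
        self.fifo: deque = deque()

    def accept(self, packets: Iterable[str]) -> None:
        # All packets requesting ejection this cycle leave the optical data path
        # together, so none of them competes for the single local port.
        self.fifo.extend(packets)

    def deliver(self) -> Optional[str]:
        # Serial delivery to the IP core through the only local port.
        return self.fifo.popleft() if self.fifo else None
```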
In electronic bufferless NoCs, the injection unit is usually located in front of the switching fabric [5]. When an input port is idle, a packet from the IP core can enter the switching fabric and contend for the non-local ports, so it is likely to be deflected. In our proposal, by contrast, a packet from the IP core is injected only when its requested output port is unoccupied, and it is then sent directly to that port. Therefore, the injection unit is placed after the switching fabric, as shown in Fig. 1(d). On the one hand, this relieves contention and thus reduces the probability of deflection to some extent. On the other hand, it acts as a self-throttling congestion control that allows the network to scale more effectively [6]: when the network is congested, the injection rate is automatically lowered.
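The post-fabric injection rule reduces to a simple check after port allocation, sketched below in Python. The function name and the representation of occupied outputs as a set of port numbers are hypothetical.

```python
from typing import Optional, Set


def try_inject(desired_port: int, occupied_outputs: Set[int]) -> Optional[int]:
    """Post-fabric injection: inject only if the requested output port is free."""
    if desired_port in occupied_outputs:
        return None                   # keep waiting at the IP core; never deflected
    occupied_outputs.add(desired_port)
    return desired_port
```

Because a locally generated packet only takes an output that the transit packets left free, it never causes a deflection. As congestion grows, fewer outputs stay free and fewer packets are injected, which is the self-throttling effect described above.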
The routing unit has two functions, routing computation and port allocation, whose algorithms are given in Fig. 2. Since deflections make the maximum insertion loss relatively large, both algorithms should aim to reduce the insertion loss of each hop. Routing computation selects an output port for each packet among its productive ports, i.e., the ports that bring the packet closer to its destination. Given our optical switching fabric, the priority depends on the packet's transmission direction: going straight through has a higher priority than turning, so that the insertion loss and power consumption of each hop are reduced. Furthermore, to avoid livelock, a packet's priority is upgraded to the highest level once its age exceeds a certain threshold. Ports 0-4 stand for the S, E, N, W, and local ports, respectively. If input port i is not idle, prio[i] denotes the priority of the packet from input port i, age[i] the time elapsed since that packet was injected, and route[i] its routing result. Port allocation is executed in the order of packet priority. According to the allocation result, the routing unit sets the ON/OFF states of the microring resonators inside the optical switching fabric to connect the corresponding input and output ports. The O/E and E/O conversions that each packet header needs for routing computation and port allocation at every hop account for most of the power consumption. Therefore, our proposal consumes more power than a conventional router based on circuit switching.
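A minimal Python sketch of the routing computation, priority assignment, and port allocation described above follows. It uses the text's port numbering (0-4 for S, E, N, W, local) and its rules (productive ports only, straight-through preferred over turns, age-based upgrade, allocation in priority order). Several details are assumptions not taken from Fig. 2: the coordinate convention (E is +x, N is +y), the numeric priority levels, the age threshold value, choosing the first productive port when turning, and deflecting losers to the lowest-numbered leftover port.

```python
from typing import Dict, List, Tuple

S, E, N, W, LOCAL = 0, 1, 2, 3, 4      # port numbering used in the text
K = 8                                  # torus radix (8x8 torus)
AGE_THRESHOLD = 16                     # hypothetical livelock threshold
TURN, STRAIGHT, HIGHEST = 1, 2, 3      # hypothetical priority levels


def productive_ports(cur: Tuple[int, int], dst: Tuple[int, int]) -> List[int]:
    """Ports that shorten the torus distance (assumes E = +x, N = +y)."""
    ports = []
    dx = (dst[0] - cur[0]) % K
    dy = (dst[1] - cur[1]) % K
    if dx:
        if dx <= K // 2:
            ports.append(E)
        if dx >= K // 2:
            ports.append(W)
    if dy:
        if dy <= K // 2:
            ports.append(N)
        if dy >= K // 2:
            ports.append(S)
    return ports


def route_and_prio(in_port: int, cur, dst, age: int) -> Tuple[int, int]:
    """Return (route[i], prio[i]) for the packet on non-local input port in_port."""
    if cur == dst:
        return LOCAL, 0                # handled by the ejection unit, no contention
    ports = productive_ports(cur, dst)
    straight = (in_port + 2) % 4       # e.g. entering from S means leaving via N
    if straight in ports:
        route, prio = straight, STRAIGHT
    else:
        route, prio = ports[0], TURN
    if age > AGE_THRESHOLD:
        prio = HIGHEST                 # anti-livelock upgrade
    return route, prio


def allocate(route: Dict[int, int], prio: Dict[int, int]) -> Dict[int, int]:
    """Grant ports in descending priority; losers take leftover ports."""
    free = [S, E, N, W]
    grant: Dict[int, int] = {}
    losers: List[int] = []
    contenders = [p for p in route if route[p] != LOCAL]
    for p in sorted(contenders, key=lambda q: prio[q], reverse=True):
        if route[p] in free:
            grant[p] = route[p]
            free.remove(route[p])
        else:
            losers.append(p)
    for p in losers:
        grant[p] = free.pop(0)         # may be port p itself: a U-turn, which the fabric permits
    return grant
```

For instance, if the packets on inputs S and W both request N, the straight-through packet from S wins and the turning packet from W is deflected.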
Simulation results
To evaluate the performance and cost of our proposal and compare it with an optical NoC using circuit switching, two simulators based on an 8×8 torus were developed in OPNET. In both simulators, each IP core generates 1024-bit messages according to a Poisson arrival process, under a uniform traffic pattern. In our proposal, each message is divided into eight 128-bit packets. In the circuit-switching simulator, the dimension-order routing algorithm is used to forward the packets. In what follows, deflection-P denotes our proposal, and XY-C denotes the optical NoC based on circuit switching. We also verify that the traditional injection rule performs worse than ours: deflection-Pin denotes our proposal with its injection rule replaced by the traditional one.
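The traffic model used in both simulators can be approximated by the Python sketch below: exponentially distributed inter-arrival times give a Poisson arrival process, destinations are drawn uniformly, and each 1024-bit message is split into eight 128-bit packets. Excluding the source from the destination set, the `rate` unit (messages per time unit), and the function names are assumptions; this is not the OPNET model itself.

```python
import random
from typing import List, Tuple

MESSAGE_BITS = 1024
PACKET_BITS = 128
NODES = 64                             # 8x8 torus


def split_message(msg_id: int) -> List[Tuple[int, int, int]]:
    """Split one message into (msg_id, packet_index, bits) tuples."""
    n = MESSAGE_BITS // PACKET_BITS    # eight packets per message
    return [(msg_id, k, PACKET_BITS) for k in range(n)]


def generate(src: int, rate: float, horizon: float,
             rng: random.Random) -> List[Tuple[float, int]]:
    """Poisson message arrivals at node src with uniform destinations != src."""
    t, msgs = 0.0, []
    while True:
        t += rng.expovariate(rate)     # exponential gap between arrivals
        if t >= horizon:
            return msgs
        dst = rng.randrange(NODES - 1)
        if dst >= src:
            dst += 1                   # skip the source node itself
        msgs.append((t, dst))
```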
As illustrated in Fig. 3(a) and (b), our proposal performs significantly better than XY-C, because resource utilization is optimized in our proposal. In addition, the deadlock detection and recovery mechanism of XY-C severely degrades its performance, whereas the deadlock problem does not exist in our proposal owing to the absence of buffers. We also observe that the traditional injection rule reduces the saturation point by about 6.25%. Fig. 3(c) compares the maximum insertion loss under 7 offered loads below the saturation point. The insertion loss comes from waveguide crossings, waveguide bends, ring drops, ring passes, and propagation; the related parameters are given in Fig. 3(d). In deflection-P and deflection-Pin, the maximum insertion loss increases with the offered load, because it is proportional to the maximum hop count of the packets. In XY-C, by contrast, the maximum insertion loss stays at a fixed value that depends only on the network diameter. Near saturation, the maximum insertion loss of our proposal is relatively high: at a network throughput of approximately 600 Gbps, it is about 20 dB, which still meets the communication requirements. In addition, the maximum insertion loss increases greatly if our injection rule is not adopted.
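A Python sketch of how the maximum insertion loss can be tallied from the loss sources listed above follows. The per-event values of Fig. 3(d) are not reproduced here, so the functions take them as a `params` dictionary supplied by the caller. The key names, the per-hop event counts, and treating propagation as a length in cm times a dB/cm coefficient are assumptions for illustration.

```python
from typing import Dict, List

# Loss sources named in the text; per-event dB values must come from Fig. 3(d).
SOURCES = ("crossing", "bending", "ring_drop", "ring_pass", "propagation_cm")


def hop_loss_db(counts: Dict[str, float], params: Dict[str, float]) -> float:
    """Loss of one hop: count of each event type times its per-event loss in dB."""
    return sum(counts.get(s, 0) * params[s] for s in SOURCES)


def max_insertion_loss_db(paths: List[List[Dict[str, float]]],
                          params: Dict[str, float]) -> float:
    """Per-hop losses add in dB along each path; return the worst path."""
    return max(sum(hop_loss_db(hop, params) for hop in path) for path in paths)
```

For XY-C, dimension-order routing fixes the set of paths, so the result depends only on the network diameter. Under deflection routing, longer deflected paths enter the maximum as the load grows.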
