Abstract: Future interconnection networks will be required to provide ultra-high bandwidth and low-latency communications to cope with the increasing performance requirements of backbone routers, large data storage systems and supercomputing systems. Aiming at ultra-high bandwidth communications and a processing latency approaching the optical time-of-flight, while being robust to cascade impairments, the authors propose an all-optical packet-switched interconnection network, where not only the actual packet switching but also the packet processing is performed in the photonic domain. The authors present two modular architectures, based on the crossbar and the Batcher-Banyan topologies, capable of forwarding fixed-length packets with two classes of service. Both use photonic digital-processing subsystems built by combining a single integrable module which exploits cross gain modulation in a semiconductor optical amplifier. System-level simulations of the crossbar switch controller indicate that the control signals maintain an acceptable quality during the processing. Moreover, the Batcher-Banyan configuration is more cost-effective than the crossbar for increasing port count, while effective network performance in terms of packet loss rate can be obtained by adding just a few recirculating delay lines.
Introduction
Electronic interconnection networks are currently exploited to achieve ultra-high bandwidth and low latency communications within backbone routers, high performance computing systems and large data storage systems; however, these solutions are approaching their fundamental limitations in terms of power, wiring density and throughput [1]. Photonic circuit integration enabling compact low-power transceiver banks for high-capacity parallel optical interconnects will pose even more challenging constraints to electronic interconnection network design [2, 3]. In this field, even though optical packet switching technology is still in its infancy, optical solutions are promising to reduce wiring density and power consumption, while providing data format transparency and electromagnetic field immunity.
To satisfy the requirements of ultra-high bandwidth and low latency, in this paper we propose a solution for the interconnection network core, which exclusively exploits optics not only for the actual packet switching, but also for the packet processing and the interconnection network control functions. Photonic digital processing, despite its limitations compared with advanced electronic processing, can be the most suitable paradigm for simple and ultra-fast control and switching operations, since it reduces the packet-processing latency to the optical time-of-flight. (The processing latency is the time required to read the packet label and perform a routing decision, and is suffered by all packets at each traversed node. A routing decision for contention losing packets might involve a further delay within fibre delay lines. However, as clarified in [4] , the capacity of electronic label processing is the main limiting factor in electronically controlled optical packet switches.) In this scenario, the long-term solution aims at the realisation of integrated optical circuits, where a processing latency in the nanosecond range can be achieved, at least an order of magnitude smaller than in the electronic domain [5, 6] . The first step towards this goal is the realisation of basic and integrable photonic processing modules, that is, the all-optical logic gates [7] . More sophisticated all-optical logic networks can be built by combining the basic modules to realise the required interconnection network functionalities.
The designed all-optical interconnection network architectures are based on a synchronous paradigm, where the time is slotted and fixed-length packets are aligned at the beginning of each time slot [8] . This choice makes an adaptation layer necessary to switch asynchronous variable-length packets typically generated by attached clients; nevertheless, it significantly simplifies the packet processing and reduces the packet contentions. For this reason the synchronous paradigm is widely exploited also in the core of high-end electronic routers [4, 9, 10] . Moreover, in the limited domain of an interconnection network (differently from what happens in a wide area network), the synchronisation at the packet level and at the bit level can be solved at the network boundary in the electronic domain without the need of complex optical synchronisers.
In this paper, we first describe the implementation of a semiconductor optical amplifier (SOA)-based photonic processing module, which can be used as a building block to carry out various node functionalities. In particular, the scheme of a multi-bit photonic comparator is detailed. Then two synchronous interconnection network architectures, namely the crossbar and the Batcher-Banyan, comprising the above-mentioned subsystems are detailed, and a system analysis of the worst case path within the crossbar switch controller is performed. Finally a cost and network performance analysis of the proposed architectures is presented.
Integrable photonic processing module
All the photonic digital processing functions necessary for the operation of the all-optical packet-switched interconnection network can be achieved by combining a basic module, depicted in Fig. 1. It consists of an SOA where counter-propagating signals interact by means of cross gain modulation (XGM) [11]. SOAs are preferred for use in all-optical processing because they can be made compact and cascadable, and can operate at low optical powers.
The devised SOA module can carry out different logic functions depending on the input signal configuration. As depicted in Fig. 1, the XGM interaction between a low-power signal A acting as a probe and a high-power signal B acting as a pump yields the A AND (NOT B) operation. If both signals A and B act as pumps on a third probe signal C, the implemented function is (NOT A) AND (NOT B) AND C. If the third signal is a continuous wave or a pulse train, for non-return-to-zero or return-to-zero (RZ) A and B signals respectively, the basic block carries out the A NOR B logic operation. Finally, if only A acts as a pump on the same probe, NOT A is obtained.
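The logic behaviour of the basic module can be summarised with a small truth-table model. This is only an idealised sketch, assuming that any high pump fully suppresses the probe and ignoring all analogue effects (gain recovery, extinction ratio, noise); the function names are illustrative and do not come from the paper.

```python
from itertools import product


def xgm_module(probe: int, pumps: tuple) -> int:
    """Idealised SOA-XGM gate: the probe is transmitted only if every pump is low."""
    return int(probe == 1 and not any(pumps))


def and_not(a: int, b: int) -> int:
    """Probe A, pump B: A AND (NOT B)."""
    return xgm_module(a, (b,))


def nor(a: int, b: int) -> int:
    """Probe is a continuous wave or pulse train (always 1), pumps A and B: A NOR B."""
    return xgm_module(1, (a, b))


def not_(a: int) -> int:
    """Probe is a continuous wave or pulse train (always 1), single pump A: NOT A."""
    return xgm_module(1, (a,))


if __name__ == "__main__":
    # Print the truth table of the three derived functions.
    for a, b in product((0, 1), repeat=2):
        print(a, b, "AND-NOT:", and_not(a, b), "NOR:", nor(a, b), "NOT A:", not_(a))
```

All three functions are the same physical block; only the role assigned to each input (probe or pump) changes.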
The use of counter-propagating configurations simplifies the module architecture, allowing the photonic processing independently of the signal wavelength in the whole C-band. At the output of the SOA a filter eliminates the out-of-band noise. With a proper choice of the filter central wavelength ultra-fast processing can be achieved beyond 160 Gb/s [12] . The isolators present both at the input and at the output of the SOA eliminate the reflections. This choice improves the efficiency of the nonlinear interactions in the SOAs by preventing reflection-induced leakage of power, as well as avoiding beating between the signals and the reflected optical field. In case of cascaded configurations the layout can be simplified by removing redundant isolators either at the input or the output of the SOA. Moreover polarisation-independent processing is obtained with commercially available polarisation-insensitive nonlinear SOAs. In this way the penalty introduced by the module is low and cascaded configurations can be admitted [13, 14] .
Exploiting the above-mentioned basic module, we successfully implemented the node subsystems required for a 2 × 2 all-optical node architecture working at 160 Gb/s [15], that is, the label extractor, the contention detector and manager, the switching element and the packet eraser. In [16] a photonic combinatorial network was demonstrated, simultaneously providing contention resolution and switch control signals with a processing time of a few tens of nanoseconds. In addition, through the presented basic module, more complex logic functions such as n bit pattern matching were demonstrated [17]. The n bit comparator accepts two n bit input patterns A and B and provides at its output the functions A = B and A ≠ B. The output pattern contains a single pulse if the function is true, while no pulse is present if the function is false. The n bit photonic comparator needs four SOA-based modules, the number of modules being independent of n. If a multi-bit magnitude comparison is needed (as in a Batcher network switching element [4]), the n bit comparator can provide the functions A > B and A < B, exploiting just two more SOAs independently of n. Therefore this subsystem enables the implementation of architectures with a higher port count than that in [15] and supporting different packet priorities. Fig. 2 reports the photonic comparator scheme for n bit words and the simplified architecture for n = 1. In both cases just the basic modules in Fig. 1 are used, with the exception of the OR logic gates, implemented with optical couplers because the input signals cannot be at the high level simultaneously.
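The input/output behaviour of the n bit comparator can be illustrated with a short behavioural model. It is a sketch only: it assumes the words are presented most significant bit first, so that the first differing bit decides the magnitude comparison, and it says nothing about how the SOA modules are actually interconnected. The function name and the output keys are illustrative.

```python
def compare_words(a_bits, b_bits):
    """Behavioural model of the n bit comparator.

    Each output is 1 (a single pulse) if the corresponding function is true
    and 0 (no pulse) otherwise. Bits are assumed to arrive MSB first.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("both words must have the same length")
    gt = lt = 0
    for a, b in zip(a_bits, b_bits):
        if a != b:
            # The first differing bit settles the magnitude comparison.
            gt, lt = int(a > b), int(a < b)
            break
    eq = int(not (gt or lt))
    return {"A=B": eq, "A!=B": 1 - eq, "A>B": gt, "A<B": lt}


if __name__ == "__main__":
    print(compare_words([1, 0], [0, 1]))
    print(compare_words([0, 1], [0, 1]))
```

The equality outputs correspond to the four-module comparator, while the two magnitude outputs correspond to the extension with two more SOAs mentioned in the text.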
Network architectures
Several interconnection matrices [18] can be realised by exploiting the previously described modules. In this section two architectures for synchronous all-optical interconnection networks are presented, namely the crossbar architecture and the Batcher-Banyan architecture. First of all we describe the packet format used in both architectures, depicted in Fig. 3. An N × N interconnection network exploits fixed-length packets, whose label is divided into three fields. Each packet begins with a packet recognition bit PR set to 1, used to detect the packet presence, thus simplifying the all-optical processing. The address field A, which follows, is ⌈log2(N)⌉ bits long and indicates the desired output port. Finally, the priority bit P (0 = low, 1 = high) enforces two classes of service, which can be used to manage output port contention.
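The label layout can be made concrete with a small encoder and decoder. This is a sketch under two assumptions that the text does not state: output ports are numbered from 0, and the address field is written most significant bit first. The function names are illustrative.

```python
def address_bits(n_ports: int) -> int:
    """Length of the address field, equal to ceil(log2(N)) for N >= 2."""
    if n_ports < 2:
        raise ValueError("at least two ports are required")
    return (n_ports - 1).bit_length()


def encode_label(n_ports: int, out_port: int, high_priority: bool) -> list:
    """Build the label as [PR, address bits (MSB first, assumed), P]."""
    if not 0 <= out_port < n_ports:
        raise ValueError("output port out of range")
    width = address_bits(n_ports)
    addr = [(out_port >> k) & 1 for k in range(width - 1, -1, -1)]
    return [1] + addr + [int(high_priority)]


def decode_label(n_ports: int, bits: list):
    """Return (output port, priority), or None if PR = 0 (no packet in the slot)."""
    width = address_bits(n_ports)
    if bits[0] == 0:
        return None
    port = int("".join(str(b) for b in bits[1:1 + width]), 2)
    return port, bits[1 + width]


if __name__ == "__main__":
    label = encode_label(4, 2, True)
    print(label, decode_label(4, label))
```

For a 4 × 4 network the label is four bits long (PR, two address bits, P), which matches the bit positions used by the label extractor described for the crossbar.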
Crossbar architecture
The crossbar solution is attractive because it is internally non-blocking, simple in architecture and modular [4, 18]. Yet, it is complex in terms of the number of crosspoints, which grows as the square of the number of input-output ports; in this regard, optical integration can dramatically reduce the footprint of the devices. Moreover, the complexity of the switch controller arbitration may also increase with the number of input/output ports; however, thanks to the photonic digital processing its latency is expected to be kept low. Fig. 4 shows the architecture of a 4 × 4 crossbar, comprising a label extractor and a packet eraser for each input port, and a crosspoint, that is, a 2 × 2 elementary switch, for each input-output port pair. These subsystems are detailed in Fig. 5. Fig. 5a shows the label extractor connected to the input port In_i (1 ≤ i ≤ 4), implemented with an AND gate for each packet field to be extracted; Sync 1 is a pulse synchronised with the beginning of the slot, able to extract the PR_i bit and, after a delay of 3 bit times (t_b), the P_i bit. Sync 2, on the other hand, contains as many pulses as the address length (2 in the example) synchronised with the beginning of the slot, which can extract the A_i field after a t_b delay. The packet eraser is depicted in Fig. 5b; it is composed of an optical gate generator and of an AND gate. The optical gate generator receives the output E_i of the switch controller; if an optical pulse is detected on E_i, it generates an optical gate lasting as long as the packet duration, which holds the eraser output at the low level for the entire time slot. The gate generation in the optical domain can be obtained as in [15, 19]. The 2 × 2 elementary switch, detailed in Fig. 5c, includes an optical gate generator and a 2 × 2 switching fabric. The 2 × 2 switching fabric can be realised by exploiting XGM in two SOAs, as detailed in [20].
If no optical power is present on the switching fabric control port (default state), the switching fabric is in the cross configuration, while if enough power is received from the control port, the switching fabric moves to the bar configuration. The optical gate generator is connected to the output O_ij of the switch controller and to the switching fabric control port: if an optical pulse is received on O_ij (meaning that the switching fabric has to be set in the bar state) it generates an optical gate lasting as long as the packet duration, holding the switch in the bar configuration for the whole time slot.
All the processing is performed in a centralised way by the all-optical switch controller shown in Fig. 6. It receives as input the extracted packet recognition bits (PR_i, 1 ≤ i ≤ 4), addresses (A_i, 1 ≤ i ≤ 4) and priorities (P_i, 1 ≤ i ≤ 4), and computes the switch control signals (O_ij, 1 ≤ i, j ≤ 4) to set the required crosspoints in the bar state from the default cross state. Any output port contention is resolved with the erasure (signalled by E_i, 1 ≤ i ≤ 4) of the contention-losing packets, depending on the priority and on contention loss AND gates, labelled as iLj. Each iLj has three inputs: the PR_j bit, and the outputs of the address and priority comparators between i and j. iLj generates a high-level pulse if and only if the packet possibly present on input port i loses the contention with the packet on input port j. In the shown implementation the tie-breaking policy used when two packets have the same output port address and the same priority is to favour packets from lower-indexed input ports. For each input port i, the outputs of three iLj (1 ≤ j ≤ 4, j ≠ i) are fed into a three-port OR. This port is implemented with a NOR followed by a NOT and an OR coupler. This configuration is needed to avoid any beating between non-orthogonally polarised signals, since all three OR inputs could be simultaneously at the high level. The output of the three-port OR i is high if and only if the packet from the input port i loses at least one contention: in this case a high pulse on E_i is generated, which activates the packet eraser and sets all O_ij, 1 ≤ j ≤ 4 (out of the AND bank on the right-hand side of Fig. 6) to the low level, thus disabling all 2 × 2 switches on row i. On the contrary, a low-level output from the three-port OR i, indicating that the packet from input i can proceed to the desired output, disables the packet eraser and enables the entire row i of 2 × 2 switches, that is, O_ij, 1 ≤ j ≤ 4.
Figure 2: 1 bit and n bit photonic comparators
The specific O_ij to be activated from input port i is determined by the output address reader module i, four of which are present in Fig. 6. An address reader module is a device with one input and as many outputs as the number of output ports (four in the example); it receives the packet output address and generates a single high-level pulse on the output corresponding to the packet output address. In this way the packet from input port i is diverted to the proper output by the only 2 × 2 switch in row i that has been set in the bar state.
Figure 4: All-optical crossbar-based interconnection network architecture
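The arbitration carried out by the switch controller can be summarised in software form. The following is a behavioural sketch, not a description of the optical circuit: ports are numbered from 0, a missing packet is represented by None (PR = 0), and the control signals of an empty input are simply left low, which is an assumption about a case the text does not detail. The function name is illustrative.

```python
def crossbar_controller(packets):
    """Behavioural model of the crossbar switch controller.

    packets[i] is None (no packet on input i) or a tuple (address, priority).
    Returns (erase, bar): erase[i] models E_i, bar[i][j] models O_ij.
    """
    n = len(packets)
    erase = [0] * n
    bar = [[0] * n for _ in range(n)]
    for i, pkt_i in enumerate(packets):
        if pkt_i is None:
            continue  # assumption: nothing to erase or switch for a void
        addr_i, prio_i = pkt_i
        for j, pkt_j in enumerate(packets):
            if j == i or pkt_j is None:
                continue  # PR_j = 0 keeps the contention loss gate iLj low
            addr_j, prio_j = pkt_j
            same_output = addr_i == addr_j
            # Higher priority wins; ties favour the lower-indexed input port.
            j_wins = prio_j > prio_i or (prio_j == prio_i and j < i)
            if same_output and j_wins:
                erase[i] = 1  # OR over the contention loss gates of input i
        if not erase[i]:
            bar[i][addr_i] = 1  # address reader output gated by NOT E_i
    return erase, bar


if __name__ == "__main__":
    e, o = crossbar_controller([(2, 0), (2, 1), None, (2, 1)])
    print("E:", e)
    for row in o:
        print(row)
```

In the example, inputs 0, 1 and 3 all request output 2: input 0 is erased because of its low priority, and input 3 loses the tie with input 1 because of its higher index.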
Batcher-Banyan architecture
An alternative architecture is proposed for realising the all-optical interconnection network, namely a multistage Batcher-Banyan architecture [4], enhanced with an intermediate contention manager; a 4 × 4 example is shown in Fig. 7a. The architecture is modular and internally non-blocking, like the crossbar. Moreover, the Batcher-Banyan is more scalable in terms of required 2 × 2 elements and is self-routing, that is, the packet destination output address is used to route the packet itself in a distributed way from any input to any output without the complex centralised controller required in the crossbar case. The implementation of the Batcher and Banyan switching elements and of the contention manager is depicted in Figs. 7b-7d, highlighting the control functionalities. They are composed of the modules in Figs. 1 and 2, 2 × 2 SOA-based switching fabrics [20], optical gate generators, couplers and delay lines.
The Batcher network concentrates the packets and sorts them according to the output destination; if more than one packet has the same destination, they are sorted according to the priority field. The lowest priority is assigned to voids, that is, absent packets. The Batcher network is composed of 2 × 2 sorting elements arranged in stages, each marked with an arrow indicating the output port to which the input packet with the highest address must be routed. For this purpose, as shown in Fig. 7b, the 2 bit photonic comparator is exploited. In case of real contention (detected by the contention detection module highlighted in the figure), packet priorities are checked with a 1 bit comparator, and the higher priority packet is sent to the marked output port. If a packet and a void are present at the two inputs, the packet is routed towards the marked port. After the processing has been performed, the routing decision is fed into the optically controlled 2 × 2 switch that is also used in the crossbar architecture and shown in Fig. 5c. In this case a three-port OR can be realised simply with two couplers, since at most one of the three inputs can be at the high level.
After the Batcher network a novel block, that is, the contention manager shown in Fig. 7c, is introduced. Since packets have already been sorted by increasing output address and priority in the Batcher network, in an N × N architecture only N − 1 address (i.e. ⌈log2(N)⌉ bit) and priority (i.e. 1 bit) comparators suffice to detect contentions, which are resolved by erasing contention-losing packets. Packet concentration on neighbouring ports is also performed to avoid internal blocking in the Banyan network, which would otherwise happen if a void occurred between two ports carrying packets [4].
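Why N − 1 neighbour comparisons are enough can be seen in a short behavioural sketch. It assumes, for illustration only, that the Batcher network delivers packets in ascending order of (address, priority) with voids at the end, so that within a group of packets sharing an address the last one is the winner; the exact ordering convention and port numbering of the optical implementation may differ. Both function names are illustrative.

```python
def batcher_sort(packets):
    """Stand-in for the Batcher network: ascending (address, priority), voids last."""
    return sorted(packets, key=lambda p: (p is None, p or (0, 0)))


def contention_manager(sorted_packets):
    """Erase contention losers with N - 1 neighbour comparisons, then concentrate.

    sorted_packets holds tuples (address, priority) or None, already sorted.
    """
    n = len(sorted_packets)
    survivors = []
    for k, pkt in enumerate(sorted_packets):
        if pkt is None:
            continue
        nxt = sorted_packets[k + 1] if k + 1 < n else None
        # One address comparison and one priority comparison per neighbour pair.
        loses = nxt is not None and nxt[0] == pkt[0] and nxt[1] >= pkt[1]
        if not loses:
            survivors.append(pkt)
    # Concentration: survivors occupy neighbouring ports, voids are pushed aside.
    return survivors + [None] * (n - len(survivors))


if __name__ == "__main__":
    inputs = [(2, 0), None, (2, 1), (0, 0)]
    print(contention_manager(batcher_sort(inputs)))
```

Because equal addresses are adjacent after sorting, each packet only needs to be compared with its immediate neighbour rather than with every other input, unlike the crossbar controller.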
Banyan switching elements, shown in Fig. 7d , are the simplest ones, since all contentions have been solved in the previous stages and their routing is based on the value of a single output address bit depending on the stage they belong to, as depicted in Fig. 7a . The ith stage routing is decided by the ith address bit, and only a 1 bit comparator is required to properly set the 2 Â 2 switch. 
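The self-routing rule can be traced stage by stage with a small model. This sketch assumes a butterfly-type Banyan with a power-of-two port count, ports numbered from 0 and address bits consumed most significant bit first; the actual wiring in Fig. 7a may correspond to a different but equivalent Banyan topology. The function name is illustrative.

```python
def banyan_route(n_ports: int, in_port: int, dest: int) -> list:
    """Return the port position of a packet after each Banyan stage.

    Stage i looks only at the i-th address bit (MSB first) and sets the
    corresponding bit of the current position, as a 1 bit comparison would.
    """
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two")
    n_stages = (n_ports - 1).bit_length()
    pos = in_port
    path = [pos]
    for stage in range(n_stages):
        bit_index = n_stages - 1 - stage
        bit = (dest >> bit_index) & 1
        pos = (pos & ~(1 << bit_index)) | (bit << bit_index)
        path.append(pos)
    return path


if __name__ == "__main__":
    print(banyan_route(8, 5, 6))
```

Whatever the input port, the last entry of the returned path equals the destination address, which is the self-routing property exploited by the architecture.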
System analysis
In this section we analyse the worst-case degradation of the optical control signals, which should be kept low since an error in the optical switch controller results in a fatal packet misrouting. Such a worst-case scenario has been identified in the all-optical switch controller of the crossbar architecture (shown in Fig. 6) for the signal traversing a 2 bit comparator, a contention loss AND, a three-port OR and an output AND. The worst path, depicted in Fig. 8, contains seven SOAs performing the above-mentioned operations, including three SOAs required to obtain the output A = B of the 2 bit address comparator.
Here we exploit and adapt a simple time-resolved SOA tool [21, 22] for modelling the SOAs used in the all-optical switch controller. Since a nonlinear SOA is around 2 mm long, it is segmented into a number of small sections in order to accurately study its carrier dynamics. In the calculation of the gain coefficient, the carrier recombination lifetime takes non-radiative, bimolecular and Auger recombination into account. As the optical signal propagates through the cascaded optical amplifiers, signal quality degrades because of incomplete XGM in the SOAs. This is a source of bit error rate (BER) increase because of the extinction ratio reduction. This can be studied by observing the eye diagram at the output of each SOA. As the number of cascaded SOAs increases, the contribution of signal-amplified spontaneous emission (ASE) beat noise also increases. Therefore the BER degrades further. To restrict the ASE noise, a band pass filter (BPF) is employed at the output of each SOA. In practice the filter is not ideal and the gain spectrum in the filter passband is not flat. We can assume the standard filter transfer function as
[…], where B_0 is the single-filter 3 dB bandwidth and λ_c is the BPF central wavelength. As the signal and ASE pass through a cascade of amplifiers followed by the filters, they experience a shrinkage of the effective filter bandwidth. With the given filter function, the 3 dB bandwidth after n stages reduces by (ln 2/n)^(1/6). Hence we have taken all the involved phenomena, such as imperfect XGM, optical bandwidth reduction and the contribution of signal-spontaneous emission beat noise, into account for the cascaded SOAs in the worst-case analysis for finding the BER.
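The bandwidth narrowing along the cascade can be checked numerically. Since the transfer function itself is not reproduced here, the sketch below assumes a sixth-order super-Gaussian power response of the form exp(−x^6), where x is the detuning from the central wavelength normalised to half of B_0; this form is an assumption chosen only because it reproduces the (ln 2/n)^(1/6) factor quoted in the text. The function names are illustrative.

```python
import math


def half_power_detuning(n_stages: int, order: int = 6) -> float:
    """Normalised detuning x at which n cascaded filters transmit half the power.

    Assumes each filter has power response exp(-x**order), so the cascade
    response is exp(-n * x**order) and the half-power point solves
    n * x**order = ln 2.
    """
    return (math.log(2) / n_stages) ** (1.0 / order)


def narrowing_factor(n_stages: int, order: int = 6) -> float:
    """3 dB bandwidth of the cascade relative to that of a single filter."""
    return half_power_detuning(n_stages, order) / half_power_detuning(1, order)


if __name__ == "__main__":
    # Seven filters, one after each SOA of the worst-case path.
    x7 = half_power_detuning(7)
    print("cascade response at x7:", math.exp(-7 * x7 ** 6))
    print("relative 3 dB bandwidth after 7 stages:", round(narrowing_factor(7), 3))
```

Under this assumption the seven-stage worst-case path keeps roughly 72% of the single-filter 3 dB bandwidth, because the narrowing factor scales as n^(−1/6) and therefore grows slowly with the number of stages.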
Each SOA can be fed with a signal coming from another logic gate, a probe consisting of a pulse train, and a pump. The average pump power is high (6 dBm) in order to induce gain modulation, while the average probe power is low (−8 dBm).
With reference to the worst-case path in Fig. 8, in order to obtain the highest degradation, the signal is set to 1 if it acts as a probe, while it is set to 0 if it acts as a pump. We assume a bias current of 200 mA and a small-signal gain of 30 dB in each SOA. The input pulses are hyperbolic secant RZ pulses with a full width at half maximum of 30 ps. The simulations are run with the signals counter-propagating in the SOAs at the same 1550 nm wavelength. The ORs in Fig. 8 are simulated by coupling the signals with orthogonal polarisation, thus avoiding beating. Fig. 9a shows the eye diagram at the input and at the output of the optical path. While the input eye is open and the zero level is very low, the eye opening at the output is reduced. In particular the zero level is increased because of the imperfect XGM in the SOAs.
Simulations are also performed in order to calculate the BER of the optical control signals at 10 Gb/s. The curves, reported in Fig. 9b, show a power penalty of the output signal of 9 dB with respect to the back-to-back (B2B) at BER = 10^−9 [corresponding to −0.95 in the −log(−log(BER)) scale], without however evidencing noise floors. This penalty can be ascribed to the extinction ratio degradation caused by the cascaded SOAs. Nevertheless, it must be noted that the signal coming from the all-optical controller is not transmitted; rather, it just drives a 2 × 2 switch. It is therefore important to be able to separate the zero level from the one level. Moreover, the output extinction ratio can be improved, for instance, by means of a semiconductor saturable absorber. Exploiting this device, an extinction ratio improvement at bit rates up to 160 Gb/s has been demonstrated [23, 24]. The simulation results agree with former investigations of performance degradation induced by XGM in SOAs [25].
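The value quoted on the double-logarithmic BER axis can be verified with a short calculation, taking the scale to be −log10(−log10(BER)) with base-10 logarithms, which is the usual convention for BER plots and is assumed here. The function name is illustrative.

```python
import math


def double_log_ber(ber: float) -> float:
    """Map a bit error rate (0 < ber < 1) onto the -log10(-log10(BER)) axis."""
    if not 0.0 < ber < 1.0:
        raise ValueError("BER must lie strictly between 0 and 1")
    return -math.log10(-math.log10(ber))


if __name__ == "__main__":
    for exponent in (3, 6, 9, 12):
        ber = 10.0 ** (-exponent)
        print(f"BER = 1e-{exponent}: {double_log_ber(ber):.3f}")
```

For BER = 10^−9 the result is −log10(9), that is about −0.954, consistent with the value of −0.95 read from the plot axis.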
Cost and performance analysis
In this section we first perform a cost comparison between the two proposed architectures, namely the crossbar and the Batcher-Banyan with contention manager. The chosen cost metric is the number of active elements (i.e. SOAs) required to build the entire switch, highlighting the switch fabric and the optical control system contributions; this metric also provides a good estimate of the switch power consumption scalability.
As shown in […] However, Fig. 10 shows that both architectures require more than 10^4 active elements for a 64 × 64 interconnection network, which clearly indicates the strong need for dense optical integration in order to make these architectures feasible.
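The different scaling trends of the two topologies can be illustrated by counting 2 × 2 elements only. This is deliberately not the SOA count used as the cost metric in this section: the number of SOAs per element and the contention manager are not modelled. The sketch assumes a power-of-two port count, a bitonic Batcher sorter with (N/4)·log2(N)·(log2(N) + 1) sorting elements and a Banyan with (N/2)·log2(N) elements, which are the standard textbook counts rather than figures taken from this paper. The function names are illustrative.

```python
def crossbar_elements(n_ports: int) -> int:
    """One 2 x 2 crosspoint per input-output pair."""
    return n_ports * n_ports


def batcher_banyan_elements(n_ports: int) -> int:
    """2 x 2 elements of a bitonic Batcher sorter followed by a Banyan network."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("n_ports must be a power of two")
    stages = (n_ports - 1).bit_length()  # log2(N)
    batcher = n_ports * stages * (stages + 1) // 4
    banyan = n_ports * stages // 2
    return batcher + banyan


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        print(n, crossbar_elements(n), batcher_banyan_elements(n))
```

The crossbar count grows as N^2, whereas the Batcher-Banyan count grows as N·log2(N)^2, which is the reason the latter becomes comparatively cheaper as the port count increases.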
Next we perform a simulation study aimed at assessing the network performance of both the crossbar and the Batcher-Banyan architectures accepting traffic belonging to two classes of service. A common problem in all-optical architectures is the absence of feasible optical random access memories to buffer packets while the desired output is busy, which strongly degrades the network performance. In this study we assess the performance benefits brought by recirculating delay lines; in such a scenario a few output ports are fed back to the input after traversing a delay line whose duration equals the packet duration. This solution, at the expense of an increased number of actual input/output ports, allows contention-losing packets to retry the arbitration for the desired output after one time slot, significantly improving the successful routing probability compared with a bufferless scenario where contention-losing packets are immediately erased. The study has been performed with a custom-built synchronous network simulator; the offered traffic is Bernoullian with two equally loaded classes and output addresses are uniformly chosen. Fig. 11 shows the blocking probability of the high and low traffic classes as a function of the number of delay lines D, when the offered load is 0.5.
From the plot we can notice that without delay lines (D = 0) the blocking probability of both classes is very high, as expected. However, just a few delay lines significantly reduce the blocking probability, at least for the high traffic class, for all evaluated numbers of input/output ports N. For a given number of delay lines, a larger switch achieves a higher blocking differentiation, that is, the ratio between the high class and the low class blocking probabilities is larger. The network analysis thus confirms the effectiveness of separating the traffic into two classes of service to achieve good performance even with few delay lines.
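The qualitative effect of recirculating delay lines can be reproduced with a very small slotted simulation. This is not the custom simulator used for Fig. 11 but a simplified sketch with several assumptions of its own: output contention is resolved with high priority first and random tie-breaking, contention losers compete for the D delay lines with high priority first, and packets still circulating when the run ends are not counted. The function name and parameters are illustrative.

```python
import random


def simulate(n_ports: int, n_delay: int, load: float = 0.5,
             slots: int = 5000, seed: int = 1):
    """Return (low-class, high-class) blocking probabilities of a slotted switch."""
    rng = random.Random(seed)
    offered = [0, 0]  # index 0: low class, index 1: high class
    lost = [0, 0]
    recirculating = []  # (destination, priority) packets held in the delay lines
    for _slot in range(slots):
        contenders = list(recirculating)
        for _port in range(n_ports):
            if rng.random() < load:  # Bernoulli arrivals
                prio = rng.randrange(2)  # two equally loaded classes
                contenders.append((rng.randrange(n_ports), prio))
                offered[prio] += 1
        # Random order first, then a stable sort puts the high class in front.
        rng.shuffle(contenders)
        contenders.sort(key=lambda p: -p[1])
        busy = set()
        losers = []
        for dest, prio in contenders:
            if dest in busy:
                losers.append((dest, prio))
            else:
                busy.add(dest)  # this packet wins the output for the slot
        # Losers (high class first) take the delay lines; the rest are erased.
        recirculating = losers[:n_delay]
        for _dest, prio in losers[n_delay:]:
            lost[prio] += 1
    return lost[0] / offered[0], lost[1] / offered[1]


if __name__ == "__main__":
    for d in (0, 1, 2, 4):
        low, high = simulate(8, d)
        print(f"D = {d}: low class {low:.4f}, high class {high:.4f}")
```

Even in this simplified model the high class benefits most from the first few delay lines, since its contention losers are served first both at the outputs and at the delay lines.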
Conclusion
In this paper we have presented crossbar and Batcher-Banyan optical packet-switched interconnection networks, capable of all-optically processing and forwarding synchronous fixed-length packets. Both architectures use photonic digital-processing subsystems built by combining a single SOA-based integrable module exploiting XGM.
Worst-case system simulation indicates that sufficient signal quality is preserved during the all-optical processing of the packets, while a cost comparison in terms of active elements between the two architectures shows that the Batcher-Banyan configuration is more scalable for increasing port count. Finally, network simulation shows the benefits in terms of high class packet loss rate reduction when a few recirculating delay lines are introduced.
Acknowledgment
The work described in this paper was carried out with the support of the BONE-project ('Building the Future Optical Network in Europe'), a Network of Excellence funded by the European Commission through the 7th ICT-Framework Programme.
