Abstract: Digital optical logic circuits capable of performing bit-wise signal processing are critical building blocks for the realization of future high-speed packet-switched networks. In this paper, we present recent advances in all-optical processing circuits and examine the potential of their integration into a system environment. Following this concept, we demonstrate serial all-optical Boolean AND/XOR logic at 20 Gb/s and a novel all-optical packet clock recovery circuit, with low capturing time, suitable for burst-mode traffic. The circuits use the semiconductor-based ultrafast nonlinear interferometer (UNI) as the nonlinear switching element. We also present the integration of these circuits in a more complex unit that performs header and payload separation from short synchronous data packets at 10 Gb/s. Finally, we discuss a method that uses these logic units to realize a novel packet scheduling switch architecture, which guarantees lossless communication for specific traffic burstiness constraints.
I. INTRODUCTION

OPTICAL systems have rapidly expanded in capacity from early systems at 55 Mb/s to present commercial systems with 10 or 40 Gb/s per wavelength. However, in these systems, processing, switching, and routing are still performed in the electrical domain. As the channel rate continues to increase, matching the optical and electronic speeds comes at great cost and complexity and has already exceeded its practical limits. As a result, switching and routing applications have to be shifted into the optical domain to allow successful scaling of the switching capability of routers, so as to be compatible with the capacity of WDM transmission. To this end, low-complexity all-optical processing circuits can significantly assist in relieving the network from undesirable latencies related to O/E/O conversions at the switching nodes. However, the use of low-complexity optical switching, by itself, does not guarantee full bandwidth exploitation. The sporadic/bursty nature of traffic requires on-demand use of bandwidth, so as to minimize capacity waste when no information is transmitted. Currently deployed optical networks are circuit switched, and individual data packets are aggregated into circuits before transmission in the network. For this reason, these networks are unsuitable for packet traffic and result in inefficient resource utilization. With the explosion of data traffic related to Internet applications, optical packet/burst switching [1]-[3] has been proposed as a method to fully exploit the advantage of statistical multiplexing, in the sense that packets make on-demand use of the outgoing capacity, while at the same time taking advantage of new optical techniques to overcome the limitations related to O/E/O conversions. Toward this goal, serious effort has been invested during the last few years in the implementation of the critical subsystems required for a "true" all-optical packet switch.
Semiconductor-based devices [4] have led the technology to a whole new class of compact circuits with very low power requirements that can potentially be integrated on a chip module. High-speed optical gates capable of performing bit-wise Boolean logic [5]-[12], clock recovery circuits [13]-[16], header separation and recognition techniques [17]-[22], optical flip-flops [23], and optical shift registers [24], [25] with read/write capability [25] are only part of the significant progress made up to now in the field. However, despite these achievements, integration of functionalities in a system environment to perform lossless packet processing and routing in the optical domain still remains an elusive target.
In this paper, we present recent advances in the development of all-optical signal-processing circuits and discuss a scheme for the realization of a recently proposed packet switch node. We demonstrate all-optical Boolean AND/XOR logic up to 20-Gb/s data rates using a semiconductor optical amplifier (SOA)-based ultrafast nonlinear interferometer (UNI) [12] . We also describe a novel clock recovery circuit [26] that has a very short capture time and as a result can be used to handle bursty-packet traffic. Following this, we describe a more complex circuit that performs "on the fly" packet header and payload separation. This circuit consists of a packet clock recovery circuit and an AND gate and its operation was successfully verified for 10-Gb/s data packets [27] . Finally, we discuss the system perspective of these optical circuits to build an optical packet switch [28] . The switch can be implemented using the header extraction unit, a number of cascaded AND/XOR optical gates and a number of "switchable" delay lines. It can in principle guarantee lossless communication for data obeying a specified burstiness property and has the ability to process data on a packet-by-packet basis without the need for additional buffering.
The rest of this paper is organized as follows. Section II presents experimental results of the AND/XOR logic and clock recovery circuits. Section III assesses their performance in a subsystem environment, demonstrating header extraction at 10 Gb/s. Finally, Section IV discusses the design and implementation of the optical packet scheduling switch with the assistance of the optical circuits discussed before.
II. DIGITAL OPTICAL CIRCUITS: DESIGN AND IMPLEMENTATION
A. AND/XOR Boolean Logic Circuits
Single and dual rail logic operations such as the Boolean AND/XOR are crucial for any real processing application. Both AND gates and XOR comparators are important for building complex optical circuits like binary adders [29], address recognition schemes [17]-[22], data encoders/decoders, or optical decision circuits [6]. Digital AND is the most common Boolean function and has been used in demultiplexing of optical time-division multiplexing (OTDM) signals [30], [31], and wavelength conversion [32], [33], as well as 3R-regeneration [34]. For such operations, the UNI is a very promising candidate, because its balanced structure compensates for the long-lived nonlinearities of the SOA [10] and allows for its cascadability [31]. All-optical bitwise AND and XOR logic using the SOA-assisted UNI gate has been reported so far up to 100 [11] and 20 Gb/s [12], respectively. Fig. 1 shows a schematic diagram of the UNI gate in a counterpropagating configuration for single or dual rail operation. As in other interferometric devices, the UNI gate relies on measuring the differential phase change between the two polarization components of the signal that must be switched, which is termed the clock (CLK) signal in Fig. 1. On entering the UNI, the clock signal is resolved into its two orthogonal polarization components, which are relatively delayed for the purposes of measurement. The relatively delayed polarization components are obtained using a polarization split-and-delay (PSD) element that consists of a polarization beam splitter (PBS), cross-spliced at 45° with a length of polarization maintaining fiber (PMF). The differential phase between the two orthogonal polarization components is provided by the fast nonlinearity of an SOA through its interaction with a second control signal whenever this is present.
To complete the UNI gate, an identical PSD element is used at the exit of the SOA, whose purpose is to remove the relative delay of the polarization components and force them to interfere in a second PBS. In the absence of any control pulse the incoming clock signal emerges in one of the ports of the second PBS, which we call Unswitched-U port.
The device is configured as an AND gate if one control pulse is introduced into the SOA so that it coincides with one of the two clock polarization components. The control pulse imposes a local, time-dependent refractive index change in the SOA that causes a relative phase change of π between the two components, so that after interference in the output PBS, the total polarization vector of the clock signal rotates by 90°. In that case, the output pulses appear at the second port of the PBS, which we call Switched-S port. In order to perform dual rail XOR logic, a second counterpropagating control pulse is introduced in the UNI. This control pulse is temporally synchronized with the second clock signal polarization component and is arranged to impose a phase change of π on this component too. With this arrangement, whenever the two control signals are present, the relative phase difference between the polarization components of the clock becomes zero again at the exit PBS. As such, the clock signal exits from the U port, as it does in the absence of any control pulse, and appears at the S port only when exactly one of the two control pulses is present. In the following, we present experimental results of AND/XOR logic using the UNI gate at 20-Gb/s line rate.
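The switching rule described above can be sketched with an idealized two-beam interference model. This is a minimal illustration, not the authors' device model: it assumes a lossless interferometer in which each control pulse imposes exactly π on its own clock polarization component, so that the normalized S-port power is sin²(Δφ/2) and the U-port power is cos²(Δφ/2). The function names are hypothetical.

```python
import math

def uni_ports(ctrl_a: int, ctrl_b: int, clk: int = 1, phase: float = math.pi):
    """Return normalized (S, U) port powers of an idealized UNI gate.

    Assumption: control A shifts the phase of one clock polarization
    component by `phase`, control B shifts the other component, and the
    output PBS converts the differential phase into port power.
    """
    dphi = phase * ctrl_a - phase * ctrl_b
    s_port = clk * math.sin(dphi / 2) ** 2   # switched port
    u_port = clk * math.cos(dphi / 2) ** 2   # unswitched port
    return s_port, u_port

def xor_gate(a: int, b: int) -> int:
    """Dual rail operation: CLK held at logical 1, result read at the S port."""
    return round(uni_ports(a, b)[0])

def and_gate(ctrl: int, data: int) -> int:
    """Single rail operation: the data signal enters as CLK, one control only."""
    return round(uni_ports(ctrl, 0, clk=data)[0])

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "XOR:", xor_gate(a, b), "AND:", and_gate(a, b))
```

In a real gate the imposed phase is neither exactly π nor identical for the two controls, which is one reason a finite contrast ratio is measured rather than ideal extinction.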
1) Experiment: Three optical signals are used as inputs into the gate: control signals A and B, and clock signal, CLK. The A XOR B logic result is imprinted on the CLK signal, which is held continuously to a logical 1 on input. Fig. 2 shows the block diagram of the UNI gate configured to perform XOR logic.
The optical signals were produced by two gain-switched distributed feedback (DFB) lasers LD1/LD2 driven at 10 GHz and emitting at 1545.2 and 1554.6 nm, respectively. Both lasers produced 9-ps optical pulses after linear compression. LD1 provided the optical CLK and Control A signals, whereas LD2 provided the Control B signal. The output of LD2 was modulated using a LiNbO3 modulator, driven from a programmable pulse generator. All the optical streams had their repetition rate doubled by bit interleaving in a split-relative-delay-and-recombine fiber doubler and were introduced in the UNI gate.
The UNI gate itself consisted of two similar PSD structures, designed to introduce and remove a relative delay of 25 ps before and after the SOA. The active element was a 1.5-mm bulk InGaAsP/InP ridge waveguide SOA, with small signal gain of 30 dB at 1558.9 nm and 80 ps recovery time, when driven with 0.7-A dc current. The two control signals were arranged to counterpropagate the CLK signal in the SOA. Finally, the control signal A and B pulses were synchronized with the relatively advanced and delayed components of the CLK signal.
2) Results: The performance of the gate was evaluated for several control data patterns, including full duty-cycle signals. Here, we show results for Control A as a 20-GHz full duty-cycle signal and Control B as a 20-Gb/s pseudodata pattern that was obtained by driving the modulator at various frequencies with different duty cycles. At the output of the repetition doubler, the pattern obtained consisted of a series of alternating "1s" and "0s" followed by a number of consecutive "1s," and at the end an equal series of alternating "1s" and "0s." By readjusting the modulator drive frequency and pulse width, it was possible to obtain patterns with different number of consecutive "1s."
For successful Boolean XOR operation between A and B, both control signals must be present in the gate and the switched port must record a logical "1" if either control pulse is "1," and a logical "0" if both control pulses are "1" or "0." Fig. 3 illustrates the logic output of the gate with data for the four logical combinations of Control A and B. Control B is a 32-bit-long pseudodata pattern that was obtained by driving the modulator at 625 MHz. Bit-by-bit checking proves that correct XOR operation has been obtained. The contrast ratio between the ON-OFF states of the switch port of the gate was 6:1 and the required energies of CLK, Control A and B pulses were 2.5, 15, and 4 fJ, respectively. The difference in switching energies between A and B pulses is a result of modal gain difference in the SOA. Note that the switching energy values are indeed low. This implies that the gate can in principle operate in an integrated circuit optimized for low losses without the need for external signal amplification.
The switching quality of the gate depends strongly on the precision by which the optical signals are synchronized and is expected to degrade in the presence of timing jitter. This degradation was evaluated by recording the change in the switched bits, as the arrival time of Control A was varied with respect to the arrival time of the other two signals. Fig. 4 shows the variation of the power in the S-port of the device and indicates that the temporal window within which the output degrades by 3 dB is about 30 ps wide. This large value of the switching window is in part due to the relatively long pulses used, but nevertheless shows that the device will be tolerant to timing jitter in the incoming data signals.
In conclusion, it is worth mentioning that the 20-Gb/s data rate of these dual rail logic experiments (i.e., XOR operation) using SOA-based optical gates is, to our knowledge, the highest reported so far, and that this rate is limited mainly by the slow relaxation time of the SOA used. However, the significant progress made during the last few years toward reducing the relaxation time in SOAs indicates that the operational speed of optical gates may in principle be increased. For example, quantum dot devices with gain recovery time less than 3 ps have been recently demonstrated [35], and several techniques for speeding up the carrier recombination time in bulk semiconductors have been developed, including optical pumping at transparency [36] and continuous-wave (CW) assist light injection [37].
B. All-Optical Packet Clock Recovery Circuit
Clock recovery circuits are front-end units used to generate the local clock signal, to assist in performing data recovery in a receiver, and to provide the means for synchronization in the following layers of gates used in digital signal processing. In all-optical packet-switched network nodes, clock recovery circuits are likely to be used to provide the clock signal of all-optical logic gates of the type discussed in the previous section. Compared to circuits used for synchronous digital hierarchy (SDH) data transmission, packet clock recovery circuits have special requirements, since they must accommodate burstiness in traffic. In the general case, packets arriving in a node from different destinations will be neither synchronous with respect to each other nor of necessarily equal length [38]. In order to minimize bandwidth waste in synchronization, clock recovery circuits must provide instantaneous clock signal generation at the beginning of the packet and immediate clock signal termination at the end of the packet. Rapid clock signal termination is particularly required for clock recovery units designed for use in all-optical circuits with all-optical gates. This is because continuing clock signal generation in the absence of data packet signals will lead to extraneous logic outputs from the optical gates and errors in the circuits. To avoid these errors, guard bands between the packets must be used, reducing channel bandwidth efficiency. It is therefore highly desirable for the local clock recovery unit to be capable of generating a clock signal of duration, ideally, equal to the length of each packet. Several techniques have been proposed so far for all-optical clock recovery. These include synchronized mode-locked ring lasers [13], [14], electronic phase-locked loops [15], use of a tuned Fabry-Pérot (FP) etalon [39], and self-pulsating DFB lasers [16].
Ring lasers, phase-locked loops, and FP etalons on their own are not suitable for data packets because they require a continuous data stream as input and they require a relatively long time to synchronize. Self-pulsating DFB lasers acquire phase locking within a few nanoseconds, and their clock signal persists in oscillating for more than 100 ns after the data signal ceases [40] . Even though these units are mature and proven in applications for clock recovery for burst-mode transmission at up to 40 Gb/s [38] , [40] , due to the long persistence of the generated clock signal they may not be ideal for powering all-optical logic circuits.
In this section we present a packet clock recovery circuit that uses an FP etalon tuned to the line rate and a high-speed amplitude equalization function that is provided by the nonlinear switching function of a UNI gate. The operation of the device has been tested for both continuous data streams and optical packets at 10 Gb/s [26] . Key features of this technique are that clock acquisition is achieved within a small number of bits and is independent of the line rate, and the derived clock signal persists, approximately, only for the duration of the original data packet. Other practical advantages of the circuit are that it is self-synchronizing, it does not require high-frequency electronic circuitry, and at least in principle it may be integrated. Clock recovery with this approach is therefore particularly applicable to optical packet switching and is expected to assist the implementation of more complex processing circuits toward optical packet routing, as will be discussed in Sections III and IV.
1) Experiment: The experimental setup of the packet clock recovery is depicted in Fig. 5 and consists of two parts: the optical packet generator and the clock recovery unit that comprises an FP etalon and a UNI gate. The circuit takes advantage of the decaying exponential impulse response function of the filter, in order to partially fill the "0s" of the incoming data stream by preceding "1s," and of the nonlinear transfer function of the UNI gate to equalize the partially filled "1s." This arrangement results in a clock signal with nearly constant amplitude and duration nearly equal to the input packet.
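The fill-and-equalize principle can be illustrated with a toy discrete-time model. The sketch below is not the theoretical model of [26]; it assumes a lossless etalon whose round-trip time equals one bit period, in-phase addition of successive pulses, the standard finesse relation F = π√R/(1 − R), and a hypothetical saturating gate transfer in which the imposed phase grows linearly with control power and clamps at π. All function names and the parameter `p_pi` are illustrative assumptions.

```python
import math

def mirror_reflectivity(finesse: float) -> float:
    """Solve F = pi*sqrt(R)/(1-R) for the intensity reflectivity R."""
    x = (-math.pi + math.sqrt(math.pi ** 2 + 4 * finesse ** 2)) / (2 * finesse)
    return x * x

def etalon_output(bits, finesse: float):
    """Per-bit output power of the etalon, normalized to a long run of 1s."""
    r = mirror_reflectivity(finesse)
    field, powers = 0.0, []
    for b in bits:
        # One round trip per bit slot: stored field decays by R, new pulse adds (1 - R).
        field = r * field + (1 - r) * b
        powers.append(field ** 2)
    return powers

def uni_equalize(power: float, p_pi: float = 0.25) -> float:
    """Assumed gate transfer: phase proportional to power, clamped at pi."""
    phase = math.pi * min(1.0, power / p_pi)
    return math.sin(phase / 2) ** 2

def modulation_db(powers) -> float:
    """Ratio of highest to lowest pulse power, in dB."""
    return 10 * math.log10(max(powers) / min(powers))

if __name__ == "__main__":
    pattern = [1] * 50 + [0, 0, 0] + [1] * 5
    fp = etalon_output(pattern, finesse=20.7)
    clock = [uni_equalize(p) for p in fp]
    print("FP output modulation (dB):", round(modulation_db(fp[40:]), 2))
    print("Gate output modulation (dB):", round(modulation_db(clock[40:]), 2))
```

The clamped transfer is the crudest possible stand-in for SOA saturation; it only shows why a strongly modulated clock-resembling signal can emerge from the gate with nearly equal pulse amplitudes.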
The circuit was tested at 10.326 Gb/s in order to match the free spectral range (FSR) of the available FP etalon. For the generation of the optical packets, a gain-switched DFB laser (LD1) at 1.290 75 GHz was used to produce 8-ps pulses at 1549.2 nm. The optical signal was then modulated with a 2 1 pseudorandom bitstream electrical signal and three-times bit interleaved to generate a 10.326-Gb/s pseudodata stream. A second modulator, driven from a programmable pulse generator, was used to form the optical packets at the input of the clock recovery unit. The packet stream was then launched in the FP etalon. The finesse of the filter was 20.7, corresponding to a lifetime roughly of 7 bits. The output of the FP was then amplified and was introduced as the control signal in a UNI gate. The optical gate was powered by a CW signal obtained from a second DFB laser (LD2), operating at 1545 nm. The two signals were arranged to interact in counterpropagating fashion within the UNI gate. The UNI was optimized for 10-Gb/s operation, so that the PSD element (see Fig. 1 ) was designed to provide 50 ps of differential delay between the orthogonal polarization components of the CW signal. The active element was again a 1.5-mm bulk InGaAsP-InP ridge waveguide SOA. The UNI operated as an AND gate, and in the presence of control pulses, the outcome of the derived packet clock signal appeared in its S-port.
2) Results: Data packets of different length, period, and content have been introduced in the clock recovery circuit by changing the width, period, and delay of the electrical pulse that drives the second modulator in order to evaluate its performance. Fig. 6 shows typical results obtained from the circuit with input data traffic consisting of short, 40-bit-long (i.e., 4 ns duration) packets arriving with a period of 12.5 ns. Fig. 6(a) displays an example of the packets in the data stream and Fig. 6(b) shows the electrical spectrum of the data packet sequence. Fig. 6(c) and (d) shows the corresponding output from the FP etalon. Note that this is a clock-resembling signal, highly amplitude modulated, whose length is roughly equal to the original packet length, extended at its leading edge by the rise time of the FP and at its trailing edge by its lifetime. The effect of the filter is primarily to suppress all data modes outside its full-width at half-maximum (FWHM) (i.e., 500-MHz band centered at the line rate 10.326 GHz), as shown in Fig. 6(d) .
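As a quick consistency check of the filter figures quoted in this section, the etalon bandwidth and storage time can be estimated from the FSR and the finesse. The sketch assumes the textbook relations FWHM = FSR/F and a 1/e field decay time of 1/(π·FWHM); reading the quoted "lifetime" as the field (rather than intensity) decay time is an assumption on our part, chosen because it comes close to the stated value of roughly 7 bits.

```python
import math

def etalon_fwhm_hz(fsr_hz: float, finesse: float) -> float:
    """Transmission bandwidth (FWHM) of a Fabry-Perot etalon."""
    return fsr_hz / finesse

def field_lifetime_bits(finesse: float) -> float:
    """1/e field decay time in bit periods when the FSR equals the line rate.

    tau = 1 / (pi * FWHM) and bit period = 1 / FSR, hence tau / T_bit = F / pi.
    """
    return finesse / math.pi

if __name__ == "__main__":
    fsr = 10.326e9   # Hz, matched to the line rate
    finesse = 20.7
    print("FWHM (MHz):", etalon_fwhm_hz(fsr, finesse) / 1e6)
    print("Field lifetime (bits):", field_lifetime_bits(finesse))
```

Under these assumptions the bandwidth comes out just under 500 MHz and the lifetime between 6 and 7 bit periods, in line with the values given in the text.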
By inserting this clock-resembling signal as control into the UNI, copolarized with one of the CW components in the SOA, a nearly equal amplitude clock packet is obtained at the S-port. The output clock packet is depicted in Fig. 6(e), revealing an amplitude modulation (highest to lowest pulse ratio) of less than 1.5 dB. The contrast ratio between the ON-OFF states of this port of the gate is 50:1. The corresponding RF spectrum is shown in Fig. 6(f), where it is evident that all data modes are suppressed in excess of 35 dB compared to the 10-GHz clock component, while the 80-MHz spaced packet subharmonics remain unaffected. The rms timing jitter of the extracted packet clock was measured by microwave spectrum analysis and was found to be less than 1 ps. Switching power and mean energy per pulse for the CW and the control signal were 1 mW and 120 fJ, respectively. These two parameters determine the saturation level of the SOA and enable the control pulses to effect a differential phase change of roughly π between the two components of the CW signal.
Compared to the original data packets, the derived clock packets are extended at their leading and trailing edges due to the lifetime of the FP filter. However, due to the saturation properties of the SOA, the rise time of the packet clock signal is much shorter than its decay time or the lifetime of the FP. In the leading edge of the packet, the SOA is less saturated and can provide higher gain to the low-energy pulses that appear at the output of the FP filter, so that they can cause enough phase shift on the CW component as well. For the entire set of the generated and evaluated packets, the recovered clock was found to have a rise time of less than 3 bits and a fall time of less than 10 bits. Therefore, if this clock recovery unit is used for bit-wise processing of optical packets, fixed guard-bands of 3 bits before and 10 bits after each packet are required. Clearly, the rise and fall times of the clock recovery circuit and hence the required guard-bands depend on the finesse of the FP filter. In turn, the finesse of the filter determines its ability to fill the missing "1s" in the data packet and must therefore be determined by the expected sequence of continuous "0s" in the data stream. In order to study the performance of the circuit in terms of the number of consecutive "0s" and "1s" in the data stream, a theoretical model has also been developed [26]. For this discussion, we focus on the worst case scenario, whereby the packet consists of a sequence of consecutive "0s" followed by 31 consecutive "1s." Fig. 7(a) and (b) shows the amplitude modulation at the output of the FP and the gate, respectively, as a function of consecutive "0s" in the incoming packet for various filter finesses. It is evident that an FP etalon of finesse 80 would be required in order to handle packets containing up to 30 consecutive "0s." In that case, the nonlinear gate reduces the amplitude modulation by 16.5 dB, depending on the saturation level of the SOA. Fig. 7(c) shows the rise and fall times of the recovered packet clock as a function of the filter finesse for the worst case packet described before.
As expected, using an FP with a higher finesse assists in reducing the amplitude modulation of the packet clock at the expense of slower rise and fall times in the derived signal. In the case of data packets with up to 30 consecutive "0s," requiring a filter of finesse 80, the rising edge of the packet clock is still only 3 bits long, while its trailing edge is 82 bits long. These 85 bits may be assumed to be part of the overhead associated with packet switching, and its severity depends on the protocol and the packet sizes employed. For example, assuming 424-bit-long asynchronous transfer mode packets at 10 Gb/s, this synchronization overhead is 16%, while for the short end of IP packets (i.e., 40 bytes), this becomes 21%. This overhead is not excessive, and it should be noted that it does not increase as percentage of the channel bandwidth as the line rate increases. The rise and fall times of the circuit are defined by the finesse of the FP as fixed numbers of bits and are independent of the rate. Given that the UNI gate has been shown to operate at significantly higher data rates [11] , by choosing an FP filter of the appropriate FSR, operation of this clock recovery circuit should be extendable to higher rates. In this case the guard-bands between packets will scale simply with the bit period and will not increase in numbers of bits. Recently, this clock recovery scheme was successfully applied to traffic of 10-Gb/s asynchronous short data packets by taking advantage of the fact that it self-synchronizes and requires small guard-bands [41] .
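The overhead figures above can be approximated with a short calculation. The sketch assumes the overhead is defined as guard bits divided by the total slot (packet plus guard bits); this definition is not stated explicitly in the text, but it comes close to both quoted percentages. The function names are hypothetical.

```python
def sync_overhead(packet_bits: int, guard_bits: int) -> float:
    """Fraction of each packet slot spent on clock rise and fall guard bits."""
    return guard_bits / (packet_bits + guard_bits)

def guard_time_ns(guard_bits: int, line_rate_gbps: float) -> float:
    """Guard-band duration; it shrinks with the bit period as the rate grows."""
    return guard_bits / line_rate_gbps

if __name__ == "__main__":
    guard = 3 + 82  # rise + fall bits for a finesse-80 filter
    print("ATM cell (424 bits):", round(100 * sync_overhead(424, guard), 1), "%")
    print("40-byte IP packet:", round(100 * sync_overhead(40 * 8, guard), 1), "%")
    for rate in (10.0, 40.0):
        print(rate, "Gb/s guard time (ns):", guard_time_ns(guard, rate))
```

Because the guard-band is fixed in bits, the overhead fraction depends only on the packet length in bits, while the guard time in nanoseconds falls in proportion to the line rate.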
The following section describes an application of this clock recovery circuit in an all-optical technique for the separation of the header from the payload of an optical packet.
III. A 10-Gb/s HEADER EXTRACTION CIRCUIT
A crucial operation in a packet-switched node that lends itself to execution with all-optical techniques is the "on the fly" separation and recognition of the address information embedded in an optical packet. Toward this goal, several methods have been proposed so far for the payload/header separation [17]-[22]. Even though these methods have demonstrated the in-principle feasibility of all-optical header extraction, the circuits proposed increase rapidly in complexity with the number of header bits and the length of the packet.
In this section, we describe a relatively simple, all-optical header extraction circuit that consists of the clock recovery circuit described before, with an additional UNI logic gate at its output. Successful operation of this unit has been demonstrated for 10-Gb/s synchronous short data packets [27] . This method does not make use of any high-speed electronics and requires only the 10-Gb/s packet as input for the separation task to be performed. Additionally, it takes advantage of the small guard-band requirements of the packet clock recovery circuit to operate with minimized bandwidth efficiency degradation.
A. Concept and Experiment
The block diagram of the proposed circuit is shown in Fig. 8 . The circuit consists of three subunits: the previously described optical packet generator and packet clock recovery circuit and an additional high-speed UNI gate (UNI 2), built identically to UNI 1 as shown in Fig. 8 . The output of the packet generator is split into two parts: one used as input in the clock recovery circuit and one to enter as data signal in UNI 2. The extracted optical packet clock signal that persists for the duration of the original data packet is used as the control signal in UNI 2. For header separation, UNI 2 is configured to perform a simple AND logic operation between the original incoming packet stream and a delayed version of the recovered packet clock. Successful header extraction is obtained if the original packet and the extracted clock are relatively delayed in time by an amount equal to the packet address length plus the rise time required for clock acquisition. As such, the packet clock is delayed in an optical delay line, so that only the payload bits of the original packet fall within the window of the recovered clock and are therefore switched at the output of the gate. In this fashion, header information appears at the U-port of UNI 2, since there are no clock pulses to switch the gate and the payload data exits from S-port.
In order to assist the header extraction process, there are three packet formatting requirements for the incoming data signal that have to be considered. Due to the very short but nonzero rise and fall times of the packet clock recovery circuit that stem from the storage property of the FP filter, a number of consecutive "1s" at the leading edge of the packet and a fixed guard-band of "0s" between packets are required. The "1s" at the front of the packet and the "0s" at its end have to persist for the duration of the rise and the fall time of the clock recovery circuit, respectively. Additionally, in order to avoid switching with the first imperfect bits from the packet clock, a number of consecutive "0s," equal to its rise bits, is also required in between payload and header information. Moreover, the guard-band of "0s" between successive packets has to be further extended, so as to take into account the delay required between original packet and packet clock. As a result, the bandwidth efficiency of the technique, determined by these requirements, increases for packets with smaller header-to-payload ratios.
B. Results
In this experiment, packets of different length, period, and content have been generated by rate multiplication with the packet generator circuit used in the experiments described in Section II-B, and were used to test the performance of the header extraction circuit at 10-Gb/s nominal rate. Fig. 9(a) shows typical results for two such packets at the input of UNI 2. In the examples shown here, the packets are 30 bits long (3 ns duration) and arrive at 6.2-ns time intervals. In this case, the FP used had a finesse of 20.7, resulting in packet clock rise and fall times of 2 and 8 bits. As a result, these time constants require that the packets used here are formatted so as to contain two "1s" as preamble bits, 8 bits assigned to header, a guard-band of two "0s" between header and payload, and finally 18 bits for the payload. More precisely, the headers of the packets shown in Fig. 9(a) are the sequences "01 011 001" and "10 111 000," and their respective payloads are the sequences "001 111 001 000 110 010" and "101 001 101 111 000 001." Note that the last packet formatting requirement for a guard-band between consecutive packets of at least 18 "0s" (2 preamble bits + 8 header bits + 8 fall-time bits, i.e., 1.8 ns) is satisfied by the period and length of the incoming packets. Fig. 9(b) illustrates the output of the packet clock recovery circuit as it enters UNI 2, properly delayed by 1 ns with respect to the original packet. To complete the header/payload separation task, the synchronized packet and self-extracted clock signals are logically ANDed together in UNI 2. Fig. 9(c) and (d) show the headers and payloads of the packets at the U- and S-ports, respectively. The pulse energies required for the signals interacting in UNI 2 were 2 fJ for the data pulses and 24 fJ for the packet clock pulses. As seen, the switching energies of UNI 2 are significantly lower than the corresponding energies in UNI 1 of the clock recovery circuit.
This is because in UNI 1 the SOA operates under heavy saturation, so as to remove the amplitude modulation of the clock resembling output from the FP filter. The average contrast ratio between the ON-OFF states of UNI 2 is 8:1 for the payload (S-port) and 6:1 for the header (U-port). The data patterning observed in Fig. 9 (c) and (d) is primarily due to imperfect rate multiplication and can be largely eliminated if a proper 10-Gb/s transmitter source is used.
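The separation step can be mimicked at the bit level with an idealized model of UNI 2. The sketch below uses the first packet format quoted above and assumes a perfect clock window that starts after the applied delay and lasts to the end of the packet; the real recovered clock has finite rise and fall times, which is why the preamble and guard bits exist. The function name and the 62-bit period (6.2 ns at the nominal 10-Gb/s rate) are illustrative assumptions.

```python
PREAMBLE = "11"
HEADER = "01011001"
GUARD = "00"
PAYLOAD = "001111001000110010"

def separate_header(packet: str, delay_bits: int):
    """Return (U-port, S-port) bit strings for an ideal AND with a delayed clock.

    Bits that coincide with the clock window are switched to the S port;
    bits that arrive before the delayed clock stay at the U port.
    """
    data = [int(b) for b in packet]
    clock = [0] * delay_bits + [1] * (len(data) - delay_bits)
    s_port = "".join(str(d & c) for d, c in zip(data, clock))
    u_port = "".join(str(d & (1 - c)) for d, c in zip(data, clock))
    return u_port, s_port

if __name__ == "__main__":
    packet = PREAMBLE + HEADER + GUARD + PAYLOAD
    delay = len(PREAMBLE) + len(HEADER)      # 10 bits, i.e. about 1 ns at 10 Gb/s
    u_port, s_port = separate_header(packet, delay)
    print("U port (header): ", u_port)
    print("S port (payload):", s_port)

    # Inter-packet guard-band check: 2 preamble + 8 header + 8 fall-time bits.
    required_gap = len(PREAMBLE) + len(HEADER) + 8
    available_gap = 62 - len(packet)         # assumed 62-bit packet period
    print("gap required/available:", required_gap, available_gap)
```

Changing the address length in this model only changes `delay`, which mirrors the point that the circuit complexity does not grow with the header size.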
The proposed technique offers the advantage of serial, bit-by-bit processing on a packet-by-packet basis with small associated bandwidth overhead, as a result of the fast, self-synchronization capability of the clock recovery circuit. Finally, the circuit can easily handle packets of any length or with any address to payload ratio with no increase in its complexity. In fact, for packets with different address lengths, one has only to alter the temporal delay between the optical signals at the input of UNI 2.
IV. OPTICAL PACKET SCHEDULING SWITCH
For signals transmitted in optical networks, avoiding O/E/O transitions and performing as many processes as possible in the optical domain has been a long-established aim. Recent technology maturation and advances hold promise that this goal is now within closer reach. Even though deep processing for the routing of optical packets might still be performed in the near future by high-speed electronics, progress has been significant in optical pattern recognition and routing [17]-[22]. As discussed in Sections II and III, the modules described possess key functionalities that may be used for the implementation of an all-optical packet-switched node. In this section, we describe the application of digital optical circuits toward the realization of the "optical packet-scheduling switch." The concept and architecture of the optical packet-scheduling switch are described in detail in [28], and here the discussion focuses on its implementation using the optical modules described previously.
The block diagram of this switch is depicted in Fig. 10. It is designed to guarantee lossless communication, for sessions that obey a specified "burstiness" property, and to tolerate the delay induced when transforming them into smooth sessions through the use of input flow control. The optical packet-scheduling switch may comprise six subunits: 1) the packet clock recovery circuit for packet-bit synchronization; 2) the header extraction circuit, which incorporates the packet clock recovery functionality; 3) a header processing circuit that solves the packet routing algorithm for all packets, so that they may be routed to their desired outgoing links, and at the same time ensures that no contentions occur within the switch; 4) an XOR gate for header reinsertion; 5) a time slot interchange unit, termed Scheduler, whose role is to implement programmable delays between packets, as defined by the header processing unit, and to rearrange them, avoiding internal collisions; 6) a switching fabric for packet routing to their desired outgoing link that can be electrically or optically controlled.
On arriving at the node through an incoming link, the packet stream enters the header extraction circuit. This circuit comprises a packet clock recovery circuit, which in conjunction with an AND gate performs header separation on each packet, as previously described in Section III. This same clock recovery circuit also provides the clock signal used for powering all the following optical gates in the fabric and for new header reinsertion, or the control signal for the switches incorporated in the Scheduler. The extracted address information is then processed in the header processing unit. The aim of this unit is to recognize the address information of each packet and to generate the signals that determine the specific route of each packet inside the Scheduler and the states of the switches employed in the switch fabric, as well as to generate the new headers for the data packets.
In principle, part of these operations can be performed all-optically using a local address generator [22], a series of optical gates for header matching, followed by optical flip/flops [23] and a wavelength converter [32], [33] for the recovered packet clock, in order to obtain the control signal for the switches. More specifically, the extracted header on input to the processing unit is compared with each possible locally generated address. This may be performed using either XOR comparators [12], [42], or optical gates in a different configuration that generate correlation pulses [7] when the incoming header matches the local one. The output correlation pulse can be fed to an optical flip/flop that can change between two different CW wavelength states, so as to set it in the desired state (wavelength). In that way header information is shifted into the wavelength domain [21]. The generated CW signal powers a wavelength converter, whose state is controlled optically by the previously generated packet clock. The wavelength-converted packet clock signal forms the control signal for the following optical switches used in the Scheduler and the switch fabric, so as to map out a specific route, according to the wavelength information.
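The XOR-based header matching step can be sketched at the logical level as follows. This is an illustrative model under stated assumptions, not the optical implementation: headers and local addresses are represented as lists of bits, the gates are ideal, and the returned index simply stands in for the flip/flop state that would be selected. The function names are hypothetical.

```python
def xor_compare(header, local_address):
    """Bitwise XOR of two equal-length words; all zeros means a match."""
    return [h ^ a for h, a in zip(header, local_address)]


def match_header(header, local_addresses):
    """Return the index of the first local address equal to header, else None.

    The index is a stand-in for the wavelength state selected by the
    matching comparator (an assumption made for illustration).
    """
    for index, address in enumerate(local_addresses):
        if len(address) == len(header) and not any(xor_compare(header, address)):
            return index
    return None
```

For example, `match_header([1, 0, 1, 1], [[0, 0, 1, 1], [1, 0, 1, 1]])` selects the second local address, since only its XOR output contains no pulses.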
Before entering the Scheduler units, the packets have their new addresses reinserted using a UNI gate that performs XOR logic operation. In practice, the new header and payload would be properly synchronized before the XOR gate in a way so as to be sequentially imprinted on the packet clock, as discussed in Section II. The Scheduler can in principle consist of a series of selectable fiber delay stages connected with 2 × 2 exchange-bypass switches that may be controlled optically. High-speed all-optical exchange-bypass switches can be easily implemented, based on a special configuration of UNI that incorporates two input and two output ports. In the absence of optical control pulses, the switch operates in the BAR state, and both data inputs pass straight through to the output ports. On the other hand, if control pulses are present, the switch is in the CROSS state and the two data inputs are interchanged at its output [43]. The state of the exchange-bypass switches is set by the header processor output, so as to define the matrix of possible packet delays in order to avoid internal collisions and resolve contention in the node [28].
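The BAR/CROSS behavior of the exchange-bypass switch can be captured in a short logical sketch. It assumes an ideal switch that is set independently in every bit slot by the presence or absence of a control pulse; the function names are hypothetical and no physical effects of the UNI are modeled.

```python
def exchange_bypass(in_a, in_b, control):
    """Ideal 2 x 2 switch: BAR without a control pulse, CROSS with one."""
    return (in_b, in_a) if control else (in_a, in_b)


def switch_streams(stream_a, stream_b, control):
    """Apply the switch slot by slot to two input bit streams."""
    out_a, out_b = [], []
    for a, b, c in zip(stream_a, stream_b, control):
        x, y = exchange_bypass(a, b, c)
        out_a.append(x)
        out_b.append(y)
    return out_a, out_b
```

With inputs `[1, 1, 0, 0]` and `[0, 0, 1, 1]` and control `[0, 0, 1, 1]`, the first two slots pass straight through and the last two are interchanged, so all pulses leave on the first output.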
Finally, the packets enter the optical switch fabric, whose role is to route the appropriately time-delayed optical packets to their requested outgoing routes. Several topologies have been proposed so far for implementing an N × N switch fabric using discrete switching devices [44]. All these topologies can in principle be implemented in the optical domain using either a set of UNI gates configured to perform simple AND logic or wavelength converters and wavelength-selective elements, or even SOA gate arrays in a more robust implementation [45]. Alternatively, an equal series of 2 × 2 optical exchange-bypass switches can be incorporated, as depicted in Fig. 10. All these implementations offer a significant performance advantage, since the internal optical paths of the switch fabric can be optically reconfigured on the bit level.
The main concept of the switch is to re-arrange incoming packets in a way that each packet requests a different outgoing link. The scheduling algorithms and the correct packet delay are determined by the header processing unit. A unique feature in the scheduler concept is its modular design, as it can handle traffic with exponentially higher burstiness, in the sense that adding a single switching element results in doubling its buffering capacity. This feature is not exhibited by other optical packet switches, which use fiber inefficiently only for inducing a fixed delay. To this end, the packet scheduling switch can guarantee lossless communication, if flow control protocols are used to force the traffic entering the network to comply with the highest burstiness property allowed [28] .
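The doubling property can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the selectable delay stages are binary weighted (stage i adds 2^i packet slots when selected); the actual stage delays are specified in [28] and are not reproduced here. The function names are hypothetical.

```python
def achievable_delays(num_stages, slot=1):
    """Delays reachable with binary-weighted selectable stages (assumed)."""
    delays = {0}
    for i in range(num_stages):
        stage = slot * 2**i
        # Each added stage is either bypassed or selected, doubling the set.
        delays |= {d + stage for d in delays}
    return sorted(delays)


def stage_settings(delay, num_stages):
    """Select (1) or bypass (0) each stage to realize a given delay."""
    if not 0 <= delay < 2**num_stages:
        raise ValueError("delay not reachable with this number of stages")
    return [(delay >> i) & 1 for i in range(num_stages)]
```

Under this assumption, three stages cover delays of 0 to 7 slots and a fourth stage extends the range to 0 to 15 slots, which is the sense in which one additional switching element doubles the buffering capacity.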
It is also important to point out the overall practical benefits, from the system perspective, that arise from the main features and functionalities provided by the demonstrated optical circuits. First, the presented devices are particularly attractive for use in packet switching architectures, as they have the ability to operate on a packet-by-packet basis, even for asynchronous data traffic, and also allow for "on the fly" bit processing. Fast self-synchronization capability is provided due to the properties of the packet clock recovery circuit, offering improved bandwidth efficiency. Finally, the packet switch benefits from the simplicity of the design, as two of the prerequisite functionalities, i.e., time synchronization and header extraction, are performed using only two optical gates, and the configuration of the Scheduler is freed from complicated designs, since the overall number of required switching devices is minimized by incorporating 2 × 2 optical switches triggered by packet clock signals.
V. CONCLUSION
We have demonstrated the successful operation of a number of critical modules that are needed for the realization of an all-optical packet node. We have also discussed how these units can be integrated to perform packet switching. The modules operate without the need for high-speed electronics and possess the ability for "on the fly" signal processing on a packet-by-packet basis with small bandwidth overhead, avoiding the requirement for optical buffering. Even though the realization of a "true" optical packet switch has not yet materialized, the rapid development and maturation of all-optical technology is likely to lead to this soon.
