Abstract: In this paper, we review recent advances in ultrafast optical time-domain technology with emphasis on its use in optical packet switching. In this respect, several key building blocks, including high-rate laser sources applicable to any time-division-multiplexing (TDM) application, optical logic circuits for bitwise processing, and clock-recovery circuits for timing synchronization with both synchronous and asynchronous data traffic, are described in detail. The circuits take advantage of the ultrafast nonlinear transfer function of semiconductor-based devices to operate successfully at rates beyond 10 Gb/s. We also demonstrate two more complex circuits, a header extraction unit and an exchange-bypass switch, operating at 10 Gb/s. These two units are key blocks for any general-purpose packet routing/switching application. Finally, we discuss the system perspective of all these modules and propose their possible incorporation in a packet switch architecture to provide low-level but high-speed functionalities. The goal is to perform as many operations as possible in the optical domain to increase node throughput and to relieve the network of unwanted and expensive optical-electrical-optical conversions.
I. INTRODUCTION

During the past few years, we have witnessed significant changes in the optical communications industry, starting with unprecedented explosive growth and followed by a recent recession. For wired networks, this growth was due to the remarkable improvement in the performance of photonic technologies as well as the proof of their maturity, which has resulted in a spectacular decrease in transmission cost. This in turn has led to the widespread adoption and construction of new fiber networks. At this critical point of the business cycle and in preparation for the next crest of the wave, it is appropriate to review research lines that have been maturing over the years and whose results may be incorporated in next-generation products.
At the time of fiber exhaustion due to the perceived bandwidth demand, carriers responded by adding more and more equipment and increasing traffic rates. The advent and penetration of wavelength-division-multiplexing (WDM) technology has resulted in fiber transmission capacities that have increased by a hundredfold or more. Switching speeds have, however, lagged far behind the increase in the number of channels, resulting in networks in which fiber bandwidth is not efficiently exploited. This is because the hundreds of parallel channels in a fiber are employed as independent links, rather than as a shared resource. It is the router/switch throughput that really transforms the raw bit rates into effective bandwidth, and current switching technologies are capable of handling data rates of up to 10 Gb/s. While emerging asynchronous transfer mode (ATM) switches and Internet protocol (IP) routers can be used to switch data using the individual channels (typical rates of 2.5 or 10 Gb/s) within a WDM link, this approach requires tens or even hundreds of switch interfaces to terminate a single link with a large number of channels.
Research in all-optical switching and signal processing has made strides [1]-[4] so that the first commercially available units are now appearing on the market. Although general-purpose all-optical signal processing is still a long way off, there are specialized applications in high-data-rate telecommunications, where low-complexity all-optical circuits are ideally suited to provide functionalities that electronics cannot. In order for all-optical signal processing techniques to become a serious consideration, they must simultaneously possess the following four properties [5]: 1) considerable speed advantage and otherwise ability to simplify the circuit design; 2) switching energies that are similar to those of electronics; 3) capability for integration; 4) an application domain where these advantages may be of use. The purpose of this article is to present some of the recent advances in time-division-multiplexing (TDM) technology, demonstrating the important progress made so far toward the fulfillment of these requirements and more particularly the inroads made in the applications domain. Optical packet switching is a key application domain in which all-optical signal processing can contribute significantly, by increasing switching speeds and relieving the network of unwanted electrical-optical-electrical (E/O/E) signal conversions.
There are several critical subsystems that link the chain of the ultrafast time-domain technology and can be implemented all optically. These subsystems span from simple optical power supply modules and Boolean logic circuits up to complex routing/switching systems. Within this context, serious effort has been invested during the past few years in the development of optical interferometric gates [6]-[12] with Boolean logic capability [13]-[19], short-pulse, high-repetition-rate laser sources [20]-[29], and clock-recovery circuits [31]-[38] for extracting the timing information required at each network node. Going one step further, optical shift registers [39]-[42] with read/write capability [41], [42], optical packet buffers [43], optical flip-flops [44], exchange-bypass switches [45], header separation and recognition techniques [46]-[51], as well as self-routing switches [52], [53] have been successfully demonstrated, and these are only a small part of the progress made up to now in the field. All these circuits take advantage of the fast nonlinearities in semiconductor optical amplifiers (SOAs) [4], [54], resulting in a whole new class of compact devices with low power requirements, which can potentially be integrated on a single-chip module [14].
0733-8724/03$17.00 © 2003 IEEE
Within this context, in this paper, we review recent advances in the development of all-optical signal processing circuits and subsystems and discuss their possible use in optical packet switching. We present a 40-GHz SOA-based mode-locked ring laser source that produces 2.7-ps transform-limited pulses [28]. The source has several attractive features, including that it is optically mode-locked by an external signal so that its synchronization to other optical sources is easy. This source can also be used for simultaneous, multiwavelength signal generation so that it can be used in hybrid WDM/TDM system applications as well. Multiwavelength operation has been achieved for ten simultaneously mode-locked channels at a 30-GHz repetition rate each [29]. Following this, we describe the principle of operation of semiconductor-based interferometric arrangements, capable of performing bitwise processing, and we demonstrate all-optical AND logic and XOR Boolean logic at 20 Gb/s, using the Sagnac and the ultrafast nonlinear interferometer (UNI), respectively [15]. Given that interferometric gates require signal synchronization to operate, clock-recovery units are of paramount importance. Two different clock-recovery techniques are described. The first one uses an SOA-based ring laser and is applicable to synchronous data traffic, and its operation has been evaluated with 30-Gb/s data signals [32]. The second technique uses a Fabry-Pérot (FP) etalon and an optical gate and is particularly applicable to packet traffic that may be asynchronous. Its operation has been successfully verified for 10-Gb/s synchronous and asynchronous data packets [36], [37]. In order to verify the usefulness of the optical packet clock-recovery unit, a more complex all-optical circuit is demonstrated, capable of separating the packet header from the payload at 10 Gb/s.
The header/payload separation scheme uses the packet clock-recovery circuit in combination with a digital optical AND gate and demonstrates that optical logic modules can operate in subsystem applications. Similarly, using a single optical AND gate, we demonstrate an optically addressable 2 × 2 exchange-bypass switch. The AND gate has been configured to provide two inputs and outputs, and the switch has been shown to successfully operate with short data packets at 10 Gb/s. Finally, we discuss the way to use all these optical modules in an optical packet switch architecture so as to take advantage of their high-speed operation and "on the fly" processing capability. To this end, certain functions of the switch can be performed all optically, using the developed clock-recovery circuits, the header extraction units, a number of AND/XOR optical gates, and a number of cascaded 2 × 2 exchange-bypass switches.
The remainder of the paper is organized as follows. Section II presents the TDM key building blocks that include the ring laser sources, the digital logic modules, and the clock-recovery circuits. In Section III, the performance of the optical logic modules in subsystem applications is assessed, describing a header extraction unit and an exchange-bypass switch at 10 Gb/s. Finally, Section IV discusses the possible almost all-optical implementation of a packet switch architecture using the previously discussed optical modules.
II. TDM TECHNOLOGY BUILDING BLOCKS
A. High-Speed Laser Sources
High-repetition-rate short-pulse laser sources are essential for any TDM system application, in order to increase the link bandwidth utilization [55]. Performancewise, active mode locking of fiber ring cavities is one of the most promising methods for high-speed laser operation. These sources often use erbium-doped fiber amplifiers (EDFAs) as the gain medium and LiNbO$_3$ modulators for mode locking, in ring configurations [22]-[27], and nonlinear compression schemes for pulse shortening at the output [22]. These sources have demonstrated impressive performance, but they are relatively complex, requiring special attention to ensure stability against environmental perturbations due to their long cavities that include polarization-dependent elements.
We have recently presented a novel SOA-based ring laser where the SOA provides both gain and gain modulation in the optical cavity [20]. Gain modulation is obtained by injecting an external low-repetition-rate signal into the cavity. A schematic diagram of this configuration is shown in Fig. 1. The ring cavity consists of an SOA, a tunable filter for wavelength selection, isolators to ensure unidirectional oscillation, and a fiber coupler to insert the external gain-modulating signal and to extract the mode-locked signal. The principle of operation relies on the fast gain saturation of the SOA by the external optical signal. The comparatively slow gain recovery of the SOA between two successive externally introduced optical pulses defines a short temporal window of positive net gain within which the mode-locked pulse is formed. If the repetition frequency $f_{\mathrm{ext}}$ of the externally injected signal is an integer multiple of the fundamental frequency $\delta f_{\mathrm{ring}}$ of the oscillator, the repetition rate of the laser source is equal to that of the external signal. Repetition rate multiplication can also be achieved by detuning the external signal frequency so that it satisfies the condition $f_{\mathrm{ext}} = (N + 1/n)\,\delta f_{\mathrm{ring}}$, where $N$ is an integer and $n$ is an integer number greater than 1. With this technique, the source mode-locks at a repetition rate equal to $n f_{\mathrm{ext}}$ [24]; the technique has recently been referred to as rational harmonic mode-locking [21], [25].
Successful 8-times multiplication has been demonstrated using an SOA-based ring laser. The ring cavity consisted of an SOA with 400-ps recovery time and 23-dB small signal gain when driven with a 250-mA current. The external gain modulating signal was provided by a gain-switched distributed feedback (DFB) laser at 1548.9 nm, producing 7-ps pulses with 5-GHz repetition rate. Fig. 2(a) shows the 40-GHz pulse train monitored on a sampling oscilloscope. The output pulses were not transform limited due to the frequency chirp imposed by the time-dependent refractive-index change of the SOA. Thus, they were linearly compressed to 2.7 ps. Fig. 2(b) shows the corresponding second harmonic autocorrelation trace. The pulsewidth-bandwidth product is 0.34. The source exhibited a 20-nm tuning range within which the 40-GHz pulse trains displayed nearly constant pulsewidth and output power [28] .
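As a quick numerical check of the figures quoted above, the following Python sketch computes the output repetition rate for an 8-times multiplication of the 5-GHz drive signal and the spectral width implied by the quoted pulsewidth-bandwidth product. The function names are illustrative, and the value 0.315 used for comparison is the standard textbook time-bandwidth product of a transform-limited squared hyperbolic secant pulse, not a number taken from this paper.

```python
# Standard time-bandwidth product of a transform-limited sech^2 pulse
# (textbook value, assumed here for comparison only).
SECH2_TRANSFORM_LIMIT = 0.315


def multiplied_rate_ghz(f_ext_ghz, n):
    """Output repetition rate when the external rate is multiplied n times."""
    if n < 1:
        raise ValueError("multiplication factor must be a positive integer")
    return n * f_ext_ghz


def spectral_width_ghz(tbp, pulsewidth_ps):
    """Spectral width (GHz) implied by a pulsewidth-bandwidth product."""
    return tbp / (pulsewidth_ps * 1e-12) / 1e9


rate = multiplied_rate_ghz(5.0, 8)       # 5-GHz drive, 8-times multiplication
width = spectral_width_ghz(0.34, 2.7)    # quoted product and pulsewidth
excess = 0.34 / SECH2_TRANSFORM_LIMIT    # ratio to the assumed sech^2 limit

print(f"rate = {rate:.0f} GHz, spectral width = {width:.0f} GHz, "
      f"{excess:.2f} x sech^2 limit")
```

Under these assumptions the compressed pulses sit within about 10% of the sech$^2$ limit, which is consistent with describing them as nearly transform limited.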
The same concept can be applied to obtain mode-locked pulses at several simultaneously oscillating wavelengths, by taking advantage of the heterogeneously broadened structure of the SOA. This can be realized by replacing the optical filter in the cavity with a fiber FP filter, allowing line oscillation at channel spacing equal to its free spectral range (FSR). Successful operation of the device has been demonstrated for ten oscillating channels, each one mode-locked at 30 GHz [29] . Fig. 3(a) shows the optical spectrum of the laser output, while Fig. 3(b) illustrates the second harmonic autocorrelation trace of the pulse train obtained at 1568.8 nm (ninth line), indicating 6.7-ps pulsewidth after linear compression, assuming again a squared hyperbolic secant profile. Spectral analysis also showed that the timing jitter of both the single-wavelength and multiwavelength source was less than 650 fs.
Both the single- and the multiwavelength laser configurations are simple to build from fiber-based components. A key feature of the sources is that they are capable of providing short optical pulses over a broad wavelength range, dispensing with the need for high-frequency microwave drive circuits, and thus are suitable for high-speed TDM applications. In addition, a distinguishing feature is that the mode-locked channel(s) are phase-locked with the external modulating signal and are therefore ideally applicable in optical logic circuits. Finally, since their operation relies exclusively on the SOA, both sources can be integrated on a single-chip module [30].
B. All-Optical Logic Modules
For the realization of all-optical signal processing, digital optical circuits capable of providing Boolean combinatorial or sequential logic functionalities are necessary. This part describes implementations of optical gates to perform Boolean AND and XOR operations at data rates up to 20 Gb/s.
It was recognized relatively early that for the realization of all-optical gates, interferometric arrangements offered speed advantages and Boolean logic capability [6], [13]. These arrangements consist, in general, of two separate optical paths where the phase of the optical field may be independently controlled in a nonlinear optical medium by optical means. Early research efforts used optical fiber as the nonlinear material, exploiting the ultrafast Kerr nonlinearity [9], [56]. However, the fiber nonlinearity is rather weak, requiring long pieces of fiber, making the devices difficult to integrate and their switching energies high. It was the realization that fast phase variations can also be obtained in semiconductor media (SOA), as a result of the gain and refractive-index dependence on the carrier density, that has more recently sparked off the development of compact and very low switching energy devices [12], [15]-[18]. These can be broadly separated into either discrete SOA interferometric gates or integrated devices. The basic nonlinearity that these devices exploit is resonant, and for this reason switching has been shown repeatedly to be possible with energies of a few femtojoules for pulses of a few picoseconds duration, in very compact devices. Fig. 4 shows such an interferometric arrangement drawn as a Mach-Zehnder interferometer, but which could in fact be any type of interferometer, such as the Sagnac, Michelson, or UNI.
In the example of Fig. 4, assuming that on each of the two interferometer paths the phase of the optical field of the clock signal, termed CLK in Fig. 4, may be changed by $\pi$, depending on whether one or both of the external controlling signals A and B are present, interference on the output coupler causes the result of the Boolean addition XOR between A and B to be written on CLK. The remainder of the Boolean operations may be implemented similarly.
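The interference argument above can be illustrated with a minimal Python sketch. It assumes an ideal, lossless two-arm interferometer in which each control signal imposes a phase shift of $\pi$ on its own arm, so that the normalized CLK intensity at the switched port is $\sin^2(\Delta\phi/2)$. The function name and the idealized transfer function are assumptions made for illustration and do not model the SOA dynamics.

```python
import math


def interferometer_output(a, b, phase_shift=math.pi):
    """Normalized CLK intensity at the switched port for control bits a, b.

    Each control bit shifts the phase of its own arm by `phase_shift`;
    the output depends only on the phase difference between the arms.
    """
    dphi = phase_shift * (a - b)
    return math.sin(dphi / 2.0) ** 2


# Truth table obtained by thresholding the output intensity.
truth_table = {
    (a, b): round(interferometer_output(a, b))
    for a in (0, 1)
    for b in (0, 1)
}

for (a, b), out in sorted(truth_table.items()):
    print(f"A={a} B={b} -> CLK out={out}")
```

When both controls are present the two phase shifts cancel, which is why the output returns to "0" regardless of the exact phase shift each control induces.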
In what follows, we present AND and XOR logic operation at 20 Gb/s using the Sagnac and the UNI interferometer, respectively. Fig. 5 shows the experimental setup that uses the SOA-assisted Sagnac gate, configured to perform AND logic operation. The optical gate was formed using a 3-dB fiber coupler and two polarization beam splitters (PBSs) to couple in and out the orthogonally polarized pulses of the logical control input. The SOA used was a 1.5-mm-long bulk-type device with a small signal gain of 25 dB at 1550 nm and 80-ps recovery time, when driven with an 800-mA dc current. Optimum switching was obtained by spatially offsetting the SOA from the center of the loop by 25 ps, using a variable optical delay line. The signal to be switched, which is termed here the CLK signal, was split into two components that counterpropagate in the loop. The control pulses were temporally synchronized with the clockwise CLK component using an optical delay line so as to induce a relative phase difference between the two CLK components. In that case, the CLK signal emerges at the switched (S) port of the gate. In the absence of the control pulse, the signal emerges from the other, unswitched (U) port of the gate.
The CLK signal was obtained from a 20-GHz mode-locked fiber laser at 1545 nm, producing 4-ps optical pulses after linear compression, and was inserted into the gate via an optical circulator. The control signal was provided by a gain-switched DFB laser, driven at 10 GHz and generating 10-ps optical pulses at 1550 nm, after linear compression. The output pulse train was modulated with a pseudorandom bit sequence (PRBS) signal at 10 Gb/s, had its repetition rate doubled by bit interleaving in a fiber doubler, and was introduced into the gate via a PBS.
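The rate doubling by bit interleaving mentioned above can be sketched at the bit level as follows. The model assumes a periodic input pattern that is split, with one copy delayed by a whole number of bits plus half a bit period before recombination; the function names and the choice of delay are hypothetical and are not taken from the experimental setup.

```python
def interleave_double(bits, delay_bits):
    """Bit-interleave a periodic pattern with a delayed copy of itself.

    The delayed arm lags by `delay_bits` whole input bits plus half an
    input bit period, so its pulses land midway between the pulses of
    the undelayed arm. The output has twice the input bit rate.
    """
    n = len(bits)
    out = []
    for k in range(n):
        out.append(bits[k])                     # undelayed arm
        out.append(bits[(k - delay_bits) % n])  # delayed arm, half a slot later
    return out


def rate_after_stages(rate, stages):
    """Line rate after a number of cascaded doubling stages."""
    return rate * 2 ** stages


doubled = interleave_double([1, 0, 1, 1], delay_bits=1)
print(doubled)
print(rate_after_stages(10.0, 1))
```

A delay of several bits in the second arm decorrelates the two interleaved copies, so the doubled stream looks like pseudodata rather than a simple repetition of each bit.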
For a successful Boolean AND operation between the control and the CLK signal, the switched port of the gate must record a logical "1" only when both control and CLK signals are "1", and a logical "0" if either one or both signals are "0". The S port thus provides the AND operation between the control and CLK signals, while the U port provides the NAND operation. The contrast ratios of the U and the S port of the switch were found to be as high as 7:1 and 11:1, respectively, and the required energies for the CLK and the control pulses were 5 fJ and 25 fJ, respectively.
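A minimal Python sketch of the port logic, together with a conversion of the quoted contrast ratios to decibels, is given below. It treats each port as reporting only the presence or absence of a CLK pulse, which is an idealization; the function names are illustrative.

```python
import math


def sagnac_ports(clk, control):
    """Idealized (S, U) port outputs for one bit slot.

    A CLK pulse leaves through S when a control pulse is present and
    through U otherwise. With no CLK pulse, neither port emits light.
    """
    s_port = clk & control
    u_port = clk & (1 - control)
    return s_port, u_port


def contrast_db(ratio):
    """Contrast ratio expressed in decibels."""
    return 10.0 * math.log10(ratio)


for clk in (0, 1):
    for control in (0, 1):
        print(clk, control, sagnac_ports(clk, control))

print(f"U port: {contrast_db(7.0):.2f} dB, S port: {contrast_db(11.0):.2f} dB")
```

In this idealized picture the U port is the complement of the S port only in slots where a CLK pulse is present, which is the case throughout when CLK is a continuous pulse train.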
For dual-rail optical XOR logic, the UNI was used. The UNI gate is a very promising switching device, because it is a balanced single-arm interferometer and its structure compensates for the long-lived nonlinearities of the SOA [11] and allows for its cascadability [19]. For the XOR logic operation, three optical signals are used as inputs into the gate: control signals A and B and clock signal CLK. The CLK signal is held continuously to a logical 1 on input and the result of the logic A XOR B is imprinted on it. Fig. 7 shows the experimental setup of the UNI gate configured to perform XOR logic. The UNI gate itself consisted of two similar polarization split-and-delay (PSD) structures, which consist of a PBS, cross-spliced at 45° with a length of polarization-maintaining fiber (PMF). Upon entering the UNI, the clock signal is analyzed into its two orthogonal polarization components, which are relatively delayed for the purposes of measurement. Similarly, on exiting the SOA their relative delay is removed and the two clock components are forced to interfere in the output PBS. In this experiment, these PSD components are designed to add or subtract a relative delay of 25 ps. The two control signals A and B were arranged to counterpropagate the CLK signal in the SOA and were synchronized with the relatively delayed and advanced components of the CLK signal. Whenever the two control pulses are present, the relative phase difference between the polarization components of the clock becomes zero at the exit PBS. The active element in the UNI gate was a 1.5-mm, bulk, InGaAsP/InP, ridge waveguide SOA, with a small signal gain of 30 dB at 1558.9 nm and 80-ps recovery time, when driven with 750-mA dc current.
The optical signals were produced by two DFB lasers, LD1/LD2, gain switched at 10 GHz and emitting at 1545.2 nm and 1554.6 nm, respectively. Both lasers produced 9-ps optical pulses after linear compression. LD1 provided the CLK and control A signal, and LD2 provided control signal B. The output of LD2 was modulated using a LiNbO$_3$ modulator, driven from a programmable pulse generator. The CLK signal from LD1 and the control B from LD2 had their repetition rate doubled by bit interleaving in a split-relative-delay-and-recombine fiber doubler. For the results shown here, control A was a 10-GHz full duty-cycle signal and control B a 20-Gb/s pseudodata pattern. For successful Boolean XOR operation between A and B, both control signals must be present in the gate and the switched port must record a logical "1" if either control pulse is "1", and a logical "0" if both control pulses are "1" or "0". Fig. 8 illustrates the logic output of the gate with data for the four logical combinations of control A and B. Bit-by-bit checking proves that correct operation has been obtained. The contrast ratio between the ON-OFF states of the switch port of the gate was 5:1 and the required energies of CLK and control A and B pulses were 2.5, 4, and 10 fJ, respectively. Note that the switching energy values are indeed low. The temporal switching window in this case was measured by varying the arrival time of the control B pulses into the gate, with respect to the other signal pulses. The resulting 3-dB timing window was found to be about 30 ps [15].
In this part, AND/XOR logic operations have been demonstrated at 20 Gb/s with very low switching energies. This implies that the gates can in principle operate in an integrated circuit optimized for low losses without the need for external signal amplification [14]. These logic modules are the elementary processing units for a range of signal processing applications. Digital AND is the most common Boolean function and has been used in demultiplexing [54], wavelength conversion [16], as well as 3R-regeneration [4]. The dual-rail XOR logic gate is also of paramount importance because it allows the full complement of Boolean bitwise logic to be performed on ultrafast optical signals. XOR gates can be used to implement complex optical processing circuits like address recognition schemes [48]-[52], data encoders/decoders, or optical decision circuits [8].
C. All-Optical Clock-Recovery Circuits
Clock-recovery circuits are front-end units used to generate the local clock signal, to assist in performing data recovery in a receiver, and to provide the means for synchronization in the following layers of gates used in digital signal processing. In particular, data recovery is of paramount importance, because of the timing jitter induced on the data pulses, either by poor transmitter performance or by the transmission of the signal between two nodes.
Generally speaking, a clock-recovery circuit consists of a high-Q filter tuned to the data line rate, in order to average out the incoming data pattern, and an active element to generate the clock signal. In the case of optical clock-recovery circuits, these two functions are generally provided by a single element, a laser oscillator. Several alternative techniques have been proposed for optical clock extraction from both continuous and packet data, and these include mode-locked ring lasers, phase-locked loops, FP filters, and self-pulsating DFB lasers [31]-[38]. Recovering a clock signal from packet traffic, which may be asynchronous, is significantly harder because, ideally, the clock signal must persist only for the duration of each packet. Hereafter, we describe two clock-recovery circuits, one for continuous and a second one for asynchronous packet traffic, which are suitable for all-optical processing.
1) Clock-Recovery Circuit for Continuous Data:
This unit is based on the fiber ring oscillator presented in Section II-A [32] . This time instead of using a periodic external optical signal to provide the modulation, a pseudodata sequence was used. The experimental setup is shown in Fig. 9 . To test the ability of the laser source for clock-recovery, a flexible pseudodata generator was constructed, capable of generating patterns with long series of consecutive zeros. The continuous data-stream generator consists of two DFB lasers, gain-switched at 7.5 GHz which provided 7-ps optical pulses. The output of both lasers was modulated with a 7.5-Gb/s PRBS signal and bit-interleaved in a fiber doubler to form a 30-Gb/s continuous pseudodata stream. This data signal was then inserted into the ring laser via a fiber coupler.
The ring cavity is constructed similarly to the one described before, with the addition of an optical delay line for precise matching of the repetition frequency of the clock-recovery circuit to the rate of the incoming data pattern. For clock recovery, the oscillator must mode-lock from the external data signal. Due to the cavity Q, the mode-locked pulse train persists even for sequences of consecutive "0"s, but with a modulation. Fig. 10(a) and (b) shows samples of the input PRBS data sequences with maximal length of and , respectively, while Fig. 10(c) and (d) shows the corresponding recovered clock. Note that the clock pulses in both cases have negligible amplitude modulation. The output pulses were also monitored on a second harmonic autocorrelator and found to be 2.5 ps wide, after linear compression. The performance of the circuit was also examined with very long sequences of "0"s, and the modulation on the recovered clock was less than 12%, even for input data sequences with 212 consecutive "0"s [32].
2) Asynchronous Packet Clock Recovery: This clock-recovery circuit was recently demonstrated and was specifically designed to handle asynchronous and short optical packets. In order to be able to generate a clock signal only during the packet duration, this method does not use an oscillator. Instead, it uses an FP filter with FSR equal to the line rate, which is then followed by a nonlinear optical gate [36], [37]. The purpose of the Fabry-Pérot filter (FPF) is to partially fill the "0"s, thereby creating a signal that resembles a packet clock but is amplitude modulated. This signal is used as the control signal into an optical gate so as to remove this modulation via its nonlinear transfer function. With this arrangement, an asynchronous packet clock signal of nearly equal amplitude "1"s and very short rise and fall times is obtained. Fig. 11 shows the experimental setup used for the demonstration of clock recovery from asynchronous packets at 10 Gb/s. It consists of two main subsystems: the asynchronous packet flow generator and the packet clock-recovery circuit itself. The asynchronous packet generator consisted of a gain-switched DFB laser, driven at 1.290 75 GHz and emitting 9-ps pulses after linear compression at 1549.2 nm. This pulse train was modulated to form a PRBS signal using a PRBS generator, three-times bit-interleaved, and modulated again using a programmable pulse generator to form a 10.326-Gb/s pseudodata synchronous packet stream. This stream was then split into two different optical paths via a 3-dB fiber coupler. The two paths were made so as to provide a maximum of 17.9-ns relative delay between the split signals.
Variable optical delay lines ODL1 and ODL2 were used to independently control the relative arrival time of the packets at the clock-recovery circuit so as to investigate the minimum acceptable delay between successive packets and to assess the ability of the circuit to operate with successive packets that are asynchronous at the bit level. The asynchronous packet stream was then launched into the packet clock-recovery circuit, which consists of an FPF and a UNI gate, powered by a continuous-wave (CW) signal at 1545 nm. The FPF had an FSR equal to the line rate and a finesse of 20.7, corresponding to a 1/e lifetime of roughly 7 b. The output of the filter was amplified and inserted into the UNI gate as the control signal so as to counterpropagate the CW beam. In this case, the two similar PSD structures of the UNI introduced and removed a relative delay of 50 ps before and after the SOA, allowing gate operation at 10 Gb/s. The active element was a 1.5-mm, bulk, InGaAsP/InP, ridge waveguide SOA, with a 27-dB small signal gain at 1550 nm, 24 dB at 1545 nm, and a recovery time of 100 ps, when driven with a 700-mA dc current.
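The filter parameters quoted above can be cross-checked with a short Python sketch. It assumes an ideal lossless etalon with finesse $F = \pi\sqrt{R}/(1-R)$, a bandwidth equal to the FSR divided by the finesse, and a field amplitude that is multiplied by the mirror reflectivity $R$ on every round trip (one round trip per bit period when the FSR equals the line rate). Whether the quoted 1/e lifetime refers to the field or to the intensity is not stated; the field convention used here is an assumption, and the function names are illustrative.

```python
import math


def fp_bandwidth_ghz(fsr_ghz, finesse):
    """Filter bandwidth of an etalon: FSR divided by finesse."""
    return fsr_ghz / finesse


def mirror_reflectivity(finesse):
    """Intensity reflectivity R solving finesse = pi*sqrt(R)/(1 - R)."""
    x = (-math.pi + math.sqrt(math.pi ** 2 + 4.0 * finesse ** 2)) / (2.0 * finesse)
    return x * x


def field_lifetime_bits(finesse):
    """Bit periods for the stored field to decay to 1/e (FSR = line rate)."""
    return -1.0 / math.log(mirror_reflectivity(finesse))


bandwidth = fp_bandwidth_ghz(10.326, 20.7)
lifetime = field_lifetime_bits(20.7)
print(f"bandwidth = {bandwidth * 1e3:.0f} MHz, 1/e field lifetime = {lifetime:.1f} b")
```

With these assumptions the bandwidth comes out close to 500 MHz and the field lifetime between 6 and 7 bit periods, in line with the rounded figure of roughly 7 b.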
This packet clock-recovery implementation takes a different approach from the techniques proposed so far, since it separates the filtering function, performed by an FP etalon tuned to the line rate, from the high-speed amplitude equalization function, provided by the nonlinear switching function of a gate. At the output of the FPF, the asynchronous packet stream is convolved with the exponentially decaying response function of the filter so that a deeply amplitude modulated but, nevertheless, clock-resembling signal is obtained. This is used as the control signal into the UNI gate to induce an almost $\pi$ phase shift between the orthogonal polarization components of the CW signal. This results in an asynchronous packet clock signal of nearly equal amplitude "1"s and very short rise and fall times. Fig. 12(a) shows a typical data stream, obtained from the asynchronous packet flow generator. More specifically, a sequence of four data packets is illustrated, each packet containing 41 b (approximately 4-ns duration). Packets #1 and #3 travel through the upper branch and packets #2 and #4 through the lower branch of the packet generator. The coarse relative delay between packets #1 and #2 was 1.5 ns and between packets #2 and #3 was 2.9 ns. Fig. 12(b) shows the corresponding recovered packet clocks as they appear at the S port of the gate. The packet clocks display rise and fall times of 2 and
The effect of the filter and the gate in the frequency domain has also been examined using a 50-GHz microwave spectrum analyzer. Fig. 13(a) and (b) depicts the radio frequency (RF) spectrum from dc to 11.2 GHz of the optical signal at the output of the asynchronous flow generator and at the output of the FPF, respectively. It can be easily observed that the filter primarily suppresses all data modes outside a 500-MHz band around the baseline rate, which essentially corresponds to its bandwidth.
Data-mode suppression within this 500-MHz band is achieved by taking advantage of the nonlinear transfer function of the deeply saturated SOA-based interferometric gate. This effect is illustrated in Fig. 13(c) and (d) , showing in more detail the output of the FP filter and the output of the gate around the 10.326-GHz clock component, respectively. Asynchronous operation is confirmed by the suppressed line rate component with respect to the adjacent 80-MHz subharmonics. Fig. 13(d) reveals data-mode suppression in excess of 35 dB with respect to the 80-MHz-spaced packet subharmonics.
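The two-step principle, filtering followed by amplitude equalization in a saturated gate, can be illustrated with a toy Python simulation at one sample per bit slot. The filter is modeled by the on-resonance field response of an ideal etalon with FSR equal to the line rate. The gate is modeled by a phase shift that saturates toward $\pi$ with control power, followed by a $\sin^2$ interferometer response. The reflectivity of 0.86, the saturation parameter, the test packet, and the saturating phase law are all assumptions chosen for illustration, not measured SOA characteristics.

```python
import math


def fp_filter(bits, reflectivity):
    """Field amplitude per bit slot after an ideal FP filter (FSR = line rate).

    Each round trip retains a fraction `reflectivity` of the stored field,
    so earlier "1"s partially fill later "0" slots.
    """
    out, field = [], 0.0
    for b in bits:
        field = reflectivity * field + (1.0 - reflectivity) * b
        out.append(field)
    return out


def gate(field, p_sat=0.02):
    """Toy saturated interferometric gate driven by the control field."""
    power = field * field
    phase = math.pi * (1.0 - math.exp(-power / p_sat))  # saturates toward pi
    return math.sin(phase / 2.0) ** 2


packet = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0]
stream = packet + [0] * 40              # one packet followed by an idle gap
filtered = fp_filter(stream, 0.86)      # amplitude-modulated, clock-like signal
clock = [gate(f) for f in filtered]     # equalized packet clock

print(" ".join(f"{c:.2f}" for c in clock[:30]))
```

In this toy model the filter output varies by more than a factor of two inside the packet, while the gate output stays nearly constant after the first bit and dies away once the packet has ended.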
The nonzero rise and fall times of the signal require that the sequence of data packets include guard bands, and these depend on the finesse of the FPF. In turn, the finesse of the filter determines its ability to fill the missing "1"s in the data packet and must be chosen according to the expected sequence of consecutive "0"s in the data stream. A longer sequence of consecutive "0"s requires the use of an FPF with higher finesse at the expense of slower rise and fall times in the derived clock signal [36]. It is important to note that the number of bits required as the guard band is independent of the line rate, since the guard band scales with the bit period. Other practical advantages of the circuit are that it is self-synchronizing, it does not require high-frequency electronic circuitry, and, at least in principle, it may be integrated. Clock recovery with this approach is therefore particularly applicable to optical packet switching and is expected to assist the implementation of more complex processing circuits. To illustrate this, the following section describes a method for packet header extraction from short 10-Gb/s packets.
III. ADVANCED ALL-OPTICAL SUBSYSTEMS DESIGN AND IMPLEMENTATION
In this section, we present two advanced optical processing units, namely, a header extraction circuit and an exchange-bypass switch. These units use the aforementioned optical modules, whose performance is tested in a subsystem environment at 10 Gb/s. The header extraction circuit and the exchange-bypass switch are key building blocks for almost any data processing and routing application. They have been implemented with a view to their use in a packet switch architecture, where they provide "on the fly" payload/header separation and switching within the bit period, respectively.
A. 10-Gb/s Header Extraction Circuit
Several methods have been proposed so far for payload/header separation [46]-[51]. In this section, we present a simple all-optical header extraction circuit that is applicable even to traffic of very short 10-Gb/s packets. The technique uses the previously described packet clock-recovery module, with an additional UNI logic gate at its output to perform the separation [47].
The block diagram of the experiment is shown in Fig. 14. It consists of three subunits: a packet flow generator, a packet clock-recovery circuit, and an additional UNI gate (UNI 2), built identically to UNI 1 and configured to perform the AND logic operation. The packet flow generator is similar to the asynchronous packet flow generator described in Section II-C, but modified to provide synchronous packets at its output by blocking one of the two branches. The output of this packet generator is split into two parts, one used as input to the clock-recovery circuit and one entering UNI 2 as the data signal. The extracted optical packet clock signal, which persists for the duration of the original data packet, is used as the control signal in UNI 2. For header separation, UNI 2 is configured to perform a simple AND logic operation between the original incoming packet stream and a delayed version of the recovered packet clock. Successful header extraction is achieved if the original packet and the extracted clock are relatively delayed in time by an amount equal to the packet address length plus the rise time required for clock acquisition. As such, the packet clock is delayed in an optical delay line so that only the payload bits of the original packet fall within the window of the recovered clock and are therefore switched at the output of the gate. In this fashion, header information appears at the U port of UNI 2, since there are no clock pulses to switch the gate, while the payload data exits from the S port. The insets of Fig. 14 detail the principle of operation of this scheme. Insets (a) and (b) show the two signals used as the CLK and the control signal, respectively, for the AND operation in UNI 2. Inset (a) depicts a data packet containing two "1"s as preamble bits to assist the clock extraction process, 8 b assigned to the header, a guard band of two "0"s to avoid switching by the first two imperfect bits of the clock circuit, and finally 18 b for the payload.
Inset (b) shows its corresponding self-extracted clock, delayed by 1 ns with respect to the original packet. In this way, the header and the payload appear separated at the U and S ports, as shown in insets (c) and (d), respectively, of Fig. 14. Fig. 15 shows a typical result from the circuit. In particular, Fig. 15(a) shows an incoming packet containing 30 b, with approximately 3-ns duration, arriving at 6.2-ns intervals. Fig. 15(b) illustrates the corresponding self-extracted clocks at the output of UNI 1, which are appropriately delayed for the AND operation with the packet of Fig. 15(a). This figure shows that UNI 1 has reduced the amplitude modulation of the clock pulses to 1.5 dB, that the packet clock rises within 2 b, and that it falls to 1/e within 8 b. Switching power and mean energy per pulse for the CW and the control signal were 1 mW and 120 fJ, respectively. These two parameters determine the degree of saturation of the SOA and enable the control pulses to effect a differential phase change of roughly π between the two polarization components of the CW signal in the UNI.
To complete the header/payload separation task, the properly synchronized packet and self-extracted clock signal are logically ANDed together in UNI 2. Fig. 15(c) and (d) show the separated header and payload of the packet at the U and S ports of UNI 2, respectively. The pulse energies for the signals interacting in UNI 2 were 2 fJ for the data pulses and 24 fJ for the packet clock pulses.
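The separation principle can be sketched at the bit level as follows. This is an idealized model, not the experimental setup: the recovered clock is assumed to be a perfect rectangular window of "1"s, the gate is assumed to be an ideal AND, and the function names and the example header and payload patterns are hypothetical. The packet layout and the 1-ns delay follow the description of the insets of Fig. 14.

```python
BIT_PERIOD_NS = 0.1   # 10-Gb/s line rate


def build_packet(header, payload, preamble=2, guard=2):
    """Packet layout of inset (a): preamble "1"s, header, guard "0"s, payload."""
    return [1] * preamble + list(header) + [0] * guard + list(payload)


def uni2_ports(packet, clock_delay_ns):
    """Return (U port, S port) bit lists for an ideal gate.

    The recovered clock is modeled as a window of "1"s that starts
    clock_delay_ns after the packet and lasts until the packet ends.
    """
    delay_bits = round(clock_delay_ns / BIT_PERIOD_NS)
    clock = [0] * delay_bits + [1] * (len(packet) - delay_bits)
    s_port = [d & c for d, c in zip(packet, clock)]         # switched bits
    u_port = [d & (1 - c) for d, c in zip(packet, clock)]   # unswitched bits
    return u_port, s_port


header = [1, 0, 1, 1, 0, 0, 1, 0]                                   # 8 b
payload = [1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1]    # 18 b
u_port, s_port = uni2_ports(build_packet(header, payload), clock_delay_ns=1.0)
print("U port:", u_port)
print("S port:", s_port)
```

With the 1-ns delay, the first two clock bits coincide with the guard "0"s, so the preamble and header leave through the U port and only the payload reaches the S port.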
The average contrast ratio between the ON-OFF states of UNI 2 at the output of the circuit was 8:1 for the payload (S port) and 6:1 for the header (U port). The data patterning seen in Fig. 15 is primarily due to imperfect rate multiplication and can be largely eliminated if a 10-Gb/s transmitter source is used.
This technique for header extraction offers the advantage of serial, bit-by-bit processing on a packet-by-packet basis with small associated bandwidth overhead, because of the fast self-synchronization capability of the clock-recovery circuit. Moreover, the circuit can easily handle packets of any length or with any address-to-payload ratio with no increase in its complexity. In fact, for packets with different address lengths, one has only to alter the temporal delay between the optical signals at the input of UNI 2. It should also be noted that operation of this scheme could in principle be extended to higher rates without an increase in guard bands, since they are determined by the packet clock-recovery circuit and are independent of the line rate. Finally, the circuit is self-synchronizing and requires only the incoming packet as input to operate, dispensing with the need for high-speed electronics.
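The delay adjustment mentioned above reduces to simple arithmetic, sketched below. The calculation assumes that the delay equals the address length plus the clock rise time, both counted in bits, with a 2-b rise time as in the experiment. The overhead function counts only preamble and guard bits, and the function names are illustrative.

```python
def clock_delay_ns(address_bits, rise_bits, line_rate_gbps):
    """Delay between packet and recovered clock: address length plus clock rise time."""
    bit_period_ns = 1.0 / line_rate_gbps
    return (address_bits + rise_bits) * bit_period_ns


def overhead_fraction(preamble_bits, guard_bits, total_bits):
    """Fraction of the packet spent on preamble and guard bits."""
    return (preamble_bits + guard_bits) / total_bits


for address_bits in (8, 16, 32):
    delay = clock_delay_ns(address_bits, rise_bits=2, line_rate_gbps=10.0)
    print(address_bits, "b address ->", round(delay, 2), "ns")

print("overhead of the 30-b packet:", round(overhead_fraction(2, 2, 30), 3))
```

For the 8-b header used in the experiment, this reproduces the 1-ns delay of inset (b). At a higher line rate the delay in nanoseconds shrinks while the bit counts stay the same.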
B. 10-Gb/s Optically Addressable 2 × 2 Exchange-Bypass Packet Switch
Implementation of fast optical switches has long attracted the interest of numerous research groups. Toward this end, several switch matrices based on semiconductor materials in guided-wave structures have been demonstrated [57]-[61]. Although these structures have shown impressive performance in systems experiments, they are best suited for long packets because of their relatively slow switching speeds compared with the bit period. In this part, we demonstrate an optically addressable 2 × 2 exchange-bypass switch based on an AND optical gate, operating with data packets of arbitrary length at a 10-Gb/s line rate. This switch is particularly applicable in optical packet switching because it can change its connectivity all-optically within the time scale of a packet and can possibly provide some limited processing at the bit level.
The principle of operation of an optically addressable exchange-bypass switch is shown in Fig. 16. The 2 × 2 switch uses a UNI gate that has been configured to provide two inputs and two outputs for counterpropagating signals. Data signals 1 and 2 enter through input ports 1 and 2, respectively, and travel through identical lengths of birefringent fiber at the end sections of the interferometer, spliced at 45° to the output fibers of the two PBSs. As a result, each data pulse is separated into two orthogonal polarization components that are relatively delayed by half a bit period, or 50 ps. The two switch outputs, 1 and 2, were obtained by merging the unswitched and switched ports of the two PBSs. The control signal is launched via a fiber splitter so that it co-travels with data signal 1 and is temporally synchronized to the leading of the two orthogonal polarization components of each data pulse. If there is no control signal, the switch is in the BAR state, and both data signals pass straight through to output ports 1 and 2. If the control signal is present, the switch is in the CROSS state, and the two data streams are interchanged at its output. The length of the bit sequence that is interchanged through the switch is determined by the length of the control signal and may be arbitrarily long or short, depending on the length of the incoming packet. In the present demonstration, the control signal packet had a duration of 3.2 ns and a period of 6.4 ns, while the data packet of signal 1 had a duration of 2.5 ns and a period of 6.4 ns. In this way, the input data signals can be viewed as sequences of data packets with a fixed length of 3.2 ns, or 32 b, and the control signal as a packet of 32 consecutive "1"s with 3.2-ns duration and a period of 6.4 ns.

Fig. 17 shows typical results of the operation of the switch. In particular, Fig. 17(a) and (c) show the input data signals 1 and 2 into the switch, whereas Fig. 17(b) and (d) show the corresponding output signals for the BAR and CROSS states, respectively. Finally, Fig. 17(e) shows the control sequence. In the BAR state, the data packets from signals 1 and 2, depicted by the thick dashed and thin dotted lines, respectively, cross the switch unchanged. When the optical control signal is present, the switch is in the CROSS state, and the packets are interchanged at the output ports. The pulse energies for the data 1, data 2, and control pulses were 2, 2, and 8 fJ, respectively. The quality of the output pulse streams in both switch states is high. The crosstalk of the switch in the BAR and CROSS states was 12 and 10 dB, respectively. The error rate of the switch was also evaluated in static operation, with a synchronous digital hierarchy/synchronous transport module (SDH/STM) 64 test signal from a network analyzer by Acterna, and it was found to be less than for both data signals in both switch states. The performance of the switch was evaluated with different packet lengths and periods, including packet lengths in the 43-ns range, and always showed similar behavior. Operation in the 43-ns range is important because it corresponds to ATM packets at 10 Gb/s, which are expected to have the shortest duration. A distinguishing feature of this switch implementation is that it relaxes the requirements for guard bands between the packets, since it can switch within a bit period. By avoiding guard bands, the improvement in network capacity and bandwidth utilization becomes more pronounced as the packet length decreases or the line rate increases. Finally, the switch requires low pulse energy to operate and is optically addressable, dispensing with the need for high-speed electrical packaging of the switching elements.
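The BAR and CROSS behavior described above can be sketched as an ideal bit-slot model. It assumes perfect switching with no crosstalk, and, to show both states in one run, it applies the control packet to the second slot only, whereas in the experiment the control packet recurs every 6.4 ns. The bit patterns and names are hypothetical. The last lines check the 43-ns figure under the assumption of a 53-byte ATM cell.

```python
def exchange_bypass(data1, data2, control):
    """Ideal 2 x 2 exchange-bypass switch, evaluated bit slot by bit slot."""
    out1, out2 = [], []
    for d1, d2, c in zip(data1, data2, control):
        if c:                      # control pulse present: CROSS state
            out1.append(d2)
            out2.append(d1)
        else:                      # no control pulse: BAR state
            out1.append(d1)
            out2.append(d2)
    return out1, out2


SLOT_BITS, PACKET_BITS = 64, 32    # 6.4-ns period and 3.2-ns packets at 10 Gb/s
data1 = ([1, 1, 0, 1] * 8 + [0] * 32) * 2
data2 = ([1, 0, 0, 0] * 8 + [0] * 32) * 2
control = [0] * SLOT_BITS + [1] * PACKET_BITS + [0] * (SLOT_BITS - PACKET_BITS)
out1, out2 = exchange_bypass(data1, data2, control)

# Duration of a 53-byte ATM cell at 10 Gb/s (cell size is an assumption here).
atm_cell_ns = 53 * 8 / 10.0
print("ATM cell duration (ns):", atm_cell_ns)
```

Because the state is evaluated per bit slot, the model needs no guard band between the packet passed in the BAR state and the one exchanged in the CROSS state.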
IV. SYSTEM PERSPECTIVE OF ALL-OPTICAL MODULES: OPTICAL PACKET SWITCHING
The demonstration of the all-optical signal processing circuits presented in the previous sections indicates some of the significant progress made in the field during the past few years. As a result, the exploitation of the advantages offered by all-optical implementations in system applications can be envisaged. In this section, we describe, as an example, a possible integration of such units in an optical packet switch architecture. The rudimentary functions of a packet switch include forwarding, synchronization, regeneration, routing, switching, and buffering operations [62]. The goal here is to perform as many of these processes as possible in the optical domain in order to exploit the distinctive properties of the previously mentioned optical modules, namely, speed, low switching energy, and integration capability, and to combine them in an application that offers a performance advantage and avoids unwanted O/E/O conversions. Fig. 18 displays the block schematic of an optical packet switch architecture in which certain operations have been removed from the electrical domain and are performed all optically. The design is ideally suited for slotted operation, while for unslotted operation, complexity rises significantly and the simplicity of the low-functionality optical modules is lost.
Forwarding is the most complex task. For each incoming packet, the switch must process the header data and determine its output port. It can also make changes in the header itself and reinsert the new header data in the packet at the output. In the packet switch design shown in Fig. 18, header extraction circuits, such as the one described previously, are used to separate the payload from the header in the data packets arriving on each of the input links. The extracted header information is fed into an electronic control unit, which processes it further and makes the routing decisions. The header extraction circuit also provides the clock signal for all the following optical circuitry. For header data reinsertion, an XOR optical gate is used. In practice, the new header and payload would be synchronized before the XOR gate and would be sequentially imprinted on the packet clock.
Routing is performed electronically after header processing. The electronic control unit shown in Fig. 18 is responsible for maintaining up-to-date information on the network topology. This information is kept in the form of a routing table, and depending on the lookup table result, the proper control signals are generated that determine the state of the switch fabric.
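The lookup step can be sketched for a single 2 × 2 element as follows. The routing table contents, the 8-b address format, and the function name are hypothetical and serve only to show how a table result could translate into a BAR or CROSS control decision.

```python
# Hypothetical routing table: header address -> output port of a 2 x 2 element.
ROUTING_TABLE = {
    0b00000001: 1,
    0b00000010: 2,
}


def switch_state(header_bits, input_port, table=ROUTING_TABLE):
    """Return "BAR" or "CROSS" for a packet arriving on input_port."""
    address = int("".join(str(b) for b in header_bits), 2)
    output_port = table[address]          # raises KeyError for unknown addresses
    return "BAR" if output_port == input_port else "CROSS"


print(switch_state([0, 0, 0, 0, 0, 0, 0, 1], input_port=1))
print(switch_state([0, 0, 0, 0, 0, 0, 1, 0], input_port=1))
```

In a CROSS decision, the control unit would have to generate an optical control packet spanning the payload, as in the demonstration of Section III-B.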
Synchronization is the actual process of aligning two signal streams in time. In optical packet switching, it refers to the packet alignment at the switch input interfaces. This is necessary in order to obtain good switching performance. Packet synchronization in the optical domain can be achieved by cascading 2 × 2 exchange-bypass switches to form a matrix of delays, as shown in Fig. 18 [59], [62]. The state of each switch and, thus, each packet delay is determined by the control unit, whereas the number of the elementary switches is determined by the resolution required.
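One common way to organize such a delay matrix is a binary-weighted cascade, in which stage k inserts a delay of 2^k resolution steps when it is switched in. The text does not specify the arrangement, so the sketch below is an assumption, as are the function names and the example values.

```python
def delay_stage_states(required_delay_ps, resolution_ps, n_stages):
    """Choose which stages of a binary-weighted delay cascade to switch in.

    Stage k adds 2**k * resolution_ps when its entry is True.
    """
    steps = round(required_delay_ps / resolution_ps)
    if not 0 <= steps < 2 ** n_stages:
        raise ValueError("delay out of range for this number of stages")
    return [bool((steps >> k) & 1) for k in range(n_stages)]


def realized_delay_ps(states, resolution_ps):
    """Total delay inserted by the selected stages."""
    return sum((2 ** k) * resolution_ps for k, on in enumerate(states) if on)


states = delay_stage_states(required_delay_ps=1300, resolution_ps=100, n_stages=6)
print(states, realized_delay_ps(states, 100))
```

Under this assumption, halving the resolution step for the same delay range adds one stage, which illustrates how the resolution requirement sets the number of elementary switches.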
Switching is the actual process of forwarding the incoming packets to the appropriate outgoing links, as determined by the routing process. In the electronic domain, routing and switching are coupled together, but in the optical domain, they are clearly separated. Several topologies have been proposed so far for implementing an N × N switch fabric using 2 × 2 switches as the elementary switching elements [62]. All these topologies can in principle be implemented all optically using a series of the optically addressable 2 × 2 exchange-bypass switches presented in Section III-B. Such an implementation offers a significant performance advantage, since the internal optical paths of the switch fabric can be optically reconfigured at the bit level.
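To give a sense of scale, the number of 2 × 2 elements needed for a fabric with N inputs and N outputs can be compared across a few well-known topologies. The crossbar, banyan, and Benes networks are chosen here as illustrations and are not named in the text; the element counts are the standard textbook formulas for port counts that are powers of two.

```python
def _stages(n):
    """log2(n) for a power-of-two port count."""
    if n < 2 or n & (n - 1):
        raise ValueError("port count must be a power of two")
    return n.bit_length() - 1


def crossbar_elements(n):
    """Crossbar: one 2 x 2 element per crosspoint."""
    return n * n


def banyan_elements(n):
    """Banyan: log2(n) stages of n/2 elements (blocking)."""
    return (n // 2) * _stages(n)


def benes_elements(n):
    """Benes: 2*log2(n) - 1 stages of n/2 elements (rearrangeably nonblocking)."""
    return (n // 2) * (2 * _stages(n) - 1)


for n in (4, 8, 16):
    print(n, crossbar_elements(n), banyan_elements(n), benes_elements(n))
```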
Buffering is important within any packet router. It is necessary when multiple packets arrive simultaneously at the switch inputs and request the same outgoing link, since only one packet can be switched to a given output port at a time. Thus, the packet switch must have the ability to buffer the packets until they get their turn. In the packet switch design of Fig. 18, buffering takes place within the switch fabric with a recirculating loop. In this way, a circulating shift register is introduced so that it can store the packet and forward it to the next stage of the switch fabric whenever there is an empty slot at the switch input [41].
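Contention resolution with a recirculating loop can be sketched as a slotted simulation. The model assumes an unbounded loop, one packet per output port per slot, and priority for packets already in the loop. None of these policies is specified in the text, and the names and example traffic are hypothetical.

```python
from collections import deque


def run_slots(arrivals_per_slot):
    """Simulate slotted switching with a recirculating buffer.

    arrivals_per_slot: one list per slot of (packet_id, output_port) tuples.
    Returns the delivered (slot, packet_id, output_port) records and the
    packets still circulating at the end.
    """
    loop = deque()
    delivered = []
    for slot, arrivals in enumerate(arrivals_per_slot):
        candidates = list(loop) + list(arrivals)   # buffered packets go first
        loop.clear()
        busy_ports = set()
        for packet_id, port in candidates:
            if port in busy_ports:
                loop.append((packet_id, port))     # recirculate for one more slot
            else:
                busy_ports.add(port)
                delivered.append((slot, packet_id, port))
    return delivered, list(loop)


delivered, still_buffered = run_slots([[("A", 1), ("B", 1)], [("C", 1)], []])
print(delivered, still_buffered)
```

In this example, packets B and C each wait one slot in the loop because output port 1 is occupied when they arrive.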
Regeneration is a relatively straightforward task that is performed in an AND optical gate with the locally recovered optical packet clock.
Finally, the proposed scheme is wavelength independent, i.e., it operates successfully even if optical signals of different wavelengths are inserted in the incoming links. The scheme can therefore be used in the dense-wavelength-division-multiplexing (DWDM) domain with the addition of a 1 × N arrayed-waveguide grating (AWG) at the input and output of the switch. The AWG should have as many ports as there are different incoming wavelengths. In this way, the input AWG demultiplexes the different wavelengths into N signals, and subsequently, each signal is treated separately by the corresponding incoming link of the proposed design.
V. CONCLUSION
In conclusion, we have demonstrated the successful operation of a number of critical all-optical modules that form the basis for further development of the optical time-domain technology chain. The presented all-optical modules possess several distinguishing features and advantages compared with their electronic equivalents, namely, switching speed, femtojoule switching energy, and integration capability.
We have also presented an optical packet switch architecture within which certain functionalities are performed in the optical domain using the developed optical modules. Such an optical design offers significant cost and performance advantages that stem from its ability to process high-speed data signals bit by bit.
