We demonstrate the applicability of quantum-dot semiconductor optical amplifier based wavelength converters to the implementation of an ultra-high speed optical packet switching buffer. The buffer architecture consists of cascaded programmable delay stages that fully utilize the available wavelengths and thus minimize the number of wavelength converters that are required to implement the buffer. Physical layer simulations demonstrate error-free operation of the buffer with 3 cascaded Time-Slot-Interchanger (TSI) stages at 160 Gb/s in the 1.55 µm window.
Introduction
Quantum dot semiconductor optical amplifiers (QDSOAs) are envisaged as the next generation of optical signal processing devices in ultra-high speed networks. This originates from their unique features, such as high differential gain and ultra-fast gain recovery (~100 fs) [1], which make them superior to conventional bulk or quantum well (QW) devices. Moreover, their enhanced properties make QDSOAs ideal candidates for signal processing applications at line rates that reach 160 Gb/s.
In the current communication we demonstrate the applicability of QDSOAs in an optical buffer architecture that is suitable for ultra-high speed optical packet switching. Packet buffering is implemented in a multi-stage Time-Slot-Interchanger (TSI) that consists of feed-forward delay lines and QDSOA-based wavelength converters exploiting the cross-gain modulation (XGM) effect. Moreover, the TSI stages are designed to fully utilize the available wavelengths, which are defined so as to prevent spectral overlap at multi-hundred Gb/s line rates and also to fit within the spectral width of homogeneous broadening of a single-dot group. The use of many wavelengths is critical when considering the hardware cost of the buffer architecture and the signal quality degradation due to successive wavelength conversions. The buffer architecture is accompanied by physical layer simulations that derive the achievable buffer size in terms of cascaded TSI stages and quantify its performance in terms of output Q-factor and extinction ratio. Simulations reveal that up to 3 TSI stages may be cascaded for error-free operation at 160 Gb/s.
Buffer Architecture and Control

Buffer Architecture
The proposed system architecture is presented in Fig. 1. It comprises cascaded programmable delay stages, and each stage consists of two Tunable Wavelength Converters (TWCs) and two delay line banks. Each TWC provides w separate wavelengths at its output, and each wavelength is routed to the respective delay line of the delay line bank by means of a wavelength de-multiplexer. Adjacent TWCs and stages are connected by wavelength multiplexers. The delays that are introduced at each TSI stage are a design parameter of the proposed architecture and we will evaluate them from the timeslot transition graph (TTG) of the TSI architecture [2]. The TTG consists of nodes located at columns and rows; columns i and i+1 correspond to the input and output, respectively, of TSI stage i, and rows correspond to timeslots, each being occupied by a single optical packet. An optical packet that has arrived at the input of stage i and accesses one of the delay lines is placed in an output timeslot that appears later in time, and this action is represented as a straight line (time transition) connecting an input and an output node on the TTG.
Buffering a packet in the proposed design corresponds to a path on the TTG. The origin node represents the input slot in which the packet has arrived and the destination node represents the output slot in which the packet leaves. Since more than one packet arrives at the buffer inputs within a timeframe, an interconnection pattern that maps input to output nodes is formed on the TTG during each timeframe. The aim is to engineer the time transitions, or equivalently the delay times D(i,j) at each stage, so that the interconnection pattern contains a log_n-Benes graph as a subgraph. The log_n-Benes graph is derived from the log_2-Benes graph after replacing the 2x2 with nxn switches. Furthermore, it is a well-known rearrangeably nonblocking interconnection topology.
The purpose of constructing the log_n-Benes space-time graph is manifold. First, the implementation requires a minimum number of serially connected stages, given by Eq. (1), for buffering a maximum of N packets. Eq. (1) shows that by implementing the log_n-Benes space-time graph, one can achieve a drastic reduction in the number of stages. This is of particular importance when considering the hardware cost of the implementation. Moreover, physical layer impairments degrade the optical signal quality as the number of cascaded stages increases. An additional attribute of the Benes space-time graph is that it is rearrangeably nonblocking, so that the proposed design can store packets without suffering internal collisions. Finally, finding collision-free paths within the Benes graph is a well studied problem [3]. The building blocks of the log_n-Benes graph are nxn switches, and thus the first step in constructing it is to determine the switch size. In a previous communication, we had proposed the deployment of a single TWC per stage [4] for constructing the switches. This resulted in poor wavelength utilization for the switch formation, amounting to approximately 50% of the available wavelengths. Wavelength utilization is of key importance, because the number of available wavelengths is limited by the relation between the single-dot bandwidth and the detuning between adjacent wavelengths. Therefore, it is proposed to double the number of TWCs per stage, with the objective of achieving almost 100% wavelength utilization when forming the nxn switches.
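The stage count referred to as Eq. (1) can be illustrated numerically. The sketch below assumes the standard relation for a Benes network built from nxn switches, s = 2 log_n N - 1. This expression is taken from the general Benes topology and is an assumption here, since the equation itself is not reproduced in the text; the function names are hypothetical. The relation is consistent with the figures quoted in the results (n = 3, s = 3, N = 9).

```python
def benes_stages(num_packets: int, switch_size: int) -> int:
    """Stages of a log_n-Benes graph for N = n**m packets.

    Assumed relation (standard Benes topology): s = 2 * m - 1, m = log_n N.
    """
    if switch_size < 2 or num_packets < switch_size:
        raise ValueError("need switch_size >= 2 and num_packets >= switch_size")
    m, size = 0, 1
    while size < num_packets:
        size *= switch_size
        m += 1
    if size != num_packets:
        raise ValueError("num_packets must be an integer power of switch_size")
    return 2 * m - 1


def benes_capacity(stages: int, switch_size: int) -> int:
    """Inverse of benes_stages: N = n ** ((s + 1) / 2) for an odd stage count s."""
    if stages < 1 or stages % 2 == 0:
        raise ValueError("a Benes graph has an odd number of stages")
    return switch_size ** ((stages + 1) // 2)
```

Under this assumption, three stages of 3x3 switches hold 9 packets, whereas three stages of 2x2 switches hold only 4, which is why a larger switch size reduces the number of cascaded stages.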
The switches are formed out of time transitions on the TTG, as shown in Fig. 2(a), which corresponds to the first stage (stage 0) of the buffer. A packet that has arrived at the input of stage 0 during the first timeslot may only access timeslots {1, …, w} at the output of the first TWC, since time transitions to previous timeslots are not allowed. In a similar fashion, the timeslots that are accessible by the aforementioned packet at the output of the second TWC are limited to {1, …, 2w-1}. If n successive input packets are considered, then the timeslots at the output of the first and second TWC that are accessible by all input packets are {n, …, w} and {n, …, 2w-1}, respectively.
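The accessible timeslot sets quoted above can be checked by intersecting the slots reachable from each input. The following is a minimal sketch, assuming that each TWC with its delay line bank adds between 0 and w-1 timeslots of delay (one delay line per wavelength); the function names are hypothetical.

```python
def reachable(slot: int, w: int, twcs: int) -> set:
    """Timeslots reachable from `slot` after `twcs` converters.

    Assumption: each converter adds a delay of 0..w-1 timeslots.
    """
    return set(range(slot, slot + twcs * (w - 1) + 1))


def common_slots(n: int, w: int, twcs: int) -> list:
    """Timeslots reachable by all of the n successive inputs 1..n."""
    per_input = [reachable(i, w, twcs) for i in range(1, n + 1)]
    return sorted(set.intersection(*per_input))
```

For w = 4 and n = 3, `common_slots(3, 4, 1)` gives the slots {3, 4} and `common_slots(3, 4, 2)` gives {3, …, 7}, matching {n, …, w} and {n, …, 2w-1}.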
The switch formation requires that the output timeslots that are accessible by all input packets exceed the number of packets themselves, so that the nxn switches may be always formed on the TTG. This is equivalent to
Moreover, the interconnection network that corresponds to the nxn switches must be nonblocking, so that packets do not arrive simultaneously at a TWC and are therefore not lost. This is satisfied by ensuring that there are at least two disjoint paths between all input and output timeslot nodes inside the switch. To ensure this, we take into account that the mid-nodes (nodes at the output of the first TWC) that are accessible to all input nodes are {n, …, w}. Provided that there are at least two fully accessible mid-nodes, shown in white colour in Fig. 2(a), there are always at least two disjoint interconnection paths towards the mid-nodes of the switch, and as a result n ≤ w-1. (3)
Additionally, the existence of the two disjoint paths between the mid-nodes and the output nodes is assured when the output nodes are limited to {w, …, 2(w-1)}. The switch is therefore formed (a) after selecting n-2 mid-nodes that are symmetrically located above and below the two fully accessible mid-nodes when n is even, or (b) after selecting (n-1)/2 and (n-3)/2 mid-nodes above and below the fully accessible mid-nodes, respectively, when n is odd. Eq. (3) shows that almost full utilization of the available wavelengths has been achieved with the proposed buffer architecture.
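The sizing conditions on the switch can be recovered by counting nodes in the sets quoted above. The derivation below is a reconstruction from those sets, not a reproduction of the original equations; in particular, it assumes that "exceed" in the switch formation requirement is meant non-strictly.

```latex
% Output timeslots accessible by all n inputs: {n, ..., 2w-1}
\[
  \bigl|\{n,\dots,2w-1\}\bigr| = 2w-n \;\ge\; n
  \quad\Longrightarrow\quad n \le w .
\]
% Fully accessible mid-nodes: {n, ..., w}; at least two are required
\[
  \bigl|\{n,\dots,w\}\bigr| = w-n+1 \;\ge\; 2
  \quad\Longrightarrow\quad n \le w-1 .
\]
% Mid-node count for odd n: two fully accessible nodes plus the selected ones
\[
  2 + \frac{n-1}{2} + \frac{n-3}{2} = n .
\]
```

With w = 4 wavelengths the second bound gives n = 3, so three of the four wavelengths' worth of timeslots are used by each switch.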
The next step in constructing the log_n-Benes graph is to determine the time transitions that form the graph's switches in the respective stages. The process is shown in Fig. 2
The delays account for all time transitions on the TTG, even though not all transitions contribute to the formation of the virtual switches. The inactive transitions introduce a constant delay after which the output timeframe commences (white squares in Fig. 2(b) ). At the output of each buffer stage, the delay equals
timeslots and as a result the delay that the packets experience when traversing the buffer is 
Eq. (5) may be viewed as a constant buffer access time.
Buffer Control
Following the discussion of subsection 2.1, packets are buffered after being converted to the appropriate internal wavelengths and accessing the respective delay lines at each programmable delay stage. As a result, buffering requires that the state of the wavelength converters is set prior to sending the packets to the buffer. From a TTG perspective, setting the internal wavelengths is equivalent to calculating the state of the switches in all intermediate stages of the log_n-Benes graph in Fig. 2(d), so that the input packet sequence is routed to the respective output sequence.
To perform routing in a log_n-Benes graph, we have proposed a modified parallel routing algorithm [4] that extends the parallel routing algorithm on a binary Benes graph [3]. The algorithm involves setting the state of the outermost switches (at stages 0 and s-1) of the Benes graph given the respective packet sequences. The outermost switches are then omitted, and the remaining network is partitioned into multiple Benes graphs of reduced size. The algorithm is recursively applied on the resulting graphs until the state of all switches is set. Having determined the state of the switches, it remains to calculate the interconnection pattern inside the switches. We consider that each mid-node k of the switch is described by a set S_k that contains the input and output nodes it is connected to. Due to the symmetry of the switch, mid-nodes always connect to the same group of input and output nodes. Supposing that input node i must connect to output node π(i), there are at least two mid-nodes that allow for this connection, since there are at least two disjoint paths between input and output nodes. The mid-nodes k that enable this connection satisfy i ∈ S_k and π(i) ∈ S_k.
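The selection of mid-nodes can be viewed as an assignment problem: every connection i → π(i) needs its own mid-node k whose set S_k contains both i and π(i). The sketch below solves it by plain backtracking and is not the algorithm of Fig. 3. It assumes that each TWC adds 0..w-1 timeslots of delay, and the example switch (w = 4, inputs 1..3, mid-nodes 2..4, outputs 4..6) is one reading of the construction described earlier; all names are hypothetical.

```python
from itertools import permutations


def mid_node_sets(inputs, mids, outputs, w):
    """S_k for each mid-node k, split into reachable inputs and outputs.

    Assumption: every TWC adds a delay of 0..w-1 timeslots.
    """
    sets = {}
    for k in mids:
        ins = {i for i in inputs if 0 <= k - i <= w - 1}
        outs = {o for o in outputs if 0 <= o - k <= w - 1}
        sets[k] = (ins, outs)
    return sets


def assign_mid_nodes(pi, sets):
    """Give each connection i -> pi[i] a distinct mid-node k with i, pi[i] in S_k.

    Returns {input: mid_node}, or None if no collision-free choice exists.
    """
    conns = sorted(pi)

    def place(idx, used):
        if idx == len(conns):
            return {}
        i = conns[idx]
        for k, (ins, outs) in sets.items():
            if k not in used and i in ins and pi[i] in outs:
                rest = place(idx + 1, used | {k})
                if rest is not None:
                    return {i: k, **rest}
        return None

    return place(0, frozenset())


# Illustrative 3x3 switch for w = 4 wavelengths.
SETS = mid_node_sets([1, 2, 3], [2, 3, 4], [4, 5, 6], w=4)
ROUTES = {p: assign_mid_nodes(dict(zip([1, 2, 3], p)), SETS)
          for p in permutations([4, 5, 6])}
```

In this example every one of the six input-to-output permutations receives three distinct mid-nodes, so no two packets share a timeslot at the second TWC.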
An algorithm for selecting the mid-nodes so that all input-to-output connections are performed over disjoint paths (without collisions) is illustrated in Fig. 3. The algorithm involves the following steps:
The proposed buffer architecture targets operation at a 160 Gb/s line rate; therefore the TWC realization is based on QDSOA technology, which can support high bit rates without patterning effects, as opposed to bulk and quantum well SOAs. Fig. 4(a) illustrates the TWC configuration, which is based on the cross-gain modulation (XGM) effect of the saturated gain of each amplifier. Two cascaded QDSOA devices have been used to enable wavelength conversion between the modulating pump and the continuous wave (cw) probe signals. Both pump and probe signals are assumed to co-propagate through the device.
The available wavelengths must fit within the single-dot bandwidth for efficient XGM to occur. Typical values of the homogeneous broadening of QDSOAs at room temperature are 10 meV to 20 meV [5]. In the present study a value of 16 meV is used, that is 31 nm in the 1.55 µm window. At 160 Gb/s the detuning frequency between adjacent wavelengths has been assumed to be 640 GHz to prevent spectral overlap, which is equivalent to 5.1 nm for operation at 1.55 µm. A set of four available wavelengths distributed evenly with respect to the center of the homogeneous broadening has been chosen. The latter coincides with the center of the inhomogeneously broadened gain profile of the QDSOAs, located at λ_c = 1.55 µm. It should be noted that λ_c is used as a dummy wavelength for intermediate conversion between the two devices. The available wavelengths are related to λ_c as follows: {λ_1 = λ_c - 2Δλ, λ_2 = λ_c - Δλ, λ_3 = λ_c + Δλ, λ_4 = λ_c + 2Δλ}, where Δλ = 5.1 nm corresponds to the detuning between adjacent wavelengths including λ_c.
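The wavelength figures quoted above follow from two unit conversions, which the sketch below reproduces. It uses the small-span approximation Δλ ≈ λ²Δν/c around 1.55 µm and CODATA constants; the variable and function names are hypothetical.

```python
H_PLANCK = 6.62607015e-34    # Planck constant, J*s
E_CHARGE = 1.602176634e-19   # joules per electronvolt
C_LIGHT = 2.99792458e8       # speed of light, m/s


def freq_span_to_nm(delta_f_hz, center_m=1.55e-6):
    """Small-span approximation: d_lambda = lambda^2 * d_f / c, in nm."""
    return center_m ** 2 * delta_f_hz / C_LIGHT * 1e9


def energy_span_to_nm(delta_e_mev, center_m=1.55e-6):
    """Convert an energy width in meV to a wavelength width in nm."""
    delta_f_hz = delta_e_mev * 1e-3 * E_CHARGE / H_PLANCK
    return freq_span_to_nm(delta_f_hz, center_m)


DETUNING_NM = freq_span_to_nm(640e9)         # roughly 5.1 nm
HOMOGENEOUS_NM = energy_span_to_nm(16.0)     # roughly 31 nm
CENTER_NM = 1550.0
# Four wavelengths at -2, -1, +1, +2 detuning steps from the center.
PLAN_NM = [CENTER_NM + m * DETUNING_NM for m in (-2, -1, 1, 2)]
SPAN_NM = PLAN_NM[-1] - PLAN_NM[0]           # 4 detuning steps
```

The four-wavelength plan spans about 20.5 nm, which fits inside the 31 nm homogeneous broadening with room to spare.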
The input signal, represented by P_pump1, carries the binary data at one of the available wavelengths and is used as a pump signal to modulate the carrier density and subsequently the gain of the first QDSOA (QDSOA1). A cw probe signal, represented by P_probe1 and located at λ_c, enters the same device and experiences the gain modulation. Effectively, the binary information carried by P_pump1 is copied to the cw signal with inverted polarity. The output of QDSOA1 constitutes the pump signal that modulates the gain of QDSOA2 and is represented by P_pump2. A second cw signal enters the second QDSOA to achieve wavelength conversion from λ_c to one of the available wavelengths. The output of the second QDSOA constitutes the output of the TWC. It is noteworthy that tunable filters are placed at the output of each QDSOA in order to cut off the unnecessary input signals. For the specific simulations, the device has been assumed to be a simple passive element of 2 dB loss. Additional losses of 6 dB have been included for the MUX/DEMUX element, giving rise to 8 dB losses in total.
A saturable absorber is used at the input of each TWC to limit the extinction ratio degradation along the cascaded subsystems in the buffer architecture. This element is also based on QD technology and therefore exhibits a very fast response time (~1.5 ps) [6], which enables support of signal processing applications at line rates of 160 Gb/s. A simple static transfer function is used to represent the reshaping characteristic of the device [7]. It is introduced through the loss parameter α(t,P), which tracks the signal envelope according to α(t,P) = α_0/(1 + P(t)/P_sat), where P_sat is the saturation power and α_0 is the steady-state loss, being -0.1 dB in this case. Extensive simulations have indicated an optimum value of +20 dBm for the P_sat parameter.
The rate equation model presented in [8] has been employed to evaluate the performance of the QDSOA devices in this configuration. This is a generalization of the approach used to model bulk/QW devices, based on four main assumptions. Firstly, the quantum dots are spatially isolated, exchanging carriers only through the wetting layer. Secondly, the dots are grouped together by their resonant frequency in order to treat the spectral hole in the inhomogeneously broadened gain spectrum. Thirdly, the wavelength separation between the pump and probe signals depends on the width of homogeneous broadening of the single-dot gain. Finally, carrier dynamics within the dots are described by four rate equations corresponding to the energy levels of the wetting layer, the continuum state, the excited state and the ground state.
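The static loss function of the saturable absorber, α(t,P) = α_0/(1 + P(t)/P_sat), can be evaluated sample by sample as in the sketch below. The text does not specify how α is applied to the signal, so the sketch stops at the loss parameter itself and normalizes α_0 to 1 purely for illustration; the function names are hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)


def absorber_loss(p_mw, alpha0, p_sat_mw):
    """alpha(t, P) = alpha0 / (1 + P(t) / P_sat) for one power sample."""
    return alpha0 / (1.0 + p_mw / p_sat_mw)


P_SAT_MW = dbm_to_mw(20.0)  # the +20 dBm optimum quoted in the text, 100 mW

# With alpha0 normalized to 1 (illustrative assumption):
LOSS_AT_SPACE = absorber_loss(0.0, 1.0, P_SAT_MW)      # full loss
LOSS_AT_PSAT = absorber_loss(P_SAT_MW, 1.0, P_SAT_MW)  # half of alpha0
```

The loss falls as the instantaneous power rises, so low-power spaces are attenuated more than high-power marks, which is the reshaping behaviour that supports the extinction ratio.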
Results and discussion
This section presents the results of the physical layer dimensioning of the proposed buffer architecture. An in-depth optimization analysis has been carried out to determine the maximum number of stages that can be cascaded. To evaluate the signal quality along the cascade, two different metrics have been used as figures of merit: the Q-factor and the extinction ratio. The Q-factor is directly related to the actual bit-error rate of the system only when the signal degradation follows Gaussian statistics. Although this is not the case here, the metric can still be used to reflect the efficiency of the process, owing to the regenerative capabilities of the TWCs.
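For reference, the relation between Q-factor and bit-error rate that holds under the Gaussian assumption is BER = ½ erfc(Q/√2). The sketch below evaluates it for a linear (not dB) Q value; as noted above, it is only indicative for the regenerative TWC cascade, where the statistics are not Gaussian. The function name is hypothetical.

```python
import math


def ber_from_q(q_linear):
    """Gaussian-noise estimate: BER = 0.5 * erfc(Q / sqrt(2)), Q in linear units."""
    return 0.5 * math.erfc(q_linear / math.sqrt(2.0))
```

Under this relation a linear Q of 6 corresponds to a bit-error rate of about 1e-9, the usual threshold for error-free operation.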
The operating conditions for the TWC subsystem have been decided after extensive simulations in which various parameters have been optimized, such as the active length and the current density of each QDSOA [9]. The main objective is to achieve ultrafast XGM operation and to prevent bit patterning phenomena. For the purpose of the present work, the simulations have been performed assuming an input current density of 36 kA/cm^2 to ensure high carrier population at the upper energy states of the quantum dots, which act as a reservoir of carriers for the lower energy states. Also, the QDSOAs have been assumed to be 10 mm long and the average power of the pump channel has been set to +27 dBm. The corresponding pulse stream consists of 2 ps first-order Gaussian pulses, modulated by a 2^7-1 PRBS bit pattern at 160 Gb/s. Also, a limited amount of amplitude distortion corresponding to Q_in = 10 has been assumed at the input, whilst the extinction ratio is 13 dB. A detailed set of the QDSOA simulation parameters used in this study can be found in [8].
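A 2^7-1 PRBS pattern of the kind used to modulate the pulse stream can be generated with a 7-bit linear feedback shift register, as sketched below. The text does not state the generator polynomial, so the feedback taps (recurrence a[n] = a[n-6] XOR a[n-7]) are an assumption, and the function name is hypothetical.

```python
def prbs7(length=127, seed=0x7F):
    """Bits of a 2^7-1 PRBS from a 7-bit Fibonacci LFSR.

    Assumed taps: the new bit is the XOR of the two oldest register bits.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


BIT_PERIOD_PS = 1e12 / 160e9  # 6.25 ps per bit slot at 160 Gb/s
```

The pattern repeats every 127 bits, and each 2 ps pulse occupies roughly a third of the 6.25 ps bit slot.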
An important design issue for the TWC subsystem is the identification of the optimum power levels for the probe wavelengths at the input of the first and the second QDSOA, represented by P_probe1 and P_probe2, respectively. This optimization has three goals. The first is to maintain the extinction ratio at the output at the same level as that of the input. The use of the saturable absorber has a significant impact on this. Secondly, the operation of the TWC should be regenerative, which in practice means an improvement of the Q-factor. Finally, the output signal should maintain a high power level to ensure efficient XGM performance at the following TWC. Fig. 5(a-c) illustrates the extinction ratio, the relative Q-factor improvement and the peak power of the output pulse stream as a function of P_probe1 and P_probe2. It is clear that the extinction ratio is severely degraded as P_probe2 increases, which has also been demonstrated for SOAs in XGM operation [10]. On the other hand, a high power level of P_probe1 is required to achieve sufficient pump power at the input of the second QDSOA and a satisfactory extinction ratio at the output of the TWC. The regenerative efficiency of the subsystem is related to the Q-factor, which shows the opposite behaviour to the extinction ratio. In particular, an increased level of P_probe2 accelerates the recovery of the saturated gain of QDSOA2, which in turn results in a reduction of patterning effects in the output pulse stream. It has already been mentioned that the output power level of the TWC is very important because it feeds the next TWC. Fig. 5(c) illustrates that the peak power of +30 dBm is maintained around a region where the extinction ratio has decreased to around 8 dB, whereas the Q-factor has increased by about 6 dB with respect to the input value (Q_in ≈ 10). It is noteworthy that the operating conditions change when the TWCs operate in cascade.
The worst case scenario has been considered here, in which conversion from λ_1 to λ_4 and λ_4 to λ_1 occurs successively at each stage. Fig. 6 illustrates the extinction ratio and the Q-factor as a function of the number of cascaded TWCs, as well as the eye diagrams after the first four stages of the buffer architecture. The operating conditions have been identified as {-15 dBm, 0 dBm} for P_probe1 and P_probe2, respectively. The performance of the subsystem in terms of extinction ratio improves along the cascade owing to the use of the saturable absorber, which suppresses the amplitude variation at the spaces. Furthermore, the extinction ratio increases towards a steady level of 20 dB, 7 dB above the extinction ratio at the input of the first converter (13 dB).
However, the performance of the cascade is limited by the degradation of the Q-factor at the output of each converter. The eye diagrams of the output pulse streams illustrate that the pulse width increases along the cascade, introducing a duty cycle distortion. Moreover, overshooting is experienced by the leading edge of the pulse, which is related to the fact that the leading edge saturates the amplifier so that the trailing edge experiences less gain than the leading edge. This effect becomes more severe along the cascade and limits the performance of the buffer to 3 stages. In addition, the output peak power along the cascade is reduced and the Q-factor decreases. In particular, at the output of the third stage, that is, at the output of the sixth TWC, the Q-factor is equal to its input value, indicating error-free operation. It is clear that the information at the output of the fourth stage is distorted. The number of packets that the proposed buffer architecture can support by use of four available wavelengths and s = 3 stages is 9, as derived from Eq. (1).
In this communication we have demonstrated the applicability of tunable wavelength converters (TWCs) based on QDSOA technology to the implementation of an ultra-high speed optical packet switching buffer. The buffer architecture consists of cascaded stages of two TWCs in series and makes use of all available wavelengths located within the homogeneous broadening of the single-dot gain of the QDSOA device. Effectively, the number of cascaded TWCs needed to serve a certain number of input packets is significantly decreased, reducing the hardware cost of the buffer architecture.
The cascadability of the TWC configuration has been studied in terms of extinction ratio and Q-factor to ensure good quality of the output signal. The extinction ratio at the output of the TWC has been significantly improved by use of the saturable absorber. Finally, error-free operation has been shown for up to 3 stages of the buffer architecture and thus support of 9 input packets.
