Abstract: The authors demonstrate an optical buffer architecture implemented using quantum dot semiconductor optical amplifiers (QD-SOAs) to achieve wavelength conversion with regenerative capabilities for all-optical packet-switched networks. The architecture consists of cascaded programmable delay stages that minimise the number of wavelength converters required to implement the buffer. Physical layer simulations have been performed to reveal the potential of this scheme as well as the operating and device parameters of QD-SOA-based wavelength converters. The results indicate that up to three cascaded time-slot interchanger (TSI) stages show good performance at 160 Gb/s in the 1550 nm communication window.
Introduction
Quantum dot semiconductor optical amplifiers (QD-SOAs) are envisaged as the next-generation optical signal processing devices in ultra-high-speed networks. This originates from their unique features such as high differential gain and subpicosecond gain recovery [1], which make them superior to conventional bulk or quantum well devices. Moreover, their enhanced physical and optical properties make QD-SOAs ideal candidates for signal processing applications at line rates that reach 160 Gb/s.
In the current communication, we demonstrate the applicability of QD-SOAs in an optical buffer architecture that can support ultra-high-speed optical packet switching. Packet buffering is implemented in a multi-stage timeslot interchanger (TSI) that consists of feed-forward delay lines and QD-SOA-based wavelength converters exploiting the cross-gain modulation (XGM) effect. The TSI stages are designed to fully utilise the available wavelengths, whose number is restricted by the spectral overlap of data packet streams at multi-hundred Gb/s line rates and by the width of the homogeneous spectral region lying under the inhomogeneous gain curve of the QD-SOA. The number of wavelengths used is critical when considering practical aspects of the buffer, such as the hardware cost of the architecture and the signal quality degradation after successive wavelength conversions because of XGM in the semiconductor devices. The proposed buffer architecture is accompanied by physical layer simulations that derive the achievable buffer size in terms of cascaded TSI stages and illustrate its performance in terms of output Q-factor and extinction ratio. Simulations reveal that up to three TSI stages may be cascaded and also illustrate the regenerative performance of the architecture at 160 Gb/s.
Buffer architecture and control

Buffer architecture
The proposed system architecture is presented in Fig. 1. It comprises cascaded programmable delay stages, each consisting of two Tuneable Wavelength Converters (TWCs) and two delay line banks. Each TWC provides w separate wavelengths at its output, and each wavelength is routed to the respective branch of the delay line bank by means of a wavelength de-multiplexer. Adjacent TWCs and stages are connected by wavelength multiplexers.
The delays that are introduced at each TSI stage are a design parameter of the proposed architecture and they are evaluated from the timeslot transition graph (TTG) of the TSI architecture [2, 3]. The TTG consists of nodes located at columns and rows; columns i and i + 1 correspond to the input and output, respectively, of TSI stage i, and rows correspond to timeslots, each being occupied by a single optical packet. A packet arriving at the input of stage i and accessing one of the delay lines is placed in an output timeslot that appears later in time, and this action is represented as a straight line (time transition) connecting an input and an output node on the TTG.
Buffering a packet in the proposed design corresponds to a path on the TTG. The origin node represents the input slot in which the packet arrives and the destination node represents the output slot from which the packet leaves. Since more than one packet arrives at the buffer inputs within a timeframe, an interconnection pattern that maps input to output nodes is formed on the TTG during each timeframe. The aim is to engineer the time transitions, or equivalently the delay times D(i, j) at each stage, so that the interconnection pattern contains a log n-Benes graph as a sub-graph, where n corresponds to the number of packets that can be fully interchanged in time within a buffer stage. The log n-Benes graph is derived from the Clos network in a fashion similar to the derivation of the log 2-Benes graph [4]. The derivation simply requires replacing the 2 × 2 switches with n × n switches, resulting in a log n-Benes graph that maintains the re-arrangeably non-blocking property of the initial log 2-Benes interconnection topology.
The purpose of constructing the log n-Benes space-time graph is manifold. First, the implementation requires a minimum number of serially connected stages equal to

$$ s = 2 \log_n N - 1 \qquad (1) $$

for buffering a maximum of N packets. Equation (1) shows that by implementing the log n-Benes space-time graph, one can achieve a drastic reduction in the number of stages, since s falls as the switch size n increases. This is of particular importance when considering the hardware cost of the implementation. Moreover, physical layer impairments degrade the optical signal quality as the number of cascaded stages increases. A further attribute of the Benes space-time graph is that it is re-arrangeably non-blocking, so the proposed design can store packets without suffering internal collisions. Finally, finding collision-free (disjoint) paths within the Benes graph is a well-studied problem [5].
The building blocks of the log n-Benes graph are n × n switches, and thus the first step in constructing it is to determine the switch size. In a previous communication, we proposed the deployment of a single TWC per stage [6] for constructing the switches. This resulted in poor wavelength utilisation, amounting to approximately 50% of the available wavelengths, for the switch formation. The number of available wavelengths in QD-SOAs is severely limited by the relation between the single-dot bandwidth and the detuning between adjacent wavelengths; thus, total wavelength utilisation is of key importance. In the current work, we propose to double the number of TWCs per stage, with the objective of achieving almost 100% wavelength utilisation when forming the n × n switches.
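As a rough numerical illustration of Equation (1), the sketch below counts the stages needed for a given number of packets N and switch size n. The function name `stages_required` and the rounding-up of log_n N when N is not an exact power of n are assumptions made for this example and are not taken from the paper.

```python
def stages_required(num_packets: int, switch_size: int) -> int:
    """Stages s = 2*log_n(N) - 1 of a log n-Benes space-time graph.

    Assumption: when N is not an exact power of n, log_n(N) is rounded up.
    """
    if num_packets < 2 or switch_size < 2:
        raise ValueError("need N >= 2 and n >= 2")
    # Integer search for the smallest m with n**m >= N (avoids float logs).
    m, reach = 0, 1
    while reach < num_packets:
        reach *= switch_size
        m += 1
    return 2 * m - 1


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"N=64, n={n}: {stages_required(64, n)} stages")
```

For N = 64 the sketch gives 11, 5 and 3 stages for n = 2, 4 and 8, which is the kind of reduction in cascaded stages the text refers to.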
The switches are formed out of time transitions on the TTG, as shown in Fig. 2a, which corresponds to the first stage (stage 0) of the buffer. A packet arriving at the input of stage 0 during the first timeslot may only access timeslots {1, ..., w} at the output of the first TWC, since time transitions to previous timeslots are not allowed. In a similar fashion, the timeslots that are accessible by this packet at the output of the second TWC are limited to {1, ..., 2w − 1}. If n successive input packets are considered, then the timeslots at the output of the first and second TWC that are accessible by all input packets are {n, ..., w} and {n, ..., 2w − 1}, respectively.
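The accessible timeslot sets quoted above can be checked with a small enumeration. The sketch assumes that each delay line bank of stage 0 offers delays of 0 to w − 1 timeslots (one per wavelength), which is what the sets {1, ..., w} and {1, ..., 2w − 1} imply; the function names `reachable` and `common_slots` are hypothetical.

```python
def reachable(slot: int, w: int, banks: int) -> set:
    """Timeslots reachable from `slot` after `banks` delay line banks.

    Assumption: every bank adds a delay of 0..w-1 timeslots.
    """
    slots = {slot}
    for _ in range(banks):
        slots = {s + d for s in slots for d in range(w)}
    return slots


def common_slots(n: int, w: int, banks: int) -> list:
    """Timeslots reachable by all n packets arriving in slots 1..n."""
    common = reachable(1, w, banks)
    for slot in range(2, n + 1):
        common &= reachable(slot, w, banks)
    return sorted(common)


if __name__ == "__main__":
    n, w = 3, 4  # example values chosen for illustration
    print(common_slots(n, w, banks=1))  # after the first TWC
    print(common_slots(n, w, banks=2))  # after the second TWC
```

With n = 3 and w = 4 the two printed lists are the slots 3 to 4 and 3 to 7, matching {n, ..., w} and {n, ..., 2w − 1}.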
The switch formation requires that the output timeslots that are accessible by all input packets exceed the number of packets themselves, so that the n × n switches may be always formed on the TTG. This is equivalent to
Moreover, the interconnection network that corresponds to the n × n switches must be non-blocking, so that packets do not arrive simultaneously at a TWC and are therefore not lost. This is satisfied by ensuring that there are at least two disjoint paths between all input and output timeslot nodes inside the switch. To ensure this, we take into account that the mid-nodes (nodes at the output of the first TWC) that are accessible to all input nodes are {n, . . . , w}.
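Provided that there are at least two fully accessible mid-nodes, shown in white in Fig. 2a, there are always at least two disjoint interconnection paths towards the mid-nodes of the switch. Since the fully accessible mid-nodes {n, ..., w} number w − n + 1, this requires w − n + 1 ≥ 2, and as a result

$$ n \le w - 1 \qquad (3) $$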
Additionally, the existence of the two disjoint paths between the mid-nodes and the output nodes is assured when the output nodes are limited to {w, ..., 2(w − 1)}. The switch is therefore formed (i) after selecting n − 2 mid-nodes that are symmetrically located above and below the two fully accessible mid-nodes when n is even, or (ii) after selecting (n − 1)/2 and (n − 3)/2 mid-nodes above and below the fully accessible mid-nodes, respectively, when n is odd. Equation (3) shows that almost full utilisation of the available wavelengths has been achieved with the proposed buffer architecture.
The next step for constructing the log n-Benes graph is to determine the time transitions that form the graph's switches in the respective stages. The process is shown in Figs. 2b-d for the first and second stages of the buffer. The formation of the log n-Benes graph crossbars requires that at each stage i, time transitions connect timeslots that are located n^i positions apart. This corresponds to setting the time delays, in timeslots, equal to
The delays account for all time transitions on the TTG, even though not all transitions contribute to the formation of the virtual switches. The inactive transitions introduce a constant delay after which the output timeframe commences (white squares in Fig. 2b). At the output of each buffer stage, the delay equals
timeslots and as a result the delay that the packets experience when traversing the buffer is
Equation (5) may be viewed as a constant buffer access time.
Buffer control
Following the discussion of the buffer architecture, packets are buffered after being converted to the appropriate internal wavelengths and accessing the respective delay lines at each programmable delay stage. As a result, buffering requires that the state of wavelength converters be set prior to sending the packets to the buffer. From a TTG perspective, setting the internal wavelengths is equivalent to calculating the state of the switches in all intermediate stages of the log n-Benes graph in Fig. 2d so that the input packet sequence is routed to the respective output sequence.
To perform routing in a log n-Benes graph, we have proposed a modified parallel routing algorithm [5] that extends the parallel routing algorithm on a binary Benes graph [4]. The algorithm involves setting the state of the outermost switches (at stages 0 and s − 1) of the Benes graph given the respective packet sequences. The outermost switches are then omitted, and the remaining network is partitioned into multiple Benes graphs of reduced size. The algorithm is recursively applied on the resulting graphs until the state of all switches is set.
Having determined the state of the switches, it remains to calculate the interconnection pattern inside the switches. We consider that each mid-node k of the switch is described by a set S_k that contains the input and output nodes it is connected to. Owing to the symmetry of the switch, mid-nodes always connect to the same group of input and output nodes. Supposing that input node i must connect to output node p(i), there are at least two mid-nodes that allow for this connection, since there are at least two disjoint paths between input and output nodes. The mid-nodes k that enable this connection satisfy
An algorithm for selecting the mid-nodes so that all input-to-output connections are performed over disjoint paths (without collisions) is illustrated in Fig. 3. The algorithm involves the following steps:
• For a given node pair (i, p(i)), find all available sets S_k that satisfy (7).
• Select the set S_k with the smallest number of elements for the node pair.
The proposed algorithm calculates hops (i, k) and (k, p(i)) for all connections (i, p(i)) inside the switch. The calculated hops correspond to the wavelengths that are provided by the two TWCs of each respective stage. Following the discussion of the buffer architecture on the switch formation, hop (i, k) corresponds to wavelength
In a similar way, hop (k, p(i)) corresponds to wavelength

$$
\begin{cases}
\lambda_{(n+3)/2 + p(i) - k}, & n \text{ is odd} \\
\lambda_{(n+2)/2 + p(i) - k}, & n \text{ is even}
\end{cases}
\qquad (9)
$$

Performance evaluation of wavelength converters
QD-SOA-based TWC setup
The main building blocks of the TWC setup are two identical QD-SOAs in serial configuration, as illustrated in Fig. 4a. Wavelength conversion functionality is based on the XGM effect between a strong data signal (pump) and a weak continuous wave (cw) signal (probe) in close wavelength proximity, which co-propagate through each QD-SOA device. The tuneable bandpass filters are tuned to the wavelength of the converted output signal, hence removing the spectral content of the input pump that appears amplified at every QD-SOA output. A saturable absorber (SA) is used between consecutive TWCs in order to compensate for the extinction ratio degradation stemming from the XGM effect. The delay bank between adjacent TWCs is part of the buffer architecture, as described in Section 2.1.
The available wavelengths (λ_i) fit within the homogeneous bandwidth of the QD-SOA device lying under its inhomogeneously broadened gain spectrum centred at λ_c, as illustrated schematically in Fig. 4b. Wavelength conversion is feasible for wavelengths that lie under the same homogeneous spectrum region and is performed in two steps: (i) from λ_i (i = 1, 2, 3, 4) to λ_c in QD-SOA 1 and (ii) from λ_c to λ_i′ (i′ = 1, 2, 3, 4) in QD-SOA 2, hence allowing for the possibility to maintain the same wavelength at the TWC output.
The performance evaluation of the TWCs is based on numerical simulations taking into consideration physical layer optimisation parameters such as the power levels of the data and the cw signals and the characteristic device parameters such as the length, homogeneous bandwidth and wavelength spacing. The necessary conditions for wavelength conversion based on the proposed scheme for the realisation of the TWCs of the buffer architecture are the following:
1. the available conversion wavelengths should be constrained within the same homogeneous dot bandwidth and evenly distributed on either side of its peak,
2. the centre of the homogeneous bandwidth should coincide with the centre of the inhomogeneously broadened gain of the QD-SOA (λ_c) to ensure maximum gain and
3. the spectral content of the converted signal should be narrower than the spectral separation between adjacent wavelengths to eliminate crosstalk.
The input data signal (P_pump1) comprises 2 ps Gaussian pulses modulated by a 2^7 − 1 PRBS data pattern at 160 Gb/s. A cw signal at λ_c is fed into QD-SOA 1 together with P_pump1, enabling wavelength conversion from λ_i to λ_c. In turn, the converted signal at λ_c, with inverted polarity with respect to the input data signal, plays the role of the pump in QD-SOA 2, co-propagating through the device with a new cw signal (P_probe2) and thus enabling wavelength conversion from λ_c to λ_i′. Finally, the output is fed into the next TWC after passing through the SA.
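For readers who want to reproduce the data pattern, a PRBS of length 2^7 − 1 can be generated with a linear-feedback shift register. The sketch below uses the polynomial x^7 + x^6 + 1, a common choice for PRBS7; the paper does not state which generator polynomial or seed was used, so both are assumptions, as is the function name `prbs7`.

```python
BIT_RATE = 160e9  # line rate in bit/s, from the text


def prbs7(num_bits: int = 127, seed: int = 0x7F) -> list:
    """Bits of a PRBS7 sequence from the LFSR x^7 + x^6 + 1.

    Assumption: polynomial and seed are illustrative, not from the paper.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(num_bits):
        # Feedback is the XOR of the two highest register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


if __name__ == "__main__":
    pattern = prbs7()
    bit_slot_ps = 1e12 / BIT_RATE
    print(f"period: {len(pattern)} bits, ones: {sum(pattern)}")
    print(f"bit slot at 160 Gb/s: {bit_slot_ps} ps")
```

At 160 Gb/s each bit slot lasts 6.25 ps, so a 2 ps pulse occupies roughly a third of the slot.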
The response of the QD-SOAs has been numerically simulated based on a rate equation model that describes the instantaneous carrier density along the propagation direction [7]. The SA is modelled by a static transfer function representing its sigmoid amplitude reshaping characteristics, assuming ideal instantaneous fall and recovery times and an ultra-low steady-state loss of only −0.1 dB [8]. The tuneable filters are treated in the simulation as simple elements that introduce 2 dB loss without any offset that would affect the quality of the wavelength-converted output pulses. Similarly, the delay line is considered only as a loss element of 3 dB. The simulation parameters are given in Table 1. Typical values of the homogeneous bandwidth at room temperature and the device length range from 10 to 20 meV and from 0.5 to 25 mm, respectively [9]. In the present case, the homogeneous bandwidth of the quantum dots is taken as 16 meV (equivalent to 31 nm at 1550 nm) and the wavelength spacing is 5.1 nm, ensuring that there is no spectral overlap among adjacent wavelengths at 160 Gb/s. Finally, the peak pulse power of the input data signal is 20 dBm and the saturation power of the SA is 10 dBm.
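The quoted equivalence between 16 meV and 31 nm, and the fit of four wavelengths inside that bandwidth, can be checked with the first-order relation Δλ = λ²ΔE/(hc). The sketch below is only a sanity check: it assumes the four channels are placed symmetrically about λ_c (in line with the first condition above) and treats the homogeneous bandwidth as a full width centred on λ_c; the function names are hypothetical.

```python
HC_EV_NM = 1239.84193  # Planck constant times speed of light, in eV*nm


def energy_width_to_nm(width_mev: float, centre_nm: float) -> float:
    """Convert a small energy width (meV) to a wavelength width (nm).

    Uses the first-order relation d_lambda = lambda^2 * d_E / (h*c).
    """
    return centre_nm ** 2 * (width_mev * 1e-3) / HC_EV_NM


def channel_offsets(num_channels: int, spacing_nm: float) -> list:
    """Offsets from the centre for channels spread evenly on both sides."""
    half_span = (num_channels - 1) * spacing_nm / 2
    return [i * spacing_nm - half_span for i in range(num_channels)]


if __name__ == "__main__":
    bandwidth_nm = energy_width_to_nm(16, 1550)
    offsets = channel_offsets(4, 5.1)
    print(f"homogeneous bandwidth: {bandwidth_nm:.1f} nm")
    print("channel offsets (nm):", [round(x, 2) for x in offsets])
    print("all inside half-bandwidth:",
          all(abs(x) <= bandwidth_nm / 2 for x in offsets))
```

Under these assumptions the four channels span 15.3 nm, about half of the 31 nm homogeneous bandwidth.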
Results and discussion
This section presents the results of the physical layer optimisation of the proposed buffer architecture, aiming to define the parameters for achieving the maximum number of cascaded wavelength converters. The quality of the converted signal at the output of every TWC is quantified by two figures of merit: (i) the extinction ratio, defined as the logarithmic ratio of the mean power level of the ones to that of the zeros, and (ii) the Q-factor, defined as the ratio of the difference between the mean values of the ones and zeros to the sum of the standard deviations of the ones and the zeros.
Fig. 5 illustrates the gain recovery time of the QD-SOA as a function of the device length. Notably, the QD-SOA response is significantly faster for longer devices, which is in agreement with the results demonstrated for their bulk counterparts [10, 11]. In the present simulation study the length of the QD-SOA device has been set to 10 mm, for which the gain recovery time is shorter than 1 ps, yielding ultra-fast operation with no patterning at 160 Gb/s.
The operating power levels of the input signals have been extensively investigated for the TWC cascade, considering successive wavelength conversions from λ_1 to λ_c and from λ_c to λ_4. This corresponds to the worst-case scenario, since λ_c experiences the highest gain whereas λ_1 and λ_4 experience the lowest gain of all available wavelengths. Figs. 6a and b illustrate the extinction ratio and the Q-factor, respectively, of the converted signal at the output of the first TWC as a function of the input power levels of the cw signals P_probe1 and P_probe2.
The extinction ratio and Q-factor of the input data signal to the first TWC are 13 dB and 7, respectively. The extinction ratio degrades for higher probe power levels owing to faster saturation of the QD-SOA and lower available gain under XGM. On the other hand, the Q-factor is improved after XGM at QD-SOA 2, implying acceleration of the carrier dynamics of the device. Hence, the power levels of the probe signals P_probe1 and P_probe2 must be carefully defined, taking into consideration the trade-off between extinction ratio and Q-factor enhancement [12]. Furthermore, it is important to note that the operating point of the wavelength converters in cascade may differ from the optimum operating point of the single TWC subsystem. Figs. 7a and b illustrate the output extinction ratio and Q-factor as a function of the number of TWCs for P_probe1 and P_probe2 equal to −12 and 0 dBm, respectively. The performance of the subsystem along the cascade has been evaluated both with and without the SA, denoted by the solid and open squares.
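The two figures of merit defined at the start of this section can be computed from sampled power levels as sketched below. The sample values are made up for illustration, and the use of population standard deviations and of samples taken at the bit centre are assumptions of this example rather than details given in the paper.

```python
import math
import statistics


def extinction_ratio_db(ones, zeros) -> float:
    """10*log10 of the mean 'one' power over the mean 'zero' power."""
    return 10 * math.log10(statistics.fmean(ones) / statistics.fmean(zeros))


def q_factor(ones, zeros) -> float:
    """(mean1 - mean0) / (sigma1 + sigma0), population std (assumption)."""
    spread = statistics.pstdev(ones) + statistics.pstdev(zeros)
    return (statistics.fmean(ones) - statistics.fmean(zeros)) / spread


if __name__ == "__main__":
    # Made-up sampled powers in arbitrary linear units, illustration only.
    ones = [0.9, 1.1]
    zeros = [0.04, 0.06]
    print(f"extinction ratio: {extinction_ratio_db(ones, zeros):.2f} dB")
    print(f"Q-factor: {q_factor(ones, zeros):.2f}")
```

For these sample values the extinction ratio is about 13 dB, the same order as the input signal quoted in the text, and the Q-factor is about 8.6.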
The extinction ratio of the wavelength-converted signal is greatly enhanced along the cascade of converters owing to the zero-level suppression induced by the SA, and it reaches a plateau of 19 dB after the seventh TWC. In addition, the Q-factor increases after the first and the second TWC; however, it starts to decrease as the number of cascaded converters becomes larger, falling to its input value after nine successive wavelength conversions. It is noteworthy that the performance of the cascade would be dramatically degraded if the SA were not used between consecutive TWCs. In particular, the extinction ratio would be degraded by more than 10 dB at the seventh TWC output and the Q-factor would decrease further with respect to its input value. Fig. 8 illustrates the eye diagrams of the wavelength-converted signal along the cascade of converters. It is evident that the amplitude of the zero level decreases along the cascade, offering extinction ratio enhancement, and that the amplitude variation at the level of ones is suppressed, yielding Q-factor improvement. However, the quality of the wavelength-converted signal is degraded because of the appearance of a peak at the leading edge of the pulse that is attributed to self-phase-modulation effects in the semiconductor and becomes even more pronounced after consecutive wavelength conversions [13]. The amplitude of this peak, reaching 1 dB at the output of the sixth TWC, limits the number of cascaded stages and consequently defines the maximum number of input packets served by the proposed buffer architecture.
Fig. 5 Gain recovery time as a function of the QD-SOA length
Fig. 6 Performance evaluation of the wavelength converted signal at the output of the first TWC
a Extinction ratio as a function of the input power levels P_probe1 and P_probe2
b Q-factor as a function of the input power levels P_probe1 and P_probe2
The output extinction ratio and Q-factor after the sixth TWC are 18.5 dB and 10, respectively, yielding enhancements of 5.5 and 1.43 dB with respect to their input values and highlighting the regenerative capabilities of the QD-SOA-based buffer. Considering the physical layer performance of the cascade of converters and Equation (1) given in Section 2.1, and assuming four available wavelengths and three cascaded stages, the maximum number of input packets supported by the proposed buffer architecture at 160 Gb/s is nine.
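The figure of nine packets can be reproduced by inverting Equation (1), as in the sketch below. It assumes two TWCs per stage (as stated in Section 2.1) and a switch size of n = w − 1; the latter is an inference from the reported near-full wavelength utilisation and from the four-wavelength, three-stage, nine-packet result, not a value given explicitly here. The function name `buffer_capacity` is hypothetical.

```python
def buffer_capacity(num_wavelengths: int, num_twcs: int) -> int:
    """Largest N allowed by s = 2*log_n(N) - 1 for the available stages.

    Assumptions: two TWCs per stage and switch size n = w - 1.
    """
    n = num_wavelengths - 1
    stages = num_twcs // 2
    if n < 2 or stages < 1:
        raise ValueError("need at least 3 wavelengths and 2 TWCs")
    # Invert Equation (1): N = n ** ((s + 1) / 2). An even stage count
    # cannot complete a further Benes level, so the exponent rounds down.
    return n ** ((stages + 1) // 2)


if __name__ == "__main__":
    # Six cascaded TWCs (three stages) and four wavelengths, as in the text.
    print(buffer_capacity(num_wavelengths=4, num_twcs=6))
```

With four wavelengths and six cascaded TWCs the sketch returns nine, in agreement with the value stated above.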
Conclusions
In this communication we have demonstrated the applicability of TWCs based on QD-SOA technology to the implementation of an ultra-high-speed optical packet switching buffer. The buffer architecture comprises cascaded stages of two TWCs each and efficiently utilises all available wavelengths lying under the homogeneous bandwidth of the single-dot gain of the QD-SOA. As a result, the number of cascaded TWCs needed to serve a certain number of input packets is significantly decreased, reducing the hardware cost of the buffer architecture.
Numerical simulations have been carried out in order to define the physical parameters and operating conditions of the QD-SOAs comprising every TWC at 160 Gb/s. Wavelength conversion functionality has been extensively investigated in terms of signal quality of the converted output. The performance of the TWCs cascade exhibits regenerative characteristics mainly owing to the use of the SA that enables extinction ratio enhancement up to 6 dB with respect to the input value. Moreover, the number of cascaded converters has been defined considering the pulse shape characteristics of the output signal that exhibits a peak at the leading edge hence degrading its quality. Finally, the maximum number of input packets supported by the proposed buffer architecture at 160 Gb/s is nine, taking into account both the physical properties of the QD-SOAs and the physical layer performance of the wavelength conversion subsystem.
Acknowledgments
The work was supported by the Operational Program for Educational and Vocational Training (EPEAEK), PYTHAGORAS II Program and by EU via the IST/NoE e-Photon/ONe+ project, COST 291 and the project TRIUMPH (IST-027638 STP).
References
