Abstract: We develop a design method for creating a large-scale optical switch consisting of two sub-switch parts, i.e., delivery-and-coupling switches and wavelength-routing switches based on cyclic arrayed-waveguide gratings, where the available optical-power budget is optimally allocated to the two sub-switches to maximize the entire switch scale. The power budget necessary for each sub-switch is quantitatively evaluated via extensive computer simulations. The simulation results show that a 1000 × 1000 switch scale can be realized when the available optical-power budget is 23 dB. The simulation results are verified via proof-of-concept experiments.
Introduction
With the recent advances in big-data-related business and cloud-computing services, intra-datacenter traffic is growing at a rate of 22% a year [1]. Current intra-datacenter networks face rising power consumption driven by this ever-increasing traffic. Intra-datacenter traffic can be divided into two types: mice flows stem from e-mail and web searches, while elephant flows originate from virtual-machine migration and server and storage backup. To process these quite different flows cost-effectively, hybrid optical/electrical switching networks have been proposed, in which the mice flows are processed by electrical packet switches while the elephant flows are handled by optical circuit switches [2]-[4]. This approach can substantially reduce the power consumption of datacenter networking. One of the key devices is a large-port-count optical circuit switch that meets intra-datacenter network requirements.
Among various optical-switch technologies, the micro-electro-mechanical system (MEMS) is one of the candidates for intra-datacenter interconnection. However, its switching time is on the order of milliseconds [5], which is too slow for intra-datacenter applications [6]. Furthermore, the number of optical paths to be calibrated when fabricating a MEMS switch increases with the square of the number of input/output ports (for example, one million MEMS mirror manipulations for a 1000 × 1000 switch), which is seen as a barrier to creating the large-port-count optical switches needed in future warehouse-scale datacenters. Another technology is to use semiconductor optical amplifiers (SOAs). Although they can realize compact and lossless switching, power consumption can be high because of the large number of SOAs required [7].
A combination of delivery-and-coupling (DC) switches and wavelength-routing (WR) switches utilizing cyclic arrayed-waveguide gratings (AWGs) can create large-scale (i.e., high-port-count) optical switches [8]. Indeed, a reliable and energy-efficient 270 × 270 non-blocking optical switch prototype has already been reported [9]. Here, a "P × Q" non-blocking optical switch means that P input ports and Q output ports are connected without any blocking. The architecture does not include any mechanically active parts, and it can be compactly implemented with planar-lightwave-circuit technologies or silicon-photonics technologies. To create a large-scale optical switch based on this configuration, we need to enlarge the DC-switch scale and/or the WR-switch scale because the total port count is given by the product of these sub-switch scales. However, enlargement of the DC-switch scale inevitably increases optical loss, whereas that of the cyclic AWGs induces passband-frequency deviation and crosstalk, which result in excess power loss and a received-power penalty. The allowable power loss and penalty are determined by the available optical-power budget, defined as the difference between the transmitter output power and the required receiver input power. We must, therefore, develop a design method that can maximize the whole switch scale, where the limited optical-power budget should be optimally allocated to the DC-switch part and to the WR-switch part. To achieve this, thorough analyses of the power loss and penalty generated in each sub-switch part are needed.
The power loss in the DC-switch part simply increases with its scale, mainly because of the optical-coupler part. In contrast, the power loss and penalty caused in the WR-switch part depend on its structure. We have already proposed a two-stage cyclic-AWG configuration that can reduce the passband-frequency deviation and crosstalk thanks to the use of cascaded smaller-scale cyclic AWGs. In exchange, the two-stage configuration suffers from increased loss and a filter-narrowing effect since the signal traverses two cyclic AWGs. Quantitative analysis of these problems has hardly been done; we need to examine the power loss and the power penalty caused in the wavelength-routing-switch part so as to establish the design criterion. Furthermore, the possibility of increasing the number of WR stages, i.e., a three-stage configuration, needs to be investigated.
In this paper, we analyze the power loss and penalty occurring in each sub-switch and establish a design that allows the entire switch scale to be maximized under the limited optical-power budget. First, we derive the power loss and penalty due to the passband-frequency deviation in the single-stage, two-stage, and three-stage cyclic-AWG configurations. Second, we derive the maximum sub-switch scales as a function of a given optical-power budget. Finally, based on the above results, we allocate the optical-power budget to the two sub-switch parts so as to maximize the overall switch scale. From these results, we confirm that the two-stage configuration can attain the largest switch scale once the optical-power budget is specified. Proof-of-concept experiments measure the bit-error ratio (BER) and validate the simulation results. Based on our design criterion, a 23-dB optical-power budget, which can easily be realized by typical transponders, yields 1000 × 1000 optical switches.
The organization of this paper is as follows: Section 2 details the sub-switch architectures, i.e., the DC switches and the WR switches utilizing cyclic AWGs. In Section 3, we overview optical-switch structures based on a combination of DC switches and WR switches, and establish a design criterion that attains the largest optical-switch scale. Section 4 shows results of simulations on the necessary power budget for each sub-switch scale and the attainable maximum switch scale for the given power budget. In Section 5, proof-of-concept experiments confirm the simulation results. Finally, we conclude this paper in Section 6.
Sub-switch Architecture

DC Switch
... and coupled with other-wavelength signals. In this way, the DC switch can deliver any input signal to any arbitrary output port. As for the 1 × M optical switch, cascaded Mach-Zehnder interferometers (MZIs) can be utilized. The loss of the DC switch $L_{\mathrm{DC}}$ is given by
$$L_{\mathrm{DC}} = 10\log_{10}(M) + L_{\mathrm{EX}} \quad [\mathrm{dB}],$$
where the first term of the right-hand side denotes the intrinsic loss due to the optical coupler and the second term $L_{\mathrm{EX}}$ represents the non-intrinsic excess loss including MZI, waveguide-crossing, and fiber-splicing losses. The intrinsic loss increases with the switch scale M; moreover, crossing/splicing losses also tend to increase with M, because a 1 × M optical switch is formed by cascading $M - 1$ 1 × 2 MZIs. The maximum scale of the DC switch is, thus, basically limited by the insertion loss.
WR Switch Based on Cyclic AWGs
..., and $\lambda_N$ are, respectively, output from port #1, #2, ..., #N. If the adjacent input port is used, the output port for each wavelength is shifted in a cyclic manner. The interval of periodic frequencies is called the free spectral range (FSR). The signal can thus be transferred from an arbitrary input port to an arbitrary output port by changing its wavelength. This configuration allows simple and reliable operation since the cyclic AWG is a passive device. However, the cyclic AWG intrinsically suffers from the passband-frequency deviation problem, that is, the centers of the passbands of cyclic AWGs cannot be exactly aligned to a regular-interval frequency grid such as the ITU-T grid frequencies. As a result, signal quality is degraded via excess filtering loss, spectral distortion, and inter-channel crosstalk. Such frequency misalignment increases as the cyclic-AWG scale increases, as shown in Fig. 3, where the passband-center frequency deviation is derived from analyses based on geometric optics [10]. The scatter in the maximum passband-frequency deviation originates from the restriction that the diffraction order must be an integer. The passband-frequency deviation prevents us from constructing a large-scale optical switch.
Although tracking the frequency deviation by adjusting the tunable-laser frequency can mitigate the problem, this approach is not suitable for cost-sensitive intra-datacenter networks; only cost-effective tunable lasers whose frequencies conform to the standardized ITU-T grid can be applied to datacenters.
To reduce the deviation of cyclic-AWG passband frequencies, we proposed the multistage cyclic-AWG architecture, in which multiple smaller-scale cyclic AWGs are interconnected to create a large-scale cyclic AWG [11], [12]. Figs. 4(a) and 4(b) illustrate N × N WR switches that employ the two-stage cyclic-AWG configuration and the three-stage configuration, respectively. Note that $N_i$ is the scale of the cyclic AWG used in the i-th stage and the total switch scale is given by $N = \prod_{i=1}^{k} N_i$ when the number of stages is k; in addition, $N_1, \ldots, N_k$ must be coprime to each other, otherwise multiple wavelengths will be output from the same port. These multistage configurations can suppress the passband-frequency deviation since the scale of each cyclic AWG is much smaller than that of the single-stage WR switch. Moreover, crosstalk can be suppressed thanks to the smaller AWG port count at each stage and the multiple-filtering process. However, insertion loss increases since the signal passes through multiple cyclic-AWG stages. The insertion loss of a cyclic AWG is around 3-5 dB, and hence adding stages demands larger power budgets. In addition, the passbands of the WR switch become narrower as the number of stages increases. Assuming the passband follows a Gaussian profile, the total bandwidth after k stages, $B_k$, is given by $B_k = B_1/\sqrt{k}$, where $B_1$ is the bandwidth of a single cyclic AWG. Such a filter-narrowing effect degrades the signal quality in conjunction with the passband-frequency deviation. In this way, the available scale of the WR switch is limited by passband-frequency deviation, insertion loss, and filter passband narrowing. Parameters that determine these impairments are summarized in Table 1.
First, a wavelength is selectively generated from a tunable laser and then input to an M × M DC switch. The output from the DC switch is then delivered to the WR-switch part that contains the target output port. Finally, the WR switch routes the signal according to its wavelength. In this way, we can construct a large-scale optical switch by combining DC switches and WR switches.
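The scale relations described above can be illustrated with a minimal Python sketch. It assumes that "coprime to each other" means pairwise coprime, and the function names and example stage scales are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import gcd, prod, sqrt


def wr_switch_scale(stage_scales):
    """Total WR-switch scale N = N_1 * ... * N_k for a k-stage configuration.

    Assumption: the coprimality requirement is interpreted as pairwise
    coprimality of the per-stage cyclic-AWG scales.
    """
    for a, b in combinations(stage_scales, 2):
        if gcd(a, b) != 1:
            raise ValueError(f"stage scales {a} and {b} are not coprime")
    return prod(stage_scales)


def cascaded_bandwidth_ghz(b1_ghz, k):
    """Bandwidth after k identical Gaussian stages: B_k = B_1 / sqrt(k)."""
    return b1_ghz / sqrt(k)


def total_switch_scale(dc_scale, stage_scales):
    """Total port count = DC-switch scale M times WR-switch scale N."""
    return dc_scale * wr_switch_scale(stage_scales)


if __name__ == "__main__":
    # Illustrative values only: a 12 x 12 DC switch with 9 x 9 and 10 x 10 AWGs.
    print(total_switch_scale(12, [9, 10]))
    print(round(cascaded_bandwidth_ghz(25.0, 2), 2))
```

Rejecting non-coprime combinations up front mirrors the constraint in the text, since such combinations would send multiple wavelengths to the same output port.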
Criterion for Switch-scale Maximization
In order to expand the entire switch scale, the DC-switch scale and/or the WR-switch scale must be enlarged. Expanding the DC-switch scale increases power loss, mostly due to the optical couplers used. In the WR-switch part, the insertion loss increases due to the multistage cyclic-AWG configuration; moreover, the passband-frequency deviation of the cyclic AWGs simultaneously yields power loss and penalty through non-ideal filtering. The allowable loss and penalty are limited by the total optical-power budget, as shown by
$$L_{\mathrm{DC}} + L_{\mathrm{AWG}} + P_{\mathrm{AWG}} \le \text{(total optical-power budget)},$$
where $L_{\mathrm{DC}}$ is the insertion loss of the DC switch; $L_{\mathrm{AWG}}$ and $P_{\mathrm{AWG}}$ denote, respectively, the power loss and penalty induced in the WR-switch part. The total power budget is defined as the difference between the transmitter output power and the required receiver input power. To achieve the largest switch scale for a given optical-power budget, the power loss and penalty of each sub-switch need to be analyzed, and the optical-power budget must be optimally allocated to the two sub-switch parts.
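As a rough illustration of this criterion, the following Python sketch checks whether a candidate design fits a given budget. It assumes the DC-switch loss model 10 log10(M) + excess loss with the 4-dB excess value used in the numerical section; the function names and the WR-side numbers in the example are hypothetical.

```python
import math


def dc_switch_loss_db(m, excess_db=4.0):
    """DC-switch loss: intrinsic coupler loss 10*log10(M) plus excess loss."""
    return 10.0 * math.log10(m) + excess_db


def required_budget_db(m, l_awg_db, p_awg_db, excess_db=4.0):
    """Left-hand side of the criterion: L_DC + L_AWG + P_AWG."""
    return dc_switch_loss_db(m, excess_db) + l_awg_db + p_awg_db


def fits_budget(budget_db, m, l_awg_db, p_awg_db, excess_db=4.0):
    """True if the total loss and penalty stay within the available budget."""
    return required_budget_db(m, l_awg_db, p_awg_db, excess_db) <= budget_db


if __name__ == "__main__":
    # Hypothetical WR-side values (8 dB loss, 1 dB penalty), not from the paper.
    for m in (10, 16):
        print(m, round(required_budget_db(m, 8.0, 1.0), 2),
              fits_budget(23.0, m, 8.0, 1.0))
```

The check makes the trade-off explicit: every decibel spent on a larger M is a decibel no longer available to absorb AWG loss and penalty.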
Numerical Calculations
Numerical evaluations are done to maximize the total switch scale under the limit of a given optical-power budget. The simulation setup is as follows: The signal format is 10-Gbps intensity modulation, and the signal-carrier frequencies lie on the 50-GHz grid defined by ITU-T; cost-effective tunable lasers comply with this fixed grid. Each cyclic AWG has a Gaussian transfer function and a 3-dB bandwidth of 25 GHz, which is common in 50-GHz grid systems. The insertion loss of an M × M DC switch is given by $10\log_{10}(M) + 4$ dB including excess loss, and that of a cyclic AWG is set to 4 dB per stage. Tables 2 and 3 summarize example combinations of cyclic-AWG scales for the two-stage and three-stage configurations, respectively. We calculate Q values of the received signal and estimate the power penalty at BER = $10^{-9}$ using the Q values. We calculate the FSR based on geometric optics [12], and then take the difference between the FSR and the 50-GHz ITU-T grid frequencies as the frequency deviation f.
Fig. 6 depicts the optical-power budget required for the WR switch as a function of the passband-frequency deviation $f_1$ in the single-stage configuration. The blue, red, and purple curves depict the power loss $L_{\mathrm{AWG}}$, the power penalty $P_{\mathrm{AWG}}$, and their sum $L_{\mathrm{AWG}} + P_{\mathrm{AWG}}$, respectively. We observe that the acceptable passband-frequency deviation is strictly limited to the range of $|f_1| < 20$ GHz. This is because we cannot remove the inter-channel crosstalk by increasing the transmitter output power. On the other hand, the power budget allocated for the power penalty ($P_{\mathrm{AWG}}$) is 0 dB when $f_1 = 0$, since inter-symbol interference and crosstalk are negligible (red line). However, since the AWG has an intrinsic loss of 4 dB even when $f_1 = 0$ (blue line), a power budget of 4 (= 0 + 4) dB should be allocated to the AWG part in total (purple line).
As shown in Fig. 7(a), smaller $|f_1|$ and $|f_2|$ provide lower power loss in the two-stage configuration. On the other hand, the power penalty is minimized when $f_1 + f_2 = 0$, as shown in Fig. 7(b). Note that the two-stage configuration also suffers a large penalty due to inter-channel crosstalk when $f_1 + f_2$ is large. Thus, the necessary optical-power budget for the two-stage configuration depends on the relation between $f_1$ and $f_2$.
Fig. 8 shows the power loss $L_{\mathrm{AWG}}$, the power penalty $P_{\mathrm{AWG}}$, and the total power budget $L_{\mathrm{AWG}} + P_{\mathrm{AWG}}$ when the three-stage configuration is employed. The calculations thoroughly consider the frequency deviation of the first-stage cyclic AWG $f_1$, that of the second-stage cyclic AWG $f_2$, and that of the third-stage cyclic AWG $f_3$; however, only the cases where $f_3 = -10$ GHz and $f_3 = +10$ GHz are shown for simplicity of illustration. Figs. 8(a)-(c) plot, respectively, contour maps of $L_{\mathrm{AWG}}$, $P_{\mathrm{AWG}}$, and $L_{\mathrm{AWG}} + P_{\mathrm{AWG}}$ when $f_3 = +10$ GHz, whereas Figs. 8(d)-(f) show the equivalents when $f_3 = -10$ GHz. We obtained results similar to those of the two-stage configuration shown in Fig. 7; the insertion loss $L_{\mathrm{AWG}}$ becomes smaller when $|f_1|$, $|f_2|$, and $|f_3|$ approach zero, and the power penalty $P_{\mathrm{AWG}}$ is suppressed when $f_1 + f_2 + f_3$ is close to zero. Thus, the performance of the multistage structures depends on the combination of passband-frequency deviations; in other words, it is determined by the combination of input and output ports being connected.
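The dependence of the loss on the per-stage deviations can be approximated with a simple Python sketch. It is a first-order estimate only, assuming that the 25-GHz figure is the full 3-dB width of a Gaussian power transfer function and evaluating the attenuation at the carrier frequency alone; it does not reproduce the Q-value simulations behind Figs. 6-8, and it says nothing about the crosstalk-induced penalty. The function names are hypothetical.

```python
import math


def gaussian_offset_loss_db(offset_ghz, bw3db_ghz=25.0):
    """Attenuation of a Gaussian passband at a frequency offset from center.

    Assumes |H(f)|^2 = exp(-ln(2) * (2 f / B)^2), where B is the full
    3-dB bandwidth, so the loss is about 3.01 dB at f = B / 2.
    """
    return 10.0 * math.log10(2.0) * (2.0 * offset_ghz / bw3db_ghz) ** 2


def carrier_loss_db(deviations_ghz, insertion_db_per_stage=4.0, bw3db_ghz=25.0):
    """Sum of per-stage insertion loss and carrier-frequency filtering loss."""
    return sum(
        insertion_db_per_stage + gaussian_offset_loss_db(f, bw3db_ghz)
        for f in deviations_ghz
    )


if __name__ == "__main__":
    print(round(carrier_loss_db([0.0]), 2))          # single stage, no deviation
    print(round(carrier_loss_db([5.0, -5.0]), 2))    # two stages, opposite signs
    print(round(carrier_loss_db([5.0, -5.0, 10.0]), 2))
```

In this simplified model the loss depends only on the magnitudes of the deviations, which is consistent with the observation that the loss falls as each deviation approaches zero, whereas the penalty in the text depends on the sum of the deviations.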
The bottleneck of the entire switching system is determined by the worst-case necessary optical-power budget; therefore, we should minimize the maximum necessary power budget to compare the configurations fairly. Let us consider the two-stage-based optical switch consisting of $N_2$ cyclic AWGs of scale $N_1 \times N_1$ and $N_1$ cyclic AWGs of scale $N_2 \times N_2$, and define #F, #R, #Out_F, and #In_R as the front-AWG number, the rear-AWG number, the output-port number of the front AWG, and the input-port number of the rear AWG, respectively. First, we consider the function u(#F, #R, #Out_F, #In_R), where u(#F, #R, #Out_F, #In_R) = 1 if there is an interconnection fiber between output port #Out_F of front cyclic AWG #F and input port #In_R of rear cyclic AWG #R; otherwise u(#F, #R, #Out_F, #In_R) = 0. Let Max_PB(#F, #R, #Out_F, #In_R) denote the function that gives the maximum necessary power budget when the interconnection between #Out_F of front cyclic AWG #F and #In_R of rear cyclic AWG #R is realized. The interconnection that minimizes the worst-case power budget can then be found by solving a linear programming problem as follows.
Minimize ... subject to ...
Fig. 9. Maximum sub-switch scale versus allocated power budget.
This approach can be easily applied to the three-stage architecture, and we can obtain the minimum of the maximum necessary power budget of the overall system. The following results are obtained after such optimization.
Fig. 9 plots the sub-switch scale versus the necessary optical-power budget, which is comprehensively calculated from the results shown in Figs. 6-8. The red, green, and blue curves denote WR-switch scales in the single-stage, two-stage, and three-stage configurations, respectively; in addition, the black curve indicates the DC-switch scale. When the available optical-power budget is small, the use of the single-stage configuration is the best solution thanks to its relatively small intrinsic loss. On the other hand, the multistage configurations can achieve a larger switch scale when larger optical-power budgets are available, since the frequency deviation can be suppressed. We can also observe that the cyclic-AWG scale greatly increases with even a small budget increase, but it then saturates; this is due to the inter-channel crosstalk induced by the frequency deviation, as shown in Figs. 6-8. In contrast, saturation does not occur in DC-switch expansion, though its initial increase is sluggish. With these results, we can appropriately allocate the optical-power budget to each sub-switch part.
Fig. 10 depicts the sub-switch scales when the power budget is optimally allocated, where the red, green, and blue curves correspond to the single-stage, two-stage, and three-stage configurations; broken and solid curves denote DC-switch scales and WR-switch scales, respectively. We find that the DC switch cannot be used because of its relatively large loss when the available optical-power budget is very small. On the other hand, when the optical-power budget is large, the WR-switch scale should be limited first while the DC-switch scale should be enlarged for further total switch-scale expansion. This is because the maximum cyclic-AWG size is strictly limited by inter-channel crosstalk, as presented in Figs. 6-8, whereas that of the DC switch can be expanded by simply increasing the allocated optical-power budget. Introducing the DC switch consumes additional optical power; hence, there is a discontinuous change in the WR-switch scale in each configuration.
Fig. 11 shows the attainable total switch scale when we employ the optimized sub-switch scales shown in Fig. 10. If the available optical-power budget is very small, the single-stage structure offers a larger switch scale thanks to its lowest intrinsic loss and less filter narrowing; however, it is not suitable for creating large-scale optical switches due to the large passband-frequency deviation. On the other hand, the three-stage structure necessitates a large power budget due to its relatively large intrinsic loss (this occurs even though its passband-frequency deviation is the smallest). In contrast, the two-stage structure can achieve the largest switch scale if a large optical-power budget is available because it finely balances the increase in the intrinsic loss against the suppression of frequency deviation. To construct an optical switch with a scale of 1000 × 1000, the two-stage configuration requires an optical-power budget of around 23 dB in total, which is easily attained with typical tunable lasers and photodetectors. In contrast, with the same optical-power budget, the single-stage and three-stage configurations offer only 800 × 800 and 600 × 600 switch scales, respectively.
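The allocation step can be sketched in Python as a simple sweep over how the budget is split between the two sub-switch parts. This is only an illustration of the idea, not the procedure used to produce Figs. 10 and 11. In particular, `toy_wr_scale` is a hypothetical saturating curve with made-up numbers (it is not read from Fig. 9), and the 0.5-dB sweep step and all function names are assumptions.

```python
import math


def max_dc_scale(allocated_db, excess_db=4.0):
    """Largest M with 10*log10(M) + excess_db <= allocated_db (0 if none)."""
    if allocated_db < excess_db:
        return 0
    # Small epsilon guards against floating-point results such as 9.999...
    return math.floor(10 ** ((allocated_db - excess_db) / 10.0) + 1e-9)


def toy_wr_scale(allocated_db):
    """HYPOTHETICAL WR-switch scale versus budget; values are placeholders."""
    table = [(5.0, 10), (6.0, 30), (8.0, 60), (10.0, 90)]
    scale = 0
    for needed_db, n in table:
        if allocated_db >= needed_db:
            scale = n
    return scale


def best_allocation(total_db, wr_scale, step_db=0.5):
    """Return (total scale, WR budget in dB, M, N) maximizing M * N."""
    # Baseline: no DC switch at all, the whole budget goes to the WR switch.
    n_only = wr_scale(total_db)
    best = (n_only, total_db, 1, n_only)
    for i in range(int(total_db / step_db) + 1):
        wr_db = i * step_db
        m = max_dc_scale(total_db - wr_db)
        n = wr_scale(wr_db)
        if m * n > best[0]:
            best = (m * n, wr_db, m, n)
    return best


if __name__ == "__main__":
    for budget in (6.0, 23.0):
        print(budget, best_allocation(budget, toy_wr_scale))
```

Treating the no-DC-switch case as a separate baseline reflects the observation that introducing a DC switch costs a fixed excess loss, which is why very small budgets are better spent entirely on the WR switch.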
Experiments
To validate our design method, we conducted proof-of-concept experiments. The experimental setup is shown in Fig. 12. The wavelength of 1532.681 nm and its adjacent wavelengths with 50-GHz spacing were tested, since inter-channel crosstalk mostly stems from the adjacent channels. Then, 10-Gbps test signals were formed with an intensity modulator driven by a pulse-pattern generator (PPG). The target wavelength and the adjacent wavelengths were decorrelated with an interleaver, a fiber delay line of 10 ns, and an optical coupler. After the optical power of each wavelength was adjusted with a combination of erbium-doped fiber amplifiers (EDFAs) and variable optical attenuators (VOAs), the signals were injected into input port #1 of a 9 × 9 cyclic AWG. The target wavelength that appeared on output port #2 was evaluated with a BER tester. The passband-frequency deviation of the cyclic AWG was controlled in the range of 0-25 GHz with a thermal controller. Note that this proof-of-concept experiment did not evaluate the characteristics of DC switches; this is because the extinction ratio of DC switches can be over 40 dB [13], and hence, crosstalk at the DC-switch part is negligible. In other words, the DC switch can be regarded simply as a loss element. Therefore, the impact of the DC-switch loss was included in the variable optical attenuator. It should also be noted that using thermal control to alter passband frequencies cannot resolve the frequency-deviation problem in real systems, because all passbands are shifted simultaneously.
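For reference, the test wavelength and its 50-GHz neighbors can be converted between wavelength and frequency with a short Python sketch. It uses only the vacuum speed of light; the function names are hypothetical, and the exact channel wavelengths used in the experiment are not stated beyond the 1532.681-nm center and the 50-GHz spacing.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def wavelength_nm_to_thz(wavelength_nm):
    """Convert a vacuum wavelength in nm to an optical frequency in THz."""
    return C_M_PER_S / (wavelength_nm * 1e-9) / 1e12


def thz_to_wavelength_nm(freq_thz):
    """Convert an optical frequency in THz to a vacuum wavelength in nm."""
    return C_M_PER_S / (freq_thz * 1e12) * 1e9


def adjacent_channels_nm(center_nm, spacing_ghz=50.0):
    """Wavelengths of the two neighbors at +/- spacing from the center."""
    f0 = wavelength_nm_to_thz(center_nm)
    df = spacing_ghz / 1000.0
    return thz_to_wavelength_nm(f0 + df), thz_to_wavelength_nm(f0 - df)


if __name__ == "__main__":
    print(round(wavelength_nm_to_thz(1532.681), 4))
    shorter, longer = adjacent_channels_nm(1532.681)
    print(round(shorter, 3), round(longer, 3))
```

The center wavelength corresponds to roughly 195.6 THz, so the adjacent channels sit about 0.39 nm away on either side.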
In Fig. 13, the blue and purple curves depict, respectively, the power loss and the total power penalty at BER = $10^{-9}$, which were measured as a function of the passband-frequency deviation; the red curve denotes the power penalty due to crosstalk and spectral distortion, calculated as the difference between the purple curve and the blue curve. We observe that these measured results agree well with the simulation results shown in Fig. 6. Thus, the experiments validate our simulations of the single-stage configuration. Note that simulation results on the two-stage and three-stage configurations are also valid because the multistage configurations can be expressed as the superposition of multiple single-stage cyclic AWGs.
Conclusion
In this paper, we have established a design method for creating large-scale optical switches using DC switches and WR switches consisting of cyclic AWGs. The method enables us to optimally allocate the optical-power budget to the two key sub-switch components and, thus, maximize the total switch scale for the given available power budget. The results demonstrated the effectiveness of the two-stage cyclic-AWG configuration under the optimized power-budget allocation; it enables us to create practical switches with scale of 1000 × 1000. The validity of the simulation results was confirmed via proof-of-concept experiments.
