In wavelength division multiplexing (WDM) schemes, splitters must be used to combine and separate different wavelengths. Conventional splitters are fairly large with footprints in hundreds to thousands of square microns, and experimentally-demonstrated MMI-based and inverse-designed ultra-compact splitters operate with only two channels and large channel spacing (>100 nm). Here we inverse design and experimentally demonstrate a three-channel wavelength demultiplexer with 40 nm spacing (1500 nm, 1540 nm, and 1580 nm) with a footprint of 24.75 µm 2 . The splitter has a simulated peak insertion loss of -1.55 dB with under -15 dB crosstalk and a measured peak insertion loss of -2.29 dB with under -10.7 dB crosstalk.
1
1 Ginzton Laboratory, Stanford University, Stanford, California 94305, USA
In wavelength division multiplexing (WDM) schemes, splitters must be used to combine and separate different wavelengths. Conventional splitters are fairly large with footprints in hundreds to thousands of square microns, and experimentally-demonstrated MMI-based and inverse-designed ultra-compact splitters operate with only two channels and large channel spacing (>100 nm). Here we inverse design and experimentally demonstrate a three-channel wavelength demultiplexer with 40 nm spacing (1500 nm, 1540 nm, and 1580 nm) with a footprint of 24.75 µm 2 . The splitter has a simulated peak insertion loss of -1.55 dB with under -15 dB crosstalk and a measured peak insertion loss of -2.29 dB with under -10.7 dB crosstalk.
I. INTRODUCTION
Integrated silicon photonics can play key roles in many applications, including optical interconnects [1] and quantum technologies [2] . One of the advantages to using photonics is utilizing different wavelengths of light to carry information in order to dramatically increase the information bandwidth in a fiber or waveguide. In such wavelength division multiplexing (WDM) systems, wavelength demultiplexers are used to separate the different channels. Conventional demultiplexers, such as ring resonator arrays and arrayed waveguide gratings, have fairly large footprints [3, 4] . Nanophotonic inverse design has enabled the design of more compact devices [5] [6] [7] [8] [9] , but previous experimental demonstrations of inverse-designed wavelength demultiplexers have only achieved broadband demultiplexing of two channels with large channel spacings (>100 nm) [10, 11] . Frandsen et al. experimentally showed a drop-filter with 11 nm full-width-half-maximum (FWHM) [12] , but the device only filters out one wavelength over a larger footprint than we demonstrate here. Recent experimental demonstrations of multimode interference (MMI) devices have also achieved demultiplexing capabilities in ultracompact footprints but again with large channel spacings over two channels [13, 14] . Since the number of wavelengths available in a WDM system is inversely proportional to the channel spacing, WDM systems that utilize multiple wavelengths require demultiplexers with more channels and much smaller channel spacing. Here, using our nanophotonic inverse design approach [5, 10, 15] , we design and experimentally demonstrate a three-channel wavelength demultiplexer with 40 nm channel spacing and a 5.5 µm × 4.5 µm footprint for the silicon-oninsulator (SOI) platform.
II. DESIGN METHOD
The setup of the problem follows closely with our work described in the references [10] and [15] . For clarity, we provide a brief overview and highlight the salient points of the optimization in Section II A. Further details can be found in the supplementary information of [15] . In Section II B, we discuss modifications to the optimization process used to design the narrow-band demultiplexer. In Section II C, we describe the complete design algorithm with the specific parameters used in the optimization and design.
A. Setup
The design of a 3-channel wavelength demultiplexer is inherently a multiobjective problem: At each operating wavelength, it is desirable to maximize the transmission while at the same time minimizing the crosstalk. The relative importance of these goals are expressed through an objective function F . The exact form of F is detailed in the supplementary information of [15] .
Applying the fabrication-constrained optimization procedure outlined in [15] requires starting with a good initial condition as the optimization landscape is highly non-convex with many undesirable local optima. Consequently the optimization procedure is split into two stages. In the first stage, deemed continuous optimization, the discrete constraint is relaxed to allow the permittivities to vary continuously between that of silicon oxide and silicon. This optimization stage provides a structure that seeds the second stage, fabricationconstrained discrete optimization, which, as its name implies, produces a fabricable, discrete structure.
In the continuous stage, the structure is parametrized by a 2D image where each pixel of the image corresponds to the permittivity of the device at the corresponding location. A local optimum of the objective can be found by applying gradient descent. Naively calculating the gradient is an expensive operation since it involves computing the variation of the objective with every design component. However, using the adjoint method [16] , the gradient calculation is related to the result of a time-reversed electromagnetic simulation. Consequently, the gradient can be computed by running a single simulation, which is substantially more efficient than the naive approach.
In the discrete stage, the device is parametrized by a spatially continuous level set function, where the permitarXiv:1709.08809v1 [physics.app-ph] 17 Aug 2017 tivity of the device at a particular location depends on the sign of the level set function. The level set representation can characterize arbitrary boundaries and naturally handles merging and splitting of holes. A gradient-descentlike update can be performed on the level set and can be extended to impose fabrication constraints [15] .
B. Biasing
In its simplest form, the continuous optimization stage does not always generate a good initial condition for the discrete stage. Empirically, we found that the outcome of the continuous optimization stage provides a good initial condition for the fabrication-constrained discrete optimization stage if the output structure of the continuous optimization is nearly discrete, i.e. each pixel has a permittivity close to that of the device or that of the cladding. Unfortunately, applying gradient descent directly can lead to structures that have weakly-modulated permittivities that are somewhere in between, and this results in poorly performing structures in the discrete stage. It is possible to optimize in the continuous stage for more steps in the hopes of obtaining a more discrete structure, but in practice, this not only requires a significant overhead in computational resources but also offers no guarantee that a discrete structure will eventually form.
There are a wide variety of proposals to mitigate this issue of "gray" areas in the structure, including density filters [17] , sensitivity filters [18] , penalty functions [19] , artificial damping [20] , and morphological filters [21] . In this work, we mitigate this issue through a specific variant of penalty functions, which we call biasing. Specifically, we introduce the concept of self-biasing to produce more discrete pixels and the concept of neighbor biasing to produce discrete structures with larger feature sizes.
As noted previously, in the continuous stage, the structure is parametrized by a 2D image. More specifically, the structure can be described by a vector z ∈ [0, 1] n where n is the total number of pixels in the image. The permittivity at the ith pixel is given by i = ( hi − lo )z i + lo where hi is the permittivity of the device and lo is that of the cladding. These are silicon and silicon oxide, respectively, for the wavelength demultiplexer.
The gradient descent update can be described by the operation
where α is the gradient descent step size and F is the objective function. After the gradient update we perform a self-biasing update to the parametrization:
where the clipping function, applied element-wise to the vector, is defined as
and the self-biasing function b s is defined as
and 0 ≤ k < 1 2 is a parameter to the biasing function. The goal of the biasing function is shift the pixel values towards either 0 or 1, corresponding to lo and hi , respec-
The value of k determines the strength of this biasing: a large k corresponds to a dramatic change in z i whereas a small k corresponds to a gentle push in z i . The name self-biasing comes from the fact that each pixel is biased towards zero or one depending on its current value.
By combining the gradient descent update and the selfbiasing update into one update step, it is not hard to show that the self-biasing update is equivalent to adding a quadratic penalty function g to the objective of the form g(z) = cz T (1 − z) for some constant c. However, there are several advantages to expressing self-biasing as a separate update step. First, the discretization goal often opposes progress towards a well-performing device. A line search is often used in gradient-based methods in order to speed up (and ensure) convergence. Under a line search, the objective function is forced to decrease in value each iteration, and empirically, incorporating selfbiasing update into the objective function results in premature convergence to poor solutions. Second, expressing self-biasing as a separate update readily generalizes to more sophisticated types of biasing, as we will see shortly.
No biasing
Self-biasing Neighbor biasing
FIG. 1:
Optimized continuous structures with no biasing, selfbiasing, and neighbor biasing after 100 iterations. Black represents silicon (1) and white represents silica (0). With no biasing, the structure is not discrete at all. With self-biasing, the structure becomes much more discrete but at the expense of producing many small features. With neighbor biasing, the structure becomes discrete and avoids smaller features.
In Figure 1 , we see the results with and without selfbiasing applied. As one can see, self-biasing often produces nearly-discrete structures that have very small features. The reason is that self-biasing is a self-reinforcing action: A pixel that was biased towards zero/one in one iteration will likely be biased towards zero/one in the next iteration. In order to address this issue, we introduce the notion of neighbor biasing, in which the pixel values are biased based not only on their current values but the values of their neighbors as well. Mathematically, we choose to use the neighbor biasing function b n in place of b s where
and h is a function whose ith component arises from averaging the pixel values in a circle of radius r around the ith pixel. The averaging radius r controls the size of the holes in the structure.
To compute h(z), we first treat z as a 2D grayscale image. We convolve this image with a uniform circular disk of radius r, and denote the resulting image as z avg . We then define h(z) as . Moreover, note that in this biasing scheme, it is easier mathematically and more intuitive to work with the update step directly rather than expressing the result as a penalty function. Figure 1 compares the results of using no-biasing, selfbiasing, and neighbor biasing. Under no biasing, there a large regions of intermediate permittivity. With selfbiasing, these intermediate regions largely disappear but at the expense of small features, but with neighbor biasing, the structure is both discrete and mostly free of small features.
C. Design of 3-channel Wavelength Demultiplexer
The 3-channel wavelength demultiplexer is designed on single fully-etched 220 nm thick Si layer with SiO 2 cladding. Refractive indices of n Si = 3.48 and n SiO2 = 1.44 were used. The waveguide width was set to 500 nm for both the input and output waveguides. The demultiplexer was designed for operation at 1500 nm, 1540 nm, and 1580 nm. The power in the fundamental transverse-electric (TE) mode of the input waveguide at each wavelength was maximized at the corresponding output waveguide and minimized at the other two waveguides.
We solved the optimization problem given by minimize E1,E2,E3,φ
where E i is the electric field at the ith wavelength (i.e. 1500 nm, 1540 nm, or 1580 nm), J i is the total-field scattered-field (TFSF) current source to excite the TE mode of the input waveguide, and φ parametrizes the structure. The form of φ depends on the stage of the optimization (continuous or discrete). The objective function F is given by
where f i represents the sub-objective for the ith wavelength and is given by equation S16 in [15] : (8) where I + and I − are relaxed indicator functions as defined in equation S17 in [15] . Here, α ij and β ij are chosen to be 0.99 and 1, respectively, when i = j and chosen to be 0 and 0.01, respectively, when i = j. L(E i ) is given by
where E ij and H ij are the electric and magnetic fields of the jth output mode at the ith wavelength,n is the normalized vector in the waveguide propagation direction, and r ⊥ represents the coordinates orthogonal to the propagation direction. The supplementary material for [15] contains the full details on the definitions and mathematical derivation of the gradients.
The structure was first optimized in the continuous stage wherein = φ for 210 iterations. During this stage, neighbor biasing with k = 0.01, k avg = 0.2 and p = 3 was employed. The resulting structure was thresholded at a threshold level of 0.5 and used as the initial condition for the discrete stage optimization. The discrete stage optimization ran for 145 iterations and follows the procedure identical to our prior work [15] . The minimum radius of curvature was constrained to be 40 nm and minimum width of a hole to be 90 nm.
The device was designed in approximately 60 hours on a single computer with an Intel Core i7-5820K processor, 64GB of RAM, and three Nvidia Titan Z graphics cards. All electromagnetic simulations were performed using a graphical processing unit (GPU) accelerated implementation of the finite-difference frequency-domain (FDFD) method [22, 23] with a spatial step size of 40 nm.
III. EXPERIMENTAL RESULTS

A. Fabrication
The wavelength demultiplexer was fabricated on Unibond SmartCut silicon-on-insulator (SOI) wafers obtained from SOITEC, with a nominal 220 nm device layer and 3.0 µm buried oxide layer. A JEOL JBX-6300FS electron-beam lithography system was used to pattern a 330 nm thick layer of ZEP-520A resist spun on the samples. No proximity-effect correction step was performed. A transformer-coupled plasma etcher was used to transfer the pattern to the device layer, using a C 2 F 6 breakthrough step and HBr/O 2 /He main etch. The mask was stripped by soaking in solvents, followed by a HF dip. Finally, the devices were capped with 1.6 µm of low pressure chemical vapor deposition (LPCVD) oxide.
A multi-step etch-based process was used to expose waveguide facets for edge coupling. First, a chrome mask was deposited using liftoff to protect the devices. Next, the oxide cladding, device layer, and buried oxide layer were etched in a inductively-coupled plasma etcher using a C 4 F 8 /O 2 chemistry. Next, a protective 20 nm Al 2 O 3 coating was deposited using atomic layer deposition (ALD) in order to protect the waveguide facets during further processing. An anisotropic etch using a Cl 2 /BCl 3 /N 2 chemistry removed the Al 2 O 3 coating on the substrate. To provide mechanical clearance for the optical fibers, the silicon substrate was etched to a depth of around 100 µm using the Bosch process in a deep reactive-ion etcher (DRIE). Finally, the chrome mask was chemically stripped, and the samples were diced into conveniently-sized pieces.
B. Characterization Figure 2 shows the design, scanning electron microscope (SEM) image of the fabricated device, and simulated electromagnetic density at the operating wavelengths.
The transmission through the device was measured by edge-coupling input and output waveguides with lensed fibers. A polarization-maintaining fiber was used at the input to ensure that only the TE mode of the silicon waveguide was excited. Consistent edge-coupling was achieved by maximizing transmission with a 1570 nm laser, and the transmission spectra were measured with a supercontinuum source and spectrum analyzer. The spectra were normalized against transmission through a waveguide adjacent to the device.
The simulated and measured transmission are shown in Figure 3 . The consistency of the measurements across four identically fabricated devices indicate that the device is robust to fabrication imprecision. The peak simulated transmission was -1.56 dB at 1500 nm, -1.68 dB at 1540 nm, and -1.35 dB at 1580 nm. The peak average measured transmission was -2.82 dB at 1471 nm, -2.55 at 1512 nm, and -2.29 dB at 1551 nm. At peak transmission, simulated crosstalk was under -15 dB, and measured crosstalk was under -10.7 dB. The discrepancies between simulated and measured devices are a likely result of slight underetching and/or overetching during the fabrication process.
IV. CONCLUSION
By using a biasing technique in the optimization process, we have designed and experimentally demonstrated an efficient, compact, narrowband three-channel wavelength demultiplexer on SOI. The consistent performance across four fabricated devices indicate that the designs are also robust to fabrication errors. We expect that similar inverse design techniques can be used to design demultiplexers with more channels and smaller channel spacing while maintaining a relatively small footprint as compared to conventional demultiplexers. 
