The complexity of experimental quantum information processing devices is increasing rapidly, requiring new approaches to control them. In this paper, we address the problems of practically modeling and controlling an integrated optical waveguide array chip, a technology expected to have many applications in telecommunications and optical quantum information processing. This photonic circuit can be electrically reconfigured, but only the output optical signal can be monitored. As a result, conventional control methods cannot be naively applied. Characterizing such a chip is challenging for three reasons. First, there are uncertainties associated with the Hamiltonian model describing the chip. Second, we expect distortions of the control voltages caused by the chip's electrical response, which cannot be directly observed. Third, there are imperfections in the measurements caused by losses from coupling the chip externally to optical fibers. We have developed a deep neural network approach to solve these problems. The architecture is designed specifically to overcome the aforementioned challenges using a Gated Recurrent Unit (GRU)-based network as the central component. The Hamiltonian is estimated as a blackbox, while the rules of quantum mechanics, such as state evolution, are embedded in the structure as a whitebox. The resulting overall graybox model of the chip shows good performance both quantitatively in terms of the mean square error and qualitatively in terms of the shape of the predicted waveforms. We use this neural network to solve a classical and a quantum control problem. In the classical application we find a control sequence to approximately realize a time-dependent output power distribution. For the quantum application we obtain the control voltages to realize a target set of quantum gates. The method we propose is generic and can be applied to other systems that can only be probed indirectly.
Machine learning has been a very active area of research recently, with a focus on both the algorithms and the wide range of applications touching every field of science and beyond. Deep learning in particular has gained attention as it becomes increasingly feasible, owing to today's computational power and the availability of large data sets for training. The survey [Den14] covers the common architectures used in deep learning and the range of possible applications.
The physics community is also currently exploring the use of machine learning to solve some practical problems faced in designing, controlling, and automating experiments. Some examples of recent work include the design of quantum optical setups using reinforcement learning [MNK+18], and using deep learning and genetic algorithms [ONK18]. Deep learning was also used in Ref. [MLBZ18] to discover and characterize topological phases of matter and phase transitions. Techniques of both deep learning and reinforcement learning have been applied in quantum control [NBSN19, BDS+18, OMBS19]. These works differ from ours by treating the entire learned model, including quantum dynamics, as a blackbox, with no detailed modeling of an experimental realization.
In this paper, we explore the use of a hybrid deep learning architecture to solve problems related to experimental modeling and control of quantum systems. Our approach is general and applies to many situations where a system cannot be probed arbitrarily. Nonetheless, we focus on a particular system, currently being developed by some of the authors, which is an array of nearest-neighbor coupled waveguides with a reconfigurable Hamiltonian. Characterizing such a chip is a significant challenge, as will be discussed later.
The structure of the remainder of the paper is as follows. The paper starts with an overview of the chip and its theoretical model in Section II A, and of the experimental constraints and challenges that we will try to solve in Section II B. Next, in Section III, we present the proposed deep learning architecture in detail. After that, we present the numerical results of the simulations and discuss their significance in Section IV. Finally, we end with the conclusion and discuss possible future extensions of this work in Section V.
II. PROBLEM STATEMENT
This section starts by describing the photonic circuit we are trying to model and control, and then we describe the challenges we face in characterizing it experimentally.
A. Chip model
The device we consider in this paper is an array of nearest-neighbor coupled waveguides that implements a continuous-time quantum walk on photons propagating along the array [ADZ93, PLM+10]. In all previous work, static quantum walks were studied with fixed coupling parameters. Here, we demonstrate a reconfigurable waveguide array by exploiting the electro-optic control of lithium niobate. The waveguides are fabricated by reverse proton exchange and we apply local electric fields to change the properties of the coupled array. Figure 1 shows the schematic of the chip. We inject laser light into one input waveguide of the array and measure the output optical power distribution across all the waveguides. The electrodes can be controlled to alter the output distribution.
Figure 1: The waveguides in the array are separated by 10 µm, which enables nearest-neighbor evanescent coupling. The electric field between the electrodes causes a local change in refractive index to the waveguide or the cladding.
Numerical simulations of such a device show a host of potential applications. The chip can operate as a classical device with possible applications in telecommunications, such as implementing a Mach-Zehnder interferometer or an electro-optic modulator. Being able to characterize and control such a device is important and has a strong economic impact, but at the same time is very challenging, as will be discussed later. Additionally, the chip can work as a quantum device. This includes operating as a quantum router, where single photons can be directed to propagate and be detected at one of the output ports by dynamically changing the control voltages. It can also be used to generate W states and to realize different quantum gates.
The chip with $n$ waveguides can be described quantum mechanically in the Hilbert space $\mathbb{C}^n$, with the computational basis encoding the presence of a photon in each waveguide. For example, for $n = 3$ the state $|0\rangle = [1, 0, 0]^T$ encodes a photon present in the first waveguide, the state $|1\rangle = [0, 1, 0]^T$ encodes a photon in the second waveguide, and so on. The evolution of the system represents the behavior of the chip when light propagates along the waveguides. So, the initial state of the system represents the mode distribution at the inputs of the waveguides, while the final state represents the distribution at the outputs of the waveguides. For example, if the system evolves from the state $|0\rangle$ to the state $|1\rangle$, then we started by injecting a photon into the first waveguide (at one end of the chip), and the photon was perfectly transferred to the second waveguide after propagating along the chip to the output. This evolution can be described by the unitary
$$U = e^{-iHl}, \qquad (1)$$
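where $l$ is the length of the chip, and $H$ is the Hamiltonian of the chip. This Hamiltonian is described by the tridiagonal real-valued matrix
$$H = \begin{pmatrix} \beta_1 & C_{1,2} & & & \\ C_{1,2} & \beta_2 & C_{2,3} & & \\ & C_{2,3} & \beta_3 & \ddots & \\ & & \ddots & \ddots & C_{n-1,n} \\ & & & C_{n-1,n} & \beta_n \end{pmatrix}, \qquad (2)$$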
where $\beta_i$ is the propagation constant along the $i$th waveguide, and $C_{i,j}$ is the coupling coefficient between waveguides $i$ and $j$. The propagation constant is given by
where $\lambda$ is the wavelength, $n_0$ is the intrinsic refractive index of the waveguide, and $\Delta n$ is a dynamical proportionality constant that determines how much the propagation constant changes when the voltage $\Delta V_i$ across the waveguide is changed. The coupling coefficient is given by
where $C_0$ is the intrinsic coupling between two adjacent waveguides, $\Delta V_{i,j}$ is the potential difference across the substrate between the two waveguides $i$ and $j$, $\Delta V_i$ and $\Delta V_j$ are the voltages across waveguides $i$ and $j$, and $\Delta C_1$ and $\Delta C_2$ are dynamical proportionality constants that determine how much the coupling between two waveguides changes when the voltages across them are changed. These relations assume that the Hamiltonian depends on the voltages linearly, and that the coupling is always between neighboring waveguides. In general, we can write the Hamiltonian in the form
$$H = H_0 + H_I(\mathbf{v}), \qquad (5)$$
where $H_0$ is the zero-voltage Hamiltonian, and $H_I$ is the interaction Hamiltonian, which is a function of the voltages $\mathbf{v}$ applied to the electrodes. Note that the control voltages are time-dependent; however, the time scale of their change is much slower than the time a photon takes to travel across the chip. That is, each photon sees only one time-independent Hamiltonian from the moment it enters the chip until the moment it reaches the output, but the next photon to arrive can experience a different Hamiltonian. This assumption is plausible because the photon transit time is set by the speed of light in the waveguide, which is far shorter than any time scale on which the voltages can be changed. This is what allows us to write the evolution as the matrix exponential of the Hamiltonian as in Equation 1, without the time-ordering operator.
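To make the model concrete, the following minimal Python sketch (standard library only) builds a tridiagonal Hamiltonian, approximates the evolution $U = e^{-iHl}$, and returns the output power distribution for one input waveguide. The function names, the numerical values, and the Taylor-series exponential are illustrative assumptions; they are not taken from the chip or from the implementation accompanying this paper, where a library routine for the matrix exponential would normally be used.

```python
import math

def tridiagonal_hamiltonian(betas, couplings):
    """Real symmetric H with betas on the diagonal and couplings next to it."""
    n = len(betas)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = betas[i]
    for i in range(n - 1):
        H[i][i + 1] = H[i + 1][i] = couplings[i]
    return H

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def evolution_unitary(H, length, squarings=10, terms=20):
    """Approximate U = exp(-i * H * length) by scaling and squaring."""
    n = len(H)
    scale = 2 ** squarings
    A = [[-1j * H[i][j] * length / scale for j in range(n)] for i in range(n)]
    U = [[complex(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in U]
    for k in range(1, terms + 1):
        # After this step, term holds A^k / k!
        term = [[x / k for x in row] for row in matmul(term, A)]
        U = [[U[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        U = matmul(U, U)  # undo the scaling: exp(A)^(2^squarings)
    return U

def output_powers(U, input_waveguide):
    """Power in each output waveguide for light injected into one input."""
    return [abs(row[input_waveguide]) ** 2 for row in U]

# Hypothetical two-waveguide example: equal propagation constants, C = 1,
# and l = pi / 2, for which the light transfers fully to the other waveguide.
H = tridiagonal_hamiltonian([0.0, 0.0], [1.0])
powers = output_powers(evolution_unitary(H, math.pi / 2), 0)
```

Because the Hamiltonian is held fixed during the transit of a photon, a single matrix exponential per voltage setting is enough, and no time ordering appears in the sketch.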
In the basic experimental setup we can only measure the output power distribution. For example, for an $n = 3$ chip, if the input state is $|0\rangle$, and the output state after evolution is $U|0\rangle = \alpha|0\rangle + \beta|1\rangle + \gamma|2\rangle$, then the output distribution we measure is $(|\alpha|^2, |\beta|^2, |\gamma|^2)$. However, to characterize a fully quantum model, we need to measure phases at the output. One experimentally convenient way to measure relative phase shifts between two optical paths is Mach-Zehnder interferometry, as shown in Figure 2. Recall that the basic idea is to construct a quantum circuit whose output probability amplitude depends on the phase shift to be measured. With an initial state $|0\rangle$, a standard calculation shows that the final state after the beamsplitter at the bottom-right of the diagram is
Now, if we measure the power at the detector, we get
$$P(\theta) = \frac{1}{4}\left|\alpha + e^{i\theta}\right|^2 .$$
Choosing $\theta = 0$ and $\theta = \pi/2$ gives
$$P(0) = \frac{1}{4}\left(|\alpha|^2 + 1 + 2|\alpha|\cos\angle\alpha\right), \qquad P\left(\frac{\pi}{2}\right) = \frac{1}{4}\left(|\alpha|^2 + 1 + 2|\alpha|\sin\angle\alpha\right),$$
where $\angle\alpha$ denotes the phase of $\alpha$. These two equations can be solved simultaneously to find the amplitude and phase of $\alpha$. The procedure can then be repeated by placing the mirror at the top-right of the diagram at each of the other outputs of the chip to obtain the amplitude and phase of the corresponding part of the state. Since we have an $n$-dimensional pure state, it is completely defined by $2n$ degrees of freedom corresponding to the real and imaginary parts of each coefficient. (In fact, only $2n - 2$ are needed, because of the normalization constraint and the physically irrelevant global phase.) The same procedure can be executed to characterize the output state when other inputs are activated. Finally, it is worth mentioning that this setup is not the only possible way to measure phase; there may be a more efficient way to measure the phases at the output without having to move the optical components. This is, however, outside the scope of this paper.
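As a small worked example of the phase measurement, the sketch below recovers a complex amplitude $\alpha$ from the two interferometer readings $P(0) = \frac{1}{4}|\alpha + 1|^2$ and $P(\pi/2) = \frac{1}{4}|\alpha + i|^2$. It assumes, in addition, that the direct power $|\alpha|^2$ is available from the basic setup, which makes the solution unique; the function names and the test value of $\alpha$ are hypothetical.

```python
import cmath
import math

def interferometer_power(alpha, theta):
    """Detector power P(theta) = |alpha + exp(i * theta)|^2 / 4."""
    return abs(alpha + cmath.exp(1j * theta)) ** 2 / 4

def recover_amplitude(p_direct, p_0, p_quarter):
    """Recover alpha from |alpha|^2, P(0) and P(pi / 2).

    Uses 4 P(0) = |alpha|^2 + 1 + 2 Re(alpha) and
         4 P(pi/2) = |alpha|^2 + 1 + 2 Im(alpha).
    """
    real = (4 * p_0 - 1 - p_direct) / 2
    imag = (4 * p_quarter - 1 - p_direct) / 2
    return complex(real, imag)

# Hypothetical amplitude used only to check the round trip.
alpha_true = 0.6 * cmath.exp(1j * 0.8)
alpha_est = recover_amplitude(
    abs(alpha_true) ** 2,
    interferometer_power(alpha_true, 0.0),
    interferometer_power(alpha_true, math.pi / 2),
)
```

Without the direct power reading, the two interferometer equations describe two circles in the complex plane and can admit two solutions, which is why the extra measurement is assumed here.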
B. Experimental challenges
There are many experimental challenges in characterizing a fabricated chip, as well as in designing the control voltages to implement some desired behavior. The main problem is the drift of the measured output optical power. This is caused by charges getting trapped at the interface between the silicon dioxide and the lithium niobate. These charges have very low mobility and therefore take a long time to accumulate and a long time to diffuse when the voltage is removed. These trapped charges are the central reason we have difficulty controlling and characterizing this device. The long diffusion time results in the voltage never 'resetting' to zero. It then becomes extremely difficult to infer what electric field is being applied to the waveguide. In any case, the chip has some equivalent electrical circuit model. But this is difficult to model and characterize experimentally, as we cannot physically measure the voltages the chip actually senses when we apply some control voltages externally. The only available measurements are the output waveguide power distribution, which depends non-linearly on the control voltages. This makes the problem a non-linear control and estimation problem, which is classically difficult to solve. These effects cannot be neglected either, because the distortions in the control voltages will be reflected in the measured power distribution. There is also a memory effect, in the sense that when we apply some control pulse, the output power will be affected by that pulse in addition to the previous pulses that were applied. This means that if at some point in time we set all the control voltages to ground, we will still observe variation of the power distribution in time. The classic way of overcoming this problem is during fabrication, by etching the buffer layer between the electrodes [YM81]. However, for the particular chip we are working with, the dimensions are very small and this process is technologically difficult. Thus, this problem has to be addressed differently.
Besides this major problem, there are three other difficulties. First, there are uncertainties regarding the Hamiltonian. For instance, we assumed it to have a tridiagonal form reflecting the fact that only adjacent waveguides are coupled. But there may be more off-diagonal terms, leading to coupling beyond nearest neighbors. Second, the assumed linear dependence of the Hamiltonian on the control voltages, as in Equations 3 and 4, is also not necessarily true, as there might be higher-order terms. Finally, there are losses at the output due to the coupling of the chip to the external optical fibers connected to the photodetector. These will cause inaccuracies in the measurements, affecting any parameter estimation. These losses also have to be characterized so that we can correct the detected power signals. We will model the losses by
$$\hat{P}_k = \frac{c_k P_k}{\sum_i c_i P_i},$$
where $\hat{P}_k$ is the normalized measured power at waveguide $k$, $P_i$ is the actual power at the output of the chip for waveguide $i$, and $c_i$ is the coupling coefficient of waveguide $i$. The normalization is just to make the measurements form a distribution.
As a result of all the previous challenges, estimating the Hamiltonian parameters from measured data is very difficult.
III. METHODS
In the previous section we described the challenges we face in experimentally characterizing the chip if we use conventional methods of model and parameter estimation. In order to address all these challenges, we propose to use a completely data-driven approach rather than a parametric approach. We use a graybox model in which the Hamiltonian is treated as a blackbox, while the quantum evolution and quantum measurement are treated as a whitebox. This is because all the uncertainties are in the Hamiltonian, while all the laws of quantum mechanics are known. We design a deep learning structure to implement this idea. The problem is divided into two stages. In the first stage, a set of known control voltages and the corresponding power distributions are used by a supervised deep learning algorithm to find a complete graybox model for the chip. In the second stage, another deep learning structure is created to find the control voltages that result in some desired behavior of the chip, using the estimated model from the first stage. This section starts with a detailed description of the architecture used to model the chip. Next, the training and testing procedures are presented. After that, the control voltage predictor for the chip is described in detail. Finally, the section ends by extending the proposed structure to a fully quantum setting where phases can be measured at the output.
A. Chip model architecture
The deep learning architecture for the chip is shown in Figure 3. The first layer in the model is a Gated Recurrent Unit (GRU) [CVMG+14]. This is a variant of the Long Short-Term Memory (LSTM) structure often used in sequence prediction and classification [HS97]. A GRU is more efficient than an LSTM as it has fewer parameters to be learned during the training stage. However, in terms of accuracy, it is not clear which is better in general, and this remains an open topic under investigation within the machine learning community [CGCB14]. The number of inputs is equal to the number of electrodes, which is $2n$. For our implementation, the number of hidden units of the GRU is chosen to be 60. In general, more hidden units allow modeling more complex waveforms, but at the expense of more parameters to learn and thus more computational resources. The objective of this layer is to learn the interaction Hamiltonian, i.e. how the Hamiltonian depends on the external voltages. This should also include the parasitic effects in the chip that distort the applied voltage waveforms. The number of free parameters of any real-valued symmetric Hamiltonian of size $n \times n$ is $\frac{n}{2}(n + 1)$. However, the output of the GRU is the output of each hidden node. So, to extract the required number of outputs, we add a neural network (NN) formed of a single layer that is fully connected to all of the outputs of the GRU. The number of neurons is exactly $\frac{n}{2}(n + 1)$, as each neuron generates one output. Linear activation is used for all neurons, so that the output can take any value rather than being restricted to some range, as it would be with other activations such as the sigmoid. Notice that the GRU is a sequential layer, so the output has an extra dimension of time. However, the NN layer is static and acts identically on each time slice of the output of the GRU. This means that the weights applied to the GRU output at every time instant are the same. These two layers together act as a device to learn the free parameters of the Hamiltonian as a function of the input voltages.
The third layer in the structure is a custom-defined layer that has two functions. The first is to reconstruct a symmetric matrix from the output of the previous layer. This is done by reshaping the outputs as an upper triangular matrix and then summing it with its transpose. The second is to add the result to the drifting Hamiltonian, that is, the zero-voltage Hamiltonian that models the inherent coupling between the waveguides. The parameters of this drifting Hamiltonian are learned during the training process, as will be illustrated later. The final output of this layer is therefore the full Hamiltonian of the system.
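The reshaping step of this layer can be sketched in a few lines of plain Python. The sketch fills an upper triangular matrix row by row from a vector of $\frac{n}{2}(n+1)$ values, adds its transpose, and then adds a zero-voltage Hamiltonian. The row-by-row ordering, the function names, and the numbers are assumptions made for illustration only.

```python
def symmetric_from_vector(values, n):
    """Fill an upper triangular matrix row by row, then add its transpose."""
    if len(values) != n * (n + 1) // 2:
        raise ValueError("expected n(n+1)/2 values")
    upper = [[0.0] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i, n):
            upper[i][j] = next(it)
    # upper + upper^T is symmetric; note that the diagonal is counted twice.
    return [[upper[i][j] + upper[j][i] for j in range(n)] for i in range(n)]

def full_hamiltonian(values, H0):
    """Full Hamiltonian: zero-voltage part plus the reconstructed part."""
    n = len(H0)
    HI = symmetric_from_vector(values, n)
    return [[H0[i][j] + HI[i][j] for j in range(n)] for i in range(n)]

# Hypothetical values for n = 3.
HI = symmetric_from_vector([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
# HI is [[2.0, 2.0, 3.0], [2.0, 8.0, 5.0], [3.0, 5.0, 12.0]]
```

Summing with the transpose doubles the diagonal entries; since the preceding neurons are linear and unconstrained, this factor is simply absorbed into the learned weights.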
The next layer of the model is the quantum evolution layer. This is a custom-defined layer that takes a Hamiltonian as input and an initial quantum state as a defining parameter, and generates the probabilities of the evolved state as output. These probabilities correspond to the waveguide power distribution. So, the layer first calculates the evolution matrix $U = e^{-iHl}$. Next, it calculates the evolved state $|\psi_F\rangle = U|\psi_0\rangle$. Finally, it calculates the probabilities $|\langle m|\psi_F\rangle|^2$, $m = 0, 1, \ldots, n - 1$. Now, a problem arises if we train the model with the structure so far. Since only one initial state is used in the quantum layer, the learned Hamiltonian will be valid only for evolutions of this state. If we use the same Hamiltonian to evolve other initial states, we might not obtain a correct evolution, and we would need to learn a different Hamiltonian for each initial state. This is a major problem: quantum mechanics is a linear theory, so the Hamiltonian should not depend on the quantum state being evolved. Thus, we have to constrain the Hamiltonian so that it works for all states. We propose to solve this problem by using different copies of the quantum layer, each parameterized by a different initial state. We then connect the inputs of all these layers to the same output of the previous Hamiltonian layer. In this case, during training, the model is forced to generate a Hamiltonian that correctly evolves each of the initial states. Since a unitary can be completely characterized by knowing the outputs corresponding to each of the computational basis states as input, we only need $n$ 'parallel' quantum structures, each generating $n$ outputs. So, the total number of outputs for this whole layer is $n^2$.
The final layer in the model is also a custom-defined layer that models losses during power measurements. These losses occur physically due to coupling between the chip and the optical fibers connected to the photodetectors. The layer simply implements the calculation
$$\hat{P}_k = \frac{c_k P_k}{\sum_i c_i P_i},$$
where $\hat{P}_k$ is the measured power at waveguide $k$, $P_i$ is the actual power at the output of the chip for waveguide $i$, and $c_i$ is the coupling coefficient of waveguide $i$. The denominator in the expression ensures that the measured powers are normalized (i.e. form a distribution). The coupling coefficients are learned during the training stage, as will be discussed later. For each quantum block in the quantum evolution layer, we cascade one of these coupling layers. However, all of these copies of the coupling layer are identical (i.e. have the same parameters). This reflects the fact that the losses are independent of which waveguide was used as input, and are related only to the hardware of the experiment.
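The effect of this layer can be illustrated with a short sketch. It assumes that the loss enters as one multiplicative coupling coefficient per output waveguide, followed by normalization so that the result forms a distribution; the function name and all numerical values are hypothetical.

```python
def lossy_distribution(powers, coupling):
    """Scale each output power by its coupling coefficient, then normalize."""
    weighted = [c * p for c, p in zip(coupling, powers)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical ideal output powers and coupling coefficients for n = 3.
ideal = [0.5, 0.3, 0.2]
measured = lossy_distribution(ideal, [0.9, 0.8, 1.0])
```

If all coupling coefficients are equal, the normalization cancels them and the measured distribution coincides with the ideal one, so only the relative losses between waveguides are observable in this model.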
B. Training and Testing
The training phase of the model, in which all the unknown parameters are learned from examples, has two stages. The first stage is to learn all zero-voltage parameters, i.e. the drifting Hamiltonian and the coupling loss coefficients. All these parameters are static and do not depend on the input voltages. For this training step we therefore detach the GRU and NN layers from the model. The input of the model is then the input of the full-Hamiltonian layer, which is fixed to be all zeros. The output is the lossy power distribution. This is obtained experimentally by fixing the physical voltage on the chip to zero, using one of the waveguides as input, and measuring the power across each waveguide. The procedure is repeated for all input waveguides. Since the distribution in this case is static, we get a total of $n^2$ readings. With this pair of training data (zero voltage as input, and $n^2$ readings as output), the model is trained by backpropagation using RMSprop [TH12], and all the unknown parameters are learned. We use the mean square error (MSE) as the loss function and also as the performance metric.
The second stage of training is to obtain the dynamic behavior of the chip, i.e. how the waveguide power distribution changes in time as a function of the time-varying input voltage. In this stage the full model is used, unlike in the first stage. All parameters learned in the first stage are fixed and not changed during this stage. Again, backpropagation is used to train the remaining unknown parameters, using pairs of voltage waveforms as input and the corresponding measured power distribution waveforms as output. After this stage, all the learned parameters are fixed and can be used in the testing phase.
The testing phase of the model is where the trained model is given a new input that was not in the training set, and the predicted output is compared with the actual output. A good model is one that generalizes well to new inputs. At this stage of the problem, we require good generalization behavior, since what we obtain at the end is a graybox model for the chip, and so it should be able to predict the measured output distribution for any input voltage waveform. However, in practice, predicting outputs for arbitrary waveforms is a hard requirement. Therefore, we restrict all the voltage waveforms in this paper to be in the form of arbitrary synchronized pulses across the electrodes (i.e. pulses starting and ending at the same time, but with different amplitudes).
The architecture of this model has a major advantage, which is the possibility of monitoring the output of each layer during testing, each of which corresponds to a physically interesting and important quantity. So, the output of the first layer is a prediction of the interaction Hamiltonian as a function of the input voltages and time. The output of the second layer is a prediction of the full Hamiltonian. The third layer predicts the ideal power distribution, while the output of the last layer is a prediction of the measured power distribution. This shows the relevance of this deep learning structure. For instance, had we combined all layers into one LSTM-based layer, we would be able to predict the measured power distribution only, and not the ideal distribution or the Hamiltonians.
C. Controller Architecture
The second major task we target is to find the control voltages to apply to the chip in order to obtain some desired power distribution, corresponding to some target Hamiltonian. The architecture for the controller is shown in Figure 4. The first layer is again a GRU layer followed by a fully connected neuron layer similar to that used in the model architecture. However, the input is some desired target Hamiltonian, and the output represents the control voltages, which form a $2n$ vector. Since we need at least one of the electrodes to be connected to ground, we actually fix the very first electrode to zero. We also fix the last electrode, arbitrarily, to zero. This leaves $2n - 2$ control voltages to predict. For efficiency, we actually input only the upper triangular part of the Hamiltonian, flattened into a vector of length $\frac{n}{2}(n + 1)$.
One major issue to consider is that the voltage across any two adjacent electrodes should not exceed $V_{\max}$ in absolute value. So, all the neurons at the output have a scaled hyperbolic tangent sigmoid activation, which bounds the predicted voltages. Next, we cascade a copy of the previously trained model without the coupling losses layer. All the trained parameters of the model are fixed and do not change during the training of the controller. The reason for leaving out that layer is that the power loss is due to the measurement process, and not the operation of the chip. For instance, if two chips were connected in cascade with perfect coupling, then we would be interested in predicting the control voltages for the first chip to produce some desired state at its output, and there would be no losses affecting the first chip. Connecting the pretrained model forces the whole controller structure to train the new GRU and NN layers such that, when the target Hamiltonian is applied at the input, the control voltages are calculated so that the target ideal power distribution is generated. In doing this, all the distortions that appear in the power distribution due to voltage distortions are handled automatically by the controller. Because the target output power distribution is distortion-free, minimizing the total MSE forces the controller to produce voltage waveforms that undo the effects caused by the parasitics of the chip. So, in some sense we are learning an inverse of the equivalent circuit model of the chip and at the same time making sure the final quantum state is correct. In other words, this structure does both classical control (undoing distortions) and quantum control (obtaining the target quantum state) at the same time. The output of the GRU+NN layer is then the desired control voltage.
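A bounded output activation of this kind can be sketched as follows. The specific scaling used here, half of the maximum allowed voltage difference, is an assumption made for illustration and is not necessarily the form used in our implementation; the function name and the value of `v_max` are likewise hypothetical.

```python
import math

def bounded_voltage(x, v_max):
    """Map an unbounded neuron output to a voltage in [-v_max/2, v_max/2].

    Assumed scaling: if every electrode voltage lies in this interval, the
    difference between any two electrodes cannot exceed v_max in magnitude.
    """
    return 0.5 * v_max * math.tanh(x)

# Hypothetical pre-activation values and limit.
voltages = [bounded_voltage(x, 5.0) for x in (-50.0, -1.0, 0.0, 0.3, 50.0)]
```

With this choice the constraint also holds with respect to the grounded first and last electrodes, since each predicted voltage is itself bounded by half of the limit.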
For this case of constructing the controller model, it is not a requirement that it should generalize to every possible target Hamiltonian/target-distribution pair. Whenever we are interested to realize some sequence for the operation of the chip, we just run the learning procedure, and probe the output of the GRU+NN layer. So, in some sense we are using backpropagation as a direct optimization procedure rather than a learning procedure.
The last point to note is that not every possible Hamiltonian can be realized with the chip model. Some Hamiltonians may require voltages that exceed the maximum allowed range. An open question is what kind of quantum gates can be actually implemented using this chip given the constraints. This is however outside the scope of this paper.
D. Fully-quantum model
The architectures described so far are not fully quantum, in the sense that the Hamiltonian is assumed to be real and that we can only measure powers at the output (corresponding to probabilities). However, it is easy to extend the proposed method to the fully quantum case if we perform the Mach-Zehnder type of measurements discussed previously. The overall architecture is quite similar, with only the following modifications:
• The neural layer after the GRU is set to produce $n^2$ outputs instead of $n(n + 1)/2$, to account for the imaginary part of the Hamiltonian matrix elements.
• The Hamiltonian layer reshapes the output of the neural layer to an n × n matrix, where the lower triangular part represents the imaginary part of the Hamiltonian while the upper triangular part represents the real part. So, by multiplying the lower triangular part by i and adding the whole matrix to its Hermitian conjugate, we end up with an n × n Hermitian matrix. Also, the zero-voltage Hamiltonian H 0 is manipulated similarly to account for the possibility of complex-valued entries.
• The quantum layer, instead of outputting the probabilities, now outputs the Mach-Zehnder power measurements. So if the final state is $\sum_k \alpha_k |k\rangle$, then the layer's outputs are $P_k(0) = \frac{1}{4}|\alpha_k + 1|^2$ and $P_k\left(\frac{\pi}{2}\right) = \frac{1}{4}|\alpha_k + i|^2$, for all $k = 1, \ldots, n$. So, the total number of outputs for this layer is $2n$, and for the whole model is $2n^2$. We do not need to explicitly calculate the amplitudes and phases from the interferometer measurements; we just use the measurements directly for training.
• For simplicity, we removed the last coupling layer as our focus in this application was on exploring the possibility of learning a full quantum system. However, in general we can include it.
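The Hermitian reconstruction described in the list above can be sketched as follows. The sketch assumes that the $n^2$ outputs are reshaped row by row and that the diagonal is treated as part of the upper (real) triangle; both choices, as well as the function name and the numbers, are illustrative assumptions.

```python
def hermitian_from_vector(values, n):
    """Build an n x n Hermitian matrix from n*n real numbers."""
    if len(values) != n * n:
        raise ValueError("expected n*n values")
    M = [list(values[r * n:(r + 1) * n]) for r in range(n)]
    # Upper triangle (with diagonal) carries real parts; the strictly lower
    # triangle carries imaginary parts and is multiplied by i.
    B = [[complex(M[i][j]) if j >= i else 1j * M[i][j] for j in range(n)]
         for i in range(n)]
    # Adding the Hermitian conjugate makes the result Hermitian.
    return [[B[i][j] + B[j][i].conjugate() for j in range(n)]
            for i in range(n)]

# Hypothetical values for n = 2.
Hc = hermitian_from_vector([1.0, 2.0, 3.0, 4.0], 2)
# Hc is [[2, 2 - 3j], [2 + 3j, 8]]
```

The count of free real parameters matches: $n$ diagonal entries plus $n(n-1)/2$ real and $n(n-1)/2$ imaginary off-diagonal parts give $n^2$.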
The training follows the same procedure as before, except that the training set now includes the interferometer measurements rather than the powers. As for the controller, there is no difference in the architecture, since all modifications are already implemented in the chip model, which is fixed after training. In other words, the controller is independent of the system model. Again, it should be noted that the targets used for training the controller should be interferometer output distributions rather than power distributions.
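To make the form of the interferometer training targets concrete, the following sketch computes the $2n$ Mach-Zehnder outputs for one final state and stacks them over all $n$ input waveguides to obtain $2n^2$ values. The function names, the ordering of the outputs, and the randomly generated unitary are assumptions made for illustration only.

```python
import numpy as np

def interferometer_outputs(alpha):
    """Return [P_k(0), P_k(pi/2)] for the output amplitudes alpha_k."""
    alpha = np.asarray(alpha, dtype=complex)
    p0 = 0.25 * np.abs(alpha + 1.0) ** 2    # P_k(0)    = |alpha_k + 1|^2 / 4
    p90 = 0.25 * np.abs(alpha + 1.0j) ** 2  # P_k(pi/2) = |alpha_k + i|^2 / 4
    return np.concatenate([p0, p90])

def model_targets(U):
    """Stack the 2n outputs for each of the n input waveguides (2n^2 values)."""
    n = U.shape[0]
    # Column j of U is the final state when all power enters waveguide j.
    return np.concatenate([interferometer_outputs(U[:, j]) for j in range(n)])

# Hypothetical 3-waveguide unitary built from a random Hermitian matrix.
rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
evals, evecs = np.linalg.eigh(A + A.conj().T)
U = evecs @ np.diag(np.exp(-1j * evals)) @ evecs.conj().T
targets = model_targets(U)
```

Using these values directly as regression targets avoids an explicit phase reconstruction step, in line with the remark that the measurements are used as they are.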
IV. SIMULATION RESULTS
This section discusses the implementation details of our method and the results of the numerical simulations. A discussion on the significance of the results is given afterwards.
A. Implementation
To implement the proposed architecture we used the "TensorFlow" Python package [AAB+15] and its high-level API package "Keras" [C+15]. The Python implementation of our algorithm is publicly available (see footnote 1).
For training and testing, we created a dataset consisting of control voltages in the form of random pulses, together with the corresponding waveguide output power distributions for different input waveguides. We generated a total of 4000 examples, 3500 of which were used for training and 500 for testing. The amplitudes of the pulses range from -5 V to +5 V, and the time domain is limited to the interval 0 ≤ t ≤ 200 ms with a sampling time of 0.2 ms. In each example, the voltages on the first and last electrodes are fixed at zero, while the pulses are applied randomly at the remaining electrodes. The restriction on these pulses is that they have to be synchronized across the different electrodes, starting and ending at the same time. However, the durations and amplitudes vary randomly from one example to another. In an experimental setting, one would generate these pulses, apply them physically to the chip, measure the output power distribution, and then carry out the learning process. In this paper, however, we restrict the study to computer simulations. We therefore created a simulator for the chip that generates the waveguide power distribution given a set of control voltages, using the Hamiltonian model described in Section II A. The simulator takes into account the non-ideal effects due to the equivalent circuit behavior of the chip, as well as coupling losses.
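The pulse generation described above might be sketched as follows. This is an illustrative reconstruction rather than the released code: the number of electrodes, the number of pulses per example (`n_pulses`), the uniform sampling of amplitudes and edges, and the inclusion of both endpoints of the time grid are all assumptions.

```python
import numpy as np

def random_pulse_example(n_electrodes, rng, t_max=200.0, dt=0.2,
                         v_max=5.0, n_pulses=3):
    """Generate one example of synchronized random voltage pulses (sketch)."""
    n_samples = int(round(t_max / dt)) + 1       # 0 <= t <= t_max, inclusive
    t = np.linspace(0.0, t_max, n_samples)       # time in ms
    v = np.zeros((n_samples, n_electrodes))      # first/last columns stay at 0 V
    # Pulse edges are shared by all electrodes, so pulses start and end together.
    edges = np.sort(rng.choice(np.arange(1, n_samples),
                               size=2 * n_pulses, replace=False))
    for start, stop in edges.reshape(n_pulses, 2):
        # Each inner electrode gets its own random amplitude for this pulse.
        v[start:stop, 1:-1] = rng.uniform(-v_max, v_max, size=n_electrodes - 2)
    return t, v

rng = np.random.default_rng(42)
t, v = random_pulse_example(5, rng)  # 5 electrodes is a hypothetical choice
```

Calling the function 4000 times and splitting the results 3500/500 would give a dataset of the same size as the one used here; the corresponding outputs would then come from the chip simulator or from the physical device.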
B. Results
For the task of modeling the chip, the MSE obtained after $10^4$ iterations was about $2.1 \times 10^{-4}$ for the training dataset. Figure 5 shows the MSE versus the number of iterations. For the testing dataset, the MSE is $3.4 \times 10^{-4}$. Figures 6, 7, and 8 show randomly selected examples from the testing dataset, including the control voltages, the simulated measured waveguide power distribution, and the predicted power distribution. To test the control part, we defined as an example a sequence of target unitaries in the time interval 0 ≤ t ≤ 300 ms, given by
where the unitaries are defined in Table I. The Hamiltonian is then evaluated for each time interval by taking the matrix logarithm $H = \frac{i}{l} \log U$. After training the controller model for 500 iterations, the MSE was $2 \times 10^{-2}$. The MSE versus the number of iterations is plotted in Figure 9. The resulting control voltages are shown in Figure 10, and the resulting predicted ideal power distribution in Figure 11. The initial state is $|0\rangle$, i.e. full power at the first waveguide.
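The matrix-logarithm step can be sketched numerically as below. We assume the convention $U = e^{-iHl}$ with $l$ a propagation length, take the principal branch of the logarithm through an eigendecomposition, and use a 3 × 3 discrete Fourier transform gate with a hypothetical length `length = 2.0` as a stand-in for the gates of Table I, which are not reproduced here.

```python
import numpy as np

def hamiltonian_from_unitary(U, length=1.0):
    """Return H with U = exp(-1j * H * length), using the principal log."""
    evals, evecs = np.linalg.eig(U)
    log_U = evecs @ np.diag(np.log(evals)) @ np.linalg.inv(evecs)
    H = (1j / length) * log_U
    return 0.5 * (H + H.conj().T)  # remove round-off non-Hermiticity

def unitary_from_hamiltonian(H, length=1.0):
    """Exponentiate a Hermitian H to check the round trip."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ np.diag(np.exp(-1j * evals * length)) @ evecs.conj().T

# Illustrative target gate: the 3 x 3 discrete Fourier transform.
omega = np.exp(2j * np.pi / 3)
U_target = np.array([[1, 1, 1],
                     [1, omega, omega**2],
                     [1, omega**2, omega**4]]) / np.sqrt(3)
length = 2.0  # hypothetical value
H_target = hamiltonian_from_unitary(U_target, length)
U_check = unitary_from_hamiltonian(H_target, length)
```

Because the logarithm is multivalued, different branches give different Hamiltonians for the same gate, and some of them may need voltages outside the allowed range; the sketch simply picks the principal branch.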
For the extension to the fully-quantum setting, we use the same dataset of pulses, but now with the interferometer power measurements as the model output. The number of iterations is $1.3 \times 10^4$, which is more than for the other model, to account for the doubling of the number of outputs. Figure 12 shows the training performance in this case. The MSE evaluated for the testing dataset is $2.88 \times 10^{-4}$, while it was $1.74 \times 10^{-4}$ for the training dataset. This indicates the ability of the model to fit the training dataset as well as to generalize to the testing dataset. Figures 13 and 14 show the predicted waveforms using the same control pulses as in Figures 6 and 7. Now, since the phase is also measured, we have a complete quantum description of the output state, and thus we can construct the evolution unitary. A commonly used measure for the closeness of two quantum gates U and V of dimension d is the gate infidelity, defined as
Infidelity is thus a number between 0 and 1, with 0 representing complete overlap (i.e. same matrices). Figure 15 shows the infidelity between the predicted unitary and the actual unitary as a function of time. Finally, to test the control algorithm in this setting, we used as an example the following sequence for 0 < t < 280
(12)
Figure 16 shows the infidelity between the desired quantum gates and the controlled quantum gates. The control voltages are shown in Figure 17. The training history is shown in Figure 18.
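For readers who wish to reproduce an infidelity curve, one widely used definition is $1 - |\mathrm{tr}(U^\dagger V)|^2 / d^2$. The sketch below uses this form as an assumption; it is bounded between 0 and 1 and vanishes for identical gates, as stated above, but it may differ from the exact expression used for the figures (for instance, it is insensitive to a global phase).

```python
import numpy as np

def gate_infidelity(U, V):
    """Infidelity 1 - |tr(U^dagger V)|^2 / d^2 (one common convention)."""
    d = U.shape[0]
    overlap = np.trace(U.conj().T @ V)
    return 1.0 - np.abs(overlap) ** 2 / d ** 2

identity = np.eye(3, dtype=complex)
same_gate = gate_infidelity(identity, identity)                    # identical gates
phase_only = gate_infidelity(identity, np.exp(0.7j) * identity)    # global phase only
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
orthogonal = gate_infidelity(np.eye(2, dtype=complex), pauli_x)    # zero trace overlap
```

Evaluating this function at every time step on the predicted and the simulated unitaries would give a curve of the kind plotted against time.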
C. Discussion
The results presented show the accuracy of the proposed architecture in modeling the chip with all the constraints mentioned earlier. Quantitatively, the loss represented by the MSE decreases on average as the number of iterations increases during the training phase, reaching a small value on the order of $10^{-4}$. However, this is not sufficient to completely assess the behavior of the proposed algorithm. The plots of the waveforms in Figures 6, 7, and 8 show that the difference between the predicted and the simulated power distribution is almost negligible. More importantly, since the model has not been trained on the testing set, this shows that the proposed structure can generalize. This is important for the task of modeling. The architecture does not provide an explicit mathematical expression for the Hamiltonian. However, due to its ability to generalize, we can use it directly to estimate the Hamiltonian given the control voltages. Also, the MSE evaluated for the testing set is on the order of $10^{-4}$, without much degradation compared to the value for the training set.
The qualitative results also show that the architecture is able to handle all the challenges described in Section II B. We were able to model the distortions caused by the equivalent circuit without the need to explicitly define a particular circuit model or how the Hamiltonian depends on the circuit response. This also saves us from having to characterize these parasitics experimentally, which would be difficult, as discussed previously.
For the control task, the proposed method was also very successful in obtaining the required control voltages, as reflected in Figure 11. The distortions that were present in the power distributions are no longer there, and at the same time we were able to achieve the required functionality. The control voltages were also limited to the desired operating range. However, for the $X_{12}$ gate, we could not achieve full transfer between waveguides 1 and 2. We believe that this is related to the fact that not all gates are possible to implement, which, as hinted before, remains outside the scope of this paper. A final thing to notice is that all the examples in the training set were limited to the time range 0 ≤ t ≤ 200 ms. However, the target control sequence has a wider range, 0 ≤ t ≤ 300 ms, and we are still successful in our task. This is a result of using the GRU layers, and shows that the whole model generalizes quite well.
The proposed modifications to the architecture to account for fully-quantum models were also very successful. This is evident from the low MSE values for both the training and testing datasets, with a small difference between the two. This is supported qualitatively by the plots of the power waveforms as well as the infidelity. The control part also seems promising, with the same issue remaining about the class of gates that are possible to implement.
V. CONCLUSION
In this paper, we proposed a deep learning structure that is suitable for modeling a reconfigurable integrated waveguide array chip. The architecture addresses three major problems faced when characterizing the chip experimentally: the uncertainty in the Hamiltonian model, the presence of undesired macroscopic dynamics causing distortions, and losses due to imperfect measurements. The proposed architecture follows a graybox model approach, where the Hamiltonian as a function of the control voltages is treated as a blackbox, utilizing a GRU network as the main component. The waveguide power distribution as a function of the Hamiltonian is treated as a whitebox, since the laws of quantum mechanics are known. We also proposed a complementary deep learning structure to obtain the control voltages required to achieve a target sequence of gates. The qualitative as well as quantitative results showed very promising performance for both tasks.
There are many possible extensions to the presented work. On the theoretical side, it would be interesting to know the set of gates that are possible to implement on this chip given the constraints introduced in the model. We would also like to validate the numerical results shown in this paper experimentally on the physical chip; this work is currently in progress. Another interesting extension is to explore the use of fidelity as the cost function for training rather than the MSE, and to see whether it yields better results. Finally, it would be worth looking into extending the methods introduced in this paper to model and control other quantum systems.
