Placement, routing, and scheduling are essential tasks for achieving near-optimal performance of programs on noisy quantum processors. Reliable execution of an arbitrary quantum circuit on current devices requires routing methods that overcome connectivity limitations while meeting data locality requirements. However, current devices also exhibit highly variable noise levels in both the quantum gates and the quantum registers. This requires any routing algorithm to adapt to both the circuit and the operating conditions. We demonstrate near-optimal methods for routing noisy quantum states that minimize the overall error of data movement while also limiting the computational complexity of routing decisions. We evaluate our methods against the noise characteristics of a 20-qubit superconducting quantum processor.
INTRODUCTION
Quantum computing offers an approach to computation in which the features of quantum physics are used to store and process information [1]. By controlling the dynamics of quantum physical systems expressed as an addressable register, an encoded quantum state can be transformed using a well-defined sequence of operations known as gates. The available gates are composed into logical sequences to carry out computation and, for a growing set of problems [2-5], these constructions may require fewer steps, less time, and less energy than conventional methods [6, 7]. Currently, several available devices provide platforms in which a restricted set of gates can be composed to carry out computations on small-capacity quantum registers. These realizations of quantum processors include variants derived from superconducting electronics [8-13] as well as trapped-ion processors [14-19]. Each quantum processor offers its own set of intrinsic gates that take advantage of the available platform physics for carrying out computations on the encoded quantum state [20]. However, these intrinsic gates currently exhibit non-trivial errors that lead to incorrectly prepared quantum states. These errors arise from imprecise control of the quantum register due, for example, to poorly characterized environmental coupling or fluctuations in the applied control fields. In addition, the resulting gate errors may depend on the type of gate applied as well as when it is applied and to which register element. The information encoded in the quantum state is also highly susceptible to loss, namely decoherence, arising from uncontrolled coupling to the surrounding environment.
The timescale for decoherence places a natural upper bound on the duration of a reliable gate sequence and, collectively, all these sources of error limit the number of gate operations that may be executed before the accumulated noise dominates the computational output. Methods to optimize the schedules of programs on these noisy, intermediate-scale quantum (NISQ) devices are essential to the development of quantum computing applications as well as to the task of benchmarking platform performance.
Existing hardware architectures are also characterized by hard constraints on interactions between register elements. These interactions are limited, for example, by connectivity constraints on the proximity between fundamental physical systems as well as by controllability constraints that arise from the necessary physical processes. For example, registers in superconducting electronics platforms typically adhere to a two-dimensional design arising from layout restrictions imposed by fabrication techniques. Similarly, ion-trap control methods have upper limits on the number of available registers (ions) that can be addressed simultaneously. Connectivity constraints can be mitigated by moving or transporting quantum states between register elements, but at the expense of additional gates and longer program duration. Routing decisions must therefore be made alongside noisy gate scheduling decisions to ensure that the register elements are properly configured to support the scheduled operations.
Recently, several closely related works have investigated the performance of gate scheduling and state routing for select hardware platforms [21, 22]. Tannu and Qureshi have investigated how variability in gate noise and register layout within a superconducting electronics device may be accounted for in routing methods. Their variation-aware qubit movement (VQM) routing method addresses the task of moving a quantum state from register $q_{\mathrm{src}}$ to register $q_{\mathrm{dest}}$, i.e., swap, while respecting connectivity constraints and minimizing the probability of failure. In particular, VQM routing accounts for variations in the errors of the available gates used to carry out the swap operation. The related task of allocation addresses the selection of $q_{\mathrm{src}}$ and $q_{\mathrm{dest}}$. Tannu and Qureshi applied these methods to a 20-qubit superconducting electronics register characterized by highly variable error rates and sparse connectivity. In contemporaneous work, Murali et al. investigated compilation based on satisfiability modulo theories (SMT), in which a series of scheduling and routing constraints were developed to satisfy logical correctness and minimize total program execution time. Because SMT solving became time intensive for register sizes above 30, Murali et al. also developed routing heuristics based on the relative frequency with which registers interact, which were shown to be near optimal in performance.
In this contribution, we develop near-optimal routing techniques to maximize swap success with respect to the final state fidelity, and we test their performance against an existing superconducting transmon quantum processor. We cast the routing task within the framework of linear programming by constructing heuristic objective functions that account for variations in the device gate noise. The paper is organized as follows: in Sec. 2, we briefly review notation and specifications for routing and scheduling tasks as well as characteristics of current NISQ devices; in Sec. 3, we present the development of routing heuristics based on novel shortest-path decisions; in Sec. 4, we verify the accuracy of the heuristics using experimentally characterized hardware; and in Sec. 5, we discuss how our heuristics may be further validated using current and future NISQ devices.
ROUTING AND SCHEDULING TASKS
Consider a quantum processing unit (QPU) consisting of a register $q$ composed of $n$ addressable elements labeled $q_j$ for $j = 1$ to $n$. The information stored in each register element $q_j$ is defined with respect to the computational basis states $|0\rangle_j$ and $|1\rangle_j$, which correspond to eigenstates of the Pauli operator $Z_j$. We denote the composite quantum state of the $n$-element register $q$ as $|\Psi_q(t_\ell)\rangle$, in which $t_\ell$ denotes the $\ell$-th time step. Let the vector $a = a(q)$ denote the physical address of each element in $q$; for convenience, we use the element labels $j = 1$ to $n$ as the address information, but more hardware-specific information may be used for this purpose. We then consider a quantum program $P = (a, I)$ to consist of a series of instructions $I = \{I_\ell\}$ applied to a list of element addresses $a$ that transform the register from a well-defined initial state to a sought-after final state. The instruction $I_\ell$ specifies such a transformation during the $\ell$-th time step, and we specify each as $I_\ell = (\ell, G, r)$, where the gate $G$ denotes either a non-unitary operation, such as a projective measurement, or a unitary operation, such as a multi-element rotation, and the vector $r$ denotes the target addresses of $G$. The size of $r$ depends on the arity of the gate $G$, e.g., a two-qubit gate such as cnot has a target-address vector of size 2.
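The program model $P = (a, I)$ with instructions $I_\ell = (\ell, G, r)$ maps naturally onto a small data structure. The Python sketch below is illustrative only: the class names, the string gate labels, and the `ARITY` table are hypothetical choices rather than part of any toolkit, and the check simply enforces that the size of $r$ matches the arity of $G$.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical gate table mapping a gate name G to its arity (number of targets).
ARITY: Dict[str, int] = {"x": 1, "measure": 1, "cnot": 2, "swap": 2}

@dataclass(frozen=True)
class Instruction:
    """One instruction I_l = (l, G, r)."""
    step: int                   # time step l
    gate: str                   # gate G
    targets: Tuple[int, ...]    # target addresses r

@dataclass
class Program:
    """A program P = (a, I): element addresses plus an ordered instruction list."""
    addresses: List[int]
    instructions: List[Instruction] = field(default_factory=list)

    def validate(self) -> None:
        """Check that every target exists and matches the arity of its gate."""
        known = set(self.addresses)
        for ins in self.instructions:
            if len(ins.targets) != ARITY[ins.gate]:
                raise ValueError(f"{ins.gate} expects {ARITY[ins.gate]} targets")
            if not known.issuperset(ins.targets):
                raise ValueError(f"unknown address in {ins.targets}")

# Example: flip element 1, apply cnot(1, 2), then measure both elements.
prog = Program(
    addresses=[1, 2],
    instructions=[
        Instruction(1, "x", (1,)),
        Instruction(2, "cnot", (1, 2)),
        Instruction(3, "measure", (1,)),
        Instruction(3, "measure", (2,)),
    ],
)
prog.validate()  # raises ValueError if the program is malformed
```

Two measurements share time step 3 here, which the model allows; deciding whether they may actually run concurrently is the scheduler's job.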
We define three related tasks for synthesizing a program $P$ for execution on quantum hardware: placement, routing, and scheduling. Placement evaluates assignments of the addresses in $a$ to the set of possible register addresses available during program execution. Routing transforms the gate sequences and target addresses to be consistent with placement and optimal with respect to constraints in gate connectivity. Scheduling orders the relative sequence in which instructions are executed to avoid resource conflicts. While all three tasks are strongly related, we focus on the routing task. We consider an initial program $P_0 = (a_0, I_0)$ that consists of a register $a_0 = (a_{\mathrm{src}}, a_{\mathrm{dest}})$ and a single swap instruction. The purpose of this program is to move the quantum state stored at address $a_{\mathrm{src}}$ to address $a_{\mathrm{dest}}$. However, the hardware connectivity may not permit this two-qubit operation to be implemented directly, and a sequence of interdependent swap operations may be necessary to carry out the equivalent logic within these constraints. Routing constructs a program $P_1 = (a_1, I_1)$ that meets the logical requirements while also satisfying these hardware constraints. Given a specific hardware connectivity, multiple routes may be available to meet these requirements, and selection of the optimal route is motivated by the need to complete the operation as quickly as possible and with the highest state fidelity, in order to mitigate the loss of coherence in the quantum state while performing the necessary logical transformation. The underlying physical swap operations may not be homogeneous with respect to these metrics, and therefore different routes will prepare the final state with different fidelities.

Figure 1: Placement and routing of the circuit on the left has many possible solutions, including the two shown here. Given registers 1 and 12 as the source and destination nodes, respectively, the first solution uses the fewest swap operations while the second avoids the abnormally noisy register 3. The decision of which path to select can be cast as an optimization problem formulated in terms of a cost function that accounts for time, distance, and fidelity measures.
As a guiding example, we consider the swap routing task in the presence of heterogeneous gate errors within a device having constrained connectivity. Our approach is to modify the concept of the shortest path between the source and destination addresses to account for the errors in each site-dependent swap instruction. Under ideal conditions, we assume the register is initialized in the separable pure state
with $\psi$ the qubit state stored initially in register $q_1$. At the final time $t_N$, an ideal swap program will prepare the state
We denote the idealized swap instruction acting on sites $q_i$ and $q_j$ as swap$(i, j)$. We then consider the route $r = \{(\ell, (i, j))\}$ to be the ordered sequence of swap instructions acting on a series of register elements. An example of different routing choices $r$ is shown in Fig. 1.
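As a minimal sketch of this representation, the helper below (a hypothetical name) turns a node path between source and destination into the ordered swap list $r = \{(\ell, (i, j))\}$, using 1-indexed time steps, and then replays the swaps to confirm where the source state ends up. The path `[1, 2, 7, 12]` is an arbitrary illustration, not a route read from Fig. 1.

```python
from typing import List, Tuple

Route = List[Tuple[int, Tuple[int, int]]]   # [(l, (i, j)), ...]

def path_to_route(path: List[int]) -> Route:
    """Ordered swaps moving the state at path[0] to path[-1], one per time step."""
    return [(step + 1, (path[step], path[step + 1])) for step in range(len(path) - 1)]

def final_position(route: Route, n: int, src: int) -> int:
    """Replay the swaps on n registers and report where the state from src ends up."""
    layout = list(range(n))   # layout[k] = label of the state held by register k
    for _, (i, j) in route:
        layout[i], layout[j] = layout[j], layout[i]
    return layout.index(src)

route = path_to_route([1, 2, 7, 12])
print(route)                                # [(1, (1, 2)), (2, (2, 7)), (3, (7, 12))]
print(final_position(route, n=13, src=1))   # 12
```

Note that each swap also displaces the state held by the intermediate register, which is why routing and placement interact.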
ROUTING HEURISTICS
We construct heuristic methods for selecting the optimal route(s) that account for the most salient features influencing state fidelity. For example, when assigning qubits to register elements, it is typically beneficial to localize those elements spatially in a high-fidelity portion of the device. In addition, the length of the route for a quantum state may be minimized by placing dependent registers in a localized cluster, so as to minimize movement on the device during execution. Solving the placement and routing problems as a single task permits optimal solutions that minimize the noise rates as compared to naive, independent approaches. However, a major disadvantage of such an approach is the computational time required to find such solutions. For example, one formulation casts the problem as an SMT instance that uses details about the device to find an optimal route. Whether a device characteristic should be included in the noise model depends on its associated error rate relative to other characteristics, and the level of detail required when using this characteristic in the resulting optimization problem depends on its variance.
Although placement may be performed to minimize the amount of routing required later on, even idealized placement is unlikely to eliminate the need for run-time routing, given the variability in current NISQ device connectivities and the complexity of interesting quantum programs. Of course, pathological programs can be written to avoid routing after placement, but we anticipate that such programs will not be encountered in real-world situations. Additionally, there may be programs in which a measurement yields classical data that is used to branch the execution of the quantum program. In this case, the scheduler must rely on fast algorithms to solve such problems on the fly. One example arises during fault-tolerant operations, which require classical decision-making to determine the next sequence of quantum operations. We focus our attention on the routing problem, only briefly exploring how placement may help or hinder the routing process. We examine heuristic algorithms in which intermediate results may be cached to facilitate rapid execution of the routing process, as in the case of on-the-fly decision-making.
We consider two examples of routing heuristics. The first assumes that routing occurs with no characteristic information about the noise in the hardware. This first case represents a naive and synthetic approach to quantum program execution: the compiler and run-time only have topological data with which to place and route program qubits, and placement on hardware qubits is entirely arbitrary. Intuitively, we expect placement to perform extremely poorly without error characteristic information. It is also reasonable to expect any compiler or run-time to expand a statement into the minimum number of terms, and therefore routing between hardware qubits may be based on the shortest path between two endpoints. As a second case, we consider that the compiler has characteristic information about the hardware, including its topology as well as gate errors within some model. These errors are taken to represent the current state of the hardware but may not be accurate, depending on when they were collected. We first present a model for how noise impacts routing fidelity and then describe these heuristics in more detail.
Characterizing Noisy Operations
In order to account for a broad range of errors, we may model the actual quantum state in the NISQ context using a density matrix with the initial value given as

$$\rho_0 = |\Psi(t_0)\rangle\langle\Psi(t_0)|$$

and intermediate values defined by

$$\rho_\ell = \xi^{(\ell)}\left(\rho_{\ell-1}\right),$$

where we use the channel operator $\xi^{(\ell)}$ acting on a density matrix $\rho_{\ell-1}$ to describe noisy transformations for $\ell = 1$ to $N$. In general, the channel operator may model interactions between all registers as well as an implicit environment. However, we restrict $\xi^{(\ell)}$ to model interactions between the pair of registers that corresponds to the $\ell$-th swap operation.
We characterize the accuracy of the prepared final state $\rho_N$ using the fidelity defined with respect to the ideal outcome $|\Psi(t_N)\rangle$ as
The fidelity has the benefit of being a well-defined measure of closeness between the actual and ideal states, but it is difficult to obtain experimentally. Notable approaches include numerical calculations using tomographic reconstructions of the actual state, but at a substantial cost in measurement configurations.
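For small registers the fidelity can be evaluated directly from a simulated density matrix. The sketch below assumes the squared-overlap convention $F = \langle\Psi(t_N)|\rho_N|\Psi(t_N)\rangle$ for a pure target (some authors use its square root) and uses a depolarizing map as a toy stand-in for a channel $\xi^{(\ell)}$; neither choice is taken from the device characterization, and the function names are hypothetical.

```python
import numpy as np

def fidelity_pure(psi: np.ndarray, rho: np.ndarray) -> float:
    """F = <psi|rho|psi> for a pure target state |psi> (squared-overlap convention)."""
    psi = psi / np.linalg.norm(psi)
    return float(np.real(np.vdot(psi, rho @ psi)))

def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
    """Toy channel: with probability p, replace rho by the maximally mixed state."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.eye(d) / d

# Ideal two-qubit outcome |01> (basis index 1).
psi = np.zeros(4, dtype=complex)
psi[1] = 1.0
rho_ideal = np.outer(psi, psi.conj())
rho_noisy = depolarize(rho_ideal, p=0.2)
print(fidelity_pure(psi, rho_ideal))   # 1.0
print(fidelity_pure(psi, rho_noisy))   # about 0.85 = 0.8 + 0.2 / 4
```

For $n$ qubits the density matrix has $4^n$ entries, which is one reason the text turns to cheaper, classical characterizations for the 20-qubit device.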
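As a crude alternative characterization of state-preparation accuracy, we sample the measurement outcomes observed following the swap operation. Specifically, we calculate the fidelity for transferring classical states from the source to the destination registers. We then characterize individual swap operations relative to the ideal truth table and construct the corresponding channel operator that satisfies this model. In practice, the test hardware makes use of the logical identity

$$\mathrm{swap}(i, j) = \mathrm{cnot}(i, j)\,\mathrm{cnot}(j, i)\,\mathrm{cnot}(i, j),$$

where $i$ and $j$ denote the source and destination elements, respectively, of the original gate, and $\mathrm{cnot}(i, j)$ acts with control $i$ and target $j$. We therefore develop this approach around the classical truth table for cnot. This is given in tabular form as

$$P^{(\mathrm{ideal})} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$

while the observed truth table may be characterized by the stochastic matrix

$$P^{(\mathrm{obs})} = \begin{pmatrix} p(0|0) & p(1|0) & p(2|0) & p(3|0) \\ p(0|1) & p(1|1) & p(2|1) & p(3|1) \\ p(0|2) & p(1|2) & p(2|2) & p(3|2) \\ p(0|3) & p(1|3) & p(2|3) & p(3|3) \end{pmatrix},$$

where $p(i|j)$ is the probability to observe outcome $i$ given input $j$, using the integer representations $i = 2b_c + b_t$ and $j = 2b_c + b_t$ of the binary string $(b_c, b_t)$ of control and target bits. Rows are indexed by the input $j$ and columns by the outcome $i$. The probabilities in each row of the table are normalized, i.e.,

$$\sum_{i=0}^{3} p(i|j) = 1 \quad \text{for each input } j.$$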
We compare the observed truth table with the ideal truth table in order to characterize the gate error in cnot. We quantify the similarity between the observed and ideal probabilities using the total variation distance (TVD), calculated as

$$D(j) = \frac{1}{2}\sum_{i=0}^{3}\left|p^{(\mathrm{obs})}(i|j) - p^{(\mathrm{ideal})}(i|j)\right|.$$

The TVD vanishes when the observed probabilities match the ideal probabilities for input $j$. We average over all inputs to calculate the average TVD of the gate, i.e., for a two-qubit gate $D_{\mathrm{avg}} = \sum_j D(j)/4$.
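The truth-table comparison can be sketched in a few lines of NumPy. The code below builds the ideal cnot table with the index convention $j = 2b_c + b_t$, checks numerically that three alternating cnots reproduce swap, and evaluates $D(j)$ and $D_{\mathrm{avg}}$. It assumes the conventional factor of $1/2$ in the TVD, and the "observed" table is a hypothetical one with 10% of shots scrambled uniformly, not measured data.

```python
import numpy as np

def cnot_truth_table() -> np.ndarray:
    """Ideal cnot truth table T[j, i] = p(i|j); j = 2*b_c + b_t is the input and
    i the output index. This permutation is its own transpose, so T also equals
    the cnot unitary in the same basis ordering."""
    T = np.zeros((4, 4))
    for b_c in (0, 1):
        for b_t in (0, 1):
            T[2 * b_c + b_t, 2 * b_c + (b_t ^ b_c)] = 1.0  # target flips if control is 1
    return T

def tvd_per_input(P_obs: np.ndarray, P_ideal: np.ndarray) -> np.ndarray:
    """D(j) = 1/2 * sum_i |p_obs(i|j) - p_ideal(i|j)|, one value per input row j."""
    return 0.5 * np.abs(P_obs - P_ideal).sum(axis=1)

def tvd_average(P_obs: np.ndarray, P_ideal: np.ndarray) -> float:
    """D_avg: mean of D(j) over the four inputs of a two-qubit gate."""
    return float(tvd_per_input(P_obs, P_ideal).mean())

# Sanity check of the swap identity used by the hardware.
CNOT_CT = cnot_truth_table()              # control = high bit, target = low bit
SWAP = np.eye(4)[[0, 2, 1, 3]]            # exchanges |01> and |10>
CNOT_TC = SWAP @ CNOT_CT @ SWAP           # control and target roles reversed
assert np.allclose(CNOT_CT @ CNOT_TC @ CNOT_CT, SWAP)

# Hypothetical observed table: 10% of shots scrambled uniformly over outputs.
P_ideal = cnot_truth_table()
P_obs = 0.9 * P_ideal + 0.1 / 4
print(tvd_per_input(P_obs, P_ideal))   # about 0.075 for every input
print(tvd_average(P_obs, P_ideal))     # about 0.075
```

With this scrambling every input has $D(j) = 0.075$: each row moves 0.075 of probability mass from the correct output to the three incorrect ones.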
However, the observed error rates for any gate will depend on errors in the measurement of individual registers. We also model these errors using a stochastic matrix, which is presented in tabular form as

$$P_m = \begin{pmatrix} p_m(0|0) & p_m(1|0) \\ p_m(0|1) & p_m(1|1) \end{pmatrix},$$

where $p_m(i|j)$ is the probability to measure the binary state $i$ given binary input $j$. Rows sum to unity due to normalization of the probabilities, and we again use the TVD to characterize the similarity between the observed and ideal probabilities, with the latter defined as $p_m(0|0) = p_m(1|1) = 1$.
Our model for the observed behavior of a single-qubit measurement uses a channel operator $\xi_m$ acting on a density matrix $\rho_j$ as
with $\Pi_i = |i\rangle\langle i|$ the projective operator onto the $i$-th binary outcome. The observed probabilities $p^{(\mathrm{obs})}_m$ are collected by preparing each register element in either the 0 or 1 binary state and measuring the corresponding outcomes. We then fit these probabilities to the single-qubit channel operator defined by
Our model for multi-qubit measurements uses tensor products of the single-qubit model to construct the corresponding transition matrix $M$. The resulting mixed-unitary form of the composite measurement channel can be inverted to remove measurement errors and recover the transition probabilities of gates preparing $\rho_j$ as
where $M_{i,j} = p_m(i|j)$ is the transition probability from (classical) input state $j$ to output state $i$.
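The readout correction can be sketched as follows, assuming independent readout errors on each qubit so that the composite matrix is a Kronecker product of single-qubit matrices. The matrices here are column-stochastic, matching $M_{i,j} = p_m(i|j)$ (the transpose of a row-per-input table). The flip probabilities are hypothetical values, and the helper names are ours.

```python
import numpy as np

def readout_matrix(p01: float, p10: float) -> np.ndarray:
    """Single-qubit matrix with M[i, j] = p_m(i|j) (columns sum to one).
    p01 = p_m(1|0) and p10 = p_m(0|1) are the two readout flip probabilities."""
    return np.array([[1.0 - p01, p10],
                     [p01, 1.0 - p10]])

def mitigate(p_obs: np.ndarray, per_qubit: list) -> np.ndarray:
    """Invert the tensor-product readout model M = M_1 (x) M_2 (x) ...,
    assuming independent readout errors; the first matrix is the high bit."""
    M = per_qubit[0]
    for Mq in per_qubit[1:]:
        M = np.kron(M, Mq)
    return np.linalg.solve(M, p_obs)   # solves M p = p_obs rather than forming M^-1

M_c = readout_matrix(0.02, 0.05)         # hypothetical control-qubit readout errors
M_t = readout_matrix(0.03, 0.08)         # hypothetical target-qubit readout errors
p_gate = np.array([0.0, 0.0, 0.0, 1.0])  # ideal cnot output for input (1, 0)
p_obs = np.kron(M_c, M_t) @ p_gate       # what noisy readout would report
print(mitigate(p_obs, [M_c, M_t]))       # recovers approximately [0, 0, 0, 1]
```

Applied to finite-shot data rather than exact model outputs, the solve can return small negative entries; clipping and renormalizing is one common heuristic remedy.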
Heuristic Cost Functions
Shortest-path routing tracks the number of edges traversed and is expressed as
with $x_{ij}$ a measure of distance between registers $i$ and $j$. The constraints ensure that the path starts at the source, terminates at the destination, and that all motion is forward. The model behind shortest-path routing ignores the spatial and temporal variance in errors across the device and assumes that gate operations are homogeneous. That is, the variance in gate noise is assumed to be sufficiently small that the shortest path will yield the lowest error rate, since any two path segments of the same length would have approximately the same error rate.
The advantage of such an approach is that the algorithm requires no prior noise characterization, though characterization would be required to determine whether such a formulation is valid in the first place. This formulation would perform well for stable devices with lattice-like connectivity, where a metric as simple as the $L_1$ norm could be used to evaluate route optimality.
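Noise-blind routing only needs the coupling graph. The sketch below uses breadth-first search, which returns a fewest-edge path on an unweighted graph, on a hypothetical 4 × 5 nearest-neighbour lattice (not the Tokyo coupling map). On such a lattice the hop count equals the $L_1$ distance between the endpoints, consistent with the remark above. Function names are illustrative.

```python
from collections import deque
from typing import Dict, List

def shortest_path(adj: Dict[int, List[int]], src: int, dest: int) -> List[int]:
    """Breadth-first search: fewest edges (swaps) from src to dest, ignoring noise."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dest:
            break
        for nbr in adj[node]:
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dest not in prev:
        raise ValueError("destination not reachable")
    path, node = [], dest
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def grid(rows: int, cols: int) -> Dict[int, List[int]]:
    """Hypothetical rows x cols lattice with nearest-neighbour couplings."""
    adj = {r * cols + c: [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                adj[k].append(k + 1)
                adj[k + 1].append(k)
            if r + 1 < rows:
                adj[k].append(k + cols)
                adj[k + cols].append(k)
    return adj

path = shortest_path(grid(4, 5), 0, 19)
print(len(path) - 1)  # 7 swaps: the L1 distance from (0, 0) to (3, 4)
```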
A disadvantage of shortest-path routing is that the shortest path is not necessarily the best path with respect to state transmission fidelity. We may modify the problem statement to account for the site-dependent swap errors as
where $e_{ij}$ represents the relative error of the swap operation on nodes $i$ and $j$ in hardware graph $A$. We define the gate error with respect to the composite TVD of the cnot gates used to compose the operation, e.g., $e_{ij} = 1 - (1 - D_{\mathrm{cnot}})^3$ when the TVD is identical for all three gates. In addition to gate error, the quantum state experiences an overall decoherence due to coupling with the environment. While certain longer paths may provide better fidelity based on the aggregate gate error, a longer path will also incur more decoherence. We introduce timed shortest weighted-path routing to account for the penalty associated with longer paths through a multi-objective optimization. This multi-objective program can be solved using a basic scalarization technique that assigns weights to each objective term. We will consider this case in detail elsewhere.
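A minimal sketch of shortest weighted-path routing, assuming additive edge costs $e_{ij}$: `swap_error` forms $e_{ij}$ from three cnot TVDs, and `weighted_path` runs Dijkstra's algorithm on the scalarized cost $\alpha e_{ij} + \beta$. The per-swap penalty $\beta$ is a hypothetical stand-in for the decoherence objective of the timed variant, and the graph and all numbers are illustrative, not device data.

```python
import heapq
from typing import Dict, List, Sequence

Graph = Dict[int, Dict[int, float]]   # adj[i][j] = e_ij, error of swap(i, j)

def swap_error(d_cnots: Sequence[float]) -> float:
    """e_ij = 1 - prod_k (1 - D_k) over the three cnots composing the swap;
    equals 1 - (1 - D_cnot)**3 when all three TVDs are the same."""
    keep = 1.0
    for d in d_cnots:
        keep *= 1.0 - d
    return 1.0 - keep

def add_edge(adj: Graph, i: int, j: int, e: float) -> None:
    """Store e_ij symmetrically (direction-dependent cnot errors are folded in)."""
    adj.setdefault(i, {})[j] = e
    adj.setdefault(j, {})[i] = e

def weighted_path(adj: Graph, src: int, dest: int,
                  alpha: float = 1.0, beta: float = 0.0) -> List[int]:
    """Dijkstra on the scalarized edge cost alpha * e_ij + beta."""
    dist = {src: 0.0}
    prev = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dest:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, e in adj[node].items():
            nd = d + alpha * e + beta
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dest not in prev:
        raise ValueError("destination not reachable")
    path, node = [], dest
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

# Hypothetical graph: a 2-swap route through noisy node 1, a 3-swap route via 2 and 3.
adj: Graph = {}
add_edge(adj, 0, 1, 0.30)
add_edge(adj, 1, 4, 0.30)
for i, j in [(0, 2), (2, 3), (3, 4)]:
    add_edge(adj, i, j, swap_error([0.01, 0.005, 0.005]))
print(weighted_path(adj, 0, 4))             # [0, 2, 3, 4]: avoids node 1
print(weighted_path(adj, 0, 4, beta=1.0))   # [0, 1, 4]: per-swap penalty dominates
```

With $\beta = 0$ the router accepts an extra swap to avoid noisy node 1; a large enough $\beta$ reverts to the fewest-swap route. If one instead wants to maximize the product of per-swap success probabilities $\prod(1 - e_{ij})$, using $-\log(1 - e_{ij})$ as the edge weight is a drop-in change.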
SIMULATED ROUTING FOR EXPERIMENTAL HARDWARE
As a test case for our heuristics, we evaluate the superconducting transmon device from IBM known as Tokyo. This QPU supports registers of size $n = 20$ but has a limited connectivity that varies with calibration settings. An example of the Tokyo connectivity used in our study is shown in the left panel of Fig. 2. The device does not natively support the swap operation but instead replaces this gate with a sequence of three cnot gates, which are themselves decomposed into 1- and 2-qubit gates. Details about these gate sequences are abstracted away in our experiments so that our characterizations account for the cumulative errors.
We have characterized the Tokyo processor using the classical characterization methods described in Sec. 3.1. Figure 2 shows the characterization results for measurement of each register of the Tokyo processor. Each register element is characterized by the probabilities of recovering the wrong output bit given the prepared input state, either '0' or '1'. Note that preparation of the '1' state requires application of the X operator, which we consider to have negligible noise for this purpose. It is apparent from the right panel of Fig. 2 that there is significant variation in the measurement errors across the register, with element 7 being the most egregious. These error rates parameterize the channel operator presented in Eq. (13). We next characterize the individual cnot operations observed between each pair of register elements. Connectivity constraints limit these tests to nearest-neighbor, connected elements, but directionality plays an important role in these characterizations, as the cnot gate on pair (i, j) may differ from that on (j, i). Four different input states are tested, (0, 0), (0, 1), (1, 0), and (1, 1), and the observed two-qubit measurement outcomes for each input state are recorded over an ensemble of 8,192 samples. The expected error rate for each input state is shown for all possible pairs in Fig. 3. It is particularly notable from this figure that every gate operation involving qubit 7 has a much larger error rate than the others.
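Turning the recorded shots for one prepared input into a row $p(i|j)$ of the observed table, and into an error rate, is straightforward. The sketch below assumes that the first character of each bitstring is the control bit $b_c$; toolkits differ in bit ordering, so this mapping must be checked against the actual circuit. The counts and function names are hypothetical, with counts chosen only to sum to 8,192.

```python
from typing import Dict, List

def row_from_counts(counts: Dict[str, int]) -> List[float]:
    """Turn bitstring counts for one prepared input into the row p(i|j).
    Assumes the first character is the control bit b_c and the second the
    target bit b_t, so that i = 2*b_c + b_t."""
    total = sum(counts.values())
    row = [0.0] * 4
    for bits, n in counts.items():
        b_c, b_t = int(bits[0]), int(bits[1])
        row[2 * b_c + b_t] += n / total
    return row

def cnot_expected(b_c: int, b_t: int) -> int:
    """Index of the ideal cnot output for input (b_c, b_t)."""
    return 2 * b_c + (b_t ^ b_c)

def error_rate(row: List[float], expected: int) -> float:
    """Probability of observing anything other than the ideal output."""
    return 1.0 - row[expected]

# Hypothetical counts for input (1, 0) over 8,192 shots; ideal output is '11'.
counts = {"11": 7700, "10": 300, "01": 120, "00": 72}
row = row_from_counts(counts)
print(error_rate(row, cnot_expected(1, 0)))  # 492 / 8192, about 0.060
```

These raw rows still contain readout error; they are the inputs that the measurement-model inversion is meant to correct.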
We use the measurement noise model to post-process the results shown in Fig. 3 and recover the underlying truth table that characterizes the cnot gate. This requires inverting the matrix in Eq. (14) to recover the gate probabilities shown in Fig. 7. As expected, the errors in the transition probabilities characterizing the truth table for cnot indicate a distribution of non-ideal average behavior across the device connections. Dependence on input state appears modest, though limited measurement precision and numerical instabilities produce nonphysical results for certain cases of the (1, 0) input state (and only this input state). We believe these discrepancies may be addressed with better averaging of the measurement models. We also note that gates that include element 7 are prone to the largest errors, a behavior consistent with earlier observations for measurement. These results quantify the heterogeneous noise observed in a modern example of a quantum processor and, within our error model, emphasize that noise parameters may vary across sites and directions.
Simulation
Given these insights into the noise present in current QPUs, we have tested the heuristics based on shortest-path routing and shortest weighted-path routing. We have tested these approaches using the connectivity of the IBM Tokyo processor shown in Fig. 2 along with a noise model corresponding to three different limiting scenarios: 1) no noise, 2) homogeneous noise, and 3) heterogeneous noise. For the latter two cases, we assume that noise distributions characterize the error rates for the swap operations between sites, with identical distributions in the homogeneous case. For the heterogeneous case, we consider a specific instance in which all interactions with register element 7 have a much higher error rate, as this corresponds to the observed behavior of the physical device. Our simulated noise model is based on a truncated normal distribution, i.e.,

$$f(x; \mu, \sigma) = \frac{1}{\mathcal{N}}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad x \in [0, 1].$$
Intuitively, the distribution represents a Gaussian with mean $\mu$, standard deviation $\sigma$, and normalization constant $\mathcal{N}$, with support $x \in [0, 1]$. These parameters can be tuned to represent normally distributed behavior that provides a fair approximation to the swap errors expected in a physical device. In our three modeling scenarios, a given instance of the routing problem takes as input samples from the noise distributions for all pair operations and finds the shortest weighted path; recall that the first scenario corresponds to the noiseless case. We generate an ensemble of such routing solutions, each corresponding to a different noise instance of the hardware, and we quantify the average routing length as well as the objective costs. For our simulations, we use a uniform random distribution to model the noise of each swap operation, in which the range of the distribution is set by a maximum error rate. We increase the maximum error rate for each site, which increases the variance in the noise, and we monitor the change in routing length and cost.
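The truncated normal model can be sampled by simple rejection. The sketch below evaluates the density with the normalization constant written in closed form via the error function, and draws error rates by redrawing Gaussian samples until they fall in $[0, 1]$. Function names are hypothetical, and this is one standard way to sample the distribution, not necessarily the one used to produce the figures. The uniform alternative mentioned in the same paragraph would simply be `rng.uniform(0.0, e_max)` for a chosen maximum error rate `e_max`.

```python
import math
import random

def truncated_normal_pdf(x: float, mu: float, sigma: float,
                         lo: float = 0.0, hi: float = 1.0) -> float:
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / N on [lo, hi], zero elsewhere,
    with N the integral of the Gaussian kernel over [lo, hi]."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if not lo <= x <= hi:
        return 0.0
    s = sigma * math.sqrt(2.0)
    norm = sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((hi - mu) / s) - math.erf((lo - mu) / s))
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm

def sample_truncated_normal(mu: float, sigma: float, rng: random.Random,
                            lo: float = 0.0, hi: float = 1.0) -> float:
    """Rejection sampling: redraw until the Gaussian sample lands in [lo, hi].
    sigma == 0 gives the constant-noise case x = mu."""
    if sigma == 0.0:
        return mu
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

rng = random.Random(1)
errors = [sample_truncated_normal(0.01, 0.05, rng) for _ in range(1000)]
print(min(errors) >= 0.0, max(errors) <= 1.0)  # True True
```

For $\mu$ near zero roughly half of the Gaussian draws are rejected, which is inexpensive at these sample sizes.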
We consider the instance of the routing program in which the source element is 0 and the destination element is 19, as labeled in Fig. 2. By inspection, the shortest path between source and destination requires 5 swap operations. However, after adding noise to the individual sites, we observe that the shortest weighted path has a longer average length than the shortest path.
The average path length recovered using shortest weighted-path routing is shown in the left panel of Fig. 5. These results correspond to a truncated noise model with $\mu = 0.01$ and varying standard deviation. For the range $\sigma \in (0, 0.2]$, the average path length is approximately 6 ± 1 for 200 simulated instances at each point. This should be compared to the constant-noise case of $\sigma = 0$, which always recovers the shortest path with length 5. The right panel of Fig. 5 shows the distribution of path lengths for two non-zero values of $\sigma$. Notably, the distributions of path length are approximately equivalent despite corresponding to different noise distributions.
The average length of the shortest weighted path is typically larger than that of the shortest path in the constant-noise case. However, as shown in Fig. 6, these longer paths yield lower-cost routes than the shortest route. The left panel shows two distributions of the cost of the obtained shortest weighted path relative to the cost of the shortest path over the same noisy hardware. The orange distribution corresponds to $\sigma = 0.005$, while the broader blue distribution is for $\sigma = 0.01$. The region below zero indicates the frequency with which paths provide lower overall cost than the shortest path. It is apparent that the cumulative probability for a path to be cost efficient is greater for the larger value of $\sigma$. This trend can be seen more clearly in the right panel of Fig. 6, which plots the average cost difference with respect to the variance in the truncated noise model. Each point is averaged over an ensemble of 100 samples, and there is a clear trend toward negative cost difference. A similar plot for the heterogeneous case shows a more pronounced drop in the cost difference owing to a threefold increase in the error rate at element 7.
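An ensemble experiment of this kind can be sketched end to end: sample an error rate for every coupling, route with Dijkstra's algorithm on those weights, and compare the resulting cost with that of a fixed fewest-swap route evaluated on the same noise instance. The lattice, source, destination, parameters, and function names below are hypothetical and do not reproduce the Tokyo results. Because Dijkstra's algorithm is exact for nonnegative weights, the cost difference computed here is never positive; the sketch isolates how its magnitude and the route length depend on $\sigma$.

```python
import heapq
import random
from statistics import mean
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

def grid_edges(rows: int, cols: int) -> List[Edge]:
    """Nearest-neighbour couplings of a hypothetical rows x cols lattice."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                edges.append((k, k + 1))
            if r + 1 < rows:
                edges.append((k, k + cols))
    return edges

def dijkstra(weights: Dict[Edge, float], src: int, dest: int) -> List[int]:
    """Minimum-weight path on an undirected graph given {(i, j): w_ij}."""
    adj: Dict[int, List[Tuple[int, float]]] = {}
    for (i, j), w in weights.items():
        adj.setdefault(i, []).append((j, w))
        adj.setdefault(j, []).append((i, w))
    dist, prev, heap = {src: 0.0}, {src: None}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dest:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path, u = [], dest
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

def path_cost(path: List[int], weights: Dict[Edge, float]) -> float:
    """Sum of e_ij along the path (edges stored in either orientation)."""
    return sum(weights[(a, b)] if (a, b) in weights else weights[(b, a)]
               for a, b in zip(path, path[1:]))

def sample_error(mu: float, sigma: float, rng: random.Random) -> float:
    """Truncated normal on [0, 1] by rejection; sigma == 0 returns mu."""
    if sigma == 0.0:
        return mu
    while True:
        x = rng.gauss(mu, sigma)
        if 0.0 <= x <= 1.0:
            return x

def ensemble(rows, cols, src, dest, mu, sigma, trials, seed=0):
    """Average (cost difference, weighted-path length) over noise instances."""
    rng = random.Random(seed)
    edges = grid_edges(rows, cols)
    hop_path = dijkstra({e: 1.0 for e in edges}, src, dest)  # noise-blind route
    diffs, lengths = [], []
    for _ in range(trials):
        errs = {e: sample_error(mu, sigma, rng) for e in edges}
        best = dijkstra(errs, src, dest)
        diffs.append(path_cost(best, errs) - path_cost(hop_path, errs))
        lengths.append(len(best) - 1)
    return mean(diffs), mean(lengths)

print(ensemble(4, 5, 0, 19, mu=0.01, sigma=0.0, trials=10))    # (0.0, 7)
print(ensemble(4, 5, 0, 19, mu=0.01, sigma=0.05, trials=200))
```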
DISCUSSION
We have presented heuristics for near-optimal routing of noisy quantum states based on a shortest weighted-path objective. Our analysis has been motivated by the noise profile of current quantum processors, as evidenced by our characterization of the IBM Tokyo device. We used a classical characterization method to quantify the noise profile, and we used simulated noise to test the routing heuristics. As compared to the baseline of shortest-path routing, we find that routes in noisy quantum hardware can be more optimal, as measured by the cost difference, when they account for variations in the gate error rates. However, this advantage is not always found: for constant or narrowly distributed noise parameters, the shortest path was generally more favorable in cost. The reason lies in the shortest weighted-path decision often taking longer routes due to greedy optimization. When the variance in noise is small, the likelihood of finding the shortest path is low even for the low-degree nodes used in our study. By contrast, when the variance is large, as in the case of the Tokyo device, near-optimal routing of noisy quantum states is obtained.
Further study of this problem will need to address the actual noise distributions that characterize real devices. Our choice of a truncated normal distribution may be intuitive, but we have yet to confirm that it is a good representation of the observed noise. We have performed a preliminary comparison with a uniform random noise model over truncated support, which shows only a weak difference in average path length compared to the truncated normal distribution. Studies with experimentally characterized noise will be informative. Finally, our cost objective based on the error rates $e_{ij}$ is only a proxy for the state fidelity, and an experimental study of routed states using tomographic reconstructions will provide further insight into the benefits of shortest weighted-path routing.
