Many physical implementations of quantum computers impose stringent memory constraints in which 2-qubit operations can only be performed between qubits which are nearest neighbours in a lattice or graph structure. Hence, before a computation can be run on such a device, it must be mapped onto the physical architecture. That is, logical qubits must be assigned physical locations in the quantum memory, and the circuit must be replaced by an equivalent one containing only operations between nearest neighbours. In this paper, we give a new technique for quantum circuit mapping (a.k.a. routing), based on Gaussian elimination constrained to certain optimal spanning trees called Steiner trees. We give a reference implementation of the technique for CNOT circuits and show that it significantly out-performs general-purpose routines on CNOT circuits. We then comment on how the technique can be extended straightforwardly to the synthesis of CNOT+Rz circuits and as a modification to a recently-proposed circuit simplification/extraction procedure for generic circuits based on the ZX-calculus.
Introduction
Quantum circuits give a de facto standard for representing quantum computations at a low level. They consist of sequences of primitive operations, called quantum gates, applied to a register of quantum bits, or qubits. Increasingly, noisy intermediate-scale quantum (NISQ) computers with 10-80 qubits are becoming a reality. Popular physical realisations such as superconducting quantum circuits [22, 13, 26] and ion traps [3, 2, 10, 11] consist of qubits stored in the physical states of systems arranged in space, where two-qubit operations are typically only possible between pairs of adjacent systems. Hence, when it comes to actually running a quantum computation on these architectures, logical qubits must be mapped to physical memory locations, and the circuit must be modified to only consist of 2-qubit operations between adjacent qubits in the physical architecture. Naïvely, this can be achieved by simply inserting swap gates to move a pair of qubits next to each other before each 2-qubit operation. However, this approach comes with an enormous overhead in terms of 2-qubit operations, each of which introduces a great deal more noise than a single qubit operation on most realistic architectures [2] . More sophisticated approaches incorporate techniques from computer aided design [27] and machine-learning [12] in order to minimise the extra operations needed by making good choices of initial and intermediate memory locations for the qubits involved. Nevertheless, these are simply refinements of the basic 'search and swap' approach. Most approaches only take the topological structure of the circuit into account (i.e. which qubits are being acted upon) rather than semantic structure (i.e. the unitary being implemented), and hence miss out on opportunities for more efficient circuit mapping.
We present a new approach to quantum circuit mapping based on constrained Gaussian elimination, and apply it in the simplest case of mapping CNOT circuits. The main idea is to modify a familiar technique for synthesising CNOT circuits using Guassian elimination in such that a way that primitive 3 mapping for NISQ.
Results
We work with a fixed set of randomly-generated CNOT circuits on 9, 16, and 20 qubits. The 9-qubit CNOT circuits have either 3, 5, 10, 20 or 30 gates, whereas the other circuits have either 4, 8, 16 , 32, 64, 128 or 256 gates. For each of these gate counts, we generated 20 different random circuits, yielding a test set of 380 random circuits.
We compared our algorithm to the QuilC [24] and t|ket [7] compilers. The former uses CZ gates as basic two-qubit gates instead of CNOT gates. Since a CNOT gate can be formed by conjugating the target bit of a CNOT gate with Hadamards, the amount of CZ gates can be compared directly to the amount of CNOT gates.
As architectures, we used a 9-qubit square grid, the 20-qubit IBM Q20 Tokyo, a 16-qubit square grid, the IBM QX5 and the Rigetti 16Q Aspen architectures. Their respective gate counts and percentage of added gates (overhead) can be found in Table 1 1 . For the genetic algorithm, we used different parameters depending on the size of the architecture. For the 9-qubit architecture, we used a population of 30 and 15 iterations. For the 16-qubit architectures, we used a population of 50 and 100 iterations. And for the 20-qubit architecture, we used a population of 100 and 100 iterations. The constant crossover and mutation probability had a constant value of 0.8 and 0.2, respectively. The values population and iteration were simply found by trial and error, and could probably be tuned further. A larger population and more iterations improves the chances of finding a better permutation, but increases the time that the algorithm needs to run. Our running times range from a few seconds to about a minute on a mid-range laptop computer from 2017.
Note that the overheads of our mapping process are sometimes negative. This is because the process we use computes the parity map associated to a CNOT circuit and re-synthesises the circuit using Gaussian elimination, as described in Sections 3.1 and 3.2. In the unconstrained case, Patel et al. [21] gave a heuristic for (re-)synthesising generic CNOT circuits on n qubits with O(n 2 / log(n)) CNOTs. Indeed, using their algorithm for 9-qubit circuits, we see in Fig. 1 that the average CNOT count stabilises around the asymptotic bound of 9 2 / log 2 (9) ≈ 25.6. Our approach also stabilises, but at a higher number of CNOTs (∼ 42 for 9 qubits). Interestingly, this is still well below the complexity of naïve unconstrained CNOT synthesis, which is O(n 2 ).
Methods

Background: Parity maps and Steiner trees
Our approach to CNOT mapping is based on re-synthesising the CNOT circuit from its corresponding parity map. By a parity map, we mean any reversible linear map on bitstrings. That is, we mean a bijective mapping from N-bitstrings to N-bitstrings where each bit in the output is a parity (i.e. XOR) of the input bits. It is a well-known fact that such maps exactly correspond to the action of CNOT circuits on computational basis states. It is therefore convenient to represent the action of a CNOT circuit on N qubits as an N × N matrix over GF (2) .
If we consider an arbitrary such parity map, it is straightforward to check that post-composing a CNOT gate with a control on the j-th qubit and the target on the k-th qubit has the overall effect of A plot of input vs. output CNOT gates when mapping on to a 9-qubit square grid, when computing and re-synthesising the circuit, both in the unconstrained case using the method described in [21] and respecting nearest-neighbour constraints using our method.
adding the j-th row to the k-th row:
Hence, there is an evident way to construct a CNOT circuit that realises an arbitrary parity matrix P.
Simply perform Gauss-Jordan elimination on P, post-composing CNOTs for each primitive row operation. In the end, we will obtain CP = 1, where C is a known CNOT circuit. Then, P = C −1 , where C −1 is obtained from C just by reversing the order of CNOT gates. In other words, in order to synthesise a CNOT circuit which realises parity map P, we simply perform Gauss-Jordan and store the primitive row operations used. Then the CNOT circuit corresponds exactly to that sequence of row operations, in reverse order. This technique, when combined with a simple heuristic for choosing appropriate row operations, is able to obtain asymptotically optimal CNOT realisations of a given parity map [21] . In this paper, we modify the question: how can we construct a CNOT realisation of an arbitrary parity map if only certain CNOT gates are allowed? For example, suppose we have 9 qubits arranged in a 3 × 3 grid, and we only wish to allow CNOTs between nearest neighbours. Clearly this problem is equivalent to asking: how do we perform Gauss-Jordan elimination on a given GF(2)-matrix, when only certain primitive row operations are allowed?
Clearly, our only recourse is to use more row operations in order to allow distant rows to essentially be added together via some intermediate steps. We present a strategy for doing this based on special kinds of spanning trees called Steiner trees.
Note that, by a graph we always mean an indirected, simple graph with no self-loops, by a tree we mean a graph with no cycles, and by a root tree we mean a tree with a chosen vertex called the root. For rooted trees, we use the standard terminology of parent and child to denote vertices adjacent to a given vertex which are closer to or farther from the root, respectively. Definition 3.1. For a graph G and a subset S of the vertices V G of G, a Steiner tree T is a minimal subgraph of G that is furthermore a tree and has the property that S ⊆ V T .
Computing Steiner trees is NP-hard in general and the related decision problem is NP-complete [14] . Indeed, the Steiner tree problem can be seen as a generalisation of the travelling salesperson problem, which allows an arbitrary tree to span a set of vertices rather than a single path, which can be seen as a tree with no branching. In practice, we will not need exactly optimal Steiner trees for our strategy to work, and many efficient heuristics exist for computing approximate Steiner trees [23, 4] . For our purposes, we use a very simple heuristic based on the Floyd-Warshall shortest-path algorith [6] and minimal spanning trees.
A useful refinement of Steiner trees will be the following notion.
Definition 3.2. For a graph G, a total ordering ≤ of the vertices V G , and a subset S ⊆ V G , a decreasing Steiner tree T is a minimal rooted subtree of G such that S ⊆ V T and every vertex in T is larger its children with respect to ≤.
Constrained CNOT circuit extraction
In this section, we will describe the algorithm STEINER-GAUSS, which performs Gauss-Jordan elimination of a parity matrix using only nearest-neighbour row operations for a given graph G. Consequently, this procedure can be used to synthesise a CNOT circuit implementing a given parity map using only nearest-neighbour CNOTs. The algorithm itself consists of two phases, STEINER-DOWN and STEINER-UP, which respectively produce an upper triangular matrix and produce the identity matrix from an upper triangular matrix. Initially, we will consider only graphs G which have a Hamiltonian path, i.e. graphs G which contain a connected path P that visits each of the vertices in G exactly once. For example, the 3 × 3 grid:
has a Hamiltonian path given by [0, 1, 2, . . . , 8]. We will assume this path also provides a total ordering ≤ on the vertices of G. We will remove the assumption of a Hamiltonian path in the next section, providing a recursive algorithm capable of handling arbitrary graphs. We begin by labelling the rows of our parity map P by the vertices of the constraint graph G. The first stage of our algorithm, STEINER-DOWN, computes an upper triangular matrix. To do this, we wish to remove the non-zero elements below the diagonal. We do this one column at a time, starting with column k := 0 and proceeding left to right. Let S be the set containing k itself, as well as all of the vertices j such that j > k and P jk = 1. That is, S contains the diagonal element and all of the rows which contain 1s below the diagonal. For example, in the following parity map, S = {0, 2, 7}: These vertices are not adjacent in the graph, hence when we compute the Steiner tree T containing S, we get some extra vertices, corresponding to rows that have 0s below the diagonal: (1)
These extra vertices which need to be added to get a spanning tree are sometimes called Steiner points.
We consider the numbers in boxes above as decorating the corresponding vertices of the Steiner tree T . Initially there some 0s in the Steiner tree corresponding to Steiner points (and possibly the diagonal element), so we first 'fill' the Steiner tree. That is, we add a row with a 1 to any neighbouring row in T with a 0. Since the tree is connected, after finitely many iterations this will propagate 1s into every location in T : 
After that, we can 'empty' the Steiner tree by setting every location except for the diagonal to zero. We do this by regarding the diagonal as the root of the tree. For each leaf v in T with parent w, perform the row operation R v := R v + R w , then remove v from T . This terminates when there is only one vertex left in T , the root. 
Since we only perform row operations along edges of T , which are a subset of the edges of G, the corresponding CNOTs in the circuit we synthesise will only be between neighbouring qubits. For example, the six row operations above (read from right to left) yield the following part of a CNOT circuit:
Note that in this phase, we have some freedom to choose which row operations to perform. Here we have taken a greedy strategy for maximising the number of row operations that can be done in parallel.
Having put the first column in upper triangular form, we delete the corresponding root vertice from G and then proceed to the next column, building up our CNOT circuit from right-to-left. Since we proceed in order along a Hamiltonian path, the graph never becomes disconnected and always has a Hamiltonian path. Hence it is always possible to find a Steiner tree for any combination of points. STEINER-DOWN terminates after we remove the last vertex from G with a matrix in upper triangular form.
The second stage of our algorithm, STEINER-UP, starts with the original graph G and a parity map P in upper triangular form and removes any 1s that are above the diagonal, yielding the reduced echelon form (which in our case is always an identity matrix). It works in almost the same way, except that some care must be taken not to destroy the existing upper triangular structure. To do this, we must always perform decreasing row operations. That is, we must perform operations of the form R j := R j + R i only when j < i. This is where Definition 3.2 comes in. Starting with the last column, let the set S consist of the diagonal element and the rows which contain 1s above the diagonal. We then compute a decreasing Steiner tree for S whose root is the diagonal element. This is always possible, because in the worse case we can just take the Hamiltonian path for T , but in general we can take shortcuts. For example: 1 1 1 1 0 0 0 0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 1 1 0 1 Here, we have S = {2, 4, 8} and T has vertices {2, 3, 4, 7, 8}. Note there is indeed a smaller Steiner tree which does not contain vertex 7, but in that case 4 would need to be a child of 3, hence it is not a decreasing Steiner tree. Once we have a decreasing Steiner tree, we can 'fill' the tree with 1s by decreasing row operations. This is always possible since at this stage, the root always contains a 1 and all edges from parents to children are decreasing. We can then 'empty' the tree again just as we did in STEINER-DOWN, noting that every row operation at this stage is applied from a parent to its child (and hence is decreasing). The resulting row operations in the above example are thus: 
We can therefore prepend the following section to our CNOT circuit:
Once a column is done, we can delete it from G and take the next highest column. Again, since we traverse along a Hamiltonian path (backwards this time), the graph G never becomes disconnected and always has a Hamiltonian path. STEINER-UP then terminates with P equal to the identity matrix. The corresponding CNOT circuit then implements the original parity map using only nearest-neighbour CNOT gates.
Extensions
Mapping to general graphs
We can extend to graphs that do not contain a Hamiltonian path by using a recursive version of the algorithm described in Section 3.2. For this, we will define parametrised versions, STEINER-DOWN(k,V ) and STEINER-UP (k,V,W ), of the procedures from Section 3.2. In both cases, the procedures now take a parameter k which gives the current vertex/column on which to perform the downward or upward Gaussian elimination, respectively. They are also given a set V ⊆ V G over which all operations are restricted. Hence, rather than progressively deleting vertices from G, we start with V := V G and progressively remove vertices from V . The second procedure, STEINER-UP, also has a parameter W ⊆ V where the Steiner tree and row operations are allowed to be non-descending. That is, all edges in the rooted Steiner tree T must be descending unless both vertices adjacent to that edge are in W . Similarly, we require that row operations should be descending unless the rows corresponding to both vertices are in W .
Let G be an undirected graph and R a spanning tree. Fix some leaf of R and number the vertices consecutively by depth-first traversal (DFT), ensuring that each vertex is labelled after its children (i.e. post-ordering). We then choose vertex 0 as the root of R. Such a numbering has the property that removing vertices in ascending order never results in a disconnected graph. Here are some examples: Note that, since we use a post-ordering, the starting point of the DFT will not end up as the root of R. In the graphs above, the starting points for the DFT end up labelled 10 and 20, respectively, whereas the root of R is always taken to be 0. For a parity map P whose rows are labelled by the vertices of G, the recursive version of the previous procedure, called STEINER-GAUSS-REC, is defined as follows:
2. If R is empty, we are done. Otherwise pick the maximal vertex k ∈ R and maximal leaf k ′ ∈ R.
3. Let W be the set of vertices in the shortest path from k ′ to k (inclusive). Apply STEINER-UP (k ′ ,V R ,W ).
4. Apply STEINER-GAUSS-REC on the subgraph of G restricted to W .
5. Remove k ′ from R and go to 2.
Note that, when the spanning tree R is actually a Hamiltonian path, it will always be the case that k = k ′ , hence the recursive part is trivial and it reduces to the algorithm described in Section 3.2. If k is not a leaf, note that, because of our choice of numbering, the set W is maximal in V R with respect to ≤. This corresponds to a block matrix B in the bottom-right corner of the region of P which is not yet 'finished'. That is, P decomposes as:
Note there are zeroes to the left of B because step 1 made P upper-triangular.
Step 3 makes everything above P k ′ ,k ′ into 0, but it might mess up the upper-triangular structure of B itself, since we allow non-decreasing row operations between vertices in W . Hence, the recursive call in step 4 fixes B up afterwards. At this point, the k ′ -th row and column only contain a single 1 and since it is a leaf, k ′ will not be needed for another Steiner tree. Hence it is 'finished', so we can remove it from R. Note that, in successive iterations of this procedure, B might not necessarily consist of consecutive rows/columns, but it will always contain all of the maximal 'unfinished' rows/columns in R. Hence, the reasoning is identical.
Since each iteration of steps 2-5 remove a vertex from R and the recursive call in step 4 is always restricted to a proper subgraph of the current graph, the procedure terminates when R is empty and P is in reduced echelon form.
Mapping from general circuits
We can also extend the technique we proposed for CNOT circuits to more general families of circuits: namely CNOT+Rz circuits and even generic Clifford+Rz circuits. We first explain the simpler case, as it does not require going outside of the circuit model. By Rz gates, we mean a Z-phase rotation by a generic phase α:
Certain special cases have common names in quantum circuit literature, e.g.
]. As explained in e.g. [1] , circuits consisting of CNOT gates and arbitrary Z-phase rotations can be described efficiently in two parts: a GF(2)-linear map describing the action of the circuit on basis states and a set of pairs of the form (α, v) where α ∈ [0, 2π) is an angle and v is a vector in GF(2) describing the parity of input states on which that angle is applied.
First, note that it is straightforward to efficiently calculate the behaviour of CNOT+Rz circuits on computational basis states. One simply labels the input wires by variables (x 0 , . . . , x n−1 ) and propagates these labels from left-to-right using the rule that R Z [α] does not change the labels, whereas CNOT sends labels (x, y) on its inputs to labels (x, x ⊕ y) on its outputs:
We can then read off the output basis state from the final labels on the wires and the parities associated with each phase from the wire that the phase gate is on. For example, the circuit above acts as follows on computational basis states:
for all x i ∈ {0, 1}. The expression φ in e iφ above dictates how the phase depends on the input basis state, and is sometimes referred to as a phase polynomial. Clearly the relevant data for the overall unitary is the (GF(2)-linear) action on basis states as well as this phase polynomial. We can then represent a parity of input variables as a bitstring, e.g. x 0 ⊕ x 2 ⊕ x 3 is the bitstring (1, 0, 1, 1) , indicating this parity depends on the first, third, and forth input variables. Hence data associated with the map (2) is a parity matrix P action on basis states and a set of angle/vector pairs P giving the terms of the phase polynomial: Much like the CNOT case, there is an evident circuit extraction procedure based on Gaussian elimination for CNOT+Rz maps. One can use CNOT gates to perform primitive row operations on sets of parity vectors to reduce them to unit vectors, which then correspond to applying R Z [α] gates on single qubits. If this Gaussian elimination is performed using the STEINER-GAUSS procedure, these primitive row operations, and hence the synthesised circuit, will respect nearest-neighbour constraints, just as in the CNOT case.
A very similar procedure based on Gaussian elimination is described in a recent article by one of the authors [8] as a means of extracting a quantum circuit from a simplified tensor-network-like representation called a ZX-diagram. The extraction phase proceeds from right-to-left on a directed acyclic graph, and exploits the fact that post-composing CNOT gates is able to perform primitive row operations on the adjacency matrix of the graph:
. . . This rule follows from a graph-theoretic transformation on ZX-diagrams called pivoting. While the details of how this actually works are not relevant here, an important observation is that this procedure makes use of Gaussian elimination to produce CNOT gates, so substituting the STEINER-GAUSS algorithm immediately gives a circuit extraction procedure that respects nearest-neighbour constraints of the architecture. Since the technique described in [8] , and the corresponding PyZX circuit optimisation tool [15] , take an arbitrary quantum circuit as input, this will give a general-purpose, optimising routine for circuit mapping.
We leave a detailed exposition (and implementation) of these techniques for future work.
