A polynomial size model with implicit SWAP gate counting for exact qubit
  reordering by Mulderij, Jesse et al.
arXiv:2009.08748v1 [quant-ph] 18 Sep 2020
J. Mulderij∗† K.I. Aardal∗ I. Chiscop† F. Phillipson†
email: j.mulderij@tudelft.nl
Abstract
Due to the physics behind quantum computing, quantum circuit designers must adhere to the
constraints posed by the limited interaction distance of qubits. Existing circuits therefore need
to be modified via the insertion of SWAP gates, which alter the qubit order by interchanging the
locations of two qubits’ quantum states. We consider the Nearest Neighbor Compliance problem
on a linear array, where the number of required SWAP gates is to be minimized. We introduce
an Integer Linear Programming model of the problem whose size scales polynomially in the
number of qubits and gates. Furthermore, we solve 131 benchmark instances to optimality using
the commercial solver CPLEX. The benchmark instances are substantially larger than those
previously evaluated with exact methods. The largest circuits contain up to 18 qubits or over
100 quantum gates. This formulation also seems to be suitable for developing heuristic methods
since (near) optimal solutions are discovered quickly in the search process.
1 Introduction
The rules that govern physical interactions in a quantum setting allow quantum computing to provide
algorithms with a better complexity scaling than their classical counterparts for many naturally arising
problems. Exploiting the properties of phenomena such as superposition and entanglement, one can
search in a database [19], factor integers [46] or estimate a phase [39] more efficiently than previously
possible.
The many advantages of quantum computing come at the price of physical limitations in circuit design.
First, relevant coherence times (what is relevant depends on the technology) indicate that information
on qubits is perturbed or even lost after some time due to a qubit’s interaction with its environment
[12]. It is therefore, for a fixed number of qubits, desirable to do calculations with as few gates as
possible. A second limitation is induced by nearest neighbor constraints, where 2-qubit quantum gates
can only be used when the qubits are physically adjacent. The nearest neighbor constraints have
been considered in proposals for a range of potential technological realizations of quantum computers
such as ion traps [4, 32, 38], nitrogen-vacancy centers in diamonds [38, 55], quantum dots emitting
linear cluster states linked by linear optics [10, 21], laser manipulated quantum dots in a cavity [26]
and superconducting qubits [13, 40, 34]. They are also considered in realizations of specific types
of circuits and architectures, such as surface codes [49], Shor’s algorithm [15], the Quantum Fourier
Transform (QFT) [48], circuits for modular multiplication and exponentiation [35], quantum adders
on the 2D NTC architecture [8], factoring [42], fault-tolerant circuits [33], error correction [16], and
more recently, IBM QX architectures [50, 56, 57, 14].
∗Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The
Netherlands
†Cyber Security & Robustness Department, TNO, The Hague, The Netherlands
Up to now, the design of quantum circuits consists of manual work in elementary cases and for specific
circuits. As the complexity of the algorithms increases, however, manual synthesis will no longer
be feasible. When constructing a circuit from scratch, using only the set of elementary gates, even
without considering nearest neighbor constraints, one is solving specific instances of the PSPACE-
complete Minimum Generator Sequence problem [25], where the group consists of all unitary matrices
and the elementary gate operations form the set of generators. Here one tries to find the shortest
sequence of generators to map an input to a given output. A lot of work was done in this area using
boolean satisfiability [18], template matching [43, 36] and methods for reversible circuits [53, 3] as all
quantum gates perform unitary operations [39]. Other methods consider already designed circuits that
do not comply with nearest neighbor constraints. In these approaches, SWAP gates, which swap the
information of two adjacent qubits, are inserted into the circuit. The goal herein is to minimize the
number of required SWAP gates to make the whole circuit compliant. Within this branch of research
there are two approaches to the topic, global and local reordering. Global reordering determines the
initial layout of the qubits such that there are as few SWAP gates as possible required in the remainder
of the circuit. In order to elude the micromanagement that local reordering is concerned with, the
global reordering problem is generally approximated with the NP-complete [17] problem of Optimal
Linear Arrangement (OLA) on the interaction graph of the circuit with edge weights taking the Nearest
Neighbor Cost [31]. Here the gate sequence is either disregarded [45] or encoded in the weights [30].
The local reordering problem allows for any change in the qubit order before each gate, resulting in
a vast feasible region, even for small instances. The more general problem of SWAP minimization
where qubits are placed on a coupling graph (2 qubits can share a gate if their corresponding nodes
share an edge) is shown to be NP-Complete [47] via a reduction from the NP-complete token swapping
problem [27, 6]. The problem we consider, where the graph is a simple path, is widely believed to
be NP-complete (as conjectured in [22]) but, to the best of the authors’ knowledge, no formal proof has
been given yet.
Four research areas are distinguished in [23], each corresponding to either local or global reordering
and to either a single quantum computer or a network thereof. In [23], the focus lies on networks of
quantum computers, relating to the field of distributed quantum computing. This work proposes a
new model for the local reordering problem on a single quantum computer. Many heuristics have been
developed in this area of research, including receding horizon [29, 44, 52, 22], greedy [22, 1], harmony
search [1] and OLA on parts of the circuit [41]. Only a few works have dared to approach the problem
with exact methods, all of which embody an explicit factorial scaling in the number of variables or
processed nodes either through the use of the adjacent transposition graph [37], exhaustive searches
[11, 22] or explicit cost enumeration for each permutation [54]. The exact approaches have delivered
small benchmark instances to compare the heuristics’ results to. The size of these benchmark instances
typically does not exceed circuits of about 5 qubits and 16 gates due to the vast scaling of the number
of variables in the optimization model.
In this work we will provide an exact Integer Linear Programming (ILP) formulation of the Nearest
Neighbor Compliance (NNC) problem that does not entail a factorial scaling in the number of
qubits, by implicitly counting the number of required SWAP gates at each reordering step. The
power of the commercial optimization solver CPLEX [9] is used to optimally solve the problem for
123 instances from the RevLib library [51] and 8 QFT circuits. The considered benchmark instances
include the largest circuits to be exactly solved up to this point. They include the QFT for 10 qubits
and even a circuit with 18 qubits. The evaluation of the bigger benchmark instances finally allows for
heuristics to be compared to exact solutions on larger circuits.
The remainder of this paper is structured as follows. In Sec. 2 we introduce basic concepts of quantum
computing. In Sec. 3 the problem of NNC is formulated. Next, in Sec. 4, the proposed mathematical
model is introduced. The results are presented and discussed in Sec. 5. Finally, conclusions are drawn
in Sec. 6.
Figure 1: A CNOT gate. Qubit q1 is the control qubit and q2 is the target qubit; the outputs are q1 and q2 ⊕ q1.
2 Background
In this section we will first introduce some basic concepts of quantum computing. A more detailed
explanation can be found in [39]. Then, a description of decomposing multi-qubit gates is given.
2.1 Building blocks of QC
The quantum version of the classical basic unit of computation, the bit, is the quantum bit (qubit).
The qubit has the special property that it does not have to take value 0 or 1, but it can be in a
superposition of the computational basis states $|0\rangle \equiv [1, 0]^T$ and $|1\rangle \equiv [0, 1]^T$. The state of a qubit $|\varphi\rangle$
is denoted by a vector in $\mathbb{C}^2$ where in general we write
$$|\varphi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{1}$$
where $\alpha, \beta \in \mathbb{C}$. When information about the state’s value is extracted by means of measurement,
the state collapses to a single value. If, for example, the measurement is done in the standard basis,
one would obtain $|0\rangle$ with probability $|\alpha|^2$ and $|1\rangle$ with probability $|\beta|^2$. Necessarily, $|\alpha|^2 + |\beta|^2 = 1$.
In $n$-qubit systems, the combined state is the tensor product of individual states, which is an element
of $\mathbb{C}^{2^n}$. Calculations are done by executing quantum circuits, which consist of a set of qubits and a
list of quantum gates. The initial qubit states are the input of the calculation. The gates operate, in
order, on specified qubits. Afterwards, a measurement is performed on one or more of the qubits to
determine the probabilistic outcome of the calculation. Quantum gates are inherently reversible and
are denoted by linear operators in the form of invertible matrices. Their action on the combined qubit
state is simply the matrix vector product.
Below we will introduce some of the most common quantum gates, starting with the controlled NOT
(CNOT) gate, see Fig. 1.
The CNOT gate is one of the most commonly used gates. It is also used to construct the
SWAP gate by placing three CNOT gates consecutively, as in Fig. 2.
Figure 2: A decomposed and composite SWAP gate. The operations are equivalent: the gates
interchange the states of two qubits.
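The equivalence in Fig. 2 can be checked on the computational basis states with a few lines of Python. This is only an illustrative sketch: the helper names `cnot` and `swap_via_cnots` are hypothetical, and the wires are indexed from 0. Checking basis states suffices because both circuits act linearly on the state.

```python
from itertools import product

def cnot(bits, control, target):
    """Apply a CNOT to a classical basis state: target becomes target XOR control."""
    out = list(bits)
    out[target] ^= out[control]
    return tuple(out)

def swap_via_cnots(bits, a, b):
    """Three consecutive CNOTs with alternating control and target wires."""
    bits = cnot(bits, a, b)
    bits = cnot(bits, b, a)
    bits = cnot(bits, a, b)
    return bits

# On every basis state the three CNOTs interchange the two bit values.
for q1, q2 in product((0, 1), repeat=2):
    assert swap_via_cnots((q1, q2), 0, 1) == (q2, q1)
```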
The SWAP gate will be the main tool used to overcome the physical constraints that limit quantum
circuit design. We will, for the remainder of this text, not make a distinction between interchanging
two qubits and interchanging the quantum states of two qubits.
2.2 Decomposing multi-qubit gates
Many circuits make use of composite gates that resemble entire circuits themselves. They are often
performed on more than two qubits at once. Take, for example, the quantum Fourier transform (QFT),
which can act on any number of qubits. In order to describe what it means for a gate to act on adjacent
qubits, it only makes sense to consider 2-qubit gates. To achieve this without losing the meaning of
the circuit, we have to modify the circuit in the following two cases:
1. Gates that only act on a single qubit are ignored for the rest of this research. These gates are of
no interest in this context.
2. Gates that act on more than two qubits are decomposed into 2-qubit gates. The fact that this
is always possible can be found in [39].
The second point can be implemented in a great variety of ways and doing this “optimally” is outside
the scope of this work. We therefore make two straightforward design choices: 1) We only consider
circuits using multiple-control Toffoli gates, Peres gates and multiple-control Fredkin gates up to a
certain size; 2) We always decompose a given circuit in the same way. There is clearly room for
improvement here, but the search space we consider is large enough as it is. We ignore all single-
qubit gates during the modification to a nearest neighbor compliant circuit. The normal Toffoli gate’s
decomposition, with two control qubits can be found in [5] in the section “Three-Bit Networks”. In
the same work, the decomposition of a 3-control Toffoli gate is shown in the section “n-Bit Networks”.
The decomposition of the 4-control Toffoli is the direct extension of the previous decompositions.
The Peres gate is decomposed as in the circuit “peres_8.real” from RevLib [51]. The Fredkin gate is
decomposed as in the circuit “fredkin_5.real”, also from RevLib. The two-qubit controlled Fredkin
gate is decomposed into a controlled-NOT gate, a Toffoli gate and another controlled-NOT gate as
shown by [3] in Fig 2.4c. Larger composed gates do not make an appearance in the circuits that are
considered in this work.
Figure 3: The decomposition of a Toffoli gate with 3 control qubits and one target qubit into only
2-qubit gates. Here we have $V^4 = X$, where $X$ is the usual Pauli-X gate.
3 Problem Definition
In this section, some basic definitions will be introduced in order to formalize the NNC problem.
For the NNC problem, the actual operation corresponding to a gate that is being used has no influence
on the problem. Only the qubits on which the gate acts matter. Some definitions are introduced
below. Denote the set Q of n qubits as the set of integers Q = {1, . . . , n}. Since all the qubits have
one physical location in a one dimensional array, the locations are numbered as L = (1, . . . , n) and are
in a fixed order. To keep track of the location of each qubit before every gate, the notion of a qubit
order will be introduced.
Definition 3.1. Let $S_n$ be the permutation group and $[n]$ the vector $(1, \ldots, n)$. Then a qubit order
is a permutation denoted by the vector $\tau([n])$ with $\tau \in S_n$, which maps the qubits to locations. We
call $\tau^t$ the qubit order before gate $t$.
Now that the qubit orders are defined, one needs a way of altering such an order. This is done via the
previously mentioned SWAP gates.
Definition 3.2. A SWAP gate is an adjacent transposition $\tau \in S_n$ that permutes a qubit order,
$\tau \circ (q_1, \ldots, q_i, q_{i+1}, \ldots, q_n) = (q_1, \ldots, q_{i+1}, q_i, \ldots, q_n)$, by interchanging the positions of two adjacent
qubits.
The number of SWAP gates that one minimally requires to “move” from one qubit order to another is
inherently equal to the Kendall tau distance between the corresponding permutations.
Definition 3.3. Given two permutations $\tau_1, \tau_2 \in S_n$ for some fixed $n$, the Kendall tau distance
between $\tau_1$ and $\tau_2$ is defined as
$$I(\tau_1, \tau_2) \equiv |\{(i, j) \mid 1 \leq i, j \leq n,\ \tau_1(i) < \tau_1(j),\ \tau_2(i) > \tau_2(j)\}|. \tag{2}$$
This metric counts the number of inversions between two orderings of the same items, i.e., the pairs
whose relative order differs. The number of adjacent transpositions required to sort an array is equal
to the number of inversions in that array, which is why the metric gives the minimum number of SWAP
gates. The nearest neighbor interaction constraints can only be formulated once the concept of
quantum gates has been properly introduced in this setting.
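As a small illustration of Definition 3.3, the distance can be computed by comparing the relative order of every pair. The sketch below assumes that a qubit order is stored as a tuple whose entry `i` is the location of qubit `i`; the function name `kendall_tau` is ours. Each discordant unordered pair contributes exactly one ordered pair to the set in (2), so counting unordered pairs gives the same number.

```python
from itertools import combinations

def kendall_tau(tau1, tau2):
    """Count the pairs (i, j) that tau1 and tau2 place in opposite relative order.

    tau1[i] and tau2[i] are the locations assigned to item i.
    """
    return sum(
        1
        for i, j in combinations(range(len(tau1)), 2)
        if (tau1[i] < tau1[j]) != (tau2[i] < tau2[j])
    )

print(kendall_tau((1, 2, 3), (2, 1, 3)))  # one adjacent pair exchanged
print(kendall_tau((1, 2, 3), (3, 2, 1)))  # full reversal of three items
```

The two calls print 1 and 3: a single exchange of neighbours costs one SWAP, and reversing three qubits inverts all three pairs.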
Definition 3.4. Let $q_i, q_j \in Q$ be two qubits such that $i \neq j$. Let $g_{ij}$ be an unordered pair $g = \{q_i, q_j\}$.
Then we say that $g_{ij}$ is a quantum gate, or simply a gate, that acts on qubits $q_i$ and $q_j$. When the
specific qubits do not matter in the context, the subscripts may be omitted. When multiple gates are
present and their order is important, this will be reflected with a superscript as $g^t$.
Please note that this definition only allows for quantum gates that act on pairs of qubits. If a gate (in
the more general sense) acts on more qubits, we assume it to be decomposed, whilst if it only works
on one qubit, the gate can be ignored.
To describe an entire quantum circuit, multiple gates are needed and their order is important. To this
end, a gate sequence is introduced.
Definition 3.5. Let $g^1, \ldots, g^m$ be $m$ gates. Let $G$ be the finite sequence of gates $G = (g^1, \ldots, g^m)$,
then we say $G$ is a gate sequence of size $m$.
We also assume the gate sequence to be given and fixed. Allowing changes in the gate order when
some commutative rules are satisfied, as was done in [37, 24, 20], is beyond the scope of this work.
Now we can introduce the concept of a quantum circuit more formally.
Definition 3.6. Let Q be the set of qubits and G be a gate sequence. Let QC be a tuple of the set
of qubits and the gate sequence QC = (Q,G). Then we say that QC is a quantum circuit.
At the core of the problem are the nearest neighbor (NN) constraints. Formalizing these requires a
number of the above definitions. These constraints are what make the problem difficult.
Definition 3.7. Given are a gate $g^t_{ij}$ and a qubit order $\tau^t$ before that gate. We say that the gate
complies with the NN constraints if $|\tau^t(i) - \tau^t(j)| = 1$, i.e., if the qubits on which the gate acts are
adjacent in the qubit order. If, given a qubit order for each gate, all the gates in a quantum circuit’s
gate sequence comply with the NN constraints, we say that the quantum circuit complies with the NN
constraints.
Now that all these concepts have been formalized, we can continue with defining the problem of NNC.
Nearest Neighbor Compliance Problem
Input: A quantum circuit $QC = (Q, G)$ with $|Q| = n$ qubits and $|G| = m$ gates and an integer
$k \in \mathbb{Z}_{\geq 0}$.
Question: Do there exist qubit orders $\tau^t$, $t \in [m]$, one before each gate of $QC$, such that
the sum of the Kendall tau distances between consecutive qubit orders satisfies
$\sum_{t=1}^{m-1} I(\tau^t, \tau^{t+1}) \leq k$ and such that the quantum circuit complies with the NN
constraints?
In the minimization version of the problem, which we model in the next section, we seek to find the
smallest integer $k$ for which the decision problem above is still answered affirmatively. Considering the problem in this
way, we do not require the qubits to end up in the same qubit order as they started out in. We also
do not allow for changes in the gate order and do not optimize over different ways of decomposing
multi-qubit quantum gates. The objective function in the minimization problem simply counts the
number of required SWAP gates.
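To make the minimization problem concrete, the following sketch solves tiny instances by dynamic programming over all $n!$ qubit orders. It is not the model proposed in this paper; it is the kind of explicit enumeration with factorial scaling that the ILP is designed to avoid, and it is only usable for a handful of qubits. The function names and the 0-based qubit labels are assumptions made for the example.

```python
from itertools import combinations, permutations

def kendall_tau(loc_a, loc_b):
    """Number of qubit pairs whose relative order differs between two orders."""
    return sum((loc_a[i] < loc_a[j]) != (loc_b[i] < loc_b[j])
               for i, j in combinations(range(len(loc_a)), 2))

def min_swaps_bruteforce(n, gates):
    """Minimum total SWAP count on a linear array, by explicit enumeration.

    n     -- number of qubits, labelled 0 .. n-1 (labelling is an assumption)
    gates -- list of pairs (i, j) in circuit order
    An order is a tuple loc with loc[q] = location of qubit q.
    """
    if not gates:
        return 0
    orders = list(permutations(range(n)))

    def compliant(loc, gate):
        i, j = gate
        return abs(loc[i] - loc[j]) == 1

    # best[loc] = fewest SWAPs so far if loc is the order before the current gate
    best = {loc: 0 for loc in orders if compliant(loc, gates[0])}
    for gate in gates[1:]:
        best = {
            loc: min(cost + kendall_tau(prev, loc) for prev, cost in best.items())
            for loc in orders
            if compliant(loc, gate)
        }
    return min(best.values())

# Three gates acting on every pair of three qubits.
print(min_swaps_bruteforce(3, [(0, 1), (0, 2), (1, 2)]))
```

The example prints 1: three qubits on a line have only two adjacent pairs, so at least one SWAP is needed, and one SWAP suffices. The work per gate grows with $(n!)^2$, which illustrates why explicit approaches stall at around five qubits.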
Note that calculating the Kendall tau distance between two permutations can be naively done in $O(n^2)$
time, following the steps of the bubble sort algorithm [28]. A faster computation of the distance, in
$O(n\sqrt{\log n})$ time, can be found in [7].
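The bubble sort view also yields an actual SWAP sequence, not just its length. The sketch below is the naive $O(n^2)$ procedure, not the faster method of [7]. It assumes that an arrangement lists the qubits by location, and it returns each SWAP as the index `p` of the left location of the swapped pair; the name `swap_sequence` is hypothetical.

```python
def swap_sequence(start, target):
    """Adjacent swaps that turn arrangement `start` into arrangement `target`.

    start[p] is the qubit at location p. Each returned index p means
    "swap the qubits at locations p and p + 1".
    """
    rank = {q: p for p, q in enumerate(target)}
    arr = [rank[q] for q in start]  # sorting arr is the same as reaching target
    swaps = []
    for end in range(len(arr) - 1, 0, -1):
        for p in range(end):
            if arr[p] > arr[p + 1]:  # an inversion between neighbours
                arr[p], arr[p + 1] = arr[p + 1], arr[p]
                swaps.append(p)
    return swaps

order = [1, 2, 3]
for p in swap_sequence([1, 2, 3], [3, 2, 1]):
    order[p], order[p + 1] = order[p + 1], order[p]
print(order)
```

Every swap removes exactly one inversion, so the length of the returned list equals the Kendall tau distance between the two arrangements. Replaying the swaps in the example reverses the three qubits and prints `[3, 2, 1]`.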
We will however not be concerned with explicitly listing the Kendall tau distances for all n! permu-
tations. In order to avoid the listing, the metric should be implicitly calculated in the model. The
objective function, variables and constraints that allow us to do so, will be introduced in the next
section.
4 Mathematical Model
In this section the proposed ILP formulation of the NNC minimization problem will be discussed in
detail. First, the variables and constraints are presented and explained. Finally, the complete model
is given, along with a linearization of the constraints.
Given a quantum circuit $QC = (Q, G)$, we introduce integer variables $x^t_i \in L = \{1, \ldots, n\}$ for the
location of each qubit $i \in Q$ before each gate $g^t \in G$. Since the goal is to avoid the explicit $n!$ scaling
in the number of variables and constraints, we make use of the Kendall tau metric to count the number
of required SWAP gates when going from one qubit order $\tau^t$ to the next $\tau^{t+1}$. To accomplish this,
keeping track of the pairwise order of the qubits is essential. We introduce binary variables to do
precisely this,
$$y^t_{ij} = \begin{cases} 1 & \text{if location } x^t_i \text{ is after location } x^t_j \text{ in qubit order } \tau^t, \\ 0 & \text{else.} \end{cases} \tag{3}$$
The y-variables are only defined for i < j, so that every pair of qubits is only compared once. Keeping
track of changes in the y-variables when moving from one qubit order to the next allows us to count
the number of SWAP gates needed. The x- and y-variables are related through the following big-M
type constraints,
$$x^t_i - x^t_j \leq M y^t_{ij} - 1 \quad \forall i, j \in Q,\ i < j,\ t \in [m] \tag{4}$$
$$x^t_j - x^t_i \leq M(1 - y^t_{ij}) - 1 \quad \forall i, j \in Q,\ i < j,\ t \in [m] \tag{5}$$
where $M$ is a big enough constant, $M = n + 1$ being sufficient in this case. Note that these constraints
also enforce two important features:
1. No two qubits can be at the same location at the same time.
2. The definition of the y variables is enforced by the constraints.
For fixed i, j and t, one of the two constraints is always trivially satisfied due to the large value of
M . The −1 term in the right-hand side even ensures that the location indices differ by at least one
from each other. This allows us, later on, to relax the x variables to be continuous without losing the
property that feasible solutions have integer x variables.
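The effect of the two big-M constraints on a single pair can be inspected by evaluating them directly. The sketch below is only an illustration with a hypothetical helper name; it takes (4) and (5) exactly as written, with $M = n + 1$, and lists the values of $y^t_{ij}$ that satisfy both for given locations.

```python
def feasible_y_values(xi, xj, n):
    """Values y in {0, 1} for which constraints (4) and (5) both hold, with M = n + 1."""
    M = n + 1
    return [
        y
        for y in (0, 1)
        if xi - xj <= M * y - 1            # constraint (4)
        and xj - xi <= M * (1 - y) - 1     # constraint (5)
    ]

print(feasible_y_values(1, 3, n=4))      # x_i < x_j
print(feasible_y_values(3, 1, n=4))      # x_i > x_j
print(feasible_y_values(2, 2, n=4))      # same location
print(feasible_y_values(1.5, 2.0, n=4))  # continuous values closer than 1
```

Evaluated this way, $x^t_i < x^t_j$ leaves only $y = 0$ and $x^t_i > x^t_j$ leaves only $y = 1$, so the four calls print `[0]`, `[1]`, `[]` and `[]`. The last two show that coinciding locations, and relaxed locations less than 1 apart, are infeasible for either value of $y$.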
To make sure that the result also complies with the NN constraints, the following constraints need to
be added:
$$-1 \leq x^t_i - x^t_j \leq 1 \quad \forall g^t_{ij} \in G \tag{6}$$
For each gate $g^t_{ij}$, the qubits $q_i$ and $q_j$ on which it acts are required to be in adjacent locations
in the qubit order that is assumed just before the gate.
The objective is to minimize the total number of absolute changes in the y variables,
$$\min \sum_{\substack{i,j \in Q \\ i<j}} \sum_{t \in [m-1]} |y^t_{ij} - y^{t+1}_{ij}|. \tag{7}$$
Notice that the objective function exactly computes the Kendall tau distance between every two
consecutive qubit orders. Currently, the objective function is not linear. Extra binary variables $k^t_{ij}$ are
introduced to linearize the objective function. These substitute $|y^t_{ij} - y^{t+1}_{ij}|$ in the objective function
and are constrained in the following manner
$$-k^t_{ij} \leq y^t_{ij} - y^{t+1}_{ij} \leq k^t_{ij} \quad \forall i, j \in Q,\ i < j,\ t \in [m-1] \tag{8}$$
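The claim that one term of the objective equals the minimum number of SWAP gates between two consecutive qubit orders can be checked independently for small $n$. The sketch below compares the number of changed pairwise indicators with a breadth-first search over adjacent transpositions. The function names are hypothetical and an order is stored as a tuple listing the qubits by location.

```python
from collections import deque
from itertools import combinations

def delta_y(order_a, order_b):
    """Sum over pairs i < j of |y_ij(a) - y_ij(b)| for two qubit orders.

    order[p] is the qubit at location p. With binary y, each term is 1
    exactly when the pair's relative order differs between a and b.
    """
    loc_a = {q: p for p, q in enumerate(order_a)}
    loc_b = {q: p for p, q in enumerate(order_b)}
    return sum((loc_a[i] > loc_a[j]) != (loc_b[i] > loc_b[j])
               for i, j in combinations(sorted(order_a), 2))

def swap_distances(start):
    """Breadth-first search: fewest adjacent swaps from `start` to every order."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for p in range(len(cur) - 1):
            nxt = list(cur)
            nxt[p], nxt[p + 1] = nxt[p + 1], nxt[p]
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

start = (0, 1, 2, 3)
dist = swap_distances(start)
# The indicator count agrees with the true shortest SWAP sequence for all 24 orders.
assert all(delta_y(start, order) == d for order, d in dist.items())
```

Because the minimization pushes each $k^t_{ij}$ down to the smallest value permitted by (8), which is $|y^t_{ij} - y^{t+1}_{ij}|$, the linearized objective takes the same value.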
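Now the k-variables can be substituted into Expression (7), which, together with the constraints, results
in the ILP model:
$$\begin{aligned}
\min \quad & \sum_{\substack{i,j \in Q \\ i<j}} \sum_{t \in [m-1]} k^t_{ij} \\
\text{subject to} \quad & (4), (5), (6), (8) \\
& x^t_i \in \{1, \ldots, n\} && \forall i \in Q,\ t \in [m] \\
& y^t_{ij} \in \{0, 1\} && \forall i, j \in Q,\ i < j,\ t \in [m] \\
& k^t_{ij} \in \{0, 1\} && \forall i, j \in Q,\ i < j,\ t \in [m-1]
\end{aligned} \tag{9}$$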
Simply counting the number of variables in this formulation gives
$$\#\,\text{variables} = n^2 m - \frac{n^2 - n}{2}, \tag{10}$$
and the number of constraints in the ILP is equal to
$$\#\,\text{constraints} = 2(n^2 - n)m - n^2 + n + 2m, \tag{11}$$
which is polynomial in the number of qubits and gates. In order to improve running times in practice,
it helps to relax variables to take continuous values. We state the following about this relaxation:
Proposition 4.1. Allowing the x- and k-variables to take continuous values does not change the
optimal value.
Proof. The x-variables must take values that are pairwise separated from each other by at least 1 due
to constraints (4), (5). There are $n$ variables that all have to take a value in the interval $[1, n]$,
all spaced at least 1 from each other. This can only be done if the x’s are all integer and
all integer values are taken. The k-variables are constrained by (8). Since the y-variables are binary,
their difference lies in $\{-1, 0, 1\}$, so (8) requires $k^t_{ij} \geq |y^t_{ij} - y^{t+1}_{ij}| \in \{0, 1\}$. Since we are minimizing over
the k-variables, each of them takes the smallest value allowed by the constraints,
which is integer.
Even though relaxing these variables does not impact the objective value of optimal solutions, it reduces
the number of integer-restricted variables, which improves the running time in practice. This stems
from the underlying fact that solving a general ILP to optimality is NP-hard, while a linear
program (LP) can be solved in polynomial time with interior point methods.
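The closed forms (10) and (11) can be reproduced by counting the index sets of model (9) directly. The sketch below assumes that every two-sided inequality in (6) and (8) is counted as two constraints, which is the convention that matches (11); the function names are hypothetical.

```python
def model_size(n, m):
    """Count variables and constraints of model (9) from its index sets."""
    pairs = n * (n - 1) // 2            # qubit pairs with i < j
    n_x = n * m                         # x variables: one per qubit and gate
    n_y = pairs * m                     # y variables: one per pair and gate
    n_k = pairs * (m - 1)               # k variables: one per pair and transition
    variables = n_x + n_y + n_k
    constraints = (
        2 * pairs * m                   # constraints (4) and (5)
        + 2 * m                         # two-sided constraint (6), one per gate
        + 2 * pairs * (m - 1)           # two-sided constraint (8)
    )
    return variables, constraints

def closed_form(n, m):
    """Expressions (10) and (11)."""
    return (n * n * m - (n * n - n) // 2,
            2 * (n * n - n) * m - n * n + n + 2 * m)

for n in range(2, 20):
    for m in range(1, 120):
        assert model_size(n, m) == closed_form(n, m)

print(model_size(18, 16))
```

For a circuit with 18 qubits and 16 gates this gives 5031 variables and 9518 constraints, which is tiny compared to the $18!$ qubit orders that an explicit formulation would have to enumerate.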
5 Experimental Results
In this section, the results of evaluating the proposed ILP model are presented. The time of finding
the optimal solution in the proposed ILP model is compared to the time required by the previous best
exact approaches. The attained objective value of a multitude of heuristic approaches is also compared
with the solution that our model provides.
5.1 Experimental setup
The mathematical model as described in the previous section has been implemented in Python and
solved with the commercial solver CPLEX 12.7 through the Python API. All but the quantum Fourier
transform instances, which were constructed following the circuit of [39], were obtained from the RevLib
[51] website. The evaluations were conducted using up to 16 threads of 2.4 GHz each, working with
16 GB of RAM. All instances were solved to optimality.
The benchmark instances are subdivided over three tables, according to the number of qubits addressed.
In the first column of each table, the name of the circuit is provided, and in the second column, n
denotes the number of qubits in the circuit. In the third column, |G| denotes the number of 2-qubit
gates present in the circuit after gate decomposition and the removal of single-qubit gates. The optimal
value of the local reordering problem, i.e., the minimum number of needed SWAP gates to make the
circuit nearest neighbor compliant, is provided in the fourth column. The column “Time” denotes the
run time in seconds. The column entitled “Time E” denotes the running time of other exact methods,
also in seconds. Exact running times with subscript a are from [54], subscript b from [37]. The objective
values of heuristic solutions are presented in the last column, denoted by “# SWAPS H”. Here the
subscript c indicates the results are from [30], subscript d from [44], subscript e from [2], subscript
f from [29] and subscript g from [52]. An asterisk as superscript indicates that for the other exact
solution methods, either the objective value differs, or the number of gates differs or they both differ.
For the heuristic results, the asterisk indicates that the number of gates differs or the objective value
of the heuristic is lower than that of the proposed exact method. These anomalies are believed to find
their roots in differing gate decomposition methods, resulting in slightly different instances.
5.2 Results
The running time required to solve the instance is heavily dependent on three factors:
1. The number of qubits in the quantum circuit,
2. The number of gates in the quantum circuit,
3. The minimal number of required SWAP gates.
The number of qubits and gates is expected to heavily influence the running time. The number of
qubits is the term that influences the run time the most. This is due to the fact that the number of
feasible solutions scales factorially in the number of qubits. Surprisingly, the run time also scales quite
badly with the number of required SWAP gates. During the Branch & Bound tree search, the upper
bound determined by CPLEX, which is the best feasible solution found up to that point, converges to
the optimal value (or close to it) rather quickly. The best known lower bound, however, takes a long
time to improve. When the number of required SWAP gates increases, the time needed to improve
the lower bound all the way to the optimal value increases as well. This phenomenon is analyzed for
two of the benchmark instances that require a lot of SWAP gates:
1. mod8-10_177 The search method found a feasible solution with an objective value within 10%
of the optimal value in $2.4 \cdot 10^6$ iterations, found an optimal solution in $4.0 \cdot 10^7$ iterations, and
proved optimality by a matching lower bound after $1.7 \cdot 10^8$ iterations.
2. decod24-enable_126 The search method found a feasible solution with an objective value
within 10% of the optimal value in $5.3 \cdot 10^6$ iterations, found an optimal solution in $1.0 \cdot 10^7$
iterations, and proved optimality by a matching lower bound after $5.8 \cdot 10^7$ iterations.
If a 10% optimality gap would suffice, less than 2% of the total number of iterations would
be needed in the first case, and less than 10% in the second case. This observation indicates that running
an incomplete Branch & Bound algorithm might be an interesting and easy-to-implement heuristic
algorithm.
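The fractions quoted above follow directly from the iteration counts reported for the two instances. A short check, using only the numbers given in the text (the dictionary layout and the helper name `share` are ours):

```python
# Iterations until a solution within 10% of the optimum was found,
# and iterations until optimality was proven, as reported above.
runs = {
    "mod8-10_177": (2.4e6, 1.7e8),
    "decod24-enable_126": (5.3e6, 5.8e7),
}

def share(name):
    """Fraction of the full run needed to get within 10% of the optimum."""
    within_gap, proven = runs[name]
    return within_gap / proven

for name in runs:
    print(f"{name}: {share(name):.1%} of the iterations")
```

This gives roughly 1.4% for mod8-10_177 and 9.1% for decod24-enable_126.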
The 131 evaluated benchmark instances are listed in the tables below. The improvement in compu-
tation time with respect to previous exact methods is significant. The results have been compared
to other exact and heuristic methods. There is no standard set of benchmark instances, so not every
benchmark instance has been evaluated with every method. Some methods solve a slightly different
problem, for example by allowing alterations of the quantum circuit as a preprocessing step.
This may result in slightly differing optimal solutions, which is indicated with an
asterisk in the tables. In the tables, the running time of solving our integer linear programming model
is compared to the running time of the other exact solution methods.
Other exact solution methods can only solve the smaller circuits, making it impossible to compare
their performance as the circuit size increases, apart from the binary statement that our method can
indeed solve the instance. The results show exact solutions that are obtained for much larger circuits
than previously thought possible. The largest instance with respect to the number of qubits has as many
as 18 qubits. Furthermore, for the first time, NNC has been solved to optimality for circuits with more
than 100 quantum gates.
We also compare our algorithm to existing heuristic solution methods. These are much faster than
our exact solution method, but do not guarantee optimal solutions to the problem. The comparison is
most interesting in Tab. 4, where the considered quantum circuits require a higher number of SWAP
gates to comply with the NN constraints. Here we see that the heuristic methods have an optimality
gap of 42.1% averaged over the comparable benchmark instances in Tab. 4.
Table 1: Benchmark instances with three or four qubits.
Benchmark n |G| # SWAPS Time Time E # SWAPS H
QFT_QFT3 3 3 1 0.02 - -
peres_10 3 4 1 0.14 0.1a -
peres_8 3 4 1 0.06 0.1a -
toffoli_2 3 5 1 0.12 0.2a -
toffoli_1 3 5 1 0.1 0.1a -
peres_9 3 6 1 0.02 2463a -
fredkin_7 3 7 1 0.16 - -
ex-1_166 3 7 2 0.08 0.1a -
fredkin_5 3 7 1 0.15 0.1a, 0.1∗b -
ham3_103 3 8 2 0.04 - -
miller_12 3 8 2 0.14 745.6a, 0.1b -
ham3_102 3 9 1 0.05 0.1∗a -
3_17_15 3 9 2 0.04 630.2a, 0.1∗b -
3_17_13 3 13 3 0.12 0.1∗a 4∗c, 4d, 3e, 6g
3_17_14 3 13 3 0.15 0.1∗a -
fredkin_6 3 15 3 0.06 4.6a -
miller_11 3 17 4 0.15 0.1∗a -
QFT_QFT4 4 6 3 0.17 - -
toffoli_double_3 4 7 1 0.11 0.9a, 0.1∗b -
rd32-v1_69 4 8 2 0.16 0.1a -
decod24-v1_42 4 8 2 0.12 7.7a, 0.1∗b -
rd32-v0_67 4 8 2 0.07 1.6a 2c, 2d
decod24-v2_44 4 8 3 0.07 0.1∗b -
decod24-v0_40 4 8 3 0.06 0.1∗b -
decod24-v3_46 4 9 3 0.09 0.1a, 0.1∗b 3c, 3d
toffoli_double_4 4 10 2 0.07 2002a -
rd32-v1_68 4 12 3 0.24 0.4∗a -
rd32-v0_66 4 12 0 0.09 0.4∗a -
decod24-v0_39 4 15 5 0.53 0.5a -
decod24-v2_43 4 16 5 0.23 0.1∗a -
decod24-v0_38 4 17 4 0.57 19.2a -
decod24-v1_41 4 21 7 0.5 - -
hwb4_52 4 23 8 0.97 - 9c, 10d, 9e, 9f
aj-e11_168 4 29 12 5.36 - -
4_49_17 4 30 12 6.1 - 12∗c , 12d, 16e
decod24-v3_45 4 32 13 6.25 - -
mod10_176 4 42 15 7.94 - -
aj-e11_165 4 44 18 9.36 - 36d, 33∗g
mod10_171 4 57 24 27.18 - -
4_49_16 4 59 22 24.23 - -
mini-alu_167 4 62 27 23.7 - -
hwb4_50 4 63 23 17.61 - -
hwb4_49 4 65 23 21.64 - -
hwb4_51 4 75 28 75.09 - -
6 Conclusion
In this paper we consider the local reordering scheme for nearest neighbor architectures of quantum
circuits. We propose a new mathematical model that counts the number of required SWAP gates
implicitly, by using specific properties of the constraints. The implicit counting improves upon previous
exact approaches in which costs were explicitly determined for each permutation, leading to a factorial
scaling of the model size, and therefore, a high running time. The presented innovations result in a
great improvement in the model size, such that the resulting ILP only contains $O(n^2 m)$ variables and
constraints.
The benchmark instances with available exact solutions known in the literature were no larger than
circuits with five qubits and no more than twenty gates, due to the excessive running times. The
proposed method can handle quantum circuits with five qubits and 112 gates or up to eighteen qubits
and sixteen gates. In total 131 benchmark instances are evaluated, most of which had not previously
been solved to optimality.
Because the implicit counting is based on counting inversions in permutations, the formulation is not
easily translated to the popular higher dimensional cases where qubits are placed on a 2D or 3D
grid. To the authors’ best knowledge there is no known polynomial time algorithm that, in 2- or 3-
dimensional grids, solves the subproblem of calculating the minimum number of required SWAP gates
when transforming one qubit order into another. Such a method could have great impact on exact
solution methods in the higher-dimensional setting.
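On a linear array, the subproblem mentioned above is easy: the minimum number of adjacent SWAPs needed to turn one qubit order into another equals the number of inversions of the corresponding permutation. The following is a minimal sketch of that subproblem only, not of the ILP itself; the function name `min_adjacent_swaps` and the list-based representation of a qubit order are assumptions made for illustration.

```python
def min_adjacent_swaps(source, target):
    """Minimum number of adjacent SWAPs turning `source` into `target` on a line.

    Both arguments are sequences containing the same distinct qubit labels.
    The result is the inversion count of the permutation that maps source
    positions to target positions, computed by merge sort in O(n log n).
    """
    position = {qubit: index for index, qubit in enumerate(target)}
    perm = [position[qubit] for qubit in source]

    def sort_and_count(values):
        if len(values) <= 1:
            return values, 0
        mid = len(values) // 2
        left, left_inv = sort_and_count(values[:mid])
        right, right_inv = sort_and_count(values[mid:])
        merged, inversions = [], left_inv + right_inv
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every remaining element of `left`,
                # so each of those pairs is one inversion.
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(perm)[1]


# Reversing four qubits requires all 4 * 3 / 2 = 6 pairs to cross.
print(min_adjacent_swaps([0, 1, 2, 3], [3, 2, 1, 0]))
```

No comparably simple counting argument is available on 2D or 3D grids, which is why the formulation does not carry over directly.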
Practical experience with the Branch & Bound tree search indicates that finding a (near) optimal
feasible solution does not consume the most computation time. This suggests that solving the ILP
heuristically, for example with a limit on running time or iteration count, could make for a good
heuristic solution method.
References
[1] AlFailakawi, M.G., Ahmad, I., Hamdan, S.: Harmony-search algorithm for 2d nearest neighbor
quantum circuits realization. Expert Syst. with Appl. 61, 16–27 (2016)
[2] AlFailakawi, M.G., AlTerkawi, L., Ahmad, I., Hamdan, S.: Line ordering of reversible circuits for
linear nearest neighbor realization. Quantum Inf. Process. 12(10), 3319–3339 (2013)
[3] Alhagi, N.: Synthesis of Reversible Functions Using Various Gate Libraries and Design Specifica-
tions. Tech. rep., Portland State University (2000)
[4] Amini, J.M., Uys, H., Wesenberg, J.H., Seidelin, S., Britton, J., Bollinger, J.J., Leibfried, D.,
Ospelkaus, C., VanDevender, A.P., Wineland, D.J.: Toward scalable ion traps for quantum infor-
mation processing. New J. Phys. 12(3), 033031 (2010)
[5] Barenco, A., Bennett, C.H., Cleve, R., DiVincenzo, D.P., Margolus, N., Shor, P., Sleator, T.,
Smolin, J., Weinfurter, H.: Elementary gates for quantum computation. Phys. Rev. A 52(5),
3457–3467 (1995)
[6] Bonnet, E., Miltzow, T., Rzążewski, P.: Complexity of Token Swapping and Its Variants.
Algorithmica 80(9), 2656–2682 (2018)
Table 2: Benchmark instances with five qubits
Benchmark n |G| # SWAPS Time Time E # SWAPS H
4mod5-v1_25 5 7 1 0.26 11705.3a -
4gt11_84 5 7 1 0.06 16.6a 1c, 1d, 1e
4gt11-v1_85 5 7 1 0.09 - -
4mod5-v0_20 5 8 2 0.08 45.5a -
4mod5-v1_22 5 9 1 0.08 548.8∗a -
QFT_QFT5 5 10 6 0.41 1.6a 7c, 6d
mod5d1_63 5 11 2 0.12 - -
4mod5-v0_19 5 12 3 0.84 55.3∗a -
4gt11_83 5 12 3 0.15 9∗a -
4mod5-v1_24 5 12 3 0.28 - -
mod5mils_65 5 12 4 0.26 - -
mod5mils_71 5 12 2 0.15 - -
alu-v2_33 5 13 4 0.45 - -
alu-v1_29 5 13 4 0.61 - -
alu-v0_27 5 13 4 0.48 - -
mod5d2_70 5 14 5 0.43 - -
alu-v3_35 5 14 5 0.38 - -
alu-v4_37 5 14 5 0.37 - -
alu-v1_28 5 14 4 0.26 - -
4gt13-v1_93 5 15 5 0.69 489.3∗a 7∗c , 6d, 4∗e
4gt13_92 5 15 6 0.53 - -
4gt11_82 5 16 6 0.89 - -
4mod5-v0_21 5 17 8 2.84 - -
rd32_272 5 18 7 0.94 - -
alu-v3_34 5 18 4 0.4 - -
mod5d2_64 5 19 6 1.81 - -
alu-v0_26 5 21 8 3.56 - -
4gt5_75 5 21 6 1.1 - 9∗c , 12d
4mod5-v0_18 5 23 8 3.35 - -
4mod5-v1_23 5 24 9 5.06 - 9c, 9d, 15e
one-two-three-v2_100 5 24 7 5.37 - -
one-two-three-v3_101 5 24 7 2.96 - -
rd32_271 5 26 11 7.37 - -
4gt5_77 5 28 10 6.2 - -
4gt5_76 5 29 10 5.45 - -
alu-v4_36 5 30 9 6.34 - 15∗c , 18d, 17e
4gt13_91 5 30 8 4.46 - -
4gt13_90 5 34 12 6.77 - -
4gt10-v1_81 5 34 13 12.38 - 18∗c , 20d, 16e, 24∗g
one-two-three-v1_99 5 36 15 17.27 - -
4gt4-v0_80 5 36 19 43.45 - 34d, 33f
4mod7-v0_94 5 38 12 12.83 - -
alu-v2_32 5 38 16 22.05 - -
4mod7-v0_95 5 38 14 14.59 - 19∗c , 21d, 22e
4mod7-v1_96 5 38 14 13.49 - -
Table 3: Benchmark instances with five qubits and more than 40 gates
Benchmark n |G| # SWAPS Time Time E # SWAPS H
one-two-three-v0_98 5 40 15 15.67 - -
4gt12-v0_88 5 41 20 34.01 - -
4gt12-v1_89 5 44 22 52.36 - 35d, 26e, 32f
sf_275 5 46 18 21.42 - -
4gt4-v0_79 5 49 22 80.16 - -
4gt4-v0_78 5 53 26 167.03 - -
4gt4-v0_72 5 53 24 49.7 - -
4gt12-v0_87 5 54 22 45.88 - -
4gt4-v1_74 5 57 29 84.87 - -
4gt12-v0_86 5 58 26 108.35 - -
mod8-10_178 5 68 37 389.47 - -
one-two-three-v0_97 5 71 32 76.8 - -
4gt4-v0_73 5 89 40 699.65 - -
mod8-10_177 5 93 48 3650.26 - 72d
alu-v2_31 5 100 49 2906.35 - -
hwb5_55 5 101 48 2264.0 - 59c, 63d, 60e, 66g
rd32_273 5 104 50 4631.7 - -
alu-v2_30 5 112 55 13558.87
Table 4: Benchmark instances with six or more qubits.
Benchmark n |G| # SWAPS Time Time E # SWAPS H
graycode6_47 6 5 0 0.02 - -
graycode6_48 6 5 0 0.02 - -
QFT_QFT6 6 15 11 7.43 - 11c, 12d
decod24-enable_124 6 21 5 1.86 - -
decod24-enable_125 6 21 5 1.83 - -
decod24-bdd_294 6 24 7 9.37 - -
mod5adder_129 6 71 34 534.38 - -
mod5adder_128 6 77 36 1103.51 - 45∗c , 51d, 46∗g
decod24-enable_126 6 86 37 1954.28 - -
xor5_254 7 5 3 0.61 - -
ex1_226 7 5 3 0.25 - -
QFT_QFT7 7 21 16 28.26 - 28c, 26d, 18g
4mod5-bdd_287 7 23 7 4.3 - -
alu-bdd_288 7 28 8 20.65 - -
ham7_106 7 49 28 495.43 - -
ham7_105 7 65 34 1613.33 - -
ham7_104 7 83 42 3238.82 - 56∗c
QFT_QFT8 8 28 23 334.6 - 32c, 33d, 31g
rd53_139 8 36 11 76.29 - -
rd53_138 8 44 11 100.86 - -
rd53_137 8 66 35 6271.11 - -
QFT_QFT9 9 36 30 1482.53 - 52c, 54d, 49g
QFT_QFT10 10 45 39 39594.99 - 64g
mini_alu_305 10 57 23 1711.75 - -
sys6-v0_144 10 62 19 887.71 - -
rd73_141 10 64 21 845.05 - -
parity_247 18 16 14 5762.29 - -
[7] Chan, T.M., Pătraşcu, M.: Counting Inversions, Offline Orthogonal Range Counting, and Related
Problems. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete
Algorithms, pp. 161–173. Society for Industrial and Applied Mathematics (2010)
[8] Choi, B.S., Van Meter, R.: An $\Theta(\sqrt{n})$-depth Quantum Adder on a 2d NTC Quantum
Computer Architecture. J. Emerg. Technol. Comput. Syst. 8(3), 1–22 (2012)
[9] Cplex, IBM ILOG: V12.1: User's Manual for CPLEX. International Business Machines
Corporation, 46(53) (2009)
[10] Devitt, S.J., Fowler, A.G., Stephens, A.M., Greentree, A.D., Hollenberg, L.C.L., Munro, W.J.,
Nemoto, K.: Architectural design for a topological cluster state quantum computer. New J. Phys.
11(8), 083032 (2009)
[11] Ding, J., Yamashita, S.: Exact Synthesis of Nearest Neighbor Compliant Quantum Circuits in 2d
architecture and its Application to Large-scale Circuits. IEEE Trans. on Comput.-Aided Des. of
Integr. Circuits and Syst. pp. 1–1 (2019)
[12] DiVincenzo, D.P., IBM: The Physical Implementation of Quantum Computation. Fortschr. der
Phys. 48(9-11), 771–783 (2000)
[13] DiVincenzo, D.P., Solgun, F.: Multi-qubit parity measurement in circuit quantum electrodynam-
ics. New J. Phys. 15(7), 075001 (2013)
[14] Dueck, G.W., Pathak, A., Rahman, M.M., Shukla, A., Banerjee, A.: Optimization of Circuits for
IBM’s five-qubit Quantum Computers. pp. 680–684 (2018)
[15] Fowler, A.G., Devitt, S.J., Hollenberg, L.C.L.: Implementation of Shor’s Algorithm on a Linear
Nearest Neighbour Qubit Array. arXiv:quant-ph/0402196 (2004)
[16] Fowler, A.G., Hill, C.D., Hollenberg, L.C.L.: Quantum Error Correction on Linear Nearest Neigh-
bor Qubit Arrays. Phys. Rev. A 69(4), 042314 (2004)
[17] Garey, M.R., Johnson, D.S.: Computers and Intractability; A Guide to the Theory of NP-
Completeness. W. H. Freeman & Co., New York, NY, USA (1979)
[18] Große, D., Wille, R., Dueck, G.W., Drechsler, R.: Exact Multiple-Control Toffoli Network
Synthesis With SAT Techniques. IEEE Trans. on Comput.-Aided Des. of Integr. Circuits and
Syst. 28(5), 703–715 (2009)
[19] Grover, L.K.: Quantum Mechanics Helps in Searching for a Needle in a Haystack. Phys. Rev.
Lett. 79(2), 325–328 (1997)
[20] Hattori, W., Yamashita, S.: Quantum Circuit Optimization by Changing the Gate Order for 2d
Nearest Neighbor Architectures. In: J. Kari, I. Ulidowski (eds.) Reversible Computation, Lecture
Notes in Computer Science, pp. 228–243. Springer International Publishing (2018)
[21] Herrera-Martí, D.A., Fowler, A.G., Jennings, D., Rudolph, T.: A Photonic Implementation for
the Topological Cluster State Quantum Computer. Phys. Rev. A 82(3), 032332 (2010)
[22] Hirata, Y., Nakanishi, M., Yamashita, S., Nakashima, Y.: An Efficient Conversion of Quantum
Circuits to a Linear Nearest Neighbor Architecture. Quantum Inf. and Comput. 11(1&2), 25
(2011)
[23] van Houte, R., Mulderij, J., Attema, T., Chiscop, I., Phillipson, F.: Mathematical formulation
of quantum circuit design problems in networks of quantum computers. Quantum Information
Processing, 19(5), 1-22 (2020).
[24] Itoko, T., Raymond, R., Imamichi, T., Matsuo, A., Cross, A.W.: Quantum circuit compilers using
gate commutation rules. In: Proceedings of the 24th Asia and South Pacific Design Automation
Conference on - ASPDAC ’19, pp. 191–196. ACM Press, Tokyo, Japan (2019)
[25] Jerrum, M.R.: The Complexity of Finding Minimum-Length Generator Sequences. Theor. Com-
put. Sci. 36, 25 (1985)
[26] Jones, N.C., Van Meter, R., Fowler, A.G., McMahon, P.L., Kim, J., Ladd, T.D., Yamamoto, Y.:
Layered Architecture for Quantum Computing. Phys. Rev. X 2(3), 031007 (2012)
[27] Kawahara, J., Saitoh, T., Yoshinaka, R.: The Time Complexity of the Token Swapping Problem
and Its Parallel Variants. In: S.H. Poon, M.S. Rahman, H.C. Yen (eds.) WALCOM: Algorithms
and Computation, Lecture Notes in Computer Science, pp. 448–459. Springer International Pub-
lishing (2017)
[28] Knuth, D.E.: The Art of Computer Programming, Volume 3: Sorting and Searching, Second
Edition, vol. 3, 2nd edn. (1974)
[29] Kole, A., Datta, K., Sengupta, I.: A Heuristic for Linear Nearest Neighbor Realization of Quantum
Circuits by SWAP Gate Insertion Using $N$-Gate Lookahead. IEEE J. on Emerg. and Sel. Top.
in Circuits and Syst. 6(1), 62–72 (2016)
[30] Kole, A., Datta, K., Sengupta, I.: A New Heuristic for $N$-Dimensional Nearest Neighbor
Realization of a Quantum Circuit. IEEE Trans. on Comput.-Aided Des. of Integr. Circuits and
Syst. 37(1), 182–192 (2018)
[31] Kole, A., Datta, K., Sengupta, I., Wille, R.: Towards a Cost Metric for Nearest Neighbor Con-
straints in Reversible Circuits. Rev. Comput. 9138, 273–278 (2015)
[32] Kumph, M., Brownnutt, M., Blatt, R.: Two-dimensional arrays of radio-frequency ion traps with
addressable interactions. New J. Phys. 13(7), 073043 (2011)
[33] Lin, C., Sur-Kolay, S., Jha, N.K.: PAQCS: Physical Design-Aware Fault-Tolerant Quantum Cir-
cuit Synthesis. IEEE Trans. on Very Large Scale Int. Syst. 23(7), 1221–1234 (2015)
[34] Linke, N.M., Maslov, D., Roetteler, M., Debnath, S., Figgatt, C., Landsman, K.A., Wright, K.,
Monroe, C.: Experimental comparison of two quantum computing architectures. Proc Natl Acad
Sci USA 114(13), 3305–3310 (2017)
[35] Markov, I.L., Saeedi, M.: Constant-Optimized Quantum Circuits for Modular Multiplication and
Exponentiation. arXiv:1202.6614 [quant-ph] (2012)
[36] Maslov, D., Young, C., Miller, D., Dueck, G.: Quantum Circuit Simplification Using Templates.
In: Design, Automation and Test in Europe, pp. 1208–1213. IEEE, Munich, Germany (2005)
[37] Matsuo, A., Yamashita, S.: Changing the Gate Order for Optimal LNN Conversion. In: A. De Vos,
R. Wille (eds.) Reversible Computation, Lecture Notes in Computer Science, pp. 89–101. Springer
Berlin Heidelberg (2012)
[38] Nickerson, N.H., Li, Y., Benjamin, S.C.: Topological quantum computing with a very noisy
network and local error rates approaching one percent. Nat. Commun. 4(1) (2013)
[39] Nielsen, M.A., Chuang, I., Grover, L.K.: Quantum Computation and Quantum Information. Am.
J. of Phys. 70(5), 558–559 (2002)
[40] Ohliger, M., Eisert, J.: Efficient measurement-based quantum computing with continuous-variable
systems. Phys. Rev. A 85(6), 062318 (2012)
[41] Pedram, M., Shafaei, A.: Layout Optimization for Quantum Circuits with Linear Nearest Neigh-
bor Architectures. IEEE Circuits and Syst. Mag. 16(2), 62–74 (2016)
[42] Pham, P., Svore, K.M.: A 2d Nearest-Neighbor Quantum Architecture for Factoring in
Polylogarithmic Depth. arXiv:1207.6655 [quant-ph] (2012)
[43] Saeedi, M., Wille, R., Drechsler, R.: Synthesis of quantum circuits for linear nearest neighbor
architectures. Quantum Inf. Process. 10(3), 355–377 (2011)
[44] Shafaei, A., Saeedi, M., Pedram, M.: Optimization of quantum circuits for interaction distance
in linear nearest neighbor architectures. In: 2013 50th ACM/EDAC/IEEE Design Automation
Conference (DAC), pp. 1–6 (2013)
[45] Shafaei, A., Saeedi, M., Pedram, M.: Qubit placement to minimize communication overhead in
2d quantum architectures. In: 2014 19th Asia and South Pacific Design Automation Conference
(ASP-DAC), pp. 495–500 (2014)
[46] Shor, P.: Algorithms for quantum computation: discrete logarithms and factoring. In: Proceedings
35th Annual Symposium on Foundations of Computer Science, pp. 124–134. IEEE Comput. Soc.
Press, Santa Fe, NM, USA (1994)
[47] Siraichi, M.Y., Santos, V.F.d., Collange, S., Pereira, F.M.Q.: Qubit Allocation. In: Proceedings
of the 2018 International Symposium on Code Generation and Optimization, CGO 2018, pp.
113–125. ACM, New York, NY, USA (2018). Event-place: Vienna, Austria
[48] Takahashi, Y., Kunihiro, N., Ohta, K.: The Quantum Fourier Transform on a Linear Nearest
Neighbor Architecture. Quantum Info. Comput. 7(4), 383–391 (2007)
[49] Versluis, R., Poletto, S., Khammassi, N., Haider, N., Michalak, D.J., Bruno, A., Bertels, K.,
DiCarlo, L.: Scalable quantum circuit and control for a superconducting surface code. Phys. Rev.
Applied 8(3), 034021 (2017)
[50] Wille, R., Burgholzer, L., Zulehner, A.: Mapping Quantum Circuits to IBM QX Architectures
Using the Minimal Number of SWAP and H Operations. In: Proceedings of the 56th Annual
Design Automation Conference 2019 on - DAC ’19, pp. 1–6. ACM Press, Las Vegas, NV, USA
(2019)
[51] Wille, R., Große, D., Teuber, L., Dueck, G.W., Drechsler, R.: RevLib: An Online Resource
for Reversible Functions and Reversible Circuits. In: 38th International Symposium on Multiple
Valued Logic (ismvl 2008), pp. 220–225 (2008)
[52] Wille, R., Keszocze, O., Walter, M., Rohrs, P., Chattopadhyay, A., Drechsler, R.: Look-ahead
schemes for nearest neighbor optimization of 1d and 2d quantum circuits. In: 2016 21st Asia and
South Pacific Design Automation Conference (ASP-DAC), pp. 292–297. IEEE, Macao, Macao
(2016)
[53] Wille, R., Lye, A., Drechsler, R.: Considering nearest neighbor constraints of quantum circuits at
the reversible circuit level. Quantum Inf. Process. 13(2), 185–199 (2014)
[54] Wille, R., Lye, A., Drechsler, R.: Exact Reordering of Circuit Lines for Nearest Neighbor Quantum
Architectures. IEEE Trans. on Comput.-Aided Des. of Integr. Circuits and Syst. 33(12), 1818–
1831 (2014)
[55] Yao, N.Y., Gong, Z.X., Laumann, C.R., Bennett, S.D., Duan, L.M., Lukin, M.D., Jiang, L.,
Gorshkov, A.V.: Quantum Logic between Remote Quantum Registers. Phys. Rev. A 87(2),
022306 (2013)
[56] Zulehner, A., Bauer, H., Wille, R.: Evaluating the Flexibility of A* for Mapping Quantum Cir-
cuits. In: M.K. Thomsen, M. Soeken (eds.) Reversible Computation, vol. 11497, pp. 171–190.
Springer International Publishing, Cham (2019)
[57] Zulehner, A., Paler, A., Wille, R.: An Efficient Methodology for Mapping Quantum Circuits to
the IBM QX Architectures. IEEE Trans. on Comput.-Aided Des. of Integr. Circuits and Syst.
38(7), 1226–1236 (2019)
