Compiling SU(4) Quantum Circuits to IBM QX Architectures by Zulehner, Alwin & Wille, Robert
Compiling SU(4)antum Circuits to IBM QX Architectures
Alwin Zulehner Robert Wille
Institute for Integrated Circuits, Johannes Kepler University Linz, Austria
alwin.zulehner@jku.at robert.wille@jku.at
ABSTRACT
The Noisy Intermediate-Scale Quantum (NISQ) technology is currently investigated by major players in the field to build the first practically useful quantum computer. IBM QX architectures are the first ones which are already publicly available today. However, in order to use them, quantum circuits have to be compiled for the target architecture at hand. While first approaches have been proposed for this purpose, they are infeasible for a certain set of SU(4) quantum circuits which have recently been introduced to benchmark corresponding compilers. In this work, we analyze the bottlenecks of existing compilers and provide a dedicated method for compiling this kind of circuit to IBM QX architectures. Our experimental evaluation (using tools provided by IBM) shows that the proposed approach significantly outperforms IBM’s own solution regarding the fidelity of the compiled circuit as well as runtime. Moreover, the solution proposed in this work has been declared winner of the IBM QISKit Developer Challenge. An implementation of the proposed methodology is publicly available at http://iic.jku.at/eda/research/ibm_qx_mapping.
1 INTRODUCTION
antum computers oer a promising computation paradigm that
allows to solve certain tasks signicantly faster than conventional
machines. Instead of bits, these devices operate on so-called qubits
that can not only be in one of the basis states |0〉 and |1〉, but also in
an (almost) arbitrary superposition of both, i.e. |ϕ〉 = α |0〉 + β |1〉.
In combination with other quantum mechanical phenomena like
entanglement and phase shis, this allows to develop quantum cir-
cuits (i.e. a sequence of operations that are applied to the qubits) that
gain an exponential speedup compared to conventional machines
for several practically relevant problems.
Currently, there is an ongoing “race” between large companies like IBM, Intel, Rigetti, and Google to build the first practically useful quantum computer [11, 14, 16, 20]. They all develop devices that can be classified as Noisy Intermediate-Scale Quantum (NISQ [19]) technology. Although still limited by their number of available qubits and their low fidelity, these devices are capable of running quantum algorithms for dedicated problems in domains such as quantum chemistry or physical simulation, and they constitute the first step towards fault-tolerant quantum computing. Among the different solutions currently developed by the companies mentioned above, IBM’s approach (yielding so-called IBM QX architectures) is the first one which is already publicly available today (through cloud access launched within their project IBM Q [1]). Because of this, we focus on this architecture in the following.
However, in order to use IBM QX devices (or NISQ devices in general), quantum circuits have to be compiled to the target architecture. This includes a decomposition of the operations into elementary gates provided by the architecture, as well as a mapping procedure that maps the logical qubits of the circuit to the physical ones of the QX device. While several solutions exist for the decomposition step (cf. [7, 17, 18, 27]), the mapping step in particular constitutes a tough challenge, since further physical constraints have to be considered. In fact, 2-qubit gates can be applied to certain pairs of physical qubits only. Therefore, SWAP operations have to be inserted that exchange the states of two physical qubits and, by this, “move” the logical qubits to positions where they can interact with each other. Since each additional operation further decreases the fidelity of the quantum circuit, their number shall be kept as small as possible.
Accordingly, researchers have investigated how to accomplish this efficiently, yielding a large body of solutions for minimizing the number of SWAP operations required to satisfy the physical constraints. However, most of them (e.g. the ones proposed in [9, 21, 25, 26, 28]) focus on so-called nearest neighbor constraints, which are not sufficient for execution on IBM QX architectures (or on NISQ architectures in general). Others (such as those proposed in [13, 24]) focus on specific quantum circuits only. In fact, to the best of our knowledge, only IBM’s own solution [5] (provided in the corresponding SDK) as well as the approaches recently proposed in [22, 29] are capable of compiling quantum circuits for IBM QX architectures such that all of their constraints are satisfied.
However, a set of quantum circuits (called SU(4) quantum circuits in the following) has recently been introduced which turns out to constitute a worst case for these compilation methods, making them infeasible. This is a crucial issue, since this kind of circuit has explicitly been advocated by IBM to benchmark compilers (e.g. through the so-called QISKit Developer Challenge [4]). Hence, for a class of circuits which a major player in the development of quantum computers considers important, no method exists for efficiently compiling them to IBM QX architectures.
In this paper, we address this problem by providing a dedicated compiler that maps SU(4) quantum circuits to IBM QX architectures. To this end, we analyze the existing compilation approaches and determine their respective advantages and bottlenecks. Based on that evaluation, we present a compilation approach which explicitly takes the structure of SU(4) quantum circuits into account. Experimental evaluations clearly show that the proposed approach significantly outperforms IBM’s current solution as well as the other recently proposed compilers, both with respect to the fidelity of the resulting circuits and regarding runtime. Moreover, the proposed approach has been declared winner of the IBM QISKit Developer Challenge. According to IBM, the proposed solution yields compiled circuits with at least 10% better costs than the other submissions while generating them at least 6 times faster.
The remainder of this work is structured as follows. In Section 2, we review IBM’s QX architectures, the considered SU(4) quantum circuits, as well as the compilation problem itself. In Section 3, we review the existing state of the art and discuss why existing solutions struggle to compile SU(4) circuits, which provides the motivation for this work. In Section 4, we present the dedicated solution in detail, followed by an experimental comparison to the state of the art in Section 5. Section 6 concludes the paper.
2 BACKGROUND
In this section, we briey discuss IBM’s QX architectures as well
as the considered quantum circuits and provide a more detailed
description of the considered compilation task.
arXiv:1808.05661v2 [quant-ph] 5 Nov 2018
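[Figure 1: Coupling map of the IBM QX architectures [2]: (a) QX2 and (b) QX4, each with five physical qubits Q0 to Q4; (c) QX5, with sixteen physical qubits arranged in two rows (Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8 on top and Q0, Q15, Q14, Q13, Q12, Q11, Q10, Q9 below).]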
2.1 IBM’s QX Architectures
In 2017, IBM started the initiative IBM Q in order to make quantum computers available to a broad audience via cloud access. Currently, their infrastructure contains two publicly available 5-qubit quantum devices located in Yorktown and Tenerife (also called IBM QX2 and IBM QX4, respectively), as well as a publicly available 16-qubit device located in Rueschlikon (also called IBM QX5). Moreover, there exists a 20-qubit architecture located in Austin that is available to IBM’s partners and members of the IBM Q network.
All these devices use superconducting qubits that are connected by coplanar waveguide bus resonators [2]. Quantum operations are conducted by applying microwave pulses to the qubits. Consequently, all these architectures share the same (or at least similar) physical constraints that have to be satisfied when running quantum algorithms (i.e. quantum circuits) on them.
In fact, IBM’s QX architectures support only two types of quantum operations (i.e. quantum gates). The first, $U(\theta, \phi, \lambda) = R_z(\phi) R_y(\theta) R_z(\lambda)$, is a single-qubit gate composed of two rotations around the z-axis and one rotation around the y-axis (i.e. an Euler decomposition). Furthermore, a controlled NOT gate (i.e. a CNOT) can be applied to a pair of qubits: if the so-called control qubit (denoted as • in quantum circuits) is in basis state |1〉, the state of the target qubit (denoted as ⊕ in quantum circuits) is inverted. These two quantum gates form a universal basis, i.e. any quantum algorithm can be realized using U and CNOT gates only.
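To make the Euler form concrete, the following Python sketch builds the 2 × 2 matrix of U(θ, ϕ, λ) from its three rotations and checks that U(π/2, 0, π) reproduces the Hadamard gate up to a global phase. It assumes the common convention $R_z(\alpha) = \mathrm{diag}(e^{-i\alpha/2}, e^{i\alpha/2})$ together with the standard $R_y$; the text does not fix a convention, and other choices differ only by a global phase. The helper names (`rz`, `ry`, `u_gate`, `equal_up_to_phase`) are ours.

```python
import cmath
import math


def rz(angle):
    """Rotation about the z-axis, assumed as diag(e^{-i*angle/2}, e^{i*angle/2})."""
    return [[cmath.exp(-1j * angle / 2), 0], [0, cmath.exp(1j * angle / 2)]]


def ry(angle):
    """Rotation about the y-axis."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def u_gate(theta, phi, lam):
    """U(theta, phi, lambda) = Rz(phi) Ry(theta) Rz(lambda)."""
    return matmul(matmul(rz(phi), ry(theta)), rz(lam))


def equal_up_to_phase(a, b, tol=1e-9):
    """True if a == e^{i*alpha} * b for some global phase alpha."""
    # Take the candidate phase from the largest entry of b, then compare all entries.
    i, j = max(((r, c) for r in range(2) for c in range(2)),
               key=lambda rc: abs(b[rc[0]][rc[1]]))
    phase = a[i][j] / b[i][j]
    if abs(abs(phase) - 1) > tol:
        return False
    return all(abs(a[r][c] - phase * b[r][c]) < tol
               for r in range(2) for c in range(2))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
print(equal_up_to_phase(u_gate(math.pi / 2, 0, math.pi), H))  # True
```

With this convention, U(π/2, 0, π) equals −i·H, so the two gates act identically on any quantum state.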
However, besides the restriction regarding the available gates, the architecture imposes further physical constraints. In fact, CNOT gates can be applied only to qubits that are connected by a bus resonator. Furthermore, only the qubit with the lower frequency may serve as target, while only the qubit with the higher frequency may serve as control (except for certain cases; cf. [2]). These restrictions are summarized in so-called coupling maps.
Example 1. Fig. 1 shows the coupling maps of the IBM QX2, IBM QX4, and IBM QX5 architectures. Here, qubits are visualized as vertices, and an arrow pointing from qubit $Q_i$ to qubit $Q_j$ indicates that only CNOTs with control qubit $Q_i$ and target qubit $Q_j$ can be applied.
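A coupling map can be stored as a set of directed (control, target) pairs. The sketch below classifies a requested CNOT as natively allowed, allowed after flipping its direction with Hadamard gates, or impossible without SWAPs. The edge set `QX2_COUPLING` is our recollection of the QX2 coupling map as published by IBM at the time and should be checked against Fig. 1a and [2]; the function `cnot_status` is hypothetical.

```python
# Assumed coupling map of IBM QX2 as (control, target) pairs; verify against [2].
QX2_COUPLING = {(0, 1), (0, 2), (1, 2), (3, 2), (3, 4), (4, 2)}


def cnot_status(coupling, control, target):
    """Classify a CNOT between two physical qubits under a directed coupling map."""
    if (control, target) in coupling:
        return "native"       # allowed as is
    if (target, control) in coupling:
        return "reversible"   # allowed after surrounding it with Hadamard gates
    return "not coupled"      # the qubits must first be moved together by SWAPs


for control, target in [(0, 1), (1, 0), (0, 3)]:
    print(control, target, cnot_status(QX2_COUPLING, control, target))
```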
In the following, we refer to the devices listed above, as well as to (future) devices that employ the same type of constraints, as IBM QX architectures. Besides that, note that, since quantum computers are still in their infancy, applying a quantum gate fails with a certain probability (cf. NISQ devices [19]). According to data provided by IBM [3], CNOT operations have an error rate approximately 10 times higher than that of single-qubit gates. Because of that, it is of utmost importance to keep the number of CNOT gates in particular as small as possible.
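[Figure 2: KAK decomposition of an SU(4) gate acting on q0 and q1 into three CNOTs (with control q0, q1, and q0, respectively) interleaved with seven single-qubit gates: U1 (on q0) and U2 (on q1) precede the first CNOT, U3 and U4 precede the second, U5 (on q0) precedes the third, and U6 and U7 follow it.]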
2.2 Consideredantum Circuits
Quantum algorithms or quantum circuits are usually described using high-level quantum languages [6, 15], quantum assembly languages (e.g. OpenQASM 2.0 developed by IBM [12]), or circuit diagrams (such as the one shown in Fig. 2), where the qubits are visualized as circuit lines that pass through quantum gates. These lines do not refer to an actual hardware connection (as in conventional logic), but rather define the order (from left to right) in which the respective gates (i.e. operations) are applied.
In this paper, we consider quantum circuits provided by IBM to benchmark the performance of compilers (e.g. through the so-called QISKit Developer Challenge [4]). These circuits are products of random 2-qubit gates from SU(4) that are applied to random pairs of qubits, and are denoted as SU(4) quantum circuits in the following.¹ More precisely, in each layer of the circuit, the available qubits are randomly grouped into pairs of two qubits each (if their number is even). Then, a random 2-qubit gate from SU(4) is applied to each of these pairs. Since these 2-qubit gates are not available in the gate set of the IBM QX architectures, KAK decomposition [23] is used to decompose each of them into a sequence of three CNOTs and seven single-qubit gates. Eventually, these decomposed gates form the circuits for determining the performance of the compilers.
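The layer structure described above can be sketched as follows: each layer shuffles the qubits and pairs them up, and each pair would then receive one random SU(4) gate (which, after KAK decomposition, becomes three CNOTs and seven single-qubit gates). This is not IBM’s benchmark script; the function `random_su4_layers` and the choice to leave one qubit idle when the number of qubits is odd are our assumptions.

```python
import random


def random_su4_layers(n_qubits, depth, seed=None):
    """Return `depth` layers, each a list of disjoint (qubit, qubit) pairs."""
    rng = random.Random(seed)
    layers = []
    for _ in range(depth):
        qubits = list(range(n_qubits))
        rng.shuffle(qubits)  # random grouping into pairs
        # Pair neighbours in the shuffled order; if n_qubits is odd, the last
        # qubit stays idle in this layer (an assumption of this sketch).
        layers.append([(qubits[i], qubits[i + 1])
                       for i in range(0, n_qubits - 1, 2)])
    return layers


for layer in random_su4_layers(6, 3, seed=7):
    print(layer)  # three disjoint pairs per layer; each pair gets one SU(4) gate
```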
Example 2. Fig. 2 shows the KAK decomposition of a random SU(4) gate. For simpler visualization, we omit the parameters θ, ϕ, and λ of the single-qubit gates $U_i$ (which usually differ for each $U_i$). As can be seen, single-qubit gates and CNOT gates are applied in an interleaved fashion.
2.3 Considered Problem
In this work, we consider how to efficiently compile the quantum circuits reviewed in the previous section to IBM QX architectures. In general, compilation comprises two steps. First, the operations occurring in the quantum circuit have to be decomposed into elementary operations that are available on the target hardware. In the literature, there exist plenty of such approaches (e.g. those proposed in [7, 17, 18, 27]) for different gate libraries like Clifford+T [10] or the NCV library [8]. Those solutions can easily be integrated into compilers such as the one proposed here.
However, the second step represents a bigger challenge: here, we need to determine a mapping of the n logical qubits occurring in the quantum circuit (denoted by $q_0, q_1, \ldots, q_{n-1}$ in the following) to the $m \ge n$ physical qubits of the hardware (denoted by $Q_0, Q_1, \ldots, Q_{m-1}$ in the following) such that the physical (architectural) constraints reviewed above are satisfied. In almost all cases, no single mapping satisfies these constraints for all gates/operations throughout the circuit. Consequently, the mapping has to change dynamically. This can be achieved by adding SWAP gates to the circuit, which exchange the states of two physical qubits and, thus, “move” the logical qubits to positions where they can interact with each other.
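The following minimal sketch represents a mapping as a dictionary from logical to physical qubits, applies a SWAP on two physical qubits, and checks whether two logical qubits can interact. The three-qubit line used as coupling map is hypothetical and only serves as illustration; CNOT direction is ignored here, since a reversed CNOT can be fixed with Hadamard gates (cf. Example 3).

```python
def apply_swap(mapping, p, r):
    """Return the logical->physical mapping after a SWAP on physical qubits p and r."""
    phys_to_log = {phys: log for log, phys in mapping.items()}
    new = dict(mapping)
    # Some physical qubits may be unused (m >= n), so only move occupied ones.
    if p in phys_to_log:
        new[phys_to_log[p]] = r
    if r in phys_to_log:
        new[phys_to_log[r]] = p
    return new


def can_interact(mapping, coupling, a, b):
    """True if logical qubits a and b sit on coupled physical qubits (either direction)."""
    pa, pb = mapping[a], mapping[b]
    return (pa, pb) in coupling or (pb, pa) in coupling


# Hypothetical line Q0 - Q1 - Q2 with q0, q1, q2 placed in order.
coupling = {(0, 1), (1, 2)}
mapping = {0: 0, 1: 1, 2: 2}
print(can_interact(mapping, coupling, 0, 2))  # False
mapping = apply_swap(mapping, 1, 2)           # exchange the states of Q1 and Q2
print(mapping)                                # {0: 0, 1: 2, 2: 1}
print(can_interact(mapping, coupling, 0, 2))  # True
```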
¹SU(4) is the special unitary group of degree 4, i.e. the Lie group of 4 × 4 unitary matrices with determinant 1. The functionality of any 2-qubit gate is described (up to a global phase) by an element of this group.
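[Figure 3: Decomposition of a SWAP gate. Left: a SWAP between Q0 (holding q0) and Q1 (holding q1), after which the two logical qubits are exchanged. Center: the equivalent sequence of three CNOTs, where the middle one has its control on Q1. Right: the same sequence using only CNOTs with control Q0, where the middle CNOT is surrounded by Hadamard gates on both qubits.]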
Example 3. Fig. 3 shows a SWAP operation and how it can be decomposed into operations that are available on IBM QX architectures. In the left-most circuit shown in Fig. 3, the logical qubits q0 and q1 are mapped to the physical qubits Q0 and Q1, respectively. Applying a SWAP operation between Q0 and Q1 exchanges the “positions” of q0 and q1. The SWAP operation can be decomposed into a sequence of three CNOTs, as shown in the center of Fig. 3. If we assume that only CNOTs with control qubit Q0 and target qubit Q1 are possible (as for IBM QX2; cf. Fig. 1a), we additionally have to invert the direction of the middle CNOT by applying Hadamard gates $H = U(\pi/2, 0, \pi)$ to both qubits before and after that CNOT (as shown in the right-most circuit of Fig. 3).
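Both identities used in Example 3 can be checked numerically. The sketch below builds 4 × 4 matrices in plain Python (basis index 2·x0 + x1, where x0 is the value of Q0) and confirms that three alternating CNOTs realize a SWAP, and that Hadamard gates on both qubits reverse a CNOT’s direction. All helper names are illustrative.

```python
import math


def bits(i):
    """Basis index -> [bit of Q0, bit of Q1]; Q0 is the more significant bit."""
    return [(i >> 1) & 1, i & 1]


def index(b):
    return 2 * b[0] + b[1]


def cnot(control, target):
    """4x4 permutation matrix of a CNOT on physical qubits 0 and 1."""
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        b = bits(i)
        if b[control]:
            b[target] ^= 1
        m[index(b)][i] = 1.0
    return m


def swap():
    """4x4 matrix mapping |x0 x1> to |x1 x0>."""
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        m[index(bits(i)[::-1])][i] = 1.0
    return m


def h_on_both():
    """H tensor H: entry (i, j) = H[i0][j0] * H[i1][j1]."""
    s = 1 / math.sqrt(2)
    h = [[s, s], [s, -s]]
    return [[h[bits(i)[0]][bits(j)[0]] * h[bits(i)[1]][bits(j)[1]]
             for j in range(4)] for i in range(4)]


def product(*ms):
    """Matrix product ms[0] @ ms[1] @ ...; the rightmost factor acts first."""
    out = ms[0]
    for m in ms[1:]:
        out = [[sum(out[i][k] * m[k][j] for k in range(4)) for j in range(4)]
               for i in range(4)]
    return out


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


# Center of Fig. 3: three CNOTs with alternating direction form a SWAP.
print(close(product(cnot(0, 1), cnot(1, 0), cnot(0, 1)), swap()))          # True
# Hadamards on both qubits reverse the direction of a CNOT.
print(close(product(h_on_both(), cnot(0, 1), h_on_both()), cnot(1, 0)))    # True
# Right of Fig. 3: only CNOTs with control Q0 and target Q1.
hh = h_on_both()
print(close(product(cnot(0, 1), hh, cnot(0, 1), hh, cnot(0, 1)), swap()))  # True
```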
Clearly, the number of additional SWAP operations shall be kept as small as possible, since each further operation decreases the fidelity of the circuit when running on an IBM QX device.² Therefore, IBM has set the goal of developing a compiler (including a mapping strategy) that yields a circuit with the largest possible fidelity [4].
3 STATE OF THE ART AND MOTIVATION
FOR A DEDICATED SOLUTION
In this section, we discuss the current state of the art and moti-
vate the need for a dedicated approach for compiling the circuits
reviewed in Section 2.2 to IBM’s QX architectures reviewed in
Section 2.1.
In the literature, there have already been several works that con-
sider the mapping of quantum circuits to physical devices. However,
most of them either focus on so-called nearest neighbor constraints
only [9, 21, 25, 26, 28] and/or on special quantum circuits to be
mapped [13, 24]. In the corresponding nearest neighbor architec-
tures, a 2-qubit gate can be applied to any neighboring qubits and
also in any desired direction—clearly violating the constraints for
IBM QX architectures represented by the coupling maps. Moreover,
many of the previously proposed approaches are only applicable
for a very limited number of qubits (even lower than the 16 already
available from IBM).
In contrast, only a few methods exist which map the logical qubits of a quantum circuit to the physical ones of the IBM QX architectures. More precisely, only a solution developed by IBM itself (based on Bravyi’s algorithm and implemented in IBM’s own SDK QISKit [5]) as well as the works presented in [22, 29] are available thus far. While the solution proposed in [22] has only been thoroughly evaluated for 5-qubit architectures and rather small circuits (and yields circuits with larger overhead than IBM’s solution for 16-qubit devices), the approach proposed in [29] has shown significant improvements regarding gate count, depth, and runtime, clearly outperforming IBM’s solution, e.g. on the 16-qubit architectures and for circuits composed of thousands of gates.
is dierence in quality is mainly because IBM’s solution ran-
domly searches for amapping that satises the physical constraints—
leading to a rather small exploration of the search space so that only
rather poor solutions are usually found. In contrast, the approach
proposed in [29] aims for an optimized solution by exploring a
2Note that these additional SWAPs also increase the depth of the circuit and, thus, its
execution time on a quantum computer.
larger part of the search space and additionally exploiting informa-
tion of the circuit. More precisely, a look-ahead scheme is employed
that considers gates that are applied in the near future and, thus,
allows to determine mappings which constitute a local optima with
respect to the number of SWAP operations. However, this solution
is hardly suitable for the SU(4) circuits reviewed in Section 2.2,
because:
• The solution rests on the main idea of first dividing the circuit into layers of gates³ and, afterwards, determining a permutation of qubits for each layer which satisfies all physical (architectural) constraints within this subset of gates.⁴
• SU(4) circuits are composed of layers of gates which frequently contain n/2 different CNOT configurations (with n being the number of qubits). This is basically a worst-case scenario, since the more CNOT gates a layer contains, the more constraints have to be satisfied by a single permutation of qubits.
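To see why such layers are demanding, the sketch below splits a list of CNOTs into layers of gates on disjoint qubits using simple as-soon-as-possible scheduling. This is one possible layering, not necessarily the exact one used in [29], and single-qubit gates are ignored since they impose no coupling constraints. For one layer of SU(4) gates on n = 6 qubits, every resulting CNOT layer contains n/2 = 3 CNOTs on different qubit pairs, all of which a single permutation must satisfy.

```python
def asap_layers(gates):
    """Split 2-qubit gates (in circuit order) into layers of gates on disjoint qubits.

    Each gate goes into the first layer after the last layer that used one of
    its qubits (as-soon-as-possible scheduling).
    """
    next_free = {}  # qubit -> index of the first layer in which it is free
    layers = []
    for gate in gates:
        level = max(next_free.get(q, 0) for q in gate)
        if level == len(layers):
            layers.append([])
        layers[level].append(gate)
        for q in gate:
            next_free[q] = level + 1
    return layers


pairs = [(0, 1), (2, 3), (4, 5)]  # one SU(4) layer on n = 6 qubits
# CNOT pattern of each KAK block (cf. Fig. 2): control q_a, then q_b, then q_a.
cnots = [g for a, b in pairs for g in [(a, b), (b, a), (a, b)]]
for layer in asap_layers(cnots):
    print(layer)
# [(0, 1), (2, 3), (4, 5)]
# [(1, 0), (3, 2), (5, 4)]
# [(0, 1), (2, 3), (4, 5)]
```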
As a consequence, the solution proposed in [29] cannot exploit its strengths to determine mapped circuits with smaller overhead than IBM’s solution when applied to SU(4) circuits, since it basically has to check all permutations within a layer until it finds one that satisfies all constraints imposed by the CNOTs. Considering that SU(4) circuits have explicitly been provided by IBM to benchmark compilers, this is a serious drawback and motivates a compilation approach dedicated to this kind of circuit.
4 PROPOSED APPROACH
In this section, we describe a dedicated procedure to compile SU(4) quantum circuits to IBM QX architectures. To overcome the limitations of the approach proposed in [29] while retaining a look-ahead scheme, we break out of the layer-based approach and consider each gate on its own. In order to deal with the resulting complexity, the proposed algorithm combines three steps: a pre-processing step (reducing the complexity beforehand), a powerful search method (solving the mapping problem), and, eventually, a dedicated post-mapping optimization (exploiting further optimization potential after the mapping).
4.1 Pre-Process: Grouping Gates
Since each gate is considered on its own, the mapping may change after each gate (requiring many more calls of the mapping algorithm). To overcome this issue, we perform a pre-processing step in which we form groups of gates, which we represent as a directed acyclic graph (DAG). By this, the mapping algorithm has to be called at most once per group instead of once per gate. As a further advantage, this DAG representation inherently encodes the precedence of the groups of gates and, thus, unveils important information about which groups of gates commute, giving the freedom to choose which group shall be mapped next.
In order to group the gates, we topologically sort the circuit and collect all gates that act on a pair of logical qubits (e.g. on qubits $q_i$ and $q_j$) into a group $G_k$. This includes single-qubit gates on $q_i$ or $q_j$ as well as CNOTs with control $q_i$ and target $q_j$ (or vice versa). This grouping is done greedily, until a CNOT with control or target $q_i$ ($q_j$) is observed that acts on a qubit other than $q_j$ ($q_i$). This is possible, since gates that act on disjoint sets of qubits commute.
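The greedy grouping can be sketched as below. Gates are encoded as tuples of logical qubits, `(q,)` for a single-qubit gate and `(control, target)` for a CNOT; single-qubit gates that precede a qubit’s first CNOT are held back and attached to the group that this CNOT opens. This is our illustration of the procedure, not the authors’ Cython implementation, and `group_gates` is a hypothetical name.

```python
def group_gates(gates):
    """Greedily group a topologically sorted gate list by pairs of logical qubits.

    Returns a list of groups, each a dict with the set of qubits and the gates
    of the group in circuit order.
    """
    groups = []
    current = {}   # logical qubit -> index of its currently open group
    pending = {}   # single-qubit gates seen before the qubit's first CNOT
    for gate in gates:
        if len(gate) == 1:
            q = gate[0]
            if q in current:
                groups[current[q]]["gates"].append(gate)
            else:
                pending.setdefault(q, []).append(gate)
            continue
        a, b = gate
        if a in current and current.get(b) == current[a]:
            groups[current[a]]["gates"].append(gate)  # same pair: keep collecting
            continue
        # A CNOT on a new pair closes the open groups of a and b.
        new_gates = pending.pop(a, []) + pending.pop(b, []) + [gate]
        groups.append({"qubits": {a, b}, "gates": new_gates})
        current[a] = current[b] = len(groups) - 1
    # Qubits that never take part in a CNOT keep their gates in their own group.
    for q, rest in pending.items():
        groups.append({"qubits": {q}, "gates": rest})
    return groups


# Fig. 2: U1, U2, CNOT(q0, q1), U3, U4, CNOT(q1, q0), U5, CNOT(q0, q1), U6, U7.
kak = [(0,), (1,), (0, 1), (0,), (1,), (1, 0), (0,), (0, 1), (0,), (1,)]
groups = group_gates(kak)
print(len(groups), len(groups[0]["gates"]))  # 1 10
print(len(group_gates(kak + [(1, 2)])))      # 2
```

As in Example 4, the whole KAK block forms one group; a subsequent CNOT between q1 and q2 opens a second one.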
³A layer contains only gates that act on disjoint qubits. Thus, all gates of a layer can be applied in parallel.
⁴Between the respective layers, SWAP gates as shown in Fig. 3 are introduced to establish the respective qubit permutations.
[Figure 4: DAG after grouping the gates of the circuit. First row: G0 (q0, q1), G1 (q2, q3), G2 (q4, q5); second row: G3 (q1, q2), G4 (q3, q4), G5 (q0, q5).]
Example 4. Consider again the circuit shown on the right-hand side of Fig. 2. Since all gates of the circuit act on qubits q0 and q1, the grouped circuit contains a single group. Hence, the mapping has to be changed at most once in order to apply all gates.
As stated above, grouping gates has a positive effect on the subsequent mapping algorithm, since all gates of a group can be applied once the physical constraints are satisfied for the involved qubits.⁵ Thus, mapping the gates of the circuit reduces to mapping the groups.
Example 5. Consider the DAG shown in Fig. 4. This DAG represents a quantum circuit composed of 6 qubits, where the first layer is composed of SU(4) gates between the logical qubits q0 and q1, q2 and q3, as well as q4 and q5, respectively. Moreover, the second layer contains SU(4) gates between the logical qubits q1 and q2, q3 and q4, as well as q0 and q5, respectively.
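Given the qubit pairs of the groups in circuit order, the DAG can be derived by remembering, for each logical qubit, the last group that acted on it. The sketch below reproduces the precedence of Fig. 4 and also computes the set of groups that are ready to be mapped (the set called $G_{next}$ in Section 4.2). The function names are hypothetical.

```python
def build_dag(groups):
    """parents[k] = indices of the groups that directly precede group k."""
    last = {}      # logical qubit -> most recent group acting on it
    parents = []
    for k, qubits in enumerate(groups):
        parents.append({last[q] for q in qubits if q in last})
        for q in qubits:
            last[q] = k
    return parents


def ready(parents, done):
    """Groups not yet mapped whose parents are all mapped."""
    return {k for k, ps in enumerate(parents) if k not in done and ps <= done}


fig4 = [(0, 1), (2, 3), (4, 5), (1, 2), (3, 4), (0, 5)]  # G0 ... G5 of Fig. 4
parents = build_dag(fig4)
for k, ps in enumerate(parents):
    print(f"G{k}: parents {sorted(ps)}")
print(sorted(ready(parents, {0, 1, 2, 3})))  # [4, 5]
```

For Fig. 4, G3 depends on G0 and G1, G4 on G1 and G2, and G5 on G0 and G2.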
4.2 Solving the Mapping Problem
Aer grouping the gates, the physical constraints of the target
architecture given by the coupling map are satised by a mapping
algorithm that determines a dynamically changing mapping of
the logical qubits to the physical ones. In theory, the mapping
can change (by inserting SWAP gates) aer each group—resulting
in a huge search space since m! possibilities exist for each such
intermediate mapping. To cope with this enormous search space
we use an A* search algorithm to nd a solution that is as cheap as
possible.
For the mapping strategy presented in this paper, we choose an arbitrary initial mapping such that the physical constraints are satisfied for all groups in the DAG that do not have any predecessors (i.e. the corresponding logical qubits are mapped to physical ones that are connected in the coupling map). By this, we can immediately add the gates of these groups to the (initially empty) compiled circuit.⁶
Example 6. Consider again the DAG in Fig. 4, which describes the gate groups to be mapped. Assume that the circuit shall be compiled for the IBM QX5 architecture, whose coupling map is depicted in Fig. 1c. One possible initial mapping is Q1 ← q0, Q0 ← q1, Q2 ← q4, Q15 ← q2, Q3 ← q5, and Q14 ← q3 (i.e. the logical qubits are mapped to the six left-most physical qubits). Using this initial mapping, the gate groups in the first layer (i.e. G0, G1, and G2) can be applied, since for each of these groups, the involved logical qubits are mapped to physical ones that are connected in the coupling map.
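This part of Example 6 can be checked with a few lines of Python. The sketch assumes that every pair of neighbouring qubits in the 2 × 8 ladder of Fig. 1c is coupled in some direction, and it ignores the directions, which only determine whether Hadamard gates must be added.

```python
def qx5_neighbours():
    """Undirected adjacency of the 2 x 8 ladder drawn in Fig. 1c (directions ignored)."""
    top = [1, 2, 3, 4, 5, 6, 7, 8]
    bottom = [0, 15, 14, 13, 12, 11, 10, 9]
    edges = list(zip(top, top[1:])) + list(zip(bottom, bottom[1:])) + list(zip(top, bottom))
    adj = {q: set() for q in range(16)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


adj = qx5_neighbours()
# Initial mapping of Example 6, written as logical qubit -> physical qubit.
mapping = {0: 1, 1: 0, 4: 2, 2: 15, 5: 3, 3: 14}
fig4 = [(0, 1), (2, 3), (4, 5), (1, 2), (3, 4), (0, 5)]  # groups G0 ... G5
satisfied = [k for k, (a, b) in enumerate(fig4) if mapping[b] in adj[mapping[a]]]
print(satisfied)  # [0, 1, 2, 3]
```

Besides G0, G1, and G2, the mapping already satisfies G3, which matches the continuation of this example.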
Aer determining an initial mapping, the actual mapping proce-
dure is composed of two alternating steps that are employed until
all groups are mapped.
e rst step adds all groups to the compiled circuit, whose
parents in the DAG are already mapped and whose logical qubits
are mapped to physical ones that are connected in the coupling
map.
⁵Note that the direction of the CNOTs might have to be adjusted (which is rather cheap, since only Hadamard gates have to be added).
⁶Note that the qubits have to be relabeled according to the mapping and that the direction of some CNOTs might have to be adjusted.
Example 6 (continued). e initial mapping additionally allows
to add gates of group G3 to the compiled circuit, since the its parents
in the DAG (i.e. the groups G1 and G2) are already mapped and the
physical constraints are also satised (since Q0   q1 and Q15   q2).
The second step determines the set of groups $G_{next}$ that can be applied next according to their precedence in the circuit, i.e. the set of groups whose parents in the DAG are already compiled. Then, the task of the mapping algorithm is to determine a new mapping (by inserting SWAP gates) such that the physical constraints are satisfied for at least one of the gate groups in $G_{next}$.
Example 6 (continued). One possibility is to insert a SWAP operation on the physical qubits Q15 and Q2, since this “moves” the logical qubits q3 and q4 towards each other and, thus, allows the gates of group G4 to be added to the compiled circuit. Finally, inserting another SWAP operation between the physical qubits Q1 and Q2 allows the gates of group G5 to be added to the compiled circuit. Overall, two SWAP gates are inserted during the mapping procedure in this case. Another solution would be to insert a SWAP operation on the physical qubits Q2 and Q3. Since this “moves” the logical qubits q0 and q5, as well as the logical qubits q3 and q4, towards each other, the gate groups G4 and G5 can be applied after inserting only a single SWAP operation.
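Both alternatives of Example 6 can be replayed in a short sketch. The ladder adjacency assumes, as a simplification, that all neighbouring qubits in Fig. 1c are coupled in some direction; the helpers `apply_swap` and `interacts` are illustrative names.

```python
def qx5_neighbours():
    """Undirected adjacency of the 2 x 8 ladder drawn in Fig. 1c (directions ignored)."""
    top = [1, 2, 3, 4, 5, 6, 7, 8]
    bottom = [0, 15, 14, 13, 12, 11, 10, 9]
    adj = {q: set() for q in range(16)}
    for u, v in list(zip(top, top[1:])) + list(zip(bottom, bottom[1:])) + list(zip(top, bottom)):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def apply_swap(mapping, p, r):
    """Logical->physical mapping after a SWAP on physical qubits p and r."""
    phys_to_log = {phys: log for log, phys in mapping.items()}
    new = dict(mapping)
    if p in phys_to_log:
        new[phys_to_log[p]] = r
    if r in phys_to_log:
        new[phys_to_log[r]] = p
    return new


def interacts(adj, m, a, b):
    """True if logical qubits a and b are mapped to neighbouring physical qubits."""
    return m[b] in adj[m[a]]


adj = qx5_neighbours()
start = {0: 1, 1: 0, 4: 2, 2: 15, 5: 3, 3: 14}  # mapping after G0 to G3

# Option 1: SWAP(Q15, Q2) enables G4 = (q3, q4), then SWAP(Q1, Q2) enables G5 = (q0, q5).
m1 = apply_swap(start, 15, 2)
print(interacts(adj, m1, 3, 4))                            # True
m1 = apply_swap(m1, 1, 2)
print(interacts(adj, m1, 0, 5))                            # True

# Option 2: a single SWAP(Q2, Q3) enables both G4 and G5.
m2 = apply_swap(start, 2, 3)
print(interacts(adj, m2, 3, 4), interacts(adj, m2, 0, 5))  # True True
```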
Among the solutions found by the mapping algorithm, we aim for the mapping that yields the lowest cost. Since there are m! different mappings of the physical qubits, we utilize an A* search to avoid exploring the whole search space. The general idea of the A* search algorithm is to reach a goal state from an initial state such that the cost of reaching this state is minimal (with respect to a certain heuristic). To this end, the successor states of the cheapest state are repeatedly added to the explored search space (i.e. the cheapest state is expanded) until a goal state is reached. The cost c(x) = f(x) + h(x) is thereby defined as the sum of the fixed cost f(x) (i.e. the cost of reaching state x from the initial state) and the heuristic cost h(x) (i.e. an estimate of the cost of reaching a goal state from state x).
This general description of the A* search algorithm has been adjusted for the considered mapping problem. More precisely, the initial state is the current mapping of the logical qubits to the physical ones. A goal state is any state that describes a mapping where the physical constraints are satisfied for at least one of the groups in $G_{next}$. A state is expanded by applying one SWAP operation between two physical qubits, which results in a successor mapping. Given that, the corresponding cost functions f(x) and h(x) have to be determined. The fixed cost f(x) of a state is given by the number of SWAP operations that have been added (starting from the current mapping). For the estimate of the remaining cost h(x), the utilized heuristic employs a look-ahead scheme, which significantly reduces the cost of the compiled circuit. More precisely, for each group, we determine the distance in the coupling map between the physical qubits to which the respective logical qubits are mapped, and sum these distances over all groups in $G_{next}$.⁷ By this, we do not focus on only one of these groups, but additionally try to optimize the mapping for groups that are applied in the near future.
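The adapted search can be sketched with Python’s `heapq`. States are mappings, successors apply one SWAP along a coupling edge, the fixed cost is the number of SWAPs, and the look-ahead heuristic sums the coupling-map distances of all groups in $G_{next}$. We subtract 1 per group so that adjacent pairs contribute 0; this shifts every state’s heuristic by the same constant and does not change the search order. This is an illustrative reimplementation under the ladder assumption for Fig. 1c (all neighbouring qubits coupled, directions ignored), not the authors’ code, and it omits refinements such as zero-cost SWAPs (Section 4.3).

```python
import heapq
from collections import deque


def qx5_neighbours():
    """Undirected adjacency of the 2 x 8 ladder in Fig. 1c (directions ignored)."""
    top = [1, 2, 3, 4, 5, 6, 7, 8]
    bottom = [0, 15, 14, 13, 12, 11, 10, 9]
    adj = {q: set() for q in range(16)}
    for u, v in list(zip(top, top[1:])) + list(zip(bottom, bottom[1:])) + list(zip(top, bottom)):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def all_distances(adj):
    """Shortest-path length between every pair of physical qubits (BFS per source)."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[source] = d
    return dist


def astar_swaps(mapping, g_next, adj):
    """A* over mappings until at least one group in g_next is executable.

    mapping: logical -> physical qubit; g_next: list of (logical, logical) pairs.
    Fixed cost = number of SWAPs so far; heuristic = sum of (distance - 1)
    over all groups in g_next (look-ahead, not admissible).
    """
    dist = all_distances(adj)
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})

    def heuristic(m):
        return sum(dist[m[a]][m[b]] - 1 for a, b in g_next)

    def is_goal(m):
        return any(dist[m[a]][m[b]] == 1 for a, b in g_next)

    tie = 0  # unique counter so heapq never has to compare two dicts
    frontier = [(heuristic(mapping), tie, mapping, [])]
    expanded = set()
    while frontier:
        _, _, m, swaps = heapq.heappop(frontier)
        key = tuple(sorted(m.items()))
        if key in expanded:
            continue
        expanded.add(key)
        if is_goal(m):
            return swaps, m
        phys_to_log = {p: q for q, p in m.items()}
        for u, v in edges:
            if u not in phys_to_log and v not in phys_to_log:
                continue  # swapping two unused physical qubits changes nothing
            new = dict(m)
            if u in phys_to_log:
                new[phys_to_log[u]] = v
            if v in phys_to_log:
                new[phys_to_log[v]] = u
            tie += 1
            fixed = len(swaps) + 1
            heapq.heappush(frontier, (fixed + heuristic(new), tie, new, swaps + [(u, v)]))
    return None


# Example 6: mapping after G0 to G3, with G_next = {G4 = (q3, q4), G5 = (q0, q5)}.
start = {0: 1, 1: 0, 4: 2, 2: 15, 5: 3, 3: 14}
swaps, final = astar_swaps(start, [(3, 4), (0, 5)], qx5_neighbours())
print(swaps)  # [(2, 3)]
```

On Example 6, the search returns the single SWAP between Q2 and Q3, the same choice that the continuation of the example attributes to the look-ahead scheme.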
Example 6 (continued). e look-ahead scheme determines the
goal node reached by conducting a SWAP operation between the
physical qubits Q2 and Q3, since from the two solutions resulting
in a goal state with costs 1 (inserting a single SWAP gate; as discussed
above), the solution with the lower look-ahead costs was chosen.
⁷Note that the heuristic is not admissible and, hence, may not lead to a locally optimal solution. However, local optima are not desired anyway, since they often lead to a larger overall overhead.
4.3 Post-Mapping Optimization
Aer satisfying the physical constraints given by the target archi-
tecture, we nally apply a dedicated post-mapping optimization in
order to further reduce the costs of the compiled circuit. To this
end, we regroup the gates of the compiled circuit as described in
Section 4.1, since the mapping algorithm has added several SWAP
gates to the compiled circuit. en, we traverse the resulting DAG
and optimize each group individually.
The key idea of the proposed optimization is that the functionality of the gates in a group $G_i$ can be represented by a single matrix from SU(4). Hence, we can easily build up this matrix by multiplying the unitary matrices representing the individual gates and, again, use KAK decomposition [23] to determine another group $G'_i$ with 3 CNOTs and 7 single-qubit gates that realizes the same functionality (cf. Section 2.2). If the gates in $G'_i$ have lower cost than the gates in the original group $G_i$, we replace $G_i$ with $G'_i$ in the DAG. This works especially well when a SWAP gate is applied to two qubits to which a gate from SU(4) has been applied right before.
Example 7. Consider again the KAK decomposition shown in Fig. 2 with its 3 CNOTs and 7 single-qubit gates. Furthermore, assume that immediately afterwards a SWAP operation is applied to the physical qubits currently holding the logical qubits q0 and q1, yielding a group $G_i$ with 6 CNOTs and 11 single-qubit gates. However, representing the overall functionality of this group as a unitary matrix from SU(4) and applying KAK decomposition again yields another group $G'_i$ with, again, 3 CNOTs and 7 single-qubit gates. Hence, the SWAP operation can be conducted “for free”.
Note that this post-mapping optimization can also be used to improve the mapping algorithm itself. More precisely, the fact that SWAP operations applied directly after a gate from SU(4) come “for free” can be included in the fixed cost f(x) by setting the cost of the respective SWAP operation to 0.
Finally, a similar (but simpler) optimization can be applied to subsequent single-qubit gates within a group. Such gates may occur, e.g., when the direction of a CNOT is swapped by inserting Hadamard gates. Again, the 2 × 2 unitary matrices describing the individual gates can be multiplied. Afterwards, the Euler angles of the rotations around the z- and y-axes are determined, yielding a single U gate.
Example 8. Consider again the KAK decomposition shown in Fig. 2. To change the direction of the center CNOT gate, Hadamard gates are inserted on each qubit before and after the CNOT, yielding subsequent single-qubit gates applied to q0 and q1, respectively. Again, such a sequence (e.g. U3 followed by H) can be represented by one single-qubit gate.
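The fusion of consecutive single-qubit gates can be sketched as follows: multiply the 2 × 2 matrices in circuit order, then read off Z-Y-Z Euler angles so that the product equals U(θ, ϕ, λ) up to a global phase. The sketch assumes $U = R_z(\phi) R_y(\theta) R_z(\lambda)$ with $R_z(\alpha) = \mathrm{diag}(e^{-i\alpha/2}, e^{i\alpha/2})$; under this convention, ϕ and λ can be read from phase differences of the matrix entries. The U3 angles in the usage lines are arbitrary placeholders, and all function names are ours.

```python
import cmath
import math


def rz(angle):
    return [[cmath.exp(-1j * angle / 2), 0], [0, cmath.exp(1j * angle / 2)]]


def ry(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def u_gate(theta, phi, lam):
    """U(theta, phi, lambda) = Rz(phi) Ry(theta) Rz(lambda)."""
    return matmul(matmul(rz(phi), ry(theta)), rz(lam))


def euler_zyz(m, tol=1e-9):
    """Angles (theta, phi, lam) with m = e^{i*alpha} * U(theta, phi, lam)."""
    theta = 2 * math.atan2(abs(m[1][0]), abs(m[0][0]))
    if abs(m[1][0]) < tol:   # diagonal: only phi + lam is determined
        return theta, cmath.phase(m[1][1]) - cmath.phase(m[0][0]), 0.0
    if abs(m[0][0]) < tol:   # anti-diagonal: only phi - lam is determined
        return theta, cmath.phase(m[1][0]) - cmath.phase(-m[0][1]), 0.0
    # arg(m10) - arg(m00) = phi and arg(m11) - arg(m10) = lam (mod 2*pi).
    phi = cmath.phase(m[1][0]) - cmath.phase(m[0][0])
    lam = cmath.phase(m[1][1]) - cmath.phase(m[1][0])
    return theta, phi, lam


def fuse(*gates):
    """Merge single-qubit gates given in circuit order into the angles of one U gate."""
    total = [[1, 0], [0, 1]]
    for g in gates:
        total = matmul(g, total)  # later gates multiply from the left
    return euler_zyz(total)


def equal_up_to_phase(a, b, tol=1e-9):
    """For 2x2 unitaries, |tr(a^dagger b)| == 2 iff they differ by a global phase."""
    tr = sum(a[k][i].conjugate() * b[k][i] for i in range(2) for k in range(2))
    return abs(abs(tr) - 2) < tol


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
u3 = u_gate(0.3, 1.1, -0.4)              # placeholder angles for U3
angles = fuse(u3, H)                     # U3 followed by H (Example 8)
print(equal_up_to_phase(u_gate(*angles), matmul(H, u3)))          # True
print(equal_up_to_phase(u_gate(*fuse(H, H)), [[1, 0], [0, 1]]))   # True
```

A shift of ϕ or λ by 2π only flips the global sign, so the angles obtained from `cmath.phase` differences are safe to use without further normalization.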
5 EXPERIMENTAL EVALUATION
In this section, we experimentally evaluate the proposed approach and compare it to the compiler available in IBM’s SDK QISKit [5].⁸ To this end, we implemented the proposed methodology in Cython (available at http://iic.jku.at/eda/research/ibm_qx_mapping) and used the scripts provided by IBM to conduct the evaluation (these scripts are available at [4]). Since CNOT gates on IBM QX architectures have an error rate approximately 10 times higher than single-qubit gates (cf. [3]), the provided cost function assigns a cost of 10 to each CNOT and a cost of 1 to each single-qubit gate.⁹ All evaluations have been conducted on a 3.8 GHz machine with 32 GB RAM.
⁸Note that no experimental comparison is reported for the approach presented in [29] since, as discussed in Section 3, the SU(4) circuits represent a worst case for it and, hence, this method is not feasible for those benchmarks. In fact, running the publicly available implementation (taken from http://iic.jku.at/eda/research/ibm_qx_mapping/) confirms that this method frequently times out for those benchmarks. Likewise, no results are reported for the approach presented in [22], which is applicable only to a rather small number of qubits.
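The cost function can be written down directly. The sketch below (the gate encoding and names are ours) evaluates it on the KAK block of Fig. 2, on the SWAP of Fig. 3 (right), and on their concatenation, which corresponds to the group $G_i$ of Example 7.

```python
CNOT_COST = 10
SINGLE_QUBIT_COST = 1


def cost(gates):
    """Cost of a gate list: 10 per CNOT (2-tuple), 1 per single-qubit gate (1-tuple)."""
    return sum(CNOT_COST if len(g) == 2 else SINGLE_QUBIT_COST for g in gates)


# Fig. 2: KAK decomposition of one SU(4) gate on (q0, q1): 3 CNOTs, 7 single-qubit gates.
kak = [(0,), (1,), (0, 1), (0,), (1,), (1, 0), (0,), (0, 1), (0,), (1,)]
# Fig. 3 (right): SWAP with only control-Q0 CNOTs; the middle one is flipped by 4 H gates.
swap = [(0, 1), (0,), (1,), (0, 1), (0,), (1,), (0, 1)]
print(cost(kak), cost(swap), cost(kak + swap))  # 37 34 71
```

The concatenation costs 71, whereas the re-decomposed group $G'_i$ of Example 7 again has 3 CNOTs and 7 single-qubit gates, i.e. cost 37, so the post-mapping optimization removes the full cost of the SWAP.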
Besides the circuits, IBM also provides several coupling maps for
architectures with 5, 16, and 20 qubits, respectively. These
architectures include the existing quantum devices IBM QX2, IBM QX4,
and IBM QX5, as well as other architectures where the qubits are
arranged in a linear, circular, or rectangular fashion. For these
architectures, the directions of the arrows in the coupling maps
were chosen randomly by IBM (or some connections are missing entirely)
to provide a realistic basis for the evaluation. For each number of
qubits in the architectures (5, 16, or 20), we use 10 circuits, which
we compile to each architecture with the corresponding number of
qubits. This is the same setting that IBM used to evaluate the
compilers submitted to the QISKit Developer Challenge [4]. The
resulting costs are visualized by means of scatter plots in Fig. 5.
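The sketch below makes the role of the arrow directions concrete. It models a coupling map as a set of directed (control, target) pairs. A CNOT can be applied directly if its (control, target) pair is in the map. If only the reverse pair is present, the CNOT can be realized with Hadamards as in Fig. 2. Otherwise, the mapping first has to move the qubits next to each other with SWAP gates. The linear map and the helper names are hypothetical and do not reproduce any of IBM's actual coupling maps. Likewise, random_orientation only illustrates "randomly chosen directions" and is not IBM's generator.

```python
import random
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]

def random_orientation(undirected_edges: Iterable[Edge], seed: int = 0) -> Set[Edge]:
    """Give each undirected connection one random direction (control -> target)."""
    rng = random.Random(seed)
    return {(a, b) if rng.random() < 0.5 else (b, a) for a, b in undirected_edges}

def cnot_status(coupling_map: Set[Edge], control: int, target: int) -> str:
    if (control, target) in coupling_map:
        return "native"        # executable as is
    if (target, control) in coupling_map:
        return "reversed"      # executable after the Hadamard transformation of Fig. 2
    return "not adjacent"      # the mapping has to insert SWAP gates first

# Hypothetical 5-qubit linear architecture with randomly directed connections.
linear_5 = [(0, 1), (1, 2), (2, 3), (3, 4)]
cmap = random_orientation(linear_5, seed=42)
for c, t in [(0, 1), (1, 0), (0, 4)]:
    print((c, t), cnot_status(cmap, c, t))
```

Dropping an edge from the set models a missing connection. Any CNOT on that pair then falls into the "not adjacent" case.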
Each of the plots in Fig. 5 shows the cost of the compiled circuits
when using the QISKit compiler on the x-axis, as well as the cost
of the compiled circuit when using the proposed solution on the
y-axis. Each point represents one SU(4) circuit that is compiled for a
certain architecture. Hence, a point underneath the main diagonal
indicates that the proposed solution yields a circuit with lower cost
(which is the case for all evaluated circuits and architectures). The
larger the distance to the main diagonal, the larger the improvement.
We additionally added horizontal and vertical lines that indicate
the cost of the original circuits (i.e. the cost before compilation).
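The following sketch shows how a single point of such a scatter plot is read. Here x is the QISKit cost and y is the cost of the proposed solution. The points are hypothetical, not data from Fig. 5. Averaging the per-point ratios with an arithmetic mean is also an assumption, since the averaging behind the reported factors is not spelled out here.

```python
import math
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]   # (QISKit cost, proposed cost) for one circuit/architecture

def below_diagonal(p: Point) -> bool:
    """True if the point lies underneath y = x, i.e. the proposed circuit is cheaper."""
    qiskit_cost, proposed_cost = p
    return proposed_cost < qiskit_cost

def distance_to_diagonal(p: Point) -> float:
    """Perpendicular distance of (x, y) to the line y = x."""
    x, y = p
    return abs(x - y) / math.sqrt(2)

def improvement_factor(p: Point) -> float:
    """Relative improvement: QISKit cost divided by the cost of the proposed solution."""
    qiskit_cost, proposed_cost = p
    return qiskit_cost / proposed_cost

# Hypothetical points, not data from the paper.
points: List[Point] = [(300.0, 200.0), (450.0, 300.0), (120.0, 100.0)]
print(all(below_diagonal(p) for p in points))        # True
print(mean(improvement_factor(p) for p in points))   # approximately 1.4
```

The distance to the diagonal grows with the absolute cost saving. The improvement factor measures the relative saving, which keeps small and large circuits comparable.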
As can be seen in Fig. 5a, circuits compiled by the proposed
methodology may even be cheaper than the original circuit (despite
the fact that SWAP gates are added during the compilation process).
This is possible since, in some cases, two SU(4) gates are applied
consecutively to the same two qubits. Using our post-mapping
optimization (cf. Section 4.3), these gates can be combined into a
single gate from SU(4). Overall, we achieve an average improvement
by a factor of 1.54 compared to IBM’s own solution for the 5-qubit
architectures. For the 16- and 20-qubit architectures, the
probability that two subsequent SU(4) gates are applied to the same
qubits is almost zero. Although this does not allow as much
post-mapping optimization as for the 5-qubit architectures, we still
observe significant improvements by factors of 1.26 and 1.22 on
average, respectively. The precise improvements for each architecture
are listed in Table 1.
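The merging step can be sketched as follows. For illustration only, we assume the mapped circuit is a list of two-qubit blocks ((qa, qb), U), where U is a 4x4 matrix over the ordered pair (qa, qb). This representation and the function name are hypothetical and are not the interface of the published implementation. A later block is folded into an earlier one only if two conditions hold: both act on the same ordered pair, and no other block touches either qubit in between. Because the product of two SU(4) matrices is again in SU(4), the result is a single gate from SU(4), and such a gate can be realized with at most three CNOTs [23]. Blocks on the reversed pair (qb, qa) would additionally require conjugation with a SWAP matrix, which this sketch omits. CNOT matrices serve below as simple 4x4 unitary stand-ins.

```python
from typing import Dict, List, Tuple

Matrix = List[List[complex]]
Block = Tuple[Tuple[int, int], Matrix]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def merge_consecutive_blocks(blocks: List[Block]) -> List[Block]:
    merged: List[list] = []      # entries [pair, matrix] in circuit order
    last: Dict[int, int] = {}    # qubit -> index of the last merged entry acting on it
    for pair, u in blocks:
        qa, qb = pair
        i = last.get(qa)
        if i is not None and last.get(qb) == i and merged[i][0] == pair:
            # Nothing touched qa or qb since merged[i]: apply u after it (u @ previous).
            merged[i][1] = matmul(u, merged[i][1])
        else:
            merged.append([pair, u])
            last[qa] = last[qb] = len(merged) - 1
    return [(p, m) for p, m in merged]

CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# The block on (2, 3) does not touch qubits 0 or 1, so both blocks on (0, 1) merge.
print(len(merge_consecutive_blocks([((0, 1), CX), ((2, 3), CX), ((0, 1), CX)])))  # 2
# The block on (1, 2) separates them, so nothing is merged.
print(len(merge_consecutive_blocks([((0, 1), CX), ((1, 2), CX), ((0, 1), CX)])))  # 3
```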
Besides the average improvement in terms of the provided cost
function, the proposed method is also significantly faster than IBM’s
solution. While IBM’s solution requires more than 200 seconds for
mapping some of the circuits composed of 20 qubits, the proposed
method was able to map each of the circuits within 10 seconds. On
average, we obtain an improvement of the runtime by a factor of
5.68, 16.42, and 21.90 for the architectures with 5, 16, and 20 qubits,
respectively (cf. Table 1).
Overall, the evaluation using the scripts, circuits, and coupling
maps provided by IBM shows that the dedicated compilation methodology
proposed in this paper significantly outperforms IBM’s own solution
regarding the provided cost function (which estimates fidelity) as
well as runtime. Moreover, the solution proposed in this paper has
been declared winner of the QISKit Developer Challenge. According to
IBM, it yields compiled circuits with at least 10% lower costs than
the other submissions while generating them at least 6 times faster.
9Note that a single qubit gate U(0, 0, λ) has cost 0 since no pulse is applied to the
respective qubit in this case.
Figure 5: Cost of the compiled circuits for (a) 5-qubit, (b) 16-qubit, and (c) 20-qubit architectures
Table 1: Average improvement factors
Qubits  Architecture        Cost   Runtime
5       IBM QX2             1.55    5.96
5       IBM QX4             1.54    5.84
5       Regular Linear      1.58    5.40
5       Random Linear       1.57    5.43
5       Random Circle       1.49    5.81
5       avg                 1.54    5.68
16      IBM QX3             1.25   14.85
16      IBM QX5             1.23   12.87
16      Random Linear       1.19   11.52
16      Random Rectangle    1.24   20.99
16      Defect Rectangle    1.39   25.80
16      avg                 1.26   16.42
20      Random Linear       1.15   14.64
20      Regular Circle      1.19   14.25
20      Regular Rectangle   1.24   32.67
20      Random Rectangle    1.27   33.95
20      Defect Rectangle    1.25   21.78
20      avg                 1.22   21.90
6 CONCLUSIONS
In this paper, we presented a dedicated method for compiling
circuits composed of SU(4) gates to IBM QX architectures. By using
a preprocessing step that groups the gates in order to reduce the
complexity, a mapping algorithm based on an A* search with a
look-ahead scheme, and a dedicated post-mapping optimization, we
were able to overcome the shortcomings of previously proposed
approaches. Our evaluation using tools provided by IBM clearly shows
that the proposed approach significantly outperforms the compiler
available in IBM’s SDK QISKit regarding both a cost function that
estimates the fidelity of the compiled circuit and runtime.
Moreover, it has been declared winner of the QISKit Developer
Challenge. An implementation is publicly available at
http://iic.jku.at/eda/research/ibm_qx_mapping.
ACKNOWLEDGMENTS
This work has partially been supported by the European Union
through the COST Action IC1405.
REFERENCES
[1] IBM Q. hps://www.research.ibm.com/ibm-q/. Accessed: 2017-09-15.
[2] IBM QX backend information. hps://github.com/QISKit/
ibmqx-backend-information. Accessed: 2017-09-15.
[3] IBM QX Devices. hps://quantumexperience.ng.bluemix.net/qx/devices. Ac-
cessed: 2018-06-27.
[4] QISKit Developer Challenge. hps://qx-awards.mybluemix.net/
#qiskitDeveloperChallengeAward. Accessed: 2018-06-27.
[5] QISKit Python SDK. hps://github.com/QISKit/qiskit-sdk-py. Accessed: 2017-
09-15.
[6] A. J. Abhari, A. Faruque, M. J. Dousti, L. Svec, O. Catu, A. Chakrabarti, C.-F. Chiang, S. Vanderwilt, J. Black, and F. Chong. Scaffold: Quantum programming language. Technical report, Princeton Univ NJ Dept of Computer Science, 2012.
[7] M. Amy, D. Maslov, M. Mosca, and M. Roetteler. A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. on CAD of Integrated Circuits and Systems, 32(6):818–830, 2013.
[8] A. Barenco, C. H. Bennett, R. Cleve, D. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. Smolin, and H. Weinfurter. Elementary gates for quantum computation. Physical Review A, 52:3457–3467, 1995.
[9] D. Bhattacharjee and A. Chattopadhyay. Depth-optimal quantum circuit placement for arbitrary topologies. arXiv preprint arXiv:1703.08540, 2017.
[10] P. O. Boykin, T. Mor, M. Pulver, V. Roychowdhury, and F. Vatan. A new universal and fault-tolerant quantum basis. Information Processing Letters, 75(3):101–107, 2000.
[11] R. Courtland. Google aims for quantum computing supremacy. IEEE Spectrum,
54(6):9–10, 2017.
[12] A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambea. Open quantum
assembly language. arXiv preprint arXiv:1707.03429, 2017.
[13] A. G. Fowler, S. J. Devi, and L. C. Hollenberg. Implementation of Shor’s
algorithm on a linear nearest neighbour qubit array. arXiv preprint quant-
ph/0402196, 2004.
[14] L. Gomes. antum computing: Both here and not here. IEEE Spectrum April
2018.
[15] A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and B. Valiron. ipper: a
scalable quantum programming language. In Conf. on Programming Language
Design and Implementation, pages 333–342, 2013.
[16] J. Hsu. CES 2018: Intel’s 49-qubit chip shoots for quantum supremacy. IEEE
Spectrum Tech Talk, 2018.
[17] K. Matsumoto and K. Amano. Representation of quantum circuits with Cliord
and pi /8 gates. arXiv preprint arXiv:0806.3834, 2008.
[18] D. M. Miller, R. Wille, and Z. Sasanian. Elementary quantum gate realizations
for multiple-control Toolli gates. In Int’l Symp. on Multi-Valued Logic, pages
288–293, 2011.
[19] J. Preskill. antum computing in the NISQ era and beyond. arXiv preprint
arXiv:1801.00862, 2018.
[20] E. A. Sete, W. J. Zeng, and C. T. Rigei. A functional architecture for scalable
quantum computing. In Int’l Conf. on Rebooting Computing (ICRC), pages 1–6,
2016.
[21] A. Shafaei, M. Saeedi, and M. Pedram. bit placement to minimize communi-
cation overhead in 2d quantum architectures. In Asia and South Pacic Design
Automation Conf., pages 495–500, 2014.
[22] M. Siraichi, V. F. Dos Santos, S. Collange, and F. M. Q. Pereira. bit alloca-
tion. In CGO 2018-IEEE/ACM International Symposium on Code Generation and
Optimization, pages 1–12, 2018.
[23] F. Vatan and C. Williams. Optimal quantum circuits for general two-qubit gates.
Physical Review A, 69(3):032315, 2004.
[24] D. Venturelli, M. Do, E. Rieel, and J. Frank. Compiling quantum circuits to
realistic hardware architectures using temporal planners. antum Science and
Technology, 3(2):025004, 2018.
[25] R. Wille, O. Keszocze, M. Walter, P. Rohrs, A. Chaopadhyay, and R. Drechsler.
Look-ahead schemes for nearest neighbor optimization of 1d and 2d quantum
circuits. In Asia and South Pacic Design Automation Conf., pages 292–297, 2016.
[26] R. Wille, A. Lye, and R. Drechsler. Exact reordering of circuit lines for nearest
neighbor quantum architectures. IEEE Trans. on CAD of Integrated Circuits and
Systems, 33(12):1818–1831, 2014.
[27] R. Wille, M. Soeken, C. Oerstedt, and R. Drechsler. Improving the mapping of
reversible circuits to quantum circuits using multiple target lines. In Asia and
South Pacic Design Automation Conf., pages 85–92, 2013.
[28] A. Zulehner, S. Gasser, and R. Wille. Exact global reordering for nearest neighbor
quantum circuits using A∗ . In Int’l Conf. of Reversible Computation, pages 185–
201. Springer, 2017.
[29] A. Zulehner, A. Paler, and R. Wille. An ecient methodology for mapping
quantum circuits to the IBM QX architectures. IEEE Trans. on CAD of Integrated
Circuits and Systems, 2018.
