TIGER: Topology-aware Assignment using Ising machines Application to
  Classical Algorithm Tasks and Quantum Circuit Gates by Butko, Anastasiia et al.
TIGER: Topology-aware Assignment using Ising machines
Application to Classical Algorithm Tasks and Quantum Circuit Gates
Anastasiia Butko · Ilyas Turimbetov · George Michelogiannakis · David
Donofrio · Didem Unat · John Shalf
September 20, 2020
Abstract Optimally mapping a parallel application to
compute and communication resources is increasingly
important as both system size and heterogeneity in-
crease. A similar mapping problem exists in gate-based
quantum computing where the objective is to map tasks
to gates in a topology-aware fashion. This is an NP-
complete graph isomorphism problem, and existing task
assignment approaches are either heuristic or based on
physical optimization algorithms, providing different
speed and solution quality trade-offs. Ising machines
such as quantum and digital annealers have recently
become available and offer an alternative hardware
solution for this type of optimization problem. In
this paper, we propose an algorithm that allows solv-
ing the topology-aware assignment problem using Ising
machines. We demonstrate the algorithm on two use
cases, i.e. classical task scheduling and quantum circuit
gate scheduling. TIGER—topology-aware task/gate as-
signment mapper tool—implements our proposed al-
gorithms and automatically integrates them into the
quantum software environment. To address the
limitations of the physical solver, we propose and implement
a domain-specific partition strategy that allows solving
larger-scale problems and a weight optimization
algorithm that allows tuning Ising model parameters to
achieve better results. We use D-Wave’s quantum
annealer to demonstrate our algorithm and evaluate the
proposed tool flow in terms of performance, partition
A. Butko · G. Michelogiannakis · D. Donofrio · J. Shalf
Lawrence Berkeley National Laboratory
Berkeley CA 94720, USA E-mail:
{abutko,mihelog,ddonofrio,jshalf}@lbl.gov
I. Turimbetov · D. Unat
Koç University, Istanbul 34450, Turkey
E-mail: {iturimbetov18,dunat}@ku.edu.tr
efficiency, and solution quality. Results show significant
speed-up compared to classical solutions, better scala-
bility, and higher solution quality when using TIGER
together with the proposed partition method. It reduces
the data movement cost by 68% on average for quantum
circuit assignment compared to the IBM QX
optimizer [15].
Keywords Topology-aware task assignment · gate
scheduling optimization · Ising machine · quantum
annealing.
1 Introduction
The task assignment problem aims to maximize appli-
cation performance by balancing computational load
among multiple and often heterogeneous processing
units while reducing compute overhead. The task as-
signment problem has been shown to be equivalent to
a graph isomorphism problem by Bokhari [1], which
is known to be NP-complete [20,13]. Therefore, many
solvers for this problem are either heuristics [31], which
inevitably trade off solution quality for computation
speed, or physical optimization algorithms, such as
simulated annealing [34], genetic techniques [25], and
others. In
addition, solvers can have different optimization met-
rics that are often contradictory, such as computa-
tional load, communication cost, or a weighted com-
bination [29,4].
Scheduling quantum gates onto physical qubits is
similarly a challenging problem, given the complexity
and variety of quantum operations and physical restric-
tions of each quantum chip. To keep operations efficient,
quantum gates should be scheduled on quantum hard-
ware such as to minimize the number of operations and
maximize quantum circuit fidelity (how much quantum
arXiv:2009.10151v1 [cs.ET] 21 Sep 2020
information is preserved), while taking into account
the connectivity between physical qubits [10]. Conse-
quently, many mapping algorithms scale poorly due to
runtime, memory usage, and the quality of their gen-
erated solutions [21]. In addition, the quality of their
solutions compared to the theoretical optimal is un-
known [35]. These challenges indicate that gate assign-
ment may hinder high-quality solutions on future quan-
tum accelerators with more physical qubits and com-
plex connectivity.
While genetic algorithms and simulated annealing
are often considered best practices, recent Ising ma-
chines offer an alternative hardware solution for a set of
optimization problems, such as task scheduling. These
Ising machines can be implemented using different
technologies and exploit various physical effects.
Examples include coherent Ising machines [37], Fujitsu’s
digital annealer [9], and quantum annealers designed
by D-Wave Systems Inc. [16]. Several studies of quantum
annealers [22,19] explore their capabilities and
limitations, projecting the potential of these machines for
future use.
Despite the potential benefits offered by quantum
annealers and the growing interest in alternative
solutions, the practical applicability of annealing
machines remains highly questionable. One reason is the
physical limitations of current machines, namely the
relatively small size of the chip and the poor
connectivity between qubits [19]. Problem sizes demonstrated
in comparison studies are usually not competitive with
those handled by classical solvers. Therefore, effective
problem partitioning and post-processing are required
to keep exploiting quantum solver capabilities while
a solution to the physical limitations is sought [38]. This
makes most near-term quantum annealing-based
approaches classical-quantum hybrids.
Another obstacle to widespread quantum annealer
adoption is programming complexity. The programming
model is based on Quadratic Unconstrained Binary
Optimization (QUBO) [12], which differs from
conventional programming and requires special approaches.
At the highest level of abstraction, users program
D-Wave with a “virtual” QUBO, where “virtual” means
that the compiler takes care of mapping and routing the
problem while taking device connectivity into account.
Transforming a problem into QUBO format is not a
trivial task. Higher-level tools as well as efficient
algorithms are typically required [27].
In this work, we present the Topology-aware task as-
sIGnment mappER (TIGER) to solve the assignment
problem using Ising machines. Namely, our contribu-
tions are:
– We develop an algorithm to assign a
Task-Communication Graph (TCG) to the architecture
units while minimizing the required data movement
and maximizing performance. The assignment
problem is expressed in the QUBO format to be
used by an Ising machine.
– We develop an algorithm to assign a Quantum Circuit
Graph (QCG) to the qubits while minimizing data
movement (number of SWAP operations) and maximizing
the fidelity. The assignment problem is expressed in
the QUBO format to be used by an Ising machine.
– We develop a domain-specific QUBO partitioning
algorithm (sub-QUBO) based on the graph depen-
dency levels to overcome current physical limita-
tions of existing quantum annealers and accelerate
the solution search.
– We develop a weight optimization algorithm (WOA)
to tune Ising equation parameters in order to priori-
tize target metrics and adjust them to obtain better
solutions.
– We implement these algorithms as a TIGER tool.
TIGER is written in Python and uses the NetworkX
package [7] to create and manipulate TCG/QCG
and ARC structures.
– We integrate TIGER into the D-Wave tool-flow by
supporting the qbsolv qubo [2] and qmasm [26] formats and
creating a feedback loop from D-Wave to TIGER in
order to evaluate the solution for further optimizations.
– We evaluate the proposed algorithms and their
implementation using a D-Wave quantum annealer. We
compare the D-Wave solver performance and the quality
of the task assignment (solution) to the classical
TABU-search algorithm. We evaluate the quality of
the quantum circuit assignment in terms of circuit
fidelity using real IBM systems [15] and compare
it against the IBM QX gate optimizer. Our results
show that TIGER with the D-Wave annealer improves
computation cost by up to 8% and communication cost
by up to 25% compared to the classical TABU-search
solver when assigning a TCG. It reduces the data
movement cost by 68% on average for quantum circuit
assignment compared to the IBM QX optimizer [15].
Given the relatively small size of the evaluated
quantum annealer, we leave the discussion on general
competitiveness of quantum annealers against classical
computing out of the scope of this paper. Our results
aim to provide useful insights on the entire tool-flow
including classical decomposition, domain-specific par-
tition and QUBO solvers. Last but not least, we would
like to extend an invitation to the community to use
TIGER and then contribute back to aid tool growth.
Latest updates, documentation, and support can be
found online 1.
The rest of the paper is organized as follows: Sec-
tion 2 provides the background on the existing Ising
machines. Section 3 and Section 4 describe the pro-
posed task assignment and quantum gate assignment
mapping approaches, respectively. Section 5 describes
TIGER tool implementation as well as its integration
into the complete tool-flow with the D-Wave program-
ming environment. Section 6 shows performance, qual-
ity, sensitivity and scalability evaluation results. Section
7 concludes the work.
2 Background
Ising machines are special-purpose processors that solve
the Ising model, an intensely studied NP-complete
problem defined on a system of interacting classical
spins [5]. An Ising model is a mathematical model
composed of a large lattice of sites, where each site can be
in one of two states. The model can be used to capture
the impact on the global state of the system caused by
changes to parameters (such as connectivity and desired
operations). Ising models have been used to express and
perform computation with different physical systems such
as lasers and magnets, but are also the basis of several
quantum accelerators because they are a natural fit to
express a graph of interconnected qubits.
2.0.1 Quantum annealers
Quantum annealing [18] is a metaheuristic technique
for solving local search problems, such as finding the
global minimum or maximum in a discrete search space.
Quantum annealing offers potential benefits compared
to popular heuristic algorithms through its quantum
tunneling effect. This effect allows the system to pen-
etrate energy barriers escaping from the local minima
and therefore find better solutions to the original opti-
mization problem.
A quantum annealing machine or a quantum an-
nealer is a hardware implementation of the adiabatic
quantum computing algorithm. Quantum annealers
operate on a set of qubits. A qubit is a two-state
quantum-mechanical system that can be in state |0〉 or |1〉,
or in a linear superposition of these two “basis states”.
This feature forms the key power of quantum machines:
n qubits can be in an arbitrary superposition of up to 2^n
different states simultaneously. Another inherent
quantum property of qubits is quantum entanglement, where
a group of qubits is coupled in such a
way that the state of each qubit cannot be described
separately, but only as part of the whole system state [24].

1 https://github.com/lbnlcomputerarch/tiger
Quantum annealers provided by D-Wave Systems
Inc. have been commercially available since 2011 [16].
D-Wave quantum chips are implemented using
superconducting technology and require an extremely isolated
environment with a temperature close to absolute zero.
A closed-cycle dilution refrigerator cools the processor
down to 15 mK. Therefore, while the actual quantum
chip is the size of a stamp, the physical volume
of the whole D-Wave system reaches 20 m^3. However,
D-Wave machines consume less than 25 kW of power,
mostly for cooling and front-end servers [17]. In around
10 years, quantum annealing chips have reached the order
of 10^3 qubits, promising significant performance
improvement for certain computing problems in the
near future. Physically, qubits are connected to each
other using a so-called Chimera topology. The smallest
Chimera unit is a complete bipartite graph
of eight vertices, each of which is connected to its four
neighbours inside the unit and to its two neighbours
outside the unit.
In [6], the authors compare the performance of a physical
quantum annealer (the D-Wave 2X quantum annealer) to
simulated annealing and quantum Monte Carlo methods
executed on a classical processor.
Furthermore, the authors of [22] extend the Google Inc.
studies by comparing quantum annealing to state-of-the-art
optimization methods and introducing more sophisticated
assessment metrics. Their work considers four
categories of optimization methods: sequential methods
that include quantum annealing, simulated annealing
and quantum Monte Carlo, tailored methods
that solve simplified optimization problems, and
non-tailored methods that are generic and thus represent the
state of the art. The authors conclude that physical quantum
annealing scales better than the other sequential
optimization methods, but is outperformed by
tailored as well as non-tailored state-of-the-art methods.
The authors also emphasize the importance of determining
the application domain where quantum annealing
maximizes its benefits, but this has yet to be defined.
Finally, King et al. [19] introduce a problem class
that can maximize the usefulness of the quantum tunneling
effect. The authors again compare quantum annealers to
classical solvers and demonstrate a speed-up of three to
four orders of magnitude in favor of quantum
annealing.
Several studies demonstrate the use of quantum
annealing for task scheduling. In [32], the authors introduce
a hybrid quantum-classical approach to solving
scheduling problems. Their framework integrates quantum
annealing with classical computing into a guided
tree search. Classical algorithms manage a global tree
search and communicate the node search in QUBO format
to the quantum annealer. The authors test the proposed
framework on three scheduling problems, i.e.
graph coloring, Mars lander task scheduling, and airport
runway scheduling. Results show that the quantum
annealer’s output can effectively prune and guide
the search process. The authors motivate their work by the
need to expand the capabilities of current quantum
annealers and do not expect quantum annealers to be
competitive with classical computers in the near term.

[Figure 1 panels: a) QAP mapping on QUBO (4 tasks, 4 PUs); b) TCG mapping on QUBO (5 tasks, 2 PUs); c) TCG partitioning and mapping on QUBO (11 tasks, 4 PUs; sub-graphs SG1 to SG3 mapped to Sub-QUBO1 to Sub-QUBO3, with input edges).]
Fig. 1: Task Communication Graph (TCG) assignment on a heterogeneous multi-PU system: problem mapping
on QUBO.

In our work, we address a different scheduling
problem, i.e. topology-aware assignment. The proposed
TIGER framework extends existing software environ-
ments by automatically generating and dynamically ad-
justing QUBO files. We evaluate the tool flow in terms
of quantum solver performance, the quality of task/gate
assignment and discuss the potential scalability of near-
term machines.
2.1 Problem formulation and programming
Quantum annealers minimize the QUBO problem de-
scribed by Equation 1. The equation describes the evo-
lution of the time-dependent Hamiltonian [14] that aims
to find low-energy states in a system of N interacting
spins, i.e. qubits. In Equation 1, q_i represents qubits
that take values from the set {0, 1}, h_i is a weight
coefficient associated with each qubit, J_ij denotes the
strength of the coupling between two qubits q_i and
q_j, and N is the number of qubits.
\[
E(q_1, \dots, q_N) = \sum_{i=1}^{N} h_i \cdot q_i + \sum_{1 \le i < j \le N} J_{ij} \cdot q_i \cdot q_j \qquad (1)
\]
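As a minimal sketch of how Equation 1 is evaluated, the following Python snippet computes the energy of a binary assignment and finds the minimum by exhaustive search. The function names `qubo_energy` and `brute_force_minimum` and the toy coefficients are illustrative assumptions and not part of TIGER; exhaustive search only stands in for the annealer on very small instances. Indices are zero-based in the code, whereas Equation 1 counts from 1.

```python
from itertools import product

def qubo_energy(q, h, J):
    """Energy of Equation 1 for a tuple q of 0/1 values.

    h is a list of per-qubit weights, J maps index pairs (i, j), i < j,
    to coupling strengths.
    """
    linear = sum(h[i] * q[i] for i in range(len(q)))
    quadratic = sum(w * q[i] * q[j] for (i, j), w in J.items())
    return linear + quadratic

def brute_force_minimum(h, J):
    """Enumerate all 2^N assignments and return (best_q, best_energy)."""
    best = min(product((0, 1), repeat=len(h)),
               key=lambda q: qubo_energy(q, h, J))
    return best, qubo_energy(best, h, J)

# Toy instance (assumed values): each qubit is rewarded for being 1,
# but neighbouring qubits 0-1 and 1-2 are penalized when both are 1.
h = [-1.0, -1.0, -1.0]
J = {(0, 1): 2.0, (1, 2): 2.0}
best_q, best_e = brute_force_minimum(h, J)
print(best_q, best_e)
```

The search space doubles with every added qubit, which is the reason dedicated Ising hardware or classical heuristics are used for realistic problem sizes.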
D-Wave annealer architectural designs impose a
number of limitations on Equation 1. Notably, chips do
not support all-to-all qubit connectivity. Thus, to cou-
ple two qubits located on different sides of the Chimera
grid, excessive routing through other qubits is required.
That dramatically cuts the number of available qubits
to be purely used for problem solving. Another limita-
tion concerns qubit weights and coupler strengths that
lie in a specific range, i.e. [-2;2] and [-1;1] respectively,
affecting the precision of the machine.
A low-level D-Wave program is expressed in the
form of Equation 1 as a list of h_i and J_ij values with the
associated qubit numbers. The provided solution is a list of
q_i values. This program is usually referred to as a Quantum
Machine Instruction (QMI). At this level, all previously
listed constraints, such as qubit connectivity, variable
range, and the number of physically available
qubits, have to be taken into account. That makes D-Wave
programming a challenging task. However, there
are several tools that provide a certain level of abstraction
by taking as input a so-called “virtual” QUBO that
abstracts away the size and connectivity topology of the
D-Wave system and by mapping the problem onto the physical
hardware using different optimization techniques.
[Figure 2: one block per dependency level of the task-communication graph, each showing the qubit sub-matrix with solution values, the computation task placement on the four processing units, and the resulting communication (source unit, destination unit, inter-unit communication, local communication).]
Fig. 2: Binary solution interpretation: computation task assignment and communication impact.
3 Task Assignment Mapping Algorithm
3.1 Linear assignment problem
In the task allocation context, the Linear Assignment
Problem (LAP) consists of placing a set of independent
tasks onto a set of Processing Units (PUs), with each
assignment incurring a certain cost. The objective is to
assign each task to a PU such that the total cost is
minimized [3].
Figure 1(a) illustrates the transformation of the
LAP to QUBO. The qubit matrix Q represents the
permutation matrix X, where each qubit defines the
assignment of a task to a specific PU, similar to the entry x_ij of X.
An x_ij value of 1 represents that task i was assigned to
PU j. A weight coefficient h_i (not shown) represents the
computational cost of the assignment. Since solvers in
current machines search for minima, we transform positive
computation costs into negative numbers to prevent
the solver from giving all-zero answers. To respect
assignment constraints such as assigning one task to
one PU, we use qubit couplings and give them high
penalty values such that J_ij >> |h_i|. For example, to
prevent task 0 from being placed on multiple PUs, we
couple qubits (q0 · q1), (q0 · q2), (q0 · q3), (q1 · q2), (q1 · q3)
and (q2 · q3) for four PUs. Therefore, if two of these
qubits are set to 1, i.e. the same task is assigned to two
PUs, the large penalty value will make the overall
solution ineligible.
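The LAP-to-QUBO transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TIGER's implementation: the helper names are hypothetical, the qubit index `t * n_pus + p` follows the column-wise numbering of Figure 1(a), and the conversion of positive costs to negative weights is done here by subtracting a constant larger than the maximum cost, which is only one possible choice (the text does not specify the conversion). The second group of penalties, which forbids two tasks on the same PU, is the usual permutation constraint and is likewise assumed.

```python
from itertools import combinations, product

def lap_to_qubo(cost, penalty_scale=10.0):
    """Build (h, J) for a task-by-PU cost matrix cost[t][p]."""
    n_tasks, n_pus = len(cost), len(cost[0])
    qubit = lambda t, p: t * n_pus + p
    # Assumed conversion: shift all costs below zero so that cheaper
    # assignments get more negative weights and all-zero is never optimal.
    offset = max(max(row) for row in cost) + 1
    h = {qubit(t, p): cost[t][p] - offset
         for t in range(n_tasks) for p in range(n_pus)}
    penalty = penalty_scale * max(abs(v) for v in h.values())  # J >> |h|
    J = {}
    for t in range(n_tasks):            # a task must not sit on two PUs
        for p1, p2 in combinations(range(n_pus), 2):
            J[(qubit(t, p1), qubit(t, p2))] = penalty
    for p in range(n_pus):              # a PU must not host two tasks
        for t1, t2 in combinations(range(n_tasks), 2):
            J[(qubit(t1, p), qubit(t2, p))] = penalty
    return h, J

def solve_brute_force(h, J):
    """Exhaustive stand-in for an Ising machine (tiny problems only)."""
    def energy(q):
        return (sum(h[i] * q[i] for i in h)
                + sum(w * q[i] * q[j] for (i, j), w in J.items()))
    return min(product((0, 1), repeat=len(h)), key=energy)

def decode(bits, n_tasks, n_pus):
    """Map the bit string back to a {task: PU} placement."""
    return {t: p for t in range(n_tasks) for p in range(n_pus)
            if bits[t * n_pus + p] == 1}

cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]   # assumed example costs
h, J = lap_to_qubo(cost)
print(decode(solve_brute_force(h, J), 3, 3))
```

With the penalty set well above the largest weight, any bit string that violates a constraint has higher energy than every valid placement, so the minimum corresponds to the cheapest one-to-one assignment.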
3.2 Task-communication graph assignment
Applications can be represented as a weighted directed
acyclic graph, usually referred to as a Task Commu-
nication Graph (TCG). A TCG is defined as a tuple
G = (V, E), where V = (v_i) is a set of weighted vertices
with the weight representing task computational cost,
and E = (e_i,j) is a set of weighted edges with the weight
representing inter-task communication cost. An example
of a TCG is shown in the upper part of Figure 1(b).
Mapping such a TCG into QUBO differs from the
previously shown LAP in three aspects. First, a TCG
includes not only computation cost, but also inter-task
communication cost expressed with graph edges. Second,
not all tasks are assigned to PUs within the same
time frame. A TCG is divided into multiple dependency
levels, each of which represents a LAP. Dependency
levels (groups) are shown with red dashed lines. Third,
within each dependency level, the number of independent
tasks can differ from the number
of available PUs. The QUBO mapping transformation
respects each of the above three constraints.
Communication edges. Each communication edge is
included in the QUBO by qubit coupling. Communication
cost is represented by coupling strength. Total
end-to-end cost is calculated based on the weight of each
edge in the communication path. If both source and
destination tasks are assigned to the same PU, the
communication cost is equal to zero. This is the most favourable
case if the objective is to minimize data movement. For
the example in Figure 1(b), to define the edge between
task 0 and task 1 we couple qubits (q0 · q3) and (q1 · q2)
with the associated topology-aware communication cost
and qubits (q0 · q2) and (q1 · q3) with zero communication
cost. Here, cost values are converted to negative
numbers similar to computation cost values. The relative
priority of communication and computation costs
can be set by adding a weight factor to bias the
solver.
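The coupling pattern for a single communication edge can be sketched as follows. The function `edge_couplings`, the hop-count matrix `hops`, and the weight factor `alpha` are hypothetical names introduced for illustration. For readability the sketch keeps communication costs as positive couplings, so that lower energy means less data movement; the sign conversion mentioned in the text is deliberately left out.

```python
def edge_couplings(src, dst, weight, hops, alpha=1.0):
    """Couplings for one TCG edge src -> dst.

    hops[a][b] is the assumed topology distance between PUs a and b
    (zero on the diagonal), weight is the edge's communication cost and
    alpha biases communication against computation cost.
    Qubit index convention: task * n_pus + pu, as in Figure 1(b).
    """
    n_pus = len(hops)
    J = {}
    for p_src in range(n_pus):
        for p_dst in range(n_pus):
            i = src * n_pus + p_src
            j = dst * n_pus + p_dst
            J[(min(i, j), max(i, j))] = alpha * weight * hops[p_src][p_dst]
    return J

# Edge between task 0 and task 1 on two PUs that are one hop apart.
hops = [[0, 1], [1, 0]]
print(edge_couplings(0, 1, weight=3.0, hops=hops))
```

For this example the pairs (q0, q3) and (q1, q2) receive the communication cost, while (q0, q2) and (q1, q3), which place both tasks on the same PU, receive zero, matching the description above.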
Dependency levels. Because of dependencies, only
a certain number of tasks can be assigned to PUs in
parallel. This relaxes the second assignment constraint
that says that no more than one task can be placed at
a PU. This constraint is valid only for tasks belonging
to the same dependency group. For the example shown
in Figure 1(b), task 0 is separated from task 1 and task
2 with a red dashed line. Thus, we couple only qubits
(q2 · q4) and (q3 · q5) with a high penalty cost to prevent
placing them on the same PU, which would otherwise
be a valid solution for the solver. The first assignment
constraint, that a task cannot be placed on
multiple PUs at the same time, remains unchanged.
Level adjustments. When the number of parallel
tasks exceeds the number of available computing
resources, a decision has to be made on how to prioritize
a set of tasks in the most efficient way. This decision
is reflected in the qubit matrix, i.e. the order of columns
associated with specific tasks and the corresponding
assignment constraint couplings. Multiple approaches exist in
the field, but their study is out of the scope of this
paper. Here, we apply a simple cut based on the task ID
increment. Figure 1(b) illustrates the case in which task
4 belongs to dependency level 1, but is moved to the
next level. In case there are no available slots in the
following group of tasks, an additional level is created.
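Dependency levels and the level adjustment can be sketched together. The snippet below is a greedy variant assumed for illustration, not necessarily the exact rule used in TIGER: tasks are visited in dependency order with ties broken by task ID, each task is placed in the earliest level after all of its predecessors, and a task is pushed to the next level (creating one if needed) when a level already holds as many tasks as there are PUs. The edge list is hypothetical, since the edges of Figure 1(b) are not recoverable from the text.

```python
import heapq
from collections import defaultdict

def assign_levels(n_tasks, edges, n_pus):
    """Return a list of levels, each a list of task IDs (at most n_pus)."""
    preds, succs = defaultdict(list), defaultdict(list)
    indegree = [0] * n_tasks
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indegree[v] += 1
    ready = [t for t in range(n_tasks) if indegree[t] == 0]
    heapq.heapify(ready)                 # lowest task ID first
    level_of, levels = {}, []
    while ready:
        t = heapq.heappop(ready)
        # Earliest level allowed by dependencies.
        lvl = max((level_of[p] + 1 for p in preds[t]), default=0)
        # Level adjustment: skip levels that are already full.
        while lvl < len(levels) and len(levels[lvl]) >= n_pus:
            lvl += 1
        while len(levels) <= lvl:
            levels.append([])
        levels[lvl].append(t)
        level_of[t] = lvl
        for s in succs[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                heapq.heappush(ready, s)
    return levels

# Hypothetical 5-task graph on 2 PUs: tasks 1, 2 and 4 depend on task 0.
edges = [(0, 1), (0, 2), (0, 4), (1, 3), (2, 3)]
print(assign_levels(5, edges, n_pus=2))
```

In this example task 4 would belong to level 1 by its dependencies, but that level is already full with tasks 1 and 2, so it is moved to the following level, as in the case described for Figure 1(b).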
3.3 Domain-specific TCG partition
Given the number of logical qubits together with the
potential number of couplings and constraints per single
problem, we quickly exhaust the physical capabilities
of quantum machines. Therefore, an intelligent problem
partition is required. There has been extensive
research on graph partitioning [30]. In this context, we
apply the method shown in Figure 1(c). This method
divides a TCG into sub-graphs (SGs) based on dependency
levels. The example shown in Figure 1(c) illustrates
partitioning with two and three dependency levels
per sub-QUBO1/2 and sub-QUBO3, respectively.
The finest granularity corresponds to one
dependency level per sub-QUBO. Further division of
the problem would distort the concept of optimal parallel
task assignment. The weakness of such a partitioning
is that only communication edges inside an SG
are considered. Thus, multiple communication edges get
excluded from the problem and are not represented in
the qubit matrix. Excluded edges are labelled with red
crosses in Figure 1(c). This may have a significant impact
on the quality of the provided solution, especially
for communication-intensive applications.
Part of the novelty of our work is improving the
partition by applying an interactive previous-placement-dependent
approach. This approach takes advantage of
dependency level-based partitioning. Sub-QUBOs are
solved one after another, and each previous SG placement
is used to enhance the following sub-QUBOs. Our
mapper extends the qubit matrix with additional virtual
qubits, one per unique source task of all excluded
input edges (edges that are inputs to an SG).
Each virtual qubit is associated with a specific PU because the
previous task placement is already known at this point.
In Figure 1(c), virtual qubits are shown as red crosses
inside the sub-QUBO matrices, and the missed edges
previously shown as crossed out are illustrated with red
arrows.
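One way to picture the virtual qubits is the sketch below. It is an assumption-laden illustration rather than TIGER's code: `add_virtual_qubits`, `hops` and `pin` are hypothetical names, couplings are kept positive so that lower energy means less data movement, and the virtual qubit is held at 1 by a strongly negative weight, which has to exceed the sum of the couplings attached to it.

```python
def add_virtual_qubits(h, J, sg_tasks, n_pus, input_edges, placement,
                       hops, pin=-100.0):
    """Extend a sub-QUBO (h, J) in place with one virtual qubit per
    unique source task of the input edges.

    sg_tasks    : task IDs of the current sub-graph, in column order
    input_edges : (src, dst, weight) with src placed in an earlier SG
    placement   : {src: pu} known from previously solved sub-QUBOs
    hops[a][b]  : assumed distance between PUs a and b
    Returns {src: virtual qubit index}.
    """
    column = {t: k for k, t in enumerate(sg_tasks)}
    next_qubit = len(sg_tasks) * n_pus   # virtual qubits follow real ones
    virtual = {}
    for src, dst, weight in input_edges:
        if src not in virtual:
            virtual[src] = next_qubit
            h[next_qubit] = pin          # keep the virtual qubit at 1
            next_qubit += 1
        v = virtual[src]
        for p in range(n_pus):
            q = column[dst] * n_pus + p
            # Cost of reaching PU p from the PU where src already runs.
            J[(q, v)] = J.get((q, v), 0.0) + weight * hops[placement[src]][p]
    return virtual

h, J = {}, {}
virtual = add_virtual_qubits(
    h, J, sg_tasks=[4, 5], n_pus=2,
    input_edges=[(1, 4, 2.0), (2, 4, 1.0), (1, 5, 3.0)],
    placement={1: 0, 2: 1}, hops=[[0, 1], [1, 0]])
print(virtual, J)
```

Because the source placement is fixed, the same effect could be obtained by folding these terms directly into the weights of the destination qubits; the explicit virtual qubit is kept here only to mirror the description above.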
Our approach guides the solver towards a better so-
lution than is possible with heuristics alone, but does
not guarantee an optimal solution because the output
edges of the sub-graphs are still excluded from the prob-
lem and the future placement is not available at this
point. It should also be emphasized that QUBO mini-
mizes the sum of given costs, which are abstract posi-
tive numbers. Minimizing the sum does not guarantee
that parallel execution time is also minimized, if that
is determined by the slowest task.
3.4 Binary solution interpretation
Figure 2 illustrates the binary solution interpretation
by mapping the example graph from Figure 1(c) on the
four-unit mesh architecture. Each block corresponds to
a dependency level of the task-communication graph. It
contains three illustrative components, i.e. a qubit sub-
matrix with solution values, computation task place-
ment corresponding to the solution and communication
traffic based on the prior task placements. In case both
source and destination tasks are placed on the same
unit, the communication edge is marked as local com-
munication. Local communications do not contribute to
the data movement component of the objective func-
tion and represent the most favourable assignment for
communication cost minimization.
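The interpretation step can be sketched as a small decoder. The helper names and the example edges are hypothetical; the bit string uses the column-wise layout of the qubit sub-matrix (one group of `n_pus` bits per task), and the bit values are chosen for illustration only.

```python
def interpret(bits, tasks, n_pus):
    """Turn a solved sub-QUBO bit string into a {task: PU} placement."""
    placement = {}
    for k, task in enumerate(tasks):
        chosen = [p for p in range(n_pus) if bits[k * n_pus + p] == 1]
        if len(chosen) != 1:
            raise ValueError(f"task {task} is not assigned to exactly one PU")
        placement[task] = chosen[0]
    return placement

def classify_edges(edges, placement):
    """Split edges into local (same PU) and inter-unit communication."""
    local = [(u, v) for u, v in edges if placement[u] == placement[v]]
    inter = [(u, v) for u, v in edges if placement[u] != placement[v]]
    return local, inter

# One dependency level with tasks 1, 2, 3 on four PUs (illustrative bits).
bits = [0, 0, 0, 1,   1, 0, 0, 0,   0, 0, 1, 0]
placement = interpret(bits, tasks=[1, 2, 3], n_pus=4)
placement[0] = 2                      # placement known from the prior level
local, inter = classify_edges([(0, 1), (0, 2), (0, 3)], placement)
print(placement, local, inter)
```

Only the edges returned as inter-unit contribute to the data movement component of the objective.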
3.5 Computation and Communication costs
Computation and communication costs have been pre-
viously discussed as abstract positive numbers. How-
ever, the nature of the cost metric determines whether
the proposed method provides an optimal solution. If
the cost is based on delay and the goal of task assignment
is to minimize time, QUBO minimization will not
provide the optimal placement. This is because QUBO
minimizes the sum of the placement costs in each SG,
which does not guarantee that the execution time of
tasks placed in parallel is minimal. For other metrics, such as
data movement, power consumption, and energy, the proposed
method provides an optimal solution.
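The difference between minimizing a sum and minimizing parallel execution time can be seen on a two-task example. The cost values and the helper `best_assignment` are assumptions made for illustration; the search is exhaustive over one-to-one placements.

```python
from itertools import permutations

def best_assignment(cost, objective):
    """Best one-to-one placement of tasks on PUs under a given objective.

    cost[t][p] is the delay of task t on PU p; objective is sum (what a
    QUBO minimizes) or max (parallel execution time of the slowest task).
    Returns a tuple where entry t is the PU chosen for task t.
    """
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda perm: objective(cost[t][perm[t]] for t in range(n)))

cost = [[1, 5],
        [5, 8]]
print(best_assignment(cost, sum))   # placement with the smallest total delay
print(best_assignment(cost, max))   # placement with the smallest makespan
```

Here the sum-optimal placement has delays 1 and 8 (total 9, slowest task 8), while the placement with delays 5 and 5 has a larger total of 10 but finishes after 5, so the two objectives select different placements.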
4 Gate Assignment Mapping Algorithm
4.1 Quantum Circuits
In the context of gate-based quantum computing, quantum
algorithms are usually represented in the form of
so-called quantum circuits. Figure 3(a) shows an example
of a quantum circuit. To avoid confusion, the qubits
represented in the circuit will be referred to as logical
qubits and the real qubits inside a quantum computer
as physical qubits. Four horizontal lines represent the
evolution of the logical qubit states over time (from left to right).
Single- and two-qubit gates are applied to specific
qubits according to the algorithm's computations. Quantum
circuits can be transformed into a task-communication
graph similar to the classical algorithm transformation.
In this case, quantum gates represent tasks that
have dependencies (black arrows). Figure 3(b) shows
the Quantum Circuit Graph (QCG) in the form of a
TCG. A two-qubit gate becomes two connected tasks
in the QCG. Moreover, two-qubit gates are directional,
i.e. there are source and destination qubits in the pair.

[Figure 3 panels: a) Quantum Circuit (logical qubits [q0] to [q3] with single-qubit and two-qubit gates); b) Quantum Circuit as TCG (task x for a single-qubit gate, tasks x.1 and x.2 for a two-qubit gate); c) Quantum Chip Topologies (IBM Vigo, 5 qubits; IBM QX2, 5 qubits).]
Fig. 3: Quantum circuit graph: gate-to-qubit assignment.

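The circuit-to-graph transformation described above can be sketched as follows. The gate-list representation, the function name `circuit_to_qcg`, and the convention that task x.1 sits on the source qubit and x.2 on the destination qubit are assumptions made for illustration; the exact labelling used in Figure 3(b) may differ.

```python
def circuit_to_qcg(gates):
    """Convert a gate list into QCG nodes and dependency edges.

    gates is a list of (name, qubits) in circuit order, where qubits is
    a 1-tuple for single-qubit gates or (source, destination) for
    two-qubit gates. Gate k becomes task "k", or tasks "k.1" and "k.2".
    """
    nodes, edges = [], []
    last_on_line = {}                    # logical qubit -> latest task
    for gate_id, (_name, qubits) in enumerate(gates):
        if len(qubits) == 1:
            tasks = [str(gate_id)]
        else:
            tasks = [f"{gate_id}.1", f"{gate_id}.2"]
            edges.append((tasks[0], tasks[1]))   # source -> destination
        for task, qubit in zip(tasks, qubits):
            nodes.append(task)
            if qubit in last_on_line:            # dependency along the line
                edges.append((last_on_line[qubit], task))
            last_on_line[qubit] = task
    return nodes, edges

gates = [("H", (0,)), ("CNOT", (0, 1)), ("X", (1,))]
nodes, edges = circuit_to_qcg(gates)
print(nodes)
print(edges)
```

The resulting node and edge lists can be loaded into a graph library such as NetworkX and then handled like a classical TCG.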
Topology-aware quantum gate assignment is based
on physical qubit connectivity inside the quantum chip.
Figure 3(c) shows examples of 5-qubit chip connectivity.
Arrows show not only the connection between
two physical qubits, but also the supported direction for
two-qubit gates. Because of the limited connectivity
between qubits, not all two-qubit gates can be directly
applied. For example, consider a circuit where a two-qubit
gate is applied to logical qubits 0 and 3, and the
circuit is mapped to the architecture in Figure 3(c).
There are two ways to make the gate applicable. The first
is to map the logical qubits to physical qubits in a different
order such that logical qubits 0 and 3 are mapped to physical
qubits 0 and 2. The other is to swap the underlying logical
qubit states if they are already mapped to the architecture
in the same order. For instance, if the states of
qubits 2 and 3 are swapped, physical qubit 2 now
contains the state of logical qubit 3, making
it possible to apply the desired two-qubit gate.
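A rough way to estimate how many SWAPs a two-qubit gate needs is the shortest-path distance between the two physical qubits. The sketch below assumes that a distance of d hops costs d - 1 SWAPs, ignores gate direction, and uses a hypothetical undirected five-qubit coupling list in which qubits 0 and 2 as well as 2 and 3 are connected but 0 and 3 are not, consistent with the example in the text. It is not the SWAP model used by TIGER or by any IBM compiler.

```python
from collections import deque

def swaps_needed(coupling, a, b):
    """SWAPs required to bring physical qubits a and b next to each other,
    estimated as (shortest-path length - 1) over undirected couplings."""
    neighbours = {}
    for u, v in coupling:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    distance = {a: 0}
    queue = deque([a])
    while queue:                         # breadth-first search from a
        u = queue.popleft()
        if u == b:
            return max(distance[u] - 1, 0)
        for v in neighbours.get(u, ()):
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    raise ValueError("qubits are not connected")

# Hypothetical 5-qubit coupling list (assumption, undirected).
coupling = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
print(swaps_needed(coupling, 0, 2))   # directly coupled
print(swaps_needed(coupling, 0, 3))   # one intermediate qubit
```

With this layout a gate on physical qubits 0 and 3 needs one SWAP (for instance between qubits 2 and 3), whereas a gate on qubits 0 and 2 can be applied directly.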
4.2 Fidelity and SWAP operation costs
Unlike a classical assignment optimization problem that
minimizes computation and communication costs (de-
scribed in Section 3.5), in quantum gate assignment op-
timization we target different metrics. One of the most
important parameters for quantum computations in the
NISQ era is fidelity. Circuit fidelity is a measure of how
much quantum information is preserved [23]. Due to
noise, the experimentally obtained output qubit state
differs from the desired output qubit state that
would have been obtained in the ideal scenario. The
number of gates directly affects circuit fidelity: each
additional noisy gate lowers it.
Typically, in the case of superconducting technology,
single-qubit gates have higher fidelity than two-qubit
gates, which require significantly more effort to tune
and improve. Each physical qubit is unique in its properties
and has a different fidelity per gate. The fidelity
resulting from mapping logical qubits and their corresponding
gates to the underlying architecture’s physical
qubits will be referred to as fidelity_mapping.
There are several types of two-qubit gates. A SWAP
gate swaps the states of two qubits. A SWAP gate
is usually decomposed into a sequence of three CNOT
two-qubit gates. CNOT belongs to the so-called native
set of gates that is supported by the control hardware
and quantum chip technology. The need for the SWAP
operation is dictated by the nature of quantum computation:
it is not possible to make a copy of a qubit state (no-cloning
theorem [28,36]). A SWAP gate is used to move
a qubit state to the right location. Thus, the number
of SWAP operations N_swaps is analogous to the data
movement (communication) cost of the classical TCG.
Quantum state movement is required
to satisfy chip connectivity, and this movement comes at
a cost, because two-qubit gates are the main source of
infidelity in quantum circuits. The reduction in fidelity
resulting from the insertion of SWAP gates, each having
fidelity fidelity_swap, will be referred to as fidelity_movement.
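\[
\begin{aligned}
\mathrm{fidelity}_{\mathrm{movement}} &= \left(\mathrm{fidelity}_{\mathrm{swap}}\right)^{N_{\mathrm{swaps}}} \\
\mathrm{fidelity}_{\mathrm{total}} &= \mathrm{fidelity}_{\mathrm{mapping}} \cdot \mathrm{fidelity}_{\mathrm{movement}}
\end{aligned}
\qquad (2)
\]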
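Equation 2 can be checked with a short numerical sketch. The function name and the fidelity values are assumed for illustration and are not measurements from any device.

```python
def total_fidelity(fidelity_mapping, fidelity_swap, n_swaps):
    """Equation 2: mapping fidelity scaled by the SWAP movement penalty."""
    fidelity_movement = fidelity_swap ** n_swaps
    return fidelity_mapping * fidelity_movement

# Two candidate assignments (assumed numbers): A uses better qubits but
# needs two SWAPs, B uses slightly worse qubits and needs none.
candidate_a = total_fidelity(0.90, fidelity_swap=0.95, n_swaps=2)
candidate_b = total_fidelity(0.85, fidelity_swap=0.95, n_swaps=0)
print(candidate_a, candidate_b)
```

With these numbers candidate A reaches about 0.812 and candidate B 0.85, so the assignment with the lower mapping fidelity wins once SWAP overhead is included. This is why both terms have to be optimized together.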
Since two-qubit gates have lower fidelity, quantum
gate assignment optimization can be formulated as
Nswaps minimization. However, in order to obtain the
best total fidelity for the quantum circuit both of the
optimization parameters need to be taken into account,
i.e. gate mapping fidelity and minimum number of
SWAPs. That makes the optimization problem almost
Fig. 4: Topology-aware task assignment using TIGER and quantum annealing. [Flow diagram: problem input (TCG, ARC) from external modelling tools; TIGER QUBO mapper; QMI interface (.qubo, .qmasm, sub-*); decomposer (qbsolv); solver (D-Wave, TABU search); Mapping-to-Metric; mapping solutions returned to TIGER; branches for size < limit value and size > limit value; steps numbered 1 to 8.]
identical to the classical topology-aware task assign-
ment on extremely heterogeneous architectures, where
fidelitymapping represents computation performance to
be maximized and where Nswaps represents the com-
munication cost to be minimized. Equation 2 shows
how optimization of these two metrics can be refor-
mulated as total fidelity fidelitytotal maximization. A
large number of recent studies target the total circuit
fidelity maximization [8]. However, they solve the opti-
mization problem of the circuit gate decomposition and
assignment to minimize the number of gates, especially
SWAP gates, without consideration of fidelitymapping.
4.3 Weight Optimization Algorithm
Ising machine weights allow us to vary the priority
of each optimization metric. By scaling the
weights associated with SWAP minimization, either
the qubit fidelity or the SWAP reduction can be
prioritized. To scale the weights, a priority coefficient pref
is introduced. To arrive at the optimal solution, either
in terms of the resulting number of inserted SWAP gates
or in terms of gate fidelity, we propose an optimization
algorithm. It searches for the coefficient value that
maximizes fidelitytotal. Since fidelitytotal is obtained from
fidelitymapping and fidelitymovement, the algorithm can
also find a solution with maximum fidelitymapping or
minimum qubit movement. Due to the infidelity of SWAP
gates, a solution with minimum Nswaps should
correspond to the maximum fidelitytotal solution. However,
in a hypothetical fully-connected architecture where
the qubit movement constraint is eliminated, fidelitymapping
would correspond to fidelitytotal. In such a scenario it
would be practical to maximize only the mapping fidelity.
Optimizing only the fidelitymapping or Nswaps metric can
also give an estimate of the bounds of these metrics
when no optimal solution is known beforehand.
Moreover, the proposed optimization algorithm can be
suitable when a specific computation-to-communication
ratio must be maintained in task assignment, for
example. The pseudocode is given in Algorithm 1.
The search starts with an initial preference
coefficient, gets the corresponding metric value,
for example fidelitytotal, and compares it to solutions
obtained with a larger and a smaller coefficient. The search
space range is defined by the parameter sSpr.
How fast the algorithm converges is defined by the
parameter sRed, which reduces the search space at every
step. For better local search space exploitation, lines 6-17
can be repeated with sSpr = √sSpr.
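The search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the TIGER implementation: `evaluate` is a hypothetical callable standing in for a call such as tiger(QCG, ARC, pref), the probes are assumed to stay centred on the initial coefficient, and the toy metric at the end is invented purely to exercise the loop.

```python
from typing import Callable, Tuple


def optimize_preference(
    evaluate: Callable[[float], float],
    pref_init: float = 0.05,
    s_spr: float = 2.0,
    s_red: float = 0.9,
) -> Tuple[float, float]:
    """Return (best_metric, best_pref) following the structure of Algorithm 1."""
    pref_best = pref_init
    metric_best = evaluate(pref_init)
    while s_spr > 1.0:
        # Probe one coefficient below and one above the initial coefficient.
        for candidate in (pref_init / s_spr, pref_init * s_spr):
            metric = evaluate(candidate)
            if metric > metric_best:
                metric_best, pref_best = metric, candidate
        s_spr *= s_red  # shrink the search spread at every step
    return metric_best, pref_best


# Hypothetical metric with a single peak at pref = 0.04 (illustration only).
def toy_metric(pref: float) -> float:
    return -abs(pref - 0.04)


best_metric, best_pref = optimize_preference(toy_metric)
print(best_pref, best_metric)
```

With the default parameters the loop runs a fixed, small number of steps, so the cost of the search is a handful of solver invocations; a smaller sRed would terminate sooner at the price of a coarser sweep.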
5 TIGER
5.1 D-Wave programming environment
Qbsolv [2] is an open source decomposing solver that
focuses on large-scale problems that do not fit into phys-
ical hardware. In addition to the D-Wave annealer in-
terface, qbsolv has an embedded classical solver that
implements the tabu search algorithm [11] to minimize
the QUBO problem. Qmasm [26] is a quantum macro
assembler that provides extra flexibility for program-
ming. A qmasm program can be run on both D-Wave
annealers and qbsolv classical solvers.
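To make concrete what "minimizing the QUBO problem" means, the sketch below exhaustively minimizes a two-variable QUBO in plain Python. It does not use qbsolv, qmasm, or any D-Wave API; the dictionary layout of `Q`, the helper names, and the toy costs are assumptions made for illustration, and brute force is only feasible for a handful of variables.

```python
from itertools import product


def qubo_energy(Q, x):
    """Energy x^T Q x for a binary vector x; Q maps (i, j) index pairs to weights."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())


def brute_force_minimum(Q, n):
    """Enumerate all 2**n binary vectors and return the one with the lowest energy."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))


# Toy QUBO: pick exactly one of two options with costs 3 and 1.
# The penalty P * (x0 + x1 - 1)**2 expands, for binary variables, to
# -P*x0 - P*x1 + 2*P*x0*x1 plus a constant that does not affect the minimum.
P = 10
Q = {(0, 0): 3 - P, (1, 1): 1 - P, (0, 1): 2 * P}

best = brute_force_minimum(Q, 2)
print(best, qubo_energy(Q, best))
```

A tabu search or an annealer explores the same energy landscape heuristically instead of enumerating it, which is what makes larger problems tractable.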
5.2 TIGER tool flow
Figure 4 shows the tool flow for the task/gate as-
signment problem optimization. The key component of
the flow is our proposed TIGER tool. TIGER is an
Algorithm 1: Preference coefficient optimization
Data: QCG, ARC
Result: fidelitybest, prefbest
1 sSpr = 2 // search spread, sets the search space range
2 sRed = 0.9 // spread reduction, reduces sSpr at every step for convergence
3 pref = prefbest = 0.05 // initial preference coefficient
4 fidelitybest = tiger(QCG, ARC, pref)
5 while sSpr > 1 do
6   prefleft = pref / sSpr
7   prefright = pref ∗ sSpr
8   fidelityleft = tiger(QCG, ARC, prefleft)
9   fidelityright = tiger(QCG, ARC, prefright)
10   if fidelityleft > fidelitybest then
11     fidelitybest = fidelityleft
12     prefbest = prefleft
13   end
14   if fidelityright > fidelitybest then
15     fidelitybest = fidelityright
16     prefbest = prefright
17   end
18   sSpr = sSpr ∗ sRed
19 end
open-source QUBO mapper written in Python. It uses
the NetworkX Python package [7] to create and manipulate
the TCG/QCG and ARC structures, i.e. to compute
the computation and communication costs for classical
problems and the fidelity and SWAP costs for quantum
problems, taking into account the hardware (architecture)
topology. We demonstrate TIGER on the D-Wave
machine.
TIGER receives two files as inputs (marked as red
‘1’ to denote step 1), namely TCG or QCG and ARC
(architecture). The TCG file describes the classical
application, the QCG file describes the quantum
algorithm, and the ARC file describes the architecture
(hardware topology). The format of these files is
presented in Figure 5 (a) and (b). The TCG file consists
of two types of lines, describing application tasks and
edges. Task lines contain a task ID and multiple cost
values, each of a different type, e.g. the number of
integer, floating-point, and memory-access instructions.
Edge lines contain an edge ID, source and destination
task IDs, and a cost value, e.g. the amount of data in
bytes to be transferred between two tasks. The
architecture file describes the architecture topology
and its details, such as the number of rows
and columns, the number of PUs, and the capabilities of
each PU and link, such as cost per type of instruction,
link throughput, etc.
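The TCG file layout described above can be read with a few lines of standard-library Python. This is a minimal sketch, not TIGER's own reader: it assumes whitespace-separated integer fields, header lines beginning with the words "task" or "edge", and dashed separator lines, as suggested by Figure 5 (a); the function name `parse_tcg` and the sample text are hypothetical.

```python
def parse_tcg(text):
    """Parse a TCG description into (tasks, edges) dictionaries."""
    tasks, edges, section = {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and dashed separators
        fields = line.split()
        if fields[0] in ("task", "edge"):
            section = fields[0]  # a header line switches the current section
            continue
        values = [int(v) for v in fields]
        if section == "task":
            tasks[values[0]] = values[1:]   # task ID -> list of cost values
        elif section == "edge":
            _, src, dst, cost = values      # the edge ID itself is not kept
            edges[(src, dst)] = cost        # (source, destination) -> cost
    return tasks, edges


# Hypothetical sample in the assumed layout.
example = """\
task ID cost1 cost2 cost3
--------------------------
0 0 10 10
1 0 20 20
edge ID task_1 task_2 cost
--------------------------
0 0 1 2
"""

tasks, edges = parse_tcg(example)
print(tasks, edges)
```

The resulting dictionaries map directly onto node and edge attributes of a graph library, so the task costs and edge costs can be attached to a graph before the QUBO weights are computed.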
Using the algorithm described in Section 3, TIGER
maps input TCG and ARC files into the QUBO format
and generates the QMI interface file (step ‘2’). It sup-
ports both qmasm and qubo formats and can generate
a single file per problem or multiple files in case the
QUBO partitioning option is chosen. If the size of the
a) Application TCG file format

task ID   [cost1]   [cost2]   [cost3]
--------------------------------------
0         0         10        10
1         0         20        20
…

edge ID   task_1    task_2    [cost]
--------------------------------------
0         0         1         2
1         0         2         2
…

b) Architecture ARC file format

Parameter   IDs     Value
-----------------------------
Topology            MESH
NumRows             2
NumCols             2
NumPUs              4
PU.         0..3    1,2,2,4
Link        0..3    2,2,2,2
…

Fig. 5: Topology-aware task assignment problem input.
Table 1: Benchmark suite

Workload     Problem size   Tasks #   Edges #
Ultrasound   9x5x10         15        15
RS-encoder   32x28x8        141       140
RS-decoder   32x28x8        526       789
problem is less than the physical limit value, i.e. the qubit
sub-matrix size, the QUBO or sub-QUBO can be solved
directly (step ‘3’). Otherwise, it has to be further
decomposed by qbsolv and then solved (step ‘4’). In both
cases the problem is solved by one of two available solvers: the
D-Wave annealer or the TABU search qbsolv
implementation (step ‘5’).
Finally, the solver generates mapping solutions that
are sent back to the TIGER tool. If the solution
corresponds to a sub-QUBO (step ‘7’), it is used by TIGER
to generate the next sub-QUBO as described in
Section 3.2. If the solution is complete (step ‘6’) or the
last sub-QUBO problem is solved, TIGER calculates
the final cost of the assignment through its
Mapping-to-Metric (MtoM) interface (step ‘8’). This cost can be
used to estimate the quality of the solution.
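The size check in steps ‘3’ and ‘4’ can be pictured as a small dispatch routine. The sketch below is an assumption-laden illustration rather than TIGER or qbsolv code: `solve` and `decompose` are hypothetical callables, the limit value of 46 is arbitrary, and the toy decomposer simply cuts the variable count into chunks, which is far simpler than what qbsolv actually does.

```python
def solve_qubo(problem_size, limit, solve, decompose):
    """Route a (sub-)QUBO either directly to the solver or through a decomposer.

    `solve` and `decompose` are hypothetical stand-ins for the solver back end
    (annealer or tabu search) and for the decomposing step.
    """
    if problem_size < limit:
        return [solve(problem_size)]  # step 3: the problem fits, solve directly
    # step 4: too large, split into pieces and solve each piece
    return [solve(part) for part in decompose(problem_size, limit)]


def chunk(size, limit):
    """Toy decomposer: split `size` variables into chunks smaller than `limit`."""
    step = limit - 1
    return [min(step, size - start) for start in range(0, size, step)]


# The identity "solver" just reports the size of each piece it receives.
calls = solve_qubo(100, 46, solve=lambda n: n, decompose=chunk)
print(calls)
```

Each element of the returned list corresponds to one solver access, which is why the number of partitions produced by the decomposer drives the solver-phase delay.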
6 Results
6.1 Experimental setup
Experiments are conducted on a hybrid classical-
quantum system that consists of an Intel Core i7 run-
ning at 3.3 GHz with 16 GB 2133 MHz LPDDR3 and
a D-Wave 2X (DW2X) quantum annealer [16] that has
1152 qubits and 2400 couplers.
For classical TCG assignment optimization, we use
three workload TCGs from the COSMIC benchmark
set [33]. These three workloads were chosen because
they differ in problem size, number of tasks,
and number of edges. A detailed analysis and
classification of the application graphs in the context of the
Ising model evaluation could provide additional insights.
Such a study is out of the scope of this paper. The
TCG files are provided by an external modelling tool, i.e.
Fig. 6: Delay-to-Solution evaluation: (I) - classical TABU-search solver w/o TIGER sQ partition, (II) - quantum
DW solver w/o TIGER sQ partition and (III) - quantum DW solver with TIGER sQ partition.
the COSMIC benchmark suite [33]. Table 1 shows the
set of chosen benchmarks and their characteristics.
For quantum QCG assignment optimization, we cre-
ate the QCG files formatted for TIGER from the quan-
tum benchmark suite [39]. We create ARC files based
on two IBM quantum chips [15]: IBM Yorktown (QX2)
with 5 qubits and IBM Vigo with 5 qubits. Figure 3 (c)
illustrates these two topologies. The quantum
benchmark suite [39] provides 48 circuits for 5-qubit chips.
We reduce the circuit size down to 50 gates.
6.2 Tool flow evaluation
For each workload we evaluate three scenarios: (I)
TIGER QUBO mapper - qbsolv decomposer/TABU-search
qbsolv solver - TIGER MtoM interpreter, (II)
TIGER QUBO mapper - qbsolv decomposer/DW
solver - TIGER MtoM interpreter and (III) TIGER
QUBO mapper/TIGER SG partitioner - qbsolv
decomposer/TABU-search qbsolv solver - TIGER MtoM
interpreter. For each scenario, we vary the size of the
architecture: a 2×2 PU mesh, a 4×4 PU mesh, and an
8×8 PU mesh.
Figure 6 shows the evaluation results. Here, we report
the delay normalized to the total delay of the longest
case. Each delay is also broken down into its four major
components. In all cases, the longest scenario is the
one fully executed on a classical computer, i.e. scenario
I. In addition, we show the number of logical qubits
and couplers generated by TIGER’s mapper (qubits #
and couplers #), the number of partitions provided by
qbsolv’s decomposer (partitions #), and the number of
SGs generated by TIGER’s partitioner (tiger sQs #).
The number of qubits in scenarios I and II is equal,
but it is higher in scenario III because additional qubits
are required to define previous sub-QUBO placements,
as shown in Figure 1. Similarly, the number of couplers
and the number of partitions are equal in scenarios I and II.
Both are lower in scenario III due to the optimized
QUBO mapping. The number of TIGER sub-QUBOs
is reported only for scenario III. In scenarios I and II
this TIGER option is not applied (na).
Discussion: Performance evaluation results show
that the physical quantum annealer, i.e. DW2X, can
significantly reduce delay-to-solution compared to the
classical qbsolv solver. For the given set of benchmarks
and architecture configurations, the performance
speedup of the DECOMPOSER-SOLVER phase varies
between 1.2× and 10.2×. The major portion of this
improvement is caused by replacing the classical
solver with the quantum annealer. The average
DW2X access time is around 20 ms. This time
includes programming time, sampling time and
post-processing time. The sampling phase consists of multiple
sample batches, each of which includes annealing,
readout, and an additional delay that allows the quantum
annealer to cool down to the initial state. The annealing
time is 20 µs. Although the QUBO is solved by a physical
quantum annealer, a significant amount of time
is spent by the qbsolv DECOMPOSER on problem
decomposition. The total D-Wave SOLVER
phase is composed of multiple D-Wave accesses, where
the number of accesses is determined by the number of
partitioned calls provided by the qbsolv DECOMPOSER.
Therefore, when the quantum annealing solver is used,
the delay-to-solution depends heavily on the quality
of the classical decomposition.
In scenario III, we evaluate the impact of the
domain-specific partitioning integrated into the QUBO
mapper, i.e. TIGER level partitioner. Here, reported
values represent the sum of all sub-QUBO parameters
concerning the total number of qubits and couplers as
well as delays per phase. Results show that by applying
two-level QUBO partitioning (i.e. domain-specific first
and classical qbsolv second), a massive speedup in time-
to-solution can be achieved. For the given set of TCGs
and ARCs, the DECOMPOSER-SOLVER phase is re-
duced down to 6% compared to the baseline scenario.
(a) Ultrasound-9x5x10
(b) Reed-Solomon Encoder-32x28x8
(c) Reed-Solomon Decoder-32x28x8
Fig. 7: Task assignment sensitivity and quality of the solution. (DW, single): DW w/o sQ vs. classical TABU-
search w/o sQ, (qbsolv, sQ): classical TABU-search with sQ vs. classical TABU-search w/o sQ and (DW, sQ):
DW with sQ vs. classical TABU-search w/o sQ.
Such an improvement has several sources. First, the TIGER
partition significantly simplifies the task for the qbsolv
DECOMPOSER, which performs better on a smaller subset
of qubits and coupler tasks than on a single large
problem. Consequently, qbsolv generates fewer partition
calls, thereby reducing the D-Wave SOLVER phase
delay. This effect is particularly noticeable for larger
TCGs, where the number of partitions is halved.
The total number of qubits and couplers is also different
compared to the baseline. By applying the minimum
number of qubits possible and adjusting the level
of granularity (i.e. one sub-level per sub-QUBO), we
reduce the total number of couplers. These improvements
are achieved at the expense of having a larger number
of qubits. This increase is 12% on average compared to
the baseline. On the other hand, additional partitioning
can potentially impact the quality of the generated
solution. This effect is evaluated in the following section.
6.3 Task assignment evaluation
We evaluate the assignment quality and multiple-run
sensitivity in three comparison scenarios: (i) single
QUBO on quantum annealer versus classical qbsolv
solver (DW, single), (ii) partitioned sub-QUBOs versus
single QUBO assignment on classical qbsolv solver
(qbsolv, sQ), and (iii) partitioned sub-QUBOs on quantum
annealer versus single QUBO assignment on classical
qbsolv solver (DW, sQ). Architecture configuration
files represent a 2×2, 4×4, or 8×8 heterogeneous PU
MESH with an abstract PU acceleration factor varied
from 1× to 4×. Link cost is equal to 2. Figure 7 shows
the difference in computation, communication and to-
tal costs for the three evaluation scenarios compared to
the baseline.
Discussion: In some cases, we obtain the same
solution over multiple runs. If different solutions are
returned, the variation is usually within 5% of the
mean value. For the given set of experiments, the DW2X
quantum solver provides solution improvements for a
single QUBO compared to the classical TABU-search
solver. Results show an improvement of up to 8% in
computation cost, up to 25% in communication cost,
and up to 15% in total cost. Both the qbsolv
sQ and DW sQ scenarios show similar behaviour
in most experiments. However, the DW2X quantum
solver again provides better solutions, e.g. RS-Encoder
Fig. 8: IBM Vigo: mapping fidelity, number of swaps
and total fidelity.
mapped on 2×2 MESH and RS-Decoder mapped on
2×2 MESH.
Qbsolv sQ and DW sQ evaluations show that
dependency-level partitioning can indeed significantly
impact assignment quality, namely its communication
constituent. This impact increases as the architecture
size scales. The MtoM communication difference rises from
35% to 45% and then to almost 4× for the US TCG
mapped on the 2×2, 4×4 and 8×8 architectures shown in
Figure 7(a). Similarly, it changes from -2% to 6% and
then to 60% for the RS Encoder TCG, as shown in
Figure 7(b). However, the computation constituent does
not deteriorate. In both TCGs, task computation costs
far outweigh communication edge costs. For instance, the US
computation cost ranges between 4,510 and 3,461,112,
while the highest communication cost is 20, 60 and 140 for
the 2×2, 4×4, and 8×8 MESHes, respectively. Therefore, the
calculated edge weights and associated qubit couplings
have a low impact on the total QUBO cost. Indeed, the
total MtoM difference follows the behavior of the
computational cost, e.g. DW sQ in Figure 7(b.1) or qbsolv sQ
in Figure 7(a.2). By prioritizing the edge cost over the
task cost, the communication MtoM difference can be
significantly reduced. In contrast, the RS Decoder TCG is
communication intensive. The computation cost reaches
up to 1,880, while the communication cost reaches
14,280 for the 8×8 MESH. In this case, the DW sQ partition
does not degrade the solution quality; instead, it improves it
by up to 15%.
6.4 Gate assignment evaluation
To estimate how scaling the
weights correlates with Nswaps and fidelitymapping,
a set of experiments was performed on the QCGs
mentioned in Section 6.1.2. The preference coefficient varies
from 0.01 to 30. Figure 8 shows the mapping fidelity
(fidelitymapping), number of swaps (Nswaps) and total
fidelity for different coefficient values. Smaller
coefficients minimize qubit state movement, while larger ones
prioritize mapping fidelity instead. The black box shows a
near-optimum region of the priority coefficient. Using
a priority coefficient smaller than 0.05 results in
invalid solutions being produced by the algorithm and
can even have the opposite effect, increasing Nswaps
instead. Setting the coefficient larger than 20 provides
only a small improvement of fidelitymapping; this
improvement occurs only on some architectures and incurs
a disproportionate number of additional SWAPs. Hence,
the applicable coefficient values that produce the
minimum Nswaps and
maximum fidelitymapping are approximately 0.05 and
20, respectively. Total fidelity strongly correlates with
the number of SWAPs, and mapping fidelity plays a
negligible role in this scenario.
Discussion: Since fidelitymovement, which is determined by
Nswaps, has the larger impact on fidelitytotal, usually
Nswaps is minimized and gate fidelity is not considered.
This means that the priority coefficient that maximizes
fidelitytotal is the same one that minimizes Nswaps,
i.e. 0.05. However, as connectivity in quantum computing
architectures increases, qubit movement might
become less significant. In such a context, maximization
of fidelitytotal would depend entirely on
fidelitymapping.
6.4.1 Weight optimization algorithm evaluation
To tackle any possible scenario, fidelitytotal can be
maximized regardless of connectivity and gate fidelity. The
priority coefficient that allows such a maximization is
unknown and can vary for every circuit and
architecture. We study the proposed weight optimization
algorithm (WOA) to assess its efficiency in finding the
optimal priority coefficient for a combination of quantum
circuit and device topology.
Figure 9 shows the total fidelity and number of SWAPs
optimization using the WOA for multiple circuits
on the IBM Vigo and IBM QX2 topologies. The results
include the initial value at the beginning of the algorithm
execution and the final value. For the IBM Vigo topology
(results in Figure 9 (a) and (b)), the WOA finds a
priority coefficient that reduces the number of SWAPs
from the initial step value in 62.5% of cases. In 37.5%
of cases the number of SWAPs remains unchanged. The
results with strong reduction are highlighted in green.
On average, the WOA improves total fidelity by 39% for
the IBM Vigo topology. For the IBM QX2 topology (results in
Figure 9 (c) and (d)), the WOA finds a priority
coefficient that reduces the number of SWAPs from the
initial step value in 83.3% of cases. In one case the
number of SWAPs remains unchanged, and in 14.6% of cases
the WOA yields a slight increase in the number of SWAPs.
On average, the WOA improves total fidelity by 107% for the IBM
QX2 topology.
(a) Vigo: Fidelity (b) Vigo: Number of SWAPs
(c) QX2: Fidelity (d) QX2: Number of SWAPs
Fig. 9: Quantum gate assignment: weight optimization algorithm search
Discussion: The results show a significant difference
in WOA performance across topologies.
While in general the WOA allowed us to find a more
suitable combination of QUBO weights (preference
coefficient) for both topologies, the IBM QX2 mapping is
much more sensitive to the choice of priority coefficient.
Moreover, in a few cases the WOA missed the optimal solution,
which resulted in a slight increase in the number of SWAPs
compared to the initial state value. We believe that the
reason lies in the complexity of the topology graph, which
calls for QUBO weight adjustments to find the
most suitable combination in a near-optimum region.
6.4.2 Comparison
Finally, we compare the performance of the TIGER
topology-aware SWAP optimizer against the IBM QX
optimizer. Figure 10 shows the comparison results
across multiple circuits for two topologies, i.e. vigo and
qx2. The numbers show the final number of SWAPs.
The SWAP reduction color map highlights the cases
in which one of the optimizers provides a better result, with
the SWAP number differences grouped as follows: (i) 1-2 SWAPs,
(ii) 3-4 SWAPs, (iii) 5-7 SWAPs or (iv) more than 7.
For the vigo topology, TIGER and IBM QX provide the
same SWAP number in 18.7% of cases; IBM QX
outperforms TIGER in 41.7% of cases with a total reduction
difference of 51 SWAPs; and TIGER outperforms IBM
QX in 39.6% of cases with a total reduction difference
of 59 SWAPs. For the qx2 topology, TIGER and IBM
QX provide the same SWAP number in only 4.2% of cases;
IBM QX outperforms TIGER in 8.3% of cases with a
total reduction difference of 12 SWAPs; and TIGER
significantly outperforms IBM QX in 87.5% of cases with
a total reduction difference of 260 SWAPs. Moreover,
TIGER found the perfect mapping, reducing the data
movement to 0 SWAPs, in 16.7% of cases, while IBM
QX found the perfect mapping in only 4.2% of cases.
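The win, loss, and tie statistics quoted above can be recomputed from the per-circuit SWAP counts. The following sketch does this in plain Python; the four lists are transcribed from Figure 10 in circuit order, while the helper name `compare` and the dictionary keys are illustrative choices rather than part of TIGER.

```python
def compare(tiger, ibm):
    """Return tie/win counts and summed SWAP differences for two result lists."""
    ties = sum(t == i for t, i in zip(tiger, ibm))
    tiger_wins = [i - t for t, i in zip(tiger, ibm) if t < i]
    ibm_wins = [t - i for t, i in zip(tiger, ibm) if i < t]
    return {
        "ties": ties,
        "tiger_wins": len(tiger_wins), "tiger_gain": sum(tiger_wins),
        "ibm_wins": len(ibm_wins), "ibm_gain": sum(ibm_wins),
    }


# Final SWAP counts per circuit, transcribed from Figure 10.
VIGO_TIGER = [16, 18, 8, 6, 4, 14, 14, 17, 18, 18, 20, 13, 18, 11, 5, 4,
              15, 10, 16, 16, 18, 17, 11, 13, 12, 18, 14, 13, 19, 11, 16, 11,
              15, 15, 17, 19, 14, 15, 17, 7, 17, 10, 18, 18, 18, 18, 19, 16]
VIGO_IBM = [16, 16, 12, 9, 6, 21, 21, 17, 15, 15, 19, 21, 16, 12, 5, 8,
            15, 12, 15, 17, 21, 18, 12, 12, 12, 15, 17, 9, 16, 12, 16, 12,
            14, 14, 14, 13, 18, 20, 13, 7, 17, 10, 16, 13, 16, 15, 18, 17]
QX2_TIGER = [0, 2, 1, 0, 0, 5, 5, 4, 2, 7, 3, 4, 3, 1, 0, 0,
             4, 2, 0, 4, 7, 3, 1, 1, 1, 2, 4, 2, 4, 1, 0, 1,
             4, 3, 3, 0, 4, 3, 1, 0, 4, 2, 3, 1, 3, 4, 4, 3]
QX2_IBM = [10, 6, 4, 1, 0, 0, 0, 11, 11, 6, 9, 11, 10, 6, 5, 5,
           13, 6, 10, 11, 6, 12, 5, 5, 5, 10, 7, 2, 8, 5, 10, 5,
           5, 10, 10, 13, 6, 11, 10, 8, 8, 9, 6, 10, 10, 11, 16, 6]

for name, tiger, ibm in (("vigo", VIGO_TIGER, VIGO_IBM), ("qx2", QX2_TIGER, QX2_IBM)):
    stats = compare(tiger, ibm)
    shares = {k: round(100 * stats[k] / len(tiger), 1)
              for k in ("ties", "tiger_wins", "ibm_wins")}
    print(name, stats, shares)
```

Dividing each count by the 48 circuits gives the percentages reported in the text, and the two gain totals correspond to the total reduction differences.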
Discussion: Similar to the WOA evaluation results
(see Section 6.4.1), the comparison results differ
significantly between topologies.
TIGER allowed us to significantly improve the mapping
for the IBM QX2 topology compared to the IBM QX
optimizer. We believe that the reason also lies in the
topology graph complexity. The classical IBM QX optimizer is
not suitable for more complex topologies with a larger
number of potential combinations, while the TIGER
optimizer allows us to find the ‘perfect’ mapping regardless.
Final number of SWAPs per circuit, by topology and optimizer:

Circuit                 vigo TIGER   vigo IBM QX   qx2 TIGER   qx2 IBM QX
4_49_16                 16           16            0           10
4gt10-v1_81             18           16            2           6
4gt11_82                8            12            1           4
4gt11_83                6            9             0           1
4gt11_83                4            6             0           0
4gt13_90                14           21            5           0
4gt13_91                14           21            5           0
4gt13_92                17           17            4           11
4gt13-v1_93             18           15            2           11
4gt5_75                 18           15            7           6
4gt5_76                 20           19            3           9
4gt5_77                 13           21            4           11
4mod5-v0_18             18           16            3           10
4mod5-v0_19             11           12            1           6
4mod5-v0_20             5            5             0           5
4mod5-v1_22             4            8             0           5
4mod5-v1_23             15           15            4           13
4mod5-v1_24             10           12            2           6
4mod7-v0_94             16           15            0           10
4mod7-v1_96             16           17            4           11
aj-e11_165              18           21            7           6
alu-v0_26               17           18            3           12
alu-v0_27               11           12            1           5
alu-v1_28               13           12            1           5
alu-v1_29               12           12            1           5
alu-v2_31               18           15            2           10
alu-v2_32               14           17            4           7
alu-v2_33               13           9             2           2
alu-v3_34               19           16            4           8
alu-v3_35               11           12            1           5
alu-v4_36               16           16            0           10
alu-v4_37               11           12            1           5
decod24-bdd_294         15           14            4           5
decod24-v1_41           15           14            3           10
decod24-v3_45           17           14            3           10
hwb4_49                 19           13            0           13
mini-alu_167            14           18            4           6
mod10_171               15           20            3           11
mod10_176               17           13            1           10
mod5d1_63               7            7             0           8
mod5d2_64               17           17            4           8
mod5mils_65             10           10            2           9
one-two-three-v0_97     18           16            3           6
one-two-three-v0_98     18           13            1           10
one-two-three-v1_99     18           16            3           10
one-two-three-v2_100    18           15            4           11
one-two-three-v3_101    19           18            4           16
rd32_270                16           17            3           6

SWAP Reduction Difference (Color Map): 1-2, 3-4, 5-7, >7
Fig. 10: Optimizer comparison: TIGER vs. IBM QX
7 Conclusions
In this paper, we propose an algorithm for solving
the topology-aware task/gate assignment problem on
physical Ising machines in order to accelerate and
improve the quality of the solution to this challeng-
ing NP-complete problem. We implement our solu-
tion in our TIGER tool that transforms weighted
task-communication, quantum circuit, and architecture
graphs into an appropriate format of the Hamiltonian
function. Our solution takes into account both compu-
tation and communication costs for the classical prob-
lem or fidelity and SWAP number for the quantum
problem. We evaluate the proposed approach using D-
Wave’s quantum annealer. In order to overcome exist-
ing physical limitations of current quantum annealers,
we propose domain-specific partitioning based on the
task-communication graph dependency levels. We also
propose a weight optimization algorithm that enables
adjusting the model parameters to find better solutions.
We integrate TIGER into the D-Wave software stack,
which enables us to apply both our proposed
dependency-level partitioning and the partitioning provided by
the qbsolv tool in a dynamic iterative way. We
demonstrate that our method can reach 15% higher-quality
solutions 9% faster compared to the classical qbsolv
heuristic algorithm. Finally, TIGER reduces the data
movement cost by 68% on average for quantum circuit
assignment compared to the IBM QX optimizer [15].
Our work alleviates the concern that task mapping may
hinder high-quality solutions on future quantum
accelerators with more physical qubits and complex
connectivity. The TIGER tool is publicly available online at
https://github.com/lbnlcomputerarch/tiger.
For future work, we consider three major directions:
– Comparison to a wide range of classical
scheduling tools: we plan to design a methodology
to compare the hardware optimizer, i.e. the Ising
machine, to existing heuristic software tools.
– Use other Ising machines: we plan to expand our
study running the problem on other Ising machines,
such as digital annealer [9] and coherent Ising ma-
chine [37].
– Problem partitioning algorithms and additional
constraint mapping: we plan to evaluate
additional graph partitioning algorithms and alternative
problem mapping algorithms, e.g. assigning
multiple tasks to one node based on the node capacity.
Acknowledgements The research leading to these results
has received funding from the U.S. Department of Energy,
grant agreement no DE-AC02-05CH11231.
References
1. Bokhari, S.H.: On the mapping problem. IEEE Trans-
actions on Computers C-30(3), 207–214 (1981). DOI
10.1109/TC.1981.1675756
2. Booth, M., Reinhardt, S.P., Roy, A.: Partitioning opti-
mization problems for hybrid classical/quantum execu-
tion. Tech. rep. (2017)
3. Burkard, R., Dell’Amico, M., Martello, S.: Assignment
Problems. Society for Industrial and Applied Mathemat-
ics, PA, USA (2009)
4. Chan, C.P., Bachan, J.D., Kenny, J.P., Wilke, J.J., Beck-
ner, V.E., Almgren, A.S., Bell, J.B.: Topology-aware per-
formance optimization and modeling of adaptive mesh
refinement codes for exascale. In: 2016 First Interna-
tional Workshop on Communication Optimizations in
HPC (COMHPC), pp. 17–28 (2016)
5. Daskalakis, C., Dikkala, N., Kamath, G.: Testing ising
models. IEEE Transactions on Information Theory pp.
1–1 (2019). DOI 10.1109/TIT.2019.2932255
6. Denchev, V.S., Boixo, S., Isakov, S.V., Ding, N., Bab-
bush, R., Smelyanskiy, V., Martinis, J., Neven, H.: What
is the computational value of finite-range tunneling?
Phys. Rev. X 6, 031015 (2016). URL https://link.aps.
org/doi/10.1103/PhysRevX.6.031015
7. developers, N.: Networkx. software for complex networks.
https://networkx.github.io
8. Dueck, G.W., Pathak, A., Mazder Rahman, M.,
Shukla, A., Banerjee, A.: Optimization of Circuits for
IBM’s five-qubit Quantum Computers. arXiv e-prints
arXiv:1810.00129 (2018)
9. Fujitsu: Digital annealer.
http://www.fujitsu.com/global/digitalannealer/ (2019)
10. Giacomo Guerreschi, G., Park, J.: Two-step approach to
scheduling quantum circuits. Quantum Science and Tech-
nology 3(4), 045003 (2018). DOI 10.1088/2058-9565/
aacf0b
11. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic
Publishers, Norwell, MA, USA (1997)
12. Glover, F.W., Kochenberger, G.A.: A tutorial on formu-
lating qubo models. ArXiv abs/1811.11538 (2018)
13. Hoefler, T., Snir, M.: Generic topology mapping strate-
gies for large-scale parallel architectures. In: Proceedings
of the International Conference on Supercomputing, ICS
’11, pp. 75–84. ACM, New York, NY, USA (2011). URL
http://doi.acm.org/10.1145/1995896.1995909
14. Hwang, F.K.: The hamiltonian property of linear func-
tions. Oper. Res. Lett. 6(3), 125–127 (1987). DOI 10.
1016/0167-6377(87)90024-1. URL http://dx.doi.org/
10.1016/0167-6377(87)90024-1
15. IBM: Ibm q systems. https://www.ibm.com/quantum-
computing/technology/systems/
16. Inc., D.W.S.: D-wave. the quantum computing company.
https://www.dwavesys.com
17. Inc., D.W.S.: The d-wave 2x quantum computer. tech-
nology overview. Tech. rep. (2015)
18. Kadowaki, T., Nishimori, H.: Quantum annealing in the
transverse ising model. Phys. Rev. E p. 5355 (1998)
19. King, J., Yarkoni, S., Raymond, J., Ozfidan, I., King,
A.D., Nevisi, M.M., Hilton, J.P., McGeoch, C.C.: Quan-
tum Annealing amid Local Ruggedness and Global Frus-
tration. ArXiv e-prints (2017)
20. Lee, S.Y., Aggarwal, J.K.: A mapping strategy for par-
allel processing. IEEE Transactions on Computers C-
36(4), 433–442 (1987)
21. Li, G., Ding, Y., Xie, Y.: Tackling the Qubit Mapping
Problem for NISQ-Era Quantum Devices. arXiv e-prints
arXiv:1809.02573 (2018)
22. Mandrà, S., Zhu, Z., Wang, W., Perdomo-Ortiz, A.,
Katzgraber, H.G.: Strengths and weaknesses of weak-strong
cluster problems: A detailed overview of state-of-the-art
classical heuristics versus quantum approaches. ArXiv
e-prints 94(2), 022337 (2016)
23. Markov, I.L., Fatima, A., Isakov, S.V., Boixo, S.: Quan-
tum Supremacy Is Both Closer and Farther than It Ap-
pears. arXiv e-prints arXiv:1807.10749 (2018)
24. Nielsen, M.A., Chuang, I.L.: Quantum Computation and
Quantum Information: 10th Anniversary Edition, 10th
edn. Cambridge University Press, New York, NY, USA
(2011)
25. Orduna, J.M., Silla, F., Duato, J.: A new task mapping
technique for communication-aware scheduling strate-
gies. In: Proceedings International Conference on Parallel
Processing Workshops, pp. 349–354 (2001)
26. Pakin, S.: A quantum macro assembler. In: 2016
IEEE High Performance Extreme Computing Conference
(HPEC), pp. 1–8 (2016)
27. Pakin, S., Reinhardt, S.P.: A survey of programming
tools for d-wave quantum-annealing processors. In:
R. Yokota, M. Weiland, D. Keyes, C. Trinitis (eds.) High
Performance Computing, pp. 103–122. Springer Interna-
tional Publishing, Cham (2018)
28. Park, J.L.: The concept of transition in quantum mechan-
ics. Foundations of Physics 1(1), 23–33 (1970). DOI
10.1007/BF00708652. URL https://doi.org/10.1007/
BF00708652
29. Salimi, R., Motameni, H., Omranpour, H.: Task schedul-
ing with load balancing for computational grid using nsga
ii with fuzzy mutation. In: 2012 2nd IEEE International
Conference on Parallel, Distributed and Grid Computing
(2012)
30. Schaeffer, S.E.: Survey: Graph clustering. Comput. Sci.
Rev. (2007)
31. Taura, K., Chien, A.: A heuristic algorithm for map-
ping communicating tasks on heterogeneous resources.
In: Proceedings 9th Heterogeneous Computing Workshop
(HCW 2000) (Cat. No.PR00556), pp. 102–115 (2000)
32. Tran, T.T., Do, M., Rieffel, E.G., Frank, J., Wang,
Z., O’Gorman, B., Venturelli, D., Beck, J.C.: A hybrid
quantum-classical approach to solving scheduling prob-
lems. In: Ninth Annual Symposium on Combinatorial
Search (2016)
33. Wang, Z., Liu, W., Xu, J., Li, B., Iyer, R., Illikkal, R.,
Wu, X., Mow, W.H., Ye, W.: A case study on the commu-
nication and computation behaviors of real applications
in noc-based mpsocs. In: 2014 IEEE Computer Society
Annual Symposium on VLSI, pp. 480–485 (2014)
34. Wayne Bollinger, S., Midkiff, S.: Processor and link as-
signment in multicomputers using simulated annealing.
In: ICPP, vol. 1, pp. 1–7 (1988)
35. Wille, R., Burgholzer, L., Zulehner, A.: Mapping Quan-
tum Circuits to IBM QX Architectures Using the Mini-
mal Number of SWAP and H Operations. arXiv e-prints
arXiv:1907.02026 (2019)
36. Wootters, W.K., Zurek, W.H.: A single quantum cannot
be cloned. Nature 299(5886), 802–803 (1982). DOI
10.1038/299802a0. URL http://dx.doi.org/10.1038/
299802a0
37. Yamamoto, Y., Aihara, K., Leleu, T., Kawarabayashi,
K.i., Kako, S., Fejer, M., Inoue, K., Takesue, H.: Co-
herent ising machines—optical neural networks operat-
ing at the quantum limit. npj Quantum Information
3(1), 49 (2017). DOI 10.1038/s41534-017-0048-9. URL
https://doi.org/10.1038/s41534-017-0048-9
38. Zick, K.M., Shehab, O., French, M.: Experimental
quantum annealing: case study involving the graph
isomorphism problem. Scientific Reports 5, 11168 (2015).
URL http://dx.doi.org/10.1038/srep11168
39. Zulehner, A., Paler, A., Wille, R.: An efficient methodol-
ogy for mapping quantum circuits to the IBM QX archi-
tectures. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems (2018)
