Minor-embedding heuristics for large-scale annealing processors with
  sparse hardware graphs of up to 102,400 nodes by Sugie, Yuya et al.
Soft Computing manuscript No.
Yuya Sugie 1,2 · Yuki Yoshida 3 [0000-0002-1402-7840] · Normann Mertig 1 [0000-0003-3025-7141] · Takashi Takemoto 1 [0000-0002-5949-2252] · Hiroshi Teramoto 4,5 · Atsuyoshi Nakamura 2,4 [0000-0001-7078-8655] · Ichigaku Takigawa 2,4,5,6,7 [0000-0001-5633-995X] · Shin-ichi Minato 4,8 · Masanao Yamaoka 1 · Tamiki Komatsuzaki 4,6 [0000-0001-7175-8474]
A preliminary version of this paper has been published in (Sugie et al., 2018).
1 Hitachi Hokkaido University Laboratory, Center for Exploratory Research, Research and Development Group, Hitachi, Ltd., Sapporo 001-0021, Japan
E-mail: normann.mertig.ee@hitachi.com
2 Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Kita-ku, Sapporo 060-0814, Japan
3 Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
4 Research Center of Mathematics for Social Creativity, Research Institute for Electronic Science, Hokkaido University, Kita 20 Nishi 10, Kita-Ku, Sapporo 001-0020, Japan
5 PRESTO, Japan Science and Technology Agency (JST), Kawaguchi-shi, Saitama 332-0012, Japan
6 Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
7 Center for Advanced Intelligence Project (AIP), RIKEN, Nihonbashi 1-chome, Mitsui Building, 15th floor, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
8 Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
Abstract Minor embedding heuristics have become an indispensable tool for compiling problems in quadratic unconstrained binary optimization (QUBO) into the hardware graphs of quantum and CMOS annealing processors. While recent embedding heuristics have been developed for annealers of moderate size (about 2000 nodes), the size of the latest CMOS annealing processor (with 102,400 nodes) poses entirely new demands on the embedding heuristic. This raises the question of whether recent embedding heuristics can maintain meaningful embedding performance on hardware graphs of increasing size. Here, we develop an improved version of the probabilistic-swap-shift-annealing (PSSA) embedding heuristic [which has recently been demonstrated to outperform the standard embedding heuristic by D-Wave Systems (Cai et al., 2014)] and evaluate its embedding performance on hardware graphs of increasing size. For random-cubic and Barabási-Albert graphs we find the embedding performance of improved PSSA to consistently exceed the threshold of the best known complete graph embedding by a factor of 3.2 and 2.8, respectively, up to hardware graphs with 102,400 nodes. On the other hand, for random graphs with constant edge density, not even improved PSSA can overcome the deterministic threshold guaranteed by the existence of the best known complete graph embedding. Finally, we prove a new upper bound on the size of the largest complete graph embeddable into the hardware graphs of CMOS annealers and show that the embedding performance of their currently best known complete graph embedding has optimal order for hardware graphs with fixed coordination number.
Keywords graph minor · heuristic · scalability · annealing ·
QUBO
arXiv:2004.03819v1 [quant-ph] 8 Apr 2020
1 Introduction
The last decade has witnessed impressive progress in the development of a new hardware architecture, which is commonly known as the annealing processor. This development was sparked by the discovery of a new model for quantum computation (Farhi et al., 2001; Kadowaki and Nishimori, 1998), which led to the development of annealing hardware based on superconducting quantum bits (D-Wave Systems Inc., since 2007; Johnson et al., 2011). Simultaneously, the performance of hardware processors executing simulated annealing (Kirkpatrick et al., 1983) has improved considerably, e.g., due to algorithmic advances (Aramon et al., 2018;
Isakov et al., 2015; Zhu et al., 2015) as well as customized
hardware circuits based on CMOS or laser technology, see
(Hitachi Ltd., 2018; Matsubara et al., 2018; Okuyama et al.,
2017; Takemoto et al., 2019; Tsukamoto et al., 2017; Ya-
maoka et al., 2016) or (Inagaki et al., 2016; McMahon et al.,
2016) for references. Although hardware realizations differ in their implementation, all annealing processors perform the same task: they provide a fast method for finding the ground-state configuration that minimizes the energy of an Ising model. They are therefore natural hardware for solving combinatorial optimization problems in quadratic unconstrained binary optimization (QUBO) form (see, e.g., Kochenberger et al., 2014; Lucas, 2014) and related applications, e.g., in quantum chemistry (Xia et al., 2018) or machine learning (Neven et al., 2009).
A general problem in quadratic unconstrained binary optimization can be expressed as an Ising model consisting of N binary spin variables {σ_i} that take values σ_i ∈ {−1, 1}, i = 1, ..., N. The energy of each spin configuration is given by

$$ H(\{\sigma_i\}) = -\sum_{i,j=1}^{N} \sigma_i J_{ij} \sigma_j - \sum_{i=1}^{N} h_i \sigma_i, \tag{1} $$
where h_i ∈ ℝ and J_ij ∈ ℝ are externally fixed model parameters, known as magnetic fields and spin-spin couplings, respectively. These model parameters encode the energy landscape (or cost function) of the optimization problem. Unfortunately, present-day quantum annealers can only provide a limited set of couplings between spins (D-Wave Systems Inc., since 2007; Johnson et al., 2011), which places tangible restrictions on J_ij. In particular, the non-zero J_ij of a quantum annealer induce the well-known hardware topology called the Chimera graph. Similarly, large-scale CMOS annealing processors with up to 102,400 spins and fast parallel updates have only been realized with sparse hardware topologies (Hitachi Ltd., 2018; Takemoto et al., 2019; Yamaoka et al., 2016). In this case, the non-zero spin-spin couplings J_ij induce the hardware topology of a King’s graph, as illustrated in Fig. 1. In order to solve QUBO problems on such annealing processors, the ground-state search of the given Ising model has to be mapped onto the given hardware by means of minor embedding (Choi, 2008). See Fig. 1(a) for a general workflow. In minor embedding, each spin of the original Hamiltonian is represented by a tightly connected group of spins on the hardware, which are forced to point in the same direction, see Fig. 1(c). Such a group of hardware spins is called a super vertex and represents a single spin of the original input problem. In contrast to a single spin, a super vertex can be adjacent to many other super vertices on the hardware. Precisely this property allows the spin-spin couplings of the original Ising model to be encoded between super vertices, see Fig. 1(b, c) for an example.
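To make Eq. (1) and the notion of an input graph concrete, the following minimal Python sketch evaluates the Ising energy exactly as written in Eq. (1) (the double sum runs over all ordered pairs), reads off the edge set of the input graph from the non-zero couplings, and finds the ground state of a tiny instance by exhaustive search. The helper names and the 3-spin instance are hypothetical illustrations; exhaustive search over 2^N configurations is only feasible for very small N, which is precisely why annealing hardware is of interest.

```python
from itertools import product


def ising_energy(sigma, J, h):
    """Energy of Eq. (1). The double sum runs over all ordered pairs (i, j),
    exactly as written in Eq. (1), so a symmetric J counts each coupling twice."""
    n = len(sigma)
    quadratic = sum(sigma[i] * J[i][j] * sigma[j] for i in range(n) for j in range(n))
    linear = sum(h[i] * sigma[i] for i in range(n))
    return -quadratic - linear


def input_graph_edges(J):
    """Edge set E(I) of the input graph: pairs i < j with a non-zero coupling."""
    n = len(J)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if J[i][j] != 0 or J[j][i] != 0}


def brute_force_ground_state(J, h):
    """Exhaustive ground-state search over all 2^N spin configurations (tiny N only)."""
    n = len(h)
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, J, h))


# Hypothetical 3-spin chain 0 - 1 - 2 with one ferromagnetic and one antiferromagnetic coupling.
J = [[0.0, 1.0, 0.0],
     [1.0, 0.0, -0.5],
     [0.0, -0.5, 0.0]]
h = [0.2, 0.0, 0.0]
print(sorted(input_graph_edges(J)))    # [(0, 1), (1, 2)]
print(brute_force_ground_state(J, h))  # (1, 1, -1)
```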
Minor embedding for annealing processors, i.e., embed-
ding an input graph I (induced by the non-zero couplings
of a given optimization problem) into a hardware graph H
(induced by the non-zero couplings of the hardware) has attracted significant attention over the past decade. In particular, for optimization problems whose input graph I possesses a predefined structure, fast deterministic and often optimal embeddings into the static hardware graphs H are known.
See (Boothby et al., 2016; Choi, 2011; Klymko et al., 2014;
Okuyama et al., 2016; Venturelli et al., 2015) for embed-
dings of complete graphs, (Goodrich et al., 2018; Okuyama
et al., 2016) for embeddings of bipartite graphs, or (Zarib-
afiyan et al., 2017) for embeddings of Cartesian product
graphs. On the other hand, many problems in combinatorial
optimization, such as solving the clique problem on social
networks, produce sparse input graphs whose connections
appear random. Such optimization problems do not possess a
tangible structure which can systematically be exploited for
minor embedding. Yet, embedding such problems by resort-
ing to minor embeddings of complete graphs seems waste-
ful. For example, for the presently best quantum annealer
with 2048 hardware spins, it would result in representing in-
put problems with only 65 spins. Similarly, for the currently
largest CMOS annealer with 102400 spins, it would result
in representing input problems with only 321 spins. Hence,
improving over the embeddability threshold ensured by the
best known complete graph embeddings is desirable.
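The thresholds quoted above follow from simple arithmetic. The sketch below assumes the King’s graph threshold L + 1 (Okuyama et al., 2016) for a square King’s graph with L × L spins and, for Chimera hardware with 8m² qubits, a complete-graph embedding with 4m + 1 super vertices, which is the rule that reproduces the figure of 65 quoted above. Both function names are hypothetical.

```python
import math


def kings_graph_threshold(num_spins):
    """Best known complete-graph threshold L + 1 on a square King's graph with L*L spins."""
    L = math.isqrt(num_spins)
    if L * L != num_spins:
        raise ValueError("expected a square King's graph with L*L spins")
    return L + 1


def chimera_threshold(num_qubits):
    """Assumed 4m + 1 rule for an m x m Chimera graph with 8*m*m qubits."""
    m = math.isqrt(num_qubits // 8)
    if 8 * m * m != num_qubits:
        raise ValueError("expected 8*m*m qubits")
    return 4 * m + 1


print(chimera_threshold(2048))        # 65
print(kings_graph_threshold(102400))  # 321
```

In both cases the guaranteed input size grows only like the square root of the number of hardware spins, which is what makes the complete-graph baseline wasteful for sparse inputs.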
To this end, one could, in principle, resort to the exact algorithms invented by (Robertson and Seymour, 1995) and later improved by (Kawarabayashi et al., 2012) and (Adler et al., 2011). However, these algorithms are not constructive and have a prohibitively large runtime

$$ O\!\left(2^{(2k+1)\log k}\, |V(I)|^{2k}\, 2^{2|V(I)|^{2}}\, |E(H)|\right), \tag{2} $$

which grows exponentially in the square of the number of vertices |V(I)| of the input graph, linearly in the number of edges |E(H)| of the hardware graph, and further depends on the branchwidth k of the hardware graph. For this reason, it is important to have efficient heuristics which find graph minors with high probability, rather than attempting an exhaustive search
or proving minor exclusion. Such a heuristic was, for example, proposed in (Cai et al., 2014) and has since become the state of the art due to its inclusion in the standard software package provided by D-Wave Systems. An alternative heuristic, called probabilistic-swap-shift-annealing (PSSA), was later invented by YY as a submission to the “Hokkaido University & Hitachi 2nd New-Concept Computing Contest 2017”.¹ There it showed outstanding performance compared with hundreds of other submissions, eventually winning the contest. Its superior performance over (Cai et al., 2014) was later demonstrated by the authors in (Sugie et al., 2018). While both heuristics were initially developed for annealers of moderate size (about 2000 spins), the release of the latest CMOS annealing processor with 102,400 spins poses entirely new demands on the performance of embedding heuristics. With D-Wave Systems doubling the number of qubits of its annealers roughly every two years, a similar demand on embedding heuristics is expected for quantum annealers in the near future.
¹ https://hokudai-hitachi2017-2.contest.atcoder.jp/
Fig. 1 (a) Workflow for solving QUBO problems on annealing hardware, emphasizing the role of graph-minor embeddings for representing (b, d) the spin-spin connectivity graphs of a given QUBO input on (c, e) annealing hardware with sparse spin-spin connectivity. (b, c) Minor embedding of a complete graph K_9 into a King’s graph KG_{8,8}. Each super vertex is marked by the corresponding vertex label of the input graph.
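To get a feeling for the runtime bound of Eq. (2), one can evaluate the base-2 logarithm of the expression inside the O(·), ignoring constants. The sketch below assumes that log k is taken to base 2 and uses a hypothetical branchwidth k = 4 together with the 210 edges of KG_{8,8}; it only illustrates the growth rate and is not a runtime estimate for any concrete implementation.

```python
import math


def log2_bound(k, n_vertices_input, n_edges_hardware):
    """log2 of 2^{(2k+1) log k} * |V(I)|^{2k} * 2^{2 |V(I)|^2} * |E(H)| (constants ignored)."""
    return ((2 * k + 1) * math.log2(k)        # 2^{(2k+1) log k}, log taken base 2 (assumption)
            + 2 * k * math.log2(n_vertices_input)  # |V(I)|^{2k}
            + 2 * n_vertices_input ** 2            # 2^{2 |V(I)|^2}, the dominant term
            + math.log2(n_edges_hardware))         # |E(H)|


# Even a 20-vertex input already contributes a factor 2^800 through 2^{2|V(I)|^2} alone.
for n in (5, 10, 20):
    print(n, round(log2_bound(k=4, n_vertices_input=n, n_edges_hardware=210)))
```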
In this paper, we present an improved version of PSSA
and use it in a case study to evaluate the performance of em-
bedding heuristics on ever larger hardware graphs of up to
102,400 spins. The algorithmic improvements of PSSA in-
clude (i) an additional search phase, (ii) a degree-oriented
super-vertex shift rule, and (iii) optimized annealing sched-
ules. The performance of improved PSSA is then investi-
gated with respect to various types of random input graphs
on hardware graphs of increasing size. An embedding performance consistently exceeding the embedding threshold of the best known complete graph embedding by a constant factor c is observed for random-cubic (c = 3.2) and Barabási-Albert (c = 2.8) graphs for hardware graphs of up to 102,400 spins. This constitutes an average performance gain of 42% and 28%, respectively, over the previously published version of PSSA (Sugie et al., 2018). On the other hand, for sparse random graphs with constant edge density, even the embedding performance of improved PSSA shrinks to the deterministic threshold guaranteed by the existence of the best known complete graph embedding on large hardware graphs. Finally, this paper provides a new upper bound on the size of the largest complete graph embeddable into the King’s graph hardware of present CMOS annealing processors. In addition, we prove that the baseline performance of the currently best known complete graph embedding has optimal order for hardware graphs with fixed coordination number.
This paper is organized as follows. In Sect. 2 we lay out
the basic notation, define graph minors, and specify the algo-
rithm task. In Sect. 3, we briefly outline PSSA as previously
published in (Sugie et al., 2018) in order to make the paper
self-contained. In Sect. 4 we present several improvements
of PSSA, including (i) an additional terminal search phase,
(ii) a degree-oriented super vertex shift rule, and (iii) optimized annealing schedules. In Sect. 5 we evaluate the embedding performance of the improved PSSA on hardware graphs of increasing size (up to 102,400 nodes) with respect to various input problems. In Sect. 6 we prove an upper bound on the size of the largest complete graph embeddable into the King’s graph hardware of CMOS annealers and show that the baseline performance of the currently best known complete graph embedding has optimal order for hardware graphs with fixed coordination number. We summarize in Sect. 7 and discuss open questions for future research.
2 Preliminaries
In this section we introduce some basic notations and defini-
tions. In particular, we define minor embeddings and super
vertex placements, describe the hardware graph, specify the
algorithm task, and discuss several input graphs which will
be used for evaluating the embedding performance of im-
proved PSSA on hardware graphs of ever larger size.
2.1 Graph minor and super vertex placement
In what follows we consider undirected and simple graphs.
Let I = (V (I),E(I)) denote an input graph. Its vertex set
V (I) represents the spin indices i = 1, · · · ,N of the original
QUBO problem, Eq. (1). Its edges (i, j) ∈ E(I) ⊂ V (I)×
V (I) are induced by the non-zero entries of the corresponding symmetric connectivity matrix J_ij, i, j = 1, ..., N. Let
H = (V (H),E(H)) denote a hardware graph. Its vertex set
V (H) shall represent the spins of the hardware, while its
edges E(H) are induced by the non-zero couplings of the
annealing processor.
We now define a minor embedding, similar to (Cai et al.,
2014). Our definition will proceed in two stages. In the first
stage we define a super vertex placement.
Definition 1 (Super Vertex Placement) Let I, H be two
graphs. A super vertex placement is a function φ : V(I) → 2^{V(H)} which assigns each vertex i ∈ V(I) to a subset of vertices φ(i) ⊂ V(H), such that:
(M1) ∀i ∈ V(I): φ(i) ≠ ∅ and the subgraph induced by φ(i) in H is connected.
(M2) ∀i, j ∈ V(I) with i ≠ j: φ(i) ∩ φ(j) = ∅.
In the second stage we define a minor embedding.
Definition 2 (Minor Embedding) Let I, H be two graphs.
A super vertex placement φ is a minor embedding, if in ad-
dition to property (M1) and (M2) we have
(M3) ∀(i, j) ∈ E(I):
∃(u,v) ∈ φ(i)×φ( j) such that (u,v) ∈ E(H).
In what follows we refer to each set φ(i), induced by a super
vertex placement φ , as a super vertex. In order for φ(i) to
represent a single spin from the input problem, we require
super vertices to be non-empty and connected, hence (M1).
In addition, a single spin of the hardware cannot represent
multiple spins of an input problem, hence (M2). Finally, a
valid representation of a given Ising model can be provided,
if and only if all edges of the input graph I, induced by non-
zero values of Ji j, can be represented by at least one edge
between the corresponding super vertices on the hardware
graph. Hence, a given super vertex placement φ is a suitable
hardware mapping, if and only if it is a minor embedding.
See Fig. 1(b, c) for an example.
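Definitions 1 and 2 translate directly into a checker. In this minimal sketch (an illustration, not the authors’ code), a hardware graph is stored as a dictionary mapping each vertex to the set of its neighbors, a placement φ maps each input label to a list of hardware vertices, and the connectivity required by (M1) is tested by breadth-first search. All function names are hypothetical.

```python
from collections import deque


def is_connected_subset(nodes, adj):
    """(M1): the subgraph of H induced by `nodes` is non-empty and connected (BFS)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def is_super_vertex_placement(phi, adj):
    """Definition 1: every super vertex satisfies (M1) and all are pairwise disjoint (M2)."""
    used = set()
    for nodes in phi.values():
        if not is_connected_subset(nodes, adj):
            return False
        if used & set(nodes):  # a hardware spin may represent only one input spin
            return False
        used |= set(nodes)
    return True


def is_minor_embedding(phi, input_edges, adj):
    """Definition 2: (M1), (M2) and (M3), i.e. every input edge has a hardware edge."""
    if not is_super_vertex_placement(phi, adj):
        return False
    return all(any(v in adj[u] for u in phi[i] for v in phi[j]) for i, j in input_edges)


# Usage: a triangle K_3 embedded into a 4-cycle 0-1-2-3-0 by merging vertices 2 and 3.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_minor_embedding({"a": [0], "b": [1], "c": [2, 3]}, triangle, cycle))  # True
```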
In order to rate the quality of a super vertex placement
φ , we define a function which counts the edges that a su-
per vertex placement faithfully represents on the hardware
graph.
Definition 3 (Number of Embedded Edges) Let I, H be
two graphs and φ a super vertex placement. The number of
edges embedded by φ is defined as
$$ E_{\mathrm{emb}}(\phi) := \left|\{(i,j) \in E(I) \mid \exists\, u \in \phi(i),\ v \in \phi(j) \text{ with } (u,v) \in E(H)\}\right|. \tag{3} $$
For a general super vertex placement we have that Eemb(φ)≤
|E(I)| and a valid minor embedding, satisfying condition
(M3), is found, if and only if Eemb(φ) = |E(I)|.
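The score of Definition 3 can be computed in a few lines. This sketch uses a hypothetical data layout (a neighbor-set dictionary for H, label-to-list placements for φ) and assumes each edge of I is listed exactly once; it counts the input edges with at least one hardware edge between the corresponding super vertices.

```python
def embedded_edges(phi, input_edges, adj):
    """E_emb(phi) of Eq. (3).

    phi: input label -> list of hardware vertices (super vertex)
    input_edges: edges (i, j) of I, each listed once
    adj: hardware vertex -> set of neighbors in H
    """
    return sum(
        1
        for i, j in input_edges
        if any(v in adj[u] for u in phi[i] for v in phi[j])
    )


def satisfies_m3(phi, input_edges, adj):
    """(M3) holds exactly when every input edge is embedded, E_emb(phi) = |E(I)|."""
    return embedded_edges(phi, input_edges, adj) == len(input_edges)
```

Unlike a yes/no check of (M3), this count grows gradually as a placement improves, which is what makes it usable as an annealing score.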
Note that for certain inputs I and a fixed hardware H, it
may be the case that no minor embedding exists. In that case
the number of embedded edges will be strictly smaller than
the number of edges of the input graph, such that Eemb(φ)<
|E(I)| for any super vertex placement φ . In this case the
heuristic we propose in the next sections is bound to fail.
Furthermore, throughout the annealing phase of PSSA, we
restrict our implementation to super vertex placements where
each super vertex φ(i) can be parametrized by a path. We denote the endpoints of this path, i.e., the set of its leaves, by Leaf[φ(i)].
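With path-shaped super vertices, storing each φ(i) as an ordered list of hardware vertices makes Leaf[φ(i)] trivial to read off. The following sketch assumes this list representation (a choice made for illustration, not necessarily that of the PSSA implementation); note that a single-vertex super vertex is its own leaf.

```python
def leaf_set(path):
    """Leaf[phi(i)] for a path-shaped super vertex stored as an ordered list."""
    if not path:
        raise ValueError("super vertices must be non-empty, see (M1)")
    return {path[0], path[-1]}


def is_hardware_path(path, adj):
    """True if consecutive entries are adjacent in H and no hardware vertex repeats."""
    return len(set(path)) == len(path) and all(
        path[t + 1] in adj[path[t]] for t in range(len(path) - 1)
    )
```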
Finally, we emphasize that finding a minor embedding
is the key ingredient for mapping QUBO problems, Eq. (1),
onto the spin-spin couplings and external magnetic fields of
an annealing processor. Prior to finding the minor embedding, one determines the input graph I from the non-zero entries of the matrix J_ij. After a minor embedding is found, the spin-spin couplings and external magnetic fields of the annealing processor can be determined from the input parameters and the minor embedding (J_ij, h_i, φ) as described in (Choi, 2008).² Both processes are straightforward and will not be discussed further in this paper.
2.2 Hardware graphs
The development of annealing processors with sparse hardware topology currently centers on two hardware architectures. One is the Chimera graph topology adopted by quantum annealers (Choi, 2011; Klymko et al., 2014). The other is the King’s graph topology adopted by the CMOS annealing processors (Okuyama et al., 2016, 2017). Both hardware graphs share very similar general features: (i) both graphs have a fixed coordination number, (ii) for both graphs the treewidth grows as the square root of the number of vertices, O(√|V(H)|), and (iii) for both hardware graphs, constructive minor embeddings of complete graphs ensure the embeddability of input graphs with O(√|V(H)|) vertices. (See next section for details.)
² Roughly speaking, (i) the coupling J_ij is applied to exactly one coupler connecting the super vertices φ(i) and φ(j), (ii) the internal connections of super vertex φ(i) are set to a strength of order O(∑_{j∈nbr(i)} |J_ij| − |h_i|), where nbr(i) denotes the vertex neighbors of i ∈ V(I), and (iii) the magnetic field h_i is distributed across the spins of the super vertex φ(i).
In this study, we will focus on hardware topologies H
forming a square King’s graph (Okuyama et al., 2017) while
omitting Chimera-type hardware graphs entirely. We made
this choice for the following reasons: (i) The large-scale hardware structures on which we benchmark the performance of improved PSSA have so far only been realized
for CMOS annealers with hardware King’s graphs of up
to 102,400 spins. (ii) A performance analysis of PSSA on
Chimera graphs has previously been published in (Sugie et al.,
2018) and is no longer the main objective of this paper.
(iii) We believe that our evaluation of the embedding per-
formance of PSSA on hardware King’s graphs of increas-
ing size would not fundamentally change, if carried out on
Chimera graphs due to the similarity of both graphs with
respect to general features.
A square King’s graph represents all valid moves of the
king chess piece on a chessboard, i.e., each vertex represents
a square of the chessboard and each edge is a valid move.
We denote the King’s graph by the symbol KG_{L,L}, where L denotes the width of the chessboard and |V(H)| = L × L is the total number of vertices. See Fig. 1(c, e) for an example. In what follows, the King’s graph KG_{L,L} defined by the hardware is usually fixed. The goal is to find minor embeddings for input graphs with as many vertices |V(I)| as possible.
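A King’s graph is easy to generate. The sketch below builds KG_{L,L} as neighbor sets over (row, column) squares of the board; the closed-form edge count 2(L − 1)(2L − 1) follows from L(L − 1) horizontal, L(L − 1) vertical, and 2(L − 1)² diagonal edges. Function names are hypothetical.

```python
def kings_graph(L):
    """Adjacency sets of KG_{L,L}: squares (row, col) of an L x L board joined by king moves."""
    adj = {(r, c): set() for r in range(L) for c in range(L)}
    for r, c in adj:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                neighbor = (r + dr, c + dc)
                if neighbor != (r, c) and neighbor in adj:
                    adj[(r, c)].add(neighbor)
    return adj


def num_edges(adj):
    """Each undirected edge appears in two neighbor sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def kings_graph_num_edges(L):
    """Closed form: L(L-1) horizontal + L(L-1) vertical + 2(L-1)^2 diagonal edges."""
    return 2 * (L - 1) * (2 * L - 1)


g = kings_graph(8)
print(len(g), num_edges(g))  # 64 210
```

Interior vertices have the fixed coordination number 8, while boundary and corner vertices have 5 and 3 neighbors, respectively.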
2.3 Best known complete graph embedding
A simple and fast baseline strategy for minor embedding
into a fixed hardware graph exploits the existence of known
complete graph embeddings. In particular, if a minor embedding of a complete graph K_N with N vertices is known, embedding any other input graph with at most N vertices is trivial. For a fixed King’s graph hardware KG_{L,L}, embedding complete graphs K_N with up to N = L + 1 vertices is always possible using the construction of (Okuyama et al., 2016). See Fig. 1(b, c) for a sketch. Hence, minor embedding of any graph I with |V(I)| ≤ L + 1 is trivial. On the other hand, a systematic embedding of complete graphs K_N with N > L + 1 is currently unknown and believed to be impossible. (A proof showing that complete graphs K_N with N > 2L are not embeddable into the King’s graph KG_{L,L} is given in Sect. 6.) In the absence of a proof of optimality, we refer to the embedding of complete graphs according to (Okuyama et al., 2016) as the best known complete graph embedding.
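Any input graph with |V(I)| ≤ N becomes trivial once K_N is embedded because all pairs of super vertices in a K_N embedding are already adjacent, so (M3) holds for any edge set. The sketch below reuses such an embedding, supplied as an input (the construction of (Okuyama et al., 2016) itself is not reproduced here); the function name and the small K_3 example on KG_{2,2} are hypothetical.

```python
def embed_from_complete(input_vertices, complete_embedding):
    """Reuse a known minor embedding of K_N for any input graph with at most N vertices.

    complete_embedding maps N labels to pairwise adjacent super vertices (lists of
    hardware vertices). Since every pair already shares a hardware edge, condition
    (M3) holds for any edge set of the input graph; only the vertex count matters.
    """
    super_vertices = list(complete_embedding.values())
    vertices = list(input_vertices)
    if len(vertices) > len(super_vertices):
        raise ValueError(
            f"{len(vertices)} input vertices exceed the K_{len(super_vertices)} embedding"
        )
    return {v: list(sv) for v, sv in zip(vertices, super_vertices)}


# Hypothetical K_3 embedding on KG_{2,2} (whose four squares are all mutually adjacent).
k3 = {0: [(0, 0)], 1: [(0, 1)], 2: [(1, 0), (1, 1)]}
print(embed_from_complete(["x", "y"], k3))  # {'x': [(0, 0)], 'y': [(0, 1)]}
```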
2.4 Sparse random input graphs
While the best known complete graph embedding does not
allow for embedding complete graphs with |V (I)| > L+ 1
into a King’s graph, going beyond this embedding threshold
should be feasible, if the input graph is sparse. See Fig. 1(d,
e). PSSA tries to find minor embeddings precisely for this
kind of input, in order to widen the range of QUBO prob-
lems amenable to the hardware of annealing processors. How-
ever, if a certain QUBO problem induces sparse input graphs
I with a predefined structure, it is advisable to resort to a
deterministic minor embedding. This is usually faster and
most likely produces embeddings for larger input graphs
than a general purpose heuristic. This strategy has, for ex-
ample, been applied for the minor embedding of bipartite
graphs (Goodrich et al., 2018; Okuyama et al., 2016) or the
minor embedding of Cartesian product graphs (Zaribafiyan
et al., 2017). Accordingly, an embedding heuristic should be applied to sparse input problems whose structure is not known in advance. In other words, an embedding heuristic is used on input graphs which appear to be random to some degree. Such randomized sparse input graphs can appear, e.g., in graph coloring or when solving the clique problem (Lucas, 2014) on social network graphs (Albert and Barabási, 2002). To benchmark PSSA on a wide variety of potential input types, this paper considers the following three well-known classes of random input graphs: (i) random cubic graphs as a model of very low edge density, (ii) Barabási-Albert random graphs (Albert and Barabási, 2002) as a prototype of scale-free graph structures from social network science, and (iii) Erdős-Rényi random graphs (Bollobás, 1985; Erdős and Rényi, 1959) with constant edge density as a prototype of a sparse random graph with high complexity. For a detailed description of these graphs, the reader is referred to Appendix 8.1.
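For experimentation, the three graph classes can be sampled with the Python standard library alone. This is a minimal sketch under stated assumptions: the cubic generator uses the pairing (configuration) model with rejection of loops and multi-edges, the Barabási-Albert generator uses one common preferential-attachment variant seeded with m isolated vertices, and the Erdős-Rényi generator is G(n, p). The parameters m and p are left free here; the exact settings used in this paper are given in Appendix 8.1 and are not reproduced.

```python
import random
from itertools import combinations


def random_cubic_graph(n, rng):
    """Random simple 3-regular graph via the pairing model; bad pairings are redrawn."""
    if n < 4 or n % 2:
        raise ValueError("a cubic graph needs an even number n >= 4 of vertices")
    while True:
        stubs = [v for v in range(n) for _ in range(3)]
        rng.shuffle(stubs)
        edges = set()
        for a, b in zip(stubs[0::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                break  # loop or multi-edge: reject the whole pairing
            edges.add(edge)
        else:
            return edges  # no rejection: a simple 3-regular graph


def barabasi_albert_graph(n, m, rng):
    """Each new vertex links to m distinct earlier vertices, chosen proportionally to degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    edges = set()
    targets = list(range(m))
    repeated = []  # every vertex appears once per incident edge end
    for source in range(m, n):
        edges.update((t, source) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def erdos_renyi_graph(n, p, rng):
    """G(n, p): each of the n(n-1)/2 vertex pairs is an edge independently with probability p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


rng = random.Random(0)
print(len(random_cubic_graph(20, rng)))  # 30 = 3 * 20 / 2
```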
2.5 Embedding probability and embedding threshold
We now define our performance measures.
Definition 4 (Embedding Probability) Let G denote a class
of input graphs I ∈ G , e.g., random cubic graphs. Let A
denote an embedding algorithm such as PSSA. For a fixed
hardware graph H we define the embedding probability
pemb(|V (I)|,H,G ,A )
as the ratio of input samples from graph class G restricted to
graphs of vertex size |V (I)| for which the embedding algo-
rithm A finds a minor embedding.
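In practice the embedding probability of Definition 4 is estimated by sampling. The sketch below is a plain Monte Carlo estimator; sample_input and embed are hypothetical callables standing in for a graph class G and an embedding algorithm A (for example, a PSSA run that reports whether Eemb(φ) = |E(I)| was reached).

```python
import random


def estimate_embedding_probability(n, sample_input, embed, num_samples, rng):
    """Fraction of sampled inputs with n vertices for which `embed` succeeds.

    sample_input(n, rng) -> input graph drawn from the class G (hypothetical callable)
    embed(graph, rng)    -> True if the algorithm A found a minor embedding
    Randomness enters through both callables, matching the two sources noted in the text.
    """
    successes = 0
    for _ in range(num_samples):
        graph = sample_input(n, rng)
        if embed(graph, rng):
            successes += 1
    return successes / num_samples
```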
Note that the probabilistic nature of the embedding probability originates both from the potentially probabilistic elements of the embedding algorithm and from the probabilistic nature of the input graphs. Further note that the definition of the embedding probability may in principle be augmented by including further dependencies, such as the number of edges |E(I)| of an input graph. In this paper, we chose
the number of vertices |V (I)| because it directly corresponds
to the number of spins of the input problem. In addition,
for all graph classes considered in this paper the number of
edges |E(I)| immediately follows from |V (I)|.
In this paper we will evaluate the embedding probabil-
ity on a fixed hardware graph H for ever larger sizes |V (I)|
of the input graphs I ∈ G . Up to fluctuations this typically
results in a monotonically decreasing function with high em-
bedding probability at small sizes of |V (I)| and zero proba-
bility for large sizes of |V (I)|. See Fig. 3 for a preview. We
then define the embedding threshold as follows.
Definition 5 (Embedding Threshold) Let H be a fixed hard-
ware graph. Let G denote a class of input graphs I ∈ G . Let
A be an embedding algorithm and pemb(|V (I)|,H,G ,A )
the corresponding embedding probability. For a fixed con-
stant p with 0 < p≤ 1, we define the embedding threshold
$$ \bar{V}(H,\mathcal{G},\mathcal{A},p) = \min\{\, |V(I)| \in \mathbb{N} \mid p_{\mathrm{emb}}(|V(I)|,H,\mathcal{G},\mathcal{A}) < p \,\}, \tag{4} $$
as the minimal vertex size |V (I)| for which the embedding
probability pemb falls below a prescribed threshold p.
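Given estimates of the embedding probability, Eq. (4) amounts to a linear scan. This sketch assumes that |V(I)| ranges over the positive integers and that p_emb is available as a function (for instance, backed by a table of Monte Carlo estimates); the cutoff n_max and the None return value for "never drops below p" are hypothetical choices.

```python
def embedding_threshold(p_emb, p=0.95, n_max=100_000):
    """Eq. (4): smallest n = |V(I)| >= 1 with p_emb(n) < p, or None if not reached by n_max."""
    for n in range(1, n_max + 1):
        if p_emb(n) < p:
            return n
    return None


# A heuristic that embeds exactly the guaranteed L + 1 = 321 vertices on KG_{320,320}:
print(embedding_threshold(lambda n: 1.0 if n <= 321 else 0.0))  # 322
```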
Throughout this paper we will often denote the embedding
probability as pemb(|V (I)|) and the embedding threshold as
V̄(L) because: (i) The graph class G (random cubic, Barabási-Albert, or Erdős-Rényi) will always be clear from the context. (ii) The embedding algorithm A will always be some version of PSSA whose details are clear from the context. (iii) The hardware graph H will always be a King’s graph KG_{L,L}, whose size is either clear from the context or given by the side length L. The latter is convenient since the
embedding threshold is often found to be a linear function
of the side length L empirically. (iv) We fix the value of
p at high and constant probability p = 0.95. (The precise
value of p is usually not important, since the transition of
pemb(|V (I)|) from 1 to 0 as a function of increasing |V (I)| is
usually quite sharp.)
2.6 Performance target and evaluation methodology
From the preceding discussion it is obvious that a good heuris-
tic should never fall below the embedding threshold ensured
by the best known complete graph embedding. Furthermore,
a good heuristic should be able to maintain an embedding
threshold which exceeds the best known complete graph em-
bedding as the size of the hardware graph increases. In par-
ticular, on a King’s graph we want a heuristic which can
maintain an embedding threshold V¯ (L) > L + 1 as L in-
creases.
In this paper we will demonstrate that improved PSSA can maintain an embedding threshold V̄(L) > L + 1 even on large hardware graphs for certain types of input graphs, such as random cubic and Barabási-Albert graphs. On the other hand, we also show that not even improved PSSA can beat the embedding threshold of the best known complete graph embedding for Erdős-Rényi graphs as the size of the hardware increases. To corroborate these statements, we first recapitulate PSSA as previously described in (Sugie et al., 2018) in the next section. Subsequently, we introduce improved PSSA and tune it on hardware King’s graphs with 320×320 spins, as described in Sect. 4. Finally, we evaluate the embedding threshold of improved PSSA phenomenologically on King’s graphs of ever larger size in Sect. 5.
3 Probabilistic-Swap-Shift-Annealing (PSSA)
In order to make this paper self-contained, we outline the
core elements of the probabilistic-swap-shift-annealing heuris-
tic, previously published in (Sugie et al., 2018).
For a given input graph I and a given hardware graph H, PSSA tries to find a minor embedding by implementing the following general framework: (i) First, an initial super vertex
placement representing the vertices of the input graph I on
the hardware H is prepared. See Fig. 2(a). In general, the ini-
tial super vertex placement will not embed all edges of the
input graph such that Eemb(φ) < |E(I)|. In this case PSSA
proceeds to the annealing search phase. (ii) In the anneal-
ing phase PSSA will successively propose new super vertex
placements. To this end it either swaps two super vertices,
see Fig. 2(b), or shifts hardware vertices from one super
vertex to its neighbor, see Fig. 2(c, d). The proposed super
vertex placement is accepted, if the number of embedded
edges Eemb(φ) grows. On the other hand, a proposed super
vertex placement is accepted with finite probability, even if
the number of embedded edges decreases. This avoids trap-
ping the algorithm on incomplete super vertex placements
which maximize Eemb(φ) locally. (iii) The annealing search
phase terminates, if a super vertex placement representing
all edges of the input graph, Eemb(φ) = |E(I)|, i.e., a valid
minor embedding is found. Alternatively, PSSA may terminate unsuccessfully after reaching a prescribed maximum number of iterations t_max, because a minor embedding was either not found or does not exist.
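The general framework above can be summarized as a generic annealing loop. In this sketch, score plays the role of Eemb(φ), target is |E(I)|, and propose is a hypothetical callable returning a swap or shift proposal (or None when a proposal is skipped). The acceptance probability exp(Δ/T) for decreasing scores is an assumed Metropolis-type choice, since the text only states that such moves are accepted with finite probability; the temperature follows the two-phase schedule of Eq. (5), with T_half denoting the restart temperature of the second phase.

```python
import math
import random


def temperature(t, t_max, T0, T_half):
    """Two-phase linear schedule of Eq. (5)."""
    if t < t_max / 2:
        return T0 * (1 - 2 * t / t_max)
    return T_half * (2 - 2 * t / t_max)


def anneal(phi0, score, propose, target, t_max, T0, T_half, rng):
    """Generic PSSA-style search loop (sketch); returns the best placement and its score."""
    phi, s = phi0, score(phi0)
    best, best_s = phi, s
    for t in range(t_max + 1):
        if best_s == target:  # E_emb(phi) = |E(I)|: valid minor embedding found
            break
        proposal = propose(phi, t, rng)
        if proposal is None:  # e.g. a shift without a neighboring leaf is skipped
            continue
        s_new = score(proposal)
        delta = s_new - s
        T = temperature(t, t_max, T0, T_half)
        # Assumed rule: accept non-decreasing scores, otherwise with probability exp(delta / T).
        if delta >= 0 or (T > 0 and rng.random() < math.exp(delta / T)):
            phi, s = proposal, s_new
            if s > best_s:
                best, best_s = phi, s
    return best, best_s
```

The loop stops as soon as the target score is reached, mirroring step (iii) of the framework; otherwise it returns the best placement found within t_max iterations.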
Fig. 2 Visualizing the main components of PSSA on a King’s graph KG_{8,8}. (a) The guiding pattern induced by the best known complete graph embedding of K_9 and the division of its super vertices for initial placement. (b) Swapping of super vertices. (c, d) Shifting the leaves of a super vertex to its neighbor (c) along and (d) away from the guiding pattern. Each super vertex is marked by the corresponding vertex label of the input graph.
In addition to the general framework, PSSA is currently implemented with the following specifications: (i) PSSA uses super vertices φ(i) which are parametrized as paths on the hardware. (ii) Initial super vertex placements are generated by splitting the super vertices of the best known complete
graph embedding on H into |V (I)| super vertices of almost
equal size to represent the vertices of the input graph V (I)
on the hardware H. See Fig. 2(a) for an example on a King’s
graph. Incidentally, this guarantees that PSSA never falls be-
low the embedding threshold of the best known complete
graph embedding. In addition, this initial placement generates a super vertex placement in which each super vertex has many neighbors, resulting in a high connectivity of super vertices. This is expected to facilitate finding a minor embedding in the subsequent annealing search phase. (iii) A swap
is implemented by randomly selecting an edge (i,k) ∈ E(I)
and swapping the super vertex φ(i) with a super vertex φ( j),
adjacent to φ(k) on H. See Fig. 2(b). (iv) A shift is im-
plemented by randomly selecting a super vertex φ(i) (with
|φ(i)|> 1) and one of its leaf nodes u∈ Leaf[φ(i)]. The shift
proposal is completed by deleting u from φ(i) and attaching it to a leaf v of a neighboring super vertex φ(j) on H.
See Fig. 2(c, d). If a neighboring leaf v does not exist, the
shift proposal is skipped and the algorithm proceeds to the
next proposal. If there are multiple candidates for v the algo-
rithm randomly selects one of the available nodes with equal
probability. (v) PSSA further uses the super vertices of the
best known complete graph embedding as a guiding pattern
in order to distinguish two types of shift moves. See top of
Fig. 2(a) for an example of the guiding pattern. A shift move
is along the guiding pattern, if the leaf u ∈ Leaf[φ(i)] is at-
tached to a leaf v ∈ Leaf[φ( j)] with both u and v belonging
to the same super vertex of the best known complete graph
embedding. See Fig. 2(c). On the other hand, a shift move
is away from the guiding pattern, if the leaf u ∈ Leaf[φ(i)]
is attached to a leaf v ∈ Leaf[φ( j)] with u and v belong-
ing to distinct super vertices of the best known complete
graph embedding. See Fig. 2(d). PSSA favors shifts along
the guiding pattern, in order to prioritize super vertices with
diagonal orientation. This is expected to increase the con-
nectivity of the super vertices, and thus, to prevent PSSA
from getting trapped in a local maximum of the score func-
tion Eemb(φ)< |E(I)|. For more details the reader is referred
to the pseudo-code summary of improved PSSA, given in
Algorithm 1.
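The following minimal Python sketch illustrates the score E_emb(φ) and the swap proposal (iii). It assumes a placement φ stored as a dict from input vertices to sets of hardware vertices, a hardware graph stored as an adjacency dict, and E_emb(φ) read as the number of input edges whose super vertices are adjacent on H. The helper names `touching`, `embedded_edges` and `propose_swap` are hypothetical and not taken from the authors' code.

```python
import random
from typing import Dict, List, Set, Tuple

Placement = Dict[int, Set[int]]   # input vertex i -> super vertex phi(i) on H
Adjacency = Dict[int, Set[int]]   # hardware vertex u -> its neighbors in H


def touching(a: Set[int], b: Set[int], hw_adj: Adjacency) -> bool:
    """True if some hardware edge joins super vertex a to super vertex b."""
    return any(v in b for u in a for v in hw_adj[u])


def embedded_edges(phi: Placement, input_edges: List[Tuple[int, int]],
                   hw_adj: Adjacency) -> int:
    """E_emb(phi): number of input edges whose super vertices touch on H."""
    return sum(touching(phi[i], phi[k], hw_adj) for i, k in input_edges)


def propose_swap(phi: Placement, input_edges: List[Tuple[int, int]],
                 hw_adj: Adjacency, rng: random.Random) -> Placement:
    """Swap phi(i) with a super vertex phi(j) adjacent to phi(k), (i, k) in E(I)."""
    i, k = rng.choice(input_edges)
    # j = i would be a no-op and j = k would swap phi(k) with itself
    candidates = [j for j in phi
                  if j not in (i, k) and touching(phi[j], phi[k], hw_adj)]
    if not candidates:
        return phi                                # nothing to swap with
    j = rng.choice(candidates)
    proposed = dict(phi)                          # new mapping, phi is left intact
    proposed[i], proposed[j] = phi[j], phi[i]     # exchange the two super vertices
    return proposed
```

The proposal returns a new mapping, so that the caller can compare E_emb of the current and proposed placements before accepting the move, as in Algorithm 1.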
Schedule – The original implementation of PSSA divides the annealing time into two search phases of equal length. During both phases the temperature is initialized at a finite value and then decreased to zero,

$$
T(t) =
\begin{cases}
T_0 \left(1 - \dfrac{2t}{t_{\max}}\right) & \text{if } 0 \le t < \dfrac{t_{\max}}{2},\\[4pt]
T_{t_{\max}/2} \left(2 - \dfrac{2t}{t_{\max}}\right) & \text{if } \dfrac{t_{\max}}{2} \le t \le t_{\max},
\end{cases}
\qquad (5)
$$
allowing for a finite acceptance of suboptimal moves at the beginning of each phase, while suppressing them towards the end of each phase. In the first search phase conventional PSSA suppresses shift moves which lead away from the guiding pattern. This policy is expected to protect PSSA from being trapped in suboptimal super vertex placements. If the first search phase fails to produce a valid minor embedding, PSSA enters the second phase. During that second phase we consider a wider search space by allowing for a higher proportion of shifts away from the guiding pattern. To this end, PSSA schedules shifts with probability p_s(t) and swaps with probability 1 − p_s(t). If a shift is proposed, an arbitrary shift direction is allowed with probability p_a(t), while shifts along the guiding pattern are enforced with probability 1 − p_a(t). Both probabilities p_s(t) and p_a(t) follow simple linear schedules. The detailed scheduling parameters used in the numerical experiments are summarized in Appendix 8.1. For further comments on scheduling the reader is referred to Sect. 4.3 on improved annealing schedules.

Algorithm 1: PSSA (Sugie et al., 2018)
Input   : Input graph I and hardware graph H
Output  : Super vertex placement φ_best
Ensure  : |V(I)| ≤ |V(H)|, |E(I)| ≤ |E(H)|
Require : Function E_emb(φ), Eq. (3), t_max, schedule, guiding pattern
// prepare initial placement of super vertices
φ ← guiding pattern divided into |V(I)| super vertices;   // see Fig. 2(a)
φ_best ← φ; if E_emb(φ_best) = |E(I)| return φ_best and terminate;   // minor found
// improve super vertex placement through simulated annealing
for t = 0 to t_max do
    move ← swap or shift, randomly selected according to schedule;
    if move is swap then   // see Fig. 2(b)
        i, k ← (i, k) ∈ E(I), randomly selected;
        j ← j ∈ V(I) with φ(j) neighboring φ(k) in H, randomly selected;
        φ_proposed ← φ with φ(i) and φ(j) swapped;
    else if move is shift then   // see Fig. 2(c, d)
        i, u ← i ∈ V(I) with |φ(i)| > 1 and u ∈ Leaf[φ(i)], both randomly selected;
        allow_any_direction_shift ← true or false according to schedule;
        if allow_any_direction_shift then   // see Fig. 2(d)
            j, v ← j ∈ V(I), v ∈ Leaf[φ(j)] with v adjacent to u in H, randomly selected;
        else   // see Fig. 2(c)
            j, v ← j ∈ V(I), v ∈ Leaf[φ(j)] with v adjacent to u along guiding pattern, randomly selected;
        φ_proposed ← φ with u deleted from φ(i) and assigned to φ(j);
    // evaluate acceptance of proposed move
    ΔE ← E_emb(φ_proposed) − E_emb(φ); T(t) ← temperature according to schedule;
    if exp(ΔE/T(t)) > random float ∈ [0, 1) then
        // accept and update
        φ ← φ_proposed;
        if E_emb(φ_best) < E_emb(φ) then
            φ_best ← φ;
            if E_emb(φ_best) = |E(I)| return φ_best and terminate;   // minor found
return φ_best   // even if E_emb(φ_best) < |E(I)|, i.e., minor not found
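A minimal sketch of the double linear temperature schedule of Eq. (5) and of the acceptance test in Algorithm 1 might look as follows. The function names are our own, and any values of T_0 and T_{t_max/2} passed to them are illustrative assumptions; the parameters actually used are listed in Appendix 8.1. Since E_emb is maximized, moves with ΔE ≥ 0 are always accepted, and at T = 0 the sketch rejects every worsening move, which is the limit of the rule exp(ΔE/T) > r.

```python
import math
import random


def double_linear_temperature(t: int, t_max: int, T0: float, T_half: float) -> float:
    """Double linear schedule of Eq. (5): each half of the run cools linearly to zero."""
    if not 0 <= t <= t_max:
        raise ValueError("t must lie in [0, t_max]")
    if t < t_max / 2:
        return T0 * (1 - 2 * t / t_max)        # first phase: T0 -> 0
    return T_half * (2 - 2 * t / t_max)        # second phase: T_{t_max/2} -> 0


def accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Acceptance test of Algorithm 1: exp(dE / T) > r with r uniform in [0, 1)."""
    if delta_e >= 0:
        return True                            # exp(dE/T) >= 1 > r always holds
    if temperature <= 0:
        return False                           # T -> 0 limit: reject worsening moves
    return math.exp(delta_e / temperature) > rng.random()
```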
4 Improvements of PSSA
In order to improve the performance of PSSA on large input graphs, we implemented several modifications which are described in the following. These include (i) an additional terminal search phase, (ii) a degree-oriented super vertex shift rule, and (iii) optimized annealing schedules.
4.1 Terminal search phase
If standard PSSA fails to produce a valid minor embedding after t_max iterations, it usually returns a super vertex placement φ_best (henceforth referred to as φ for brevity) with two pertinent properties: (i) φ occupies all vertices u ∈ V(H) of the hardware. This can lead to unnecessary assignments u ∈ φ(i) which are needed neither for preserving the connected structure of the super vertex φ(i) nor for connecting super vertices φ(i), φ(j) with (i, j) ∈ E(I). Even worse, these unnecessary super vertex assignments may obstruct potential connections between super vertices φ(i), φ(j) with (i, j) ∈ E(I). (ii) The super vertex placement φ returned by PSSA usually represents most edges of the input graph already (E_emb(φ) ≲ |E(I)|). Thus, finding a few more paths connecting super vertices φ(i), φ(j) with (i, j) ∈ E(I) may already result in a valid minor embedding which fulfills (M3). Hence, the terminal search tries to transform φ into a valid minor embedding by addressing properties (i) and (ii) as follows. For details see the pseudo-code summary in Algorithm 2.
Creating free hardware vertices – First, the hardware vertices u ∈ V(H) = {0, …, |V(H)| − 1} are iterated over repeatedly, and the subroutine is_deletable(·) checks whether u can be removed from its super vertex φ(i) ∋ u. In this step a vertex u is removable from φ(i) if (p1) removing u from φ(i) does not destroy the non-empty connected structure of the super vertex φ(i) (M1) and (p2) removing u from φ(i) does not decrease the number of embedded edges E_emb(φ). Vertices u ∈ V(H) which have been removed from a super vertex are collected in a set of free vertices

$$
U = V(H) \setminus \bigcup_{i \in V(I)} \varphi(i). \qquad (6)
$$
The cleanup process terminates once no more vertices u ∈ V(H) can be removed from their super vertex φ(i) ∋ u during a sweep of the whole hardware. For a possible implementation of the subroutine is_deletable(·) the reader is referred to Appendix 8.2.

Algorithm 2: Improved PSSA with terminal search
Input   : Input graph I and hardware graph H
Output  : Super vertex placement φ
Require : PSSA, is_deletable(·), bfs_path(·)
// get super vertex placement from PSSA
φ ← PSSA(I, H);
// --- Start terminal search ---
// Create free hardware vertices
Init: U ← ∅   // set of free vertices
Init: u ← 0   // hw vertex u ∈ V(H) = {0, …, |V(H)| − 1}
Init: no_del ← 0   // vertices scanned since last deletion
while no_del < (|V(H)| − 1) do
    // scan hardware vertices u = 0, …, |V(H)| − 1
    i ← φ⁻¹(u)   // get preimage of u
    if (u ∉ U) and (is_deletable(u, i)) then
        // shift u to set of free vertices
        φ(i) ← φ(i) \ {u};
        U ← U ∪ {u};
        no_del ← 0;
    else
        no_del ← no_del + 1;
    u ← (u + 1) mod |V(H)|   // check next hw vertex
// Find new super vertex links by breadth first search on graph induced by free vertices
for i = 0, …, |V(I)| − 1 do
    if ∃ (i, j) ∈ E(I) with φ(i), φ(j) ⊂ H not connected then
        // search BFS path from φ(i) to φ(j)
        φ, U ← bfs_path(i, j)
return φ   // even if E_emb(φ) < |E(I)|, i.e., minor not found
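One possible reading of the cleanup loop of Algorithm 2 is sketched below in Python. The `is_deletable` shown here is our own hypothetical implementation of conditions (p1) and (p2), not the one given in Appendix 8.2; it only re-examines input edges incident to i, because only φ(i) changes. As a further assumption, the loop stops after a full sweep of |V(H)| vertices without deletion, slightly more conservative than the |V(H)| − 1 used in Algorithm 2, so that every vertex is examined at least once in the first sweep. Hardware vertices are assumed to be labeled 0, …, |V(H)| − 1.

```python
from collections import deque
from typing import Dict, Set

Placement = Dict[int, Set[int]]   # input vertex i -> super vertex phi(i)
Adjacency = Dict[int, Set[int]]   # vertex -> neighbors (hardware or input graph)


def is_connected(nodes: Set[int], hw_adj: Adjacency) -> bool:
    """True if `nodes` is non-empty and induces a connected subgraph of H."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in hw_adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def linked_partners(i: int, nodes: Set[int], phi: Placement,
                    input_adj: Adjacency, hw_adj: Adjacency) -> Set[int]:
    """Input neighbors j of i whose super vertex touches `nodes` on H."""
    return {j for j in input_adj[i]
            if any(v in phi[j] for u in nodes for v in hw_adj[u])}


def is_deletable(u: int, i: int, phi: Placement,
                 input_adj: Adjacency, hw_adj: Adjacency) -> bool:
    """(p1) phi(i) minus u stays non-empty and connected; (p2) no embedded edge is lost."""
    remaining = phi[i] - {u}
    if not is_connected(remaining, hw_adj):
        return False
    before = linked_partners(i, phi[i], phi, input_adj, hw_adj)
    after = linked_partners(i, remaining, phi, input_adj, hw_adj)
    return after == before


def create_free_vertices(phi: Placement, input_adj: Adjacency,
                         hw_adj: Adjacency, n_hw: int) -> Set[int]:
    """Cleanup sweep of the terminal search; mutates phi and returns the free set U."""
    owner = {u: i for i, nodes in phi.items() for u in nodes}   # phi^{-1}
    free: Set[int] = set()
    u, no_del = 0, 0
    while no_del < n_hw:                  # stop after a full sweep without deletion
        i = owner.get(u)
        if u not in free and i is not None and is_deletable(u, i, phi, input_adj, hw_adj):
            phi[i] = phi[i] - {u}         # move u to the set of free vertices
            free.add(u)
            no_del = 0
        else:
            no_del += 1
        u = (u + 1) % n_hw                # check next hardware vertex
    return free
```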
Super vertex links from breadth first search (BFS) – After the cleanup phase the free hardware vertices u ∈ U are used to represent edges (i, j) ∈ E(I) whose super vertices φ(i), φ(j) are not yet linked on the hardware. As described in Algorithm 2, this is done by successively iterating through the vertices i = 0, …, |V(I)| − 1 of the input graph and checking for edges (i, j) ∈ E(I) whose super vertices φ(i), φ(j) are not yet linked on the hardware. If such an edge is found, the terminal search calls the subroutine bfs_path(·), which tries to link the super vertices φ(i), φ(j). In this step breadth first search (Cormen et al., 2009) is run on the subgraph of H induced by the free vertices U to search for a path connecting the pair of super vertices φ(i), φ(j). If such a path is found, the corresponding vertices are included in φ(i) and deleted from U. The algorithm then proceeds to the next vertex pair φ(i), φ(j). For a possible implementation of the subroutine bfs_path(·) the reader is referred to Appendix 8.2.
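The linking step can be sketched as a multi-source breadth first search that starts from all vertices of φ(i), expands only through free vertices, and stops as soon as a vertex adjacent to φ(j) is dequeued. This is not the implementation of Appendix 8.2: the `bfs_path` below has a different, hypothetical signature from the subroutine bfs_path(·) of Algorithm 2, `link` is an illustrative helper, and the dict-of-sets representations are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Placement = Dict[int, Set[int]]
Adjacency = Dict[int, Set[int]]


def bfs_path(source: Set[int], target: Set[int], free: Set[int],
             hw_adj: Adjacency) -> Optional[List[int]]:
    """Shortest run of free vertices linking super vertex `source` to `target`.

    Returns the free vertices on the path ([] if the two super vertices
    already touch) or None if the free vertices cannot link them.
    """
    parent: Dict[int, Optional[int]] = {s: None for s in source}
    queue = deque(source)
    while queue:
        u = queue.popleft()
        for v in hw_adj[u]:
            if v in target:
                path = []                 # walk back to the source super vertex
                while u not in source:
                    path.append(u)
                    u = parent[u]
                return path[::-1]
            if v in free and v not in parent:
                parent[v] = u             # expand only through free vertices
                queue.append(v)
    return None


def link(i: int, j: int, phi: Placement, free: Set[int], hw_adj: Adjacency) -> bool:
    """Attach a shortest free path to phi(i) and remove it from U."""
    path = bfs_path(phi[i], phi[j], free, hw_adj)
    if path is None:
        return False
    phi[i] |= set(path)                   # grow phi(i) along the path
    free -= set(path)                     # the path vertices are no longer free
    return True
```

Because BFS processes vertices in order of their distance from φ(i), the first vertex found next to φ(j) closes a path with the fewest free vertices, which matches the preference for shortest paths discussed at the end of this section.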
[Fig. 3 (a), (b): embedding probability p_emb versus |V(I)|; curves for PSSA and improved PSSA]
Fig. 3 Comparing the empirical embedding probability (20 inputs) of PSSA (gray) and improved PSSA with terminal search (black) for (a) random cubic and (b) Barabási-Albert type input graphs on a King’s graph KG_{320,320} hardware. A (dashed) vertical line indicates the embedding threshold 321 of the best known complete graph embedding.
Note that the terminal search algorithm was already part of the original PSSA as designed by YY. However, since the terminal search phase had almost no impact on the embedding probability on King’s graphs of moderate size (L = 52), its details were omitted in (Sugie et al., 2018). For large hardware King’s graphs (L = 320), on the other hand, the terminal search phase has a profound impact on the embedding performance. This is depicted in Fig. 3, which compares the embedding performance of standard and improved PSSA and shows that the improved PSSA increases the size of the embeddable input problems for both random cubic and Barabási-Albert type input graphs I. For Erdős-Rényi type input graphs with 20% edge density, even the improved PSSA fails to construct minor embeddings for inputs beyond the embeddability threshold (|V(I)| > 321) ensured by the existence of the best known complete graph embedding. Reasons for the difficulty of embedding Erdős-Rényi graphs will be discussed in Sect. 7.
We close this section with a short discussion: (i) Note that the terminal search phase never decreases the number of embedded edges E_emb(φ). (ii) While PSSA produces super vertex placements φ which occupy the whole hardware, the terminal search phase may leave hardware vertices unused. Similarly, while standard PSSA produces super vertices φ(i) which induce a path in H, this property may be destroyed in the terminal search phase due to the deletion of vertices u ∈ V(H) as well as the growth of additional paths. (iii) Breadth first search is an efficient algorithm for finding the shortest path between two vertices of an unweighted graph, and two super vertices can be connected by including all the vertices of the path into one super vertex. Searching for the shortest path is beneficial since it minimizes the number of vertices needed to connect two super vertices. However, there is no guarantee that the shortest path between two super vertices is the optimal choice. In particular, there may be cases where the shortest path connecting one pair of super vertices obstructs the paths connecting another pair of super vertices, which could be avoided by choosing a longer path. For this reason the total number of links produced by the terminal search phase may depend on the order in which links between super vertex pairs are created by breadth first search. In a similar manner, the deletion of vertices u from their super vertex φ(i), and thus the set U of free vertices, may depend on the order in which the deletion of u is attempted. Ultimately, it is not even guaranteed that the super vertex placement φ produced by PSSA is the input which produces the largest number of embedded edges in the terminal search phase.
4.2 Degree-weighted shift proposals
We further improve the PSSA embedding performance by exploiting the degree distribution of the original input graph I. The basic idea is that vertices with many neighbors should correspond to large super vertices comprising many hardware vertices, while vertices with few neighbors should correspond to small super vertices comprising only a few hardware vertices. Most of our attempts to include information on the degree distribution, in particular when creating the initial super vertex placement, were unsuccessful and are not reported here. However, using the degree distribution to bias shift proposals gave mild improvements in the embedding probability of random cubic graphs, see Fig. 4, and is described in more detail below.
To begin with, the degree deg(i) of a vertex i ∈ V(I) is defined as the number of edges incident to the vertex i. The size of the corresponding super vertex φ(i) ⊂ V(H) is denoted by |φ(i)|. The degree-weighted shift rule is then applied as follows. First, we select two neighboring leaf nodes u, v with u ∈ φ(i), v ∈ φ(j) for a shift move exactly as in conventional PSSA. Subsequently, we compute the degree ratios, defined as

$$
\mathrm{dr}(x) := \frac{|\varphi(x)|}{\deg(x)}, \qquad x = i, j \in V(I), \qquad (7)
$$

as a measure of the extent to which the super vertex size matches the degree in the input graph. Finally, we propose assigning u ∈ φ(i) to φ(j) with probability

$$
\mu\big(u \in \varphi(i) \to u \in \varphi(j)\big) := \frac{\mathrm{dr}(i)}{\mathrm{dr}(i) + \mathrm{dr}(j)}, \qquad (8)
$$

or alternatively, with the complementary probability dr(j)/(dr(i) + dr(j)), propose shifting v ∈ φ(j) from φ(j) to φ(i). This biases the shift proposal to assign a leaf node to the super vertex with the lower degree ratio.

[Fig. 4 (a), (b): embedding probability p_emb versus |V(I)|; curves for normal shift and weighted shift]
Fig. 4 Comparing the empirical embedding probability (20 inputs) of improved PSSA with (black) and without (gray) degree-weighted shift proposals for (a) random cubic and (b) Barabási-Albert type input graphs on a King’s graph KG_{320,320} hardware. A (dashed) vertical line indicates the embedding threshold 321 of the best known complete graph embedding.
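To illustrate Eqs. (7) and (8), the following Python sketch decides the direction of a shift once a pair of neighboring leaves u ∈ φ(i), v ∈ φ(j) has been selected. The function names and the dict-based representation are assumptions, and further checks, such as |φ(i)| > 1 before u is removed, are left to the caller.

```python
import random
from typing import Dict, Set, Tuple


def degree_ratio(x: int, phi: Dict[int, Set[int]], degree: Dict[int, int]) -> float:
    """dr(x) = |phi(x)| / deg(x), Eq. (7)."""
    return len(phi[x]) / degree[x]


def weighted_shift_direction(i: int, j: int, u: int, v: int,
                             phi: Dict[int, Set[int]], degree: Dict[int, int],
                             rng: random.Random) -> Tuple[int, int, int]:
    """Direction of a shift between neighboring leaves u in phi(i) and v in phi(j).

    With probability dr(i) / (dr(i) + dr(j)), Eq. (8), leaf u moves from phi(i)
    to phi(j); otherwise leaf v moves from phi(j) to phi(i).
    Returns (moved_leaf, from_input_vertex, to_input_vertex).
    """
    dr_i = degree_ratio(i, phi, degree)
    dr_j = degree_ratio(j, phi, degree)
    if rng.random() < dr_i / (dr_i + dr_j):
        return u, i, j
    return v, j, i
```

For instance, with |φ(i)| = 4, deg(i) = 1 and |φ(j)| = 1, deg(j) = 4, the ratios are dr(i) = 4 and dr(j) = 0.25, so u is moved into the comparatively small super vertex φ(j) with probability 4/4.25 ≈ 0.94.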
In order to evaluate the impact of the degree-weighted shift proposal, we compared the embedding performance of improved PSSA using the conventional shift rule and the degree-weighted shift rule. In both cases we applied the terminal search phase. The embedding performance for a King’s graph hardware KG_{320,320} is shown in Fig. 4 for (a) random cubic and (b) Barabási-Albert type input graphs. The degree-weighted shift rule shows mild improvements for random cubic graphs. Surprisingly, it clearly decreases the embedding probability for Barabási-Albert graphs, for which it was originally designed. Finally, it has no effect on the embedding probability of Erdős-Rényi graphs with 20% edge density. Those remain embeddable only by resorting to the best known complete graph embedding of K_321.
4.3 Improved annealing schedules
As a last step towards designing an improved PSSA capable of embedding even more edges of a given type of random input graph, we tried to optimize the functional form of the annealing schedules. In particular, we tested four different functional forms of the temperature schedule, which are described in more detail below.
Schedules – (s1) Our investigation started out with the double linear schedule of conventional PSSA, exactly as described in the last paragraph of Sect. 3. (s2) In addition, we tried a single linear schedule, which omits the second annealing phase of the double linear schedule and jumps directly to the terminal search phase of improved PSSA after completing t_max/2 annealing steps. (s3) The third temperature schedule we tried is a double exponential schedule. It uses exactly the same total number of annealing steps t_max as the original double linear schedule, and it starts the first and the second annealing phase at exactly the same temperatures T_0 and T_{t_max/2} as the original double linear schedule. However, rather than decreasing the temperature linearly, the temperature is decreased exponentially. To this end, the temperature is updated every 1000 annealing steps by multiplying the current temperature by a cooling factor β < 1. A larger value of β corresponds to slower cooling and tends to give better results. However, one has to ensure that the system is sufficiently cooled at the end of each annealing phase in order not to jump out of the optimal configuration. Here, we used the cooling factor β = 0.9999. (s4) The fourth and last schedule we tried is a single exponential schedule, which is identical to the double exponential schedule in the first annealing phase. Subsequently, it skips the second annealing phase and jumps directly to the terminal search phase of improved PSSA after completing t_max/2 annealing steps. Finally, we remark that the schedules p_s(t) and p_a(t), which coordinate swap and shift proposals as well as shift directions, remain exactly as described in the last paragraph of Sect. 3.

[Fig. 5 (a), (b): embedding probability p_emb versus |V(I)|; curves for single linear, double linear, single exponential, and double exponential schedules]
Fig. 5 Comparing the empirical embedding probability (20 inputs) of improved PSSA using single or double linear (thin/thick gray lines) as well as single or double exponential (thin/thick black lines) schedules for (a) random cubic and (b) Barabási-Albert type input graphs on a KG_{320,320} hardware. A (dashed) vertical line indicates the embedding threshold 321 of the best known complete graph embedding. Improved PSSA includes degree-weighted shifts in (a) and excludes them in (b).
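A minimal sketch of the double exponential schedule (s3) is given below. It assumes that the temperature is lowered by the factor β at every multiple of 1000 steps counted from the start of each phase (the text does not state how the 1000-step blocks align with the phase boundary), and T_0 and T_{t_max/2} are free parameters of the hypothetical function. The single exponential schedule (s4) corresponds to using only the branch for t < t_max/2.

```python
def double_exponential_temperature(t: int, t_max: int, T0: float, T_half: float,
                                   beta: float = 0.9999, block: int = 1000) -> float:
    """Schedule (s3): within each phase, T is multiplied by beta every `block` steps."""
    if not 0 <= t <= t_max:
        raise ValueError("t must lie in [0, t_max]")
    half = t_max // 2
    if t < half:
        start, T_start = 0, T0          # first phase starts at T0
    else:
        start, T_start = half, T_half   # second phase restarts at T_{t_max/2}
    # number of completed cooling blocks since the start of the current phase
    return T_start * beta ** ((t - start) // block)
```

Compared with the double linear sketch of Eq. (5), only the decay law within each phase changes; the phase boundaries and starting temperatures are shared, as described for (s3).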
Admittedly, our methodology of improving the embedding performance via the functional form of the annealing schedule is rather phenomenological. Our motivation originates from our experience with simulated annealing on large spin systems, where we observed that exponential schedules tend to perform better than linear ones. Our choice is likewise supported by the phenomenological results depicted in Fig. 5. In this figure we compare the performance of improved PSSA using the four different temperature schedules (s1-s4) for embedding (a) random cubic graphs and (b) Barabási-Albert random graphs into a King’s graph KG_{320,320}. Our results corroborate that the temperature schedules from the exponential family (black lines) tend to outperform the linear temperature schedules (gray lines), both on (a) random cubic and (b) Barabási-Albert random graphs. In addition, we observe that the performance of single and double linear schedules, as well as of single and double exponential schedules, is identical within the bounds of statistical fluctuations. Meanwhile, we could not find a single instance of a random Erdős-Rényi graph with 20% edge density and more than 321 vertices which could be embedded into the King’s graph KG_{320,320}, even with exponential scheduling.
5 Embedding performance on increasing hardware
Finally, we evaluate the empirical embedding threshold V̄(L) of PSSA on hardware graphs of increasing size. To this end we proceed as follows: (i) We fix the initial size of the hardware by setting the parameter L = 20. (ii) Starting from |V(I)| = L, we successively increase the size of the input problem. For each pair (L, |V(I)|) we create 20 input samples I_s, s = 1, …, 20, and try to embed them into KG_{L,L} using (improved) PSSA. If at least 19 out of 20 input graphs can be embedded successfully, we proceed by increasing the number of vertices |V(I)|. The first time we find a pair (L, |V(I)|) for which fewer than 19 out of 20 cases are embeddable, we record |V(I)| as the embedding threshold V̄(L) and repeat step (ii) on a larger King’s graph L ← L + 20. We stop once L > 320.
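The threshold search of steps (i) and (ii) can be sketched as the following loop. The callback `try_embed`, which would generate the s-th random input with n vertices and run (improved) PSSA on KG_{L,L}, is hypothetical. As an added safeguard not stated in the text, the input size is capped at |V(H)| = L², since Algorithm 1 requires |V(I)| ≤ |V(H)|; without the cap, a callback that always succeeds would never terminate.

```python
from typing import Callable, Dict


def embedding_thresholds(try_embed: Callable[[int, int, int], bool],
                         L_start: int = 20, L_step: int = 20, L_stop: int = 320,
                         samples: int = 20, required: int = 19) -> Dict[int, int]:
    """Empirical embedding threshold V_bar(L) for L = L_start, L_start + L_step, ...

    try_embed(L, n, s) is a hypothetical callback: it should build input sample s
    with n vertices, run (improved) PSSA on KG_{L,L} and return True on success.
    """
    thresholds: Dict[int, int] = {}
    L = L_start
    while L <= L_stop:                  # stop once L > L_stop
        n = L                           # step (ii) starts at |V(I)| = L
        # grow the input while at least `required` of `samples` inputs embed;
        # the cap n <= L*L reflects |V(I)| <= |V(H)| and guarantees termination
        while n <= L * L and sum(try_embed(L, n, s)
                                 for s in range(1, samples + 1)) >= required:
            n += 1
        thresholds[L] = n               # first size with fewer than `required` successes
        L += L_step
    return thresholds
```

As a usage example, a stand-in callback that succeeds exactly when n < 3L yields V̄(L) = 3L for every L, which mimics a linear scaling of the threshold with L.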
The results are shown in Fig. 6, which depicts the embedding threshold V̄(L) up to which (improved) PSSA finds minor embeddings with 95% embedding probability (at least 19 out of 20 inputs) on hardware King’s graphs KG_{L,L} of increasing size L. For (a) random cubic and (b) Barabási-Albert type input graphs we phenomenologically observe a linear scaling of V̄(L) with L. In both cases V̄(L) approximately scales as V̄(L) = c × L with (a) c = 3.2 and (b) c = 2.8. Thus improved PSSA maintains a clear advantage over the deterministic minor embedding of K_N, for which V̄(L) = L + 1. In addition, this constitutes an average performance gain of 42% and 28%, respectively, over the previously published version of PSSA (Sugie et al., 2018).
On the other hand, for (c) Erdős-Rényi type input graphs with 20% edge density, V̄(L) is hardly larger than L + 1 even for small hardware graphs KG_{L,L}, eventually approaching the threshold of the best known complete graph embedding K_{L+1} as L gets larger. The results in Fig. 6 use improved PSSA with terminal search phase and double exponential schedules. Degree-weighted shifts are applied only for random cubic graphs. A discussion of these results will be given in Section 7.

[Fig. 6 (a), (b), (c): embedding threshold V̄(L) versus hardware size L; curves for PSSA and improved PSSA]
Fig. 6 Embedding threshold V̄(L) of PSSA (gray) and improved PSSA (black) for increasing hardware King’s graph KG_{L,L} for (a) random cubic, (b) Barabási-Albert, and (c) Erdős-Rényi type input graphs. A dashed (blue) line indicates the minimal embedding threshold ensured by the existence of the best known complete graph embedding.
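For orientation, evaluating the approximate linear fits V̄(L) ≈ c × L at the largest hardware size L = 320 (KG_{320,320}, i.e., 102,400 vertices) gives the following values. These are read off the fitted scaling law, not separately reported measurements.

$$
\bar{V}(320) \approx 3.2 \times 320 = 1024 \ \ \text{(random cubic)}, \qquad
\bar{V}(320) \approx 2.8 \times 320 = 896 \ \ \text{(Barab\'asi-Albert)},
$$

$$
\text{compared with } \bar{V}(320) = L + 1 = 321 \ \text{for the best known complete graph embedding } K_{L+1}.
$$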
6 Can the best known complete graph embedding be improved?
The results of the previous section show that PSSA can maintain an embedding threshold V̄(L) which exceeds the minimal embedding performance ensured by the best known complete graph embedding for certain inputs, even on large hardware graphs. Yet, they also show that the embedding threshold of Erdős-Rényi graphs with constant edge density cannot exceed the minimal embedding performance ensured by the best known complete graph embedding on large hardware graphs. This result emphasizes the outstanding role of the best known complete graph embedding as a baseline embedding which ensures a minimal embedding performance. This naturally raises two questions. (i) Can a King’s graph host minor embeddings of larger complete graphs? (ii) Can the hardware graph be optimized to host larger complete graphs?
In this section we partially address both issues. (i) We use the concept of treewidth to show that a King’s graph KG_{L,L} cannot contain a minor embedding of a complete graph K_N with N > 2L vertices. We believe this bound is still quite loose, but we cannot prove that the best known complete graph embedding, which hosts input graphs K_N with N = L + 1 vertices, is (close to) optimal. (ii) We show that any hardware graph H with bounded coordination number d (a common restriction for quantum and CMOS annealers) can embed complete graphs with at most O(√|V(H)|) vertices. Both the King’s graph and the Chimera graph attain this order and are thus optimal in a certain sense. Note that the style of the paper will henceforth become more mathematical. A reader who is more interested in the discussion of our improved PSSA is advised to skip to the summary section.
6.1 Upper bound on complete graphs embeddable into a King’s graph
In this section we prove that a complete graph K_N with N vertices cannot be embedded into a King’s graph KG_{L,L} if N > 2L. The cornerstone of our proof is the following property, based on the concept of treewidth, which will be defined further below.
Property 1 (Halin (1976)) Let I and H be two graphs. If I is a minor of H, its treewidth tw(I) is less than or equal to the treewidth tw(H) of H,

$$
I \text{ is a minor of } H \;\Rightarrow\; \mathrm{tw}(I) \le \mathrm{tw}(H). \qquad (9)
$$

We then exploit this property by identifying the input with a complete graph I = K_N, whose treewidth is known to be tw(K_N) = N − 1 (Fomin and Kratsch, 2010). Finally, we identify H with a King’s graph KG_{L,L} and prove the following upper bound on the treewidth of a King’s graph further below:

$$
\mathrm{tw}(KG_{L,L}) \le 2L - 1. \qquad (10)
$$

Together with Property 1, this implies that a complete graph K_N with N > 2L vertices cannot be minor embedded into a King’s graph KG_{L,L}: for N > 2L we would have tw(K_N) = N − 1 ≥ 2L > 2L − 1 ≥ tw(KG_{L,L}), contradicting Eq. (9).
In order to prove Eq. (10) we require some additional definitions. Let H be a general graph. A path of the graph H is a sequence of vertices ⟨v_1, v_2, v_3, …, v_n⟩ without any repetition such that (v_l, v_{l+1}) ∈ E(H) holds for all l = 1, …, n − 1. A cycle of the graph H is a sequence of vertices ⟨v_1, v_2, v_3, …, v_n⟩ (n ≥ 3) without any repetition such that (v_n, v_1) ∈ E(H) and (v_l, v_{l+1}) ∈ E(H) hold for all l = 1, …, n − 1. A graph H is connected if for every pair of vertices v, v′ (v ≠ v′) there is a path connecting v and v′. A tree T̄ is a connected graph without any cycle. Note that if H is a tree, there exists a unique path between each pair of vertices in V(H).
Definition 6 (Tree Decomposition (Robertson and Seymour, 1986)) A tree decomposition of H is a family {X_i}_{i∈Ī} of vertex subsets X_i ⊆ V(H) together with a tree graph T̄ connecting the indices i ∈ Ī of the subsets X_i, such that the following properties hold. The vertex subsets X_i ⊆ V(H) cover the vertices and edges of the graph H according to
(T1) ⋃_{i∈Ī} X_i = V(H),
(T2) for every edge (v, v′) ∈ E(H), there exists i ∈ Ī such that v, v′ ∈ X_i.
In addition we require that
(T3) for all i, j, k ∈ Ī, if j is on the path of T̄ from i to k, then X_i ∩ X_k ⊆ X_j.
Subsequently, for each tree decomposition we define its width as max_{i∈Ī} |X_i| − 1, i.e., the number of vertices in its largest subset X_i minus one. Finally, the treewidth of a graph H is the minimal width among all possible tree decompositions of H. The treewidth is denoted by tw(H).
Upper bound for treewidth of King’s graph – We now prove Eq. (10) by constructing a concrete tree decomposition of a King’s graph and computing its width. To this end, let KG_{L,L} be the King’s graph of size L × L and denote its vertex set as

$$
V(KG_{L,L}) = \{x_{i,j} \mid i, j = 1, \ldots, L\}, \qquad (11)
$$

where x_{i,j} is the vertex located at the point (i, j) in the plane. See Fig. 7 for an example.
We construct a tree decomposition of the King’s graph KG_{L,L} as follows. Let {X_i}_{i∈Ī} be a family of vertex subsets

$$
X_i = \{x_{i,j},\, x_{i+1,j} \mid j = 1, \ldots, L\}, \qquad (12)
$$

with i ∈ Ī = {1, …, L − 1}. Let T̄ be a tree on the index set (V(T̄) = Ī) with edges E(T̄) = {(1, 2), (2, 3), …, (L − 2, L − 1)}, forming a simple path graph ⟨1, 2, 3, …, L − 1⟩ on the indices i of X_i. See Fig. 7 for an example. Then the family of subsets {X_i}_{i∈Ī} together with the tree T̄ is a tree decomposition of KG_{L,L}. This can be checked as follows. (T1) holds because ⋃_{i∈Ī} X_i = V(KG_{L,L}). (T2) holds because (i) all horizontal edges (x_{i,j}, x_{i+1,j}) with i = 1, …, L − 1 and j = 1, …, L are contained in X_i, (ii) all diagonal edges (x_{i,j}, x_{i+1,j+1}) and anti-diagonal edges (x_{i+1,j}, x_{i,j+1}) with i, j = 1, …, L − 1 are contained in X_i, (iii) vertical edges (x_{i,j}, x_{i,j+1}) with i, j = 1, …, L − 1 are contained in X_i, while (iv) vertical edges (x_{L,j}, x_{L,j+1}) with j = 1, …, L − 1 are contained in X_{L−1}.
(T3) holds as well, which can be seen as follows. Let j be an index on the path of T̄ from i to k, i.e., min(i, k) ≤ j ≤ max(i, k). (i) For indices i, k with |i − k| > 1 the intersection X_i ∩ X_k is empty and (T3) is fulfilled. (ii) For indices with |i − k| = 1, assume without loss of generality i < k = i + 1. In this case j is either i or k, and the intersection X_i ∩ X_k = {x_{i+1,m} | m = 1, …, L} is a subset of both X_i and X_{k=i+1}. (iii) Finally, consider the case |i − k| = 0. In this case j = i = k and X_i ∩ X_k ⊆ X_j is also true. Altogether this shows that the above definition is a proper tree decomposition.

[Fig. 7: King’s graph vertices x_{i,j} for L = 5 with the bags X_1, …, X_4 marked]
Fig. 7 Illustration of the proposed tree decomposition for constructing the upper bound on the treewidth of a King’s graph KG_{L,L} using the example L = 5. The corresponding sets X_i for i = 1, 2, 3, 4 are indicated by boxes with thick boundaries.
To complete the proof of Eq. (10), we compute the width of the tree decomposition, max_{i∈Ī} |X_i| − 1 = 2L − 1, since all sets in the decomposition contain 2L vertices, i.e., |X_i| = 2L for all i ∈ Ī. By the definition of the treewidth, we conclude that tw(KG_{L,L}) ≤ 2L − 1.
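For small L, the construction of Eq. (12) can be checked mechanically. The Python sketch below builds KG_{L,L} with vertices (i, j), forms the bags X_i from columns i and i + 1, and checks (T1)-(T3); for a path-shaped tree T̄, (T3) is equivalent to requiring that the bags containing any given vertex form a contiguous range. The function names are our own, and the check only confirms the width-(2L − 1) decomposition; it does not compute the treewidth itself.

```python
from itertools import product
from typing import List, Set, Tuple

Vertex = Tuple[int, int]


def kings_graph_edges(L: int) -> Set[frozenset]:
    """Edges of KG_{L,L}: horizontal, vertical, diagonal and anti-diagonal neighbors."""
    edges = set()
    for i, j in product(range(1, L + 1), repeat=2):
        for di, dj in ((1, 0), (0, 1), (1, 1), (1, -1)):
            a, b = i + di, j + dj
            if 1 <= a <= L and 1 <= b <= L:
                edges.add(frozenset({(i, j), (a, b)}))
    return edges


def column_bags(L: int) -> List[Set[Vertex]]:
    """Bags X_1, ..., X_{L-1} of Eq. (12): all vertices in columns i and i + 1."""
    return [{(c, j) for c in (i, i + 1) for j in range(1, L + 1)} for i in range(1, L)]


def is_path_decomposition(vertices: Set[Vertex], edges: Set[frozenset],
                          bags: List[Set[Vertex]]) -> bool:
    """Check (T1)-(T3) when the tree is the path X_1 - X_2 - ... - X_{L-1}."""
    if set().union(*bags) != vertices:                          # (T1)
        return False
    if not all(any(e <= bag for bag in bags) for e in edges):   # (T2)
        return False
    for v in vertices:                                          # (T3) on a path:
        idx = [k for k, bag in enumerate(bags) if v in bag]     # bags holding v
        if idx != list(range(idx[0], idx[-1] + 1)):              # must be contiguous
            return False
    return True
```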
6.2 Embedding complete graphs into hardware with fixed coordination number
We now touch upon the question of constructing an alternative hardware graph which may potentially embed larger complete graphs.
Hardware graph with a fixed coordination number – To start with, let H be a connected hardware graph. Since we are asking for an optimal hardware graph for the embedding of complete graphs, we do not yet fix its concrete structure. For the moment, we only specify a fixed upper bound d on the coordination number of each vertex of the hardware graph H. We believe that a fixed upper bound d on the coordination number of each vertex is a reasonable hardware restriction which cannot easily be overcome in the foreseeable future. In particular, current quantum annealers (D-Wave Systems Inc., since 2007) can couple each superconducting qubit to only a few others due to technical restrictions. CMOS annealers (Okuyama et al., 2017; Takemoto et al., 2019), on the other hand, exploit the sparsity of the hardware graph for parallel simultaneous spin updates and thus require a fixed upper bound on the coordination number. In order to avoid pathological cases we assume d > 2. (If d = 2, H could at most be a cycle or a path graph.)
We now show that hardware graphs with a fixed upper bound d on the coordination number require at least

$$
|V(H)| \ge \frac{N(N-3)}{d-2} \qquad (13)
$$

vertices to host a minor of a complete graph K_N with N vertices. In order to prove this, recall that a complete graph K_N requires connecting each of its vertices to all of its other N − 1 vertices. However, a single vertex of the hardware graph can connect to at most d neighbors and can thus in general not be connected to N − 1 neighbors. A super vertex comprising a total of S vertices on the hardware, on the other hand, can be connected to many more hardware vertices. More specifically, a super vertex with S vertices has at most S × d edge ends attached to its vertices. In order to keep the super vertex connected, we have to connect its vertices with at least (S − 1) internal edges, which requires sacrificing at least 2(S − 1) of these edge ends. (Note that each internal edge is incident to two vertices of the super vertex and thus uses up two edge ends.) Finally, using the remaining edge ends (at most S(d − 2) + 2) to connect the super vertex to at least N − 1 hardware vertices which represent all other vertices of the original complete graph K_N requires S(d − 2) + 2 ≥ N − 1. This results in a minimal size of a super vertex according to

$$
S \ge \frac{N-3}{d-2}. \qquad (14)
$$

Since the super vertices of all N vertices of K_N are disjoint and each of them must take at least this minimal size, embedding K_N requires |V(H)| ≥ N × S, resulting in the hardware requirement specified in Eq. (13).
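The bound of Eq. (13) is easy to evaluate numerically. The sketch below (function names are our own) computes the right-hand side of Eq. (13) and, conversely, the largest N not excluded by it for a given |V(H)| and d, using exact integer arithmetic.

```python
import math


def min_hardware_vertices(n: int, d: int) -> float:
    """Right-hand side of Eq. (13): |V(H)| >= N(N - 3) / (d - 2)."""
    if d <= 2:
        raise ValueError("the bound assumes d > 2")
    return n * (n - 3) / (d - 2)


def max_complete_graph_size(num_vertices: int, d: int) -> int:
    """Largest N with N(N - 3) <= (d - 2)|V(H)|, i.e. not excluded by Eq. (13)."""
    if d <= 2:
        raise ValueError("the bound assumes d > 2")
    budget = (d - 2) * num_vertices
    # positive root of N^2 - 3N - budget = 0, rounded down via the integer sqrt
    n = (3 + math.isqrt(9 + 4 * budget)) // 2
    while (n + 1) * (n - 2) <= budget:   # correct possible rounding down
        n += 1
    while n * (n - 3) > budget:          # correct possible rounding up
        n -= 1
    return n
```

For a King’s graph, every vertex has at most d = 8 neighbors. With |V(H)| = 102,400 (KG_{320,320}), Eq. (13) alone still allows N up to 785, whereas the treewidth argument of Sect. 6.1 restricts N to at most 2L = 640; the two bounds are complementary, and only the latter is specific to the King’s graph.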
Incidentally, the foregoing discussion demonstrates that for hardware graphs with a fixed upper bound on the coordination number, the hardware spin resources |V(H)| are of order Ω(N²). Conversely, the number of vertices N of the largest embeddable complete graph is inevitably of order O(√|V(H)|). Both for King’s graphs and for Chimera graphs (Klymko et al., 2014), the best known complete graph embeddings attain these orders and are thus optimal in a certain sense. It remains an interesting question for future research whether graph structures that turn Eq. (13) into an equality exist for arbitrary d. If so, they would pave the way for constructing hardware graphs with more efficient resource utilization with respect to complete graphs.
7 Summary
We presented an improved version of PSSA and used it to evaluate the performance of embedding heuristics on hardware King’s graphs of unprecedented size of 102,400 spins, as released with the latest CMOS annealing processor (Hitachi Ltd., 2018). The algorithmic improvements of PSSA included (i) an additional terminal search phase, (ii) a degree-oriented super vertex shift rule, and (iii) optimized annealing schedules. The embedding performance of the improved PSSA was investigated for various types of input graphs on hardware graphs of increasing size. An embedding threshold consistently exceeding that of the best known complete graph embedding by a factor of c > 1 was observed for random cubic (c = 3.2) and Barabási-Albert (c = 2.8) graphs. This constitutes an average performance gain of 42% and 28%, respectively, over the previously published version of PSSA (Sugie et al., 2018). On the other hand, for Erdős-Rényi random graphs with constant edge density we observe that even the improved PSSA cannot exceed the embedding threshold of the best known complete graph embedding on large input graphs. Finally, we derived a new upper bound on the number of vertices of complete graphs embeddable into a hardware King’s graph and showed that the best known complete graph embedding attains the maximal order achievable on hardware structures with a fixed coordination number.
7.1 Discussion
We now discuss several open questions of our research. (i)
Our results demonstrate that the improved PSSA can out-
perform the best known complete graph embedding on large
hardware graphs for certain inputs, such as random cubic
and Barabási-Albert graphs. Yet, it remains a future task
to evaluate whether the improved PSSA would also outperform the best known complete graph embedding for input
graphs induced by concrete applications in quadratic unconstrained binary optimization. We believe that graph coloring
problems and the clique problem (Lucas, 2014)
on social networks are suitable candidate applications. (ii)
It is currently unclear whether the improved PSSA performs
close to or far from optimality. To address this question, it would
be desirable to compare its embedding threshold to the optimal embedding threshold attainable by means of exact
algorithms (Adler et al., 2011),
$$\bar V(H,\mathcal{G},\mathcal{A}_{\mathrm{iPSSA}},p) \le \bar V(H,\mathcal{G},\mathcal{A}_{\mathrm{exact}},p). \tag{15}$$
This would allow for evaluating the headroom for further
improvements of PSSA. (iii) The results of Sect. 6 show that
current hardware graphs are close to optimal for representing complete graphs. Yet, it remains a question for future
research whether hardware graphs H could be optimized to increase
the embedding threshold $\bar V(H,\mathcal{G},\mathcal{A},p)$ for certain classes
$\mathcal{G}$ of input graphs. (iv) It remains a task for future research
to develop easily accessible criteria which indicate whether
an embedding heuristic such as improved PSSA can outperform the best known complete graph embedding on hardware
graphs of increasing size for a certain class $\mathcal{G}$ of input problems.
Empirically, we observed that PSSA can outperform the best
known complete graph embedding on large hardware graphs
for random cubic and Barabási-Albert graphs, whose edge
set grows only linearly in the number of vertices, |E(I)| ∝
|V(I)|. On the other hand, we observed that PSSA cannot
outperform the best known complete graph embedding on
large hardware graphs for Erdős-Rényi graphs, whose denser
edge set grows quadratically with the number of vertices,
|E(I)| ∝ |V(I)|^2. Yet, it is by no means clear that these
observations hold in general. That is, we cannot prove that a number
of edges scaling linearly in the size of the vertex set would
guarantee embedding performance superior to the best known
complete graph embedding. Similarly, it is unclear whether a
number of edges growing quadratically in the number of
vertices necessarily implies that the embedding threshold
shrinks to the performance of the best known complete graph
embedding as the hardware grows. (v) The main objective of improved PSSA is a high embedding threshold which
outperforms the best known complete graph embedding even
on large hardware graphs. Particularly for input problems
where this objective fails, it is advisable to optimize the
minor-embedding heuristic with respect to other objectives.
Such an objective could, for example, be the runtime of the
embedding heuristic (Goodrich et al., 2018). In addition, a
good minor-embedding heuristic should attempt to minimize the size
of its super vertices to avoid invalid solutions in the subsequent annealing phase (Boothby et al., 2016; Venturelli
et al., 2015). This secondary objective has been partially addressed by the cleanup process in the terminal search phase
of our improved PSSA. Investigating its effect by means of
precise data on the final distribution of super-vertex sizes,
before and after the cleanup of improved PSSA, remains a
subject of future research.
Acknowledgements We thank Hirofumi Suzuki, Kazuhiro Kurita, and
Shoya Takahashi for supporting the organization of the “Hokkaido
University & Hitachi 2nd New-Concept Computing Contest 2017” and
Ko Fujizawa and Masumitsu Aoki for stimulating discussions.
References
Adler I, Dorn F, Fomin FV, Sau I, Thilikos DM (2011) Faster
parameterized algorithms for minor containment. Theo-
retical Computer Science 412(50):7018 – 7028
Albert R, Barabási AL (2002) Statistical mechanics of
complex networks. Rev Mod Phys 74:47–97, DOI 10.
1103/RevModPhys.74.47, URL https://link.aps.
org/doi/10.1103/RevModPhys.74.47
Aramon M, Rosenberg G, Valiante E, Miyazawa T, Tamura
H, Katzgraber HG (2018) Physics-inspired optimization
for quadratic unconstrained problems using a digital annealer. arXiv:1806.08815, URL https://arxiv.org/
pdf/1806.08815.pdf
Bollobás B (1985) Random Graphs, 2nd edn. Cambridge
University Press
Boothby T, King AD, Roy A (2016) Fast clique mi-
nor generation in chimera qubit connectivity graphs.
Quantum Information Processing 15(1):495–508, DOI
10.1007/s11128-015-1150-6, URL https://doi.org/
10.1007/s11128-015-1150-6
Cai J, Macready B, Roy A (2014) A practical heuristic for
finding graph minors, arXiv:1406.2741
Choi V (2008) Minor-embedding in adiabatic quantum com-
putation: I. The parameter setting problem. Quantum In-
formation Processing 7(5):193–209
Choi V (2011) Minor-embedding in adiabatic quantum com-
putation: II. Minor-universal graph design. Quantum In-
formation Processing 10(3):343–353
Cormen TH, Leiserson CE, Rivest RL, Clifford S (2009) In-
troduction to Algorithms, 3rd edn. MIT Press
D-Wave Systems Inc. (since 2007) See publications in tech-
nology section on the homepage of D-Wave Systems
Inc., last accessed July 20th, 2018. URL https://www.
dwavesys.com/
Erdős P, Rényi A (1959) On random graphs I. Publicationes
Mathematicae 6:290–297
Farhi E, Goldstone J, Gutmann S, Lapan J, Lundgren
A, Preda D (2001) A Quantum Adiabatic Evolu-
tion Algorithm Applied to Random Instances of an
NP-Complete Problem. Science 292(5516):472–475,
http://science.sciencemag.org/content/292/
5516/472.full.pdf
Fomin FV, Kratsch D (2010) Exact exponential algo-
rithms, 1st edn. Springer-Verlag, DOI 10.1007/978-3-
642-16533-7
Goodrich TD, Sullivan BD, Humble TS (2018) Optimizing
adiabatic quantum program compilation using a graph-
theoretic framework. Quantum Information Process-
ing 17(5):118, DOI 10.1007/s11128-018-1863-4, URL
https://doi.org/10.1007/s11128-018-1863-4
Halin R (1976) S-Functions for Graphs. J Geom 8:171
Hitachi Ltd. (2018) Hitachi Ltd., news release, last accessed
February 20th, 2018. URL http://www.hitachi.co.
jp/New/cnews/month/2018/06/0615.html
Inagaki T, Haribara Y, Igarashi K, Sonobe T, Tamate S,
Honjo T, Marandi A, McMahon PL, Umeki T, Enbutsu
K, Tadanaga O, Takenouchi H, Aihara K, Kawarabayashi
Ki, Inoue K, Utsunomiya S, Takesue H (2016) A
coherent Ising machine for 2000-node optimization
problems. Science DOI 10.1126/science.aah4243,
http://science.sciencemag.org/content/
early/2016/10/19/science.aah4243.full.pdf
Isakov SV, Zintchenko IN, Rønnow TF, Troyer M (2015)
Optimised simulated annealing for Ising spin glasses.
Computer Physics Communications 192:265 – 271
Johnson MW, Amin MHS, Gildert S, Lanting T, Hamze
F, Dickson N, Harris R, Berkley AJ, Johansson J,
Bunyk P, Chapple EM, Enderud C, Hilton JP, Karimi K,
Ladizinsky E, Ladizinsky N, Oh T, Perminov I, Rich
C, Thom MC, Tolkacheva E, Truncik CJS, Uchaikin S,
Wang J, Wilson B, Rose G (2011) Quantum annealing
with manufactured spins. Nature 473:194–198
Kadowaki T, Nishimori H (1998) Quantum annealing in the
transverse Ising model. Physical Review E 58:5355–5363
Kawarabayashi Ki, Kobayashi Y, Reed B (2012) The
disjoint paths problem in quadratic time. Journal
of Combinatorial Theory, Series B 102(2):424 –
435, DOI https://doi.org/10.1016/j.jctb.2011.07.004,
URL http://www.sciencedirect.com/science/
article/pii/S0095895611000712
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization
by simulated annealing. Science 220(4598):671–680
Klymko C, Sullivan BD, Humble TS (2014) Adiabatic quan-
tum programming: minor embedding with hard faults.
Quantum Information Processing 13(3):709–729
Kochenberger G, Hao JK, Glover F, Lewis M, Lü Z, Wang
H, Wang Y (2014) The unconstrained binary quadratic
programming problem: a survey. Journal of Combinato-
rial Optimization 28(1):58–81
Lucas A (2014) Ising formulations of many NP problems.
Frontiers in Physics 2:5
Matsubara S, Tamura H, Takatsu M, Yoo D,
Vatankhahghadim B, Yamasaki H, Miyazawa T,
Tsukamoto S, Watanabe Y, Takemoto K, et al. (2018)
Ising-model optimizer with parallel-trial bit-sieve engine.
In: Barolli L, Terzo O (eds) Complex, Intelligent, and
Software Intensive Systems, Springer International
Publishing, pp 423–428
McMahon PL, Marandi A, Haribara Y, Hamerly R, Lan-
grock C, Tamate S, Inagaki T, Takesue H, Utsunomiya S,
Aihara K, Byer RL, Fejer MM, Mabuchi H, Yamamoto
Y (2016) A fully programmable 100-spin coherent
Ising machine with all-to-all connections. Science
354(6312):614–617, http://science.sciencemag.
org/content/354/6312/614.full.pdf
Neven H, Denchev VS, Drew-Brook M, Zhang J, Macready
WG, Rose G (2009) NIPS demonstration 2009: Binary
classification using hardware implementation of quantum
annealing
Okuyama T, Yoshimura C, Hayashi M, Tanaka S, Yamaoka
M (2016) Contractive graph-minor embedding for CMOS
Ising computer. IEICE technical report 116:97–103
Okuyama T, Hayashi M, Yamaoka M (2017) An Ising Com-
puter Based on Simulated Quantum Annealing by Path
Integral Monte Carlo Method. In: 2017 IEEE Interna-
tional Conference on Rebooting Computing (ICRC), pp
1–6
Robertson N, Seymour P (1995) Graph minors. XIII. The
disjoint paths problem. Journal of Combinatorial Theory, Se-
ries B 63(1):65 – 110, DOI https://doi.org/10.1006/jctb.
1995.1006, URL http://www.sciencedirect.com/
science/article/pii/S0095895685710064
Robertson N, Seymour PD (1986) Graph Minors. II. Algo-
rithmic Aspects of Tree-Width. J Alg 7:309
Steger A, Wormald NC (1999) Generating random regular
graphs quickly. Combinatorics, Probability and Computing 8(4):377–396
Sugie Y, Yoshida Y, Mertig N, Takemoto T, Teramoto H,
Nakamura A, Takigawa I, Minato SI, Yamaoka M, Ko-
matsuzaki T (2018) Graph minors from simulated an-
nealing for annealing machines with sparse connectivity.
In: Fagan D, Martín-Vide C, O’Neill M, Vega-Rodríguez
MA (eds) Theory and Practice of Natural Computing,
Springer International Publishing, Cham, pp 111–123
Takemoto T, Hayashi M, Yoshimura C, Yamaoka M (2019)
A 2x30k-Spin Multichip Scalable Annealing Processor
Based on a Processing-in-Memory Approach for Solving
Large-Scale Combinatorial Optimization Problems. IEEE
International Solid-State Circuits Conference (ISSCC),
Dig Tech Papers pp 52–53
Tsukamoto S, Takatsu M, Matsubara S, Tamura H (2017)
An accelerator architecture for combinatorial optimiza-
tion problems. FUJITSU Sci Tech J 53:8–13, URL
http://www.fujitsu.com/global/documents/
about/resources/publications/fstj/archives/
vol53-5/paper02.pdf
Venturelli D, Mandrà S, Knysh S, O’Gorman B, Biswas
R, Smelyanskiy V (2015) Quantum optimization of fully
connected spin glasses. Phys Rev X 5:031040, DOI 10.
1103/PhysRevX.5.031040, URL https://link.aps.
org/doi/10.1103/PhysRevX.5.031040
Xia R, Bian T, Kais S (2018) Electronic Structure Cal-
culations and the Ising Hamiltonian. J Phys Chem B
122(13):3384–3395
Yamaoka M, Yoshimura C, Hayashi M, Okuyama T, Aoki H,
Mizuno H (2016) A 20k-Spin Ising Chip to Solve Combi-
natorial Optimization Problems With CMOS Annealing.
IEEE Journal of Solid-State Circuits 51(1):303–309
Zaribafiyan A, Marchand DJJ, Changiz Rezaei SS (2017)
Systematic and deterministic graph minor embedding
for cartesian products of graphs. Quantum Informa-
tion Processing 16(5):136, DOI 10.1007/s11128-017-
1569-z, URL https://doi.org/10.1007/s11128-
017-1569-z
Zhu Z, Ochoa AJ, Katzgraber HG (2015) Efficient Clus-
ter Algorithm for Spin Glasses in Any Space Dimension.
Phys Rev Lett 115:077201
8 Appendix
8.1 Experimental conditions
Random cubic graphs are generated using Algorithm 1 from
(Steger and Wormald, 1999) for a target of |V(I)| vertices
of degree d = 3. Note that this implies that the number of
edges grows as |E(I)| = 3|V(I)|/2.
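A compact sketch of the pairing procedure, written in the spirit of Algorithm 1 of Steger and Wormald (1999) as we understand it: every vertex receives d "points", random pairs of unpaired points are joined whenever they lie on distinct, not yet adjacent vertices, and the whole process restarts if no suitable pair remains. The function names and the `max_restarts` limit are hypothetical; the paper does not describe its implementation.

```python
import random


def random_regular_graph(n, d, rng=None, max_restarts=1000):
    """Random simple d-regular graph via random point pairing with restarts."""
    if (n * d) % 2 or d >= n:
        raise ValueError("need n*d even and d < n")
    rng = rng or random.Random()
    for _ in range(max_restarts):
        points = [v for v in range(n) for _ in range(d)]  # d unpaired points per vertex
        edges = set()
        while points:
            i, j = rng.sample(range(len(points)), 2)
            u, v = points[i], points[j]
            edge = (min(u, v), max(u, v))
            if u != v and edge not in edges:  # suitable: no loop, no multi-edge
                edges.add(edge)
                for k in sorted((i, j), reverse=True):  # O(1) removal via swap with last
                    points[k] = points[-1]
                    points.pop()
            elif not _has_suitable_pair(points, edges):
                break  # dead end: restart from scratch
        else:
            return edges  # all points paired
    raise RuntimeError("no regular graph found; increase max_restarts")


def _has_suitable_pair(points, edges):
    """True if two remaining points lie on distinct, not yet adjacent vertices."""
    free = sorted(set(points))
    return any((a, b) not in edges for idx, a in enumerate(free) for b in free[idx + 1:])
```

With d = 3, every returned edge set has exactly 3n/2 edges, matching |E(I)| = 3|V(I)|/2.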
Barabási-Albert graphs are generated as described in (Albert and Barabási, 2002). In order to keep the graph connected, we start out with m_0 (= 2) connected vertices. We
then successively add new vertices to the graph. Every time
we add a new vertex, we connect it to the existing vertices
by adding m (= 2 ≤ m_0) edges based on preferential attachment. The procedure is repeated until the graph has reached
a prescribed number of vertices |V(I)|. Note that the final
graph has |E(I)| = m_0(m_0−1)/2 + m(|V(I)|−m_0) edges.
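The growth process can be sketched as follows. We assume the m_0 initial vertices form a complete graph, which is what the edge count m_0(m_0−1)/2 in the formula implies, and we draw the m distinct targets of each new vertex by rejection sampling from a degree-weighted endpoint list. Both choices are ours; the function name is hypothetical.

```python
import random


def barabasi_albert(n, m0=2, m=2, rng=None):
    """Connected Barabasi-Albert graph (sketch): complete seed on m0 vertices,
    then each new vertex attaches to m distinct vertices chosen by degree."""
    if not (2 <= m0 <= n and 1 <= m <= m0):
        raise ValueError("need 2 <= m0 <= n and 1 <= m <= m0")
    rng = rng or random.Random()
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    # Each vertex appears once per incident edge, so a uniform draw from this
    # list selects a vertex with probability proportional to its degree.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:  # reject repeats to get m distinct targets
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges
```

For m_0 = m = 2 the returned list has 1 + 2(|V(I)| − 2) edges, in line with the formula above.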
Erdős-Rényi graphs are created by first growing a tree up
to the desired number of vertices. To this end, we add one
vertex at a time and connect it to one of the existing vertices, chosen with equal probability. Subsequently, we add edges to
the tree by filling unoccupied edges with equal probability
until the prescribed edge density ρ is reached. This slight
modification of the standard Erdős-Rényi procedure ensures that the resulting random graph is connected. Note
that the number of edges of our random graphs grows as
|E(I)| = ρ|V(I)|(|V(I)|−1)/2. In this paper we always set
ρ = 0.2, corresponding to an edge density of 20%. Our results on
scalability do not crucially depend on the precise value of
ρ; that is, for large hardware graphs PSSA cannot improve
over the embedding performance of the best known complete graph embedding for other values of ρ either.
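A minimal sketch of this connected variant, assuming the target edge count ρ|V(I)|(|V(I)|−1)/2 is rounded to the nearest integer (the text does not say how fractional counts are handled). Shuffling all unoccupied vertex pairs and taking a prefix is equivalent to adding uniformly chosen unoccupied edges one at a time. The function name is hypothetical.

```python
import itertools
import random


def connected_random_graph(n, rho, rng=None):
    """Random tree plus uniformly chosen extra edges up to edge density rho."""
    rng = rng or random.Random()
    target = round(rho * n * (n - 1) / 2)  # |E(I)| = rho |V(I)| (|V(I)| - 1) / 2
    if target < n - 1:
        raise ValueError("rho too small to keep the graph connected")
    # Grow a random tree: each new vertex attaches to a uniformly chosen existing vertex.
    edges = {(rng.randrange(v), v) for v in range(1, n)}
    # Fill uniformly chosen unoccupied vertex pairs until the target count is met.
    unoccupied = [e for e in itertools.combinations(range(n), 2) if e not in edges]
    rng.shuffle(unoccupied)
    edges.update(unoccupied[: target - len(edges)])
    return edges
```

Because the tree is built first, connectivity holds for every ρ that admits at least |V(I)|−1 edges.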
Schedule parameters – All data in this paper were produced
using a maximum of t_max = 7 × 10^7 iterations and initial
temperatures T_0 = 60.315 and T_{t_max/2} = 33.435. Linear schedules always terminate each annealing phase with temperature T = 0. Exponential schedules use a cooling coefficient
β = 0.9999 to update the temperature every 1000 annealing
steps as T ← βT. Swap and shift proposals are scheduled
with linear schedules using p_s(0) = 1, p_s(t_max) = 0, p_a(0) =
0.095, and p_a(t_max) = 0.487.
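The schedules can be written as small functions of the iteration counter t. The values of p_s, p_a, β, and the update interval are taken from the text. The two-phase split of the linear temperature schedule (each half of t_max cooled linearly to 0, the second half restarting at T_{t_max/2}) is our reading of the notation and is an assumption, as are all function names.

```python
T_MAX = 7 * 10**7  # maximum number of iterations t_max


def linear(t, t0, t1, v0, v1):
    """Linearly interpolate a schedule value between (t0, v0) and (t1, v1)."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def p_s(t, t_max=T_MAX):
    """Linear schedule with p_s(0) = 1 and p_s(t_max) = 0."""
    return linear(t, 0, t_max, 1.0, 0.0)


def p_a(t, t_max=T_MAX):
    """Linear schedule with p_a(0) = 0.095 and p_a(t_max) = 0.487."""
    return linear(t, 0, t_max, 0.095, 0.487)


def linear_temperature(t, t_max=T_MAX, T0=60.315, T_half=33.435):
    """ASSUMPTION: two phases of length t_max/2, each cooled linearly to T = 0."""
    half = t_max // 2
    if t < half:
        return linear(t, 0, half, T0, 0.0)
    return linear(t, half, t_max, T_half, 0.0)


def exponential_temperature(t, T_start, beta=0.9999, every=1000):
    """Closed form of T <- beta * T applied once per completed block of `every` steps."""
    return T_start * beta ** (t // every)
```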
8.2 Implementation of the terminal search phase
In this appendix we describe possible implementations of
the subroutines is_deletable(·) and bfs_path(·) as required
in the terminal search phase of the improved PSSA.
The subroutine is_deletable(·) reports that a hardware
vertex u ∈ V(H) is deletable from its super vertex φ(i) if
both (p1) deleting u from φ(i) does not destroy the non-empty,
connected structure of the super vertex (M1), and (p2)
deleting u from φ(i) does not decrease the number of embedded edges E_emb(φ). The subroutine is_deletable(·) checks
for property (p1) by encoding the super-vertex occupation
of the 8 neighboring vertices v_0, ..., v_7 ∈ V(H) of u with
(u, v_ν) ∈ E(H) into a pattern of 8 bits b_ν, ν = 0, ..., 7 (see
Fig. 8). In particular, the νth bit is set to 1 if v_ν belongs to
the same super vertex φ(i) as u, and to 0 otherwise:
$$b_\nu = \begin{cases} 1 & \text{if } v_\nu \in \phi(i), \\ 0 & \text{if } v_\nu \notin \phi(i), \end{cases} \qquad \text{where } u \in \phi(i). \tag{16}$$
[Fig. 8, panels (a) and (b): the vertex u with its eight neighbors, labeled b0 b1 b2 in the row above u, b3 and b4 to its left and right, and b5 b6 b7 in the row below.]
Fig. 8 Illustration of the bit pattern encoding the neighborhood of
u ∈ V(H). Bits b_j of neighboring vertices belonging to the same super vertex as u are set to 1 and shown in black, while bits of neighbors belonging
to a super vertex different from that of u are set to 0 and shown in white. Bit
patterns for which (a) u is not deletable and (b) u is deletable from its
super vertex without destroying property (M1).
The subroutine is_deletable(·) then checks property (p1) by
computing the bit pattern of the current vertex u and comparing it to a precomputed list of deletable patterns. The deletable
patterns were obtained by inspecting all 256 possible bit patterns and hard-coding the deletable ones into the
subroutine. If the bit pattern of the current vertex u is not in
the list of deletable bit patterns, the subroutine decides that
u is not deletable and returns. If the bit pattern of the current
vertex u is in the list of deletable bit patterns, the subroutine
proceeds to check (p2).
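The paper does not list which of the 256 patterns are deletable. The sketch below assumes one conservative local criterion: the neighbors of u inside φ(i) must be non-empty and remain connected among themselves within the 3×3 window without u. Any path through u can then be rerouted, so (M1) is preserved, although some globally deletable vertices may be rejected. King's-graph coordinates (row, col), the neighbor ordering, and the dictionary `phi_inv` (hardware vertex to input vertex) are hypothetical choices.

```python
# Offsets (row, col) of the neighbors v_0..v_7 of u in a King's graph, ASSUMED to
# follow the Fig. 8 layout: b0 b1 b2 above u, b3 and b4 beside it, b5 b6 b7 below.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def bit_pattern(u, phi_inv, i):
    """Eq. (16): bit nu is 1 iff neighbor v_nu belongs to the super vertex phi(i)."""
    r, c = u
    pattern = 0
    for nu, (dr, dc) in enumerate(OFFSETS):
        if phi_inv.get((r + dr, c + dc)) == i:
            pattern |= 1 << nu
    return pattern


def _ring_connected(pattern):
    """True if the set bits form a non-empty, King-connected set around u (u excluded)."""
    cells = [OFFSETS[nu] for nu in range(8) if pattern >> nu & 1]
    if not cells:
        return False  # u would be the last vertex of its super vertex
    seen, stack = {cells[0]}, [cells[0]]
    while stack:
        a = stack.pop()
        for b in cells:
            if b not in seen and max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1:
                seen.add(b)
                stack.append(b)
    return len(seen) == len(cells)


# Precomputed lookup table over all 256 patterns (conservative local criterion).
DELETABLE_P1 = frozenset(p for p in range(256) if _ring_connected(p))


def passes_p1(u, phi_inv, i):
    """Table lookup replacing a connectivity search at run time."""
    return bit_pattern(u, phi_inv, i) in DELETABLE_P1
```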
To check whether u is deletable in the sense of (p2), the subroutine proceeds as follows. First, it scans the 8 neighbors
v_ν, ν = 0, ..., 7 of u ∈ φ(i) and checks the corresponding
edges (u, v_ν) ∈ E(H). During this process the corresponding
pre-images j_ν = φ^{−1}(v_ν) ∈ V(I), ν = 0, ..., 7 are computed.
If none of the pairs (i, j_ν) represents an edge of the input
graph, (i, j_ν) ∈ E(I), the subroutine decides that u is deletable
and returns. Otherwise, for every pair (i, j_ν) ∈ E(I)
that represents an edge of the input graph, the subroutine
records the vertex j_ν ∈ V(I) in a unique and sorted list. The
number of occurrences n(i, j_ν) is tracked in a corresponding list. Finally, the subroutine checks whether, for every j_ν in the unique
and sorted list of vertices, the number of hardware representations n(i, j_ν) of the edge (i, j_ν) adjacent to u is smaller than the total
number of hardware representations N(i, j_ν). If this condition is fulfilled, the subroutine updates N(i, j_ν) ← N(i, j_ν) −
n(i, j_ν) for all j_ν in the unique and sorted list of vertices
and returns with u being deletable. Otherwise, it returns with
u not being deletable. Note that the number of
hardware edges N(i, j) representing the edge (i, j) ∈ E(I)
on the hardware has to be computed before the subroutine
is_deletable(·) is called for the first time. Fig. 9 sketches
the check of deletability in the sense of (p2).
[Fig. 9, panels (a) and (b): the vertex u ∈ φ(i) and its neighbors, labeled φ(i) and φ(j) in (a), and φ(i), φ(j_1), and φ(j_2) in (b).]
Fig. 9 Illustration of (a) a non-deletable and (b) a deletable vertex u
in the sense of (p2). (a) Checking the neighborhood of the vertex u ∈
φ(i) gives n(i, j) = 3 hardware representations of the edge (i, j) ∈ E(I)
and five vertices and edges (gray and black), respectively, which do
not represent an edge of the input graph. Assuming that φ(i) and φ(j)
have no further links elsewhere on the hardware, u is not
deletable. (b) Checking the neighborhood of the vertex u ∈ φ(i)
gives n(i, j_1) = 3 hardware representations of the edge (i, j_1) ∈ E(I)
and n(i, j_2) = 2 hardware representations of the edge (i, j_2) ∈ E(I).
Furthermore, there are three vertices and edges, respectively (gray and
black), which do not represent an edge of the input graph. Since φ(i)
is linked with φ(j_1) by more than n(i, j_1) = 3 edges and
with φ(j_2) by more than n(i, j_2) = 2 edges, u is deletable from φ(i).
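The (p2) test can be sketched with a `Counter` standing in for the paper's unique, sorted list of vertices and the parallel list of occurrence counts. The data layout (a dictionary `phi_inv` from hardware vertex to input vertex, input edges stored as frozensets, and N kept as a `Counter` keyed by frozensets) is a hypothetical choice for illustration.

```python
from collections import Counter


def passes_p2(i, neighbors, phi_inv, input_edges, N):
    """Sketch of the (p2) test for a vertex u in phi(i).

    neighbors   -- hardware neighbors v_0..v_7 of u
    phi_inv     -- dict: hardware vertex -> input vertex owning it (free vertices absent)
    input_edges -- set of frozenset({i, j}) for (i, j) in E(I)
    N           -- Counter: frozenset({i, j}) -> hardware edges linking phi(i), phi(j)
    Returns True and updates N in place if deleting u keeps every embedded edge.
    """
    n = Counter()  # n(i, j): hardware links to phi(j) that pass through u
    for v in neighbors:
        j = phi_inv.get(v)
        if j is not None and j != i and frozenset((i, j)) in input_edges:
            n[j] += 1
    # u is deletable only if every input edge keeps at least one hardware link.
    if all(count < N[frozenset((i, j))] for j, count in n.items()):
        for j, count in n.items():
            N[frozenset((i, j))] -= count
        return True
    return False
```

An empty `n` makes `all(...)` true, which reproduces the early return when no neighbor represents an input edge.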
The subroutine bfs_path(·) operates on the hardware
graph H. It tries to create a hardware link between φ(i)
and φ(j) using the breadth-first search algorithm described in
(Cormen et al., 2009, pp 594-602) with slight modifications,
as follows: (i) The queue is initialized with all
vertices of φ(i), marking their current status as gray, i.e., in the queue. The free hardware vertices U,
Eq. (6), are initialized as white, i.e., unvisited. The vertices
of φ(j) are marked as green, i.e., target. Finally, all
other vertices are marked as red, denoting occupied vertices
of the hardware. (ii) The subroutine bfs_path(·) then
successively dequeues vertices u from the queue and checks
all adjacent vertices v in H. If v is an occupied or a visited vertex, it is skipped. If v is an unvisited vertex, v is
enqueued, u is registered as its parent, and v's status is set
to gray, i.e., in the queue. Finally, if v is a target vertex with
v ∈ φ(j), breadth-first search is stopped and the subroutine
proceeds to the cleanup. After checking all vertices v adjacent
to u in H, the color of u is updated to black, i.e., visited, and
breadth-first search proceeds to dequeue the next vertex
from the queue. (iii) If a target vertex v ∈ φ(j) is found, the
chain of parents leading to it is traced back to φ(i), and the corresponding path of
vertices is removed from the set of free hardware vertices
U, Eq. (6), and assigned to the super vertex φ(i). After doing
so, bfs_path(·) returns successfully, and the terminal search phase proceeds to the next
edge (i, j) ∈ E(I) whose super vertices φ(i), φ(j) are not yet
linked on the hardware H. Alternatively, bfs_path(·) may
return unsuccessfully with an empty queue without hitting a
single target vertex. In this case U and φ remain unchanged.
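A minimal sketch of the modified breadth-first search, assuming dictionaries for the adjacency of H and for the super vertices, and a set for U (all names hypothetical). The colors are kept implicit: membership in `parent` means gray or black, membership in `free` without a parent means white, membership in φ(j) means green, and every other vertex counts as red.

```python
from collections import deque


def bfs_path(adj, phi, free, i, j):
    """Try to link super vertices phi[i] and phi[j] through free hardware vertices.

    adj  -- dict: hardware vertex -> iterable of neighbors in H
    phi  -- dict: input vertex -> set of hardware vertices (super vertex)
    free -- set of free hardware vertices U (mutated on success)
    On success the path vertices move from `free` into phi[i] and True is returned;
    otherwise nothing is changed and False is returned.
    """
    # Every vertex of phi(i) starts in the queue as a root (parent None).
    parent = {u: None for u in phi[i]}
    queue = deque(phi[i])
    target = phi[j]
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in target:
                # Trace parents from u back to phi(i), collecting the free vertices.
                path, w = [], u
                while parent[w] is not None:
                    path.append(w)
                    w = parent[w]
                phi[i].update(path)
                free.difference_update(path)
                return True
            if v in parent or v not in free:
                continue  # already queued/visited (gray/black) or occupied (red)
            parent[v] = u
            queue.append(v)
    return False
```

If a vertex of φ(i) is already adjacent to φ(j), the traced path is empty and the call succeeds without claiming any free vertex.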
