Towards Efficient Superconducting Quantum Processor Architecture Design by Li, Gushu et al.
Towards Efficient Superconducting Quantum
Processor Architecture Design
Gushu Li
University of California
Santa Barbara, USA
gushuli@ece.ucsb.edu
Yufei Ding
University of California
Santa Barbara, USA
yufeiding@cs.ucsb.edu
Yuan Xie
University of California
Santa Barbara, USA
yuanxie@ece.ucsb.edu
ABSTRACT
More computational resources (i.e., more physical qubits and
qubit connections) on a superconducting quantum processor
not only improve the performance but also result in a more
complex chip architecture with a lower yield rate. Optimizing
both of them simultaneously is a difficult problem due to
their intrinsic trade-off. Inspired by the application-specific
design principle, this paper proposes an automatic design flow
to generate simplified superconducting quantum processor
architecture with negligible performance loss for different
quantum programs. Our architecture-design-oriented profiling
method identifies program components and patterns critical to
both the performance and the yield rate. A follow-up hardware
design flow decomposes the complicated design procedure
into three subroutines, each of which focuses on different
hardware components and cooperates with corresponding
profiling results and physical constraints. Experimental results
show that our design methodology could outperform IBM’s
general-purpose design schemes with better Pareto-optimal
results.
1 INTRODUCTION
As a promising computation paradigm, Quantum Comput-
ing (QC) has been rapidly growing in the last two decades
and found its strong potential in many important areas, in-
cluding machine learning [1, 2], chemistry simulation [3, 4],
etc. In particular, the superconducting quantum circuit [5] has
become one of the most promising technique candidates for
building QC systems [6–8] due to the ever-increasing qubit
coherence time, individual qubit addressability, fabrication
technology scalability, etc. Towards efficient QC systems based on
superconducting quantum circuits, significant research has
recently been conducted, ranging from compiler optimiza-
tion [9, 10] to periphery control hardware support [11, 12]
and device innovation [13, 14].
Despite these system optimizations, the performance of a
superconducting quantum processor is still highly limited by
the amount of computational resources on it. Researchers have
been trying to integrate more qubits and qubit connections
on one superconducting quantum processor substrate. For
example, IBM’s first superconducting quantum chip on the
cloud has 5 qubits with 6 qubit connections, while its latest
published chip has 20 qubits with 37 qubit connections [15].
Increasing the number of physical qubits on a superconduct-
ing quantum processor allows programs with more logical
qubits to be executed. Denser qubit connections can increase
the overall chip performance by reducing the overhead of
qubit mapping and routing [16–19].
Nevertheless, more qubits and qubit connections will, un-
fortunately, increase the probability of defect occurrence on
a chip, leading to a lower yield rate and blocking the future development
of larger-scale superconducting quantum processors.
For example, the yield rate of a 17-qubit chip can be lower
than 1% under IBM’s state-of-the-art technology [20]. Such
a low yield rate comes from frequency collision, a unique
defect on superconducting quantum processors [21, 22]. The
frequencies of physically connected qubits may ‘collide’ with
each other when their values satisfy some specific conditions.
More qubit connections naturally increase the probability of
frequency collision and lower the yield rate.
To optimize both the yield rate and performance would
be desirable, but it is difficult in general due to the inherent
trade-off between these two objectives. Most previous efforts
on them are direct device-level improvement [13, 14, 23, 24],
while little attention has been given to the architectural design
of a superconducting quantum processor. This paper fills
the gap by exploring the possibility of efficient application-
specific architecture design to reach an optimized balance
between yield rate and performance. We envision that an array
of QC accelerators, each of which is tailored to a specific
application, is much more likely to be adopted in the near term
where computation resources are still limited before we can
reach a universal quantum computer. Our design shares the
same high-level spirit with the hardware architecture designs
in classical computing (e.g., machine learning [25, 26], graph
processing [27, 28]), but faces different scenarios because
both the program patterns and the hardware design space are
different in QC.
In particular, we highlight two key challenges to be addressed
before the application-specific principle can be applied
in superconducting quantum processor design. First, we
need to identify and abstract the computation pattern of quantum
programs that can guide the hardware architecture design.
Prior quantum program analysis studies [29–34] mainly
focused on software or compiler optimization and cannot
extract appropriate information for hardware architecture
optimization. Second, the abstracted computation pattern must
give guidance to efficient architectural designs, which employ
fewer computation resources with physical constraints
satisfied to achieve both high yield rate and performance.
Existing superconducting quantum processor design schemes
cannot handle such irregular/complicated application-specific
architecture design tasks [20, 35–37].
[Figure 1 blocks: Quantum Program; Program Profiling; Profiling Information (Coupling Degree List, Coupling Strength Matrix); Hardware Architecture Design Flow (Layout Design, Bus Selection, Frequency Allocation); Physical Constraints (Location Constraint, Connection Constraint, Collision Conditions); Efficient Application-specific Architecture]
Figure 1: Overview of the Proposed Architecture Design Flow
arXiv:1911.12879v1 [quant-ph] 28 Nov 2019
To overcome these two challenges, we design a system-
atic design flow to automatically generate efficient supercon-
ducting quantum processor architecture designs for different
quantum programs (shown in Figure 1). We first identify two
key computation patterns in quantum programs, coupling de-
gree list and coupling strength matrix. A profiler is built to
automatically extract them from an input quantum program.
Both of them are critical to the program performance and
hardware yield rate, and thus optimizing their underlying ar-
chitecture support can potentially achieve a better balance
between the performance and yield rate. We then propose an
architecture design flow, which comes with three key subrou-
tines, layout design, bus selection, and frequency allocation.
Each subroutine focuses on different hardware resources and
must cooperate with corresponding profiling results and phys-
ical constraints. We further propose an array of heuristics to
ensure the scalability and effectiveness of the architecture
search process. Empirical studies show that these heuristics
can find ‘near-optimal’ solutions in the reduced search space.
In summary, this paper makes the following contributions:
• We are the first to identify the optimization opportu-
nity from the architecture level to push forward the
balance between performance and hardware yield rate
for superconducting QC processors.
• We formalize an end-to-end design flow, equipped with
a set of novel algorithmic primitives, to automatically
generate a series of application-specific architectural
designs under different hardware resource limits.
• Comprehensive experiments show that our design flow
could outperform IBM’s general-purpose designs with
better Pareto-optimal results, e.g., orders-of-magnitude yield
improvement with negligible performance loss.
2 BACKGROUND
In this section, we will introduce the necessary QC basics for
understanding the following program profiling and supercon-
ducting quantum processor architecture design.
2.1 QC Program Basics
A quantum program can be represented in the widely adopted
quantum circuit model [38]. We will start from the basic
components in a quantum circuit and then illustrate how they
compose a quantum circuit.
Logical Qubit and Quantum Operation A quantum pro-
gram consists of some logical qubits as variables and some
quantum operations which can modify the state of the qubits.
A qubit is the basic information processing unit in QC, which
has two basis states denoted as |0⟩ and |1⟩. A qubit can be in
not only the basis states themselves but also their linear
combinations, which can be depicted by a vector in the Hilbert
space. The state of the qubits can be modified by quantum
operations. The first type of quantum operation is unitary
operation, also known as quantum gates in the circuit model,
which can implement a unitary transformation on the qubit
state. Quantum gates can be applied on a single qubit or multiple
qubits. The second type is the measurement operation, which
forces the qubits to collapse to basis states.
Quantum Circuit The quantum circuit is a model of QC in
which the computation is a sequence of quantum gates and
measurement operations. The state of the qubits is first ini-
tialized and then manipulated by a sequence of operations.
Single-qubit gates and measurement operations are applied
on individual qubits while two-qubit gates are applied on two
logical qubits. It has been proved that any multi-qubit gate can
be decomposed into a series of single-qubit gates and CNOT
gates (a specific two-qubit gate) [39]. This is also the basic
gate set directly supported on IBM’s devices. As a result, this
paper assumes that the quantum circuit has been decomposed
and gates with three or more qubits are not considered.
2.2 Superconducting Quantum Circuit Basics
All the qubits and quantum operations in a quantum cir-
cuit must be implemented in a real physical QC system to
execute the program. In this paper, we focus on supercon-
ducting quantum processors with fixed-frequency Josephson-
junction-based transmon qubits [13] and all-microwave cross-
resonance two-qubit gates [40] adopted by IBM [20].
Physical Qubit and Frequency Figure 2 shows the physi-
cal circuit and energy levels of a transmon qubit [13]. Due to
the nonlinearity of the Josephson junction, the gaps between
the energy levels in this quantum anharmonic oscillator are
different, which allows us to use the ground state |0⟩ and
the first-excited state |1⟩ as the computation basis without
populating other states. Suppose the energy gap between |0⟩
and |1⟩ for a qubit is $E_{01}$. The frequency of this qubit, $f_{01}$, is
defined as $f_{01} = E_{01}/h$, where $h$ is the Planck constant. Similarly,
we use $f_{12} = E_{12}/h$ to represent the frequency corresponding to
the energy gap between |1⟩ and |2⟩. For a typical qubit design with
effective operations [21], $f_{01}$ and $f_{12}$ are about 5 GHz and
4.66 GHz, respectively. The anharmonicity of this qubit is defined
to be $\delta = f_{12} - f_{01}$,
which is −340 MHz under this typical design [37, 41].
Qubit Layout The superconducting physical qubits are
confined on a 2-dimensional planar substrate. Although the
qubit placement can be flexible, major vendors fabricate the
qubits in a regularized structure to ensure scalability and
reduce the fabrication complexity. For example, IBM’s 16-qubit
and 20-qubit chips [42] placed their qubits on the nodes
of 2×8 and 4×5 lattices, respectively. Google’s 72-qubit
chip placed its qubits on some nodes of an 11×12 lattice [43].
Qubit Connection To enable two-qubit gates between two
physical qubits, resonators, also known as qubit buses, are
employed to connect nearby qubits [40]. For example, Figure 2
shows two types of commonly used buses. The first one
is a 2-qubit bus connecting two physical qubits. The second
one is a 4-qubit bus, which connects four physical qubits in
a square together. The coupling graphs of these two types of
buses are shown on the right. Compared with a 2-qubit bus,
a 4-qubit bus supports two-qubit gates on not only the four qubit
pairs on the edges but also the two qubit pairs on the diagonals.
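The difference between the two bus types can be sketched in a few lines of Python. This is only an illustration under the assumption, taken from the description above, that one bus couples every pair of qubits attached to it; the helper names and the unit-square coordinates are hypothetical and not part of the paper.

```python
from itertools import combinations


def bus_coupling_edges(nodes):
    """All qubit pairs coupled by one bus (assumed: every attached pair)."""
    return [tuple(sorted(pair)) for pair in combinations(nodes, 2)]


def manhattan(a, b):
    """Manhattan distance between two lattice nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# A 2-qubit bus on one lattice edge and a 4-qubit bus on one unit square.
two_qubit_edges = bus_coupling_edges([(0, 0), (1, 0)])
four_qubit_edges = bus_coupling_edges([(0, 0), (1, 0), (0, 1), (1, 1)])

# Split the 4-qubit bus pairs into lattice-edge pairs and diagonal pairs.
side_pairs = [e for e in four_qubit_edges if manhattan(*e) == 1]
diagonal_pairs = [e for e in four_qubit_edges if manhattan(*e) == 2]

print(len(two_qubit_edges), len(side_pairs), len(diagonal_pairs))
```

The diagonal pairs are exactly the extra connections that a 4-qubit bus adds over 2-qubit buses on the same square.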
Qubit Mapping It is usually assumed that a two-qubit gate
can be applied on any two logical qubits in a quantum
program, but some two-qubit gates may not be executable due
to the limited qubit connectivity on a superconducting quantum
processor. On the hardware side, this problem can be relieved
by employing more physical qubit connections so that two-qubit
gates can be directly supported on more qubit pairs.
On the software side, a qubit-remapping compiler [44] can
resolve the dependency of the remaining unexecutable two-qubit
gates, but additional operations must be introduced,
with longer execution time and higher error rate. Therefore,
more physical qubit connections can help with the overall
performance by allowing native two-qubit gates on more
physical qubit pairs.
[Figure 2 labels: Transmon Qubit (Josephson Junction); Energy Levels |0⟩, |1⟩, |2⟩ with gaps E01 and E12; 2-Qubit Bus and 4-Qubit Bus, each shown as Physical Connection and Coupling Graph]
Figure 2: Superconducting Qubit and Connection
Fabrication Variation Variation is inevitable when fabricating
a superconducting quantum processor. If a qubit is
designed to have frequency $f$, the actual frequency after fabrication
will be $f' = f + n_f$, where $n_f$ follows the Gaussian distribution
$N(0, \sigma)$. Here $\sigma$ is the fabrication precision parameter, which
is around 130 MHz to 150 MHz under IBM’s state-of-the-art
technology [20]. Such noise makes it hard to predict the post-fabrication
frequency precisely, which introduces the possibility
of frequency collision.
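As a rough illustration of this noise model, the following Python sketch samples post-fabrication frequencies for one designed frequency. It assumes that the fabrication precision parameter is the standard deviation of the Gaussian noise; the function name, the sample count, and the fixed seed are hypothetical choices made only for this example.

```python
import random


def fabricated_frequency(designed_ghz, sigma_ghz, rng):
    """Return f' = f + n_f, with n_f drawn from a zero-mean Gaussian.

    Assumption: sigma_ghz is the standard deviation of the noise.
    """
    return designed_ghz + rng.gauss(0.0, sigma_ghz)


rng = random.Random(0)   # fixed seed, only to make the run repeatable
designed = 5.0           # GHz, the typical f01 quoted in Section 2.2
sigma = 0.130            # GHz, lower end of the quoted 130-150 MHz range

samples = [fabricated_frequency(designed, sigma, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5

print(f"mean = {mean:.3f} GHz, std = {std:.3f} GHz")
```

With a spread of this size, two qubits designed a few tens of MHz apart can easily end up closer together, or further apart, than intended after fabrication.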
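No.  Conditions                        Thresholds
1    $f_j \cong f_k$                   ±17 MHz
2    $f_j \cong f_k - \delta/2$        ±4 MHz
3    $f_j \cong f_k - \delta$          ±25 MHz
4    $f_j > f_k - \delta$              (none)
5    $f_i \cong f_k$                   ±17 MHz
6    $f_i \cong f_k - \delta$          ±25 MHz
7    $2f_j + \delta \cong f_k + f_i$   ±17 MHz
[Figure 3 illustration: qubits i, j, and k on the lattice, one panel for Conditions 1, 2, 3, 4 and one panel for Conditions 5, 6, 7]
Figure 3: Frequency Collision Conditions [20, 22]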
Frequency Collision When two or three qubits are connected,
frequency collision may happen and cause defects on
the device. Figure 3 summarizes seven qubit frequency collision
conditions in IBM’s devices [20, 22]. On the left is a
table showing the conditions and thresholds of different collision
situations. Conditions 1, 2, 3, and 4 involve two connected
qubits (j and k). Conditions 5, 6, and 7 involve three qubits, of
which two qubits (k and i) both connect to the other qubit j.
The approximate equations and the corresponding thresholds
determine whether a frequency collision happens. For example,
if qubits j and k are connected and $|f_j - f_k| < 17$ MHz,
then the first condition is satisfied and a frequency collision
occurs. Note that the fourth condition has no threshold because
it is an inequality rather than an approximate equation. On
the right is a graphical illustration, showing the geometric
locations of the qubits that may have frequency collisions
of different conditions in two subfigures. Each circle represents
one qubit and the gray square represents a 4-qubit bus
connecting the four surrounding qubits.
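The seven conditions can be turned into a small checker, sketched below in Python. The sketch makes several assumptions that are not stated in this section: frequencies are given in MHz, an approximate equation counts as satisfied when the two sides differ by less than the threshold, and each condition is checked only in the direction in which the table writes it. The function names are hypothetical.

```python
DELTA_MHZ = -340.0  # anharmonicity of the typical design in Section 2.2


def pair_collisions(fj, fk, delta=DELTA_MHZ):
    """Conditions 1-4 for two connected qubits j and k (frequencies in MHz)."""
    hits = []
    if abs(fj - fk) < 17:
        hits.append(1)
    if abs(fj - (fk - delta / 2)) < 4:
        hits.append(2)
    if abs(fj - (fk - delta)) < 25:
        hits.append(3)
    if fj > fk - delta:          # inequality, so no threshold
        hits.append(4)
    return hits


def triple_collisions(fi, fj, fk, delta=DELTA_MHZ):
    """Conditions 5-7 for qubits i and k that both connect to qubit j."""
    hits = []
    if abs(fi - fk) < 17:
        hits.append(5)
    if abs(fi - (fk - delta)) < 25:
        hits.append(6)
    if abs(2 * fj + delta - (fk + fi)) < 17:
        hits.append(7)
    return hits


print(pair_collisions(5000, 5010))          # only 10 MHz apart
print(pair_collisions(5400, 5000))          # j far above k
print(triple_collisions(5000, 5100, 5005))  # i and k nearly equal
```

An empty list means that none of the checked conditions is triggered for the given frequencies.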
3 QUANTUM PROGRAM PROFILING
The first step towards the development of an application-specific
quantum processor for both high performance and
yield rate is to determine what program information we should
focus on. There are several different types of components in
a quantum circuit but not all of them will significantly af-
fect the hardware design. Our target program component(s)
should satisfy two conditions. 1) The component’s execution
is a performance bottleneck which can be dramatically im-
proved with optimized hardware support. 2) The component’s
required hardware should significantly affect the yield rate.
We found that two-qubit gates can be a key factor to bridge
performance and yield. To execute two-qubit gates on a quan-
tum processor with limited qubit-to-qubit coupling, a large
number of additional operations are introduced to satisfy their
dependencies. But implementing two-qubit gates on two physical
qubits requires on-chip qubit connections, which can lower
the yield rate by increasing the probability of frequency
collision. Therefore, we give logical qubits and qubit pairs
priorities based on the number of two-qubit gates that involve them
to help with the following architecture design. Critical qubits
and qubit pairs will have more hardware support to improve
the efficiency of the generated architectures.
The remaining components, single-qubit gates, initialization,
and measurement operations, do not involve qubit-to-qubit
interactions and all happen locally on individual qubits
when they are implemented on hardware. As a result, hard-
ware support for these components will not affect the chip
yield through frequency collision.
3.1 Profiling Method
As discussed above, our profiling will focus on the logical
qubits and the two-qubit gates. Figure 4 shows an example to
illustrate the profiling procedure. Suppose we have a quantum
circuit as shown in Figure 4 (a). It has 5 logical qubits denoted
by q0, q1, q2, q3, and q4. All of them are initialized to be |0⟩. Then some
single-qubit gates and two-qubit gates are applied. Measure-
ment operations are at the end.
We first ignore all single-qubit gates, initialization, and
measurement operations. Then we create a logical coupling
graph, in which each vertex represents one logical qubit in the
circuit. Two vertices are connected by an undirected edge if
there exist two-qubit gates applied on the two corresponding
logical qubits. The weight of an edge is the number of two-
qubit gate instances on the two connected vertices. In this
example, Figure 4 (b) shows the generated graph for the
example circuit. The weight of the edge between vertex q0
and vertex q4 is 2 since there are two two-qubit gates on q0
and q4. For all other edges, the weight is 1 because there is
only one two-qubit gate on each of those qubit pairs. The
first profiling result is the weighted adjacency matrix of the
logical coupling graph, namely the coupling strength matrix.
The element with indices (i, j) represents the number of two-qubit
gates between qi and qj. Figure 4 (c) shows the coupling
strength matrix for the example circuit. Note that coupling
strength matrix is always a symmetric matrix.
The second result is the coupling degree list. For each qubit,
we sum the weights of the edges that connect to its corresponding
vertex. This sum, which is the number of two-qubit gates applied on
the qubit, is defined as its coupling degree. If one qubit is associated
with more two-qubit gates in a quantum circuit than other
qubits, this qubit will use the physical qubit connections more
frequently when executing on the chip. Naturally, we should
pay more attention to those qubits with larger coupling degree.
Therefore, all qubits are placed in a sorted list, namely the
coupling degree list. Figure 4 (d) is the coupling degree list in
this example. The first one in this list is q4 because it has the
largest coupling degree. All qubits are in a descending order.
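The profiling procedure above can be sketched in Python as follows. The gate list is hypothetical: it is only chosen so that its pair counts agree with the example of Figure 4 (two gates on q0 and q4, one gate on each other coupled pair), and the function name is an assumption as well.

```python
def profile(num_qubits, two_qubit_gates):
    """Build the coupling strength matrix and the coupling degree list.

    two_qubit_gates is a list of (a, b) logical qubit index pairs; all
    single-qubit gates, initialization, and measurement are ignored.
    """
    strength = [[0] * num_qubits for _ in range(num_qubits)]
    for a, b in two_qubit_gates:
        strength[a][b] += 1
        strength[b][a] += 1      # keep the matrix symmetric
    degrees = [sum(row) for row in strength]
    # Sort qubits by descending coupling degree (stable for equal degrees).
    order = sorted(range(num_qubits), key=lambda q: -degrees[q])
    return strength, [(q, degrees[q]) for q in order]


# Hypothetical gate sequence consistent with the counts in Figure 4.
gates = [(0, 4), (0, 1), (1, 4), (0, 4), (2, 4), (3, 4)]
strength, degree_list = profile(5, gates)

for row in strength:
    print(row)
print(degree_list)
```

The first entry of the returned list is q4 with coupling degree 5, in line with the description of Figure 4 (d).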
3.2 Gate Pattern Examples
In this section, we show the existence of distinct two-qubit
gate patterns and discuss the opportunity for application-
specific architecture design with two examples. Figure 5
shows their coupling strength matrices. On the left is an
8-qubit UCCSD ansatz for VQE, a quantum simulation al-
gorithm [4]. The high coupling strength qubit pairs form a
chain structure marked by a red rectangle. Q0 and Q1 have
a large number of two-qubit gates between them, as do
{Q1Q2, Q2Q3, · · · , Q6Q7}. For other qubit pairs, the coupling
strength is much lower (only about 10%). On the right
is a 15-qubit quantum arithmetic function [45]. The coupling
strength among Q0, Q1, · · · , Q5 is 0 since there are no two-qubit
gates on any two of them. However, there is a large number
of two-qubit gates where one qubit is in the set {Q7, Q8, Q9, Q10} and
the other qubit is in the set {Q10, Q11, Q12} (marked by a red circle).
The analysis of these two motivating examples provides us
with two observations:
(1) The numbers of two-qubit gates on different logical
qubit pairs can vary dramatically in a real quantum
program.
(2) Different types of quantum programs can have different
two-qubit gate patterns.
[Figure 4 panels: (a) example circuit, (b) logical coupling graph, (c) coupling strength matrix, (d) coupling degree list with columns "Qubit id" and "CNOT #"]
Figure 4: Example of the Profiling Method
[Figure 5 panels: coupling strength matrices shown as heat maps for UCCSD_ansatz_8 (8 qubits, VQE) and Misex1_241 (15 qubits, quantum arithmetic)]
Figure 5: Qubit Coupling Strength Patterns for Two Programs
These observations suggest that quantum processors can be
customized for different programs with different patterns.
An efficient architecture can focus on supporting the high-
density coupling in a quantum program to reduce the number
of connections on-chip. For example, a quantum processor
with an 8-qubit chain structure (8 qubits and 7 qubit connec-
tions) can immediately support most of the two-qubit gates
in the 8-qubit UCCSD ansatz program. The remaining two-qubit
gates can be supported through remapping without introducing
too many additional operations because their total number
is relatively small. Such
application-specific QC accelerators with simplified archi-
tectures can be a more realistic goal in the near term than a
general-purpose quantum processor with a large number of
hardware resources.
4 ARCHITECTURE DESIGN
After a quantum circuit is profiled, a straightforward quan-
tum processor architecture for such a circuit is to organize
the on-chip qubits and qubit connections directly based on
the logical coupling graph. However, we must consider the
physical constraints for a practical architecture. For example,
a logical coupling graph may not be perfectly fabricated on
hardware since the allowed connections among superconduct-
ing qubits are very limited. Moreover, we hope to improve
the yield rate by delivering architecture designs with fewer
hardware resources. Therefore, the proposed hardware design
flow must not only invest more hardware resources in frequent
operations based on the profiling results, but must also
obey the physical constraints on the hardware components
arrangement.
To accomplish such a complicated task in a scalable way,
we decouple the hardware design procedure into three sub-
routines and each subroutine focuses on different architecture
components, i.e., qubit layout, connection, and frequency. For
each subroutine, we first review the difficulty and the physical
constraints considered. Then we discuss the design objectives,
and how they are achieved in the proposed design algorithms.
4.1 Layout Design
The first step is to determine where to place the qubits. To
ensure scalability and modularity, we follow the convention
from major vendors introduced in Section 2 and will only
place qubits on the nodes of a 2D lattice. We start from a
large 2D lattice, in which each node is initialized to be empty
(Figure 6 (a)). Then physical qubits can be placed in the empty
nodes and one node can contain at most one qubit.
There are many ways to place a given number of qubits on a
2D lattice. For example, 16 qubits can constitute a 4×4 lattice,
a 2×8 lattice, or other more irregular structures. But we need
to select one qubit layout that is most suitable for executing
the program, i.e., most operations can be directly supported or
indirectly supported with low overhead. The objectives of this
qubit layout design subroutine are summarized as follows.
• Since we need to consider the profiling information, we
create a pseudo mapping between logical qubits in the
profiled program and the physical qubits in the hardware
architecture to be delivered. For two logical qubits with
a large number of two-qubit gates between them, we
hope to place their corresponding physical qubits in
adjacent nodes so that later those two-qubit gates can
be directly supported by the connection between the
two physical qubits.
• One physical qubit can only have a limited number
of directly connected qubits. For those two-qubit gates
that cannot be directly supported, we hope to reduce the
amount of additional operations introduced for remapping
the qubits.
[Figure 6 panels: (a) empty 2D lattice, (b) placement of q4, q0, q1, q2, q3 in this order, next to the coupling degree list with columns "Qubit id" and "CNOT #"]
Figure 6: (a) Empty Lattice (b) Qubit Placement Example
Algorithm 1: Qubit Placement on 2D Lattice
Input: coupling degree list L, coupling strength matrix M
Output: Geometric coordinates of placed qubits
1  Place the qubit with the largest coupling degree in L at
   one node with coordinate (0, 0);
2  R = all the qubits remaining; // R is the set of qubits that have not been placed yet.
3  while R is not empty do
     /* Find the next qubit to place */
4    qubit_candidate_list = ∅;
5    for q in R do
6      if q is connected to any placed qubits then
7        qubit_candidate_list.append(q);
8      end
9    end
10   Find the qubit q with the largest coupling degree in
     qubit_candidate_list;
11   node_cost = ∅;
     /* Determine the placement location */
12   for location of the nodes that are empty and
     connected to at least one occupied node do
       /* Heuristic cost function; q′ must be a placed neighbor qubit */
13     node_cost[location] = $\sum_{q' \in q.neighbors} M_{q,q'} \cdot \mathrm{distance}(location, q'.node)$
14   end
15   Place q in the location with the minimal node_cost;
16   R.remove(q);
17 end
We propose a coupling-based qubit placement algorithm
to determine the geometric locations of the qubits on a 2D
lattice (pseudocode shown in Algorithm 1). We illustrate the
algorithm with an example in Figure 6. First, we put the first
qubit in the coupling degree list, q4, on one node of the 2D
lattice. Since the initial 2D lattice is empty, the location of
q4 does not matter. We set the geometric coordinate of the
first qubit to be (0, 0) and then place the remaining qubits around
q4. q4 has four neighbors, q0, q1, q2, and q3, in the logical coupling
graph. We need to select the next one to place. By checking
the coupling degree list, we can see that q0 is the one with
the largest coupling degree. The node occupied by q4 has
four equivalent adjacent nodes and we can place q0 on any of
them. In this example, we select the node on the north of q4
with coordinate (0, 1). Such an algorithm design ensures that
the strongly coupled qubit pairs are given higher priority and
placed on adjacent nodes, accomplishing the first objective
mentioned above.
Then we need to place q1 since its coupling degree is larger
than that of q2 and q3. q1 is connected to both q4 and q0 so
that we need a more sophisticated way to evaluate all potential
nodes for q1. We use the function in line 13 of Algorithm 1
to find the node that can make q1 close to its strongly coupled
neighbors in the logical coupling graph. This function is the
summation over all q1’s placed neighbors. Each term in the
summation is the product of the coupling strength between
q1 and one logical coupling neighbor q′ and the Manhattan
distance between the evaluated node location and the location
of q′. After evaluating all the empty nodes that are adjacent
to placed nodes q4 and q0, we will find that the nodes on the
east and west of q4 are the best ones because they are closest
to q4 but not far away from q0. Here we select the one on the
west of q4 with coordinate (−1, 0). This summation function
can help reduce the number of operations for later remapping
and achieve the second design objective.
The remaining qubits can be placed in a similar procedure
until all the qubits have been placed on the 2D lattice. In this
example, q2 and q3 are placed on the nodes with coordinates
(0, −1) and (1, 0), respectively. All the qubits have their locations
(coordinates) on a 2D lattice where we can fabricate one
physical qubit on each occupied node. Finally, the nodes with
no qubits are removed.
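A minimal Python sketch of the placement procedure in Algorithm 1 is given below. Two details are assumptions rather than part of the algorithm as stated: candidate nodes with equal cost are resolved by scanning the placed qubits in placement order and their neighbors in the order north, west, south, east (chosen so that the sketch reproduces the running example), and if no remaining qubit is coupled to a placed qubit the next qubit in the degree list is used. All function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two lattice nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def frontier_nodes(position):
    """Empty nodes adjacent to at least one occupied node, in scan order."""
    occupied = set(position.values())
    frontier = []
    for x, y in position.values():   # dicts keep placement order
        for cand in ((x, y + 1), (x - 1, y), (x, y - 1), (x + 1, y)):
            if cand not in occupied and cand not in frontier:
                frontier.append(cand)
    return frontier


def place_qubits(degree_order, strength):
    """degree_order: qubit ids sorted by descending coupling degree."""
    position = {degree_order[0]: (0, 0)}
    remaining = list(degree_order[1:])   # stays in descending-degree order
    while remaining:
        # Unplaced qubits coupled to at least one placed qubit.
        candidates = [q for q in remaining
                      if any(strength[q][p] > 0 for p in position)]
        # Assumed fallback for a disconnected logical coupling graph.
        q = (candidates or remaining)[0]

        def cost(loc):
            # Unplaced or uncoupled qubits contribute nothing to the sum.
            return sum(strength[q][p] * manhattan(loc, position[p])
                       for p in position)

        position[q] = min(frontier_nodes(position), key=cost)
        remaining.remove(q)
    return position


# Coupling strength matrix and degree order of the Figure 4 example.
strength = [[0, 1, 0, 0, 2],
            [1, 0, 0, 0, 1],
            [0, 0, 0, 0, 1],
            [0, 0, 0, 0, 1],
            [2, 1, 1, 1, 0]]
layout = place_qubits([4, 0, 1, 2, 3], strength)
print(layout)
```

Under the assumed tie-breaking order, the sketch places q4, q0, q1, q2, and q3 on the same nodes as the example above.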
4.2 Bus Selection
In the second step, we need to connect the placed physical
qubits to enable two-qubit gates. The difficulty comes from
the large size of the design space. For N qubits, there are
$N(N-1)/2$ distinct qubit pairs. Any of them can be either connected
or disconnected so that there are $2^{N(N-1)/2}$ different
cases. Even after considering the nearest-neighbor coupling
constraint in which one qubit can only connect with a few
qubits around it on the lattice, the size of the design space is
still $O(\exp(N))$. More importantly, more qubit connections will
improve the performance but lower the yield rate in general
so that we need to identify those connections with the most
potential performance benefit in a very large design space.
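To give a feeling for these numbers, the short Python sketch below evaluates the unconstrained count for a 20-qubit chip. The function name is hypothetical, and the count ignores the nearest-neighbor constraint mentioned above.

```python
from math import comb


def connection_design_space(n_qubits):
    """Number of qubit pairs and of connect/disconnect configurations."""
    pairs = comb(n_qubits, 2)    # N(N-1)/2 distinct qubit pairs
    return pairs, 2 ** pairs     # each pair is connected or not


pairs_20, configs_20 = connection_design_space(20)
print(pairs_20, f"{configs_20:.3e}")
```

Even at 20 qubits, exhaustive enumeration of all connection patterns is clearly out of reach, which motivates the heuristic treatment that follows.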
Algorithm 2: 4-qubit Bus Selection
Input: Geometric coordinates of placed qubits, coupling
strength matrix, maximum number of 4-qubit buses K
Output: Locations of 4-qubit buses
1  Calculate the cross-coupling weight for each square;
2  while K > 0 do
     // Select one square in each iteration
3    for square(i, j) in all squares do
4      filtered_weight(i, j) = weight(i, j) − weight(i+1, j)
       − weight(i, j+1) − weight(i−1, j) − weight(i, j−1);
5    end
6    if no square available for 4-qubit bus then
7      break;
8    end
9    Select the square with the highest filtered_weight;
10   Set the weights of squares (i+1, j), (i, j+1), (i−1, j),
     and (i, j−1) to be 0 and mark them as blocked;
11   K = K − 1;
12 end
This paper simplifies the connection design problem by
considering two types of common buses, 2-qubit bus and
4-qubit bus (shown in Figure 2). These two types of buses
naturally fit in the 2D lattice qubit layout and can be easily
fabricated because at most 4 nearby qubits are connected by
one bus. After placing the qubits on a 2D lattice in the first
step, 2-qubit buses can be directly generated on the edges that
connect two occupied nodes but the qubits on a diagonal of
a 4-qubit square can never be connected with only 2-qubit
buses. Replacing some 2-qubit buses with 4-qubit buses could
provide more qubit connections at the cost of yield rate, but
it is not yet clear where the 4-qubit buses should be applied
to achieve Pareto-optimal results. The bus selection subroutine is
proposed to identify the locations for 4-qubit buses. Other
potential bus designs are left as future research directions and
will be discussed in Section 6.
[Figure 7 panels: (a) two adjacent squares sharing qubits i and j, (b) a square with only three qubits, (c) a square with qubits q0, q1, q2, q3]
Figure 7: (a) Prohibited Condition (b) Corner Case
(c) Filtered Weight
Instead of considering the nodes in a 2D lattice, we consider
the squares that are naturally formed by the edges in the 2D
lattice. Each square can be configured to use either 2-qubit buses or a
4-qubit bus. Now the problem is on which squares we should
use 4-qubit buses. The size of the search space, even for this
4-qubit bus square selection problem, is still $O(\exp(N))$. But
the simplification allows us to design high-quality heuristics
to guide the selection. Before introducing our solution, one
additional prohibited condition must be considered.
Prohibited Condition One physical constraint that we
must consider when applying 4-qubit buses is that we cannot
have 4-qubit buses in two adjacent squares. The reason is
explained with the example in Figure 7 (a). Suppose we have
two adjacent squares and both of them are using 4-qubit buses.
Then there will be two physical connections between qubit
i and j. When we use one of the connections, the other one
will bring unexpected effects. As a result, employing a 4-qubit bus
in one square will immediately block using 4-qubit buses in
any of its adjacent squares.
Considering the physical constraints mentioned above, the
objectives of this step are summarized as follows:
• Since adding more qubit connections will increase the
probability of frequency collision and lower the yield,
we hope to apply 4-qubit buses on those squares that
can benefit the performance most. In other words, the
additional connections are expected to directly support
as many two-qubit gates as possible.
• Applying a 4-qubit bus in one square will block adjacent
squares, making it impossible to directly support some
two-qubit gates in those blocked squares. This effect
should also be considered when selecting the 4-qubit
squares.
We propose a 4-qubit bus selection algorithm to select some
squares for 4-qubit buses (pseudocode shown in Algorithm 2).
In each iteration, one square that could benefit most from a
4-qubit bus will be selected. Users can specify the maximum
number of 4-qubit buses they hope to have. By varying the
number of selected squares, a series of architectures can be
generated with a trade-off between yield and performance.
To find the most fitting square, we first need to calculate
how much one square could benefit from a 4-qubit bus. Since
the difference between a 2-qubit bus square and a 4-qubit bus
square is whether the qubit pairs on the diagonals are con-
nected, we define the cross-coupling weight for each square
as the sum of the coupling strength of the qubit pairs on the
diagonals. For the example in Figure 7 (c), the cross-coupling
weight of the green square is the coupling strength of (q0, q3)
plus that of (q1, q2). A corner case in the cross-coupling weight
computation is the square with only 3 qubits (shown in Figure 7
(b)). In such squares, 4-qubit buses naturally reduce to
3-qubit buses, which support coupling between any two of
the three connected qubits. The weight of a 3-qubit square is
only the coupling strength between the two qubits
on one diagonal since the other diagonal has only one qubit.
For example, the weight of the 3-qubit square in Figure 7 (b)
is the (i, j) element in the coupling strength matrix. Except for
this small modification, 3-qubit squares are treated the same
as 4-qubit squares in our bus selection step. The cross-coupling
weight estimates the potential benefit of applying a
4-qubit bus in one square and realizes the first objective.
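The cross-coupling weight described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary-based placement, and the example coupling strength matrix are hypothetical, and returning zero for squares with fewer than three qubits is our own assumption rather than a rule stated in the text.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int]  # (row, column) of a node in the 2D lattice


def cross_coupling_weight(square: Node,
                          placement: Dict[Node, int],
                          strength: List[List[float]]) -> float:
    """Cross-coupling weight of the square whose top-left node is `square`.

    `placement` maps occupied lattice nodes to logical qubit indices and
    `strength` is the symmetric coupling strength matrix from profiling.
    """
    r, c = square
    top_left: Optional[int] = placement.get((r, c))
    top_right: Optional[int] = placement.get((r, c + 1))
    bottom_left: Optional[int] = placement.get((r + 1, c))
    bottom_right: Optional[int] = placement.get((r + 1, c + 1))

    corners = (top_left, top_right, bottom_left, bottom_right)
    # Assumption: squares with fewer than three qubits get no weight.
    if sum(q is not None for q in corners) < 3:
        return 0.0

    weight = 0.0
    # A diagonal contributes only when both of its end nodes are occupied,
    # which covers the 3-qubit corner case without special handling.
    for a, b in ((top_left, bottom_right), (top_right, bottom_left)):
        if a is not None and b is not None:
            weight += strength[a][b]
    return weight


# Hypothetical 2x2 placement and coupling strength matrix.
placement = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
strength = [[0, 4, 1, 5],
            [4, 0, 2, 3],
            [1, 2, 0, 6],
            [5, 3, 6, 0]]
full_square = cross_coupling_weight((0, 0), placement, strength)

# Same square with qubit 3 removed: only the (q1, q2) diagonal remains.
three_nodes = {node: q for node, q in placement.items() if q != 3}
three_qubit_square = cross_coupling_weight((0, 0), three_nodes, strength)
```

In this example the 4-qubit square sums the strengths of (q0, q3) and (q1, q2), while the 3-qubit square keeps only the diagonal whose two end nodes are still occupied.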
However, the cross-coupling weight is not accurate enough
to evaluate the benefit of a 4-qubit bus for a square because the
prohibited condition is not yet considered. We design a filter
to apply this constraint. For each square, the filtered weight
is its original cross-coupling weight minus the sum of its
adjacent squares’ weights. For example in Figure 7 (c), the
filtered weight of the green square is its original weight minus
the weights of the four blue squares. This filter takes the
prohibited condition into consideration and achieves the
second objective.
After applying the filter, we select the square with
the highest filtered weight. We then label the selected
square and its adjacent neighbors so that they are no longer
available for future 4-qubit buses. We also set their
weights to zero because they should not affect the 4-qubit
bus selection among the remaining squares. The algorithm
iterates to select the next square until no more squares are
available or the specified number of 4-qubit buses has been
applied.
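The selection loop described above can be sketched as follows. This is a minimal illustration, not a transcription of Algorithm 2: the function names and the example weights are hypothetical, squares are identified by the coordinates of their top-left node, and adjacency is taken to mean sharing an edge (the four blue squares in Figure 7 (c)).

```python
from typing import Dict, List, Tuple

Square = Tuple[int, int]  # (row, column) of the square's top-left node


def adjacent_squares(square: Square) -> List[Square]:
    """Squares that share an edge with `square`."""
    r, c = square
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def select_four_qubit_buses(cross_weights: Dict[Square, float],
                            max_buses: int) -> List[Square]:
    """Greedily pick squares for 4-qubit buses by filtered weight."""
    weights = dict(cross_weights)  # working copy, zeroed as squares get blocked
    available = set(weights)
    chosen: List[Square] = []

    def filtered_weight(square: Square) -> float:
        # Original weight minus the weights of the adjacent squares.
        neighbor_sum = sum(weights.get(n, 0.0) for n in adjacent_squares(square))
        return weights[square] - neighbor_sum

    while available and len(chosen) < max_buses:
        # Sorting makes tie-breaking deterministic in this sketch.
        best = max(sorted(available), key=filtered_weight)
        chosen.append(best)
        # The chosen square and its neighbors are no longer candidates and
        # no longer influence the filtered weights of the remaining squares.
        for square in [best] + adjacent_squares(best):
            if square in weights:
                weights[square] = 0.0
            available.discard(square)
    return chosen


# Hypothetical row of three squares with cross-coupling weights 3, 5, and 4.
example_weights = {(0, 0): 3.0, (0, 1): 5.0, (0, 2): 4.0}
two_buses = select_four_qubit_buses(example_weights, max_buses=2)
one_bus = select_four_qubit_buses(example_weights, max_buses=1)
```

In this example the middle square has the largest cross-coupling weight, but choosing it would block both neighbors. The filter instead picks the two outer squares. A rule for skipping squares that offer no benefit is not part of the description above and is omitted from the sketch.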
4.3 Frequency Allocation
After the two steps above, we now have a complete coupling
topology design of a superconducting quantum processor. In
the third step, we need to designate the pre-fabrication fre-
quency of each qubit. IBM’s 5-frequency scheme is a regular
frequency designation [20]. However, the generated qubit
layout and connection in our design flow can be irregular
since more hardware resources are invested in locations that
can benefit the performance most. Thus, we need a more flex-
ible frequency allocation scheme to leverage this unbalanced
qubit layout and connection. The objective of this step is to
minimize the probability of post-fabrication frequency colli-
sion and improve the yield rate. The physical constraints are
the frequency collision conditions in Figure 3.
Finding the qubit frequency allocation plan to maximize
the yield rate is a hard problem. The complex collision con-
ditions make it difficult to find an analytic expression for the
yield rate and a brute-force search over all possible frequency
configurations will be very time-consuming. For example, if
there are M candidate frequencies for each qubit and we have
N qubits in total, the total number of possible frequency
configurations is M^N. For each of these potential configurations,
we need to run a yield simulation (introduced in Section 4.3.1)
and then select the one with maximal yield rate. This method
is not acceptable due to its high complexity. We propose to
optimize the qubit frequency allocation algorithm based on
the facts that 1) the physical qubits in the geometric center
of the qubit lattice are more likely to be involved in a frequency
collision since they usually have more qubit connections, and
2) frequency collision only happens among nearby qubits.
Algorithm 3: Frequency Allocation
Input: Qubit Location and Connection
Output: Frequency Configuration of Each Qubit
Select the qubit in the geometric center of the placed
qubits and set its frequency to be the middle of the
allowed frequency range;
repeat
    Find the next qubit qi in breadth-first traversal order;
    for temp_freq in all frequency samples do
        Set the frequency of qi to be temp_freq;
        Simulate the yield rate within qi’s local region;
    end
    Assign the frequency with maximal yield rate to qi;
until the frequencies of all qubits are determined;
Our algorithm determines the qubit frequencies from the
center to the periphery (pseudocode shown in Algorithm 3).
Since this step is purely about hardware, the input of our
algorithm is only the qubit location and connection generated
from the previous two subroutines. To reduce the manufac-
turing difficulty and help prevent the collision condition 4,
we follow the convention from IBM and set an allowed fre-
quency interval 5.00GHz to 5.34GHz. All pre-fabrication
frequencies are limited within this interval. First, we locate
the qubit that is closest to the center of the qubit lattice and
assign its frequency to be the center of the allowed frequency
interval. Then we apply breadth-first traversal on the coupling
graph from the first qubit in the center. For example, q5 is
the center qubit in the example shown in Figure 8. In the
breadth-first traversal, we will first access q4, q9, q10, q6,
and q1 as shown on the right of the figure. Each time we access
one new qubit, we will
immediately determine its frequency. A list of candidate
frequencies is prepared. In this paper, the candidate frequencies
are 5.00, 5.01, 5.02, ..., 5.33, 5.34GHz, giving a resolution
of 0.01GHz. We can also have more candidate frequencies
but it will take more time to evaluate all of them.
Figure 8: Breadth-First Frequency Allocation
To evaluate a candidate frequency on a new qubit, we
temporarily assign the candidate frequency to the new qubit
and then simulate the yield rate within its local region. The
local region of a qubit is defined as a sub-graph of the original
chip coupling graph in which a qubit may collide with the
new qubit. For example in Figure 8, when we are searching
for the best frequency of q12, the local region is marked in
blue. Qubits not in this region like q5 cannot collide with
q12. We will select the frequency with the maximal yield
rate and assign it to the new qubit. Now the time complexity
of the frequency allocation algorithm is O(MN), where M is
the number of candidate frequencies and N is the number of
qubits.
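The center-to-periphery procedure can be sketched as below. This is a hedged illustration rather than the authors' implementation: `allocate_frequencies`, `make_toy_score`, and the 2×2 coupling graph are hypothetical names and data, frequencies are written in MHz to avoid floating-point ties, and the scoring function is a toy stand-in for the local-region yield simulation (it simply prefers candidates far from already-assigned neighbors).

```python
from collections import deque
from typing import Callable, Dict, List, Sequence

Graph = Dict[int, List[int]]  # qubit -> list of directly connected qubits


def allocate_frequencies(graph: Graph,
                         center: int,
                         candidates: Sequence[float],
                         local_yield: Callable[[int, float, Dict[int, float]], float]
                         ) -> Dict[int, float]:
    """Assign one frequency per qubit in breadth-first order from `center`."""
    # The center qubit takes the middle of the allowed frequency interval.
    freq: Dict[int, float] = {center: (min(candidates) + max(candidates)) / 2}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        for qubit in graph[current]:
            if qubit in freq:
                continue
            # Try every candidate and keep the one with the best local score.
            freq[qubit] = max(candidates,
                              key=lambda f: local_yield(qubit, f, freq))
            queue.append(qubit)
    return freq


def make_toy_score(graph: Graph) -> Callable[[int, float, Dict[int, float]], float]:
    """Toy stand-in for the local yield simulation (an assumption)."""
    def score(qubit: int, f: float, assigned: Dict[int, float]) -> float:
        gaps = [abs(f - assigned[n]) for n in graph[qubit] if n in assigned]
        return min(gaps)
    return score


# Candidate frequencies 5.00 to 5.34 GHz in 0.01 GHz steps, written in MHz.
candidates = list(range(5000, 5350, 10))
# Hypothetical 2x2 lattice with 2-qubit buses only.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
freq = allocate_frequencies(graph, center=0, candidates=candidates,
                            local_yield=make_toy_score(graph))
```

Each qubit other than the center is scored once per candidate, so the number of local evaluations grows as M×N rather than M^N. With the 35 candidates above and a 16-qubit design, that is at most 35×15 local evaluations instead of 35^16 (more than 10^24) full configurations.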
4.3.1 Yield Simulation. We developed a yield simulator based
on IBM’s yield model [20, 22]. The fabrication process can
be modeled by adding a Gaussian noise N(0, σ) to the
pre-fabrication frequency of a qubit to generate its
post-fabrication frequency, where σ is the fabrication
precision parameter. For
a given superconducting quantum processor design, we estimate
its yield rate through Monte Carlo simulation. Each trial
simulates whether one fabrication is successful. We first
generate the post-fabrication frequencies by adding random
noise sampled from the Gaussian distribution mentioned above.
Then we check if any frequency collision condition listed in
Figure 3 occurs in the post-fabrication frequencies. If so, this
fabrication fails. Otherwise, it is successful. All possible cases
are taken into account. For example, we examine the two
frequencies of all connected physical qubit pairs for conditions
1, 2, 3, and 4. If they meet any one of the inequalities
of the conditions, a frequency collision is considered to occur
in this simulation. This simulation process is repeated many
times. The yield rate can be estimated by the ratio between
the number of successful simulations and the total number of
simulations.
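The Monte Carlo procedure above can be sketched as follows. The sketch makes several assumptions: the collision test is passed in as a caller-supplied predicate on connected qubit pairs, because the conditions of Figure 3 are not reproduced here, and the near-degeneracy threshold used in the example is a hypothetical placeholder rather than one of IBM's conditions. Conditions that involve more than two qubits are not covered.

```python
import random
from typing import Callable, Dict, Iterable, Tuple


def estimate_yield(design_freq: Dict[int, float],
                   edges: Iterable[Tuple[int, int]],
                   collides: Callable[[float, float], bool],
                   sigma: float,
                   trials: int = 10_000,
                   seed: int = 0) -> float:
    """Fraction of simulated fabrications with no pairwise collision."""
    rng = random.Random(seed)
    edge_list = list(edges)
    successes = 0
    for _ in range(trials):
        # Post-fabrication frequency = design frequency + Gaussian noise.
        post = {q: f + rng.gauss(0.0, sigma) for q, f in design_freq.items()}
        # One collision on any connected pair makes this fabrication fail.
        if not any(collides(post[a], post[b]) for a, b in edge_list):
            successes += 1
    return successes / trials


# Hypothetical stand-in for a collision condition: two connected qubits
# are considered to collide when their frequencies are nearly equal.
def near_degenerate(f_a: float, f_b: float) -> bool:
    return abs(f_a - f_b) < 0.017  # GHz, placeholder threshold


design = {0: 5.00, 1: 5.10, 2: 5.20}  # GHz, hypothetical 3-qubit chain
chain = [(0, 1), (1, 2)]
yield_estimate = estimate_yield(design, chain, near_degenerate, sigma=0.03)
```

Fixing the seed keeps the estimate reproducible between runs. With `sigma=0` the simulation reduces to a deterministic check of the design frequencies themselves.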
5 EVALUATION
To demonstrate that the proposed application-specific archi-
tecture design flow can deliver hardware designs with better
Pareto-optimal results in terms of performance and yield rate,
we conduct experiments over various benchmarks to show
not only the overall improvement but also the breakdown of
benefits from each of our hardware design subroutines.
Figure 9: Baseline Qubit Frequency, Layout,
and Connection Designs
[Panels: (1) 16Q, 2X8, 2-qubit Bus; (2) 16Q, 2X8, 4-qubit Bus;
(3) 20Q, 4X5, 2-qubit Bus; (4) 20Q, 4X5, 4-qubit Bus. Each qubit
carries a frequency label from 1 to 5; the five frequencies are
5.00GHz, 5.07GHz, 5.13GHz, 5.20GHz, and 5.27GHz.]
5.1 Experiment Setup
Benchmarks Twelve quantum programs are collected from
IBM’s QISKit [46] and RevLib [45], or compiled from Scaf-
fCC [29]. These benchmarks cover several important domains
(e.g., simulation, arithmetic) and have various sizes (from 7
to 16 qubits) to test the versatility of the proposed design flow.
Metrics To evaluate the efficiency of an architecture, we
need both the yield rate and performance. An architecture with
a higher yield rate can be successfully fabricated with fewer at-
tempts, indicating a lower hardware cost. In our experiments,
the yield rate is simulated with IBM’s yield model [20, 22] as
introduced in Section 4.3.1. For the performance evaluation,
we adopt the total post-mapping gate count metric widely
used in previous studies [16–18]. More gates lead to longer
execution time and a larger probability of error on QC devices.
If a hardware architecture could execute the program with
fewer gates, then its performance is considered to be better.
Yield Simulation Configuration The number of trials in
the Monte-Carlo simulation for each architecture is 10,000,
which is 10× that used in IBM’s experiments [22, 37, 47]
to ensure the simulation accuracy. The fabrication precision
parameter σ is set to be 30MHz, a realistic extrapolation of
progress in hardware by IBM [20, 37]. IBM has improved the
σ from 200MHz [48] to 130MHz [20] in the last few years
and 30MHz is a reasonable projection to achieve a useful
yield as predicted by IBM [37].
5.2 Experiment Methodology
To illustrate the benefit of our design flow, five experiment
configurations are designed to show the overall improvement
and the performance/yield trade-off gain at each of the three
subroutines in Section 4. Among them, ibm is a set of general-
purpose architectures from IBM and they are not tailored
for any applications. The remaining four configurations are
application-specific architectures generated by the entire or
part of the proposed design flow.
ibm We use IBM’s design scheme as the baseline configu-
ration. It has two layout options, a 2×8 lattice with 16 qubits,
and a 4×5 lattice with 20 qubits. The qubit connection design
can be either 2-qubit bus only or using 4-qubit buses as many
as possible. In total, there are four architectures combining
the layout and connection options (shown in Figure 9). The
frequency allocation scheme is a 5-frequency scheme [20, 37].
The five frequencies are an arithmetic progression from 5GHz
to 5.27GHz and their arrangement is also in Figure 9.
eff-full We apply all three subroutines and generate a se-
ries of efficient superconducting quantum processor architec-
tures by varying the number of 4-qubit buses. The number of
designs we can obtain for a quantum program depends on the
number of qubits as more qubits can provide more squares to
apply 4-qubit buses in the generated layout. This experiment
can show the overall architecture design improvement when
comparing with the baseline ibm.
eff-5-freq We only apply the first two subroutines to gen-
erate qubit layout and connection design but the frequency
allocation is done with IBM’s 5-frequency scheme. The yield
benefit from the proposed frequency allocation algorithm can
be demonstrated by comparing with results from eff-full.
eff-rd-bus We keep the first and the third subroutines
but randomly select some squares to employ 4-qubit buses
with the prohibited condition constraint satisfied. This will
demonstrate the effect of our filtered-weight-based 4-qubit
bus selection algorithm by comparing with results from eff-
full.
eff-layout-only We apply our profiling method and per-
form a layout design. The connection design has two options.
One is only using 2-qubit buses. The other is using 4-qubit
buses as much as possible. The frequency design follows the
baseline ibm. The benefit of our layout optimization can be
shown when comparing with the results from ibm.
For each benchmark, we run all five configurations to
generate different superconducting quantum processor
architectures with different yield rates. Then we apply a
state-of-the-art qubit mapping algorithm [18] to obtain the
total number of gates when running the benchmark on the
generated or baseline architectures.
5.3 Overall Improvement
Figure 10 shows the result of yield and performance for all
benchmarks and the five experiment configurations. There are
12 subfigures and one subfigure contains the results of the five
experiment configurations for one benchmark. The X-axis
represents the normalized reciprocal of post-mapping gate
count and data points on the right have better performance.
The Y-axis represents the yield rate and data points on the top
have higher yield rates. The legend at the bottom of Figure 10
shows the markers for the five configurations. The data points
for the four designs in the baseline are labeled by (1), (2), (3),
and (4), according to Figure 9.
Optimality Our design flow generates a series of architectures
with better Pareto-optimal results of performance and yield,
since the data points of eff-full lie to the upper right of
those of ibm. The most simplified designs (the upper-left blue
data points) generated by our
design flow outperform the 16-qubit baseline design without
4-qubit buses in both performance (∼ 7.7%) and yield rate
(∼ 4×). Compared with the 16-qubit baseline with four 4-
qubit buses, we achieve over 100× better yield rate with < 1%
performance loss. On the other side, compared with IBM’s
20-qubit chip design with six 4-qubit buses (the baseline
design with the most hardware resources), the designs with the
maximum number of 4-qubit buses generated from our design
flow have over 1000× yield rate improvement on average
with only about 3.5% performance loss.
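To make the notion of Pareto-optimal designs concrete, the following sketch filters a set of (performance, yield rate) points down to those that no other point dominates. The function name and the data points are hypothetical and are not taken from Figure 10. Larger values are better on both axes, matching the axes of the figure.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (normalized reciprocal of gate count, yield rate)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point matches or beats on both axes."""
    front = []
    for p in points:
        dominated = any(q != p and q[0] >= p[0] and q[1] >= p[1]
                        for q in points)
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical designs: (performance, yield rate).
designs = [(1.00, 0.001), (1.05, 0.05), (1.10, 0.10), (1.20, 0.02)]
front = pareto_front(designs)
```

Here the first two designs are dominated by (1.10, 0.10), which is better on both axes, while (1.20, 0.02) survives because it trades yield for higher performance.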
Controllability The proposed design flow can easily con-
trol the trade-off between yield and performance by only
changing the number of 4-qubit buses without traversing
across, or sampling a large number of designs in, the entire
search space. Depending on the number of qubits in different
target programs, we can trade in around 10× ∼ 50× yield
rate for 10%∼ 33% performance improvement.
5.3.1 Special Case. The results of ising_model are signif-
icantly different because the logical qubit coupling in this
benchmark forms a chain structure. The mapping algorithm
can always find the perfect initial mapping without insert-
ing additional operations. As a result, the post-mapping gate
count is the same for all tested hardware architectures. All
data points for this program lie in one vertical line. Only one
architecture is generated from our design flow because there
is no need to add 4-qubit bus. All the two-qubit gates can be
executed through the edges on the 2D lattice. There are no
two-qubit gates applied on two qubits on a diagonal because
of the chain coupling structure. In this case, 4-qubit buses can
only lower the yield rate without improving the performance.
5.4 Effects from Individual Subroutines
The overall improvement has already been discussed, but one
interesting question is how much improvement the layout
and connection optimization contribute and how much comes
from the optimized frequency allocation directly. The five
configurations decouple the proposed design flow and provide a
breakdown of the effect of individual subroutines.
5.4.1 Effect of Layout Design. The difference between ibm
and eff-layout-only illustrates the effect of layout design
since the other two subroutines are the same. An architecture
with more hardware resources is expected to provide higher
performance by allowing more flexibility in qubit mapping.
But our optimized layout design could use comparable or
fewer hardware resources while the performance can be even
better. For example, we compare the 2-qubit bus only data
point (the upper left one) with the 16-qubit baseline with four
4-qubit buses (labeled by (2) in each subfigure). eff-layout-
only provides better or comparable performance most of the
time with about 35× yield improvement on average. The
improvement at this step depends on the program size, and
programs with fewer qubits will use fewer qubits and
connections in an optimized architecture. This result shows that our
layout design could generate qubit layouts with high
performance while using far fewer hardware resources for different
programs.
Figure 10: Yield v.s. Normalized Reciprocal of Post-mapping Gate Count
[12 subfigures, one per benchmark: adr4_197 (13-qubit),
rd84_142 (15-qubit), misex1_241 (15-qubit), square_root_7 (15-qubit),
radd_250 (13-qubit), cm152a_212 (12-qubit), dc1_220 (11-qubit),
z4_268 (11-qubit), sym6-145 (7-qubit), UCCSD_ansatz_8 (8-qubit),
ising_model_16 (16-qubit), qft_16 (16-qubit). X-axis: Normalized
Reciprocal of Gate #. Y-axis: Yield Rate (1.E-05 to 1.E+00).
Legend: ibm, eff-full, eff-rd-bus, eff-5-freq, eff-layout-only.
Baseline data points are labeled (1) to (4).]
5.4.2 4-qubit Bus Selection Quality. By comparing the results
from eff-full and eff-rd-bus, we can see that the
architectures generated from our bus selection algorithm are better
than those from random selection in trading yield for
performance most of the time. The data points of eff-rd-bus reveal
the distribution of the yield and performance sampled from
random bus designs. Note that the performance of eff-rd-bus
is usually confined by the two data points in eff-layout-only
because adding connections can improve the performance
most of the time. For most benchmarks except qft, the re-
sults from eff-full are close to the upper bound formulated
by the random samples, which shows that our weight-based
bus selection could generate a series of near Pareto-optimal
hardware architectures with various numbers of qubit connec-
tions.
The result of qft is much worse than that of other programs
due to the uniform two-qubit gate pattern unique to this
program. The number of two-qubit gates between any two
logical qubits is always two in qft, which makes all the logical
qubit pairs identical in terms of coupling strength during
profiling. Then, in the bus selection subroutine, all the squares
share the same weight and the weight-based selection is
equivalent to random selection.
For the two small benchmarks, sym6 and UCCSD_ansatz,
the number of available squares in the generated qubit layout
is small and there are very few options when applying 4-qubit
buses. Therefore, most of the architectures generated from the
random 4-qubit bus selection are the same as those from the
proposed design flow, which makes the results from eff-full
and eff-rd-bus very close.
5.4.3 Frequency Allocation Optimization. By comparing eff-
full and eff-5-freq, we can see that the proposed frequency
allocation algorithm provides about 10× yield rate
improvement on average. This improvement is slightly smaller when
the yield from the baseline 5-frequency scheme is already high,
e.g., results from sym6 and UCCSD_ansatz. The fabrication
variance makes the ideal yield of 100% unreachable and it is hard
to optimize yield when it is already high.
6 DISCUSSION
This paper studies application-specific efficient superconduct-
ing quantum processor design. In particular, we formalize the
architecture design for superconducting quantum processors
with three key steps, each of which comes with an optimiza-
tion subroutine. This is the first attempt, to the best of our
knowledge, to identify the optimization opportunity from the
architecture level to push forward the balance between QC
performance and hardware yield rate. Efforts in this direction
are likely to be in significant demand for near-term QC
with limited computation resources and immature fabrication
technology.
Although we show that improved Pareto-optimal designs
can be generated with a static program analysis and three op-
timized design algorithms, several future research directions
can be explored as with any initial research.
Improving Profiling Method This paper focused on the
logical qubit coupling topology in a quantum program but
other patterns may also be leveraged. We omitted the tem-
poral information of the two-qubit gates and all information
about other program components. But the locations of two-
qubit gates in a quantum program may also be leveraged for
finer-grained evaluation of the coupling strength for different
logical qubit pairs at different times during the execution. The
single-qubit patterns can also help with the basic gate set
design.
Exploring More Design Space In the proposed design
flow, the number of physical qubits is the same as that of
logical qubits for higher yield rate. However, we can still add
auxiliary physical qubits since they can also be used during
the qubit routing, trading more yield rate for higher
performance. How to add auxiliary qubits at appropriate locations
and how to connect them are interesting problems to explore
in the future. To ensure modularity and scalability, the qubits
are forced to be embedded in a 2D lattice and we only consider
two types of buses lying in the lattice. However, the qubit
placement and connection could be more flexible if we trade
in part of the scalability. For example, one bus could also
connect more than four qubits [49]. The design space in this
direction is not yet explored.
Optimizing Frequency Allocation This paper optimizes
the qubit frequency selection from the center to the periphery
and searches for the best frequency for only one qubit at a
time, resulting in a sub-optimal frequency allocation. A global
optimization, e.g., with formal methods, can be explored to further
optimize the frequency allocation result. One alternative ap-
proach to resolve the frequency collision issue is to use flux-
tunable transmon qubits [23], of which the frequencies can
be dynamically tuned with additional control signals. The de-
sign trade-off of different types of qubits is not yet explored
and additional signals bring more noise and increase the con-
trol complexity. The proposed design flow is still valuable
even with frequency-tunable qubits because the simplified
architectures with fewer on-chip connections can not only
reduce the fabrication complexity but also benefit the overall
performance by lowering the crosstalk error.
7 RELATED WORK
This paper ranges across multiple topics, i.e., program profil-
ing, superconducting processor design, application-specific
design, qubit mapping. We briefly introduce related work for
all of them.
Application-specific Design The closest related work is
SPARQS, a superconducting planar architecture proposed by
Wilhelm et al. [35, 36] targeting a specific Fermi-Hubbard
model simulation program. However, they only provide an
implementation-independent design from theoretical physics
level. This paper formalizes a systematic end-to-end design
flow with automatic program profiling and realistic physical
constraints included, for the first time. With no limitation on
the target program, we can generate a series of Pareto-optimal
hardware architecture designs in a controllable way.
Quantum Program Profiling and Analysis Program pro-
filing and analysis are very important for software and com-
piler optimization. Previous works on quantum program anal-
ysis [29–34] have studied entanglement, termination, non-
cloning checking, etc. The profiling method in this paper is
proposed to guide the hardware design, fulfilling a different
goal.
Superconducting Quantum Processors As one of the
most promising candidate technologies to implement QC,
superconducting quantum techniques have been employed in
two mainstream QC computation models. The circuit model
based processors [42, 43, 50] support the quantum circuit model
[38] and the quantum annealers [51] can implement adiabatic
QC [52]. The programming models and hardware architectures
differ between these two QC approaches. The design
flow in this paper is proposed for circuit model based quan-
tum processors while efficient quantum annealer design can
be a future research direction.
Qubit Mapping Formal and heuristic methods have been
applied to the qubit mapping problem [16–18, 53, 54] to minimize
the total gate count. Recently several studies [10, 55, 56] have
applied the actual gate error rates for fine-grained
optimization. All these optimizations are purely software-level
modifications. This paper attempts to improve the performance by
reducing the mapping overhead from the hardware level. We
adopt the gate count metric to estimate the mapping overhead
since our experiments are performed on artificial hardware
architectures.
8 CONCLUSION
The demand for larger computation capability in a supercon-
ducting quantum processor naturally calls for more hardware
resources which will also increase the design complexity
and lower the yield rate. This paper explored application-
specific architecture design for superconducting quantum
processors to achieve both high performance and high yield
rate. Gate patterns in a quantum program can be extracted
by the proposed profiling method and then utilized in the
follow-up hardware architecture design. Three subroutines
are designed to generate the qubit layout, connection, and
frequency respectively with physical constraints taken into
consideration. Experimental results show that the proposed
design flow could deliver architectures with both high yield
rate and performance automatically for different applications
except those with extremely special gate patterns.
REFERENCES
[1] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum
algorithm for linear systems of equations. Physical review letters,
103(15):150502, 2009.
[2] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum
approximate optimization algorithm. arXiv preprint arXiv:1411.4028,
2014.
[3] Sam McArdle, Suguru Endo, Alan Aspuru-Guzik, Simon Benjamin,
and Xiao Yuan. Quantum computational chemistry. arXiv preprint
arXiv:1808.10402, 2018.
[4] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung,
Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L
O’Brien. A variational eigenvalue solver on a photonic quantum
processor. Nature communications, 5:4213, 2014.
[5] Michel H Devoret and Robert J Schoelkopf. Superconducting circuits
for quantum information: an outlook. Science, 339(6124):1169–1174,
2013.
[6] Hanhee Paik, D. I. Schuster, Lev S. Bishop, G. Kirchmair, G. Catelani,
A. P. Sears, B. R. Johnson, M. J. Reagor, L. Frunzio, L. I. Glazman,
S. M. Girvin, M. H. Devoret, and R. J. Schoelkopf. Observation of high
coherence in josephson junction qubits measured in a three-dimensional
circuit qed architecture. Phys. Rev. Lett., 107:240501, Dec 2011.
[7] R. Barends, J. Kelly, A. Megrant, D. Sank, E. Jeffrey, Y. Chen, Y. Yin,
B. Chiaro, J. Mutus, C. Neill, P. O’Malley, P. Roushan, J. Wenner,
T. C. White, A. N. Cleland, and John M. Martinis. Coherent josephson
qubit suitable for scalable quantum integrated circuits. Phys. Rev. Lett.,
111:080502, Aug 2013.
[8] Yu Chen, C Neill, P Roushan, N Leung, M Fang, R Barends, J Kelly,
B Campbell, Z Chen, B Chiaro, and A Dunsworth. Qubit architecture
with high coherence and fast tunable coupling. Physical review letters,
113(22):220502, 2014.
[9] Yunong Shi, Nelson Leung, Pranav Gokhale, Zane Rossi, David I Schus-
ter, Henry Hoffmann, and Frederic T Chong. Optimized compilation of
aggregated instructions for realistic quantum computers. In Proceed-
ings of the Twenty-Fourth International Conference on Architectural
Support for Programming Languages and Operating Systems, pages
1031–1044. ACM, 2019.
[10] Prakash Murali, Jonathan M. Baker, Ali Javadi-Abhari, Frederic T.
Chong, and Margaret Martonosi. Noise-adaptive compiler mappings
for noisy intermediate-scale quantum computers. In Proceedings of
the Twenty-Fourth International Conference on Architectural Support
for Programming Languages and Operating Systems, ASPLOS 2019,
Providence, RI, USA, April 13-17, 2019, pages 1015–1029. ACM, 2019.
[11] X Fu, M. A. Rol, C. C. Bultink, J. van Someren, N. Khammassi,
I. Ashraf, R. F. L. Vermeulen, J. C. de Sterke, W. J. Vlothuizen, R. N.
Schouten, C. G. Almudever, L. DiCarlo, and K. Bertels. An experi-
mental microarchitecture for a superconducting quantum processor. In
Proceedings of the 50th Annual IEEE/ACM International Symposium
on Microarchitecture, pages 813–825. IEEE/ACM, 2017.
[12] Jeroen PG van Dijk, Edoardo Charbon, and Fabio Sebastiano.
The electronic interface for quantum processors. arXiv preprint
arXiv:1811.01693, 2018.
[13] Jens Koch, Terri M Yu, Jay Gambetta, Andrew A Houck, DI Schuster,
J Majer, Alexandre Blais, Michel H Devoret, Steven M Girvin, and
Robert J Schoelkopf. Charge-insensitive qubit design derived from the
Cooper pair box. Physical Review A, 76(4):042319, 2007.
[14] David C McKay, Stefan Filipp, Antonio Mezzacapo, Easwar Magesan,
Jerry M Chow, and Jay M Gambetta. Universal gate for fixed-frequency
qubits via a tunable bus. Physical Review Applied, 6(6):064007, 2016.
[15] Andrew W Cross, Lev S Bishop, Sarah Sheldon, Paul D Nation, and
Jay M Gambetta. Validating quantum computers using randomized
model circuits. arXiv preprint arXiv:1811.12926, 2018.
[16] Marcos Yukio Siraichi, Vinícius Fernandes dos Santos, Sylvain Col-
lange, and Fernando Magno Quintão Pereira. Qubit allocation. In
Proceedings of the 2018 International Symposium on Code Generation
and Optimization, pages 113–125. ACM, 2018.
[17] Alwin Zulehner, Alexandru Paler, and Robert Wille. Efficient mapping
of quantum circuits to the ibm qx architectures. In Design, Automation
& Test in Europe Conference & Exhibition (DATE), 2018, pages 1135–
1138. IEEE, 2018.
[18] Gushu Li, Yufei Ding, and Yuan Xie. Tackling the qubit mapping prob-
lem for nisq-era quantum devices. In Proceedings of the Twenty-Fourth
International Conference on Architectural Support for Programming
Languages and Operating Systems, pages 1001–1014. ACM, 2019.
[19] Prakash Murali, Norbert Matthias Linke, Margaret Martonosi,
Ali Javadi Abhari, Nhung Hong Nguyen, and Cinthia Huerta Alderete.
Full-stack, real-system quantum computer studies: Architectural com-
parisons and design insights. In Proceedings of the 46th International
Symposium on Computer Architecture, ISCA ’19, pages 527–540, New
York, NY, USA, 2019. ACM.
[20] Sami Rosenblatt, Jared Hertzberg, José Chavez-Garcia, Nicholas Bronn,
Hanhee Paik, Martin Sandberg, Easwar Magesan, John Smolin, Jeng-
Bang Yau, Vivekananda Adiga, Markus Brink, and Jerry M. Chow.
Enablement of near-term quantum processors by architectural yield
engineering. Bulletin of the American Physical Society, 2019.
[21] Easwar Magesan and Jay M Gambetta. Effective hamiltonian models
of the cross-resonance gate. arXiv preprint arXiv:1804.04073, 2018.
[22] Markus Brink, Jerry M Chow, Jared Hertzberg, Easwar Magesan, and
Sami Rosenblatt. Device challenges for near term superconducting
quantum processors: frequency collisions. In 2018 IEEE International
Electron Devices Meeting (IEDM), pages 6–1. IEEE, 2018.
[23] Julian Kelly, R Barends, AG Fowler, A Megrant, E Jeffrey, TC White,
D Sank, JY Mutus, B Campbell, Yu Chen, and Z Chen. State preserva-
tion by repetitive error detection in a superconducting quantum circuit.
Nature, 519(7541):66, 2015.
[24] Sami Rosenblatt, Jason S Orcutt, and Jerry M Chow. Laser annealing
qubits for optimized frequency allocation, July 2 2019. US Patent App.
10/340,438.
[25] Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu,
Yunji Chen, and Olivier Temam. Diannao: A small-footprint high-
throughput accelerator for ubiquitous machine-learning. In ACM Sig-
plan Notices, volume 49, pages 269–284. ACM, 2014.
[26] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A
Horowitz, and William J Dally. Eie: efficient inference engine on
compressed deep neural network. In 2016 ACM/IEEE 43rd Annual
International Symposium on Computer Architecture (ISCA), pages 243–
254. IEEE, 2016.
[27] Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish,
and Margaret Martonosi. Graphicionado: A high-performance and
energy-efficient accelerator for graph analytics. In 2016 49th Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO),
pages 1–13. IEEE, 2016.
[28] Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiy-
oung Choi. A scalable processing-in-memory accelerator for paral-
lel graph processing. ACM SIGARCH Computer Architecture News,
43(3):105–117, 2016.
[29] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey
Lvov, Frederic T Chong, and Margaret Martonosi. Scaffcc: Scalable
compilation and analysis of quantum programs. Parallel Computing,
45:2–17, 2015.
[30] Mingsheng Ying and Yuan Feng. Quantum loop programs. Acta
Informatica, 47(4):221–250, 2010.
[31] Mingsheng Ying, Nengkun Yu, Yuan Feng, and Runyao Duan. Ver-
ification of quantum programs. Science of Computer Programming,
78(9):1679–1700, 2013.
[32] Shenggang Ying, Yuan Feng, Nengkun Yu, and Mingsheng Ying.
Reachability probabilities of quantum markov chains. In International
Conference on Concurrency Theory, pages 334–348. Springer, 2013.
[33] Kentaro Honda. Analysis of quantum entanglement in quantum pro-
grams using stabilizer formalism. arXiv preprint arXiv:1511.01572,
2015.
[34] Simon Perdrix. Quantum entanglement analysis based on abstract
interpretation. In International Static Analysis Symposium, pages 270–
282. Springer, 2008.
[35] Pierre-Luc Dallaire-Demers and Frank K Wilhelm. Quantum gates and
architecture for the quantum simulation of the fermi-hubbard model.
Physical Review A, 94(6):062304, 2016.
[36] Per J Liebermann, Pierre-Luc Dallaire-Demers, and Frank K Wilhelm.
Implementation of the ifredkin gate in scalable superconducting archi-
tecture for the quantum simulation of fermionic systems. arXiv preprint
arXiv:1701.07870, 2017.
[37] Christopher Chamberland, Guanyu Zhu, Theodore J Yoder, Jared B
Hertzberg, and Andrew W Cross. Topological and subsystem codes on
low-degree graphs with flag qubits. arXiv preprint arXiv:1907.09528,
2019.
[38] Michael A Nielsen and Isaac L Chuang. Quantum computation and
quantum information. Cambridge University Press, Cambridge, UK,
2010.
[39] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVin-
cenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A Smolin,
and Harald Weinfurter. Elementary gates for quantum computation.
Physical review A, 52(5):3457, 1995.
[40] Chad Rigetti and Michel Devoret. Fully microwave-tunable univer-
sal gates in superconducting qubits with linear couplings and fixed
transition frequencies. Physical Review B, 81(13):134507, 2010.
[41] Sarah Sheldon, Easwar Magesan, Jerry M Chow, and Jay M Gambetta.
Procedure for systematically tuning up cross-talk in the cross-resonance
gate. Physical Review A, 93(6):060302, 2016.
[42] IBM. IBM Q Experience Device. https://www.research.ibm.com/
ibm-q/technology/devices/, 2018.
[43] Julian Kelly. A Preview of Bristlecone, Google’s New
Quantum Processor. https://ai.googleblog.com/2018/03/
a-preview-of-bristlecone-googles-new.html, 2018.
[44] Dmitri Maslov, Sean M Falconer, and Michele Mosca. Quantum circuit
placement. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 27(4):752–763, 2008.
[45] Robert Wille, Daniel Große, Lisa Teuber, Gerhard W Dueck, and Rolf
Drechsler. Revlib: An online resource for reversible functions and
reversible circuits. In Multiple Valued Logic, 2008. ISMVL 2008. 38th
International Symposium on, pages 220–225. IEEE, 2008.
[46] Gadi Aleksandrowicz, Thomas Alexander, Panagiotis Barkoutsos, Lu-
ciano Bello, Yael Ben-Haim, David Bucher, Francisco Jose Cabrera-
Hernádez, Jorge Carballo-Franquis, Adrian Chen, Chun-Fu Chen,
Jerry M. Chow, Antonio D. Córcoles-Gonzales, Abigail J. Cross, An-
drew Cross, Juan Cruz-Benito, Chris Culver, Salvador De La Puente
González, Enrique De La Torre, Delton Ding, Eugene Dumitrescu,
Ivan Duran, Pieter Eendebak, Mark Everitt, Ismael Faro Sertage, Albert
Frisch, Andreas Fuhrer, Jay Gambetta, Borja Godoy Gago, Juan Gomez-
Mosquera, Donny Greenberg, Ikko Hamamura, Vojtech Havlicek, Joe
Hellmers, Łukasz Herok, Hiroshi Horii, Shaohan Hu, Takashi Imamichi,
Toshinari Itoko, Ali Javadi-Abhari, Naoki Kanazawa, Anton Karazeev,
Kevin Krsulich, Peng Liu, Yang Luh, Yunho Maeng, Manoel Marques,
Francisco Jose Martín-Fernández, Douglas T. McClure, David McKay,
Srujan Meesala, Antonio Mezzacapo, Nikolaj Moll, Diego Moreda Ro-
dríguez, Giacomo Nannicini, Paul Nation, Pauline Ollitrault, Lee James
O’Riordan, Hanhee Paik, Jesús Pérez, Anna Phan, Marco Pistoia, Viktor
Prutyanov, Max Reuter, Julia Rice, Abdón Rodríguez Davila, Raymond
Harry Putra Rudy, Mingi Ryu, Ninad Sathaye, Chris Schnabel, Eddie
Schoute, Kanav Setia, Yunong Shi, Adenilton Silva, Yukio Siraichi,
Seyon Sivarajah, John A. Smolin, Mathias Soeken, Hitomi Takahashi,
Ivano Tavernelli, Charles Taylor, Pete Taylour, Kenso Trabing, Matthew
Treinish, Wes Turner, Desiree Vogt-Lee, Christophe Vuillot, Jonathan A.
Wildstrom, Jessica Wilson, Erick Winston, Christopher Wood, Stephen
Wood, Stefan Wörner, Ismail Yunus Akhalwaya, and Christa Zoufal.
Qiskit: An open-source framework for quantum computing, 2019.
[47] MD Hutchings, Jared B Hertzberg, Yebin Liu, Nicholas T Bronn,
George A Keefe, Markus Brink, Jerry M Chow, and BLT Plourde. Tun-
able superconducting qubits with flux-independent coherence. Physical
Review Applied, 8(4):044003, 2017.
[48] Sami Rosenblatt, Jared Hertzberg, Markus Brink, Jerry Chow, Jay
Gambetta, Zhaoqi Leng, Andrew Houck, JJ Nelson, Britton Plourde,
Xian Wu, et al. Variability metrics in josephson junction fabrication for
quantum computing circuits. In APS Meeting Abstracts, 2017.
[49] Joydip Ghosh, Andrei Galiautdinov, Zhongyuan Zhou, Alexander N Korotkov,
John M Martinis, and Michael R Geller. High-fidelity controlled-σ^z gate
for resonator-based superconducting quantum computers. Physical
Review A, 87(2):022309, 2013.
[50] Rigetti. The Quantum Processing Unit (QPU). https://www.rigetti.com/
qpu, 2018.
[51] D-Wave Systems Inc. D-Wave System Documentation. https://docs.
dwavesys.com/docs/latest/, 2018.
[52] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser.
Quantum computation by adiabatic evolution. arXiv preprint quant-
ph/0001106, 2000.
[53] Davide Venturelli, Minh Do, Eleanor Rieffel, and Jeremy Frank. Com-
piling quantum circuits to realistic hardware architectures using tempo-
ral planners. Quantum Science and Technology, 3(2):025004, 2018.
[54] Alireza Shafaei, Mehdi Saeedi, and Massoud Pedram. Qubit placement
to minimize communication overhead in 2d quantum architectures. In
Design Automation Conference (ASP-DAC), 2014 19th Asia and South
Pacific, pages 495–500. IEEE, 2014.
[55] Swamit S Tannu and Moinuddin K Qureshi. Not all qubits are created
equal: a case for variability-aware policies for nisq-era quantum com-
puters. In Proceedings of the Twenty-Fourth International Conference
on Architectural Support for Programming Languages and Operating
Systems, pages 987–999. ACM, 2019.
[56] Abdullah Ash-Saki, Mahabubul Alam, and Swaroop Ghosh. Qure:
Qubit re-allocation in noisy intermediate-scale quantum computers. In
Proceedings of the 56th Annual Design Automation Conference 2019,
page 141. ACM, 2019.
