Integrated Analysis of Performance and Resource of Large-Scale Quantum
  Computing by Hwang, Yongsoo et al.
arXiv:1809.07901v1 [quant-ph] 21 Sep 2018
Integrated Analysis of Performance and Resource of
Large-Scale Quantum Computing
Yongsoo Hwang,1 Taewan Kim,1 Chungheon Baek,1 and Byung-Soo Choi1
1Electronics and Telecommunications Research Institute
To assess the feasibility of large-scale quantum computing, its performance and quantum
resource must be analyzed accurately. However, most of the analyses reported so far have
focused on statistical examination, i.e., simply calculating the performance and resource from
individual data, and, even worse, they have usually considered only a few components. In this
work, to achieve a more exact analysis, we propose an integrated analysis method for a practical
quantum computing model with three components (algorithm, error correction and device) under
a realistic quantum computer system architecture. To implement this method, we develop a
quantum computing framework composed of three functional layers: compile, system and building
block. This framework supports, for the first time, the mapping of a quantum algorithm from
the physical qubit level to the system architecture level with a given fault-tolerant scheme.
The proposed method can therefore measure the effect of the dynamic situations that arise when
a quantum computer actually runs. Using our method, we found that Shor algorithm for factoring
a 512-bit integer requires 8.78 × 10^5 hours. We also show how the proposed method can be used
to analyze the optimal concatenation level and code distance of fault-tolerant quantum computing.
I. INTRODUCTION
Over the last decades, diverse quantum computing
components, from applications to physical devices, have
been actively researched and developed. Some of them
already achieve theoretically optimal performance [3, 4, 20,
29, 30, 35]. Large IT corporations have announced
that they have succeeded in developing systems with dozens
of qubits [1, 15, 22], and they even expect to see a
thousand-qubit system within ten years. In addition, several
startups have begun to develop quantum computing hardware
and software [32]. The era of quantum computing seems to be
gradually approaching.
So far, to assess the feasibility of practical large-scale
quantum computing, some efforts have been devoted
to investigating the quantity of the required quantum
resource [10, 19, 34, 37, 40–42] and the expected
performance [8, 21, 41, 45]. Based on such analyses, the
security of conventional cryptography against theoretically
super-fast quantum computing has also been analyzed [11].
However, we think that the analyses reported so far are not
fully satisfactory, because they have usually considered only
a few components of quantum computing and/or focused on
statistical examination. In particular, the results of
statistical examination are nothing more than simple
calculations based on individual data of several quantum
computing components [41, 42].
To achieve a more exact analysis, we believe the practical
operation of quantum computing needs to be considered.
First, the quantum computing components have to be considered
in terms of practical operation. For example, we have to
prepare a quantum algorithm decomposed into individual gates
instead of only the dominant part of the algorithm. Second,
these components have to be properly integrated with a target
quantum computer system architecture. For example, we
have to find paths for qubit movements (or braiding) on
the target system architecture that do not conflict with
other paths. By doing so, the quantity of qubit movements,
which cannot be captured by a statistical analysis, can be
measured exactly, and the analysis results can therefore be
regarded as realistic.
In this work, to perform the integrated analysis more
efficiently, we propose and develop a quantum computing
framework composed of three functional layers (compile,
system and building block), where each layer has a
well-defined role in quantum computing as follows.
• Compile: Decomposition of a quantum algorithm
into a sequence of quantum gates called a quantum
assembly code
• System: Integration of a quantum algorithm and
building blocks under a quantum computer system
architecture
• Building Block: Implementation of logical qubits
and gates according to a quantum error-correcting
code and a fault-tolerant scheme
With the proposed framework, we can conduct the analysis
in a fine-grained manner. For example, we count qubit
movements one by one, and we count the physical qubits
based on the provided quantum computer architecture
together with the ancilla qubits required for error
correction and magic states. Furthermore, the framework
examines the performance and the resource under the
criterion of 100% fidelity quantum computing [49]. By doing
so, we can report the full quantum resource and compare
various quantum computing configurations fairly.
The objective of this work is to estimate the quantum
computing performance and the quantum resource as accurately
as possible with the help of the framework we developed. For
that, we first need to configure a quantum computing
by choosing specific protocols and device properties. By
doing so, as mentioned above, we can see the quantum
resource and the performance of a quantum computing
based on most of the quantum computing components, from
algorithm to hardware. It is also possible to see
which component affects a quantum computing most seriously
by applying realistic technology gradually from the top layer
to the bottom layer. In this regard, we can see whether
a theoretically optimal protocol really leads to optimal
quantum computing or instead works suboptimally
in composition with other components.
By exploiting this accurate analysis, we can
provide numerical data for some theoretical conjectures in
fault-tolerant quantum computing. In Steane code
based quantum computing, increasing the concatenation
level makes the quantum computing more reliable,
but we conjecture that a higher level is not always
better, because a higher concatenation
level requires more quantum resource and a longer execution
time. It is therefore reasonable to expect a trade-off
between the reliability and the performance (resource).
In this work, we show that such a trade-off exists, and
that it exists in surface code based quantum computing
as well.
The remainder of this paper is organized as follows.
Section II reviews some related works. Section III
overviews the proposed quantum computing framework
and describes how to configure and analyze a quantum
computing with the proposed framework. Section IV de-
scribes each layer of the proposed framework in detail.
Section V describes the performance metric we evaluated
in this work, and the analysis results will be shown in sec-
tion VI. Section VII discusses, by exploiting the proposed
framework, what happens in a quantum computing if an
individual component is improved. This paper concludes
with discussions in section VIII.
II. RELATED WORKS
We review several quantum computing frameworks dis-
cussing the performance and the quantum resource of a
quantum computing. Table I briefly summarizes the re-
lated works.
Quipper [12–14, 37] and ScaffCC [16, 18, 19] are frameworks
for quantum compile and resource estimation of
a quantum algorithm. They compile a programmed quantum
algorithm into a sequence of quantum instructions, each
consisting of a quantum gate and its target qubit(s). From
the compile results, they statistically analyze the quantum
resource, such as the quantities of gates and qubits.
All the data come from the quantum algorithm only. We
believe that, without considering the physical implementation
of the algorithm, the analysis results cannot serve as a
reference for the feasibility of quantum computing.
Fowler et al. [8] approximated the quantum computer
size and the execution time of Shor algorithm (N = 2000)
with surface code quantum computing. However, their
analysis is based only on the dominant part of the algorithm,
the modular exponentiation circuit. The execution time
therefore depends completely on the quantity of Toffoli
gates in the circuit, the decomposition of the Toffoli gates
and the execution time of the measurement gate. The
size of the quantum computer was also calculated from
the quantity of magic states needed to implement logical T
gates and the quantity of logical (algorithm) qubits. Jones et
al. [21] also estimated the performance and resource of a
quantum algorithm within their layered architecture for
quantum computing, but their analysis methods are
almost the same as those of Fowler et al. [8].
On the other hand, Reiher et al. [34] applied a quantum
compile and surface code error correction to estimate
the performance and the resource for the quantum simulation
of a complex chemical system. Since the authors did not
consider any quantum computer system architecture or
the system integration on such an architecture, their
analysis amounts to a statistical calculation based on the
data from the compiled algorithm and the fault-tolerant scheme.
Note that the authors claimed that, despite the overhead of
quantum error correction and compile, quantum computations
can be performed within a reasonable time.
QuRE [41, 42], a toolbox for estimating quantum resource,
considers most quantum computing components, such as
quantum compile, quantum error correction, physical qubit
layout and quantum computing device. By applying a quantum
compile, its authors prepared all quantum gates of a quantum
algorithm. Furthermore, they covered both block-type quantum
codes (Steane code and Bacon-Shor code) and the surface
code, and employed a 2-dimensional qubit layout for their
logical qubits. In this regard, this toolbox performs the
analysis based on more quantum computing components
than earlier works. However, its analysis method is still
a statistical investigation. For example, its estimate
of the execution time of a quantum computing simply
depends on the quantity of quantum gates and their
running times,
$$\sum_{g} \frac{1}{g_{\mathrm{parallel}}} \times \#g \times g_{T},$$
where $\#g$ is the quantity of gates $g$, $g_{\mathrm{parallel}}$
denotes the quantity of gates $g$ executed maximally in
parallel, and $g_{T}$ is the execution time of the
gate. As mentioned before, it is difficult to say that such
analysis results coincide with the practical situation.
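The statistical estimate above amounts to a few lines of arithmetic. The following Python sketch is only an illustration of that formula, not QuRE's actual implementation; the function name and all gate counts, parallelism factors and gate times are hypothetical values chosen to show the calculation.

```python
# Statistical execution-time estimate: sum over gate types g of
#   (#g / g_parallel) * g_T
# All numbers below are hypothetical and serve only as an illustration.

def statistical_execution_time(gate_stats):
    """gate_stats maps a gate name to (count, parallelism, gate_time)."""
    total = 0.0
    for count, parallelism, gate_time in gate_stats.values():
        total += count / parallelism * gate_time
    return total


example_stats = {
    # gate: (#g, g_parallel, g_T in microseconds)
    "H": (1000, 10, 1.0),
    "T": (4000, 4, 10.0),
    "CNOT": (2000, 5, 2.0),
}

print(statistical_execution_time(example_stats))
```

Because such an estimate depends only on per-gate counts and times, it has no way to account for qubit movements or path conflicts on a concrete architecture.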
III. CONFIGURATION OF QUANTUM
COMPUTING
We describe how to configure a quantum computing
with the proposed quantum computing framework. We
first overview the structure and functionality of our quan-
tum computing framework, and second describe how to
configure a quantum computing. We also describe which
schemes and protocols are currently supported by the
framework.
TABLE I. Summary of the related works. Note that “△” indicates that the corresponding component is partially applied. Namely,
as mentioned before, Refs. [8] and [21] perform the performance analysis based on only the dominant part of a quantum
algorithm, not covering all quantum gates. Besides, the surface code requires that physical qubits be arranged on a
2-dimensional nearest-neighbor layout. Therefore, the related works applying surface code error correction implicitly consider
a 2D layout of physical qubits even though they do not mention the qubit layout explicitly.

| Work | Compile | FTQC | Micro-Architecture (Qubit Layout) | System Synthesis | Analysis Criterion |
|------|---------|------|-----------------------------------|------------------|--------------------|
| Quipper [13, 14, 37], ScaffCC [16, 18, 19] | ○ | X | X | X | One Time Computing |
| Fowler et al. [8], Jones et al. [21] | △ | Surface | △ | X | One Time Computing |
| Reiher et al. [34] | ○ | Surface | △ | X | One Time Computing |
| QuRE [41, 42] | ○ | Steane, Bacon-Shor, Surface | Layout of Physical Qubits | X | One Time Computing |
| Present Work | ○ | Steane, Surface | Layouts of Physical and Logical Qubits, Communication Bus, Computing Regions | ○ | Fidelity 100% |
A. Overview of Framework
We describe the structure and functionality of the
proposed quantum computing framework. It deals with
quantum computing programming, quantum compile,
quantum computer architecture, fault-tolerant quantum
computing scheme and quantum computing device. To
this end, the proposed framework is composed of three
functional layers: compile, system and building block.
Each layer has a well-defined role and provides several
options for carrying out its function. All layers are closely
related to each other. FIG. 1 shows the data flow through the
framework for performance analysis and for quantum
computing.
The compile layer covers programming and compiling
a quantum algorithm. A quantum algorithm is written
in a high-level programming language and then compiled
into a sequence of quantum gates, called a quantum
assembly code. To compile a programmed quantum algorithm,
the target quantum gates and the compile algorithm have
to be determined beforehand. FIG. 2 shows the input
and output of the compile layer. In the present work, we
employ the open-source quantum computing compiler
ScaffCC [16, 18, 19], which supports the programming language
Scaffold [17]. The details of the quantum compile, the
quantum gates and the quantum assembly code are described in
section IVA, where we also discuss why we use ScaffCC
(Scaffold) in this work.
The system layer synthesizes a quantum computing by
integrating the quantum algorithm, the quantum computer
architecture and the building blocks (qubits/gates and quantum
computing device). It deals with everything required to
run a quantum algorithm on a quantum computer system
architecture. For example, it recasts a quantum algorithm
for a target quantum computer architecture via
system synthesis (also called system mapping). For the
system synthesis, this layer first takes a quantum computing
system architecture. The qubit connectivity must
be clearly defined there, and a communication bus
may be included for efficient interaction between distant
qubits. Issues related to the quantum computer architecture
depend on the chosen quantum error-correcting code
and fault-tolerant quantum computing scheme. FIG. 3
describes the input and output of the system layer. In
terms of the performance analysis, the output of the system
layer is the performance analysis result, but to run a
quantum algorithm this layer generates an architecture-specific
description of the quantum algorithm, as shown in
FIG. 1.

[FIG. 1: block diagrams. Panel (a): Quantum Algorithm, Compile, System, Building Block, Physical Device, Performance Result. Panel (b): Quantum Algorithm, Compile, System, Building Block, Device Signal, Physical Device.]
FIG. 1. Data flow over the layers. (a) For the performance
analysis, the compile layer and the building block layer
provide a quantum assembly code and the performance of logical
operations, respectively, to the system layer. The system layer
then performs the performance analysis during the system mapping.
(b) The platform can also be used to perform a practical quantum
computing. In that case, all data flows sequentially from
the top layer to the bottom layer, and the bottom layer has to
generate the control signals for the physical device.
The building block layer is associated with the qubits and
gates of a quantum algorithm. This work basically assumes
fault-tolerant quantum computing, and therefore such
qubits and gates are logical qubits and gates
encoded in a quantum error-correcting code. In this
regard, the main function of this layer is to assemble
physical qubits and gates to implement logical qubits
and gates. For that, a quantum error-correcting code
should be determined first, and then the logical qubits and
gates are implemented according to a fault-tolerant
quantum computing scheme for the chosen quantum code.
During the implementation, the layer derives the
performance of the logical qubits and gates from the properties
of the physical operations, namely time and fidelity (or error rate).
FIG. 4 shows the input and output of the building block
layer. In terms of the performance analysis, the output
of the building block layer is the performance of the logical
quantum operations, namely time and fidelity. Note that the
proposed platform supports the [[7, 1, 3]] Steane code and a
surface code.

[FIG. 2: block diagram. Inputs: (Programmed) Quantum Algorithm, Target Gates, Compile Algorithm. Block: Compile Layer. Output: Quantum Assembly Code.]
FIG. 2. The input and output of the compile layer. Through
a compile, a programmed quantum algorithm is decomposed
into a quantum assembly code. For the compile, a compile
algorithm and target gates have to be determined beforehand.
The target gates can vary according to the quantum computing
type, such as fault-tolerant quantum computing or
non-fault-tolerant (physical) computing. The selection of target
gates can also be influenced by the qubit technology.

[FIG. 3: block diagram. Inputs: Quantum Assembly Code, Logical Gates, System Mapping Algorithm, Quantum Computer Architecture. Block: System Layer. Output: Performance Result.]
FIG. 3. The input and output of the system layer. The quantum
assembly code and the logical gates are passed from the compile
layer and the building block layer, respectively. The quantum
computer architecture describes the layout of logical/physical
qubits and a communication bus between qubits. The system
mapping algorithm describes how to integrate a quantum
algorithm (quantum assembly code) and a quantum computer
architecture. It includes the qubit placement, the gate
scheduling and so on.
[FIG. 4: block diagram. Inputs: Device Properties, Quantum Code, FTQC Scheme. Block: Building Block Layer. Output: Logical Gates.]
FIG. 4. The input and output in the building block layer. To
make logical qubits/gates, a quantum error-correcting code
should be determined first. Besides, to derive the performance
of logical operations, the properties of physical device, time
and fidelity, have to be considered. Then, based on the physi-
cal device property and the fault-tolerant quantum computing
scheme, logical operations with a specific performance will be
generated.
B. Configuration of Quantum Computing
We describe how to configure a quantum computing
with the proposed framework. As mentioned above, it is
possible to configure a quantum computing by selectively
choosing specific protocols or the properties of physical
device. For example, you can configure a Steane code
based fault-tolerant quantum computing to run Shor al-
gorithm. For that, through compile you generate a quan-
tum assembly code. Logical building blocks are built
up based on the chosen quantum error-correcting code,
quantum device and the size of the quantum algorithm.
The next task is to determine a quantum computer archi-
tecture of a certain physical and/or logical qubit layout.
The architecture depends on the chosen fault-tolerant
quantum computing scheme. Note that it is also possible
to choose a physical non-fault-tolerant quantum comput-
ing by assuming high-fidelity quantum computing device
(see Section VIIB).
In Table II, we show the options our framework currently
supports. In the table, the compile type and the
mapping type (with the qubit layout) depend completely on
the type of the quantum assembly code, which is either a
structured code or a non-structured code. A structured quantum
assembly code is processed by a structured system mapping
for a structured quantum computer architecture. Similarly,
a non-structured quantum assembly code is processed
by a non-structured system mapping based on a simple
qubit layout such as a regular 1- or 2-dimensional lattice.
The details of the structured and non-structured quantum
assembly codes will be described in Section IVA. In
Sections VI and VII, we configure several quantum computing
instances with these options and analyze their performance
and quantum resource.
TABLE II. List of protocols and layouts currently supported by the framework. In the compile layer, we choose a compile
type and a target gate set. The compile type is the format of the quantum assembly code, and it is closely related
to the mapping type and the qubit layout. A specific qubit layout can be chosen from the listed selection. In the building block layer,
the scheme for fault-tolerant quantum computing is determined. According to the scheme, the protocols of the logical operations
are fixed.

| Layer | Options | Values |
|-------|---------|--------|
| Compile | Compile Type | Structured code, Non-structured code |
| Compile | Target Gate Set | {X, Z, H, S(S†), T(T†), RZ(θ), CNOT}, {X, Z, H, S(S†), T(T†), CNOT} |
| System | Mapping Type | Structured, Non-structured |
| System | Qubit Layout | (Non-structured) All-to-All, 1D, 2D, Arbitrary; (Structured) All-to-All, (1D, 1D), (1D, 2D), (2D, 2D) |
| Building Block | FTQC Scheme | Steane code, Surface code |
| Building Block | Device | Time, Fidelity |

With the proposed framework, a quantum computing
can be configured as follows. The first step is
to determine the type of quantum computing:
fault-tolerant quantum computing or non-fault-tolerant
physical quantum computing. In general, this choice
depends on the size of the quantum algorithm
and the assumed reliability of the physical device.
A larger quantum algorithm requires more reliable
qubits and gates. If you choose fault-tolerant quantum
computing, you need to choose a quantum error-correcting
code. Our platform currently supports the [[7, 1, 3]] Steane
code and a surface code. According to the quantum
algorithm and the reliability (error rate) of the physical
device, the concatenation level for the Steane code or the code
distance for the surface code is then determined.
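As a rough illustration of how a concatenation level could follow from the device error rate and the algorithm size, the sketch below uses the textbook scaling for concatenated codes, in which the logical error rate at level L is approximately p_th (p / p_th)^(2^L). This is not the selection procedure of the framework; the function names, the threshold value and the target error rate are hypothetical.

```python
# Textbook scaling for a concatenated code (an assumption of this sketch,
# not a result taken from the framework):
#   p_L ~ p_th * (p / p_th) ** (2 ** L)

def logical_error_rate(p, p_th, level):
    """Estimated logical error rate at a given concatenation level."""
    return p_th * (p / p_th) ** (2 ** level)


def min_concatenation_level(p, p_th, target, max_level=10):
    """Smallest level whose estimated logical error rate is at most target."""
    if p >= p_th:
        raise ValueError("physical error rate must be below the threshold")
    for level in range(max_level + 1):
        if logical_error_rate(p, p_th, level) <= target:
            return level
    raise ValueError("target not reachable within max_level")


# Hypothetical device (p = 1e-5), threshold (1e-4) and target (1e-15).
print(min_concatenation_level(1e-5, 1e-4, 1e-15))
```

A larger algorithm implies a smaller target error rate per operation and hence a higher level, which is the dependence the paragraph above describes.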
The choice of the quantum computing type affects the
subsequent quantum compile step. More precisely, the target
quantum gates for a compile depend completely on the
chosen quantum computing type. For example, the RZ(θ) gate
for a rotational angle θ is widely used in quantum
algorithms, but its logical version cannot generally be
implemented in a fault-tolerant manner. Therefore, to realize
a fault-tolerant quantum computing, you have to decompose
such an RZ(θ) gate into a sequence of H, S and T
gates, which are fault-tolerantly implementable. On the
other hand, since the physical RZ(θ) gate can be easily
performed on physical qubits, quantum gates including
the rotational gate pose no problem for a
non-fault-tolerant physical quantum computing.
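For a few special angles no approximation is needed: with the common convention RZ(θ) = diag(e^{-iθ/2}, e^{iθ/2}), the gates S and T equal RZ(π/2) and RZ(π/4) up to a global phase. The short check below verifies this numerically; the helper names are hypothetical, and a generic angle still requires an approximate H, S, T sequence from a synthesis tool.

```python
import cmath
import math


def rz(theta):
    """Diagonal entries of RZ(theta) = diag(exp(-i*theta/2), exp(+i*theta/2))."""
    return (cmath.exp(-1j * theta / 2), cmath.exp(1j * theta / 2))


def equal_up_to_phase(d1, d2, tol=1e-12):
    """Compare two diagonal one-qubit gates up to a global phase."""
    phase = d2[0] / d1[0]
    return all(abs(a * phase - b) < tol for a, b in zip(d1, d2))


S = (1, 1j)                            # S = diag(1, i)
T = (1, cmath.exp(1j * math.pi / 4))   # T = diag(1, exp(i*pi/4))

print(equal_up_to_phase(rz(math.pi / 2), S))  # RZ(pi/2) versus S
print(equal_up_to_phase(rz(math.pi / 4), T))  # RZ(pi/4) versus T
print(equal_up_to_phase(rz(0.3), T))          # a generic angle is not T
```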
The second step is to write a quantum computing program
and compile it into a quantum assembly code. As
mentioned above, the compile has to use the
target quantum gates determined in the previous step.
The proposed framework uses the open-source programming
language Scaffold and the compiler ScaffCC. You can
see how to write a Scaffold program in Ref. [17] and how
to use the ScaffCC compiler in Refs. [16, 19]. In section IVA,
we show a simple example of a Scaffold program and the
associated quantum assembly code. Note that ScaffCC
decomposes an arbitrary one-qubit gate into a sequence
of H, S and T gates by exploiting gridsynth [35] or sqct [24].
The third step is to choose a quantum computing
architecture, in particular a qubit array. The proposed
platform accepts not only a simple regular 1- or 2-dimensional
lattice, but also a hierarchically structured qubit array. A
communication bus should also be considered for
efficient interaction between distant qubits. FIG. 11 shows an
example of a hierarchically structured quantum computing
architecture. In the case of Steane code quantum computing,
the structure of the qubit array seriously affects the
quantum computing because qubit interaction is limited to
local neighbors. We will show in section VIC that this
limitation causes a highly nontrivial temporal overhead. On the other
hand, the surface code quantum computing scheme is
fundamentally built on local qubit interaction
on a 2-dimensional lattice. FIG. 14 shows a quantum
computing architecture for surface code quantum
computing.
Once the configuration is complete, we can perform the
system synthesis. During the system synthesis, the quantum
algorithm is reformulated for the target quantum
computer architecture. As will be discussed later, the
system synthesis yields the expected performance (circuit
depth, execution time, fidelity, KQ, and so on) and the
required quantum resource (qubits and gates) of the
quantum computing.
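The configuration steps above can be summarized as a small data structure. The sketch below is only an illustration of the options listed in Table II; the class, field and function names are hypothetical and do not describe the framework's actual interface.

```python
from dataclasses import dataclass
from typing import List, Optional

# Option values transcribed from Table II; all names are hypothetical.
# "S" and "T" stand for S(S-dagger) and T(T-dagger).
FT_GATES = {"X", "Z", "H", "S", "T", "CNOT"}
PHYSICAL_GATES = FT_GATES | {"RZ"}
LAYOUTS = {
    "non-structured": {"All-to-All", "1D", "2D", "Arbitrary"},
    "structured": {"All-to-All", "(1D, 1D)", "(1D, 2D)", "(2D, 2D)"},
}


@dataclass
class QuantumComputingConfig:
    fault_tolerant: bool          # step 1: type of quantum computing
    ftqc_scheme: Optional[str]    # "steane" or "surface" if fault tolerant
    compile_type: str             # step 2: "structured" or "non-structured"
    mapping_type: str             # must match the compile type
    qubit_layout: str             # step 3: layout allowed for that type

    def target_gates(self):
        """Fault-tolerant computing excludes RZ(theta) from the target gates."""
        return FT_GATES if self.fault_tolerant else PHYSICAL_GATES

    def problems(self) -> List[str]:
        """Return a list of inconsistencies (empty if the config is valid)."""
        issues = []
        if self.fault_tolerant and self.ftqc_scheme not in {"steane", "surface"}:
            issues.append("fault-tolerant computing needs a Steane or surface code")
        if self.compile_type not in LAYOUTS:
            issues.append("unknown compile type")
        elif self.mapping_type != self.compile_type:
            issues.append("mapping type must match the compile type")
        elif self.qubit_layout not in LAYOUTS[self.compile_type]:
            issues.append("qubit layout not available for this compile type")
        return issues


config = QuantumComputingConfig(True, "steane", "structured", "structured", "(1D, 2D)")
print(config.problems(), sorted(config.target_gates()))
```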
C. Analysis of Quantum Computing
We briefly mention which performance figures are evaluated by
the framework; the detailed analysis method will be
described in section V. The framework first inspects the
quantities of qubits and gates. Those figures are usually
obtained from a quantum compile without considering a
quantum computer architecture [19, 37]. Our framework,
in contrast, examines such quantities with consideration of
the quantum computer system architecture. In the system
synthesis, the fault-tolerant quantum computing scheme, in
particular the magic state factory, is taken into consideration.
It is therefore possible to estimate the temporal
and spatial overhead caused by factors that are hidden
in a quantum algorithm, and we believe our estimate
nearly coincides with the resource needed to perform a
real quantum computing.
The framework examines the expected quantum computing
execution time (as well as circuit depth, fidelity, KQ and
so on) of a quantum algorithm based on the quantum
computer architecture, the fault-tolerant protocol and the physical
device. By applying the properties of the physical device
and the fault-tolerant protocol, we deduce the performance
of the logical gates and the quantum error correction. Then,
by conducting a system integration with this performance,
we obtain the single-round execution time of the quantum
algorithm. Our framework goes further for a more
detailed analysis. During the system integration, the
fidelity of the quantum computing can also
be calculated. By reflecting the fidelity, it is possible to
estimate the execution time of a quantum computing
that achieves 100% fidelity. In doing so, we can fairly compare
fault-tolerant quantum computing and non-fault-tolerant
quantum computing. The quantity of qubits is also based
on this performance criterion.
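One simple way to read the 100% fidelity criterion is sketched below. It assumes that a single round succeeds with probability equal to its fidelity F, so that on average 1/F rounds are needed; this is an assumption made here for illustration (the method actually used is the one described in section V), and the function name and numbers are hypothetical.

```python
def time_for_full_fidelity(single_round_time, fidelity):
    """Expected total time if a round succeeds with probability `fidelity`.

    Assumption of this sketch: independent repetitions, so the expected
    number of rounds is 1 / fidelity.
    """
    if not 0.0 < fidelity <= 1.0:
        raise ValueError("fidelity must be in (0, 1]")
    return single_round_time / fidelity


# Hypothetical comparison: a slow but reliable fault-tolerant run versus
# a fast but unreliable physical run (times in arbitrary units).
fault_tolerant = time_for_full_fidelity(100.0, 0.99)
physical = time_for_full_fidelity(1.0, 0.001)
print(fault_tolerant < physical)
```

Under this reading, a configuration with a short single-round time can still be slower overall once its low fidelity is taken into account, which is what makes the comparison between fault-tolerant and non-fault-tolerant computing fair.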
Besides the above, the framework can generate
various performance data. Based on the data, we
can estimate the temporal and spatial overhead of a
quantum computing. For example, as mentioned above,
the limited local interaction between nearest-neighbor qubits
sometimes requires additional qubit movements to perform
a 2-qubit CNOT gate. The quantity of such qubit
movements corresponds to the temporal overhead. The
platform evaluates this temporal overhead as the ratio of
SWAP gates to total quantum gates. As will be shown
in section VIC, a quantum computing requires a highly
nontrivial temporal overhead.
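The SWAP ratio can be illustrated with a deliberately simplified model. The sketch below assumes a 2-dimensional nearest-neighbor grid, ignores path conflicts, and assumes each moved qubit returns to its original position afterwards without counting the return trip; the function names and the placement are hypothetical, and the framework's actual system mapping is considerably more detailed.

```python
def swaps_to_adjacency(a, b):
    """SWAPs needed to bring grid position a next to b (Manhattan distance - 1)."""
    distance = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return max(distance - 1, 0)


def swap_overhead(cnots, placement, other_gates=0):
    """Ratio of SWAP gates to all gates (SWAPs + CNOTs + other gates)."""
    swaps = sum(swaps_to_adjacency(placement[c], placement[t]) for c, t in cnots)
    total = other_gates + len(cnots) + swaps
    return swaps / total if total else 0.0


# Hypothetical placement of three qubits on a grid.
placement = {"q0": (0, 0), "q1": (0, 1), "q2": (2, 2)}
cnots = [("q0", "q1"), ("q0", "q2")]
print(swap_overhead(cnots, placement, other_gates=2))
```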
IV. PROPOSED QUANTUM COMPUTING
FRAMEWORK
A. Compile Layer
A quantum compile is a process that decomposes a
quantum algorithm into a sequence of quantum gates.
Here the quantum algorithm is given as a complete program
written in a high-level abstract programming language.
Recently, several research groups have developed
programming environments for quantum computing by
modifying conventional classical programming languages
such as Python and C/C++ [12, 14, 17, 28, 40, 50].
It is well known that an arbitrary quantum algorithm
can be decomposed into a combination of 1-qubit rotational
gates and the 2-qubit CNOT gate [31]. The target
quantum gates for a compile can vary according to the
situation. For example, the set of H, T, and CNOT
is the de facto standard for universal fault-tolerant quantum
computing. However, to reduce the complexity of the compile
or to give more flexibility to the programmer, one
usually adds more quantum gates to the target gates.
Furthermore, the physically implementable quantum gates
differ slightly depending on the specific quantum mechanical
system [27, 31]. In this work, we utilize two sets of quantum
gates, {X, Z, H, S(S†), T(T†), RZ(θ), CNOT} and
{X, Z, H, S(S†), T(T†), CNOT}. The difference between
the two is the RZ(θ) gate. As mentioned before, since the
rotational gate for an arbitrary rotational angle θ cannot be
implemented in a fault-tolerant manner, the first set
is used for physical quantum computing and the
second for fault-tolerant quantum computing.
The output of the quantum compile, the sequence of
quantum gates, is called a quantum assembly code. A
quantum assembly code is a list of quantum instructions,
each of which is a combination of a quantum gate and its
target qubit(s). It is an intermediate representation of a
quantum algorithm between a mathematical description
and a physical machine instruction description [5, 19, 43].
There is no standard for quantum assembly code,
and therefore its specific representation and structure
differ slightly across the literature. For example,
a quantum instruction that applies a Hadamard gate
to a qubit q is represented as “hadamard q” or “H(q)”.
In addition, some quantum assembly codes have their own
special structure. For example, Open QASM by IBM [5]
provides the conditional statement “if... then... else” that is
usually supported by conventional programming languages.
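To make the notion of a quantum instruction concrete, the following sketch parses the two spellings mentioned above, “H(q)” and “hadamard q”, into a gate name and a list of target qubits. It is a toy parser written for illustration only; it does not implement any particular quantum assembly standard, and the function name is hypothetical.

```python
import re

# Matches the call-style spelling, e.g. "H(q)" or "CNOT ( a , b )".
_CALL = re.compile(r"^(\w+)\s*\((.*)\)$")


def parse_instruction(line):
    """Split one instruction into (gate, [qubits]) for two toy spellings."""
    text = line.strip().rstrip(";").strip()
    match = _CALL.match(text)
    if match:
        gate, args = match.groups()
    else:
        # Space-separated spelling, e.g. "hadamard q".
        gate, _, args = text.partition(" ")
    if not gate:
        raise ValueError("empty instruction")
    qubits = [q.strip() for q in args.split(",") if q.strip()]
    return gate, qubits


print(parse_instruction("H(q)"))
print(parse_instruction("hadamard q"))
print(parse_instruction("CNOT ( scratch[2] , scratch[1] );"))
```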
In the present work, we use the open-source quantum
compiler ScaffCC [16, 19]. It compiles a quantum computing
program written in the quantum computing programming
language Scaffold [17]. A Scaffold program consists of
one main module and multiple sub-modules (see FIG. 7).
A module resembles a function or a procedure
of conventional programming languages such as C/C++,
Python and so on. A module is composed of instructions
that call quantum gates and/or other modules. The
execution of a Scaffold program begins with the first
instruction of the main module and terminates after
the last instruction of the same module. During the
execution of the main module, other sub-modules may
be called.
Previously, we mentioned that some quantum assembly
codes have unique structural features. So does
the output of ScaffCC. The compiler generates a
hierarchically structured quantum assembly code, in which
a quantum algorithm is composed of multiple modules.
A module consists of quantum gates and/or calls to
other modules. In the previous paragraph, we also
mentioned that a Scaffold program consists of modules. To
avoid any ambiguity, we need to distinguish the two. A
module in a Scaffold program is completely defined and
written by a programmer, and through a compile it is
converted to a module in a quantum assembly code.
The two therefore correspond to each other. FIG. 5 shows
an example of a module in a quantum assembly code.
We now explain why we use ScaffCC in this work.
As mentioned above, a quantum assembly code is a list
of quantum instructions. As the size of a quantum algorithm
increases, the size of its quantum assembly code increases
nontrivially, following the complexity
of the quantum algorithm. Obviously, a quantum algorithm
of practically meaningful size is beyond the capability
of conventional supercomputing. In other words, the
quantum assembly code for the algorithms we are interested in
will be huge, and this poor scalability causes a practical
problem in the classical control of quantum computing.
For example, the size of the quantum assembly code
of Shor algorithm for factoring a 512-bit integer is around
39 TB (see FIG. 6). We therefore had trouble
generating and managing such a huge code with a classical
computing system. We could not even attempt to
generate the code for a larger quantum algorithm
because of the lack of classical storage and
memory.

// module name: module_name
// parameter qubits: a, b, ..
module module_name ( qbit a , qbit* b, …)
{
    // local qubits
    qbit scratch[n];

    // classical bit
    cbit answer[m];

    // initializing a local qubit as |0>
    PrepZ ( scratch[0] , 0 );
    …
    // one-qubit gate
    H ( scratch[0] );

    // two-qubit gate
    CNOT ( scratch[2] , scratch[1] );

    // sub-module
    moduleA ( scratch[1] , scratch[2] , a[2] );

    // measure
    answer[0] = MeasZ ( a[0] );
}
FIG. 5. An example of a module. Parameter qubits passed
from external modules are clearly specified at the beginning.
A module is defined by the preparation of local qubits and
classical bits, calling quantum gates and other sub-modules,
and measurement of local qubits.
On the other hand, the hierarchy provided by ScaffCC can
mitigate this scalability problem. For example, to perform a
composite quantum operation of N gates M times, a normal
(non-structured) quantum assembly code requires MN quantum
instructions (#gates × #iteration). However, the hierarchical
assembly code requires only M + N instructions (#gates +
#iteration) when the operation is defined as a module. As a
result, the hierarchical quantum assembly code is much smaller
than the non-structured code, as shown in FIG. 6. To the best
of our knowledge, only ScaffCC supports such hierarchically
structured quantum assembly code. This is the main reason
why we exploit ScaffCC in the proposed platform.
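The difference between the two counts can be illustrated with a short Python sketch. It is not part of ScaffCC; the function names `flat_size` and `modular_size` are hypothetical, and the model is the simplified one of the paragraph above (a module body of N gates that is called M times).

```python
def flat_size(n_gates: int, n_iterations: int) -> int:
    """Instruction count when every call is expanded in place (M * N)."""
    return n_gates * n_iterations


def modular_size(n_gates: int, n_iterations: int) -> int:
    """Instruction count when the operation is defined once as a module:
    N instructions for the body plus M call instructions (M + N)."""
    return n_gates + n_iterations


if __name__ == "__main__":
    for n, m in [(10, 10), (1_000, 1_000), (10**6, 10**6)]:
        print(f"N={n}, M={m}: flat={flat_size(n, m)}, modular={modular_size(n, m)}")
```

In this simplified model the gap grows as the product against the sum of the two numbers, which is in line with the TB versus MB sizes reported in FIG. 6.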
To conclude this section, we show a simple example of a
Scaffold program to make a 5-qubit CAT state
$\frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes 5} + |1\rangle^{\otimes 5}\right)$
and a corresponding quantum assembly code. Readers can see how
to make a Scaffold program in Ref. [17] and how to use ScaffCC
compiler in Ref. [16]. FIG. 7 shows a quantum circuit to
implement a 5-qubit CAT state and a corresponding Scaffold
program. A structured and a non-structured quantum assembly
code are respectively shown in FIG. 8.
[Figure: bar chart "QASM Size: Non-Modular QASM and Modular QASM". Vertical axis: QASM size (bytes, logarithmic scale from 10^8 to 10^14). Horizontal axis: Shor algorithm inputs N = 128, 256 and 512. Non-modular QASM: 1.7 TB, 14.2 TB and 39.0 TB. Modular QASM: 23.5 MB, 88.1 MB and 338.6 MB.]
FIG. 6. The size of the quantum assembly codes for Shor
algorithm with n = 128, 256, and 512. Both codes are generated
by ScaffCC. As the input increases, the size of the quantum
assembly code in the non-structured format grows to dozens of TB.
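[Circuit: five qubits initialized to |0〉; a Hadamard gate on the first qubit, followed by a chain of CNOT gates between successive qubits.]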
(a)
#define size 5 
module MakeCAT(qbit* data) 
{ 
    H(data[0]); 
    for(int i=0; i<size-1; i++) 
    { 
        CNOT(data[i], data[i+1]); 
    } 
} 
int main() 
{ 
    qbit data[size]; 
    for(int i=0;i<size; i++) 
    { 
        PrepZ(data[i], 0); 
    } 
    MakeCAT(data); 
    return 0; 
}
(b)
FIG. 7. An example of a Scaffold program to implement a
5-qubit CAT state. (a) A quantum circuit and (b) its Scaffold
program. The module MakeCAT makes the CAT state.
B. System Layer
A quantum algorithm (quantum assembly code) is a logic of how
to solve a given problem. It is based on an ideal physical
situation where noiseless physical gates and arbitrarily
long-range qubit interactions are allowed. In other words,
a quantum algorithm is developed without considering
any physical implementation.
On the other hand, a quantum computer on which a quantum
algorithm is executed has a certain logical and physical
architecture such as a qubit layout. Therefore, to run a
quantum algorithm on such a quantum computer, we need to
reformulate the algorithm to be compatible with the given
quantum computer architecture. For example, real quantum
computing devices in IBM Quantum Experience [15] have a very
limited qubit layout and allow only one-directional CNOT gates.
Hence, the quantum assembly codes shown in FIG. 8 cannot be
directly executed on the IBM QX4 device (see FIG. 9 (a)) because
the codes include CNOT gates that the device does not allow.
Therefore,
module MakeCAT ( qbit* data )  
{ 
    H ( data[0] ); 
    CNOT ( data[0] , data[1] ); 
    CNOT ( data[1] , data[2] ); 
    CNOT ( data[2] , data[3] ); 
    CNOT ( data[3] , data[4] ); 
} 
module main (  )  
{ 
    qbit data[5]; 
    PrepZ ( data[0] , 0 ); 
    PrepZ ( data[1] , 0 ); 
    PrepZ ( data[2] , 0 ); 
    PrepZ ( data[3] , 0 ); 
    PrepZ ( data[4] , 0 ); 
    MakeCAT ( data ); 
}
(a)
qubit data0 
qubit data1 
qubit data2 
qubit data3 
qubit data4 
PrepZ data0 
PrepZ data1 
PrepZ data2 
PrepZ data3 
PrepZ data4 
H data0 
CNOT data0,data1 
CNOT data1,data2 
CNOT data2,data3 
CNOT data3,data4
(b)
FIG. 8. Quantum assembly codes to generate a 5-qubit CAT
state. (a) A structured code and (b) a non-structured code.
[Diagram: five qubit nodes labelled 0 to 4, connected by directed edges.]
(a)
qubit data0 
qubit data1 
qubit data2 
qubit data3 
qubit data4
PrepZ data0 
PrepZ data1 
PrepZ data2 
PrepZ data3
PrepZ data4 
H data1
CNOT data1,data0
H data0
H data2
CNOT data2,data1
H data1
H data3
CNOT data3,data2
H data2
H data3
CNOT data3,data4
qubit data0 
qubit data1 
qubit data2 
qubit data3 
qubit data4 
PrepZ data0 
PrepZ data1 
PrepZ data2 
PrepZ data3 
PrepZ data4 
H data0 
H data0 
H data1 
CNOT data1,data0
H data0 
H data1 
H data1 
H data2 
CNOT data2,data1 
H data1 
H data2 
H data2 
H data3 
CNOT data3,data2 
H data2 
H data3 
CNOT data3,data4
(b)
FIG. 9. (a) The qubit layout of IBM QX4 device [15]. A node
indicates a qubit, and a directed edge implies that a CNOT
gate can be applied, where the control qubit and the target
qubit are the tail and the head of the arrow. Therefore, no
bi-directional CNOT is allowed on the IBM QX4. (b) The recast
assembly code from FIG. 8 (b). Since the instruction “CNOT
data0,data1” is not allowed directly on the IBM QX4, Hadamard
gates, “H data0” and “H data1”, have to be added before and
after the reversed instruction “CNOT data1,data0”. Note that
the node index k indicates the qubit datak. We have not
cancelled out repeated Hadamard gates. By cancelling out those
gates, the circuit depth can be reduced from 12 to 9.
for the execution, we have to recast a quantum assembly
code for IBM QX4. In FIG. 9 (b), we show the recast
(non-structured) quantum assembly code for the device.
This is the motivation of a quantum computing system
mapping.
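The recasting step can be sketched in a few lines of Python. This is an illustration only, not the implementation of the proposed framework: the names `ALLOWED`, `recast` and `cancel_hadamards` are hypothetical, and the set of allowed CNOT directions is simply read off the listings of FIG. 9 (b) rather than taken from a device specification. The sketch reverses a disallowed CNOT with four Hadamard gates and then removes Hadamard pairs that act back to back on the same qubit.

```python
# Directed CNOTs (control, target) assumed to be available on the device.
# These four pairs are read off the recast listing in FIG. 9 (b); they are
# an assumption of this sketch, not a full device specification.
ALLOWED = {(1, 0), (2, 1), (3, 2), (3, 4)}


def recast(gates, allowed=ALLOWED):
    """Replace each disallowed CNOT(c, t) by H c, H t, CNOT(t, c), H c, H t."""
    out = []
    for gate in gates:
        if gate[0] == "CNOT" and (gate[1], gate[2]) not in allowed:
            c, t = gate[1], gate[2]
            if (t, c) not in allowed:
                raise ValueError(f"qubits {c} and {t} are not connected")
            out += [("H", c), ("H", t), ("CNOT", t, c), ("H", c), ("H", t)]
        else:
            out.append(gate)
    return out


def cancel_hadamards(gates):
    """Remove pairs of H gates that act back to back on the same qubit."""
    out = []    # surviving gates; cancelled entries are set to None
    hist = {}   # qubit -> indices in `out` of surviving gates touching it
    for gate in gates:
        if gate[0] == "H":
            stack = hist.setdefault(gate[1], [])
            if stack and out[stack[-1]] == gate:
                out[stack.pop()] = None   # H H = I, so drop both gates
                continue
        for q in gate[1:]:
            hist.setdefault(q, []).append(len(out))
        out.append(gate)
    return [g for g in out if g is not None]


# Non-structured CAT-state code of FIG. 8 (b), without the PrepZ lines.
CAT = [("H", 0), ("CNOT", 0, 1), ("CNOT", 1, 2), ("CNOT", 2, 3), ("CNOT", 3, 4)]
RECAST = recast(CAT)                 # 17 gates, Hadamards not cancelled
REDUCED = cancel_hadamards(RECAST)   # 11 gates after cancellation
```

Under these assumptions, `RECAST` and `REDUCED` contain the same gate sequences as the longer and the shorter listing of FIG. 9 (b), respectively.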
The principle of a quantum computing system mapping is very
simple: 1) set up a quantum computer architecture and 2) recast
a quantum assembly code for the architecture. In what follows,
we first describe a quantum computer architecture, and then show
how to realize a quantum algorithm on the target architecture.
1. Quantum Computer Architecture
We discuss a hierarchically structured quantum com-
puter architecture for the proposed framework. In gen-
module PARSENODEEVEN ( qbit* a , qbit* even )  
{ 
    qbit scratch[3]; 
    PrepZ ( scratch[0] , 0 ); 
    PrepZ ( scratch[1] , 0 ); 
    PrepZ ( scratch[2] , 0 ); 
    X ( scratch[2] ); 
    ToffoliImpl ( scratch[1] , scratch[2] , a[2] ); 
    ToffoliImpl ( even[0] , scratch[2] , a[2] ); 
    X ( scratch[2] ); 
    CNOT ( scratch[2] , scratch[1] ); 
    CNOT ( scratch[2] , scratch[0] ); 
    X ( scratch[2] ); 
    ToffoliImpl ( scratch[0] , scratch[2] , a[1] ); 
    X ( scratch[2] ); 
    CNOT ( scratch[2] , scratch[1] ); 
    X ( scratch[2] ); 
    ToffoliImpl ( scratch[1] , scratch[2] , a[2] ); 
    X ( scratch[2] ); 
}
(a)
[Diagram: a computing region whose cells hold a[0], a[1], a[2], a[3], even[0], scratch[0], scratch[1], scratch[2] and NULL.]
(b)
FIG. 10. (a) An example of a module in a quantum assembly
code and (b) the associated computing region on a quantum
computer architecture. In (a), the qubits a and even are
parameter qubits passed from other modules, and the qubit
scratch is a local qubit used only within the module. In (b),
the dark grey cells are for the parameter qubits, the white cells
are for the local qubits and the light grey cells are empty
space or null qubits that perform no operation. While the sizes
of the parameter qubits a and even are not specified in the
module definition, they can be determined by tracing all modules
that call the module.
eral, there is no restriction on the architecture. In other
literature, a regular 1- or 2-dimensional lattice is usually
exploited. In this work, however, we assume that the
architecture is hierarchically structured. A quantum computer
is composed of several computing regions called modules and a
communication bus connecting the modules. By assuming such a
structured architecture, the system mapping with a
hierarchically structured quantum assembly code can be
done efficiently.
A computing region is completely associated with a
module in a quantum assembly code. It consists of mul-
tiple cells for logical (or physical) qubits described in the
module. Some cells are allocated for parameter qubits
passed from other modules, and others are allocated for
local qubits which are temporarily used within a module.
Additional space may be sometimes required to form the
rectangular shape of a module. FIG. 10 shows an ex-
ample of a module in a quantum assembly code and its
associated computing region.
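The allocation of cells in a computing region can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the text: the function name `computing_region` is hypothetical, the region is taken to be square, parameter qubits are placed before local qubits in row-major order, and the side length is the ceiling of the square root of the qubit count so that every qubit fits.

```python
import math


def computing_region(param_qubits, local_qubits):
    """Lay out parameter qubits, then local qubits, on a square grid.

    Unused cells are padded with "NULL" so that the region is rectangular.
    The square shape and the row-major order are assumptions of this sketch.
    """
    cells = list(param_qubits) + list(local_qubits)
    # Smallest side such that side * side >= len(cells), i.e. ceil(sqrt(len)).
    side = math.isqrt(len(cells) - 1) + 1 if cells else 0
    cells += ["NULL"] * (side * side - len(cells))
    return [cells[r * side:(r + 1) * side] for r in range(side)]


# Module PARSENODEEVEN of FIG. 10 (a), assuming that `a` holds four qubits.
REGION = computing_region(
    ["a[0]", "a[1]", "a[2]", "a[3]", "even[0]"],
    ["scratch[0]", "scratch[1]", "scratch[2]"],
)
```

For the eight qubits of this module the sketch yields nine cells, one of which is a NULL cell, which agrees with the cells listed for FIG. 10 (b).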
FIG. 11 shows examples of the above-mentioned quan-
tum computer architecture. The box labelled Mi,j (Mi)
indicates a module (computing region). We call the ar-
rangement of modules a global layout and the arrange-
ment of qubits within a module a local layout. All mod-
ules communicate with each other via a communication
bus. In the figure, the bus is depicted as a white space
outside of modules. We will discuss the bandwidth of the
communication bus in Section V.
A logical qubit that resides inside a module supports universal
quantum operations. The logical qubit is composed of
data qubits for holding data and ancilla qubits for error
correction and logical operations. On the other hand,
qubits for a communication bus only perform error
correction and logical Clifford operations. Therefore, the
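[Diagram: modules M1,1 to Mn,n (the last one being Main) arranged on a 2D grid and separated by a communication bus of a given bandwidth.]
(a)
[Diagram: modules M1 to M6 and Main arranged in a row along a communication bus of a given bandwidth.]
(b)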
FIG. 11. Examples of the proposed quantum computer
architecture. (a) 2D global layout and 2D local layout. (b) 1D
global layout and 2D local layout. Logical qubits of different
colors play different roles in a module: parameter qubits (dark
grey), local qubits (white) and dummy qubits (light grey).
composition of logical qubits for a communication bus
can differ from that for modules according to a quantum
error-correcting code and a fault-tolerant quantum
computing scheme.
2. System Mapping
FIG. 1 shows that all data are collected in the system
layer and the performance of a quantum computing
is evaluated there. In this section, we describe the
system mapping in terms of the gate reformulation for
the target quantum computer structure and the performance
evaluation. The specific mapping process depends on the type
of quantum instruction. Quantum instructions in the
hierarchically structured quantum assembly code are classified
into three types: 1-qubit gate, 2-qubit gate and module.
The set of 1-qubit gates includes X, Z, H, S (S†), T (T†),
RZ(θ), and a preparation and a measurement in the Z basis.
The mapping process for such gates is straightforward and can
be done independently for each qubit. Suppose that a
Hadamard gate is applied to a qubit q. If another quantum
operation on the qubit was scheduled previously, the Hadamard
gate is performed after that operation. If the previous
operation finishes at time t(q), then the Hadamard operation
starts at t(q) and finishes at t(q) + Ht, where Ht
is the execution time of the gate. This is all that is needed
for the mapping of a 1-qubit gate. Note that the execution
time and fidelity of a quantum gate are provided by the
building block layer.
The present work deals with a CNOT gate only for a
2-qubit gate. Even though there are cases where a SWAP
gate is required, a SWAP gate can be implemented
with three CNOT gates. We deal with a CNOT gate as a
local gate acting on two nearest-neighbor qubits. Suppose
that a CNOT gate is applied to qubits qa and qb. To execute
the gate, both qubits have to be temporally and spatially
ready. If they are apart, we have to move the qubits next to
each other via SWAP operations. If one qubit is being
manipulated by another operation, we have to delay the CNOT
operation until both qubits are idle. Then, the
CNOT operation begins at max{t(qa), t(qb)}
and finishes at max{t(qa), t(qb)} + CNOTt. Note
that max{t(qa), t(qb)} is the time at which both qubits become
idle, and CNOTt is the execution time of a CNOT
gate.
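The timing rules for 1- and 2-qubit gates can be summarized in a small Python sketch. It is an illustration only: the class name `Schedule` and the gate durations are hypothetical, and the SWAP operations that bring distant qubits next to each other are deliberately left out, so only the temporal part of the mapping is shown.

```python
from collections import defaultdict


class Schedule:
    """Tracks t(q), the time at which each qubit q becomes idle."""

    def __init__(self, gate_time):
        self.gate_time = gate_time    # e.g. {"H": 1.0, "CNOT": 2.0}, assumed values
        self.t = defaultdict(float)   # every qubit is idle at time 0

    def apply(self, gate, *qubits):
        """Schedule a gate as soon as all of its qubits are idle."""
        start = max(self.t[q] for q in qubits)
        finish = start + self.gate_time[gate]
        for q in qubits:
            self.t[q] = finish
        return start, finish


if __name__ == "__main__":
    s = Schedule({"H": 1.0, "CNOT": 2.0})
    print(s.apply("H", "qa"))            # starts at t(qa)
    print(s.apply("CNOT", "qa", "qb"))   # starts at max{t(qa), t(qb)}
```

A 1-qubit gate is the special case in which the maximum is taken over a single qubit.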
The third type of quantum instruction, a module, looks
like a multi-qubit composite quantum operation. Therefore,
on the surface, the mapping of a module seems
very similar to the mapping of a 2-qubit CNOT
gate. For the mapping of a module, the argument qubits of
the module should be temporally and spatially ready. The
critical difference from the case of a CNOT gate is that
a distinct physical space¹ is allocated for a module.
Therefore, to perform the mapping of a module, we have
to consider qubit movements between the present module
and the target module.
Suppose that a module A is being mapped now, and
we have to treat a quantum instruction “M(qa, qb, qc)”
that calls a module M with argument qubits qa, qb and
qc. Then, we have to move the argument qubits to the
designated area of the module M. The qubit movements
are achieved by SWAP operations through a communication
bus. We call this movement a forward qubit passing.
After the forward qubit passing, the qubits are placed
in the parameter qubit section of the module M (see
FIG. 10 (b)). The quantum instructions of the module
M are then executed. If one of them calls another module,
then some qubits in the module M are passed to the
designated space of the newly called module and manipulated
there by following the quantum instructions of that module.
After executing all quantum instructions of the module M,
the passed qubits have to be returned to the original module A.
We call this returning movement a backward qubit passing.
FIG. 12 shows the module operation including the
forward and backward qubit passings.
1 The physical space for a module is the computing region we de-
scribed before.
[Diagram: Module A and Module B connected through the communication bus, with the steps ① to ⑦ marked along the qubit path.]
FIG. 12. An example of the module operation that consists
of seven steps: 1. (forward) move qubits to the bus, 2. (for-
ward) move to the target module, 3. (forward) move to the
parameter qubit cells (dark grey cells), 4. module operations,
5. (backward) move qubits to the bus, 6. (backward) move to
the original module, and 7. (backward) move to the original
qubit positions.
We perform the mapping for all modules sequentially
as they appear in a quantum assembly code. For that,
we keep two lookup tables, a global lookup table
and a local lookup table. For each module, we first
initialize a local lookup table for all qubits, and update
the manipulation time of each qubit as we process each
quantum instruction. After processing all instructions of
a module, we determine the execution time of the module
by picking the maximum time among all qubits.
The performance of a module is recorded in the global
lookup table. During the mapping of a module, if a module
which was already mapped is called, we can look up
the performance information of the called module in
the global lookup table.
After mapping all modules, we can determine the execution
time of a quantum algorithm as the maximum
time among the qubits in the main module. For example,
in FIG. 8 (a), the execution time of the algorithm
is $PrepZ_t + FP_{main \to MakeCAT} + BP_{main \leftarrow MakeCAT} + MakeCAT_t$,
where $MakeCAT_t = H_t + 4\,CNOT_t$. Note
that $FP_{main \to MakeCAT}$ ($BP_{main \leftarrow MakeCAT}$) is the time
for the forward (backward) qubit passing. The execution
time of the qubit passing depends on the distance
between modules.
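The bookkeeping with a local and a global lookup table can be sketched as below. This is a simplified illustration, not the authors' implementation: the function `module_time`, the data layout and the gate durations are hypothetical, and the forward and backward qubit passing times are replaced by a single constant, although in the text they depend on the distance between modules.

```python
def module_time(name, modules, gate_time, passing_time, table):
    """Execution time of a module: the latest idle time over its qubits.

    `table` plays the role of the global lookup table; `local` is the
    local lookup table of the module that is being mapped.
    """
    if name in table:                      # module was already mapped
        return table[name]
    local = {}
    for op, qubits in modules[name]:
        start = max(local.get(q, 0.0) for q in qubits)
        if op in modules:                  # call: forward passing + body + backward passing
            body = module_time(op, modules, gate_time, passing_time, table)
            duration = passing_time + body + passing_time
        else:                              # 1- or 2-qubit gate
            duration = gate_time[op]
        for q in qubits:
            local[q] = start + duration
    table[name] = max(local.values(), default=0.0)
    return table[name]


# Structured CAT-state code of FIG. 8 (a).
DATA = [f"data[{i}]" for i in range(5)]
MODULES = {
    "MakeCAT": [("H", DATA[:1])] + [("CNOT", DATA[i:i + 2]) for i in range(4)],
    "main": [("PrepZ", [q]) for q in DATA] + [("MakeCAT", DATA)],
}
GATE_TIME = {"PrepZ": 1.0, "H": 1.0, "CNOT": 2.0}   # illustrative values only
TABLE = {}
TOTAL = module_time("main", MODULES, GATE_TIME, 5.0, TABLE)
```

With these illustrative durations, the module MakeCAT takes one Hadamard time plus four CNOT times, and the total equals the preparation time plus the two passing times plus the MakeCAT time, as in the expression above.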
So far, we have described a system mapping algorithm
for a hierarchically structured quantum assembly code.
The presented algorithm can also be applied to
a non-structured quantum assembly code. In such code,
there are two types of quantum instructions: 1- and
2-qubit gates. Regardless of the type of a quantum assembly
code, as mentioned before, the heart of the system mapping
is 1) to set up a quantum computer architecture and
2) to recast a quantum algorithm for the architecture. For a
non-structured quantum assembly code, a simple
qubit array such as a regular 2-dimensional lattice may
be enough. The proposed framework supports a system
mapping on an arbitrary qubit layout as shown in FIG. 9
(a).
C. Building Block Layer
We apply fault-tolerant quantum computing protocols
based on [[7, 1, 3]] Steane code [38] and surface code [8, 9].
Both codes have well-studied logical gate protocols. The
concatenation level for Steane code and the code distance
for a surface code are completely determined by a given
quantum algorithm [21, 41]. In this work, we set both
figures by using KQ formalism [39].
1. Steane code
The [[7, 1, 3]] Steane code encodes one logical qubit
into seven physical qubits, and protects
it from an arbitrary single-qubit error. Since
transversal implementations of a logical Hadamard and
a logical CNOT gate are supported, many studies on
fault-tolerant quantum computing based on the Steane
code have been conducted. In Ref. [44], an optimal design
of a logical qubit for the Steane code under 2-dimensional
nearest neighbor qubit interaction was proposed. The authors
achieved a threshold of $O(10^{-5})$ with 48 physical qubits
and a modified quantum error correction.
In this work, we have redesigned a logical qubit with
30 physical qubits. Seven of them are used for holding
data, and the others are temporarily used for logical
operations and error correction. In particular, we
applied the Shor quantum error correction [36] that
exploits a 4-qubit Shor state for the syndrome measurement.
For that, we prepare and verify the Shor state [48]. We
implemented the preparation of a logical state by following
Ref. [10]. Most logical gates are implemented as
transversal gates, and the non-Clifford T gate is
implemented by exploiting a magic state. We generate magic
states by employing a 7-qubit Shor state without magic
state distillation [47].
The accuracy threshold theorem [2, 25] says that if we have
a quantum device with a physical error rate below a code
threshold, arbitrarily reliable quantum computing is
possible. But, for a very large quantum algorithm,
encoding only once may not be enough.
Fortunately, by encoding a qubit recursively [26], we can
lower the effective error rate to the degree where
reliable quantum computing is possible.
Given a quantum algorithm, we can calculate KQ and
determine the maximum tolerable error rate $P_{max}$ as
1/KQ. We then determine the concatenation level l such
that the following inequality is satisfied:
$$P_{max} \ge \frac{\left(c_{op}\, p^{2}\right)^{2^{l}}}{c_{op}}, \tag{1}$$
where op ranges over quantum error correction and the logical
operations, and $c_{op}$ is the constant factor of a specific
logical operation op. We obtained the constant value of each
logical operation from KQ of a quantum circuit for the
operation. For example, $c_{QEC}$ corresponds to KQ of the
TABLE III. The arrangement of qubits to implement a
logical qubit in the concatenation level l. The component
qubits are in the concatenation level l−1. The qubit denoted
by D[i] indicates the i-th data qubit. The qubits 4Sh[i] and
7Sh[j] are for the 4- and 7-qubit Shor states for syndrome
measurement and a logical T gate, and V4Sh and V7Sh are
used to verify the 4- and 7-qubit Shor states respectively.
The qubit M[i] is also used to implement a logical T gate.
V4Sh[1]  V4Sh[2]  D[1]     D[2]     D[3]
4Sh[1]   4Sh[2]   D[4]     D[5]     D[6]
4Sh[4]   4Sh[3]   D[7]     V7Sh[1]  V7Sh[2]
M[1]     M[2]     M[3]     M[4]     M[5]
M[6]     M[7]     V7Sh[3]  7Sh[1]   7Sh[2]
7Sh[3]   7Sh[4]   7Sh[5]   7Sh[6]   7Sh[7]
QEC quantum circuit. In this work, we have not optimized
the arrangement of qubits (see Table III). Therefore,
the quantum error correction and the logical operations
work sub-optimally, and the threshold is lower
than the optimal value².
Suppose that the concatenation level for a quantum
computing is determined as l. The implementation of a
logical T gate in the level l consists of only Clifford
operations in a lower level k < l. Then, in the level k, the
implementation of a logical T gate is not necessary, and
therefore the qubits to implement a magic state, the
7-qubit Shor state, are not required. Therefore,
only 23 qubits are required to implement a logical qubit
in the lower level k. But, to form a rectangular shape of
a logical qubit, we use 25 qubits (5 × 5 layout) for a
lower level qubit in the level k. In Table III, the qubits
denoted by 7Sh[i] are not required in the lower level qubit
k. On the other hand, the qubits V7Sh[i], the main role
of which is to verify the 7-qubit Shor state, are used for
another purpose, logical measurement.
2. Surface code
2D surface code based fault-tolerant quantum computing
is recognized as the most promising fault-tolerant
quantum computing scheme due to its physically less
challenging requirements. The code has a high threshold
of around $O(10^{-3})$ [9, 33, 46], and its structure is well
suited to qubits with nearest neighbor interactions arranged
on a 2-dimensional lattice. In this work, we have implemented
the double defect based logical qubits and logical gates
described in Refs. [8, 9, 41]. The detailed protocols are
beyond the scope of the present work, and we describe
performance parameters only.
2 Please note that the objective of our work is not to increase
a code threshold, but to configure a quantum computing and
analyze its performance accurately.
We use the KQ formalism to determine a code distance
d [21, 41]. The objective error rate of a quantum
computing is determined by $P_{fail} \approx KQ\,\epsilon_L$, where
$\epsilon_L$ is a logical error rate. The code distance d is
determined as
$$d \approx \frac{2\left(\log \epsilon_L - \log C_1\right)}{\log C_2 + \log \frac{\epsilon_p}{\epsilon_{th}}} - 1, \tag{2}$$
where $\epsilon_p$ and $\epsilon_{th}$ are the physical error rate and
the threshold of the surface code respectively. $C_1$ and $C_2$
are code parameters, and we use the specific figures
$C_1 \approx 0.13$ and $C_2 \approx 0.61$ from Ref. [21]. We apply
the code threshold $\epsilon_{th} = 0.009$.
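Eq. (2) can be evaluated with the short Python sketch below. The constants are the ones quoted above; everything else is an assumption of the sketch: the function names are hypothetical, the target failure probability `p_fail` is a free parameter, and the real-valued result of Eq. (2) is rounded up to the next odd integer of at least 3.

```python
import math

C1, C2 = 0.13, 0.61   # code parameters quoted in the text (from Ref. [21])
EPS_TH = 0.009        # surface code threshold used in the text


def logical_error_rate(d, eps_p):
    """Logical error rate implied by Eq. (2): C1 * (C2 * eps_p / eps_th)^((d + 1) / 2)."""
    return C1 * (C2 * eps_p / EPS_TH) ** ((d + 1) / 2)


def code_distance(kq, eps_p, p_fail):
    """Odd code distance d that meets eps_L = p_fail / KQ according to Eq. (2)."""
    if C2 * eps_p / EPS_TH >= 1:
        raise ValueError("physical error rate too high for Eq. (2) to apply")
    eps_l = p_fail / kq   # from P_fail ~ KQ * eps_L
    d = 2 * (math.log(eps_l) - math.log(C1)) / (
        math.log(C2) + math.log(eps_p / EPS_TH)) - 1
    d = max(3, math.ceil(d))           # rounding rule assumed in this sketch
    return d if d % 2 == 1 else d + 1  # keep the distance odd


if __name__ == "__main__":
    for eps_p in (1e-3, 1e-4, 1e-5):
        print(eps_p, code_distance(kq=1e12, eps_p=eps_p, p_fail=0.1))
```

The values KQ = 10^12 and p_fail = 0.1 in the usage example are arbitrary and only show how the distance shrinks as the physical error rate improves.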
Now it is possible to determine the execution time of
surface code logical gates. Above all, we need to stress
that the surface code error correction has to iterate d
rounds of a syndrome measurement. We assume that
logical Pauli operators are performed in classical control
software by updating the logical Pauli frame [8]. A logical
CNOT gate between logical qubits of the same type (X-cut or
Z-cut) consists of three CNOT gates between logical qubits
of different types. For that, we have to prepare a
pair of logical ancilla qubits of different types [8]. A
logical Hadamard gate protocol consists of cutting a target
logical qubit from and reconnecting it to the whole qubit
array, and performing transversal physical Hadamard and
SWAP gates [6, 8]. The Hadamard gate interchanges the roles
of the syndrome qubits, and the SWAP operation restores each
syndrome qubit to its original position (role).
We now turn our attention to the non-transversal gates
S and T. A logical S gate is deterministically
implemented by using a high fidelity magic state
$|Y_L\rangle = \frac{1}{\sqrt{2}}\left(|0_L\rangle + i|1_L\rangle\right)$ [8, 21].
Since the magic state is not destroyed during the gate
protocol, it can be reused if enough high fidelity magic
states are prepared at the beginning. Therefore, before
running a quantum algorithm, we prepare a number of
$|Y_L\rangle$ states. We include the duration of preparing the
states in the execution time of a quantum computing. How many
$|Y_L\rangle$ states should be prepared is discussed below.
A logical T gate is implemented by consuming a high
fidelity magic state
$|A_L\rangle = \frac{1}{\sqrt{2}}\left(|0_L\rangle + e^{i\pi/4}|1_L\rangle\right)$ [8].
A magic state has to be prepared for every T gate. We
assume that the magic states are prepared and supplied
offline. In other words, the preparation of a high fidelity
$|A_L\rangle$ is not included in the quantum computing time.
On the other hand, the logical T gate operation is
probabilistic and succeeds up to a logical S gate correction.
Therefore, to implement a logical T gate, a $|Y_L\rangle$ state is
probabilistically required.
We now discuss the quantity of the required $|Y_L\rangle$
states. It depends on the maximum number of states required
at one time. Since a logical T gate probabilistically
requires a $|Y_L\rangle$ state, we have to prepare as many
$|Y_L\rangle$ states as max{parallelT, parallelS}, where
parallelT (parallelS) is the number of T (S) gates executed
in parallel. Note that the quantities parallelT and parallelS
can be found from the system mapping process.
The preparation of a high fidelity magic state takes
two steps, state injection and state distillation [7, 8].
The state injection in the surface code quantum computing
injects an arbitrary logical state into a distance-1
logical qubit called a short qubit and then enlarges the
logical qubit [8]. Enlarging a double defect logical
qubit consists of multi-cell qubit movements and
measurements on data qubits. The state distillation protocol
takes m noisy states and generates k less noisy states,
where m > k. By performing multiple rounds of the
distillation, the magic spread over many states is
concentrated on only a few states, and therefore we can
obtain high fidelity magic states. In this work, we deal
with the magic state distillation protocols described in
Refs. [8, 9]. The required number of iterations of the
protocol is completely determined by the objective fidelity
and a physical error rate [41]. We set the objective error
rate of the magic states to $10^{-12}$ to achieve high fidelity
for the configured quantum computing³, and empirically the
2-round distillation achieved the objective error rate for
physical error rates between $10^{-3}$ and $10^{-5}$.
We determine the capacity of a magic state factory that
prepares and supplies $|A_L\rangle$ states. The capacity depends
on the quantum algorithm and the durations of a state
distillation and a logical T gate. Suppose that logical T
gates are applied consecutively to a qubit. Then, the magic
state factory has to supply high fidelity magic states
continuously. If the factory generates only one
magic state at a time, the supply of magic states is delayed
whenever the magic state distillation
takes more time than a logical T gate.
Therefore, the magic state factory must have the capacity to
prepare at least max{parallelT} × time(MSD)/time(T)
states at a time, where time(MSD) and time(T) are the
durations of the magic state distillation and the logical
T gate protocol. Empirically, time(MSD)/time(T)
is approximately 20 in our estimation.
To conclude, the required physical qubits for $|A_L\rangle$ and
$|Y_L\rangle$ are respectively
$$\max\{parallelT\} \cdot \frac{time(MSD)}{time(T)} \cdot \left(15 \cdot Q_L\right)^{r-1} \cdot \left(16 \cdot Q_L\right), \tag{3}$$
and
$$\max\{parallelT, parallelS\} \cdot \left(7 \cdot Q_L\right)^{r-1} \cdot \left(8 \cdot Q_L\right), \tag{4}$$
where $Q_L$ is the number of physical qubits to implement
a logical qubit and r is the required number of iterations.
The last distillation round requires one more logical qubit
from the Bell state [8]. In addition, the ancilla qubits to
perform CNOT gates during the distillation protocol
should also be included.
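For concreteness, Eqs. (3) and (4) can be evaluated as in the following Python sketch. The function names and the numbers in the usage example are hypothetical; the sketch only reproduces the two expressions as printed and does not add the extra Bell-state and ancilla qubits mentioned above.

```python
def a_state_qubits(parallel_t, msd_over_t, q_l, rounds):
    """Eq. (3): physical qubits for the |A_L> magic state factory.

    parallel_t : maximum number of T gates executed in parallel
    msd_over_t : ratio time(MSD) / time(T)
    q_l        : physical qubits per logical qubit
    rounds     : number of distillation rounds r
    """
    return parallel_t * msd_over_t * (15 * q_l) ** (rounds - 1) * (16 * q_l)


def y_state_qubits(parallel_t, parallel_s, q_l, rounds):
    """Eq. (4): physical qubits for the |Y_L> states."""
    return max(parallel_t, parallel_s) * (7 * q_l) ** (rounds - 1) * (8 * q_l)


if __name__ == "__main__":
    # Arbitrary example: 2 parallel T gates, 3 parallel S gates,
    # time(MSD) / time(T) = 20, 10 physical qubits per logical qubit, 2 rounds.
    print(a_state_qubits(2, 20, 10, 2))
    print(y_state_qubits(2, 3, 10, 2))
```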
3 1/#T gates
V. PERFORMANCE METRIC
We describe how to evaluate the quantum comput-
ing metrics, execution time, fidelity and the quantity of
qubits.
A. Execution Time
We examine the quantum computing time in two steps.
In the system mapping, we obtain the single round
execution time $T_{one}$ of a quantum algorithm. At the same
time, the fidelity $F_{alg}$ of a quantum computing can be
determined. Note that how to calculate the fidelity of a
quantum computing is described in the following section.
Since $T_{one}$ is the time for running a quantum algorithm
once, it carries no guarantee of a reliable quantum
computing: noisy components may cause a run to fail.
To account for this, we calculate the average execution time
$T_{avg}$ by reflecting the number of iterations required to
achieve the fidelity 1 as
$$T_{avg} = T_{one}/F_{alg}. \tag{5}$$
We believe this averaged time shows the time required for
getting a reliable answer from a quantum computing⁴.
B. Fidelity
The fidelity of a quantum computing can be calculated
based on the fidelity of logical quantum gates as
follows [41],
$$F_{alg} = \prod_{g} F_g^{N_g}, \tag{6}$$
where g is a quantum gate utilized in the algorithm, $F_g$ is
the fidelity of the gate g, and $N_g$ is the total count of the
gate in the algorithm. The value $N_g$ can be found from
the system mapping and $F_g$ is determined in the building
block layer. Note that this fidelity calculation is only
applicable to Steane code based quantum computing. As
shown in Section IVC2, the final fidelity of a surface code
based quantum computing is given by $F_{alg} = 1 - KQ \cdot \epsilon_L$.
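Eqs. (5) and (6) translate directly into code. The Python sketch below is an illustration with hypothetical function names and made-up gate fidelities; the product of Eq. (6) is evaluated as the exponential of a sum of logarithms, which is a choice of this sketch and not something prescribed in the text.

```python
import math


def algorithm_fidelity(gates):
    """Eq. (6): product over gates g of F_g ** N_g.

    `gates` maps a gate name to a pair (F_g, N_g).
    """
    return math.exp(sum(n * math.log(f) for f, n in gates.values()))


def surface_code_fidelity(kq, eps_l):
    """Fidelity used for the surface code case: 1 - KQ * eps_L."""
    return 1.0 - kq * eps_l


def average_time(t_one, f_alg):
    """Eq. (5): average execution time T_avg = T_one / F_alg."""
    return t_one / f_alg


if __name__ == "__main__":
    f = algorithm_fidelity({"H": (0.99, 2), "CNOT": (0.9, 1)})   # made-up values
    print(f, average_time(10.0, f))
```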
4 This does not indicate that the output from a quantum
computing is an exact solution. We do not consider the
probabilistic nature of a quantum algorithm.
C. The Number of Physical Qubits
We examine the quantity of physical qubits required
to run a quantum algorithm. Since the quantity of the
required qubits differs according to the fault-tolerant
quantum computing scheme, we first identify the common
factor, the qubits in a quantum algorithm, and then go
into the specific cases.
The proposed hierarchical quantum computer structure
consists of multiple modules and a communication
bus connecting all modules. From the quantum assembly
code, we can find the quantity of logical (or physical)
qubits for the modules,
$$Q_{comp} = \sum_{M} \left(Q^{M}_{local} + Q^{M}_{param}\right), \tag{7}$$
where $Q^{M}_{local}$ ($Q^{M}_{param}$) is the number of local
(parameter) qubits of a (computing) module M.
1. Steane Code Quantum Computing
We consider the Steane code quantum computing. The
structure of a communication bus depends on the chosen
global layout over all modules. On the 1D global layout,
the number of the qubits can be simply calculated as
$Q_{comm} = bandwidth \times length$, where length is obtained
as
$$length = \sum_{M} M_{width}, \tag{8}$$
where $M_{width}$ is the width of a module, which is commonly 1
for the 1D local layout and $\lfloor\sqrt{Q^{M}}\rfloor$ for the 2D
local layout. Note that $Q^{M} = Q^{M}_{local} + Q^{M}_{param}$.
param.
On the 2D global layout, on the other hand, the number of bus qubits is calculated as follows. Suppose that the number of modules is $|M|$. Then a 2D layout of size $\lfloor \sqrt{|M|} \rfloor \times \lfloor \sqrt{|M|} \rfloor$ is necessary. To keep the shape of a module on the 2D layout, all modules are given cells of the same size $n \times n$, where $n = \lfloor \sqrt{\max\{Q^{M}\}} \rfloor$. The number of logical qubits required for the communication bus is then
$$Q_{comm} = 2 \cdot \mathrm{bandwidth} \cdot n \cdot A \cdot B + (n \cdot A)^2, \qquad (9)$$
where $A = \lfloor \sqrt{|M|} \rfloor - 1$ and $B = \lfloor \sqrt{|M|} \rfloor$. In this work, we set the bandwidth of the bus to the maximum number of parameter qubits, $\mathrm{bandwidth} = \max_{M}\{Q^{M}_{param}\}$.
So far, we have identified the number of logical qubits required for a quantum computation. As mentioned in Section IV C 1, a logical qubit at the concatenation levels $k = 1, \dots, l-1$ is composed of 25 qubits, and a logical qubit at the level $l$ consists of 30 qubits. Depending on the physical error rate, the encoding has to be applied recursively. Therefore, the total number of physical qubits in Steane code quantum computing is
$$Q_{steane} = 25^{r-1} \cdot 30 \cdot Q_{comp} + 25^{r} \cdot Q_{comm}, \qquad (10)$$
where $r$ is the determined concatenation level.
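The qubit counting of Eqs. (7) to (10) can be sketched in a few lines of Python. This is only an illustration of the formulas as stated above: the function names and the example module list are hypothetical, and each module is assumed to be given as a pair (local qubits, parameter qubits).

```python
import math

def q_comp(modules):
    """Eq. (7): sum of local and parameter qubits over all modules."""
    return sum(loc + par for loc, par in modules)

def q_comm_1d(modules, local_2d):
    """1D global layout: bandwidth * length, with length from Eq. (8)."""
    bandwidth = max(par for _, par in modules)
    length = sum(math.isqrt(loc + par) if local_2d else 1 for loc, par in modules)
    return bandwidth * length

def q_comm_2d(modules):
    """2D global layout, Eq. (9)."""
    bandwidth = max(par for _, par in modules)
    n = math.isqrt(max(loc + par for loc, par in modules))
    side = math.isqrt(len(modules))  # floor(sqrt(|M|))
    a, b = side - 1, side
    return 2 * bandwidth * n * a * b + (n * a) ** 2

def q_steane(comp, comm, r):
    """Eq. (10): physical qubits at concatenation level r."""
    return 25 ** (r - 1) * 30 * comp + 25 ** r * comm

# Hypothetical modules as (local, parameter) logical-qubit counts.
modules = [(12, 4), (5, 4), (21, 4), (7, 2)]
total_level_2 = q_steane(q_comp(modules), q_comm_1d(modules, local_2d=True), r=2)
```

`math.isqrt` returns the floor of the integer square root, which matches the floor brackets in Eqs. (8) and (9).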
FIG. 13. A double Z-cut qubit of code distance 3. The blue dots indicate data qubits. One of the green chains indicates a logical Z operator, and the red chain indicates a logical X operator. Along the yellow line (labeled "path for FT braiding transformation" in the figure), a fault-tolerant braiding operation can be performed from another X-cut qubit to this Z-cut qubit. Each defect has to be at least 3 data qubits away from the boundary, and the two defects have to be separated by 6 data qubits. 126 data qubits and 125 syndrome qubits are required to implement a distance-3 logical qubit.
2. Surface Code Quantum Computing
We implement double-defect-based logical qubits. For a double-defect logical qubit with code distance $d$, each defect has to be at least $d$ data qubits away from a boundary, and the two defects should also be separated by $d$ data qubits. However, to perform a braiding operation in a fault-tolerant manner, the space between the two defects has to be at least $2d + \lfloor d/4 \rfloor$ rather than only $d$. Therefore, to implement a double-defect logical qubit of code distance $d$, $(2A+1)(2B+1)$ physical qubits are required, where $A = 2d - 2 + \lfloor d/4 \rfloor$ and $B = 4d - 4 + 3\lfloor d/4 \rfloor$. FIG. 13 shows a double-defect logical qubit of code distance 3. A total of 253 physical qubits, 126 data qubits and 125 syndrome qubits, are required.
Two neighboring logical qubits also have to be separated by $\lfloor d/4 \rfloor$ data qubits to preserve the code distance between them during the braiding transformation. Accordingly, if $N$ logical qubits are arranged on a 2-dimensional layout of $n_h \times n_w$,
$$\Big( 2 \big( n_w A + (n_w - 1) \lfloor d/4 \rfloor \big) + 1 \Big) \Big( 2 \big( n_h B + (n_h - 1) \lfloor d/4 \rfloor \big) + 1 \Big) \qquad (11)$$
physical qubits are necessary, where $A$ and $B$ are as defined above.
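A short Python sketch of the counting rule in Eq. (11) is given below. It follows the expressions for $A$, $B$ and Eq. (11) exactly as written in the text and is not calibrated against the qubit counts quoted for FIG. 13; the function names are hypothetical.

```python
def patch_half_sizes(d):
    """A and B as written in the text for a double-defect qubit of distance d."""
    a = 2 * d - 2 + d // 4
    b = 4 * d - 4 + 3 * (d // 4)
    return a, b

def single_qubit_count(d):
    """(2A + 1)(2B + 1) for one double-defect logical qubit."""
    a, b = patch_half_sizes(d)
    return (2 * a + 1) * (2 * b + 1)

def layout_qubit_count(d, n_h, n_w):
    """Eq. (11) for an n_h x n_w arrangement of logical qubits."""
    a, b = patch_half_sizes(d)
    gap = d // 4  # spacing between neighboring logical qubits
    width = 2 * (n_w * a + (n_w - 1) * gap) + 1
    height = 2 * (n_h * b + (n_h - 1) * gap) + 1
    return width * height
```

For a 1 x 1 layout, Eq. (11) reduces to the single-qubit expression $(2A+1)(2B+1)$, which gives a quick consistency check on the two formulas.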
We also have to consider the ancilla qubits required for a CNOT gate. As mentioned before, a CNOT gate between logical qubits of the same type consists of three CNOT gates between logical qubits of different types. For this, two ancilla qubits, an X-cut qubit $|g_L\rangle$ and a Z-cut qubit $|+_L\rangle$, are required. We allocate such a pair of ancilla qubits to every module in which a CNOT gate is performed. In that case, the number of logical qubits of a module is the sum of its parameter qubits, its local qubits and the two ancilla qubits.
In the case of surface code quantum computing, a communication bus may also be necessary to perform forward/backward qubit passings efficiently.$^5$ Unlike Steane code quantum computing, which performs a sequence of SWAP operations proportional to the passing distance, qubit movement in surface code quantum computing is much more efficient: it can be achieved by performing only multi-cell qubit movements [8, 41]. Therefore, we assume that surface code quantum computing performs qubit passings sequentially on a bus of narrow bandwidth. We set the bandwidth of the bus to $\lfloor d/4 \rfloor$, and additionally require the movement path to be at least $d$ data qubits away from a boundary. FIG. 14 shows the quantum computer architecture based on a surface code and a structured quantum assembly code, where all the modules are arranged on a 1-dimensional layout with a spacing of $\lfloor d/4 \rfloor$ data qubits between adjacent modules.
FIG. 14. A quantum computer structure based on surface code quantum computing (three modules $i$, $j$ and $k$ are drawn). Dark green dots indicate defects, and yellow cells are used as paths for the braiding transformations. Blue cells can be used for the forward/backward qubit passings between distant modules. A section enclosed by a dotted red line is a computing region, i.e., a module. A defect has to be at least $d$ data qubits away from the boundary of a logical qubit or a braiding path, and two logical qubits are separated from each other by $\lfloor d/4 \rfloor$ data qubits.
5 It is possible to perform a fault-tolerant braiding between distant logical qubits in different modules, in which case qubit passings may not be required. However, a braiding between distant logical qubits requires a large number of physical measurements, which makes the quantum computation unreliable. We therefore believe that moving logical qubits to a nearby location (a target module) and performing a braiding between close qubits is more reliable, and for this reason we also perform qubit passing in surface code quantum computing.
We conclude this section by restating the physical qubits required for the magic state factory. The physical qubits required for $|Y_L\rangle$ are $\max\{parallel_T, parallel_S\} \times (7Q_L)^{r-1} \times (8Q_L)$, and those for $|A_L\rangle$ are $\max\{parallel_T\} \times (15Q_L)^{r-1} \times (16Q_L) \times time(MSD)/time(T)$. Note that $Q_L$ is the number of physical qubits needed to implement a logical double-defect qubit, and $r$ is the number of iterations of the magic state distillation.
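The two magic state factory expressions can be written as small Python helpers. This is a sketch that transcribes the formulas as they appear above; the function and parameter names are hypothetical, and the example arguments are made-up values used only to exercise the code.

```python
def y_state_qubits(parallel_t, parallel_s, q_l, r):
    """|Y_L> factory: max{parallel_T, parallel_S} * (7 Q_L)^(r-1) * (8 Q_L)."""
    return max(parallel_t, parallel_s) * (7 * q_l) ** (r - 1) * (8 * q_l)

def a_state_qubits(parallel_t, q_l, r, time_msd, time_t):
    """|A_L> factory: parallel_T * (15 Q_L)^(r-1) * (16 Q_L) * time(MSD)/time(T)."""
    return parallel_t * (15 * q_l) ** (r - 1) * (16 * q_l) * time_msd / time_t

# Made-up inputs: 2 parallel T gates, 3 parallel S gates, Q_L = 100, one round.
y_qubits = y_state_qubits(parallel_t=2, parallel_s=3, q_l=100, r=1)
a_qubits = a_state_qubits(parallel_t=2, q_l=100, r=1, time_msd=10.0, time_t=1.0)
```

The ratio time(MSD)/time(T) scales the $|A_L\rangle$ factory so that distilled states are available at the rate at which T gates consume them.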
VI. ANALYSIS OF PERFORMANCE AND
RESOURCE
We now analyze the performance of the quantum computing models we configured. For this, we set the target fidelity of a quantum computation to 0.7. That is, the single-round quantum computing time $T_{one}$ is the time of a quantum computation that achieves a fidelity of at least 0.7. We set the error rate of a physical operation to $10^{-9}$ for Steane code quantum computing and to $10^{-3}$ for surface code quantum computing. We also conservatively assume that the duration of a physical operation is 1 µs. This assumption may be more pessimistic than other literature, which assumes tens to hundreds of nanoseconds for a physical operation. The Shor algorithm we test comes from the benchmark [16, 41].
A. Case of applying Compile
We show how the performance changes when a quantum compile, i.e., the decomposition of an $R_Z(\theta)$ gate for an arbitrary angle $\theta$, is applied. Although such a decomposition is required to implement fault-tolerant quantum computing, in this section we run the physical quantum computation without error correction to see the effect of the compile alone. For that, the components of the other layers are completely fixed.
For the decomposition, we set the precision to $10^{-2}$, which means that a decomposition of an $R_Z(\theta)$ gate realizes the $R_Z(\theta)$ operation with an error probability of $10^{-2}$. Consequently, both $R_Z(\theta_1)$ and $R_Z(\theta_2)$ can be decomposed into the same sequence of H, S and T gates if $|\theta_1 - \theta_2| \le 0.01$. Under this precision, an $R_Z(\theta)$ gate is usually decomposed into a sequence of 40 to 50 H, S and T gates.$^6$ Please note that the decomposition algorithm in the compile works probabilistically.
FIG. 15 compares the performance. Decomposing the $R_Z(\theta)$ gates increases the quantum computing time by a factor of 4 to 5, while the number of physical qubits stays almost the same. In general, $R_Z(\theta)$ gates account for more than half of all quantum gates in Shor's factoring algorithm (see Table IV). Considering that each $R_Z(\theta)$ gate is decomposed into a sequence of dozens of H, S and T gates as mentioned above, readers may expect the performance difference between the two cases to be much larger than that shown in the figure.
6 If we set a smaller precision value, we obtain a longer sequence of H, S and T gates for an $R_Z(\theta)$ gate. On the one hand, such a sequence realizes the target $R_Z(\theta)$ gate more exactly. On the other hand, the quantum computing time becomes larger than the time shown in this work. Besides, in practice the time needed to conduct the performance analysis also increases nontrivially. For these reasons, we set the precision to $10^{-2}$.
[FIG. 15 (a), "Quantum Computing with only Compile: Shor Algorithm": time (secs) versus input size N = 128, 256, 512. Decomposition of RZ(θ): 1.427×10^3, 9.910×10^3, 7.320×10^4. No decomposition of RZ(θ): 2.889×10^2, 2.230×10^3, 1.751×10^4.]
[FIG. 15 (b), same title: qubits versus input size. Decomposition of RZ(θ): 1.402×10^5, 5.424×10^5, 2.132×10^6. No decomposition of RZ(θ): 1.391×10^5, 5.403×10^5, 2.128×10^6.]
FIG. 15. The change in quantum computing performance caused by the compile. (a) Quantum computing time and (b) Qubits.
TABLE IV. The proportion of RZ(θ) gates in the Shor algorithm for each input size.
Input Size   RZ(θ)           Total Gates     Proportion
128          2.036 × 10^9    3.399 × 10^9    59.90%
256          1.630 × 10^10   2.719 × 10^10   59.94%
512          1.304 × 10^11   2.175 × 10^11   59.95%
As mentioned above, we set the precision of the decomposition to $10^{-2}$. Most angles $\theta$ in the Shor algorithm are very small ($< 0.01$),$^7$ and therefore the decomposition of such a rotation acts as the identity operation. Table V lists the most dominant angles $\theta$ used in the Shor algorithm for N = 128; all of them are less than 0.01. Although the table does not list every $\theta$ in the algorithm, empirically 75% of the angles applied in the algorithm are less than 0.01. For this reason, the performance degradation caused by decomposing $R_Z(\theta)$ gates is not so large, despite the large number of $R_Z(\theta)$ gates in the algorithm.
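The effect of the $10^{-2}$ precision on small angles can be illustrated with a short Python sketch. It enumerates the distinct angles $\theta = \pi/2^{n-1}$ for $n = 1, \dots, N$ and counts those below the precision. Note that this counts distinct angle values only; it does not weight them by how often each angle occurs in the compiled circuit, so it is not expected to reproduce the empirical 75% figure. The function names are hypothetical.

```python
import math

def qft_angles(n_bits):
    """Distinct rotation angles theta_n = pi / 2**(n - 1) for n = 1..n_bits."""
    return [math.pi / 2 ** (n - 1) for n in range(1, n_bits + 1)]

def below_precision(angles, precision=1e-2):
    """Angles whose magnitude is below the decomposition precision."""
    return [t for t in angles if abs(t) < precision]

angles = qft_angles(128)
small = below_precision(angles)
fraction_of_distinct = len(small) / len(angles)
```

Under the stated precision, every angle in `small` is closer than 0.01 to zero, so its decomposition can coincide with that of the identity.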
B. Case of applying Compile and Error Correction
We show the performance of fault-tolerant quantum computing without considering local qubit interaction on a quantum computer architecture. We assume that every qubit can interact directly with any other qubit regardless of its position, and therefore no communication bus is required in the architecture. This setting isolates the effect of quantum error correction. For that, we configured only Steane-code-based quantum computing, because surface code quantum computing inherently takes a 2-dimensional nearest-neighbor qubit layout into consideration. As mentioned above, for fault-tolerant quantum computing, we compile the quantum algorithm by decomposing $R_Z(\theta)$ gates into H, S and T gates.
7 $\theta = \pi/2^{n-1}$ with $n = 1, \dots, N$ for N-bit integer factoring.
TABLE V. List of the top 10 dominant angles in Shor's factoring algorithm, N = 128. Every θ listed in this table is less than 0.01, and therefore RZ(θ) works as an identity operator. The rotation angle θ of the gate comes from π/2^(n−1) in the Quantum Fourier Transform, and the exact representation of the angle is limited by the precision of a classical computer.
θ                     Count          Proportion
0.000000 × 10^0       6.88 × 10^8    0.3381
−0.000000 × 10^0      3.44 × 10^8    0.1691
−5.000000 × 10^−5     3.13 × 10^7    0.0154
−1.000000 × 10^−4     3.11 × 10^7    0.0153
5.000000 × 10^−5      3.10 × 10^7    0.0152
−2.000000 × 10^−4     3.09 × 10^7    0.0152
1.000000 × 10^−4      3.08 × 10^7    0.0151
−4.000000 × 10^−4     3.08 × 10^7    0.0151
2.000000 × 10^−4      3.07 × 10^7    0.0151
−7.500000 × 10^−4     3.06 × 10^6    0.0150
FIG. 16 shows the quantum computing performance. The increase in the execution time and in the number of qubits is very large when the input size increases from 128 to 256. This is because the required concatenation level increases from 1 to 2 there to satisfy the target fidelity of 0.7. However, since the concatenation level stays the same when the input size increases from 256 to 512, the increases in the quantum computing time and the number of qubits are rather modest.
[FIG. 16 (a), "Non-Local Steane FTQC: Shor Algorithm": time (secs) versus input size N = 128, 256, 512. Steane FTQC, Tone: 2.541×10^5, 2.334×10^8, 1.725×10^9. Steane FTQC, Tavg: 3.093×10^5, 2.336×10^8, 1.732×10^9. Physical, Tone: 1.427×10^3, 9.910×10^3, 7.320×10^4.]
[FIG. 16 (b), same title: qubits versus input size. Steane FTQC: 4.206×10^6, 4.882×10^8, 1.919×10^9. Physical: 1.402×10^5, 5.424×10^5, 2.132×10^6.]
FIG. 16. Quantum computing performance based on quantum error correction. For the fault-tolerant operation, we have compiled the quantum algorithm by decomposing RZ(θ) gates into a sequence of H, S and T gates. In this evaluation, the quantum computer architecture and local qubit interaction are not considered at all. (a) Quantum computing time and (b) Qubits. The concatenation level is 1 for the input size 128 and 2 for the other cases.
Since in this section we assume fault-tolerant quantum computing with non-local qubit interaction, the performance change shown in the figure is caused only by the fault-tolerant quantum computing protocol. For example, in FIG. 16 (b), the numbers of qubits in Steane code quantum computing are larger than those in physical computing by factors of 30, 900 and 900, respectively. Please recall that we have designed a logical qubit by assembling 30 physical qubits.
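The quoted factors can be checked against the data labels of FIG. 16 (b) with a few lines of Python. The comparison with $30^r$ is our reading of the sentence above (30 physical qubits per logical qubit, applied once per concatenation level in this non-local setting) and should be taken as an assumption of this sketch.

```python
# Qubit counts read from the data labels of FIG. 16 (b), N = 128, 256, 512.
steane_qubits = [4.206e6, 4.882e8, 1.919e9]
physical_qubits = [1.402e5, 5.424e5, 2.132e6]
levels = [1, 2, 2]  # concatenation levels stated in the FIG. 16 caption

ratios = [s / p for s, p in zip(steane_qubits, physical_qubits)]
expected = [30 ** r for r in levels]  # assumed overhead per level: 30
```

Rounded to the nearest integer, the three ratios agree with the factors of 30, 900 and 900 quoted in the text.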
C. Case of applying Compile, Error Correction and
System Architecture
In this section, we show the quantum computing performance when all the realistic factors described previously are considered. We apply fault-tolerant quantum computing on quantum computer architectures where only local qubit interaction is permitted.
FIG. 17 shows the performance analysis of Steane code quantum computing. We used quantum computer architectures with the layouts (1D global, 1D local), (1D global, 2D local) and (2D global, 2D local); see FIG. 11 for these architectures. To see the influence of local qubit interaction, we also compare against the performance of quantum computing based on non-local qubit interaction shown in the previous section.
As shown in the figure, the performance degradation caused by local qubit interaction on a quantum computer architecture is highly nontrivial. This is because many modules are spread over the quantum computer and communication (qubit passing) is performed frequently. Table VI shows the proportion of SWAP gates in the implementation of the Shor algorithm. Surprisingly, on the proposed quantum computer architecture with nearest-neighbor qubit interaction, most of the quantum operations in Steane code quantum computing are qubit movements. We regard these qubit movements as a temporal overhead of implementing a quantum algorithm on a quantum computer. This large overhead can be reduced by improving the quantum computer structure, the fault-tolerant quantum computing scheme or the system mapping algorithm.
TABLE VI. The proportion of SWAP gates in Shor's factoring algorithm. The layout indicates a combination of global layout and local layout.
Input Size   Layout     SWAP            Total Gates     Proportion
128          (1D, 1D)   7.371 × 10^11   7.405 × 10^11   99.54%
128          (1D, 2D)   1.262 × 10^12   1.266 × 10^12   99.73%
128          (2D, 2D)   1.527 × 10^13   1.527 × 10^13   99.97%
256          (1D, 1D)   1.116 × 10^13   1.118 × 10^13   99.76%
256          (1D, 2D)   1.856 × 10^13   1.859 × 10^13   99.85%
256          (2D, 2D)   2.719 × 10^14   2.720 × 10^14   99.99%
512          (1D, 1D)   1.068 × 10^14   1.070 × 10^14   99.80%
512          (1D, 2D)   1.811 × 10^14   1.813 × 10^14   99.88%
512          (2D, 2D)   5.789 × 10^15   5.789 × 10^15   99.99%
The figure shows that a quantum computer architecture with the 1D global layout provides better performance than one with the 2D global layout. However, this may not always be the case. It depends entirely on the number of modules in a quantum computing program (see FIG. 7) and on the arrangement of the modules on the architecture. In general, the 2D global layout is the better architecture in terms of qubit communication when the number of modules is very large. On average, arranging the modules on the 2D global layout reduces the distance between modules compared with the 1D global layout, and the communication cost of qubit passing is therefore lower. As an example, FIG. 18 shows that a ground state estimation algorithm [16, 19, 41] works better on a quantum computer architecture with the 2D global layout.$^8$
8 In the benchmark program, the ground state estimation algorithm is composed of at least tens of thousands of modules, whereas the program of the Shor algorithm consists of thousands of modules.
[FIG. 17 (a), "Local Steane FTQC: Shor Algorithm": time (secs) versus input size N = 128, 256, 512; data labels in legend order. 1D Global & 1D Local: 1.785×10^13, 2.785×10^14, 3.194×10^15. 1D Global & 2D Local: 1.761×10^13, 2.661×10^14, 2.709×10^15. 2D Global & 2D Local: 3.046×10^14, 4.035×10^15, 1.019×10^17. All-to-All: 3.093×10^5, 2.336×10^8, 1.732×10^9.]
[FIG. 17 (b), same title: qubits versus input size; data labels in legend order. 1D Global & 1D Local: 5.643×10^10, 2.221×10^11, 7.602×10^11. 1D Global & 2D Local: 1.468×10^11, 7.190×10^11, 3.437×10^12. 2D Global & 2D Local: 1.471×10^13, 1.145×10^14, 7.495×10^14. All-to-All: 4.206×10^6, 4.882×10^8, 1.919×10^9.]
FIG. 17. Quantum computing performance of Steane-code-based local fault-tolerant quantum computing. To see the influence of the architectural limitation, we also show the performance when arbitrarily long-range qubit interaction is allowed. (a) Quantum computing time and (b) Qubits. The concatenation level for all local qubit interaction cases (black, red and blue lines) is 3. In the case of non-local qubit interaction (green line), the concatenation level is 1 for the input size 128 and 2 for the others. Please see FIG. 11 for the quantum computer architectures.
[FIG. 18 (a), "Noiseless and Local Gate: Ground State Estimation Algorithm": time (secs) versus input size M = 20, 40, 60, 80; data labels in legend order. 1D Global & 1D Local: 4.526×10^5, 1.078×10^8, 2.740×10^9, 2.731×10^10. 1D Global & 2D Local: 1.233×10^6, 3.833×10^8, 1.102×10^10, 1.248×10^11. 2D Global & 2D Local: 1.425×10^5, 1.803×10^7, 3.138×10^8, 2.434×10^9.]
[FIG. 18 (b), same title: qubits versus input size; data labels in legend order. 1D Global & 1D Local Layout: 2.287×10^5, 6.413×10^6, 4.638×10^7, 1.859×10^8. 1D Global & 2D Local Layout: 5.055×10^5, 1.798×10^7, 1.454×10^8, 6.521×10^8. 2D Global & 2D Local Layout: 2.582×10^6, 1.266×10^8, 1.265×10^9, 6.475×10^9.]
FIG. 18. The quantum computing performance of a ground state estimation algorithm over input sizes M = 20, 40, 60, 80. Quantum gates are noiseless, and only nearest-neighbor qubits can interact on the quantum computer architectures. (a) Quantum computing time and (b) Qubits.
FIG. 19 shows the performance of surface code quantum computing. The quantum computer architecture for surface code quantum computing is shown in FIG. 14. At the error rate $10^{-3}$, the required code distance rises to 25, 27 and 30 as the input size increases, in order to satisfy the target fidelity. In the figure, we also show the number of physical qubits needed to run a magic state factory that supplies $|A_L\rangle$ states during the quantum computation. As shown in the figure, in this work the capacity of the magic state factory stays almost the same regardless of the input size of the Shor algorithm; it grows only with the code distance.
In Ref. [8], the authors estimated the execution time of the Shor algorithm for factoring a 2000-bit integer with surface code quantum computing. Their estimate focuses only on the number of sequential Toffoli gates in the modular exponentiation circuit. Following their estimation method, the quantum computing time to factorize a 512-bit integer would be only 0.45 hours ($40 \times 512^3 \times 3 \times 100$ ns) regardless of the physical error rate. However, as shown in the figure, our estimate shows that $8.78 \times 10^5$ hours are required for the same task when the physical error rate is $10^{-3}$. The algorithm execution time, as shown in Section VII B, is reduced as the physical error rate decreases.
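The two figures quoted above can be reproduced with simple arithmetic. The Python sketch below evaluates the Toffoli-only expression $40 \times 512^3 \times 3 \times 100$ ns and converts the 512-bit time label of FIG. 19 (a), $3.163 \times 10^9$ seconds, into hours; the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600

def toffoli_only_estimate_hours(n_bits, t_meas_ns=100):
    """40 * n^3 sequential Toffoli gates, 3 measurements each, as quoted in the text."""
    total_ns = 40 * n_bits ** 3 * 3 * t_meas_ns
    return total_ns * 1e-9 / SECONDS_PER_HOUR

ref8_style_hours = toffoli_only_estimate_hours(512)
integrated_hours = 3.163e9 / SECONDS_PER_HOUR  # data label of FIG. 19 (a), N = 512
gap = integrated_hours / ref8_style_hours
```

The two estimates differ by roughly six orders of magnitude, which is the gap discussed in the following paragraph.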
We believe the reasons for this enormous gap between the two estimates are as follows. First, our analysis is based on all quantum gates included in the Shor algorithm, whereas their analysis focuses on the Toffoli gates only. In our estimate, transversal gates account for about 60% of all gates (# transversal gates / # total gates); in other words, their estimate ignores the execution time of this large number of transversal gates and the associated error correction. Second, we conservatively assume that a physical gate operates in 1 µs, while their estimate is based on 100 ns measurement gates.$^9$ Third, while they used an efficient decomposition of the Toffoli gate, we used the de facto standard decomposition [31]. Fourth, they may have assumed that braiding operations for parallel CNOT gates can be performed in parallel without any path conflicts, whereas we apply one braiding operation at a time to avoid conflicts with other braiding paths. Please note that for the Shor algorithm with N = m, m CNOT gates can be performed in parallel in the ideal case. Fifth, we consider architecture-related issues such as qubit passing between distant modules, but they do not. Lastly, we apply $d$ rounds of quantum error correction, where the code distance $d$ is determined from the physical error rate, but they did not.
[FIG. 19 (a), "Surface FTQC: Shor Algorithm": time (secs) versus input size N = 128, 256, 512. Surface FTQC: 5.602×10^7, 4.428×10^8, 3.163×10^9. Physical: 1.427×10^3, 9.910×10^3, 7.320×10^4.]
[FIG. 19 (b), same title: qubits versus input size. Surface FTQC, whole: 4.743×10^9, 1.990×10^10, 9.508×10^10. Surface FTQC, MSF: 2.429×10^8, 2.781×10^8, 3.481×10^8. Physical: 1.402×10^5, 5.424×10^5, 2.132×10^6.]
FIG. 19. The quantum computing performance based on surface code quantum computing. (a) Time and (b) Qubits. The code distances are 25, 27 and 30, respectively. In (b), we also show the physical qubits required for a magic state factory.
One of the reasons why the surface code has attracted much attention is that it requires relatively little overhead. In what follows, we simply compare the Steane code and the surface code in terms of quantum resource, without considering their theoretical foundations. For a fair comparison, we assume that the error rate of the physical device is $10^{-9}$ in both cases. FIG. 20 shows the quantum computing time and the number of qubits needed to run the Shor algorithm. As mentioned before, the performance of Steane code quantum computing depends strongly on the quantum computer architecture. Therefore, to focus on the difference in quantum resource caused only by the fault-tolerant quantum computing scheme, we also include the case where non-local qubit interaction is allowed. As shown in the figure, for the small input size, Steane code quantum computing with non-local qubit interaction requires less time and fewer qubits than surface code quantum computing. This is because of the non-locality of multi-qubit operations. However, as the input size increases, surface code quantum computing shows better performance than Steane code quantum computing even when non-local qubit interaction is allowed.
9 Their implementation of a Toffoli gate completes the operation
within three measurement operations.
VII. USABILITY OF THE PROPOSED
FRAMEWORK
The objective of the proposed framework is to help design and analyze quantum computing. In this section, we show how to use it to analyze previously proposed high-performance quantum computing methods on a realistic quantum computer. The first is an efficient compile (Section VII A), the second is an improved physical gate (Section VII B), and the last is the strategy for fault tolerance (Section VII C).
A. Efficient Decomposition of Controlled-Rn
The authors of Ref. [23] proposed an efficient decomposition algorithm for a controlled-$R_n$ gate. By employing an ancilla qubit, they reduced the total number of gates from {H, S, T} from 35 (Ref. [24]) to 21. We show how this decomposition affects the execution time of the Shor algorithm. Even though the decomposition itself requires more qubits, it reduces the algorithm execution time and increases the fidelity of the quantum computation at the same time, so that fewer qubits are required in total.
FIG. 21 shows the performance improvement by the efficient compile in Steane code quantum computing. At the input size N = 128, the improved decomposition lowers the quantum computing time by more than 400-fold and the number of qubits by about 30-fold. The degree of improvement depends on the input size. As shown in the figure, the improvement is remarkable at input sizes where the required concatenation level is lowered by applying the proposed decomposition.
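The dependence of the improvement on the input size can be made concrete by taking ratios of the data labels of FIG. 21. The Python sketch below is only a worked check of those labels; the variable names are hypothetical.

```python
sizes = [128, 256, 512]
# Data labels of FIG. 21: naive vs. efficient controlled-Rn decomposition.
naive_time = [1.761e13, 2.661e14, 2.709e15]
efficient_time = [3.972e10, 9.230e13, 9.621e14]
naive_qubits = [1.468e11, 7.190e11, 3.437e12]
efficient_qubits = [4.868e9, 7.138e11, 3.424e12]

time_gain = {n: a / b for n, a, b in zip(sizes, naive_time, efficient_time)}
qubit_gain = {n: a / b for n, a, b in zip(sizes, naive_qubits, efficient_qubits)}
```

The gain at N = 128 is two orders of magnitude larger than at N = 256 and 512, where the time improves by less than a factor of 3 and the qubit count is almost unchanged.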
FIG. 22 shows the performance improvement by the efficient compile in surface code quantum computing. Unlike in Steane code quantum computing, the improvement in the quantum computing time increases gradually as the input size increases. This is because there is always a difference in the code distance: with the improved decomposition, the required code distance is lowered from 25, 27, 30 to 22, 24, 27, respectively.
[FIG. 20 (a), "Resource Compare: Steane FTQC and Surface FTQC": time (secs) versus input size N = 128, 256, 512; data labels in legend order. Surface FTQC: 4.712×10^6, 3.152×10^7, 2.611×10^8. Steane FTQC (1D Global, 2D Local): 1.785×10^13, 2.785×10^14, 3.194×10^15. Steane FTQC (All-to-All): 3.093×10^5, 2.336×10^8, 1.732×10^9.]
[FIG. 20 (b), same title: qubits versus input size; data labels in legend order. Surface FTQC: 7.574×10^7, 2.901×10^8, 1.112×10^9. Steane FTQC (1D Global, 2D Local): 5.643×10^10, 2.221×10^11, 7.602×10^11. Steane FTQC (All-to-All): 4.206×10^6, 4.882×10^8, 1.919×10^9.]
FIG. 20. A simple comparison of the quantum resource in Steane code quantum computing and surface code quantum computing. The physical error rate is 10^−9. The quantum computer architecture for Steane code quantum computing is the 1D global layout with the 2D local layout because, as shown in FIG. 17, this architecture shows the best performance in this work. We also include Steane code quantum computing with non-local qubit interaction because the performance of Steane code quantum computing depends significantly on the quantum computer architecture.
[FIG. 21 (a), "Efficient Controlled-Rn: Steane FTQC": time (secs) versus input size N = 128, 256, 512. Naive Controlled-Rn: 1.761×10^13, 2.661×10^14, 2.709×10^15. Efficient Controlled-Rn: 3.972×10^10, 9.230×10^13, 9.621×10^14.]
[FIG. 21 (b), same title: qubits versus input size. Naive Controlled-Rn: 1.468×10^11, 7.190×10^11, 3.437×10^12. Efficient Controlled-Rn: 4.868×10^9, 7.138×10^11, 3.424×10^12.]
FIG. 21. The performance comparison between a naive compile and the proposed compile under Steane-code-based quantum computing. (a) The algorithm execution time. (b) The required qubits.
B. Accurate Quantum Gates
So far, we have assumed a physical error rate of $10^{-9}$ for Steane code quantum computing and $10^{-3}$ for surface code quantum computing. In this section, we show what happens if a more accurate quantum device is available. To this end, we evaluate the performance at physical error rates from $10^{-9}$ to $10^{-15}$ for Steane code quantum computing and from $10^{-3}$ to $10^{-9}$ for surface code quantum computing.
FIG. 23 shows the performance of Steane code quantum computing over physical error rates from $10^{-9}$ to $10^{-15}$. We also compare physical quantum computing with fault-tolerant quantum computing at the physical error rate $10^{-15}$. The performance improvement obtained by lowering the error rate from $10^{-9}$ to $10^{-12}$ is highly nontrivial because the required concatenation level is reduced from 2 and 3 to 1. However, lowering the error rate further does not lead to better fault-tolerant quantum computing performance. In other words, fault-tolerant quantum computing at the physical error rate $10^{-15}$ does not show any advantage over that at the physical error rate $10^{-12}$. This is because, as the physical error rate is lowered, fault-tolerant quantum computing with the same concatenation level already achieves a very high fidelity ($> 0.9$). If two quantum computations are performed with the same concatenation level, they have the same single-round quantum computing time. In that case, if there is no big difference between their fidelities, the average quantum computing times $T_{avg}$ will be very similar.
[FIG. 22 (a), "Efficient Controlled-Rn: Surface FTQC": time (secs) versus input size N = 128, 256, 512. Naive Controlled-Rn: 5.602×10^7, 4.428×10^8, 3.163×10^9. Efficient Controlled-Rn: 1.151×10^7, 5.991×10^7, 2.546×10^8.]
[FIG. 22 (b), same title: qubits versus input size. Naive Controlled-Rn: 4.743×10^9, 1.990×10^10, 9.508×10^10. Efficient Controlled-Rn: 3.624×10^9, 1.543×10^10, 7.575×10^10.]
FIG. 22. The performance comparison between a naive compile and the proposed compile under surface-code-based quantum computing. (a) The algorithm execution time and (b) The required qubits.
For the same reason, at the physical error rate $10^{-15}$, physical quantum computing shows better performance than fault-tolerant quantum computing, because physical quantum computing already achieves a high fidelity. Empirically, $T_{one}$ of physical quantum computing at the error rate $10^{-15}$ is $6.89 \times 10^{8}$ with the fidelity 0.6433, whereas $T_{one}$ of fault-tolerant quantum computing is $7.78 \times 10^{10}$ with the fidelity 0.9999. Physical quantum computing therefore requires the smaller averaged time $T_{avg}$:
$$\frac{6.89 \times 10^{8}}{0.6433} < \frac{7.78 \times 10^{10}}{0.9999}.$$
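The comparison can be checked numerically with Eq. (5), keeping in mind that a smaller $T_{avg}$ means better performance. The Python sketch below uses only the four values quoted in the paragraph above; the variable names are hypothetical.

```python
def average_time(t_one, fidelity):
    """T_avg = T_one / F_alg, Eq. (5)."""
    return t_one / fidelity

t_avg_physical = average_time(6.89e8, 0.6433)
t_avg_fault_tolerant = average_time(7.78e10, 0.9999)
slowdown = t_avg_fault_tolerant / t_avg_physical
```

The resulting averaged times are about $1.07 \times 10^{9}$ and $7.78 \times 10^{10}$, which agree with the N = 512 data labels for the error rate $10^{-15}$ in FIG. 23 (a).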
FIG. 24 shows the performance improvement in surface code quantum computing over physical error rates from $10^{-3}$ to $10^{-9}$. The figure also compares the required code distance. As shown in the figure, as the physical error rate is lowered, the required code distance decreases and the performance therefore improves. However, once the code distance is already very low (4 or 5), there is little room for further performance improvement from better gates.
C. Degree of Fault Tolerance
The accuracy threshold theorem [2, 25] states that if the physical error rates of a quantum device are below a threshold, an arbitrarily long quantum computation is possible. By applying recursive concatenated coding [26], we can lower the effective error rate to a level where reliable quantum computing is possible. As we increase the concatenation level, the fidelity of a quantum computation improves. However, the duration of the quantum computation also increases with the concatenation level. Therefore, a higher concatenation level does not always yield better quantum computing. FIG. 25 shows that there exists a recommended concatenation level in Steane code quantum computing, in particular with respect to the quantum computing time. The number of qubits always becomes larger as the concatenation level increases.
In the case of surface code quantum computing, the performance depends entirely on the code distance. The code distance is determined so as to satisfy the target fidelity of a quantum computation, but in most cases the fidelity achieved with the chosen code distance exceeds the target fidelity. In this regard, when the averaged quantum computing time $T_{avg}$ is considered, the chosen code distance may not give the best quantum computing performance, as was shown in the Steane code case.
FIG. 26 shows that surface code quantum computing achieves
the best performance with a code distance of 31, whereas the
code distance determined by the equation is 30. Although the
code distance determined from the target fidelity 0.7 is 30,
the goal of a quantum computation is to find an exact answer,
not a probable one. By applying the code distance 31, we can
reduce the quantum computing time by about 1400 days compared
with the code distance 30, at the cost of additional qubits.
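The numbers behind this comparison can be checked directly from the data labels of FIG. 26. The sketch below is only an illustration: the helper names are hypothetical, and the implied fidelity is computed under the assumption that T_avg = T_one / F.

```python
SECONDS_PER_DAY = 86_400

# Values read from the data labels of FIG. 26 (Shor algorithm, N = 512), in seconds.
CODE_DISTANCES = [29, 30, 31, 32, 33, 34, 35]
T_AVG = [4.277e9, 3.163e9, 3.042e9, 3.084e9, 3.165e9, 3.256e9, 3.350e9]
T_ONE = [2.780e9, 2.875e9, 2.970e9, 3.065e9, 3.160e9, 3.255e9, 3.350e9]


def best_distance(distances, t_avg):
    """Return the code distance with the smallest average computing time."""
    return min(zip(distances, t_avg), key=lambda pair: pair[1])[0]


def implied_fidelity(t_one, t_avg):
    """Fidelity implied by the assumed relation T_avg = T_one / F."""
    return [one / avg for one, avg in zip(t_one, t_avg)]


def smallest_distance_meeting(target, distances, fidelities):
    """Return the first (smallest) distance whose fidelity reaches the target."""
    for d, f in zip(distances, fidelities):
        if f >= target:
            return d
    return None


def saving_in_days(d_from, d_to, distances, t_avg):
    """Average-time saving, in days, when moving from d_from to d_to."""
    lookup = dict(zip(distances, t_avg))
    return (lookup[d_from] - lookup[d_to]) / SECONDS_PER_DAY


fidelities = implied_fidelity(T_ONE, T_AVG)
print(smallest_distance_meeting(0.7, CODE_DISTANCES, fidelities))
print(best_distance(CODE_DISTANCES, T_AVG))
print(saving_in_days(30, 31, CODE_DISTANCES, T_AVG))
```

With these values, distance 30 is the smallest one whose implied fidelity reaches the target 0.7, while distance 31 minimizes T_avg; the difference between the two average times corresponds to the roughly 1400 days quoted above.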
VIII. DISCUSSION
We have proposed an integrated method for analyzing the
performance and the resource of quantum computing. In
particular, by considering the practical execution of a
quantum algorithm on quantum computer hardware with a
specific system architecture, we have obtained the most
realistic performance and resource estimates, which reflect
the combined effects of the fully decomposed algorithm, the
fault-tolerant scheme and the system architecture. To this
end, we have proposed and developed a quantum computing
framework composed of three functional layers, where each
layer plays a definite role in quantum computing. By
exploiting the framework, we can configure a quantum
computing model by selecting specific protocols and/or
properties. By doing so, we can analyze not only the
performance and resource of quantum computing, but also the
impact of specific components on the entire quantum
computation. For example, we have discussed the optimal
concatenation level and code distance of fault-tolerant
quantum computing. We believe such a discussion was possible
thanks to the proposed framework.
[FIG. 23: two plots titled "Accurate Quantum Gates: Steane FTQC", each against input size (N = 128, 256, 512) for pe = 10^{-9}, 10^{-12}, 10^{-15} (logical) and 10^{-15} (physical): (a) Time (secs), (b) Qubits.]
FIG. 23. The performance comparison over physical error rates 10^{-9} to 10^{-15} under Steane code based quantum computing.
(a) The algorithm execution time and (b) the required qubits. At the error rate 10^{-15}, as shown in this figure, fault-tolerant
quantum computing is not required.
The analysis results depend entirely on the protocols and
the properties of the physical device we employed. As shown
in the figures, the number of required qubits is enormous
and the execution time is very long, so the feasibility of
quantum computing appears poor from our analysis results.
However, we emphasize that the specific numbers are not so
important at this point. Readers should instead note the
trend of the analysis results as the input size of a quantum
algorithm increases. As quantum computing components are
improved, the analysis results will become better than those
shown in this work, but the trend may remain.
As mentioned above, the objective of the present work is to
provide the most realistic performance and resource
estimates of quantum computing. In addition, we believe the
proposed software framework can play a significant role in
practically running quantum computations on real quantum
computing hardware in the future, if some components are
added (see FIG. 1). For example, components to control a
real quantum device have to be added to the building block
layer. The system layer also requires functions that execute
a quantum algorithm and quantum error correction
efficiently. Besides, the existing components have to be
improved as much as possible.
ACKNOWLEDGMENTS
This work was supported by Electronics and Telecom-
munications Research Institute (ETRI) grant funded by
the Korean government [18ZH1400, Research and Devel-
opment of Quantum Computing Platform and its Cost-
Effectiveness Improvement].
[1] Quantum Computing — Intel Newsroom.
https://newsroom.intel.com/press-kits/quantum-computing.
[2] Dorit Aharonov and Michael Ben-Or. Fault-Tolerant
Quantum Computation with Constant Error Rate. SIAM
Journal on Computing, 38(4):1–241, July 2008.
[3] H Bombín and M A Martin-Delgado. Optimal resources
for topological two-dimensional stabilizer codes:
Comparative study. Physical Review A, 76(1):012305, July
2007.
[4] Héctor Bombín. Single-Shot Fault-Tolerant Quantum
Error Correction. Physical Review X, 5(3):031043,
September 2015.
[5] Andrew W Cross, Lev S Bishop, John A Smolin, and
Jay M Gambetta. Open Quantum Assembly Language.
https://arxiv.org/abs/1707.03429.
[6] Austin G Fowler. Low-overhead surface code logical
hadamard. Quantum Information and Computation,
12(11&12):970–982, August 2012.
[7] Austin G Fowler, Simon J Devitt, and Cody Jones. Sur-
face code implementation of block code state distillation.
Scientific Reports, 3(1):022316–6, June 2013.
[8] Austin G Fowler, Matteo Mariantoni, John M Martinis,
and Andrew N Cleland. Surface codes: Towards practical
large-scale quantum computation. Physical Review A,
86(3):032324, September 2012.
[9] Austin G Fowler, Ashley M Stephens, and Peter
Groszkowski. High-threshold universal quantum com-
putation on the surface code. Physical Review A,
80(5):052312, November 2009.
[10] Hayato Goto. Minimizing resource overheads for fault-
tolerant preparation of encoded states of the Steane code.
Scientific Reports, 5:1–7, January 2016.
[11] Markus Grassl, Brandon Langenberg, Martin Roetteler,
and Rainer Steinwandt. Applying Grover’s algorithm to AES:
quantum resource estimates.
https://arxiv.org/abs/1512.04965
[FIG. 24: three plots titled "Accurate Quantum Gates: Surface FTQC", each against input size (N = 128, 256, 512) for pe = 10^{-3}, 10^{-5}, 10^{-7} and 10^{-9}: (a) Code Distance, (b) Time (secs), (c) Qubits.]
FIG. 24. The performance comparison over physical error
rates 10^{-3} to 10^{-9} under surface code based quantum
computing. (a) The code distance, (b) the algorithm
execution time and (c) the required qubits.
[12] Alexander S Green, Peter LeFanu Lumsdaine, Neil J
Ross, Peter Selinger, and Benoit Valiron. An Introduc-
tion to Quantum Programming in Quipper. In Reversible
Computation, pages 1–15, May 2013.
[13] Alexander S Green, Peter LeFanu Lumsdaine, Neil J
Ross, Peter Selinger, and Benoit Valiron. Quipper. In
the 34th ACM SIGPLAN conference, pages 333–342, New
York, New York, USA, 2013. ACM Press.
[14] Alexander S Green, Peter LeFanu Lumsdaine, Neil J
Ross, Peter Selinger, and Benoit Valiron. Quipper: A
Scalable Quantum Programming Language. In the 34th
ACM SIGPLAN Conference on Programming Language
Design and Implementation, pages 333–342, April 2013.
[FIG. 25: plot titled "Degree of Fault Tolerance: Steane FTQC". Time (secs) versus Concatenation Level (1 to 5). Series: Average Computing Time, T_avg; Single Round Computing Time, T_one.]
FIG. 25. The quantum computing time of Shor algorithm
with input size N = 512. We have evaluated the quantum
computing performance T_one and T_avg for the concatenation
levels 1 to 5. From the concatenation level 3 onward, the
fidelity of a quantum computation is almost 1 and therefore
the average computing time closely approaches the single
round computing time. When the concatenation level is 1,
the fidelity of a quantum computation almost vanishes and
therefore the average computing time becomes extremely large.
[FIG. 26: plot titled "Degree of Fault Tolerance: Surface FTQC". Time (secs) versus Code Distance (29 to 35). Series: Average Computing Time, T_avg; Single Round Computing Time, T_one.]
FIG. 26. The quantum computing time of Shor algorithm
with input size N = 512. We have evaluated the quantum
computing performance T_one and T_avg by varying the code
distance from 29 to 35. The calculated code distance for
the target fidelity is 30. As shown in the figure, the code
distance 31 gives the best quantum computing performance.
By taking the code distance 31, we can reduce the quantum
computing time by about 1400 days compared with the code
distance 30, at the cost of additional qubits.
[15] IBM Quantum Experience.
https://quantumexperience.ng.bluemix.net/qx
[16] ScaffCC. https://github.com/ScaffCC/ScaffCC
[17] Ali JavadiAbhari, Arvin Faruque, Mohammad Javad
Dousti, Lukas Svec, Oana Catu, Amlan Chakrabarti,
Chen-Fu Chiang, Seth Vanderwilt, John Black, Fred
Chong, Margaret Martonosi, Martin Suchara, Ken
Brown, Massoud Pedram, and Todd Brun. Scaffold:
Quantum Programming Language. Technical Report
TR-934-12, June 2012.
[18] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff
Heckey, Alexey Lvov, Frederic T Chong, and Margaret
Martonosi. ScaffCC: a framework for compilation and
analysis of quantum computing programs. ACM, New York,
New York, USA, May 2014.
[19] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff
Heckey, Alexey Lvov, Frederic T Chong, and Margaret
Martonosi. ScaffCC: Scalable compilation and analysis
of quantum programs. Parallel Computing, 45(C):2–17,
June 2015.
[20] Cody Jones. Multilevel distillation of magic states for
quantum computing. Physical Review A, 87(4):042305,
April 2013.
[21] N Cody Jones, Rodney Van Meter, Austin G Fowler, Pe-
ter L McMahon, Jungsang Kim, Thaddeus D Ladd, and
Yoshihisa Yamamoto. Layered Architecture for Quantum
Computing. Physical Review X, 2(3):031007, July 2012.
[22] Julian Kelly. Google AI Blog.
https://ai.googleblog.com, March 2018.
[23] Taewan Kim and Byung-Soo Choi. Efficient decompo-
sition methods for controlled-Rn using a single ancillary
qubit. Scientific Reports, pages 1–7, March 2018.
[24] Vadym Kliuchnikov, Dmitri Maslov, and Michele Mosca.
Fast and efficient exact synthesis of single qubit unitaries
generated by Clifford and T gates. Quantum Information
and Computation, 13(7, 8):607–630, March 2013.
[25] E Knill, Raymond Laflamme, and W Zurek.
Threshold Accuracy for Quantum Computation.
https://arxiv.org/abs/quant-ph/9610011
[26] Emanuel Knill and Raymond Laflamme.
Concatenated Quantum Codes.
https://arxiv.org/abs/quant-ph/9608012
[27] Chia-Chun Lin, Amlan Chakrabarti, and Niraj K Jha.
FTQLS: Fault-Tolerant Quantum Logic Synthesis. IEEE
Transactions on Very Large Scale Integration (VLSI)
Systems, 22(6):1350–1363, April 2014.
[28] Microsoft. Quantum Development Kit.
https://www.microsoft.com/en-us/quantum/development-kit
[29] Sreraman Muralidharan, Linshu Li, Jungsang Kim,
Norbert Lütkenhaus, Mikhail D Lukin, and Liang Jiang.
Optimal architectures for long distance quantum
communication. Scientific Reports, pages 1–10, January 2016.
[30] John Napp and John Preskill. Optimal Bacon-
Shor Codes. Quantum Information and Computation,
13(5&6):490–510, September 2012.
[31] Michael A Nielsen and Isaac L Chuang. Quantum Com-
putation and Quantum Information. Cambridge Univer-
sity Press, Cambridge, January 2000.
[32] Quantum Computing Report. Quan-
tum Computing Startup Companies.
https://quantumcomputingreport.com
[33] Robert Raussendorf and Jim Harrington. Fault-Tolerant
Quantum Computation with High Threshold in Two Di-
mensions. Physical Review Letters, 98(19):190504, May
2007.
[34] Markus Reiher, Nathan Wiebe, Krysta M Svore, Dave
Wecker, and Matthias Troyer. Elucidating reaction mech-
anisms on quantum computers. Proceedings of the Na-
tional Academy of Sciences, 114(29):201619152–6, July
2017.
[35] Neil J Ross and Peter Selinger. Optimal ancilla-free Clif-
ford+T approximation of z-rotations. Quantum Informa-
tion and Computation, 16(11&12):901–953, June 2016.
[36] Peter W Shor. Fault-Tolerant Quantum Computation. In
FOCS Proceedings of the 37th Annual Symposium on
Foundations of Computer Science, page 56. IOP Publishing,
July 1996.
[37] Jonathan M Smith, Neil J Ross, Peter Selinger,
and Benoit Valiron. Quipper: Concrete Resource
Estimation in Quantum Algorithms.
https://arxiv.org/abs/1412.0625
[38] Andrew Steane. Multiple-particle interference and quan-
tum error correction. Proceedings of the Royal Soci-
ety A: Mathematical, Physical and Engineering Sciences,
452(1954):2551–2577, November 1996.
[39] Andrew M Steane. Overhead and noise threshold of fault-
tolerant quantum error correction. Physical Review A,
68(4):042322, October 2003.
[40] Damian S Steiger, Thomas Haner, and Matthias Troyer.
ProjectQ: An Open Source Software Framework for
Quantum Computing. Quantum, 2:49, January 2018.
[41] Martin Suchara, Arvin Faruque, Ching-Yi Lai, Gerardo
Paz, Frederic Chong, and John D Kubiatowicz. Esti-
mating the Resources for Quantum Computation with
the QuRE Toolbox. Technical Report UCB/EECS-2013-
119, EECS Department, University of California, Berke-
ley, May 2013.
[42] Martin Suchara, John Kubiatowicz, Arvin Faruque,
Frederic T Chong, Ching-Yi Lai, and Gerardo Paz.
QuRE: The Quantum Resource Estimator toolbox. In
2013 IEEE 31st International Conference on Computer
Design (ICCD), pages 419–426. IEEE, October 2013.
[43] K M Svore, A V Aho, A W Cross, I Chuang, and I L
Markov. A layered software architecture for quantum
computing design tools. Computer, 39(1):74–83, January
2006.
[44] Krysta M Svore, David P DiVincenzo, and Barbara M
Terhal. Noise Threshold for a Fault-Tolerant Two-
Dimensional Lattice Architecture. Quantum Information
and Computation, 7(4):297–318, April 2007.
[45] Rodney Van Meter, Thaddeus D Ladd, Austin G Fowler,
and Yoshihisa Yamamoto. Distributed Quantum Com-
putation Architecture using semiconductor nanophoton-
ics. International Journal of Quantum Chemistry, pages
1–29, September 2009.
[46] D S Wang, Austin G Fowler, A M Stephens, and Lloyd
C L Hollenberg. Threshold error rates for the toric and
planar codes. Quantum Information and Computation,
10(5&6):456–469, May 2010.
[47] Y S Weinstein. How often must we apply syndrome mea-
surements? In Eric Donkor, Andrew R Pirich, and
Michael Hayduk, editors, SPIE Sensing Technology +
Applications, pages 95000Q–7. SPIE, May 2015.
[48] Yaakov S Weinstein and Sidney D Buchbinder. Use of
Shor states for the [7,1,3] quantum error-correcting code.
Physical Review A, 86(5):052336, November 2012.
[49] Mark G Whitney, Nemanja Isailovic, Yatish Patel, and
John Kubiatowicz. A fault tolerant, area efficient archi-
tecture for Shor’s factoring algorithm. In the 36th annual
international symposium, pages 383–12, New York, New
York, USA, 2009. ACM Press.
[50] Mingsheng Ying, Shenggang Ying, and Xiaodi Wu. In-
variants of quantum programs: characterisations and
generation. In the 44th ACM SIGPLAN Symposium,
pages 818–832, New York, New York, USA, 2017. ACM Press.
