High-Performance Computing with Quantum Processing Units by Britt, Keith A. & Humble, Travis S.
High-Performance Computing with Quantum Processing Units
KEITH A. BRITT
University of Tennessee and Oak Ridge National Laboratory∗
TRAVIS S. HUMBLE
Oak Ridge National Laboratory and University of Tennessee†
(Dated: 13 November 2015)
arXiv:1511.04386v1 [cs.ET] 13 Nov 2015
The prospects of quantum computing have driven efforts to realize fully functional quantum
processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the
question of how to integrate these emerging processors into modern high-performance computing
(HPC) systems. We examine how QPUs can be integrated into current and future HPC system
architectures by accounting for functional and physical design requirements. We identify two in-
tegration pathways that are differentiated by infrastructure constraints on the QPU and the use
cases expected for the HPC system. This includes a tight integration that assumes infrastructure
bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that
the performance of both approaches is likely to depend on the quantum interconnect that serves
to entangle multiple QPUs. We also identify several challenges in assessing QPU performance for
HPC, and we consider new metrics that capture the interplay between system architecture and the
quantum parallelism underlying computational performance.
I. INTRODUCTION
High-performance computing (HPC) has historically
taken advantage of new processing paradigms by leveraging
special-purpose accelerators. This includes the
use of arithmetic logic units and floating-point units
in early processor architectures as well as more recent
vector processors capable of single-instruction multiple-data
(SIMD) parallelization. Current interest in graphics
processing units (GPUs) is another example of the
ongoing trend in accelerator use for HPC development.
A primary motivation for the accelerator paradigm is
that low-level processes can take advantage of special-
ized hardware while minimizing changes to overall pro-
gram structure [1]. This approach isolates the need for
program or algorithm refactoring to those workloads spe-
cific to the hardware accelerator [2]. A secondary motiva-
tion is that the accelerator model offers an opportunity to
take advantage of emerging technologies while also miti-
gating the technical risk to system development. Current
limitations on processor frequency, communication bandwidth,
physical scale, energy consumption, and hardware
reliability make it advantageous for HPC designers to
anticipate new technologies and to leverage architectures
that support innovative platforms. Future efforts to
realize HPC beyond the current exascale target are likely
to require such innovations [3–5].
∗Electronic address: keithbritt@utk.edu
†Electronic address: humblets@ornl.gov; This manuscript has
been authored by UT-Battelle, LLC, under Contract No.
DE-AC05-00OR22725 with the U.S. Department of Energy. The United
States Government retains and the publisher, by accepting the
article for publication, acknowledges that the United States
Government retains a non-exclusive, paid-up, irrevocable, world-wide
license to publish or reproduce the published form of this manuscript,
or allow others to do so, for the United States Government
purposes. The Department of Energy will provide public access to
these results of federally sponsored research in accordance with the
DOE Public Access Plan.
The search for technology paths that lead to perfor-
mance beyond exascale may require alternative computa-
tional models. This is because some problems are better
suited to computational models other than the standard
Turing machine model. In particular, quantum comput-
ing has attracted significant interest due to theoretical
results that exponential reductions in algorithmic com-
plexity of some problems are possible relative to the best
known conventional algorithms [6]. This includes the fac-
torization of integers, a staple of public-key cryptogra-
phy [7]; ab initio calculations of electronic structure in
chemistry and physics [8]; and scattering amplitudes of
particles in high-energy physics [9]. These algorithmic
speedups are achieved by leveraging the unique features
of quantum mechanics, namely: superposition, entangle-
ment, and intrinsic randomness. The basic principles of
quantum computing have been demonstrated in small-
scale experimental systems and there is an on-going,
global effort to develop large-scale quantum computing
platforms.
The opportunities afforded by quantum computing
represent a challenge to the HPC accelerator model,
which previously has been practiced exclusively within
the setting of the classical, deterministic Turing model.
By contrast, many quantum algorithms lack clearly de-
fined kernels that can be off-loaded to a quantum compu-
tational accelerator. The mixture of computational mod-
els also stymies efforts to leverage conventional notions
of SIMD and multiple-instruction multiple-data (MIMD)
parallelism. Parallel computing typically makes use of
domain decomposition [10] whereas quantum algorithms
frequently intentionally avoid this type of problem
partitioning [11, 12]. In addition, domain decomposition
exposes an interface between classical and quantum
computational models that is not yet well defined. Translation
between computational models may be theoretically pos-
sible but making these interfaces efficient and robust is
an outstanding concern.
As constraints on the physics underlying quantum
computation limit how these resources may be used, it is
unclear if and how emerging quantum processors will be-
come compatible with existing or future HPC platforms.
We analyze the integration of these quantum processing
units (QPUs) for modern HPC architectures. We place
an emphasis on conceptual differences between the con-
ventional and quantum computing models that may be
expected to challenge integration. Our analysis examines
two pathways that lead to several abstract machine archi-
tectures. We identify those distinguishing features that
QPUs may be expected to exhibit and the dimensions
that will be most useful for characterizing their perfor-
mance metrics within a hybrid system.
The paper is organized as follows. In Sec. II, we charac-
terize the features of a QPU, briefly summarize its oper-
ating principles, and identify requirements for operation
that influence HPC integration. In Sec. III, we examine
three multi-processing models for adopting QPUs into
conventional HPC systems and the architectures that
arise from them. In Sec. IV, we describe the need for
both standardized and unique performance metrics
to characterize HPC with quantum accelerators. We
offer final remarks in Sec. V.
II. QUANTUM PROCESSING UNIT (QPU)
We define a quantum processing unit (QPU) to be a
computational unit that uses quantum computing prin-
ciples to perform a task. As the operating principles of a
QPU are based on quantum mechanics, there are several
unique features that do not have analogs in conventional
computing platforms. Foremost, a QPU stores computa-
tional states in the form of a quantum mechanical state.
While a quantum state is formally defined as a unit vector
in a finite-dimensional Hilbert space [13], it must also be
interpreted as the data processed by the QPU. The sim-
plest and most frequently used example is a qubit, which
expresses a state within a two-dimensional Hilbert space.
These states are stored in quantum physical systems. For
the purpose of clarity, we define a quantum register as an
addressable array of two-level quantum physical systems.
We will refer to an individual system within the register
as a quantum register element and we will assume that
each register element can store a qubit of information.
We may refer to the size of the register by the number of
qubits that it can store, e.g., an n-qubit register.
The computational space available to a quantum register
scales exponentially with its size. Like an n-bit
register, an n-qubit register is capable of representing all
$2^n$ computational states. However, the quantum register
is also capable of representing superpositions of these
states, including superpositions that exhibit entanglement.
Fundamentally, entanglement is a limitation on the ability to
describe states of a register solely by specifying the value
of each register element. This is in stark contrast to
classical models of computation and leads to a descrip-
tion of what has been called the ‘inherent parallelism’ of
quantum computing [14]. This inseparability of register
states manifests as perfect correlations during computa-
tion, and many quantum algorithms take advantage of
entanglement to realize computational speed ups [15].
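The distinction between a superposition and an entangled state can be made concrete with a small sketch. The code below is an illustration only, written in plain Python; the function names are hypothetical and not part of any QPU interface. It stores an n-qubit register state as a list of 2^n complex amplitudes and, for the two-qubit case, tests whether the state factors into independent register-element states.

```python
import math

def basis_state(n, index):
    """Return the n-qubit computational basis state |index> as 2**n amplitudes."""
    state = [0j] * (2 ** n)
    state[index] = 1 + 0j
    return state

def norm(state):
    """Euclidean norm of the amplitude vector; a valid state has norm 1."""
    return math.sqrt(sum(abs(a) ** 2 for a in state))

def is_product_state(state, tol=1e-12):
    """Two-qubit test only: a00|00> + a01|01> + a10|10> + a11|11> factors
    into two single-qubit states exactly when a00*a11 - a01*a10 == 0."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]        # (|00> + |11>)/sqrt(2): entangled
plus_zero = [s, 0, s, 0]   # (|0> + |1>)/sqrt(2) on the first qubit, |0> on the second

print(len(basis_state(3, 5)))       # 8 amplitudes for a 3-qubit register
print(is_product_state(bell))       # False: no per-element description exists
print(is_product_state(plus_zero))  # True: a superposition that is not entangled
```

The second example is a superposition whose register elements can still be described one at a time, whereas the first cannot.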
Operations on the quantum register are realized using
gates. As in conventional computing, quantum gates
correspond to well-defined transformations of the
computational state. When the register is prepared in a
superposition state, operations effectively act on multiple
computational states in parallel. This may be viewed
as a quantum variant of conventional SIMD processing.
However, quantum computing makes use of gates that
are either unitary transforms of the register elements or
projective measurements. Only the latter gates prepare
the state of the quantum register in a well-defined clas-
sical value, e.g., either a 0 or 1 for each register element.
For a unitary gate, the value of the register remains in
a superposition of computational states and serves as an
intermediate computation. Ultimately, the solution to
a computation is recovered when a projective measure-
ment gate is applied to the register. The resulting bit
string must then be stored in a classical register within
the QPU.
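As a minimal sketch of the two kinds of gates described above, the following plain-Python state-vector model applies a unitary gate (the Hadamard gate) to each element of a two-qubit register and then performs a projective measurement that yields a classical bit string. The helper names and the simulation approach are assumptions made for illustration; a physical QPU does not store amplitudes explicitly.

```python
import math
import random

def apply_single_qubit_gate(state, gate, target, n):
    """Apply a 2x2 unitary `gate` to qubit `target` (0 = most significant bit)
    of an n-qubit state vector. Every basis state is processed in one pass."""
    shift = n - 1 - target
    new_state = [0j] * len(state)
    for i, amp in enumerate(state):
        in_bit = (i >> shift) & 1
        for out_bit in (0, 1):
            j = (i & ~(1 << shift)) | (out_bit << shift)
            new_state[j] += gate[out_bit][in_bit] * amp
    return new_state

def measure_all(state, rng):
    """Projective measurement of every register element. Returns a classical
    bit string sampled with probability |amplitude|**2."""
    n = len(state).bit_length() - 1
    probs = [abs(a) ** 2 for a in state]
    index = rng.choices(range(len(state)), weights=probs)[0]
    return format(index, "0{}b".format(n))

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]  # Hadamard gate

n = 2
state = [1, 0, 0, 0]                    # register prepared in |00>
for q in range(n):
    state = apply_single_qubit_gate(state, H, q, n)
# The register now holds an equal superposition of all four basis states.

rng = random.Random(0)
bits = measure_all(state, rng)          # one of "00", "01", "10", "11"
print(bits)
```

Only the call to `measure_all` produces a well-defined classical value; the intermediate state after the Hadamard gates remains a superposition.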
Quantum computational models define how the regis-
ters and gates within a QPU realize quantum computa-
tion. While all the models offer identical computational
power from a complexity perspective, they do differ with
respect to hardware implementations and principles of
operation. For example, the quantum circuit model is
closely related to the conventional representations of clas-
sical circuits, as it uses sequences of discrete gates acting
on registers to generate a series of computational states.
By contrast, the adiabatic quantum computing model
uses a continuous, time-dependent transformation of the
interactions between register elements to evolve the com-
putational state toward a solution. For example, recent
special-purpose processors within the adiabatic quantum
computing model implement quantum optimization us-
ing a single instruction that is tunable in duration [16].
Across all computational models, QPUs require precise
control over the quantum physical degrees of freedom
defining register elements. There is an ongoing effort to
demonstrate proof-of-principle registers and gates within
a variety of physical systems, including silicon donor sys-
tems [17], trapped ions [18], and superconducting circuits
[19]. A specific focus has been on realizing high-fidelity
implementations that can support fault-tolerant opera-
tion. In addition, there has been some work to design
the physical layout and instruction architectures for cer-
tain technologies [20–25].
A typical usage case for a QPU begins by preparing
the quantum register in a well-defined initial state and
then applying a sequence of gates that may act on individual
or multiple register elements. This is commonly
referred to as the QRAM model, first articulated by Knill
[26]. We define a QPU to include a QRAM that applies
low-level gates to register elements, a quantum control
unit (QCU) that parses programs into instructions, and
a classical controller interface that defines how the CPU
within a host system interacts with the QPU. The ex-
act sequence of gates is determined by the host program,
which the QCU parses into an intermediate representa-
tion using an instruction set architecture (ISA). The ISA
represents a set of high-level instructions that are avail-
able for programming the QRAM. Instruction sequences
are generated when a compiled program is decoded by
the QCU, after which the instructions are parsed by the
QRAM into gates, i.e., machine codes, that are specific
to the QPU technology base. At present, there are a
variety of technologies under consideration for quantum
computing and they each support different gate opera-
tions. The ISA provides a framework for standardizing
the interface between different QPU technologies.
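The two-stage decoding described above can be sketched as follows. This is an illustration under stated assumptions: the textual program format, the instruction names, and the native gate set are all hypothetical, and the function names `qcu_decode` and `qram_lower` are invented here. The only physics used is the standard identity that a SWAP can be realized with three CNOT gates.

```python
# Hypothetical native gate set of one QPU technology.
NATIVE_GATES = {"H", "CNOT", "MEASURE"}

def qcu_decode(program_text):
    """QCU stage: parse a textual program into ISA-level instructions,
    represented as tuples such as ("SWAP", 0, 1)."""
    instructions = []
    for line in program_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        op, *operands = line.split()
        instructions.append((op.upper(), *map(int, operands)))
    return instructions

def qram_lower(instruction):
    """QRAM stage: translate one ISA instruction into native gates."""
    op, *qubits = instruction
    if op in NATIVE_GATES:
        return [instruction]
    if op == "SWAP":
        a, b = qubits
        # SWAP(a, b) is equivalent to three alternating CNOT gates.
        return [("CNOT", a, b), ("CNOT", b, a), ("CNOT", a, b)]
    raise ValueError("unsupported instruction: {}".format(op))

program = """
H 0
SWAP 0 1   # not native on this hypothetical device
MEASURE 1
"""

gates = [g for instr in qcu_decode(program) for g in qram_lower(instr)]
print(gates)
```

A different technology would supply a different `NATIVE_GATES` set and lowering rules while the ISA-level program stays the same, which is the standardizing role attributed to the ISA in the text.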
FIG. 1: A sequence diagram modeling the interactions be-
tween a host CPU and a QPU. The QPU interface defines
the internal QRAM model and drives gates operating on the
register. The QCU parses the incoming instructions and the
outgoing response according to the computational model and
device physics.
Applying the QRAM model within the QPU context
defines an interface with the classical controller allocated
to the host CPU (see Fig. 1). The CPU tasks the QPU by
submitting a program and then waits for the reply. These
tasks must represent quantum computational workloads
that can be parsed out by the QCU, where the interface
may be implemented in software or hardware. Development
of these interfaces remains an open problem, although
recent progress has been made in defining quantum
programming languages for this purpose [27–30]. These
languages expose the gates and registers needed
for programming low-level quantum algorithms, and we
expect future generations to address
additional data structures and instructions. Interfaces
that encapsulate and mask quantum hardware details
from the software developer are especially important for
maintaining existing applications across a variety of HPC
environments. As an example, an application programming
interface (API) that requires software developers to
integrate quantum processing instructions directly into the
code base is likely to hinder adoption because of the
burden of code rewriting.
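One way to picture such an encapsulating interface is a library routine whose signature does not change when the backend does. The sketch below is purely illustrative: the class names are hypothetical, and the QPU backend is a stub that falls back to classical trial division so that the example runs. It does not implement a quantum factoring algorithm.

```python
def trial_division(n):
    """Smallest factor of n greater than 1, found by classical trial division."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n  # n itself is prime

class CPUBackend:
    """Conventional implementation."""
    def smallest_factor(self, n):
        return trial_division(n)

class StubQPUBackend:
    """Placeholder for a QPU-backed implementation. A real backend would
    submit a quantum program through the QPU interface; this stub simply
    reuses the classical routine (an assumption made to keep the sketch runnable)."""
    def smallest_factor(self, n):
        return trial_division(n)

def factorize(n, backend):
    """Application-facing routine. Its code is identical for every backend,
    so existing applications need no quantum-specific changes."""
    factors = []
    while n > 1:
        f = backend.smallest_factor(n)
        factors.append(f)
        n //= f
    return factors

print(factorize(84, CPUBackend()))      # [2, 2, 3, 7]
print(factorize(84, StubQPUBackend()))  # same call, different backend
```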
In addition to local operations driven by a host CPU,
a QPU may also interact directly with other QPUs. This
may be necessary to communicate a computational state
between QPUs or to prepare both registers in a mutu-
ally entangled state. These operations require the pres-
ence of a quantum network, which uses quantum physical
systems to communicate quantum states between regis-
ters. However, a notable feature of quantum computing
is that intermediate computations cannot be copied dur-
ing the communication. This is a consequence of the
no-cloning theorem from quantum mechanics that limits
the precision with which arbitrary quantum states can
be duplicated. Instead, communication between QPUs
must use either direct transmission or teleportation [13].
Direct transmission transfers the value of the first regis-
ter over the quantum network by using a mobile quan-
tum carrier. Upon receiving the transmission, the second
QPU swaps this state into its register. This approach
most closely resembles conventional read-write communi-
cation in an HPC network. But quantum communication
also supports teleportation, which allows for the value
of a first register to be transferred to a second register
without passing through the quantum network. Instead,
teleportation uses pre-existing entanglement between the
QPUs to perform the data transfer. Teleportation also
requires transmission of classical side-channel information
from the first QPU to the second; however, this classical
information is generally much less than the information
needed to describe the value of the quantum register.
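The teleportation protocol can be sketched with a three-qubit state-vector simulation. This is an illustrative model in plain Python with hypothetical helper names, not a description of any particular interconnect. Qubit 0 holds the state to be sent, and qubits 1 and 2 hold a pre-shared entangled pair, with qubit 2 on the receiving QPU. Only the two measured bits cross the classical side channel.

```python
import math
import random

N = 3  # qubit 0: message; qubit 1: sender half of pair; qubit 2: receiver half

def bit(i, q):
    """Value of qubit q (0 = most significant) in basis index i."""
    return (i >> (N - 1 - q)) & 1

def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q."""
    mask = 1 << (N - 1 - q)
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        b = bit(i, q)
        out[i & ~mask] += gate[0][b] * amp
        out[i | mask] += gate[1][b] * amp
    return out

def apply_cnot(state, control, target):
    """Flip `target` on every basis state where `control` is 1."""
    mask = 1 << (N - 1 - target)
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        out[i ^ mask if bit(i, control) else i] += amp
    return out

def measure(state, q, rng):
    """Projectively measure qubit q; return (outcome, collapsed state)."""
    p1 = sum(abs(a) ** 2 for i, a in enumerate(state) if bit(i, q))
    outcome = 1 if rng.random() < p1 else 0
    scale = math.sqrt(p1 if outcome else 1 - p1)
    return outcome, [a / scale if bit(i, q) == outcome else 0j
                     for i, a in enumerate(state)]

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def teleport(a, b, rng):
    """Teleport a|0> + b|1> from qubit 0 to qubit 2."""
    state = [0j] * 8
    for q0, amp in ((0, a), (1, b)):
        for pair in (0b00, 0b11):          # pre-shared (|00> + |11>)/sqrt(2)
            state[(q0 << 2) | pair] = amp * r
    state = apply_cnot(state, 0, 1)
    state = apply_1q(state, H, 0)
    m0, state = measure(state, 0, rng)     # two classical bits are produced
    m1, state = measure(state, 1, rng)
    if m1:                                 # receiver corrections, driven only
        state = apply_1q(state, X, 2)      # by the classical side channel
    if m0:
        state = apply_1q(state, Z, 2)
    base = (m0 << 2) | (m1 << 1)
    return (m0, m1), (state[base], state[base | 1])

side_channel, received = teleport(0.6, 0.8j, random.Random(1))
print(side_channel, received)
```

In this sketch, two classical bits accompany each teleported qubit, whereas a classical description of a general n-qubit register requires 2^n complex amplitudes.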
III. QPU INTEGRATION STRATEGIES
The simplified CPU-QPU execution model presented
in Sec. II offers a variety of different integration strate-
gies for the development of large-scale hybrid computing
systems. A significant obstacle to integration is the phys-
ical hardware requirements of current experimental quan-
tum computing devices. Many technologies that could be
used to realize a fully functional QPU currently require
bulky and costly infrastructure. This includes the use
of dilution refrigerators to suppress thermal noise, elec-
tromagnetic shielding to avert ambient energy, and ultra-
high vacuum enclosures to prevent device contamination.
In addition, most devices require relatively complex elec-
tronic and optical control systems that must cross the
physical barriers to the processor. An exemplary system
schematic is shown in Fig. 2.
We anticipate that QPU requirements will ease with
future device development and refined engineering prin-
ciples. For example, recent work on ultra-cold operation
of FPGAs to drive silicon qubits suggests there is a path
toward integration of the QPU control interface within
the dilution refrigerator [31]. Similarly, progress in the
miniaturization of electronic controls for linear optical
quantum chips hints at scalable operations in the future
[32]. However, these devices still remain far from the
typical hardware environments on which modern HPC
systems have been built, namely, room temperature
operation, direct interaction with the host CPU, and
easily managed footprints for individual processors.
Consequently, integration opportunities for QPUs naturally
separate into loosely and tightly bound systems.
FIG. 2: Asymmetric multi-processor architecture for integrating
a stand-alone quantum computer (QC) with an HPC system.
We highlight components of the QC system that represent
the substantial infrastructure required for interfacing
and controlling the QPU.
In the loose integration path, QPUs remain as isolated
operational elements that must interact with a host HPC
system using a network interface. This is effectively a
client-server model as shown in Fig. 3 where the quan-
tum computing (QC) server may either be on a dedicated
network or part of a larger computational grid. In this
asymmetric multi-processor model, the network commu-
nicates requests between the host (client) system and the
QC server. As indicated in Fig. 3, the QC server may host
multiple QPUs and these may interact via a quantum in-
terconnect. However, the entry point into the system
remains the primary bottleneck. This connection can be
streamlined when both systems are within a local area
network, but access to each QPU must still be provi-
sioned by the QC control system. This control system
may appear as a switch that forwards program data to
individual QPUs, or it may more intelligently route pro-
grams based on QPU usage and demand.
The demands of the client-server model in Fig. 3 force
a trade-off between the communication latency and the
computational speed-up gained from using a QPU in-
stead of a CPU (or some other local resource). This
trade-off is advantageous when the communication time
is offset by the QPU speed up, but this will depend
on problem type as well as size. Moreover, evaluation
of the model is complicated by the communication pat-
terns arising from multiple CPU nodes and any latency
they may experience from resource competition. There-
fore, the client-server model is likely to be broadly use-
ful only when the computational gain over conventional
approaches is significant. This adds emphasis to the im-
portance of the underlying quantum algorithm.
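The trade-off can be written as a simple break-even condition. The sketch below is a back-of-the-envelope model under the assumption that the remote path costs the sum of round-trip communication time, queueing delay from resource competition, and QPU execution time; the function and parameter names are hypothetical and the timings are made-up inputs, not measurements.

```python
def remote_time(t_qpu, t_comm, t_queue=0.0):
    """Total wall-clock time (seconds) for a task sent to the QC server."""
    return t_comm + t_queue + t_qpu

def offload_is_worthwhile(t_cpu, t_qpu, t_comm, t_queue=0.0):
    """True when the remote QPU path beats local execution on the CPU."""
    return remote_time(t_qpu, t_comm, t_queue) < t_cpu

def effective_speedup(t_cpu, t_qpu, t_comm, t_queue=0.0):
    """Speed-up actually observed by the client, including latency."""
    return t_cpu / remote_time(t_qpu, t_comm, t_queue)

# Small problem: latency dominates and the raw QPU advantage is lost.
print(offload_is_worthwhile(t_cpu=0.010, t_qpu=0.001, t_comm=0.050))   # False
# Large problem: the same latency is negligible against the CPU time.
print(offload_is_worthwhile(t_cpu=100.0, t_qpu=1.0, t_comm=0.050))     # True
print(effective_speedup(t_cpu=100.0, t_qpu=1.0, t_comm=0.050, t_queue=0.95))
```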
FIG. 3: An asymmetric multiprocessor model employing a
quantum computing (QC) server, e.g., a form of cloud-based
quantum computing. The dashed lines indicate a quantum in-
terconnects between QPUs while solid lines indicate classical
interconnects. The concept of QC as a service offers increased
flexibility and ease-of-use at the expense of communication
latencies. Latencies will contribute to overall execution tim-
ing and, depending on problem and program structure, could
partially negate quantum computational advantages.
The loosely integrated client-server model also sup-
ports the alternative use case of cloud-based quantum
computing. In this setting, the QC server is a rare re-
source in demand from multiple users simultaneously.
For a system containing q QPUs, the server can support
classical MIMD parallelism with each node performing
an isolated job. As a measure of server capacity, the
dimension of the server Hilbert space is $q \cdot 2^n$ given an
n-qubit register on each node. By contrast, the presence
of a quantum interconnect linking the individual QPUs
offers a Hilbert space of dimension $2^{nq}$. This exponential
increase in server capacity with node number is not a
guarantee of computational speed up. The cloud-based
quantum computing model may be especially attractive for
blind quantum computing, which permits a user to
submit a job request without revealing details about either
the data or instructions [33].
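As a quick numerical illustration of the two capacity measures (the function names are ours, chosen for this example), the following compares q isolated n-qubit registers with q registers linked by a quantum interconnect.

```python
def capacity_isolated(q, n):
    """Total Hilbert-space dimension of q independent n-qubit QPUs: q * 2**n."""
    return q * 2 ** n

def capacity_interconnected(q, n):
    """Hilbert-space dimension when the q registers can be entangled: 2**(n*q)."""
    return 2 ** (n * q)

q, n = 4, 10
print(capacity_isolated(q, n))        # 4096
print(capacity_interconnected(q, n))  # 1099511627776
```

For a single node the two measures coincide, and the gap grows exponentially as nodes are added.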
The tight integration path is shown in Figs. 4 and 5
and represents a progression toward more sophisticated
accelerator models. The goal of this design is to move the
QPU closer to the host node in order to eliminate commu-
nication latency and maximize computational speed up.
This design assumes the hardware requirements for QPUs
can be eased to the point that a single, tightly connected
system can be created. As mentioned above,
this will require multiple advances in the classical infras-
tructure used by current experimental devices. Figure 4
(left) represents a first design based on a shared resource
model that permits multiple CPU nodes to interact with
a single QPU node. Like the server model, a single QPU
is responsible for managing requests from multiple CPUs
and requires a robust classical controller interface. This
communication design also represents a bottleneck but
the tighter integration alleviates some of the overhead
required in the loose model. In addition, data from
multiple CPUs can be aggregated by the QPU. This use case
may appear when pre-processing of the input for the
QPU can be parallelized across CPU nodes. Alternatively,
if the QPU is part of a MIMD or data streaming
model, then the redirection of QPU results to another
node may be useful.
FIG. 4: (left) A shared resource model in which a single QPU
is accessed by multiple CPU nodes. (right) A standard
accelerator model in which QPUs are attached to nodes hosted on
a classical interconnect. The absence of quantum networking
between QPUs restricts the scaling with respect to the
quantum resources and enforces a classical domain decomposition
paradigm.
FIG. 5: An accelerator model with QPUs incorporating a
quantum interconnect that supports both quantum and
classical parallelism. QPUs may be addressed individually or
collectively through the coordinated CPU elements.
A more pronounced example of the accelerator model
is presented in the right panel of Fig. 4. This design
most closely matches that used for integrating GPU ac-
celerators into modern HPC systems as each CPU node
is tightly integrated with a dedicated QPU. This greatly
simplifies the QPU interface. This hardware model
also naturally matches many existing program data ac-
cess patterns, in which top-level memory management
is driven by domain decomposition with low-level data
movement restricted to single nodes. In this sense, the
design is motivated by an initial application of classi-
cal parallelism and subsequently followed by quantum
parallelism. While this model offers the appeal that
it would minimize the refactoring required of existing
source codes, it also restricts the amount of quantum
parallelism available.
By contrast, Fig. 5 represents an accelerator model in
which individual QPUs are interconnected. The quan-
tum interconnect establishes quantum communication
between each QPU and offers the possibility of gener-
ating entangled states between registers. Like the con-
ventional interconnect used to establish communication
between CPU nodes, the quantum interconnect will re-
quire additional switches and possibly routers to perform
robust communication. In this tightly integrated limit, a
collection of interconnected QPUs may be abstracted as
a single QRAM accessed through interfaces at multiple
CPU nodes. This design offers the benefit of maximiz-
ing the potential quantum parallelism across QPUs and
the classical parallelism across CPUs, where hybrid nodes
may cooperate by exchanging messages and data across
both interconnects.
IV. PERFORMANCE METRICS
An important aspect in evaluating the merits of QPUs
for use in any of the system integration strategies is iden-
tifying performance metrics that are both well-defined
and meaningful. Basic metrics for conventional comput-
ing typically focus on register size, word size, FLOPS, etc.
More elaborate measures take account of multi-threaded
processor pipelines, memory access speeds, and commu-
nication latency. Processing models that include SIMD
and MIMD parallelism are also judged according to use
cases and targeted problem sets. Each metric can also be
monetized, e.g., FLOPS per watt and FLOPS per unit
currency, to address specific stakeholder interests.
Despite a long history of evaluating conventional HPC
systems, the unique features of QPUs pose new chal-
lenges in measuring performance. First and foremost
is the difference in the underlying computational model.
For example, QCUs and QRAMs will certainly require
clocks to execute the instructions and machine codes act-
ing on the quantum register, but the speed of the clock
is not directly proportional to the speed at which gates
are applied. Instead, the gates themselves are compli-
cated sequences of control signals applied to the physical
system and they cannot be made arbitrarily fast. The
speed of gate execution is complicated further by the use
of fault-tolerant (FT) instruction protocols. These pro-
tocols protect against errors but necessarily require the
use of additional primitive gates and qubits [13]. Fault-
tolerant instruction protocols may vary with program se-
quence as well as data locality. Since the context of the
program ultimately determines the speed at which gates
are executed, the quantum FLOPS analogy for measur-
ing QPU performance is poorly defined.
Nevertheless, QPU performance can be measured. One
measure is the relative overhead required by the FT in-
struction protocols for the ISA. At the scope of the QCU,
the cycles per instruction can be extracted. Moreover,
the longest duration gate within a QRAM can serve as
a worst-case measure of performance while the shortest
gate reflects the best-case time cost. The timing difference
between these instructions offers a measure of the spread in
QRAM performance on any random gate instance. The
same measures can be applied to the complete set of QPU
instructions to compile a snapshot of worst- and best-case
timings.
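These measures are straightforward to compute once gate timings are known. The sketch below uses a made-up table of gate durations for a hypothetical QRAM (the numbers are placeholders, not data from any device) and, as one possible reading of the fault-tolerance overhead, takes the ratio of primitive gates executed to logical instructions requested.

```python
# Hypothetical gate durations in nanoseconds; illustrative values only.
gate_durations_ns = {
    "X": 20.0,
    "H": 25.0,
    "CNOT": 200.0,
    "MEASURE": 1000.0,
}

def timing_summary(durations):
    """Worst-case, best-case, and spread of gate durations."""
    worst = max(durations.values())
    best = min(durations.values())
    return {"worst": worst, "best": best, "spread": worst - best}

def ft_overhead(primitive_gate_count, logical_instruction_count):
    """Relative overhead of a fault-tolerant protocol, taken here to be the
    number of primitive gates executed per logical instruction (an assumed
    definition for this sketch)."""
    return primitive_gate_count / logical_instruction_count

summary = timing_summary(gate_durations_ns)
print(summary)                  # {'worst': 1000.0, 'best': 20.0, 'spread': 980.0}
print(ft_overhead(7000, 10))    # 700.0 primitive gates per logical instruction
```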
These definitions do not require reference to a particu-
lar technology as they rely on abstraction of the QRAM
and QPU interfaces. This is advantageous for making
performance comparisons against different technologies,
which may have widely different physics and control sig-
nals. Different QPU technologies may also employ vastly
different ISAs in a manner that is reminiscent of RISC
versus CISC designs. For example, a QPU based on
the adiabatic quantum computing model may use a sin-
gle, time-continuous gate to implement an instruction,
whereas the same instruction would be realized in the
circuit model using a lengthy sequence of discrete prim-
itive gates. A comparison between these different QPUs
that simply references gate duration and spread forgoes
these important details. Additionally, task-specific QPUs
will necessitate special-purpose metrics to differentiate
between how problems are solved. Restrictions on pro-
cessor behavior can be useful, however, for purposes of
forming comparisons provided that the context is well
specified.
A third consideration for measuring QPU performance
is that quantum algorithms are often probabilistic. This
introduces the notion of repeated sampling of the read-
out register in order to collect sufficient statistics for re-
porting a final result. Depending on the level of instruc-
tion abstraction, the programmer may handle this type
of sampling or it may arise within the QPU itself. For
those use cases where quantum behavior is intended to be
hidden from the user, e.g., with high-level languages and
libraries, QPU performance will be impacted by these
classical pre- and post-processing steps. The use of
statistical sampling to derive the final result requires
confidence levels that determine the number of required
samples and therefore the total duration of the program.
Neither the instruction nor the gate metric that we have proposed
would measure this aspect of QPU performance. Instead
this becomes an element of benchmarking against certain
problem sets and solution goals. Algorithmic complexity
statements will offer some guidelines on the scaling of
these slowdowns, but experimental tests will be needed
to identify the variance in total function performance.
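A simple worked example shows how a confidence level fixes the number of samples. The sketch assumes a simplified setting that is not taken from the text: each run of the program independently returns the correct (and classically verifiable) answer with probability p, and the goal is to observe it at least once with confidence c. The required number of runs is then the smallest N with 1 - (1 - p)^N >= c.

```python
import math

def required_runs(p_success, confidence):
    """Smallest number of independent runs N such that the correct answer is
    observed at least once with probability >= confidence."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    if p_success == 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_success))

def total_duration(p_success, confidence, t_run):
    """Total program duration when every run takes t_run seconds."""
    return required_runs(p_success, confidence) * t_run

print(required_runs(0.5, 0.99))          # 7
print(required_runs(0.1, 0.99))          # 44
print(total_duration(0.1, 0.99, 0.002))  # 44 runs of 2 ms each
```

The per-gate and per-instruction metrics are identical in both cases, yet the second program takes more than six times as long to reach the same confidence.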
Performance of the quantum interconnect is also a ma-
jor factor in overall system behavior. The rate at which
QPUs establish entangled registers may be initially ig-
nored, since these operations can occur offline from the
program execution. However, in a high-performance set-
ting the rate of entangling and preserving entanglement
between nodes represents a potential bottleneck for the
system. Idle registers in a QPU require active error
correction to mitigate noise, and any delays in the
quantum interconnect will add to this overhead.
It therefore seems very likely that QPUs will need to
use some form of network management controller that
interacts with the quantum programs wishing to execute
using entangled registers. The network manager will be
responsible for ensuring availability of entangled registers
when requested by a program. This will require coordi-
nation between the interconnect, the error-corrected reg-
isters, and the program instructions, such that the man-
ager refreshes entanglement between QPUs only when
needed by the program. However, faults and latencies in
both the QPUs and the interconnect will complicate
these instructions and may eventually lead to communi-
cation failure. Therefore, the performance of the inter-
connect and especially the entangling operations is likely
to be a major factor in overall system behavior.
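A toy model of such a network manager is sketched below. Everything in it is an assumption made for illustration: the class name, the notion of a fixed usable lifetime for an entangled register, and a fixed generation time for the interconnect. It captures only the policy stated above, namely that entanglement is refreshed on demand, and shows how interconnect delay is passed on to the requesting program.

```python
class EntanglementManager:
    """Hypothetical manager for one entangled register shared by two QPUs."""

    def __init__(self, lifetime, generation_time):
        self.lifetime = lifetime                # how long entanglement stays usable
        self.generation_time = generation_time  # interconnect delay to re-entangle
        self.created_at = None                  # time the current entanglement became ready
        self.refreshes = 0                      # number of entangling operations issued

    def request(self, now):
        """Called by a program that needs an entangled register at time `now`.
        Returns the time at which the register is ready for use."""
        fresh = (self.created_at is not None
                 and now - self.created_at <= self.lifetime)
        if fresh:
            return now                          # no interconnect traffic needed
        self.refreshes += 1                     # refresh only because it is needed
        self.created_at = now + self.generation_time
        return self.created_at

manager = EntanglementManager(lifetime=10.0, generation_time=2.0)
print(manager.request(0.0))    # 2.0: first request waits for the interconnect
print(manager.request(5.0))    # 5.0: entanglement is still fresh
print(manager.request(20.0))   # 22.0: expired, so the manager refreshes
print(manager.refreshes)       # 2
```

A longer generation time or a shorter lifetime increases both the waiting time seen by programs and the number of entangling operations, which is the bottleneck the text anticipates.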
V. CONCLUSIONS
As QPUs mature from basic scientific testbeds to
robust devices, they will likely be adopted for both
application-specific and general purpose computing. We
have investigated several possible abstract architectures
for integrating QPUs into HPC systems. We have ex-
amined both loose and tight integration designs, which
differ primarily in the communication infrastructure and
run-time environment needed to host the QPU. The
relative performance of each design is expected to depend
on the intended use case and on how well the quantum
algorithm and its programming model offset the costs of
this communication.
We have also emphasized that one of the most im-
portant aspects of future HPC with QPUs is the quan-
tum interconnect. It has long been appreciated for mas-
sively parallel processing systems that the communica-
tion backbone between nodes plays a significant role in
performance. This has been underscored in recent years
with awareness that communication costs may often be
bottlenecks in application scaling. It is clear that a quan-
tum interconnect can enhance system functionality by
enlarging the set of accessible register states, but it re-
mains unclear if the interconnect would provide a net
benefit. This is because the entanglement established be-
tween nodes by the interconnect would expose the system
to a potentially more serious fault model, in which cor-
related errors lead to a cascade of failures across QPUs.
Appreciable attention has been placed on fault-tolerant
ISAs within the context of local computation, and simi-
lar techniques for managing distributed QPUs will be an
important issue moving forward. Fault models for these
architectures and protocols for mitigating these
types of failures are needed.
Existing metrics for conventional computing do not
capture all aspects of the QPU behavior, and we have
suggested several new features that need to be tracked
when tuning system performance. These include the
overhead in fault-tolerant ISAs as well as the spread in
instruction timings. However, some instructions may be so
complex that their performance can only be measured in
very restrictive settings, e.g., as special-purpose QPUs.
The comparison of these metrics with each other offers a
quantitative means of assessing the value of QPU-enabled
systems, but only when they can be related to existing
system metrics, e.g., FLOPS. Placing quantum metrics
at the same level of inspection as CPU-based
measurements will require more detailed execution
models for the entire system.
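As a rough illustration of the two quantities named above, the following sketch computes a fault-tolerance overhead and a timing spread. The definitions are assumptions chosen for illustration, not metrics defined in this paper: overhead is taken as the ratio of physical to logical operations, and spread as the coefficient of variation of per-instruction timings. The function names and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def ft_overhead(physical_ops, logical_ops):
    """Physical operations executed per logical operation (assumed definition)."""
    return physical_ops / logical_ops


def timing_spread(timings):
    """Coefficient of variation of instruction timings (assumed definition)."""
    values = list(timings.values())
    return pstdev(values) / mean(values)


# Hypothetical instruction timings in arbitrary units.
timings = {"X": 1.0, "CNOT": 3.0, "T": 5.0}

print(ft_overhead(physical_ops=7000, logical_ops=10))
print(timing_spread(timings))
```

Both values are dimensionless, which is one reason they would still need to be tied to a system-level execution model before being compared with conventional metrics such as FLOPS.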
[1] V. Kindratenko, G. K. Thiruvathukal, and S. Got-
tlieb, Computing in Science & Engineering 10, 13
(2008), URL http://scitation.aip.org/content/aip/
journal/cise/10/6/10.1109/MCSE.2008.149.
[2] B. I. Schneider, Computing in Science & Engineering 17, 9
(2015), ISSN 1521-9615.
[3] J. Ang, K. Bergman, S. Borkar, W. Carlson, L. Car-
rington, G. Chiu, R. Colwell, W. Dally, J. Dongarra,
A. Geist, et al., Tech. Rep., DOE ASCAC Subcommittee
Report (2010).
[4] S. Ashby, P. Beckman, J. Chen, P. Colella, B. Collins,
D. Crawford, J. Dongarra, D. Kothe, R. Lusk,
P. Messina, et al., Tech. Rep., Summary Report of
the Advanced Scientific Computing Advisory Committee
Subcommittee (2010).
[5] A. Geist and R. Lucas, Int. J. of High. Perform. Comput.
Appl. 23, 427–436 (2009).
[6] D. R. Simon, SIAM Journal on Computing 26,
1474 (1997), URL http://epubs.siam.org/doi/abs/
10.1137/S0097539796298637.
[7] P. W. Shor, SIAM Journal on Computing 26,
1484 (1997), URL http://epubs.siam.org/doi/abs/
10.1137/S0097539795293172.
[8] I. Kassal, J. D. Whitfield, A. Perdomo-Ortiz, M.-H.
Yung, and A. Aspuru-Guzik, Annual Review of Physical
Chemistry 62, 185 (2011), URL http://dx.doi.org/10.
1146/annurev-physchem-032210-103512.
[9] D. S. Abrams and S. Lloyd, Phys. Rev. Lett. 79,
2586 (1997), URL http://link.aps.org/doi/10.1103/
PhysRevLett.79.2586.
[10] R. V. van Nieuwpoort, T. Kielmann, and H. E. Bal,
SIGPLAN Not. 36, 34 (2001), ISSN 0362-1340, URL
http://doi.acm.org/10.1145/568014.379563.
[11] B. Bauer, D. Wecker, A. J. Millis, M. B. Hastings, and
M. Troyer, Hybrid quantum-classical approach to corre-
lated materials (2015), arXiv:1510.03859 [quant-ph].
[12] J. M. Kreula, S. R. Clark, and D. Jaksch, A quantum co-
processor for accelerating simulations of non-equilibrium
many body quantum dynamics (2015), arXiv:1510.05703
[quant-ph].
[13] M. A. Nielsen and I. L. Chuang, Quantum computation
and quantum information (Cambridge University Press,
2000).
[14] D. Deutsch, Proceedings of the Royal Society of London
Series A 400, 97 (1985).
[15] A. M. Childs and W. van Dam, Rev. Mod. Phys.
82, 1 (2010), URL http://link.aps.org/doi/10.1103/
RevModPhys.82.1.
[16] M. Johnson, M. Amin, S. Gildert, T. Lanting, F. Hamze,
N. Dickson, R. Harris, A. Berkley, J. Johansson,
P. Bunyk, et al., Nature 473, 194 (2011).
[17] K. Saeedi, S. Simmons, J. Z. Salvail, P. Dluhy, H. Rie-
mann, N. V. Abrosimov, P. Becker, H.-J. Pohl, J. J. L.
Morton, and M. L. W. Thewalt, Science 342, 830 (2013),
URL http://www.sciencemag.org/content/342/6160/
830.abstract.
[18] C. Monroe and J. Kim, Science 339, 1164 (2013),
URL http://www.sciencemag.org/content/339/6124/
1164.abstract.
[19] M. H. Devoret and R. J. Schoelkopf, Science 339, 1169
(2013), URL http://www.sciencemag.org/content/
339/6124/1169.abstract.
[20] R. Van Meter and M. Oskin, ACM Journal on Emerging
Technologies in Computing Systems (JETC) 2, 31
(2006).
[21] D. D. Thaker, T. S. Metodi, A. W. Cross, I. L. Chuang,
and F. T. Chong, SIGARCH Comput. Archit. News 34,
378 (2006), ISSN 0163-5964, URL http://doi.acm.org/
10.1145/1150019.1136518.
[22] R. Van Meter, T. D. Ladd, A. G. Fowler, and Y. Ya-
mamoto, International Journal of Quantum Information
8, 295 (2010).
[23] N. C. Jones, R. Van Meter, A. G. Fowler, P. L. McMahon,
J. Kim, T. D. Ladd, and Y. Yamamoto, Physical Review
X 2, 031007 (2012).
[24] T. S. Metodi, A. I. Faruque, and F. T. Chong, Quan-
tum Computing for Computer Architects, Second Edition,
Synthesis Lectures on Computer Architecture (Morgan &
Claypool Publishers, 2011), URL http://dx.doi.org/
10.2200/S00331ED1V01Y201101CAC013.
[25] C. D. Hill, E. Peretz, S. J. Hile, M. G. House, M. Fuech-
sle, S. Rogge, M. Y. Simmons, and L. C. L. Hollenberg,
Science Advances 1 (2015).
[26] E. Knill, Tech. Rep., Technical Report LAUR-96-2724,
Los Alamos National Laboratory (1996).
[27] P. Selinger, Mathematical Structures in Com-
puter Science 14, 527 (2004), ISSN 1469-8072,
URL http://journals.cambridge.org/article_
S0960129504004256.
[28] A. S. Green, P. L. Lumsdaine, N. J. Ross, P. Selinger, and
B. Valiron, in Proceedings of the 34th ACM SIGPLAN
Conference on Programming Language Design and Im-
plementation (ACM, New York, NY, USA, 2013), PLDI
’13, pp. 333–342, ISBN 978-1-4503-2014-6, URL http:
//doi.acm.org/10.1145/2491956.2462177.
[29] D. Wecker and K. M. Svore, LIQUID: A Software Design
Architecture and Domain-Specific Language for Quan-
tum Computing (2014), http://arxiv.org/pdf/1402.
4467v1.pdf.
[30] A. J. Abhari, A. Faruque, M. J. Dousti, L. Svec, O. Catu,
A. Chakrabati, C.-F. Chiang, S. Vanderwilt, J. Black,
F. Chong, et al., Tech. Rep. (2012), URL ftp://ftp.
cs.princeton.edu/techreports/2012/934.pdf.
[31] J. M. Hornibrook, J. I. Colless, I. D. Conway Lamb, S. J.
Pauka, H. Lu, A. C. Gossard, J. D. Watson, G. C. Gard-
ner, S. Fallahi, M. J. Manfra, et al., Phys. Rev. Applied
3, 024010 (2015), URL http://link.aps.org/doi/10.
1103/PhysRevApplied.3.024010.
[32] J. Carolan, C. Harrold, C. Sparrow, E. Martín-López, N. J.
Russell, J. W. Silverstone, P. J. Shadbolt, N. Matsuda,
M. Oguma, M. Itoh, et al., Science 349, 711 (2015),
URL http://www.sciencemag.org/content/349/6249/
711.abstract.
[33] A. Broadbent, J. Fitzsimons, and E. Kashefi (2009).
