3 research outputs found
Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers
A massive gap exists between current quantum computing (QC) prototypes, and
the size and scale required for many proposed QC algorithms. Current QC
implementations are prone to noise and variability which affect their
reliability, and yet with less than 80 quantum bits (qubits) total, they are
too resource-constrained to implement error correction. The term Noisy
Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems
of 1000 qubits or less. Given NISQ's severe resource constraints, low
reliability, and high variability in physical characteristics such as coherence
time or error rates, it is of pressing importance to map computations onto them
in ways that use resources efficiently and maximize the likelihood of
successful runs.
This paper proposes and evaluates backend compiler approaches to map and
optimize high-level QC programs to execute with high reliability on NISQ
systems with diverse hardware characteristics. Our techniques all start from an
LLVM intermediate representation of the quantum program (such as would be
generated from high-level QC languages like Scaffold) and generate QC
executables runnable on the IBM Q public QC machine. We then use this framework
to implement and evaluate several optimal and heuristic mapping methods. These
methods vary in how they account for the availability of dynamic machine
calibration data, the relative importance of various noise parameters, the
different possible routing strategies, and the relative importance of
compile-time scalability versus runtime success. Using real-system
measurements, we show that fine grained spatial and temporal variations in
hardware parameters can be exploited to obtain an average x (and up to
x) improvement in program success rate over the industry standard IBM
Qiskit compiler.Comment: To appear in ASPLOS'1
Formal Constraint-based Compilation for Noisy Intermediate-Scale Quantum Systems
Noisy, intermediate-scale quantum (NISQ) systems are expected to have a few
hundred qubits, minimal or no error correction, limited connectivity and limits
on the number of gates that can be performed within the short coherence window
of the machine. The past decade's research on quantum programming languages and
compilers is directed towards large systems with thousands of qubits. For near
term quantum systems, it is crucial to design tool flows which make efficient
use of the hardware resources without sacrificing the ease and portability of a
high-level programming environment. In this paper, we present a compiler for
the Scaffold quantum programming language in which aggressive optimization
specifically targets NISQ machines with hundreds of qubits. Our compiler
extracts gates from a Scaffold program, and formulates a constrained
optimization problem which considers both program characteristics and machine
constraints. Using the Z3 SMT solver, the compiler maps program qubits to
hardware qubits, schedules gates, and inserts CNOT routing operations while
optimizing the overall execution time. The output of the optimization is used
to produce target code in the OpenQASM language, which can be executed on
existing quantum hardware such as the 16-qubit IBM machine. Using real and
synthetic benchmarks, we show that it is feasible to synthesize near-optimal
compiled code for current and small NISQ systems. For large programs and
machine sizes, the SMT optimization approach can be used to synthesize compiled
code that is guaranteed to finish within the coherence window of the machine.Comment: Invited paper in Special Issue on Quantum Computer Architecture: a
full-stack overview, Microprocessors and Microsystem
Interconnection networks for parallel and distributed computing
Parallel computers are generally either shared-memory machines or distributed- memory machines. There are currently technological limitations on shared-memory architectures and so parallel computers utilizing a large number of processors tend tube distributed-memory machines. We are concerned solely with distributed-memory multiprocessors. In such machines, the dominant factor inhibiting faster global computations is inter-processor communication. Communication is dependent upon the topology of the interconnection network, the routing mechanism, the flow control policy, and the method of switching. We are concerned with issues relating to the topology of the interconnection network. The choice of how we connect processors in a distributed-memory multiprocessor is a fundamental design decision. There are numerous, often conflicting, considerations to bear in mind. However, there does not exist an interconnection network that is optimal on all counts and trade-offs have to be made. A multitude of interconnection networks have been proposed with each of these networks having some good (topological) properties and some not so good. Existing noteworthy networks include trees, fat-trees, meshes, cube-connected cycles, butterflies, Möbius cubes, hypercubes, augmented cubes, k-ary n-cubes, twisted cubes, n-star graphs, (n, k)-star graphs, alternating group graphs, de Bruijn networks, and bubble-sort graphs, to name but a few. We will mainly focus on k-ary n-cubes and (n, k)-star graphs in this thesis. Meanwhile, we propose a new interconnection network called augmented k-ary n- cubes. The following results are given in the thesis.1. Let k ≥ 4 be even and let n ≥ 2. Consider a faulty k-ary n-cube Q(^k_n) in which the number of node faults f(_n) and the number of link faults f(_e) are such that f(_n) + f(_e) ≤ 2n - 2. We prove that given any two healthy nodes s and e of Q(^k_n), there is a path from s to e of length at least k(^n) - 2f(_n) - 1 (resp. k(^n) - 2f(_n) - 2) if the nodes s and e have different (resp. the same) parities (the parity of a node Q(^k_n) in is the sum modulo 2 of the elements in the n-tuple over 0, 1, ∙∙∙ , k - 1 representing the node). Our result is optimal in the sense that there are pairs of nodes and fault configurations for which these bounds cannot be improved, and it answers questions recently posed by Yang, Tan and Hsu, and by Fu. Furthermore, we extend known results, obtained by Kim and Park, for the case when n = 2.2. We give precise solutions to problems posed by Wang, An, Pan, Wang and Qu and by Hsieh, Lin and Huang. In particular, we show that Q(^k_n) is bi-panconnected and edge-bipancyclic, when k ≥ 3 and n ≥ 2, and we also show that when k is odd, Q(^k_n) is m-panconnected, for m = (^n(k - 1) + 2k - 6’ / ‘_2), and (k -1) pancyclic (these bounds are optimal). We introduce a path-shortening technique, called progressive shortening, and strengthen existing results, showing that when paths are formed using progressive shortening then these paths can be efficiently constructed and used to solve a problem relating to the distributed simulation of linear arrays and cycles in a parallel machine whose interconnection network is Q(^k_n) even in the presence of a faulty processor.3. We define an interconnection network AQ(^k_n) which we call the augmented k-ary n-cube by extending a k-ary n-cube in a manner analogous to the existing extension of an n-dimensional hypercube to an n-dimensional augmented cube. We prove that the augmented k-ary n-cube Q(^k_n) has a number of attractive properties (in the context of parallel computing). For example, we show that the augmented k-ary n-cube Q(^k_n) - is a Cayley graph (and so is vertex-symmetric); has connectivity 4n - 2, and is such that we can build a set of 4n - 2 mutually disjoint paths joining any two distinct vertices so that the path of maximal length has length at most max{{n- l)k- (n-2), k + 7}; has diameter [(^k) / (_3)] + [(^k - 1) /( _3)], when n = 2; and has diameter at most (^k) / (_4) (n+ 1), for n ≥ 3 and k even, and at most [(^k)/ (_4) (n + 1) + (^n) / (_4), for n ^, for n ≥ 3 and k odd.4. We present an algorithm which given a source node and a set of n - 1 target nodes in the (n, k)-star graph S(_n,k) where all nodes are distinct, builds a collection of n - 1 node-disjoint paths, one from each target node to the source. The collection of paths output from the algorithm is such that each path has length at most 6k - 7, and the algorithm has time complexity O(k(^3)n(^4))