Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers
A massive gap exists between current quantum computing (QC) prototypes and
the size and scale required for many proposed QC algorithms. Current QC
implementations are prone to noise and variability, which affect their
reliability, and yet, with fewer than 80 quantum bits (qubits) in total, they are
too resource-constrained to implement error correction. The term Noisy
Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems
of 1000 qubits or fewer. Given NISQ systems' severe resource constraints, low
reliability, and high variability in physical characteristics such as coherence
time or error rates, it is of pressing importance to map computations onto them
in ways that use resources efficiently and maximize the likelihood of
successful runs.
This paper proposes and evaluates backend compiler approaches to map and
optimize high-level QC programs to execute with high reliability on NISQ
systems with diverse hardware characteristics. Our techniques all start from an
LLVM intermediate representation of the quantum program (such as would be
generated from high-level QC languages like Scaffold) and generate QC
executables runnable on the IBM Q public QC machine. We then use this framework
to implement and evaluate several optimal and heuristic mapping methods. These
methods vary in how they account for the availability of dynamic machine
calibration data, the relative importance of various noise parameters, the
different possible routing strategies, and the relative importance of
compile-time scalability versus runtime success. Using real-system
measurements, we show that fine-grained spatial and temporal variations in
hardware parameters can be exploited to obtain an average x (and up to
x) improvement in program success rate over the industry-standard IBM
Qiskit compiler. Comment: To appear in ASPLOS'19.
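The core idea of noise-adaptive mapping can be illustrated with a minimal sketch (not the paper's actual algorithms, which include heuristic and SMT-based optimal variants): given hypothetical per-qubit readout errors and per-link two-qubit gate errors from a daily calibration report, choose the logical-to-physical qubit assignment that maximizes an estimated success probability.

```python
# Minimal sketch of noise-adaptive qubit mapping under assumed calibration data:
# per-qubit readout error and per-edge two-qubit (CNOT) gate error.
from itertools import permutations

def estimated_success(mapping, program_edges, readout_err, cx_err):
    """Estimate success probability of one run under a logical->physical mapping."""
    p = 1.0
    for q_phys in mapping.values():
        p *= (1.0 - readout_err[q_phys])          # one measurement per qubit
    for (a, b) in program_edges:                  # one CNOT per program edge
        edge = tuple(sorted((mapping[a], mapping[b])))
        p *= (1.0 - cx_err.get(edge, 1.0))        # pairs with no direct link would need routing; fully penalized here
    return p

def best_mapping(n_logical, physical_qubits, program_edges, readout_err, cx_err):
    """Brute-force search over assignments; real compilers use heuristics or SMT solvers."""
    best, best_p = None, -1.0
    for perm in permutations(physical_qubits, n_logical):
        mapping = dict(enumerate(perm))           # logical qubit i -> physical qubit perm[i]
        p = estimated_success(mapping, program_edges, readout_err, cx_err)
        if p > best_p:
            best, best_p = mapping, p
    return best, best_p

# Made-up calibration numbers for a 4-qubit line 0-1-2-3:
readout_err = {0: 0.02, 1: 0.08, 2: 0.03, 3: 0.05}
cx_err = {(0, 1): 0.015, (1, 2): 0.04, (2, 3): 0.02}
mapping, p = best_mapping(2, [0, 1, 2, 3], [(0, 1)], readout_err, cx_err)
print(mapping, round(p, 4))   # picks the pair with the best combined readout and CNOT fidelity
```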
Designing calibration and expressivity-efficient instruction sets for quantum computing
Near-term quantum computing (QC) systems have limited qubit counts, high gate (instruction) error rates, and typically support a minimal instruction set with one type of two-qubit (2Q) gate. To reduce program instruction counts and improve application expressivity, vendors have proposed, and shown proof-of-concept demonstrations of, richer instruction sets such as XY gates (Rigetti) and fSim gates (Google). These instruction sets comprise families of 2Q gate types parameterized by continuous qubit rotation angles; that is, they allow a large set of different physical operations to be realized on the qubits, depending on the input angles. However, having such a large number of gate types is problematic because each gate type has to be calibrated periodically, across the full system, to obtain high-fidelity implementations. This results in substantial recurring calibration overheads even on current systems, which use only a few gate types. Our work aims to navigate this tradeoff between application expressivity and calibration overhead, and to identify which instructions vendors should implement to get the best expressivity with acceptable calibration time.

Studying this tradeoff is challenging because of the diversity in QC application requirements, the need to optimize applications for widely different hardware gate types, and noise variations across gate types. Therefore, our work develops NuOp, a flexible compilation pass based on numerical optimization, to efficiently decompose application operations into arbitrary hardware gate types. Using NuOp and four important quantum applications, we study the instruction-set proposals of Rigetti and Google with realistic noise simulations and a calibration model. Our experiments show that implementing 4-8 types of 2Q gates is sufficient to attain nearly the same expressivity as a full continuous gate family, while reducing the calibration overhead by two orders of magnitude. With several vendors proposing rich gate families as a means to higher fidelity, our work has the potential to provide valuable instruction-set design guidance for near-term QC systems.
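As a rough illustration of the kind of numerical decomposition NuOp performs (this is not the released tool; the CZ hardware gate, ZYZ parameterization, and fidelity metric below are assumptions for the example), one can search for single-qubit rotation angles around a fixed hardware 2Q gate so that the resulting circuit approximates a target operation:

```python
# Numerical decomposition sketch: fit single-qubit rotations around one CZ gate
# so the layer matches a target 2Q unitary (here, a CNOT) up to global phase.
import numpy as np
from scipy.optimize import minimize

def rz(t): return np.array([[np.exp(-1j*t/2), 0], [0, np.exp(1j*t/2)]])
def ry(t): return np.array([[np.cos(t/2), -np.sin(t/2)], [np.sin(t/2), np.cos(t/2)]])
def u1q(a, b, c): return rz(a) @ ry(b) @ rz(c)         # generic single-qubit unitary (ZYZ)

CZ = np.diag([1, 1, 1, -1])                             # assumed hardware 2Q gate

def layer(params):
    """One hardware 2Q gate sandwiched between arbitrary single-qubit rotations."""
    pre = np.kron(u1q(*params[0:3]), u1q(*params[3:6]))
    post = np.kron(u1q(*params[6:9]), u1q(*params[9:12]))
    return post @ CZ @ pre

def infidelity(params, target):
    f = abs(np.trace(target.conj().T @ layer(params))) / 4.0   # global-phase-invariant overlap
    return 1.0 - f

# Target: a CNOT, which is locally equivalent to a single CZ, so one layer suffices.
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
res = minimize(infidelity, x0=np.random.uniform(0, 2*np.pi, 12), args=(CNOT,), method="BFGS")
print("achieved fidelity:", 1.0 - res.fun)   # ~1.0; an unlucky start may need a random restart
```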
Architectural Support for Optimizing Huge Page Selection Within the OS
Published in the 56th ACM/IEEE International Symposium on Microarchitecture (MICRO), Toronto, Canada: https://doi.org/10.1145/3613424.3614296
Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads; however, in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that choosing the memory regions that incur the most TLB misses for huge page promotion best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work, which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a last-level TLB miss. This small, fixed-size structure tracks huge-page-aligned regions (consisting of base pages), ranks them based on observed page table walk frequency, and only keeps the most frequently accessed ones. Evaluated on applications of varying memory intensity, our approach successfully identifies the application pages incurring the highest address translation overheads. Our approach demonstrates that with the help of a PCC, the OS only needs to promote 4% of the application footprint to achieve more than 75% of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems where memory is typically fragmented, the PCC outperforms Linux's page promotion policy by 14% (when 50% of total memory is fragmented) and 16% (when 90% of total memory is fragmented), respectively.
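A simplified software model of the structure described above may help make the idea concrete; the capacity, eviction policy, and 2 MiB region size below are illustrative assumptions, not the paper's exact design.

```python
# Software model of a PCC-like table: count hardware page-table walks per
# huge-page-aligned (2 MiB) region and keep only the hottest regions as
# huge-page promotion candidates for the OS.

HUGE_PAGE_SHIFT = 21          # 2 MiB regions on x86-64

class PromotionCandidateCache:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.counts = {}      # region base address -> observed walk count

    def record_walk(self, vaddr):
        """Called on each page-table walk triggered by a last-level TLB miss."""
        region = (vaddr >> HUGE_PAGE_SHIFT) << HUGE_PAGE_SHIFT
        if region in self.counts:
            self.counts[region] += 1
        elif len(self.counts) < self.capacity:
            self.counts[region] = 1
        else:
            # Fixed size: evict the coldest tracked region to make room,
            # but keep established hot regions in place.
            coldest = min(self.counts, key=self.counts.get)
            if self.counts[coldest] <= 1:
                del self.counts[coldest]
                self.counts[region] = 1

    def promotion_candidates(self, k=8):
        """Top-k regions the OS should consider backing with huge pages."""
        return sorted(self.counts, key=self.counts.get, reverse=True)[:k]

pcc = PromotionCandidateCache()
# Hardware would call pcc.record_walk(vaddr) on each last-level TLB miss;
# the OS periodically reads pcc.promotion_candidates() to drive promotion.
```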
Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers
Trapped ions (TI) are a leading candidate for building Noisy
Intermediate-Scale Quantum (NISQ) hardware. TI qubits have fundamental
advantages over other technologies such as superconducting qubits, including
high qubit quality, coherence and connectivity. However, current TI systems are
small, with 5-20 qubits, and typically use a single-trap architecture
which has fundamental scalability limitations. To progress towards the next
major milestone of 50-100 qubits, a modular architecture termed the Quantum
Charge Coupled Device (QCCD) has been proposed. In a QCCD-based TI device,
small traps are connected through ion shuttling. While the basic hardware
components for such devices have been demonstrated, building a 50-100 qubit
system is challenging because of the wide range of design possibilities for trap
sizing, communication topology, and gate implementations, and because of the need
to match diverse application resource requirements.
Towards realizing QCCD systems with 50-100 qubits, we perform an extensive
architectural study evaluating the key design choices of trap sizing,
communication topology and operation implementation methods. We built a design
toolflow which takes a QCCD architecture's parameters as input, along with a
set of applications and realistic hardware performance models. Our toolflow
maps the applications onto the target device and simulates their execution to
compute metrics such as application run time, reliability and device noise
rates. Using six applications and several hardware design points, we show that
trap sizing and communication topology choices can impact application
reliability by up to three orders of magnitude. Microarchitectural gate
implementation choices influence reliability by another order of magnitude.
From these studies, we provide concrete recommendations to tune these choices
to achieve highly reliable and performant application executions. Comment: Published in ISCA 2020, https://www.iscaconf.org/isca2020/program/
(please cite the ISCA version).
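A back-of-the-envelope version of the reliability estimate such a toolflow produces might look like the sketch below; the operation counts and error rates are invented for illustration, whereas the paper uses detailed, realistic hardware performance models.

```python
# Rough reliability model for a QCCD design point: approximate program success
# probability as a product of per-operation fidelities. All numbers are made up.

def estimate_reliability(n_gates_2q, n_gates_1q, n_shuttles, n_splits_merges,
                         e_2q=1e-3, e_1q=1e-5, e_shuttle=5e-4, e_split=5e-4):
    r = (1 - e_2q) ** n_gates_2q
    r *= (1 - e_1q) ** n_gates_1q
    r *= (1 - e_shuttle) ** n_shuttles         # ion shuttling between traps
    r *= (1 - e_split) ** n_splits_merges      # chain split/merge around shuttles
    return r

# Two hypothetical design points for the same program: larger traps need fewer
# shuttles but may have slightly worse gate error due to longer ion chains.
small_traps = estimate_reliability(400, 900, 600, 1200, e_2q=8e-4)
large_traps = estimate_reliability(400, 900, 120, 240,  e_2q=1.5e-3)
print(f"small traps: {small_traps:.3f}, large traps: {large_traps:.3f}")
```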
Sensor Data Collection through Unmanned Aircraft Gateways
Current addressing and service discovery schemes in mobile networks are not well-suited to multihop disconnected networks. This paper describes an implementation of a highly mobile ad-hoc network (MANET) that may never experience end-to-end connectivity. Special gateway nodes are described which are responsible for intelligently routing messages to their intended destination(s). These gateway nodes qualify their links and announce their status to the MANET, a simple approach to service discovery that is effective in this implementation. The implementation has been tested in an outdoor environment.
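A hypothetical sketch of the gateway-announcement idea described above (the message fields and freshness policy are assumptions, not the paper's protocol): gateways broadcast their measured uplink quality, and each node forwards data toward the best-advertised gateway it has recently heard from.

```python
# Per-node bookkeeping for gateway announcements in a disconnected MANET.
import time
from dataclasses import dataclass

@dataclass
class GatewayBeacon:
    gateway_id: str
    link_quality: float     # e.g. recent uplink delivery ratio, 0.0-1.0
    timestamp: float

class GatewayTable:
    """A node's view of recently heard gateway announcements."""
    def __init__(self, max_age_s=60.0):
        self.max_age_s = max_age_s
        self.beacons = {}

    def on_beacon(self, beacon: GatewayBeacon):
        self.beacons[beacon.gateway_id] = beacon

    def best_gateway(self):
        now = time.time()
        fresh = [b for b in self.beacons.values() if now - b.timestamp < self.max_age_s]
        return max(fresh, key=lambda b: b.link_quality, default=None)
```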
Architectures for Multinode Superconducting Quantum Computers
Many proposals to scale quantum technology rely on modular or distributed
designs where individual quantum processors, called nodes, are linked together
to form one large multinode quantum computer (MNQC). One scalable method to
construct an MNQC is using superconducting quantum systems with optical
interconnects. However, a limiting factor of these machines will be internode
gates, which may be two to three orders of magnitude noisier and slower than
local operations. Surmounting the limitations of internode gates will require a
range of techniques, including improvements in entanglement generation, the use
of entanglement distillation, and optimized software and compilers. It remains
unclear how improvements to these components interact to affect overall
system performance, what performance is required from each, or even how to
quantify the performance of each. In this paper, we employ a 'co-design'
inspired approach to quantify overall MNQC performance in terms of hardware
models of internode links, entanglement distillation, and local architecture.
In the case of superconducting MNQCs with microwave-to-optical links, we
uncover a tradeoff between entanglement generation and distillation that
threatens to degrade performance. We show how to navigate this tradeoff, lay
out how compilers should optimize between local and internode gates, and
discuss when noisy quantum links have an advantage over purely classical links.
Using these results, we introduce a roadmap for the realization of early MNQCs
which illustrates potential improvements to the hardware and software of MNQCs
and outlines criteria for evaluating the landscape, from progress in
entanglement generation and quantum memory to dedicated algorithms such as
distributed quantum phase estimation. While we focus on superconducting devices
with optical interconnects, our approach is general across MNQC
implementations. Comment: 23 pages, white paper.
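The generation-versus-distillation tradeoff can be sketched with a textbook BBPSSW-style recurrence for Werner states (a simplification of the hardware models and protocols the paper analyzes): each distillation round consumes two Bell pairs and, on success, returns one higher-fidelity pair, so the effective pair rate falls as fidelity rises.

```python
# Generation/distillation tradeoff using the textbook BBPSSW recurrence for
# Werner states. The link rate and fidelity below are hypothetical.

def bbpssw_round(F):
    """One purification round on two Werner pairs of fidelity F."""
    p_succ = F**2 + 2*F*(1 - F)/3 + 5*((1 - F)/3)**2
    F_new = (F**2 + ((1 - F)/3)**2) / p_succ
    return F_new, p_succ

def distill(raw_rate_hz, raw_fidelity, rounds):
    """Effective pair rate and fidelity after nested distillation rounds."""
    rate, F = raw_rate_hz, raw_fidelity
    for _ in range(rounds):
        F, p = bbpssw_round(F)
        rate = rate / 2 * p          # two inputs per attempt, success probability p
    return rate, F

# Hypothetical microwave-to-optical link: 200 raw pairs/s at fidelity 0.85.
for rounds in range(4):
    rate, F = distill(200.0, 0.85, rounds)
    print(f"{rounds} rounds: {rate:7.1f} pairs/s at fidelity {F:.3f}")
```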
Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks
Sensor networks are fundamentally constrained by the difficulty and energy expense of delivering information from sensors to the sink. Our work has focused on garnering additional significant energy improvements by devising computationally efficient lossless compression algorithms on the source node. These reduce the amount of data that must be passed through the network and to the sink, and thus have energy benefits that are multiplicative with the number of hops the data travels through the network. Currently, if sensor system designers want to compress acquired data, they must either develop application-specific compression algorithms or use off-the-shelf algorithms not designed for resource-constrained sensor nodes. This paper discusses the design issues involved with implementing, adapting, and customizing compression algorithms specifically geared for sensor nodes. While developing Sensor LZW (S-LZW) and some simple, but effective, variations to this algorithm, we show how different amounts of compression can lead to energy savings on both the compressing node and throughout the network, and that the savings depend heavily on the radio hardware. To validate and evaluate our work, we apply it to datasets from several different real-world deployments and show that our approaches can reduce energy consumption by up to a factor of 4.5X across the network.
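For illustration, a minimal LZW encoder with a bounded dictionary captures the memory-constrained setting; S-LZW itself includes further sensor-specific variations that are not shown here, and the dictionary size below is an arbitrary choice.

```python
# Minimal LZW encoder with a bounded dictionary, suited to small-RAM nodes.

def lzw_encode(data: bytes, max_dict_size=512):
    """Return a list of integer codes; the dictionary stops growing at max_dict_size."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            if next_code < max_dict_size:        # bounded table for constrained nodes
                dictionary[wc] = next_code
                next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

sample = b"temperature=21.5;temperature=21.6;temperature=21.5;" * 4
codes = lzw_encode(sample)
print(len(sample), "bytes ->", len(codes), "codes")   # repetitive data compresses well
```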
Recommended from our members
Verifying Correct Microarchitectural Enforcement of Memory Consistency Models
Memory consistency models define the rules and guarantees about the ordering and visibility of memory references on multithreaded CPUs and systems on chip. PipeCheck offers a methodology and automated tool for verifying that a particular microarchitecture correctly implements the consistency model required by its architectural specification.
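A toy version of the kind of check such a tool automates (the event names and edge set below are illustrative, not PipeCheck's actual microarchitectural model): encode the orderings the machine guarantees for a litmus test as a directed happens-before graph and test for a cycle; a cyclic graph means the outcome in question cannot be observed.

```python
# Happens-before cycle check for a message-passing litmus test.

def has_cycle(nodes, edges):
    """Detect a cycle in a directed graph via depth-first search."""
    graph = {n: [] for n in nodes}
    for a, b in edges:
        graph[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Message-passing test: producer writes x then flag; consumer reads flag==1 then x==0.
events = ["W_x1", "W_flag1", "R_flag1", "R_x0"]
edges = [("W_x1", "W_flag1"),    # program order on the producer core
         ("R_flag1", "R_x0"),    # program order on the consumer core
         ("W_flag1", "R_flag1"), # the read observed the flag write
         ("R_x0", "W_x1")]       # reading x==0 must precede the write of x
print("forbidden:", has_cycle(events, edges))   # True -> the outcome cannot occur
```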