Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers
A massive gap exists between current quantum computing (QC) prototypes and
the size and scale required for many proposed QC algorithms. Current QC
implementations are prone to noise and variability, which affect their
reliability, and yet, with fewer than 80 quantum bits (qubits) in total, they are
too resource-constrained to implement error correction. The term Noisy
Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems
of 1000 qubits or fewer. Given NISQ systems' severe resource constraints, low
reliability, and high variability in physical characteristics such as coherence
time or error rates, it is of pressing importance to map computations onto them
in ways that use resources efficiently and maximize the likelihood of
successful runs.
This paper proposes and evaluates backend compiler approaches to map and
optimize high-level QC programs to execute with high reliability on NISQ
systems with diverse hardware characteristics. Our techniques all start from an
LLVM intermediate representation of the quantum program (such as would be
generated from high-level QC languages like Scaffold) and generate QC
executables runnable on the IBM Q public QC machine. We then use this framework
to implement and evaluate several optimal and heuristic mapping methods. These
methods vary in how they account for the availability of dynamic machine
calibration data, the relative importance of various noise parameters, the
different possible routing strategies, and the relative importance of
compile-time scalability versus runtime success. Using real-system
measurements, we show that fine-grained spatial and temporal variations in
hardware parameters can be exploited to obtain an average x (and up to
x) improvement in program success rate over the industry-standard IBM
Qiskit compiler. Comment: To appear in ASPLOS'19.
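The core idea of noise-adaptive mapping can be illustrated with a minimal sketch (not the paper's actual algorithms, which include heuristic and SMT-based optimal variants): given hypothetical per-qubit readout errors and per-link two-qubit gate errors from a daily calibration report, choose the logical-to-physical qubit assignment that maximizes an estimated success probability.

```python
# Minimal sketch of noise-adaptive qubit mapping under assumed calibration data:
# per-qubit readout error and per-edge two-qubit (CNOT) gate error.
from itertools import permutations

def estimated_success(mapping, program_edges, readout_err, cx_err):
    """Estimate success probability of one run under a logical->physical mapping."""
    p = 1.0
    for q_phys in mapping.values():
        p *= (1.0 - readout_err[q_phys])          # one measurement per qubit
    for (a, b) in program_edges:                  # one CNOT per program edge
        edge = tuple(sorted((mapping[a], mapping[b])))
        p *= (1.0 - cx_err.get(edge, 1.0))        # pairs with no direct link would need routing; fully penalized here
    return p

def best_mapping(n_logical, physical_qubits, program_edges, readout_err, cx_err):
    """Brute-force search over assignments; real compilers use heuristics or SMT solvers."""
    best, best_p = None, -1.0
    for perm in permutations(physical_qubits, n_logical):
        mapping = dict(enumerate(perm))           # logical qubit i -> physical qubit perm[i]
        p = estimated_success(mapping, program_edges, readout_err, cx_err)
        if p > best_p:
            best, best_p = mapping, p
    return best, best_p

# Made-up calibration numbers for a 4-qubit line 0-1-2-3:
readout_err = {0: 0.02, 1: 0.08, 2: 0.03, 3: 0.05}
cx_err = {(0, 1): 0.015, (1, 2): 0.04, (2, 3): 0.02}
mapping, p = best_mapping(2, [0, 1, 2, 3], [(0, 1)], readout_err, cx_err)
print(mapping, round(p, 4))   # picks the pair with the best combined readout and CNOT fidelity
```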
Designing calibration and expressivity-efficient instruction sets for quantum computing
Near-term quantum computing (QC) systems have limited qubit counts, high gate (instruction) error rates, and typically support a minimal instruction set with one type of two-qubit (2Q) gate. To reduce program instruction counts and improve application expressivity, vendors have proposed, and shown proof-of-concept demonstrations of, richer instruction sets such as XY gates (Rigetti) and fSim gates (Google). These instruction sets comprise families of 2Q gate types parameterized by continuous qubit rotation angles; that is, they allow a large set of different physical operations to be realized on the qubits, depending on the input angles. However, having such a large number of gate types is problematic because each gate type has to be calibrated periodically, across the full system, to obtain high-fidelity implementations. This results in substantial recurring calibration overheads even on current systems, which use only a few gate types. Our work aims to navigate this tradeoff between application expressivity and calibration overhead, and to identify which instructions vendors should implement to get the best expressivity with acceptable calibration time.

Studying this tradeoff is challenging because of the diversity in QC application requirements, the need to optimize applications for widely different hardware gate types, and noise variations across gate types. Therefore, our work develops NuOp, a flexible compilation pass based on numerical optimization, to efficiently decompose application operations into arbitrary hardware gate types. Using NuOp and four important quantum applications, we study the instruction-set proposals of Rigetti and Google with realistic noise simulations and a calibration model. Our experiments show that implementing 4-8 types of 2Q gates is sufficient to attain nearly the same expressivity as a full continuous gate family, while reducing the calibration overhead by two orders of magnitude. With several vendors proposing rich gate families as a means to higher fidelity, our work has the potential to provide valuable instruction-set design guidance for near-term QC systems.
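As a rough illustration of the kind of numerical decomposition NuOp performs (this is not the released tool; the CZ hardware gate, ZYZ parameterization, and fidelity metric below are assumptions for the example), one can search for single-qubit rotation angles around a fixed hardware 2Q gate so that the resulting circuit approximates a target operation:

```python
# Numerical decomposition sketch: fit single-qubit rotations around one CZ gate
# so the layer matches a target 2Q unitary (here, a CNOT) up to global phase.
import numpy as np
from scipy.optimize import minimize

def rz(t): return np.array([[np.exp(-1j*t/2), 0], [0, np.exp(1j*t/2)]])
def ry(t): return np.array([[np.cos(t/2), -np.sin(t/2)], [np.sin(t/2), np.cos(t/2)]])
def u1q(a, b, c): return rz(a) @ ry(b) @ rz(c)         # generic single-qubit unitary (ZYZ)

CZ = np.diag([1, 1, 1, -1])                             # assumed hardware 2Q gate

def layer(params):
    """One hardware 2Q gate sandwiched between arbitrary single-qubit rotations."""
    pre = np.kron(u1q(*params[0:3]), u1q(*params[3:6]))
    post = np.kron(u1q(*params[6:9]), u1q(*params[9:12]))
    return post @ CZ @ pre

def infidelity(params, target):
    f = abs(np.trace(target.conj().T @ layer(params))) / 4.0   # global-phase-invariant overlap
    return 1.0 - f

# Target: a CNOT, which is locally equivalent to a single CZ, so one layer suffices.
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
res = minimize(infidelity, x0=np.random.uniform(0, 2*np.pi, 12), args=(CNOT,), method="BFGS")
print("achieved fidelity:", 1.0 - res.fun)   # ~1.0; an unlucky start may need a random restart
```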
Architectural Support for Optimizing Huge Page Selection Within the OS
Published in the 56th ACM/IEEE International Symposium on Microarchitecture (MICRO), Toronto, Canada: https://doi.org/10.1145/3613424.3614296
Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads; however, in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that choosing the memory regions that incur the most TLB misses for huge page promotion best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work, which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a last-level TLB miss. This small, fixed-size structure tracks huge-page-aligned regions (consisting of base pages), ranks them based on observed page table walk frequency, and only keeps the most frequently accessed ones. Evaluated on applications of varying memory intensity, our approach successfully identifies the application pages incurring the highest address translation overheads. Our approach demonstrates that with the help of a PCC, the OS only needs to promote 4% of the application footprint to achieve more than 75% of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems where memory is typically fragmented, the PCC outperforms Linux's page promotion policy by 14% (when 50% of total memory is fragmented) and 16% (when 90% of total memory is fragmented), respectively.
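A simplified software model of the structure described above may help make the idea concrete; the capacity, eviction policy, and 2 MiB region size below are illustrative assumptions, not the paper's exact design.

```python
# Software model of a PCC-like table: count hardware page-table walks per
# huge-page-aligned (2 MiB) region and keep only the hottest regions as
# huge-page promotion candidates for the OS.

HUGE_PAGE_SHIFT = 21          # 2 MiB regions on x86-64

class PromotionCandidateCache:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.counts = {}      # region base address -> observed walk count

    def record_walk(self, vaddr):
        """Called on each page-table walk triggered by a last-level TLB miss."""
        region = (vaddr >> HUGE_PAGE_SHIFT) << HUGE_PAGE_SHIFT
        if region in self.counts:
            self.counts[region] += 1
        elif len(self.counts) < self.capacity:
            self.counts[region] = 1
        else:
            # Fixed size: evict the coldest tracked region to make room,
            # but keep established hot regions in place.
            coldest = min(self.counts, key=self.counts.get)
            if self.counts[coldest] <= 1:
                del self.counts[coldest]
                self.counts[region] = 1

    def promotion_candidates(self, k=8):
        """Top-k regions the OS should consider backing with huge pages."""
        return sorted(self.counts, key=self.counts.get, reverse=True)[:k]

pcc = PromotionCandidateCache()
# Hardware would call pcc.record_walk(vaddr) on each last-level TLB miss;
# the OS periodically reads pcc.promotion_candidates() to drive promotion.
```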
Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers
Trapped ions (TI) are a leading candidate for building Noisy
Intermediate-Scale Quantum (NISQ) hardware. TI qubits have fundamental
advantages over other technologies such as superconducting qubits, including
high qubit quality, coherence and connectivity. However, current TI systems are
small, with 5-20 qubits, and typically use a single-trap architecture
which has fundamental scalability limitations. To progress towards the next
major milestone of 50-100 qubits, a modular architecture termed the Quantum
Charge Coupled Device (QCCD) has been proposed. In a QCCD-based TI device,
small traps are connected through ion shuttling. While the basic hardware
components for such devices have been demonstrated, building a 50-100 qubit
system is challenging because of the wide range of design possibilities for trap
sizing, communication topology, and gate implementations, and because of the need
to match diverse application resource requirements.
Towards realizing QCCD systems with 50-100 qubits, we perform an extensive
architectural study evaluating the key design choices of trap sizing,
communication topology and operation implementation methods. We built a design
toolflow which takes a QCCD architecture's parameters as input, along with a
set of applications and realistic hardware performance models. Our toolflow
maps the applications onto the target device and simulates their execution to
compute metrics such as application run time, reliability and device noise
rates. Using six applications and several hardware design points, we show that
trap sizing and communication topology choices can impact application
reliability by up to three orders of magnitude. Microarchitectural gate
implementation choices influence reliability by another order of magnitude.
From these studies, we provide concrete recommendations to tune these choices
to achieve highly reliable and performant application executions. Comment: Published in ISCA 2020, https://www.iscaconf.org/isca2020/program/
(please cite the ISCA version).
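A back-of-the-envelope version of the reliability estimate such a toolflow produces might look like the sketch below; the operation counts and error rates are invented for illustration, whereas the paper uses detailed, realistic hardware performance models.

```python
# Rough reliability model for a QCCD design point: approximate program success
# probability as a product of per-operation fidelities. All numbers are made up.

def estimate_reliability(n_gates_2q, n_gates_1q, n_shuttles, n_splits_merges,
                         e_2q=1e-3, e_1q=1e-5, e_shuttle=5e-4, e_split=5e-4):
    r = (1 - e_2q) ** n_gates_2q
    r *= (1 - e_1q) ** n_gates_1q
    r *= (1 - e_shuttle) ** n_shuttles         # ion shuttling between traps
    r *= (1 - e_split) ** n_splits_merges      # chain split/merge around shuttles
    return r

# Two hypothetical design points for the same program: larger traps need fewer
# shuttles but may have slightly worse gate error due to longer ion chains.
small_traps = estimate_reliability(400, 900, 600, 1200, e_2q=8e-4)
large_traps = estimate_reliability(400, 900, 120, 240,  e_2q=1.5e-3)
print(f"small traps: {small_traps:.3f}, large traps: {large_traps:.3f}")
```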
Sensor Data Collection through Unmanned Aircraft Gateways
Current addressing and service discovery schemes in mobile networks are not well-suited to multihop disconnected networks. This paper describes an implementation of a highly mobile ad-hoc network (MANET) that may never experience end-to-end connectivity. Special gateway nodes are described which are responsible for intelligently routing messages to their intended destination(s). These gateway nodes qualify their links and announce their status to the MANET, a simple approach to service discovery that is effective in this implementation. The implementation has been tested in an outdoor environment.
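A hypothetical sketch of the gateway-announcement idea described above (the message fields and freshness policy are assumptions, not the paper's protocol): gateways broadcast their measured uplink quality, and each node forwards data toward the best-advertised gateway it has recently heard from.

```python
# Per-node bookkeeping for gateway announcements in a disconnected MANET.
import time
from dataclasses import dataclass

@dataclass
class GatewayBeacon:
    gateway_id: str
    link_quality: float     # e.g. recent uplink delivery ratio, 0.0-1.0
    timestamp: float

class GatewayTable:
    """A node's view of recently heard gateway announcements."""
    def __init__(self, max_age_s=60.0):
        self.max_age_s = max_age_s
        self.beacons = {}

    def on_beacon(self, beacon: GatewayBeacon):
        self.beacons[beacon.gateway_id] = beacon

    def best_gateway(self):
        now = time.time()
        fresh = [b for b in self.beacons.values() if now - b.timestamp < self.max_age_s]
        return max(fresh, key=lambda b: b.link_quality, default=None)
```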
Architectures for Multinode Superconducting Quantum Computers
Many proposals to scale quantum technology rely on modular or distributed
designs where individual quantum processors, called nodes, are linked together
to form one large multinode quantum computer (MNQC). One scalable method to
construct an MNQC is using superconducting quantum systems with optical
interconnects. However, a limiting factor of these machines will be internode
gates, which may be two to three orders of magnitude noisier and slower than
local operations. Surmounting the limitations of internode gates will require a
range of techniques, including improvements in entanglement generation, the use
of entanglement distillation, and optimized software and compilers. It remains
unclear how improvements to these components interact to affect overall
system performance, what performance is required from each, or even how to
quantify the performance of each. In this paper, we employ a 'co-design'
inspired approach to quantify overall MNQC performance in terms of hardware
models of internode links, entanglement distillation, and local architecture.
In the case of superconducting MNQCs with microwave-to-optical links, we
uncover a tradeoff between entanglement generation and distillation that
threatens to degrade performance. We show how to navigate this tradeoff, lay
out how compilers should optimize between local and internode gates, and
discuss when noisy quantum links have an advantage over purely classical links.
Using these results, we introduce a roadmap for the realization of early MNQCs
which illustrates potential improvements to the hardware and software of MNQCs
and outlines criteria for evaluating the landscape, from progress in
entanglement generation and quantum memory to dedicated algorithms such as
distributed quantum phase estimation. While we focus on superconducting devices
with optical interconnects, our approach is general across MNQC
implementations. Comment: 23 pages, white paper.
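The generation-versus-distillation tradeoff can be sketched with a textbook BBPSSW-style recurrence for Werner states (a simplification of the hardware models and protocols the paper analyzes): each distillation round consumes two Bell pairs and, on success, returns one higher-fidelity pair, so the effective pair rate falls as fidelity rises.

```python
# Generation/distillation tradeoff using the textbook BBPSSW recurrence for
# Werner states. The link rate and fidelity below are hypothetical.

def bbpssw_round(F):
    """One purification round on two Werner pairs of fidelity F."""
    p_succ = F**2 + 2*F*(1 - F)/3 + 5*((1 - F)/3)**2
    F_new = (F**2 + ((1 - F)/3)**2) / p_succ
    return F_new, p_succ

def distill(raw_rate_hz, raw_fidelity, rounds):
    """Effective pair rate and fidelity after nested distillation rounds."""
    rate, F = raw_rate_hz, raw_fidelity
    for _ in range(rounds):
        F, p = bbpssw_round(F)
        rate = rate / 2 * p          # two inputs per attempt, success probability p
    return rate, F

# Hypothetical microwave-to-optical link: 200 raw pairs/s at fidelity 0.85.
for rounds in range(4):
    rate, F = distill(200.0, 0.85, rounds)
    print(f"{rounds} rounds: {rate:7.1f} pairs/s at fidelity {F:.3f}")
```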
Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks
Sensor networks are fundamentally constrained by the difficulty and energy expense of delivering information from sensors to the sink. Our work has focused on garnering additional significant energy improvements by devising computationally efficient lossless compression algorithms on the source node. These reduce the amount of data that must be passed through the network and to the sink, and thus have energy benefits that are multiplicative with the number of hops the data travels through the network. Currently, if sensor system designers want to compress acquired data, they must either develop application-specific compression algorithms or use off-the-shelf algorithms not designed for resource-constrained sensor nodes. This paper discusses the design issues involved with implementing, adapting, and customizing compression algorithms specifically geared for sensor nodes. While developing Sensor LZW (S-LZW) and some simple, but effective, variations to this algorithm, we show how different amounts of compression can lead to energy savings on both the compressing node and throughout the network, and that the savings depend heavily on the radio hardware. To validate and evaluate our work, we apply it to datasets from several different real-world deployments and show that our approaches can reduce energy consumption by up to a factor of 4.5X across the network.
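For illustration, a minimal LZW encoder with a bounded dictionary captures the memory-constrained setting; S-LZW itself includes further sensor-specific variations that are not shown here, and the dictionary size below is an arbitrary choice.

```python
# Minimal LZW encoder with a bounded dictionary, suited to small-RAM nodes.

def lzw_encode(data: bytes, max_dict_size=512):
    """Return a list of integer codes; the dictionary stops growing at max_dict_size."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            if next_code < max_dict_size:        # bounded table for constrained nodes
                dictionary[wc] = next_code
                next_code += 1
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out

sample = b"temperature=21.5;temperature=21.6;temperature=21.5;" * 4
codes = lzw_encode(sample)
print(len(sample), "bytes ->", len(codes), "codes")   # repetitive data compresses well
```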
Recommended from our members
Verifying Correct Microarchitectural Enforcement of Memory Consistency Models
Memory consistency models define the rules and guarantees about the ordering and visibility of memory references on multithreaded CPUs and systems on chip. PipeCheck offers a methodology and automated tool for verifying that a particular microarchitecture correctly implements the consistency model required by its architectural specification.
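A toy version of the kind of check such a tool automates (the event names and edge set below are illustrative, not PipeCheck's actual microarchitectural model): encode the orderings the machine guarantees for a litmus test as a directed happens-before graph and test for a cycle; a cyclic graph means the outcome in question cannot be observed.

```python
# Happens-before cycle check for a message-passing litmus test.

def has_cycle(nodes, edges):
    """Detect a cycle in a directed graph via depth-first search."""
    graph = {n: [] for n in nodes}
    for a, b in edges:
        graph[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GRAY
        for m in graph[n]:
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Message-passing test: producer writes x then flag; consumer reads flag==1 then x==0.
events = ["W_x1", "W_flag1", "R_flag1", "R_x0"]
edges = [("W_x1", "W_flag1"),    # program order on the producer core
         ("R_flag1", "R_x0"),    # program order on the consumer core
         ("W_flag1", "R_flag1"), # the read observed the flag write
         ("R_x0", "W_x1")]       # reading x==0 must precede the write of x
print("forbidden:", has_cycle(events, edges))   # True -> the outcome cannot occur
```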