
    Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers

    A massive gap exists between current quantum computing (QC) prototypes and the size and scale required for many proposed QC algorithms. Current QC implementations are prone to noise and variability, which affect their reliability, and yet with fewer than 80 quantum bits (qubits) in total, they are too resource-constrained to implement error correction. The term Noisy Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems of 1,000 qubits or fewer. Given NISQ systems' severe resource constraints, low reliability, and high variability in physical characteristics such as coherence time or error rates, it is of pressing importance to map computations onto them in ways that use resources efficiently and maximize the likelihood of successful runs. This paper proposes and evaluates backend compiler approaches to map and optimize high-level QC programs to execute with high reliability on NISQ systems with diverse hardware characteristics. Our techniques all start from an LLVM intermediate representation of the quantum program (such as would be generated from high-level QC languages like Scaffold) and generate QC executables runnable on the IBM Q public QC machine. We then use this framework to implement and evaluate several optimal and heuristic mapping methods. These methods vary in how they account for the availability of dynamic machine calibration data, the relative importance of various noise parameters, the different possible routing strategies, and the relative importance of compile-time scalability versus runtime success. Using real-system measurements, we show that fine-grained spatial and temporal variations in hardware parameters can be exploited to obtain an average 2.9x (and up to 18x) improvement in program success rate over the industry-standard IBM Qiskit compiler. Comment: To appear in ASPLOS'19.
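    The core idea, variation-aware mapping driven by calibration data, can be illustrated with a small sketch. The code below is not the paper's algorithm or its Scaffold/Qiskit toolchain; it is a minimal illustration, assuming hypothetical per-edge two-qubit error rates, of picking the logical-to-physical qubit assignment that maximizes an estimated program success probability.

```python
# Minimal sketch of noise-adaptive qubit mapping (not the paper's optimal or
# heuristic algorithms): choose the logical-to-physical assignment that
# maximizes estimated success probability under per-edge calibration data.
# All qubit counts and error rates below are illustrative assumptions.
from itertools import permutations

# Hypothetical two-qubit gate error rates from a calibration snapshot,
# keyed by physical qubit pair (undirected).
cx_error = {
    (0, 1): 0.021, (1, 2): 0.034, (2, 3): 0.012, (0, 3): 0.055,
}

def edge_error(a, b):
    # 1.0 means the pair is not directly connected on the device.
    return cx_error.get((a, b), cx_error.get((b, a), 1.0))

# Toy program: two-qubit gates on logical qubits 0..2.
program = [(0, 1), (1, 2), (0, 1)]

def success_estimate(mapping):
    """Estimated success probability: product of per-gate survival rates."""
    p = 1.0
    for lq1, lq2 in program:
        p *= 1.0 - edge_error(mapping[lq1], mapping[lq2])
    return p

# Brute-force search over assignments of 3 logical qubits to 4 physical ones.
best = max(permutations(range(4), 3), key=success_estimate)
print("best mapping (logical -> physical):", best,
      "estimated success:", round(success_estimate(best), 3))
```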

    Architectural Support for Optimizing Huge Page Selection Within the OS

    © 2023 Copyright held by the owner/author(s). This document is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). It is the accepted version of a published work that appeared in final form at the 56th ACM/IEEE International Symposium on Microarchitecture (MICRO), Toronto, Canada; the final edited and published work is available at https://doi.org/10.1145/3613424.3614296.
    Irregular, memory-intensive applications often incur high translation lookaside buffer (TLB) miss rates that result in significant address translation overheads. Employing huge pages is an effective way to reduce these overheads; however, in real systems the number of available huge pages can be limited when system memory is nearly full and/or fragmented. Thus, huge pages must be used selectively to back application memory. This work demonstrates that choosing the memory regions that incur the most TLB misses for huge page promotion best reduces address translation overheads. We call these regions High reUse TLB-sensitive data (HUBs). Unlike prior work, which relies on expensive per-page software counters to identify promotion regions, we propose new architectural support to identify these regions dynamically at application runtime. We propose a promotion candidate cache (PCC) that identifies HUB candidates based on hardware page table walks after a last-level TLB miss. This small, fixed-size structure tracks huge page-aligned regions (consisting of base pages), ranks them based on observed page table walk frequency, and only keeps the most frequently accessed ones. Evaluated on applications of various memory intensity, our approach successfully identifies the application pages incurring the highest address translation overheads. With the help of a PCC, the OS only needs to promote 4% of the application footprint to achieve more than 75% of the peak achievable performance, yielding 1.19-1.33× speedups over 4KB base pages alone. In real systems, where memory is typically fragmented, the PCC outperforms Linux's page promotion policy by 14% (when 50% of total memory is fragmented) and 16% (when 90% of total memory is fragmented), respectively.
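    As a rough illustration of the PCC concept described above (not the paper's hardware design), the sketch below models a small, fixed-size table that counts page-table walks per huge-page-aligned region and surfaces the hottest regions as promotion candidates; the capacity, replacement policy, and region size are illustrative assumptions.

```python
# Minimal sketch of a promotion candidate cache (PCC): a small, fixed-size
# table that counts hardware page-table walks per 2 MiB-aligned region and
# keeps only the hottest regions. Sizes and the eviction policy here are
# illustrative assumptions, not the paper's microarchitecture.
HUGE_PAGE = 2 * 1024 * 1024          # 2 MiB huge-page-aligned regions
PCC_ENTRIES = 16                      # small fixed capacity

class PromotionCandidateCache:
    def __init__(self):
        self.counts = {}              # region base address -> walk count

    def record_walk(self, vaddr):
        """Called on each page-table walk triggered by a last-level TLB miss."""
        region = vaddr & ~(HUGE_PAGE - 1)
        if region in self.counts:
            self.counts[region] += 1
        elif len(self.counts) < PCC_ENTRIES:
            self.counts[region] = 1
        else:
            # Illustrative policy: only displace a cold (count <= 1) entry,
            # so frequently walked regions stay resident.
            coldest = min(self.counts, key=self.counts.get)
            if self.counts[coldest] <= 1:
                del self.counts[coldest]
                self.counts[region] = 1

    def promotion_candidates(self, top_n=4):
        """Regions the OS should consider promoting to huge pages."""
        return sorted(self.counts, key=self.counts.get, reverse=True)[:top_n]
```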

    Challenges in computer architecture evaluation

    Architecting Noisy Intermediate-Scale Trapped Ion Quantum Computers

    Trapped ions (TI) are a leading candidate for building Noisy Intermediate-Scale Quantum (NISQ) hardware. TI qubits have fundamental advantages over other technologies such as superconducting qubits, including high qubit quality, coherence, and connectivity. However, current TI systems are small, with 5-20 qubits, and typically use a single-trap architecture that has fundamental scalability limitations. To progress towards the next major milestone of 50-100 qubits, a modular architecture termed the Quantum Charge Coupled Device (QCCD) has been proposed. In a QCCD-based TI device, small traps are connected through ion shuttling. While the basic hardware components for such devices have been demonstrated, building a 50-100 qubit system is challenging because of the wide range of design possibilities for trap sizing, communication topology, and gate implementations, and the need to match diverse application resource requirements. Towards realizing QCCD systems with 50-100 qubits, we perform an extensive architectural study evaluating the key design choices of trap sizing, communication topology, and operation implementation methods. We built a design toolflow which takes a QCCD architecture's parameters as input, along with a set of applications and realistic hardware performance models. Our toolflow maps the applications onto the target device and simulates their execution to compute metrics such as application run time, reliability, and device noise rates. Using six applications and several hardware design points, we show that trap sizing and communication topology choices can impact application reliability by up to three orders of magnitude. Microarchitectural gate implementation choices influence reliability by another order of magnitude. From these studies, we provide concrete recommendations to tune these choices to achieve highly reliable and performant application executions. Comment: Published in ISCA 2020, https://www.iscaconf.org/isca2020/program/ (please cite the ISCA version).
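    The kind of metric such a toolflow reports can be sketched simply: combine per-operation counts with per-operation error rates to estimate end-to-end reliability for a design point. The sketch below is a simplified stand-in, assuming illustrative error rates and operation counts rather than the paper's hardware performance models.

```python
# Minimal sketch of the kind of estimate a QCCD design toolflow produces:
# combine per-operation counts (gates, shuttles) with a simple error model
# to get an application reliability figure per design point. All error
# rates and counts are illustrative assumptions, not measured values.
def estimate_reliability(n_1q, n_2q, n_shuttles,
                         e_1q=1e-4, e_2q=1e-3, e_shuttle=5e-3):
    """Reliability ~ product of per-operation survival probabilities."""
    return ((1 - e_1q) ** n_1q) * ((1 - e_2q) ** n_2q) * ((1 - e_shuttle) ** n_shuttles)

# Two hypothetical design points for the same application: larger traps need
# fewer shuttles but are assumed here to have a worse two-qubit gate error.
small_traps = estimate_reliability(n_1q=500, n_2q=200, n_shuttles=150)
large_traps = estimate_reliability(n_1q=500, n_2q=200, n_shuttles=40, e_2q=2e-3)
print(f"small traps: {small_traps:.3f}, large traps: {large_traps:.3f}")
```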

    Sensor Data Collection through Unmanned Aircraft Gateways

    Current addressing and service discovery schemes in mobile networks are not well suited to multihop disconnected networks. This paper describes an implementation of a highly mobile ad-hoc network (MANET) that may never experience end-to-end connectivity. Special gateway nodes are described which are responsible for intelligently routing messages to their intended destination(s). These gateway nodes qualify their links and announce their status to the MANET, a simple approach to service discovery that is effective in this implementation. This implementation has been tested in an outdoor environment.
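    A minimal sketch of the gateway behavior described above (link qualification plus a status announcement) might look as follows; the delivery-ratio metric, window size, and message format are assumptions for illustration, not the implementation described in the paper.

```python
# Minimal sketch of a gateway node that scores its links from recent delivery
# outcomes and builds the status announcement it would flood into the MANET.
# The quality metric, window, threshold, and message format are illustrative.
from collections import deque

class GatewayNode:
    def __init__(self, node_id, window=20, usable_threshold=0.6):
        self.node_id = node_id
        self.history = {}                      # neighbor -> recent outcomes
        self.window = window
        self.usable_threshold = usable_threshold

    def record_delivery(self, neighbor, success):
        """Record one delivery attempt (True = delivered) over a link."""
        self.history.setdefault(neighbor, deque(maxlen=self.window)).append(success)

    def link_quality(self, neighbor):
        h = self.history.get(neighbor)
        return sum(h) / len(h) if h else 0.0

    def announcement(self):
        """Status message the gateway periodically announces to the MANET."""
        usable = {n: round(self.link_quality(n), 2)
                  for n in self.history
                  if self.link_quality(n) >= self.usable_threshold}
        return {"gateway": self.node_id, "usable_links": usable}
```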

    Architectures for Multinode Superconducting Quantum Computers

    Many proposals to scale quantum technology rely on modular or distributed designs where individual quantum processors, called nodes, are linked together to form one large multinode quantum computer (MNQC). One scalable method to construct an MNQC is using superconducting quantum systems with optical interconnects. However, a limiting factor of these machines will be internode gates, which may be two to three orders of magnitude noisier and slower than local operations. Surmounting the limitations of internode gates will require a range of techniques, including improvements in entanglement generation, the use of entanglement distillation, and optimized software and compilers; it remains unclear how improvements to these components interact to affect overall system performance, what performance is required from each, or even how to quantify the performance of each. In this paper, we employ a `co-design' inspired approach to quantify overall MNQC performance in terms of hardware models of internode links, entanglement distillation, and local architecture. In the case of superconducting MNQCs with microwave-to-optical links, we uncover a tradeoff between entanglement generation and distillation that threatens to degrade performance. We show how to navigate this tradeoff, lay out how compilers should optimize between local and internode gates, and discuss when noisy quantum links have an advantage over purely classical links. Using these results, we introduce a roadmap for the realization of early MNQCs which illustrates potential improvements to the hardware and software of MNQCs and outlines criteria for evaluating the landscape, from progress in entanglement generation and quantum memory to dedicated algorithms such as distributed quantum phase estimation. While we focus on superconducting devices with optical interconnects, our approach is general across MNQC implementations. Comment: 23 pages, white paper.
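    The generation/distillation tradeoff can be sketched with a toy model: each round of two-to-one distillation raises Bell-pair fidelity but halves (at best) the usable-pair rate. The sketch below uses a textbook BBPSSW-style fidelity recurrence for Werner states and ignores the protocol's success probability; the raw fidelity and rate are illustrative, not values from the paper.

```python
# Toy model of the generation vs. distillation tradeoff: more distillation
# rounds give higher fidelity but fewer usable pairs per second. The fidelity
# recurrence is the standard BBPSSW two-to-one formula for Werner states;
# raw fidelity and rate are illustrative assumptions.
def distill_once(f):
    """Output fidelity of one round of two-to-one distillation (Werner states)."""
    num = f * f + ((1 - f) / 3) ** 2
    den = f * f + 2 * f * (1 - f) / 3 + 5 * ((1 - f) / 3) ** 2
    return num / den

def link_performance(raw_fidelity, raw_rate_hz, rounds):
    """Fidelity and usable-pair rate after `rounds` of two-to-one distillation."""
    f, rate = raw_fidelity, raw_rate_hz
    for _ in range(rounds):
        f = distill_once(f)
        rate /= 2.0   # each round consumes two input pairs per output pair
                      # (distillation failure probability is ignored here)
    return f, rate

for rounds in range(4):
    f, r = link_performance(raw_fidelity=0.85, raw_rate_hz=1000.0, rounds=rounds)
    print(f"{rounds} round(s): fidelity={f:.3f}, rate={r:.0f} pairs/s")
```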

    Data Compression Algorithms for Energy-Constrained Devices in Delay Tolerant Networks

    Sensor networks are fundamentally constrained by the difficulty and energy expense of delivering information from sensors to the sink. Our work has focused on garnering additional significant energy improvements by devising computationally efficient lossless compression algorithms on the source node. These reduce the amount of data that must be passed through the network and to the sink, and thus have energy benefits that are multiplicative with the number of hops the data travels through the network. Currently, if sensor system designers want to compress acquired data, they must either develop application-specific compression algorithms or use off-the-shelf algorithms not designed for resource-constrained sensor nodes. This paper discusses the design issues involved with implementing, adapting, and customizing compression algorithms specifically geared for sensor nodes. While developing Sensor LZW (S-LZW) and some simple, but effective, variations to this algorithm, we show how different amounts of compression can lead to energy savings on both the compressing node and throughout the network, and that the savings depend heavily on the radio hardware. To validate and evaluate our work, we apply it to datasets from several different real-world deployments and show that our approaches can reduce energy consumption by up to a factor of 4.5X across the network.
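    To make the dictionary-size concern concrete, the sketch below shows plain LZW compression with a small, capped dictionary of the kind a RAM-constrained sensor node could hold; S-LZW itself adds further refinements beyond this, and the cap chosen here is an illustrative assumption.

```python
# Minimal sketch of LZW compression with a small, capped dictionary, in the
# spirit of adapting LZW to memory-constrained sensor nodes. This is generic
# LZW, not the full S-LZW design; MAX_DICT is an illustrative assumption.
MAX_DICT = 512   # small dictionary sized to fit in a sensor node's RAM

def lzw_compress(data: bytes):
    table = {bytes([i]): i for i in range(256)}   # seed with single bytes
    next_code, out, w = 256, [], b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            if next_code < MAX_DICT:              # stop growing at the cap
                table[wc] = next_code
                next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

print(lzw_compress(b"sensor sensor sensor data"))
```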