329 research outputs found

    QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation

    Full text link
    The rapid development of quantum computing (QC) in the NISQ era urgently demands a low-level benchmark suite and insightful evaluation metrics for characterizing the properties of prototype NISQ devices, the efficiency of QC programming compilers, schedulers and assemblers, and the capability of quantum simulators in a classical computer. In this work, we fill this gap by proposing a low-level, easy-to-use benchmark suite called QASMBench based on the OpenQASM assembly representation. It consolidates commonly used quantum routines and kernels from a variety of domains including chemistry, simulation, linear algebra, searching, optimization, arithmetic, machine learning, fault tolerance, cryptography, etc., trading-off between generality and usability. To analyze these kernels in terms of NISQ device execution, in addition to circuit width and depth, we propose four circuit metrics including gate density, retention lifespan, measurement density, and entanglement variance, to extract more insights about the execution efficiency, the susceptibility to NISQ error, and the potential gain from machine-specific optimizations. Most of the QASMBench application code can be launched and verified in IBM-Q directly. With the help from q-convert, QASMBench can be evaluated on various platforms and simulation environments. QASMBench is released at: http://github.com/pnnl/QASMBench

    Faulty Behavior of Storage Elements and Its Effects on Sequential Circuits

    Get PDF
    It is often assumed that the faults in storage elements (SEs) can be modeled as output/input stuck-at faults of the element. They are implicitly considered equivalent to the stuck-at faults in the combinational logic surrounding the SE cells. Transistor-level faults in common SEs are examined here. A more accurate higher level fault model for elementary SEs that better represents the physical failures is presented. It is shown that a minimal (stuck-at) model may be adequate if only modest fault coverage is desired. The enhanced model includes some common fault behaviors of SEs that are not covered by the minimal fault model. These include data-feedthrough and clock-feedthrough behaviors, as well as problems with logic level retention. Fault models for complex SE cells can be obtained without a significant loss of information about the structure of the circuit. The detectability of feedthrough faults is considered

    Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters

    Full text link
    While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should be. Architecting PEs as vector processors holds the promise to greatly reduce their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector machines include micro-architectural tricks to improve the Instruction Level Parallelism (ILP), which increases their instruction fetch and decode energy overhead. In this paper, we explore for the first time vector processing as an option to build small and efficient PEs for large-scale shared-L1 clusters. We propose Spatz, a compact, modular 32-bit vector processing unit based on the integer embedded subset of the RISC-V Vector Extension version 1.0. A Spatz-based cluster with four Multiply-Accumulate Units (MACUs) needs only 7.9 pJ per 32-bit integer multiply-accumulate operation, 40% less energy than an equivalent cluster built with four Snitch scalar cores. We analyzed Spatz' performance by integrating it within MemPool, a large-scale many-core shared-L1 cluster. The Spatz-based MemPool system achieves up to 285 GOPS when running a 256x256 32-bit integer matrix multiplication, 70% more than the equivalent Snitch-based MemPool system. In terms of energy efficiency, the Spatz-based MemPool system achieves up to 266 GOPS/W when running the same kernel, more than twice the energy efficiency of the Snitch-based MemPool system, which reaches 128 GOPS/W. Those results show the viability of lean vector processors as high-performance and energy-efficient PEs for large-scale clusters with tightly-coupled L1 memory.Comment: 9 pages. Accepted for publication in the 2022 International Conference on Computer-Aided Design (ICCAD 2022
    corecore