Search CORE

26,069 research outputs found

Optimized Compilation of Aggregated Instructions for Realistic Quantum Computers

Author: Chong Fred T.
Gokhale Pranav
Hoffman Henry
Leung Nelson
Rossi Zane
Schuster David I.
Shi Yunong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/02/2019
Field of study

Recent developments in engineering and algorithms have made real-world applications in quantum computing possible in the near future. Existing quantum programming languages and compilers use a quantum assembly language composed of 1- and 2-qubit (quantum bit) gates. Quantum compiler frameworks translate this quantum assembly to electric signals (called control pulses) that implement the specified computation on specific physical devices. However, there is a mismatch between the operations defined by the 1- and 2-qubit logical ISA and their underlying physical implementation, so the current practice of directly translating logical instructions into control pulses results in inefficient, high-latency programs. To address this inefficiency, we propose a universal quantum compilation methodology that aggregates multiple logical operations into larger units that manipulate up to 10 qubits at a time. Our methodology then optimizes these aggregates by (1) finding commutative intermediate operations that result in more efficient schedules and (2) creating custom control pulses optimized for the aggregate (instead of individual 1- and 2-qubit operations). Compared to the standard gate-based compilation, the proposed approach realizes a deeper vertical integration of high-level quantum software and low-level, physical quantum hardware. We evaluate our approach on important near-term quantum applications on simulations of superconducting quantum architectures. Our proposed approach provides a mean speedup of

5\times

, with a maximum of

10\times

. Because latency directly affects the feasibility of quantum computation, our results not only improve performance but also have the potential to enable quantum computation sooner than otherwise possible.Comment: 13 pages, to apper in ASPLO

arXiv.org e-Print Archive

Crossref

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

Author: Hager Georg
Hammer Julian
Hofmann Johannes
Laukemann Jan
Wellein Gerhard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/10/2018
Field of study

An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an indispensable component of analytical performance models, such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a deep understanding of the performance-relevant interactions between hardware architecture and loop code. We present the Open Source Architecture Code Analyzer (OSACA), a static analysis tool for predicting the execution time of sequential loops comprising x86 instructions under the assumption of an infinite first-level cache and perfect out-of-order scheduling. We show the process of building a machine model from available documentation and semi-automatic benchmarking, and carry it out for the latest Intel Skylake and AMD Zen micro-architectures. To validate the constructed models, we apply them to several assembly kernels and compare runtime predictions with actual measurements. Finally we give an outlook on how the method may be generalized to new architectures.Comment: 11 pages, 4 figures, 7 table

arXiv.org e-Print Archive

Crossref

Polly's Polyhedral Scheduling in the Presence of Reductions

Author: Benaissa Zino
Doerfert Johannes
Hack Sebastian
Streit Kevin
Publication venue
Publication date: 01/01/2015
Field of study

The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence based, definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups up to 2.21

\times

on a quad core machine by exploiting the multidimensional reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Resource Control for Synchronous Cooperative Threads

Author: A. Cobham
F. Boussinot
G. Berry
G. Bonfante
G. Boudol
J. Armstrong
M. Hofmann
M. Odersky
N. Carriero
N. Jones
P. Baillot
R. Amadio
S. Bellantoni
Publication venue
Publication date: 01/01/2004
Field of study

We develop new methods to statically bound the resources needed for the execution of systems of concurrent, interactive threads. Our study is concerned with a \emph{synchronous} model of interaction based on cooperative threads whose execution proceeds in synchronous rounds called instants. Our contribution is a system of compositional static analyses to guarantee that each instant terminates and to bound the size of the values computed by the system as a function of the size of its parameters at the beginning of the instant. Our method generalises an approach designed for first-order functional languages that relies on a combination of standard termination techniques for term rewriting systems and an analysis of the size of the computed values based on the notion of quasi-interpretation. We show that these two methods can be combined to obtain an explicit polynomial bound on the resources needed for the execution of the system during an instant. As a second contribution, we introduce a virtual machine and a related bytecode thus producing a precise description of the resources needed for the execution of a system. In this context, we present a suitable control flow analysis that allows to formulte the static analyses for resource control at byte code level

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Multiple man-machine interfaces

Author: Cook C. W.
Stanton L.
Publication venue
Publication date: 15/10/1981
Field of study

The multiple man machine interfaces inherent in military pilot training, their social implications, and the issue of possible negative feedback were explored. Modern technology has produced machines which can see, hear, and touch with greater accuracy and precision than human beings. Consequently, the military pilot is more a systems manager, often doing battle against a target he never sees. It is concluded that unquantifiable human activity requires motivation that is not intrinsic in a machine

NASA Technical Reports Server

Recommended from our members

Percolation scheduling with resource constraints

Author: Ebciogiu Kemal
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 01/01/1989
Field of study

This paper presents a new approach to resource-constrained compiler extraction of fine-grain parallelism, targeted towards VLIW supercomputers, and in particular, the IBM VLIW (Very Large Instruction Word) processor. The algorithms described integrate resource limitations into Percolation Scheduling—a global parallelization technique—to deal with resource constraints, without sacrificing the generality and completeness of Percolation Scheduling in the process. This is in sharp contrast with previous approaches which either applied only to conditional-free code, or drastically limited the parallelization process by imposing relatively local heuristic resource constraints early in the scheduling process

eScholarship - University of California

Recommended from our members

A performance comparison of several superscalar processsor [sic] models with a VLIW processor

Author: Bagherzadeh Nader
Lenell John
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

Superscalar and VLIW processors can both execute multiple instructions each cycle. Each employs a different instruction scheduling method to achieve multiple instruction execution. Superscalar processors schedule instructions dynamically, and VLIW processors execute statically scheduled instructions. This paper quantitatively compares various superscalar processor architectures with a Very Long Instruction Word architecture developed at the University of California, Irvine. An architectural overview and performance analysis of the superscalar processor models and VIPER, a VLIW processor designed to take advantage of the parallelizing capabilities of Percolation Scheduling, are presented. The motivation for this comparison is to study the capability of a dynamically scheduled processor to obtain the same performance achieved by a statically scheduled processor, and examine the hardware resources required by each

eScholarship - University of California