22,715 research outputs found

    Model-driven search-based loop fusion optimization for handwritten code

    Get PDF
    The Tensor Contraction Engine (TCE) is a compiler that translates high-level, mathematical tensor contraction expressions into efficient, parallel Fortran code. A pair of optimizations in the TCE, the fusion and tiling optimizations, have proven successful for minimizing disk-to-memory traffic for dense tensor computations. While other optimizations are specific to tensor contraction expressions, these two model-driven search-based optimization algorithms could also be useful for optimizing handwritten dense array computations to minimize disk to memory traffic. In this thesis, we show how to apply the loop fusion algorithm to handwritten code in a procedural language. While in the TCE the loop fusion algorithm operated on high-level expression trees, in a standard compiler it needs to operate on abstract syntax trees. For simplicity, we use the fusion algorithm only for memory minimization instead of for minimizing disk-to-memory traffic. Also, we limit ourselves to handwritten, dense array computations in which loop bounds expressions are constant, subscript expressions are simple loop variables, and there are no common subexpressions. After type-checking, we canonicalize the abstract syntax tree to move side effects and loop-invariant code out of larger expressions. Using dataflow analysis, we then compute reaching definitions and add use-def chains to the abstract syntax tree. After undoing any partial loop fusion, a generalized loop fusion algorithm traverses the abstract syntax tree together with the use-def chains. Finally, the abstract syntax tree is rewritten to reflect the loop structure found by the loop fusion algorithm. We outline how the constraints on loop bounds expressions and array index expressions could be removed in the future using an algebraic cost model and an analysis of the iteration space using a polyhedral model

    Efficient Parallel Path Checking for Linear-Time Temporal Logic With Past and Bounds

    Full text link
    Path checking, the special case of the model checking problem where the model under consideration is a single path, plays an important role in monitoring, testing, and verification. We prove that for linear-time temporal logic (LTL), path checking can be efficiently parallelized. In addition to the core logic, we consider the extensions of LTL with bounded-future (BLTL) and past-time (LTL+Past) operators. Even though both extensions improve the succinctness of the logic exponentially, path checking remains efficiently parallelizable: Our algorithm for LTL, LTL+Past, and BLTL+Past is in AC^1(logDCFL) \subseteq NC

    Simulating quantum computation by contracting tensor networks

    Full text link
    The treewidth of a graph is a useful combinatorial measure of how close the graph is to a tree. We prove that a quantum circuit with TT gates whose underlying graph has treewidth dd can be simulated deterministically in TO(1)exp[O(d)]T^{O(1)}\exp[O(d)] time, which, in particular, is polynomial in TT if d=O(logT)d=O(\log T). Among many implications, we show efficient simulations for log-depth circuits whose gates apply to nearby qubits only, a natural constraint satisfied by most physical implementations. We also show that one-way quantum computation of Raussendorf and Briegel (Physical Review Letters, 86:5188--5191, 2001), a universal quantum computation scheme with promising physical implementations, can be efficiently simulated by a randomized algorithm if its quantum resource is derived from a small-treewidth graph.Comment: 7 figure

    A Parallel Distributed Strategy for Arraying a Scattered Robot Swarm

    Full text link
    We consider the problem of organizing a scattered group of nn robots in two-dimensional space, with geometric maximum distance DD between robots. The communication graph of the swarm is connected, but there is no central authority for organizing it. We want to arrange them into a sorted and equally-spaced array between the robots with lowest and highest label, while maintaining a connected communication network. In this paper, we describe a distributed method to accomplish these goals, without using central control, while also keeping time, travel distance and communication cost at a minimum. We proceed in a number of stages (leader election, initial path construction, subtree contraction, geometric straightening, and distributed sorting), none of which requires a central authority, but still accomplishes best possible parallelization. The overall arraying is performed in O(n)O(n) time, O(n2)O(n^2) individual messages, and O(nD)O(nD) travel distance. Implementation of the sorting and navigation use communication messages of fixed size, and are a practical solution for large populations of low-cost robots
    corecore