Model-driven search-based loop fusion optimization for handwritten code
The Tensor Contraction Engine (TCE) is a compiler that translates high-level, mathematical tensor contraction expressions into efficient, parallel Fortran code. A pair of optimizations in the TCE, the fusion and tiling optimizations, have proven successful for minimizing disk-to-memory traffic for dense tensor computations. While other optimizations are specific to tensor contraction expressions, these two model-driven search-based optimization algorithms could also be useful for optimizing handwritten dense array computations to minimize disk-to-memory traffic. In this thesis, we show how to apply the loop fusion algorithm to handwritten code in a procedural language. While in the TCE the loop fusion algorithm operated on high-level expression trees, in a standard compiler it needs to operate on abstract syntax trees. For simplicity, we use the fusion algorithm only for memory minimization instead of for minimizing disk-to-memory traffic. Also, we limit ourselves to handwritten, dense array computations in which loop-bound expressions are constant, subscript expressions are simple loop variables, and there are no common subexpressions. After type-checking, we canonicalize the abstract syntax tree to move side effects and loop-invariant code out of larger expressions. Using dataflow analysis, we then compute reaching definitions and add use-def chains to the abstract syntax tree. After undoing any partial loop fusion, a generalized loop fusion algorithm traverses the abstract syntax tree together with the use-def chains. Finally, the abstract syntax tree is rewritten to reflect the loop structure found by the loop fusion algorithm. We outline how the constraints on loop-bound expressions and array index expressions could be removed in the future using an algebraic cost model and an analysis of the iteration space using a polyhedral model.
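As a minimal sketch of why loop fusion can reduce memory use, the following Python example compares two loops that communicate through a temporary array with a fused loop that needs only a scalar temporary. The function names and the computation itself are hypothetical illustrations, not taken from the thesis. They merely respect its stated restrictions: constant loop bounds and subscripts that are simple loop variables.

```python
def unfused(a, b):
    """Two separate i-loops: the intermediate t must be stored as a full array."""
    n = len(a)
    t = [0.0] * n                 # temporary array of n elements
    for i in range(n):
        t[i] = a[i] * b[i]        # producer loop
    total = 0.0
    for i in range(n):
        total += t[i] + 1.0       # consumer loop
    return total


def fused(a, b):
    """Same computation after fusing the two i-loops: t shrinks to a scalar."""
    total = 0.0
    for i in range(len(a)):
        t = a[i] * b[i]           # scalar temporary replaces the array
        total += t + 1.0
    return total
```

Both functions compute the same value. The fused version drops the dimension of `t` that was indexed by the fused loop variable, which is the memory saving that a fusion search tries to maximize.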
Efficient Parallel Path Checking for Linear-Time Temporal Logic With Past and Bounds
Path checking, the special case of the model checking problem where the model
under consideration is a single path, plays an important role in monitoring,
testing, and verification. We prove that for linear-time temporal logic (LTL),
path checking can be efficiently parallelized. In addition to the core logic,
we consider the extensions of LTL with bounded-future (BLTL) and past-time
(LTL+Past) operators. Even though both extensions improve the succinctness of
the logic exponentially, path checking remains efficiently parallelizable: Our
algorithm for LTL, LTL+Past, and BLTL+Past is in AC^1(logDCFL) \subseteq NC.
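To make the path checking problem concrete, here is a minimal sequential sketch in Python. It is not the parallel algorithm of the paper, only a straightforward bottom-up evaluator. It assumes a finite path, a strong "next" operator (false at the last position), and a hypothetical tuple encoding of formulas. The names `check_path`, `"ap"`, `"X"`, `"U"`, `"Y"`, and `"S"` are illustrative choices.

```python
def check_path(formula, path):
    """Return a list v where v[i] is True iff `formula` holds at position i.

    `path` is a finite list of sets of atomic propositions.
    Formulas are nested tuples (hypothetical encoding):
      ("ap", name), ("not", f), ("and", f, g),
      ("X", f) next,      ("U", f, g) until,
      ("Y", f) yesterday, ("S", f, g) since.
    """
    n = len(path)
    op = formula[0]
    if op == "ap":
        return [formula[1] in state for state in path]
    if op == "not":
        return [not x for x in check_path(formula[1], path)]
    if op == "and":
        f = check_path(formula[1], path)
        g = check_path(formula[2], path)
        return [x and y for x, y in zip(f, g)]
    if op == "X":
        f = check_path(formula[1], path)
        # Strong next: false at the last position of the finite path.
        return [f[i + 1] if i + 1 < n else False for i in range(n)]
    if op == "Y":
        f = check_path(formula[1], path)
        # Yesterday: false at the first position.
        return [f[i - 1] if i > 0 else False for i in range(n)]
    if op == "U":
        f = check_path(formula[1], path)
        g = check_path(formula[2], path)
        out = [False] * n
        holds = False
        for i in range(n - 1, -1, -1):   # scan backwards over the path
            holds = g[i] or (f[i] and holds)
            out[i] = holds
        return out
    if op == "S":
        f = check_path(formula[1], path)
        g = check_path(formula[2], path)
        out = [False] * n
        holds = False
        for i in range(n):               # scan forwards over the path
            holds = g[i] or (f[i] and holds)
            out[i] = holds
        return out
    raise ValueError(f"unknown operator: {op!r}")
```

For example, `check_path(("U", ("ap", "p"), ("ap", "q")), path)[0]` tells whether `p U q` holds at the start of `path`. The evaluator performs one linear scan per subformula, so it is inherently sequential along the path; the paper's result concerns avoiding exactly this kind of sequential dependency.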
Simulating quantum computation by contracting tensor networks
The treewidth of a graph is a useful combinatorial measure of how close the
graph is to a tree. We prove that a quantum circuit with $T$ gates whose
underlying graph has treewidth $d$ can be simulated deterministically in
$T^{O(1)}\exp[O(d)]$ time, which, in particular, is polynomial in $T$ if
$d=O(\log T)$. Among many implications, we show efficient simulations for
log-depth circuits whose gates apply to nearby qubits only, a natural
constraint satisfied by most physical implementations. We also show that
one-way quantum computation of Raussendorf and Briegel (Physical Review
Letters, 86:5188--5191, 2001), a universal quantum computation scheme with
promising physical implementations, can be efficiently simulated by a
randomized algorithm if its quantum resource is derived from a small-treewidth
graph.
Comment: 7 figures
A Parallel Distributed Strategy for Arraying a Scattered Robot Swarm
We consider the problem of organizing a scattered group of $n$ robots in
two-dimensional space, with geometric maximum distance $D$ between robots. The
communication graph of the swarm is connected, but there is no central
authority for organizing it. We want to arrange them into a sorted and
equally-spaced array between the robots with lowest and highest label, while
maintaining a connected communication network.
In this paper, we describe a distributed method to accomplish these goals,
without using central control, while also keeping time, travel distance and
communication cost to a minimum. We proceed in a number of stages (leader
election, initial path construction, subtree contraction, geometric
straightening, and distributed sorting), none of which requires a central
authority, while still achieving the best possible parallelization. The overall
arraying is performed in $O(n)$ time, $O(n^2)$ individual messages, and $O(nD)$
travel distance. The implementations of sorting and navigation use communication
messages of fixed size and are a practical solution for large populations of
low-cost robots.