7,314 research outputs found

    Time-Optimal and Conflict-Free Mappings of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays

    Get PDF
    Most existing methods of mapping algorithms into processor arrays are restricted to the case where n-dimensional algorithms or algorithms with n nested loops are mapped into (n—l)-dimensional arrays. However, in practice, it is interesting to map n-dimensional algorithms into (k —l)-dimensional arrays where k\u3c.n. For example, many algorithms at bit-level are at least 4-dimensional (matrix multiplication, convolution, LU decomposition, etc.) and most existing bit level processor arrays are 2-dimensional. A computational conflict occurs if two or more computations of an algorithm are mapped into the same processor and the same execution time. In this paper, necessary and sufficient conditions are derived to identify all mappings without computational conflicts, based on the Hermite normal form of the mapping matrix. These conditions are used to propose methods of mapping any n-dimensional algorithm into (k— l)-dimensional arrays, kn—3, optimality of the mapping is guaranteed

    The projector algorithm: a simple parallel algorithm for computing Voronoi diagrams and Delaunay graphs

    Full text link
    The Voronoi diagram is a certain geometric data structure which has numerous applications in various scientific and technological fields. The theory of algorithms for computing 2D Euclidean Voronoi diagrams of point sites is rich and useful, with several different and important algorithms. However, this theory has been quite steady during the last few decades in the sense that no essentially new algorithms have entered the game. In addition, most of the known algorithms are serial in nature and hence cast inherent difficulties on the possibility to compute the diagram in parallel. In this paper we present the projector algorithm: a new and simple algorithm which enables the (combinatorial) computation of 2D Voronoi diagrams. The algorithm is significantly different from previous ones and some of the involved concepts in it are in the spirit of linear programming and optics. Parallel implementation is naturally supported since each Voronoi cell can be computed independently of the other cells. A new combinatorial structure for representing the cells (and any convex polytope) is described along the way and the computation of the induced Delaunay graph is obtained almost automatically.Comment: This is a major revision; re-organization and better presentation of some parts; correction of several inaccuracies; improvement of some proofs and figures; added references; modification of the title; the paper is long but more than half of it is composed of proofs and references: it is sufficient to look at pages 5, 7--11 in order to understand the algorith

    Formal synthesis of control signals for systolic arrays

    Get PDF

    Compiler Optimization Techniques for Scheduling and Reducing Overhead

    Get PDF
    Exploiting parallelism in loops in programs is an important factor in realizing the potential performance of processors today. This dissertation develops and evaluates several compiler optimizations aimed at improving the performance of loops on processors. An important feature of a class of scientific computing problems is the regularity exhibited by their access patterns. Chapter 2 presents an approach of optimizing the address generation of these problems that results in the following: (i) elimination of redundant arithmetic computation by recognizing and exploiting the presence of common sub-expressions across different iterations in stencil codes; and (ii) conversion of as many array references to scalar accesses as possible, which leads to reduced execution time, decrease in address arithmetic overhead, access to data in registers as opposed to caches, etc. With the advent of VLIW processors, the exploitation of fine-grain instruction-level parallelism has become a major challenge to optimizing compilers. Fine-grain scheduling of inner loops has received a lot of attention, little work has been done in the area of applying it to nested loops. Chapter 3 presents an approach to fine-grain scheduling of nested loops by formulating the problem of finding theminimum iteration initiation interval as one of finding a rational affine schedule for each statement in the body of a perfectly nested loop which is then solved using linear programming. Frequent synchronization on multiprocessors is expensive due to its high cost. Chapter 4 presents a method for eliminating redundant synchronization for nested loops. In nested loops, a dependence may be redundant in only a portion of the iteration space. A characterization of the non-uniformity of the redundancy of a dependence is developed in terms of the relation between the dependences and the shape and size of the iteration space. Exploiting locality is critical for achieving high level of performance on a parallel machine. Chapter 5 presents an approach using the concept of affinity regions to find transformations such that a suitable iteration-to-processor mapping can be found for a sequence of loop nests accessing shared arrays. This not only improves the data locality but significantly reduces communication overhead
    • …
    corecore