29,187 research outputs found

    OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection

    Get PDF
    Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS) like finding map overlays or spatial joins using polygonal data require solving segment intersections. Plane sweep paradigm is used for finding geometric intersection in an efficient manner. However, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementation using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs. We chose compiler directive based approach for implementation because of its simplicity to parallelize sequential code. Using Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for line segment intersection problem on 40K and 80K data sets compared to sequential CGAL library

    POSH: Paris OpenSHMEM: A High-Performance OpenSHMEM Implementation for Shared Memory Systems

    Get PDF
    In this paper we present the design and implementation of POSH, an Open-Source implementation of the OpenSHMEM standard. We present a model for its communications, and prove some properties on the memory model defined in the OpenSHMEM specification. We present some performance measurements of the communication library featured by POSH and compare them with an existing one-sided communication library. POSH can be downloaded from \url{http://www.lipn.fr/~coti/POSH}. % 9 - 67Comment: This is an extended version (featuring the full proofs) of a paper accepted at ICCS'1

    Minimizing Communication in Linear Algebra

    Full text link
    In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense, matrix-multiplication using the conventional O(n3)O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as Ω\Omega(#arithmetic operations / M\sqrt{M}), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDLTLDL^T factorization, QR factorization, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and the SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.Comment: 27 pages, 2 table

    Linear Temporal Logic and Propositional Schemata, Back and Forth (extended version)

    Full text link
    This paper relates the well-known Linear Temporal Logic with the logic of propositional schemata introduced by the authors. We prove that LTL is equivalent to a class of schemata in the sense that polynomial-time reductions exist from one logic to the other. Some consequences about complexity are given. We report about first experiments and the consequences about possible improvements in existing implementations are analyzed.Comment: Extended version of a paper submitted at TIME 2011: contains proofs, additional examples & figures, additional comparison between classical LTL/schemata algorithms up to the provided translations, and an example of how to do model checking with schemata; 36 pages, 8 figure

    Intensity-based image registration using multiple distributed agents

    Get PDF
    Image registration is the process of geometrically aligning images taken from different sensors, viewpoints or instances in time. It plays a key role in the detection of defects or anomalies for automated visual inspection. A multiagent distributed blackboard system has been developed for intensity-based image registration. The images are divided into segments and allocated to agents on separate processors, allowing parallel computation of a similarity metric that measures the degree of likeness between reference and sensed images after the application of a transform. The need for a dedicated control module is removed by coordination of agents via the blackboard. Tests show that additional agents increase speed, provided the communication capacity of the blackboard is not saturated. The success of the approach in achieving registration, despite significant misalignment of the original images, is demonstrated in the detection of manufacturing defects on screen-printed plastic bottles and printed circuit boards
    • …
    corecore