101 research outputs found

    Tight bounds for some problems in computational geometry: the complete sub-logarithmic parallel time range

    Get PDF
    There are a number of fundamental problems in computational geometry for which work-optimal algorithms exist which have a parallel running time of O(logn)O(\log n) in the PRAM model. These include problems like two and three dimensional convex-hulls, trapezoidal decomposition, arrangement construction, dominance among others. Further improvements in running time to sub-logarithmic range were not considered likely because of their close relationship to sorting for which an Ω(logn/loglogn)\Omega (\log n/\log\log n ) is known to hold even with a polynomial number of processors. However, with recent progress in padded-sort algorithms, which circumvents the conventional lower-bounds, there arises a natural question about speeding up algorithms for the above-mentioned geometric problems (with appropriate modifications in the output specification). We present randomized parallel algorithms for some fundamental problems like convex-hulls and trapezoidal decomposition which execute in time O(logn/logk)O( \log n/\log k) in an nknk (k>1k > 1) processor CRCW PRAM. Our algorithms do not make any assumptions about the input distribution. Our work relies heavily on results on padded-sorting and some earlier results of Reif and Sen [28, 27]. We further prove a matching lower-bound for these problems in the bounded degree decision tree

    Visibility-Related Problems on Parallel Computational Models

    Get PDF
    Visibility-related problems find applications in seemingly unrelated and diverse fields such as computer graphics, scene analysis, robotics and VLSI design. While there are common threads running through these problems, most existing solutions do not exploit these commonalities. With this in mind, this thesis identifies these common threads and provides a unified approach to solve these problems and develops solutions that can be viewed as template algorithms for an abstract computational model. A template algorithm provides an architecture independent solution for a problem, from which solutions can be generated for diverse computational models. In particular, the template algorithms presented in this work lead to optimal solutions to various visibility-related problems on fine-grain mesh connected computers such as meshes with multiple broadcasting and reconfigurable meshes, and also on coarse-grain multicomputers. Visibility-related problems studied in this thesis can be broadly classified into Object Visibility and Triangulation problems. To demonstrate the practical relevance of these algorithms, two of the fundamental template algorithms identified as powerful tools in almost every algorithm designed in this work were implemented on an IBM-SP2. The code was developed in the C language, using MPI, and can easily be ported to many commercially available parallel computers

    Sorting signed permutations by reversals, revisited

    Get PDF
    AbstractThe problem of sorting signed permutations by reversals (SBR) is a fundamental problem in computational molecular biology. The goal is, given a signed permutation, to find a shortest sequence of reversals that transforms it into the positive identity permutation, where a reversal is the operation of taking a segment of the permutation, reversing it, and flipping the signs of its elements.In this paper we describe a randomized algorithm for SBR. The algorithm tries to sort the permutation by repeatedly performing a random oriented reversal. This process is in fact a random walk on the graph where permutations are the nodes and an arc from π to π′ corresponds to an oriented reversal that transforms π to π′. We show that if this random walk stops at the identity permutation, then we have found a shortest sequence. We give empirical evidence that this process indeed succeeds with high probability on a random permutation.To implement our algorithm we describe a data structure to maintain a permutation, that allows to draw an oriented reversal uniformly at random, and perform it in sub-linear time. With this data structure we can implement the random walk in O(n3/2logn) time, thus obtaining an algorithm for SBR that almost always runs in sub-quadratic time. The data structures we present may also be of independent interest for developing other algorithms for SBR, and for other problems.Finally, we present the first efficient parallel algorithm for SBR. We obtain this result by developing a fast implementation of the recent algorithm of Bergeron (Proceedings of CPM, 2001, pp. 106–117) for sorting signed permutations by reversals that is parallelizable. Our implementation runs in O(n2logn) time on a regular RAM, and in O(nlogn) time on a PRAM using n processors

    Optimal parallel algorithms for rectilinear link-distance problems

    Get PDF
    We provide optimal parallel solutions to several link-distance problems set in trapezoided rectilinear polygons. All our main parallel algorithms are deterministic and designed to run on the exclusive read exclusive write parallel random access machine (EREW PRAM). Let P be a trapezoided rectilinear simple polygon with n vertices. In O(log n) time using O(n/log n) processors we can optimally compute: 1. Minimum réctilinear link paths, or shortest paths in the L1 metric from any point in P to all vertices of P. 2. Minimum rectilinear link paths from any segment inside P to all vertices of P. 3. The rectilinear window (histogram) partition of P. 4. Both covering radii and vertex intervals for any diagonal of P. 5. A data structure to support rectilinear link-distance queries between any two points in P (queries can be answered optimally in O(log n) time by uniprocessor). Our solution to 5 is based on a new linear-time sequential algorithm for this problem which is also provided here. This improves on the previously best-known sequential algorithm for this problem which used O(n log n) time and space.5 We develop techniques for solving link-distance problems in parallel which are expected to find applications in the design of other parallel computational geometry algorithms. We employ these parallel techniques, for example, to compute (on a CREW PRAM) optimally the link diameter, the link center, and the central diagonal of a rectilinear polygon

    External-Memory Computational Geometry

    Get PDF
    (c) 1993 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.In this paper we give new techniques for designing e cient algorithms for computational geometry prob- lems that are too large to be solved in internal mem- ory. We use these techniques to develop optimal and practical algorithms for a number of important large- scale problems. We discuss our algorithms primarily in the context of single processor/single disk machines, a domain in which they are not only the rst known optimal results but also of tremendous practical value. Our methods also produce the rst known optimal al- gorithms for a wide range of two-level and hierarchical multilevel memory models, including parallel models. The algorithms are optimal both in terms of I/O cost and internal computation

    Parallel Geometric Algorithms.

    Get PDF
    Geometric algorithms have many important applications in science and technology. Some geometric problems require fast response time that could not be achieved by traditional sequential algorithms. However, the speed, power and versatility of parallel computers can be exploited to develop efficient geometric algorithms as shown in this dissertation. Our study focuses on designing efficient parallel geometric algorithms and analyzing their computational complexities. In this research, first we developed a parallel algorithm to find the maxima of a set of N points in the d-dimensional space, d 3˘e\u3e 3, on a hypercube SIMD machine. Our algorithm is a parallel implementation from the sequential algorithm given by Kung, Luccio, and Preparata (KLP75). Although the time complexity, O(N\sp{0.77}\log\sp{d-1}\ N), of our algorithm is not optimal, it is the first sublinear time algorithm for solving the high dimensional maxima problem. Next, we developed another parallel algorithm to construct the Voronoi diagram of a point set in the plane. Our algorithm is based on the sequential algorithm given by Brown (B79). We use an N×NN\times N mesh of trees (MOT) SIMD computer and get the optimal time complexity O(log\sp2N).. Finally, we developed another MOT algorithm to solve the congruent pattern problem. Given a simple polygon P with k edges and a planar graph G with N edges, N3˘ek.N\u3ek. The problem is to find all the patterns (cycles) in G which are congruent to P. Our algorithm is based on the CREW PRAM algorithm given by Jeong, Kim, and Baek (JKB92). We also use an N×NN\times N MOT and get the optimal time complexity O(klogN).O(k\log N).

    Scaling Simulations of Reconfigurable Meshes.

    Get PDF
    This dissertation deals with reconfigurable bus-based models, a new type of parallel machine that uses dynamically alterable connections between processors to allow efficient communication and to perform fast computations. We focus this work on the Reconfigurable Mesh (R-Mesh), one of the most widely studied reconfigurable models. We study the ability of the R-Mesh to adapt an algorithm instance of an arbitrary size to run on a given smaller model size without significant loss of efficiency. A scaling simulation achieves this adaptation, and the simulation overhead expresses the efficiency of the simulation. We construct a scaling simulation for the Fusing-Restricted Reconfigurable Mesh (FR-Mesh), an important restriction of the R-Mesh. The overhead of this simulation depends only on the simulating machine size and not on the simulated machine size. The results of this scaling simulation extend to a variety of concurrent write rules and also translate to an improved scaling simulation of the R-Mesh itself. We present a bus linearization procedure that transforms an arbitrary non-linear bus configuration of an R-Mesh into an equivalent acyclic linear bus configuration implementable on an Linear Reconfigurable Mesh (LR-Mesh), a weaker version of the R-Mesh. This procedure gives the algorithm designer the liberty of using buses of arbitrary shape, while automatically translating the algorithm to run on a simpler platform. We illustrate our bus linearization method through two important applications. The first leads to a faster scaling simulation of the R-Mesh. The second application adapts algorithms designed for R-Meshes to run on models with pipelined optical buses. We also present a simulation of a Directional Reconfigurable Mesh (DR-Mesh) on an LR-Mesh. This simulation has a much better efficiency compared to previous work. In addition to the LR-Mesh, this simulation also runs on models that use pipelined optical buses

    Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures

    Get PDF
    Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphic Processing Units (GPUs) have made these an accessible source of parallel processing power in general purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits. Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree-parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case. Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache. We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting, we introduce the minimum cache usage problem, an extension to classical sequential paging in which algorithms must account for the amount of cache they use. This cache-aware model seeks algorithms with good performance in terms of faults and the amount of cache used, and has applications in energy efficient caching and in shared cache scenarios. The wide availability of GPUs has added to the parallel power of multi-cores, however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and GPU, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms. Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model, that allows for constant time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming

    Connectivity Oracles for Graphs Subject to Vertex Failures

    Full text link
    We introduce new data structures for answering connectivity queries in graphs subject to batched vertex failures. A deterministic structure processes a batch of ddd\leq d_{\star} failed vertices in O~(d3)\tilde{O}(d^3) time and thereafter answers connectivity queries in O(d)O(d) time. It occupies space O(dmlogn)O(d_{\star} m\log n). We develop a randomized Monte Carlo version of our data structure with update time O~(d2)\tilde{O}(d^2), query time O(d)O(d), and space O~(m)\tilde{O}(m) for any failure bound dnd\le n. This is the first connectivity oracle for general graphs that can efficiently deal with an unbounded number of vertex failures. We also develop a more efficient Monte Carlo edge-failure connectivity oracle. Using space O(nlog2n)O(n\log^2 n), dd edge failures are processed in O(dlogdloglogn)O(d\log d\log\log n) time and thereafter, connectivity queries are answered in O(loglogn)O(\log\log n) time, which are correct w.h.p. Our data structures are based on a new decomposition theorem for an undirected graph G=(V,E)G=(V,E), which is of independent interest. It states that for any terminal set UVU\subseteq V we can remove a set BB of U/(s2)|U|/(s-2) vertices such that the remaining graph contains a Steiner forest for UBU-B with maximum degree ss
    corecore