
    A Work Efficient Parallel Algorithm for Exact Euclidean Distance Transform

    A fully parallelized, work-time optimal algorithm is presented for computing the exact Euclidean Distance Transform (EDT) of a 2D binary image of size n x n. Unlike existing PRAM and other algorithms, this algorithm is suitable for implementation on modern SIMD architectures such as GPUs. As the fundamental operation of the 2D EDT, the 1D EDT is parallelized first. Specifically, the GPU algorithm for the 1D EDT, which uses CUDA binary functions such as ballot(), ffs(), clz() and shfl(), runs in O(log_32 n) time and performs O(n) work. Using the 1D EDT as a building block, the fully parallelized, work-time optimal 2D EDT algorithm is designed in three steps. Step 1 runs in O(log_32 n) time and performs O(N) total work on the GPU, where N = n^2. Step 2 performs O(N) total work with an expected running time of O(log n) on the GPU. Step 3 runs in O(log_32 n) time and performs O(N) total work on the GPU. To the best of our knowledge, this is the first fully parallelized work-time optimal algorithm realized on GPUs. Experimental results show that it outperforms prior state-of-the-art GPU algorithms.
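
    As a point of reference for the 1D step, a minimal sequential sketch is given below: it computes, for every pixel of a binary row, the distance to the nearest foreground pixel in two linear sweeps. The paper's contribution is to replace such sweeps with an O(log_32 n)-time warp-level computation using the CUDA intrinsics named above; the function here is purely illustrative.

```python
# Sketch: exact 1D Euclidean distance transform of a binary row.
# The paper parallelizes this step with CUDA warp intrinsics
# (ballot/ffs/clz/shfl); this sequential two-sweep version only
# shows the computation those intrinsics accelerate.

def edt_1d(row):
    """Distance from each pixel to the nearest foreground (1) pixel."""
    n = len(row)
    INF = float("inf")
    dist = [0 if v else INF for v in row]
    # Left-to-right sweep: nearest foreground pixel to the left.
    for i in range(1, n):
        dist[i] = min(dist[i], dist[i - 1] + 1)
    # Right-to-left sweep: nearest foreground pixel to the right.
    for i in range(n - 2, -1, -1):
        dist[i] = min(dist[i], dist[i + 1] + 1)
    return dist

print(edt_1d([0, 1, 0, 0, 0, 1, 0]))  # [1, 0, 1, 2, 1, 0, 1]
```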

    Deterministic parallel algorithms for bilinear objective functions

    Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low independence. A series of papers, beginning with work by Luby (1988), showed that in many cases these techniques can be combined to give deterministic parallel (NC) algorithms for a variety of combinatorial optimization problems, with low time- and processor-complexity. We extend and generalize a technique of Luby for efficiently handling bilinear objective functions. One noteworthy application is an NC algorithm for maximal independent set. On a graph G with m edges and n vertices, this takes Õ(log^2 n) time and (m + n) n^{o(1)} processors, nearly matching the best randomized parallel algorithms. Other applications include reduced processor counts for the algorithms of Berger (1997) for maximum acyclic subgraph and Gale-Berlekamp switching games. This bilinear factorization also gives better algorithms for problems involving discrepancy. An important application is to automata-fooling probability spaces, which are the basis of a notable derandomization technique of Sivakumar (2002). Our method leads to a large reduction in processor complexity for a number of derandomization algorithms based on automata-fooling, including set discrepancy and the Johnson-Lindenstrauss Lemma.
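
    For readers unfamiliar with the underlying technique, the sketch below shows the method of conditional expectations in its simplest sequential form, derandomizing the trivial randomized 1/2-approximation for MAX-CUT. This is not the paper's NC algorithm, only the sequential building block such work parallelizes; all names are illustrative.

```python
# Sketch: method of conditional expectations for MAX-CUT.
# Fix vertices one at a time, always choosing the side that keeps the
# conditional expected cut size from decreasing; the final cut contains
# at least half the edges. Illustrative example only.

def derandomized_max_cut(n, edges):
    side = {}
    for v in range(n):
        # Edges to already-fixed neighbours decide which side is better;
        # edges to unfixed neighbours are cut with probability 1/2 either way.
        gain0 = sum(1 for (a, b) in edges
                    if (a == v and side.get(b) == 1) or (b == v and side.get(a) == 1))
        gain1 = sum(1 for (a, b) in edges
                    if (a == v and side.get(b) == 0) or (b == v and side.get(a) == 0))
        side[v] = 0 if gain0 >= gain1 else 1
    return side

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(derandomized_max_cut(4, edges))  # cuts at least half the edges
```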

    A high performance 3D exact Euclidean distance transform algorithm for distributed computing

    The Euclidean distance transform (EDT) is used in various methods in pattern recognition, computer vision, image analysis, physics, applied mathematics and robotics. Several sequential EDT algorithms have been described in the literature; however, they are time- and memory-consuming for images with large resolutions. Parallel implementations of the EDT are therefore required, especially for 3D images. This paper presents a parallel implementation, based on domain decomposition, of a well-known 3D Euclidean distance transform algorithm, and analyzes its performance on a cluster of workstations. The use of a data compression tool to reduce communication time is investigated and discussed. Among the obtained performance results, this work shows that data compression is an essential tool for clusters with low-bandwidth networks.
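
    The communication-saving idea can be illustrated in a few lines: boundary slabs exchanged between cluster nodes are compressed before transmission and decompressed on arrival. The sketch below assumes a zlib-style compressor; the paper evaluates a specific compression tool, which may differ, and all names here are illustrative.

```python
# Sketch: compressing a slab boundary before exchanging it between
# cluster nodes, the trick the paper uses to cut communication time
# on low-bandwidth networks. Partial EDT results tend to be smooth,
# so they compress well.
import zlib

import numpy as np

# One boundary slice of a 3D distance volume, with some structure.
boundary = np.zeros((512, 512), dtype=np.int32)
boundary[:100, :100] = np.arange(100, dtype=np.int32)

raw = boundary.tobytes()
packed = zlib.compress(raw, 6)
print(f"raw: {len(raw)} B, compressed: {len(packed)} B "
      f"(ratio {len(raw) / len(packed):.1f}x)")

# Receiving side restores the slice before continuing its local sweep.
restored = np.frombuffer(zlib.decompress(packed), dtype=np.int32).reshape(512, 512)
assert np.array_equal(restored, boundary)
```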

    Energy-Efficient Algorithms on Mesh-Connected Systems with Additional Communication Links

    Energy consumption has become a critical factor constraining the design of massively parallel computers, necessitating the development of new models and energy-efficient algorithms. In this work we take a fundamental abstract model of massive parallelism, the mesh-connected computer, and extend it with additional communication links motivated by recent advances in on-chip photonic interconnects. This new means of communication, using optical rather than electrical signals, can reduce the energy and/or time of calculations by providing faster communication between distant processing elements. Processors are arranged in a two-dimensional grid with wire connections between adjacent neighbors and an additional one or two layers of noncrossing optical connections. Varying constraints on the layout of the optics affect how powerful the model can be. In this dissertation, three optical interconnection layouts are defined: the optical mesh, the optical mesh of trees, and the optical pyramid. For each layout, algorithms for solving important problems are presented. Since energy usage is an important factor, running times are given in terms of a peak-power constraint, where peak power is the maximum number of processors active at any one time. These results demonstrate advantages of optics in terms of improved time and energy usage over the standard mesh computer without optics. One of the most significant results shows an optimal nonlinear time/peak-power tradeoff for sorting on the optical pyramid. This work shows asymptotic theoretical limits of computation and energy usage on an abstract model which takes physical constraints and developing interconnection technology into account.
    Ph.D. dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/102474/1/ppoon_1.pd

    Progress Report: 1991-1994


    Generating Practical Random Hyperbolic Graphs in Near-Linear Time and with Sub-Linear Memory

    Random graph models, originally conceived to study the structure of networks and the emergence of their properties, have become an indispensable tool for experimental algorithmics. Amongst them, hyperbolic random graphs form a well-accepted family, yielding realistic complex networks while being both mathematically and algorithmically tractable. We introduce two generators, MemGen and HyperGen, for the G_{alpha,C}(n) model, which distributes n random points within a hyperbolic plane and produces m = n*d/2 undirected edges between all pairs of nearby points; the expected average degree d and the exponent 2*alpha+1 of the power-law degree distribution are controlled by alpha > 1/2 and C. Both algorithms emit a stream of edges which they do not have to store. MemGen keeps O(n) items in internal memory and has a time complexity of O(n*log(log n) + m), which is optimal for networks with an average degree of d = Omega(log(log n)). For realistic values of d = o(n / log^{1/alpha}(n)), HyperGen reduces the memory footprint to O([n^{1-alpha}*d^alpha + log(n)]*log(n)). In an experimental evaluation, we compare HyperGen with four other generators; it is consistently the fastest. For small d=10 we measure a speed-up of 4.0 over the fastest publicly available generator, increasing to 29.6 for d=1000. On commodity hardware, HyperGen produces 3.7e8 edges per second for graphs with 1e6 < m < 1e12 and alpha=1, utilising less than 600MB of RAM. We demonstrate nearly linear scalability on an Intel Xeon Phi.
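
    For concreteness, the sketch below samples the standard threshold variant of this model: n points are drawn in a hyperbolic disk of radius R = 2*ln(n) + C with radial density proportional to sinh(alpha*r), and an edge is created for every pair at hyperbolic distance at most R. It is a quadratic-time reference implementation; the point of MemGen and HyperGen is precisely to avoid this all-pairs distance test.

```python
# Sketch: threshold hyperbolic random graph, quadratic reference version.
import math
import random

def hyperbolic_graph(n, alpha=1.0, C=0.0, seed=1):
    rng = random.Random(seed)
    R = 2 * math.log(n) + C
    pts = []
    for _ in range(n):
        theta = rng.uniform(0, 2 * math.pi)
        # Inverse-CDF sample of the radial density ~ alpha*sinh(alpha*r).
        r = math.acosh(1 + rng.random() * (math.cosh(alpha * R) - 1)) / alpha
        pts.append((r, theta))
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            (r1, t1), (r2, t2) = pts[i], pts[j]
            dt = math.pi - abs(math.pi - abs(t1 - t2))  # angular difference
            # Hyperbolic distance via the hyperbolic law of cosines.
            d = math.acosh(max(1.0, math.cosh(r1) * math.cosh(r2)
                               - math.sinh(r1) * math.sinh(r2) * math.cos(dt)))
            if d <= R:
                edges.append((i, j))
    return edges

print(len(hyperbolic_graph(500)))  # edge count grows linearly with n
```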

    Fundamental Computational Geometry on the GPU

    Ph.D. thesis

    Automatic visual recognition using parallel machines

    Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity. Our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed. In contrast to approaches that use epipolar geometry, we investigate invariants under isotropy subgroups. Theoretically, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. Finally, a projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture: object recognition is achieved by matching scene projective invariants to model projective invariants, a step called transfer, and the resulting hypotheses are then tested on the hypercube.
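
    Chain-code extraction, one of the two demonstrated applications, is easy to illustrate: a pixel boundary is encoded as a sequence of 8-direction Freeman codes. The sequential sketch below is illustrative only and is not the AP1000 implementation.

```python
# Sketch: Freeman chain-code extraction from a traced pixel boundary.
# Direction codes 0..7, counter-clockwise starting from east.
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(boundary):
    """Encode consecutive boundary pixels (x, y) as direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        codes.append(DIRS[(x1 - x0, y1 - y0)])
    return codes

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```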

    Distributed Monitoring of Network Properties: The Power of Hybrid Networks

    We initiate the study of network monitoring algorithms in a class of hybrid networks in which the nodes are connected by an external network and an internal network (short forms for externally and internally controlled networks). While the external network lies outside the control of the nodes (or, in our case, of the monitoring protocol running in them) and might be exposed to continuous changes, the internal network is fully under the control of the nodes. As an example, consider a group of users with mobile devices that have access to the cell phone infrastructure. While the network formed by the WiFi connections of the devices is an external network (as its structure is not necessarily under the control of the monitoring protocol), the connections between the devices via the cell phone infrastructure represent an internal network (as they can be controlled by the monitoring protocol). Our goal is to continuously monitor properties of the external network with the help of the internal network. We present scalable distributed algorithms that efficiently monitor the number of edges, the average node degree, the clustering coefficient, bipartiteness, and the weight of a minimum spanning tree. Their performance bounds demonstrate that monitoring the external network state with the help of an internal network can be done much more efficiently than using the external network alone, as is usually done in the literature.
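
    The flavour of these monitoring tasks can be seen in the simplest one, counting the edges of the external network: each node reports its external degree over the internal network, and the handshake lemma turns the degree sum into the edge count. The sketch below folds those reports centrally and is only a schematic of the aggregation the distributed protocol performs; the data structures are illustrative.

```python
# Sketch: monitoring the external edge count over the internal network.
# Each node would send its external degree up an internal aggregation
# tree; here the partial sums are simply folded in one place.

def count_external_edges(external_adj):
    """external_adj: node -> set of external neighbours."""
    degree_sum = sum(len(nbrs) for nbrs in external_adj.values())
    # Every undirected edge is reported by both endpoints.
    return degree_sum // 2

external = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}
print(count_external_edges(external))  # 3 edges
```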