28 research outputs found
A Work Efficient Parallel Algorithm for Exact Euclidean Distance Transform
A fully parallelized, work-time optimal algorithm is presented for computing the exact Euclidean Distance Transform (EDT) of a 2D binary image of size n x n. Unlike existing PRAM and other algorithms, this algorithm is suitable for implementation on modern SIMD architectures such as GPUs. As a fundamental operation of the 2D EDT, the 1D EDT is efficiently parallelized first. Specifically, the GPU algorithm for the 1D EDT, which uses CUDA binary functions such as ballot(), ffs(), clz() and shfl(), runs in O(log_32 n) time and performs O(n) work. Using the 1D EDT as a fundamental operation, the fully parallelized work-time optimal 2D EDT algorithm is designed. This algorithm consists of three steps. Step 1 runs in O(log_32 n) time and performs O(N) total work (N = n^2) on the GPU. Step 2 performs O(N) total work and has an expected time complexity of O(log n) on the GPU. Step 3 runs in O(log_32 n) time and performs O(N) total work on the GPU. As far as we know, this algorithm is the first fully parallelized and realized work-time optimal algorithm for GPUs. Experimental results show that this algorithm outperforms prior state-of-the-art GPU algorithms.
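To make the role of ballot(), ffs() and clz() in a 1D EDT more concrete, the following is a minimal sketch in Python that emulates a single 32-lane warp on one 32-pixel segment. It is an illustration only, not the paper's algorithm: the helper names `ballot`, `ffs`, `clz` and `edt_1d_warp` are hypothetical Python stand-ins for the CUDA operations named in the abstract, and the cross-segment combination (where shfl() and the O(log_32 n) bound would come in) is not shown.

```python
WARP = 32  # lanes per warp; also the width of the emulated ballot mask


def ballot(pixels):
    """Pack 32 binary pixels into one integer, bit i = pixel i (emulates a warp vote)."""
    mask = 0
    for lane, p in enumerate(pixels):
        if p:
            mask |= 1 << lane
    return mask


def ffs(x):
    """1-based index of the least significant set bit, or 0 if x == 0."""
    return (x & -x).bit_length()


def clz(x):
    """Number of leading zeros of x viewed as a 32-bit word."""
    return WARP - x.bit_length()


def edt_1d_warp(pixels):
    """Distance from each lane to the nearest feature pixel (None if the segment has none)."""
    assert len(pixels) == WARP
    mask = ballot(pixels)
    out = []
    for lane in range(WARP):  # on a GPU, every lane would do this step at once
        right = mask >> lane                   # features at or to the right of this lane
        left = mask & ((1 << (lane + 1)) - 1)  # features at or to the left of this lane
        d_right = ffs(right) - 1 if right else None
        d_left = lane - (WARP - 1 - clz(left)) if left else None
        candidates = [d for d in (d_left, d_right) if d is not None]
        out.append(min(candidates) if candidates else None)
    return out
```

Each lane needs only a constant number of bit operations on the shared mask, which is the property that makes this kind of formulation attractive on SIMD hardware.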
Deterministic parallel algorithms for bilinear objective functions
Many randomized algorithms can be derandomized efficiently using either the
method of conditional expectations or probability spaces with low independence.
A series of papers, beginning with work by Luby (1988), showed that in many
cases these techniques can be combined to give deterministic parallel (NC)
algorithms for a variety of combinatorial optimization problems, with low time-
and processor-complexity.
We extend and generalize a technique of Luby for efficiently handling
bilinear objective functions. One noteworthy application is an NC algorithm for
maximal independent set. On a graph with edges and vertices, this
takes time and processors, nearly
matching the best randomized parallel algorithms. Other applications include
reduced processor counts for algorithms of Berger (1997) for maximum acyclic
subgraph and Gale-Berlekamp switching games.
This bilinear factorization also gives better algorithms for problems
involving discrepancy. An important application of this is to automata-fooling
probability spaces, which are the basis of a notable derandomization technique
of Sivakumar (2002). Our method leads to a large reduction in processor
complexity for a number of derandomization algorithms based on
automata-fooling, including set discrepancy and the Johnson-Lindenstrauss
Lemma.
A high performance 3D exact euclidean distance transform algorithm for distributed computing
The Euclidean distance transform (EDT) is used in various methods in pattern recognition, computer vision, image analysis, physics, applied mathematics and robotics. Several sequential EDT algorithms have been described in the literature; however, they are time- and memory-consuming for high-resolution images. Parallel implementations of the EDT are therefore required, especially for 3D images. This paper presents a parallel implementation, based on domain decomposition, of a well-known 3D Euclidean distance transform algorithm, and analyzes its performance on a cluster of workstations. The use of a data compression tool to reduce communication time is investigated and discussed. Among the performance results obtained, this work shows that data compression is an essential tool for clusters with low-bandwidth networks.
Energy-Efficient Algorithms on Mesh-Connected Systems with Additional Communication Links.
Energy consumption has become a critical factor constraining the design of massively parallel computers, necessitating the development of new models and energy-efficient algorithms. In this work we take a fundamental abstract model of massive parallelism, the mesh-connected computer, and extend it with additional communication links motivated by recent advances in on-chip photonic interconnects. This new means of communication with optical signals rather than electrical signals can reduce the energy and/or time of calculations by providing faster communication between distant processing elements. Processors are arranged in a two-dimensional grid with wire connections between adjacent neighbors and an additional one or two layers of noncrossing optical connections. Varying constraints on the layout of optics affect how powerful the model can be. In this dissertation, three optical interconnection layouts are defined: the optical mesh, the optical mesh of trees, and the optical pyramid. For each layout, algorithms for solving important problems are presented. Since energy usage is an important factor, running times are given in terms of a peak-power constraint, where peak power is the maximum number of processors active at any one time. These results demonstrate advantages of optics in terms of improved time and energy usage over the standard mesh computer without optics. One of the most significant results shows an optimal nonlinear time/peak-power tradeoff for sorting on the optical pyramid. This work shows asymptotic theoretical limits of computation and energy usage on an abstract model which takes physical constraints and developing interconnection technology into account.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/102474/1/ppoon_1.pd
Generating Practical Random Hyperbolic Graphs in Near-Linear Time and with Sub-Linear Memory
Random graph models, originally conceived to study the structure of networks and the emergence of their properties, have become an indispensable tool for experimental algorithmics. Amongst them, hyperbolic random graphs form a well-accepted family, yielding realistic complex networks while being both mathematically and algorithmically tractable. We introduce two generators MemGen and HyperGen for the G_{alpha,C}(n) model, which distributes n random points within a hyperbolic plane and produces m=n*d/2 undirected edges for all point pairs close by; the expected average degree d and exponent 2*alpha+1 of the power-law degree distribution are controlled by alpha>1/2 and C. Both algorithms emit a stream of edges which they do not have to store. MemGen keeps O(n) items in internal memory and has a time complexity of O(n*log(log n) + m), which is optimal for networks with an average degree of d=Omega(log(log n)). For realistic values of d=o(n / log^{1/alpha}(n)), HyperGen reduces the memory footprint to O([n^{1-alpha}*d^alpha + log(n)]*log(n)).
In an experimental evaluation, we compare HyperGen with four generators, among which it is consistently the fastest. For small d=10 we measure a speed-up of 4.0 compared to the fastest publicly available generator, increasing to 29.6 for d=1000. On commodity hardware, HyperGen produces 3.7e8 edges per second for graphs with 1e6 < m < 1e12 and alpha=1, utilising less than 600MB of RAM. We demonstrate nearly linear scalability on an Intel Xeon Phi.
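As a point of reference for the model described above, here is a naive quadratic-time sketch in Python of a threshold hyperbolic random graph. It is not MemGen or HyperGen. It assumes the commonly used formulation in which the disk radius is R = 2 ln(n) + C, radii are drawn with density proportional to sinh(alpha * r), angles are uniform, and two points are joined when their hyperbolic distance is at most R; these details, along with the function names `sample_points` and `edges`, are assumptions made for illustration and are not taken from the abstract. Like the generators in the abstract, `edges` emits a stream rather than storing the edge list.

```python
import math
import random


def sample_points(n, alpha, C, rng):
    """Draw n points (r, theta) in a hyperbolic disk of radius R = 2 ln(n) + C (assumed model)."""
    R = 2.0 * math.log(n) + C
    pts = []
    for _ in range(n):
        u = rng.random()
        # Inverse-CDF sampling for a radial density proportional to sinh(alpha * r) on [0, R].
        r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
        theta = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((r, theta))
    return R, pts


def edges(R, pts):
    """Yield every pair (i, j), i < j, whose hyperbolic distance is at most R."""
    cosh_R = math.cosh(R)
    for i in range(len(pts)):
        r1, t1 = pts[i]
        for j in range(i + 1, len(pts)):
            r2, t2 = pts[j]
            # Hyperbolic law of cosines; comparing cosh values avoids calling acosh.
            cosh_d = (math.cosh(r1) * math.cosh(r2)
                      - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
            if cosh_d <= cosh_R:
                yield (i, j)
```

A call such as `list(edges(*sample_points(1000, 1.0, 0.0, random.Random(0))))` materialises the stream for small n; the all-pairs loop is exactly the cost that the near-linear-time generators are designed to avoid.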
Automatic visual recognition using parallel machines
Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity.
In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods.
Invariants from four general lines under perspective projection are also discussed here. In contrast to the approach that uses epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration.
A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, called transfer. Then a hypothesis-generation-testing scheme is implemented on the hypercube parallel architecture.
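The O(n) versus O(n^2) contrast quoted in the abstract above is typical of dynamic programs whose table can be filled one anti-diagonal at a time. The sketch below, in Python, shows that general idea on an edit-distance-style comparison of two chain codes. It is an assumption that the dissertation's method has this shape; the function name `wavefront_dp` and the unit costs are hypothetical. The inner loop runs sequentially here, but its cells are mutually independent, so with one processor per cell each anti-diagonal would take a single parallel step.

```python
def wavefront_dp(a, b, cost=lambda x, y: 0 if x == y else 1):
    """Edit-distance-style DP filled by anti-diagonals.

    Returns (distance, steps), where steps counts the anti-diagonals processed,
    i.e. the number of parallel rounds if each diagonal were computed at once.
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        D[0][j] = j  # insert all of b[:j]
    steps = 0
    for d in range(2, n + m + 1):  # cells with i + j == d
        lo, hi = max(1, d - m), min(n, d - 1)
        if lo > hi:
            continue
        for i in range(lo, hi + 1):  # independent cells: parallelisable
            j = d - i
            D[i][j] = min(D[i - 1][j] + 1,
                          D[i][j - 1] + 1,
                          D[i - 1][j - 1] + cost(a[i - 1], b[j - 1]))
        steps += 1
    return D[n][m], steps
```

For two sequences of length n, the table has about n^2 cells but only 2n - 1 anti-diagonals, which is where a linear parallel running time can come from.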
Distributed Monitoring of Network Properties: The Power of Hybrid Networks
We initiate the study of network monitoring algorithms in a class of hybrid networks in which the nodes are connected by an external network and an internal network (short for externally and internally controlled network). While the external network lies outside the control of the nodes (or, in our case, of the monitoring protocol running on them) and might be exposed to continuous changes, the internal network is fully under the control of the nodes. As an example, consider a group of users with mobile devices having access to the cell phone infrastructure. While the network formed by the WiFi connections of the devices is an external network (as its structure is not necessarily under the control of the monitoring protocol), the connections between the devices via the cell phone infrastructure represent an internal network (as it can be controlled by the monitoring protocol). Our goal is to continuously monitor properties of the external network with the help of the internal network. We present scalable distributed algorithms that efficiently monitor the number of edges, the average node degree, the clustering coefficient, the bipartiteness, and the weight of a minimum spanning tree. Their performance bounds demonstrate that monitoring the external network state with the help of an internal network can be done much more efficiently than by using the external network alone, as is usually done in the literature.