    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared-memory model of parallel computation that abstracts away from inessential engineering details. It provides a very simple, architecture-independent model and a good programming environment. Theoretical computer scientists have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks, and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large-scale general-purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms, and matrix multiplications.
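
As a rough illustration of what a "balanced tree computation" looks like, the sketch below combines n values pairwise in rounds, the way a balanced binary tree would. It is a sequential simulation written for this listing, not the thesis's algorithm; the name `tree_reduce` and the round counter are assumptions made for the example. Each round corresponds to one level of the tree, and all combinations within a round are independent, so they could run in parallel.

```python
def tree_reduce(values, combine):
    """Combine values pairwise, level by level, as in a balanced binary tree.

    Returns (result, rounds). Hypothetical helper: a sequential simulation
    in which every round could be executed in parallel.
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    rounds = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; these pairs are independent.
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired last element is carried up to the next level.
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    total, rounds = tree_reduce(range(1, 9), lambda a, b: a + b)
    print(total, rounds)
```

For n inputs the loop runs for about log2(n) rounds, which is the depth of the balanced tree.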

    Optimal Permutation Routing for Low-dimensional Hypercubes

    We consider the offline problem of routing a permutation of tokens on the nodes of a d-dimensional hypercube, under a queueless MIMD communication model (that is, under the constraints that each hypercube edge may only communicate one token per communication step, and each node may only be occupied by a single token between communication steps). For a d-dimensional hypercube, it is easy to see that d communication steps are necessary. We develop a theory of “separability” which enables an analytical proof that d steps suffice for the case d = 3, and facilitates an experimental verification that d steps suffice for d = 4. This result improves the upper bound for the number of communication steps required to route an arbitrary permutation on arbitrarily large hypercubes to 2d − 4. We also find an interesting side-result, that the number of possible communication steps in a d-dimensional hypercube is the same as the number of perfect matchings in a (d + 1)-dimensional hypercube, a combinatorial quantity for which there is no closed-form expression. Finally we present some experimental observations which may lead to a proof of a more general result for arbitrarily large dimension d.
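
The side-result about perfect matchings can be checked by brute force for very small d. The sketch below is not from the paper. It assumes that a "communication step" can be modelled as a permutation of the nodes in which every token either stays put or moves to an adjacent node (so two neighbours may swap); that reading of the model, and the function names, are assumptions made for this example.

```python
from itertools import permutations


def neighbors(v, d):
    """Nodes adjacent to v in the d-dimensional hypercube (flip one bit)."""
    return [v ^ (1 << i) for i in range(d)]


def count_steps(d):
    """Count permutations in which each token stays or moves to a neighbour.

    Assumed model of a single communication step; brute force over all
    (2^d)! permutations, so only usable for d <= 3.
    """
    n = 1 << d
    allowed = [set(neighbors(v, d)) | {v} for v in range(n)]
    return sum(
        all(p[v] in allowed[v] for v in range(n))
        for p in permutations(range(n))
    )


def count_perfect_matchings(d):
    """Count perfect matchings of the d-dimensional hypercube by backtracking."""
    n = 1 << d
    full = (1 << n) - 1

    def rec(used):
        if used == full:
            return 1
        v = 0
        while (used >> v) & 1:  # lowest-numbered unmatched node
            v += 1
        total = 0
        for u in neighbors(v, d):
            if not (used >> u) & 1:
                total += rec(used | (1 << v) | (1 << u))
        return total

    return rec(0)


if __name__ == "__main__":
    for d in range(1, 4):
        print(d, count_steps(d), count_perfect_matchings(d + 1))
```

Under this model the two counts agree for d = 1, 2, 3. Larger d is out of reach for this naive enumeration.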

    On Embeddings of l_1^k from Locally Decodable Codes

    We show that any $q$-query locally decodable code (LDC) gives a copy of $\ell_1^k$ with small distortion in the Banach space of $q$-linear forms on $\ell_{p_1}^N\times\cdots\times\ell_{p_q}^N$, provided $1/p_1 + \cdots + 1/p_q \leq 1$ and where $k$, $N$, and the distortion are simple functions of the code parameters. We exhibit the copy of $\ell_1^k$ by constructing a basis for it directly from "smooth" LDC decoders. Based on this, we give alternative proofs for known lower bounds on the length of 2-query LDCs. Using similar techniques, we reprove known lower bounds for larger $q$. We also discuss the relation with an alternative proof, due to Pisier, of a result of Naor, Regev, and the author on cotype properties of projective tensor products of $\ell_p$ spaces.

    How does object fatness impact the complexity of packing in d dimensions?

    Packing is a classical problem where one is given a set of subsets of Euclidean space called objects, and the goal is to find a maximum size subset of objects that are pairwise non-intersecting. The problem is also known as the Independent Set problem on the intersection graph defined by the objects. Although the problem is NP-complete, there are several subexponential algorithms in the literature. One of the key assumptions of such algorithms has been that the objects are fat, with a few exceptions in two dimensions; for example, the packing problem of a set of polygons in the plane surprisingly admits a subexponential algorithm. In this paper we give tight running time bounds for packing similarly-sized non-fat objects in higher dimensions. We propose an alternative and very weak measure of fatness called the stabbing number, and show that the packing problem in Euclidean space of constant dimension $d \geq 3$ for a family of similarly sized objects with stabbing number $\alpha$ can be solved in $2^{O(n^{1-1/d}\alpha)}$ time. We prove that even in the case of axis-parallel boxes of fixed shape, there is no $2^{o(n^{1-1/d}\alpha)}$ algorithm under ETH. This result smoothly bridges the whole range of having constant-fat objects on one extreme ($\alpha=1$) and a subexponential algorithm of the usual running time, and having very "skinny" objects on the other extreme ($\alpha=n^{1/d}$), where we cannot hope to improve upon the brute force running time of $2^{O(n)}$, and thereby characterizes the impact of fatness on the complexity of packing in case of similarly sized objects. We also study the same problem when parameterized by the solution size $k$, and give an $n^{O(k^{1-1/d}\alpha)}$ algorithm, with an almost matching lower bound. Comment: Short version appears in ISAAC 201
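
The two extremes mentioned in the abstract follow by substituting the stated values of the stabbing number into the running time bound. The lines below only spell out that arithmetic.

```latex
\begin{align*}
\alpha = 1 &: \quad 2^{O(n^{1-1/d}\cdot 1)} = 2^{O(n^{1-1/d})}
  && \text{(the usual subexponential bound for fat objects)} \\
\alpha = n^{1/d} &: \quad 2^{O(n^{1-1/d}\cdot n^{1/d})} = 2^{O(n)}
  && \text{(no better than brute force)}
\end{align*}
```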

    Deep Heuristic: A Heuristic for Message Broadcasting in Arbitrary Networks

    With the increasing popularity of interconnection networks, efficient information dissemination has become a popular research area. Broadcasting is one of the information dissemination primitives. Finding the optimal broadcasting scheme for any originator in an arbitrary network has been proved to be an NP-Hard problem. In this thesis, a new heuristic that generates broadcast schemes in arbitrary networks is presented, which has O(|E| + |V| log |V|) time complexity. Based on computer simulations of this heuristic in some commonly used topologies and network models, and on a comparison of the results with the best existing heuristics, we conclude that the new heuristic shows comparable performance while having lower complexity.

    Distance-Preserving Subgraphs of Interval Graphs

    We consider the problem of finding small distance-preserving subgraphs of undirected, unweighted interval graphs that have k terminal vertices. We show that every interval graph admits a distance-preserving subgraph with O(k log k) branching vertices. We also prove a matching lower bound by exhibiting an interval graph based on bit-reversal permutation matrices. In addition, we show that interval graphs admit subgraphs with O(k) branching vertices that approximate distances up to an additive term of +1
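
For readers unfamiliar with the bit-reversal permutation mentioned in the lower bound, the sketch below builds the permutation and its 0/1 matrix. It only illustrates the permutation itself; how the paper turns this matrix into an interval graph is not described in the abstract and is not attempted here. The function names are assumptions made for the example.

```python
def bit_reversal_permutation(bits):
    """Map each index i in [0, 2^bits) to the index with its bits reversed."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    n = 1 << bits
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def permutation_matrix(perm):
    """Return the 0/1 matrix P with P[i][perm[i]] = 1."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    perm = bit_reversal_permutation(3)
    print(perm)
    for row in permutation_matrix(perm):
        print(" ".join(map(str, row)))
```

Reversing the bits twice gives back the original index, so the permutation is its own inverse and the matrix is symmetric.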

    New Heuristic for Message Broadcasting in Arbitrary Networks

    Efficient information dissemination in interconnection networks is a key research area because of the major role it plays in the modern interconnected world. A vast number of topics ranging from distributed computing to Internet communication rely on efficient information dissemination. Broadcasting is one of the information dissemination primitives. The minimum broadcast time problem in arbitrary networks has been examined since the 1970s. Finding an optimal broadcasting scheme for any originator in an arbitrary network has been proved to be an NP-Hard problem. In the current thesis, a new heuristic that generates broadcast schemes in arbitrary networks is presented. The heuristic has O(|E|log|V|) time complexity, where V is the set of nodes and E is the set of the links of the network. Computer simulations in some commonly used topologies and network models show that compared to the existing heuristics the new heuristic shows better performance in some network models, and comparable performance in other network models, while having a low complexity similar to the best existing heuristics. Another advantage of the new heuristic is that approximately one half of the vertices receive the message via a shortest path from the broadcast originator, while the rest of the vertices receive the message via a path at most three hops longer

    Efficient Interconnection Schemes for VLSI and Parallel Computation

    This thesis is primarily concerned with two problems of interconnecting components in VLSI technologies. In the first case, the goal is to construct efficient interconnection networks for general-purpose parallel computers. The second problem is a more specialized problem in the design of VLSI chips, namely multilayer channel routing. In addition, a final part of this thesis provides lower bounds on the area required for VLSI implementations of finite-state machines. This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space. It shows that these universal networks can efficiently simulate competing networks by means of an appropriate correspondence between network components and efficient algorithms for routing messages on the universal network. In particular, a universal network of area A can simulate competing networks with O(lg^3 A) slowdown (in bit-times), using a very simple randomized routing algorithm and simple network components. Alternatively, a packet routing scheme of Leighton, Maggs, and Rao can be used in conjunction with more sophisticated switching components to achieve O(lg^2 A) slowdown. Several other important aspects of universality are also discussed. It is shown that universal networks can be constructed in area linear in the number of processors, so that there is no need to restrict the density of processors in competing networks. Results are also presented for comparisons between networks of different sizes or with processors of different sizes (as determined by the amount of attached memory). Of particular interest is the fact that a universal network built from sufficiently small processors can simulate (with the slowdown already quoted) any competing network of comparable size regardless of the size of processors in the competing network. In addition, many of the results given do not require the usual assumption of unit wire delay. Finally, though most of the discussion is in the two-dimensional world, the results are shown to apply in three dimensions by way of a simple demonstration of general results on graph layout in three dimensions. The second main problem considered in this thesis is channel routing when many layers of interconnect are available, a scenario that is becoming more and more meaningful as chip fabrication technologies advance. This thesis describes a system, MulCh, for multilayer channel routing which extends the Chameleon system developed at U. C. Berkeley. Like Chameleon, MulCh divides a multilayer problem into essentially independent subproblems of at most three layers, but unlike Chameleon, MulCh considers the possibility of using partitions consisting of a single layer instead of only partitions of two or three layers. Experimental results show that MulCh often performs better than Chameleon in terms of channel width, total net length, and number of vias. In addition to a description of MulCh as implemented, this thesis provides improved algorithms for subtasks performed by MulCh, thereby indicating potential improvements in the speed and performance of multilayer channel routing. In particular, a linear time algorithm is given for determining the minimum width required for a single-layer channel routing problem, and an algorithm is given for maintaining the density of a collection of nets in logarithmic time per net insertion. The last part of this thesis shows that straightforward techniques for implementing finite-state machines are optimal in the worst case. Specifically, for any s and k, there is a deterministic finite-state machine with s states and k symbols such that any layout algorithm requires Ω(ks lg s) area to lay out its realization. For nondeterministic machines, there is an analogous lower bound of Ω(ks^2) area.
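
One standard way to maintain channel density under net insertions in logarithmic time is a segment tree with range increments and a maximum at the root. The sketch below uses that technique. It is not taken from the thesis, whose algorithm the abstract does not describe. It assumes that a net is given by the leftmost and rightmost columns it spans, that density means the largest number of nets crossing any single column, and that the number of columns is fixed in advance; the class and method names are hypothetical. The logarithm here is in the number of columns.

```python
class DensityTracker:
    """Channel density under net insertions (hypothetical helper).

    Segment tree over columns 0..num_columns-1. Each node stores the
    increments applied to its whole range (added) and the maximum
    density inside its range (best), so no push-down step is needed.
    """

    def __init__(self, num_columns):
        if num_columns < 1:
            raise ValueError("need at least one column")
        self.n = num_columns
        self.added = [0] * (4 * num_columns)
        self.best = [0] * (4 * num_columns)

    def _update(self, node, lo, hi, left, right):
        if right < lo or hi < left:
            return  # no overlap with this node's range
        if left <= lo and hi <= right:
            # Node range fully covered: record the increment here.
            self.added[node] += 1
            self.best[node] += 1
            return
        mid = (lo + hi) // 2
        self._update(2 * node, lo, mid, left, right)
        self._update(2 * node + 1, mid + 1, hi, left, right)
        self.best[node] = self.added[node] + max(
            self.best[2 * node], self.best[2 * node + 1]
        )

    def insert_net(self, left, right):
        """Insert a net spanning columns left..right (inclusive)."""
        if not (0 <= left <= right < self.n):
            raise ValueError("net span out of range")
        self._update(1, 0, self.n - 1, left, right)

    def density(self):
        """Current maximum number of nets crossing any column."""
        return self.best[1]


if __name__ == "__main__":
    tracker = DensityTracker(8)
    for span in [(0, 3), (2, 5), (4, 7), (3, 4)]:
        tracker.insert_net(*span)
        print(span, tracker.density())
```

Each insertion touches O(log C) tree nodes for C columns, and the current density is read from the root in constant time.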