
    The Fat-Pyramid and Universal Parallel Computation Independent of Wire Delay

    This paper shows that a fat-pyramid of area Θ(A) requires only O(log A) slowdown to simulate any competing network of area A under very general conditions. The result holds regardless of the processor size (amount of attached memory) and the number of processors in the competing networks, as long as the limitation on total area is met. Furthermore, the result is valid regardless of the relationship between wire length and wire delay. We focus especially on eliminating the common simplifying assumption that unit time suffices to traverse a wire regardless of its length, since that assumption becomes increasingly untenable as parallel systems grow. This paper concentrates on simulation using transmission lines (wires along which bits can be pipelined), with the message routing schedule set up off-line, but it also discusses the extension to on-line simulation. This paper also examines the capabilities of a fat-pyramid when matched against a substantially larger network and points out the surprising difficulty of making such a comparison without the unit wire delay assumption.

    Comparative Analysis of Hill Climbing Mapping Algorithms

    The performance of a parallel algorithm depends in part on how well the communication structure of the algorithm is matched to the communication structure of the target parallel system. The mapping problem is the problem of generating such a match algorithmically. Solving the mapping problem optimally is NP-complete for any non-trivial case, so a heuristic approach must be used. Although several heuristic algorithms for this problem have been developed, their performance has been evaluated on relatively few combinations of communication and processor structures. This paper extensively evaluates the performance of hill climbing mapping algorithms through simulation on communication structures representative of existing parallel algorithms and architectures. Our study has three aims: to establish the differences in performance between variations of the hill climbing heuristic; to determine the factors that affect the performance of hill climbing relative to the optimum; and to compare hill climbing with known optimal and non-optimal mappings in order to judge its effectiveness as a mapping heuristic.
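    As an illustration of the basic heuristic, here is a minimal Python sketch of one common variant, pairwise-exchange hill climbing. It assumes a one-to-one mapping of tasks to processors and uses total dilation (the sum of processor hop distances over all communicating task pairs) as the cost. Both the cost function and the helper names (`mapping_cost`, `hill_climb`, `hamming`) are hypothetical choices for the example, not the specific variants evaluated in the paper. Any swap that lowers the cost is kept, and the search stops at a local optimum.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[int, int]

def mapping_cost(task_edges: Sequence[Edge], assign: Sequence[int],
                 dist: Callable[[int, int], int]) -> int:
    """Assumed cost model: total dilation, i.e. the sum of processor hop
    distances over all communicating task pairs (u, v)."""
    return sum(dist(assign[u], assign[v]) for u, v in task_edges)

def hill_climb(task_edges: Sequence[Edge], assign: Sequence[int],
               dist: Callable[[int, int], int]) -> Tuple[List[int], int]:
    """Pairwise-exchange hill climbing: keep any swap of two tasks' processors
    that lowers the cost; stop when a full pass finds no improving swap."""
    assign = list(assign)
    cost = mapping_cost(task_edges, assign, dist)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(assign)), 2):
            assign[a], assign[b] = assign[b], assign[a]
            new_cost = mapping_cost(task_edges, assign, dist)
            if new_cost < cost:
                cost, improved = new_cost, True
            else:
                assign[a], assign[b] = assign[b], assign[a]  # undo the swap
    return assign, cost

def hamming(p: int, q: int) -> int:
    """Hop distance between nodes p and q of a hypercube."""
    return bin(p ^ q).count("1")

# Task graph: an 8-node ring. Target: the 3-cube. Start from the identity mapping (cost 14).
ring8 = [(i, (i + 1) % 8) for i in range(8)]
mapping, cost = hill_climb(ring8, list(range(8)), hamming)
print(mapping, cost)
```

    The result is only a local optimum. For this ring the true optimum is 8 (every ring edge mapped onto a cube edge), and a single run is not guaranteed to reach it. That gap is the kind of effect a comparison with known optimal mappings is meant to measure.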

    Expansion of layouts of complete binary trees into grids

    Let $T_h$ be the complete binary tree of height $h$. Let $M$ be the infinite grid graph with vertex set $\mathbb{Z}^2$, where two vertices $(x_1,y_1)$ and $(x_2,y_2)$ of $M$ are adjacent if and only if $|x_1-x_2|+|y_1-y_2|=1$. Suppose that $T$ is a tree which is a subdivision of $T_h$ and is also isomorphic to a subgraph of $M$. Motivated by issues in optimal VLSI design, we show that the point expansion ratio $n(T)/n(T_h)=n(T)/(2^{h+1}-1)$ is bounded below by 1.122 for sufficiently large $h$. That is, we give bounds on how many vertices of degree 2 must be inserted along the edges of $T_h$ so that the resulting tree can be laid out in the grid. Concerning the constructive end of VLSI design, suppose that $T$ is a tree which is a subdivision of $T_h$ and is also isomorphic to a subgraph of the $n\times n$ grid graph. Define the expansion ratio of such a layout to be $n^2/n(T_h)=n^2/(2^{h+1}-1)$. We show constructively that the minimum possible expansion ratio over all layouts of $T_h$ is bounded above by 1.4656 for sufficiently large $h$. That is, we give efficient layouts of complete binary trees into square grids, improving upon the previous work of others. We also give bounds for the point expansion and expansion problems for layouts of $T_h$ into extended grids, i.e., grids with added diagonals.
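    For comparison with the 1.4656 bound, here is a minimal Python sketch of the classic H-tree layout. This is a standard construction, not the layout from this paper. It embeds $T_h$ for even $h = 2k$ into a square grid of side $2^{k+1}-1$ and gives every subdivision vertex its own grid point, raising an error if two vertices collide. It then reports both ratios defined above. Its expansion ratio $(2^{k+1}-1)^2/(2^{2k+1}-1)$ tends to 2 as $h$ grows. The helper names (`place`, `wire`, `h_tree`, `layout_stats`) are hypothetical.

```python
from typing import Set, Tuple

Point = Tuple[int, int]

def place(p: Point, used: Set[Point]) -> None:
    """Put one vertex of the subdivided tree T on grid point p; points may not be shared."""
    if p in used:
        raise ValueError(f"grid point {p} used twice")
    used.add(p)

def wire(a: Point, b: Point, used: Set[Point]) -> None:
    """Place the degree-2 subdivision vertices strictly between a and b on a straight grid path."""
    (x0, y0), (x1, y1) = a, b
    assert x0 == x1 or y0 == y1, "wires must be axis-parallel"
    steps = abs(x1 - x0) + abs(y1 - y0)
    dx = (x1 > x0) - (x1 < x0)
    dy = (y1 > y0) - (y1 < y0)
    for i in range(1, steps):
        place((x0 + i * dx, y0 + i * dy), used)

def h_tree(k: int, centre: Point, used: Set[Point]) -> int:
    """Classic H-tree embedding of T_{2k} rooted at `centre`; returns the number of tree nodes placed."""
    place(centre, used)
    if k == 0:
        return 1
    d = 2 ** (k - 1)
    x, y = centre
    nodes = 1
    for sx in (-1, 1):
        child = (x + sx * d, y)                      # children sit left and right of the root
        wire(centre, child, used)
        place(child, used)
        nodes += 1
        for sy in (-1, 1):
            grandchild = (x + sx * d, y + sy * d)    # roots of the four quadrant subtrees
            wire(child, grandchild, used)
            nodes += h_tree(k - 1, grandchild, used)
    return nodes

def layout_stats(k: int) -> Tuple[int, int, int]:
    """Return (side n of the bounding square, n(T_h), n(T)) for the H-tree of T_h with h = 2k."""
    used: Set[Point] = set()
    tree_nodes = h_tree(k, (0, 0), used)
    xs = [p[0] for p in used]
    ys = [p[1] for p in used]
    side = max(max(xs) - min(xs), max(ys) - min(ys)) + 1
    return side, tree_nodes, len(used)

for k in range(1, 6):
    side, tree_nodes, points = layout_stats(k)
    print(f"h={2 * k}: expansion {side ** 2 / tree_nodes:.4f}, "
          f"point expansion {points / tree_nodes:.4f}")
```

    Small trees give ratios below 1.4656 (9/7 for $h = 2$). The bounds in the abstract are asymptotic, so the comparison is only meaningful as $h$ grows.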

    Efficient Interconnection Schemes for VLSI and Parallel Computation

    This thesis is primarily concerned with two problems of interconnecting components in VLSI technologies. In the first, the goal is to construct efficient interconnection networks for general-purpose parallel computers. The second is a more specialized problem in the design of VLSI chips, namely multilayer channel routing. In addition, a final part of this thesis provides lower bounds on the area required for VLSI implementations of finite-state machines. This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space. It shows that these universal networks can efficiently simulate competing networks by means of an appropriate correspondence between network components and efficient algorithms for routing messages on the universal network. In particular, a universal network of area A can simulate competing networks with O(lg^3 A) slowdown (in bit-times), using a very simple randomized routing algorithm and simple network components. Alternatively, a packet routing scheme of Leighton, Maggs, and Rao can be used in conjunction with more sophisticated switching components to achieve O(lg^2 A) slowdown. Several other important aspects of universality are also discussed. It is shown that universal networks can be constructed in area linear in the number of processors, so that there is no need to restrict the density of processors in competing networks. Results are also presented for comparisons between networks of different sizes or with processors of different sizes (as determined by the amount of attached memory). Of particular interest is the fact that a universal network built from sufficiently small processors can simulate (with the slowdown already quoted) any competing network of comparable size, regardless of the size of the processors in the competing network. In addition, many of the results given do not require the usual assumption of unit wire delay.
Finally, though most of the discussion concerns the two-dimensional world, the results are shown to apply in three dimensions by way of a simple demonstration of general results on graph layout in three dimensions. The second main problem considered in this thesis is channel routing when many layers of interconnect are available, a scenario that is becoming increasingly relevant as chip fabrication technologies advance. This thesis describes a system, MulCh, for multilayer channel routing, which extends the Chameleon system developed at U. C. Berkeley. Like Chameleon, MulCh divides a multilayer problem into essentially independent subproblems of at most three layers, but unlike Chameleon, MulCh also considers partitions consisting of a single layer rather than only partitions of two or three layers. Experimental results show that MulCh often performs better than Chameleon in terms of channel width, total net length, and number of vias. In addition to a description of MulCh as implemented, this thesis provides improved algorithms for subtasks performed by MulCh, indicating potential improvements in the speed and performance of multilayer channel routing. In particular, a linear-time algorithm is given for determining the minimum width required for a single-layer channel routing problem, and an algorithm is given for maintaining the density of a collection of nets in logarithmic time per net insertion. The last part of this thesis shows that straightforward techniques for implementing finite-state machines are optimal in the worst case. Specifically, for any s and k, there is a deterministic finite-state machine with s states and k symbols such that any layout algorithm requires Ω(ks lg s) area to lay out its realization. For nondeterministic machines, there is an analogous lower bound of Ω(ks^2) area.
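    The density-maintenance subtask mentioned above can be illustrated with a minimal Python sketch. It assumes each net is a closed interval of columns and that density means the largest number of nets covering any single column. A segment tree with range increment and a stored maximum is one standard way to get O(log C) insertion for C columns. It is not necessarily the algorithm used in the thesis, and the class and method names are hypothetical.

```python
class DensityTracker:
    """Channel density under net insertions (hypothetical helper).

    Assumption: each net is a closed interval [left, right] of column indices, and
    the density is the largest number of nets covering any single column. Each
    segment-tree node keeps the nets that cover its whole range ("bump") plus the
    maximum coverage inside the range counting only nets recorded at or below it.
    """

    def __init__(self, columns: int) -> None:
        assert columns >= 1
        self.columns = columns
        self.best = [0] * (4 * columns)  # max coverage within the node's column range
        self.bump = [0] * (4 * columns)  # nets covering the node's entire range

    def _add(self, node: int, lo: int, hi: int, left: int, right: int) -> None:
        if right < lo or hi < left:
            return  # net does not touch this node's columns
        if left <= lo and hi <= right:
            self.bump[node] += 1  # net spans the whole range: record it here only
            self.best[node] += 1
            return
        mid = (lo + hi) // 2
        self._add(2 * node, lo, mid, left, right)
        self._add(2 * node + 1, mid + 1, hi, left, right)
        self.best[node] = self.bump[node] + max(self.best[2 * node], self.best[2 * node + 1])

    def insert(self, left: int, right: int) -> None:
        """Insert a net spanning columns left..right; O(log C) node visits."""
        assert 0 <= left <= right < self.columns
        self._add(1, 0, self.columns - 1, left, right)

    def density(self) -> int:
        """Current channel density, read from the root in O(1)."""
        return self.best[1]


tracker = DensityTracker(8)
for net in [(0, 3), (4, 7), (2, 5), (3, 4)]:
    tracker.insert(*net)
print(tracker.density())
```

    Pending increments are never pushed down to children. Each node's maximum simply adds its own count to the larger child maximum, which keeps insertion logarithmic without a separate lazy-propagation step.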

    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared-memory model of parallel computation that abstracts away inessential engineering details. It provides a simple, architecture-independent model and a convenient programming environment. Theoretical computer scientists have shown that the PRAM model can be emulated using current technology: solutions have been found for effectively interconnecting processing elements, for routing data on these networks, and for distributing data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it offers for large-scale, general-purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be implemented optimally on such practical models. In the course of this review we present the following new results:
    1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum-weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple, and it is also the first NC approximation algorithm for the task with a sublinear performance ratio.
    2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In these embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimal. The embeddings facilitate efficient implementation of many PRAM algorithms on networks that use these graphs as interconnection networks.
    3. Concerning bulk synchronous algorithmics, we describe scalable, transportable algorithms for three commonly required types of computation: balanced tree computations, Fast Fourier Transforms and matrix multiplications.
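    To make the bulk synchronous setting concrete, here is a minimal Python sketch of a balanced tree computation (a sum) under an assumed cost model. It uses Valiant's standard BSP model, in which a superstep costs w + g·h + L. Here w is the largest local work, h is the largest number of words any processor sends or receives, g is the per-word communication cost and L is the barrier cost. The abstract does not give this formula, and the function names and parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple

def superstep_cost(work: Sequence[int], sent: Sequence[int], received: Sequence[int],
                   g: float, L: float) -> float:
    """Assumed (standard Valiant) BSP cost of one superstep: w + g*h + L."""
    h = max(max(sent), max(received))  # the superstep realises an h-relation
    return max(work) + g * h + L

def tree_sum(blocks: List[List[int]], g: float, L: float) -> Tuple[int, float]:
    """Sum values spread over p = len(blocks) processors with a balanced binary
    reduction tree; return (total, accumulated BSP cost). p must be a power of two."""
    p = len(blocks)
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    partial = [sum(b) for b in blocks]
    work = [max(len(b) - 1, 0) for b in blocks]  # additions for the local sums
    cost = 0.0
    stride = 1
    while stride < p:
        sent, received = [0] * p, [0] * p
        for i in range(0, p, 2 * stride):
            sent[i + stride] = 1   # right child ships its partial sum...
            received[i] = 1        # ...to its left sibling
        cost += superstep_cost(work, sent, received, g, L)
        work = [0] * p
        for i in range(0, p, 2 * stride):
            partial[i] += partial[i + stride]
            work[i] = 1            # one addition, charged to the next superstep
        stride *= 2
    cost += max(work) + L          # last superstep: final addition, no messages
    return partial[0], cost

blocks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(tree_sum(blocks, g=2, L=10))  # (136, 39.0)
```

    Under these assumptions the total is (n/p − 1) + log2 p + g·log2 p + L·(log2 p + 1). For n = 16, p = 4, g = 2 and L = 10 this gives 3 + 2 + 4 + 30 = 39. The L term, paid once per tree level, is what favours shallow trees.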

    Treewidth and related graph parameters

    Graphs play an important role in modeling many practical problems. Since many of the modeled problems are NP-hard in general, some restrictions on the inputs are required. Bounding a graph parameter of the inputs is one successful approach, and we study this approach in this thesis. More precisely, we study two graph parameters related to treewidth: spanning tree congestion and security number. Let G be a connected graph and T be a spanning tree of G. For e ∈ E(T), the congestion of e is the number of edges in G connecting the two components of T − e. The edge congestion of G in T is the maximum congestion over all edges in T. The spanning tree congestion of G is the minimum congestion of G over its spanning trees. In this thesis, we determine the spanning tree congestion of the complete k-partite graphs, the two-dimensional tori, and the two-dimensional Hamming graphs. We also give lower bounds on the spanning tree congestion of the multi-dimensional hypercubes, the multi-dimensional grids, and the multi-dimensional Hamming graphs. The security number of a graph is the cardinality of a smallest vertex subset of the graph such that any “attack” on the subset is “defendable.” In this thesis, we determine the security number of two-dimensional cylinders and tori. This result settles a conjecture of Brigham, Dutton and Hedetniemi [Discrete Appl. Math. 155 (2007) 1708–1714]. We also show that every outerplanar graph has security number at most three. Additionally, we present lower and upper bounds for some classes of graphs.
    Degree certificate number: Doctor of Engineering (Kō) 39
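    The congestion definitions above translate directly into code. Below is a minimal brute-force Python sketch: `congestion` computes the edge congestion of G in a given spanning tree T, and `spanning_tree_congestion` minimises it over every spanning tree. It enumerates all (n − 1)-edge subsets, so it is exponential and only suitable for tiny sanity checks. It is not a method from the thesis, and all names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Edge = Tuple[int, int]

def side_of_cut(n: int, tree: Sequence[Edge], removed: int) -> List[bool]:
    """Mark the vertices of the component of T - e that contains e's first endpoint."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(tree):
        if idx != removed:
            adj[u].append(v)
            adj[v].append(u)
    start = tree[removed][0]
    seen = [False] * n
    seen[start] = True
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if not seen[y]:
                seen[y] = True
                stack.append(y)
    return seen

def congestion(n: int, graph_edges: Sequence[Edge], tree: Sequence[Edge]) -> int:
    """Edge congestion of G in T: max over e in E(T) of the number of
    G-edges joining the two components of T - e."""
    worst = 0
    for i in range(len(tree)):
        side = side_of_cut(n, tree, i)
        worst = max(worst, sum(1 for u, v in graph_edges if side[u] != side[v]))
    return worst

def is_spanning_tree(n: int, edges: Sequence[Edge]) -> bool:
    """For exactly n - 1 edges on n vertices: acyclic (union-find check) implies spanning tree."""
    parent = list(range(n))
    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def spanning_tree_congestion(n: int, graph_edges: Sequence[Edge]) -> Optional[int]:
    """Minimum congestion over all spanning trees (brute force, tiny graphs only)."""
    best: Optional[int] = None
    for tree in combinations(graph_edges, n - 1):
        if is_spanning_tree(n, tree):
            c = congestion(n, graph_edges, tree)
            best = c if best is None else min(best, c)
    return best

k4 = list(combinations(range(4), 2))
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(spanning_tree_congestion(4, k4), spanning_tree_congestion(5, c5))  # 3 2
```

    The two checks agree with short hand arguments. In $K_n$ a tree edge whose removal leaves parts of sizes a and n − a is crossed by a(n − a) ≥ n − 1 edges, and a star attains n − 1. In a cycle every spanning tree is a path, and each of its edges is crossed only by itself and the deleted cycle edge, giving 2.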

    Interconnection Networks Embeddings and Efficient Parallel Computations.

    To obtain greater performance, many processors can cooperate to solve a single problem. These processors communicate via an interconnection network or a bus. The most essential function of the underlying interconnection network is the efficient exchange of messages between processes on different processors. Parallel machines based on the hypercube topology have gained wide acceptance in parallel computation because of the hypercube's many attractive properties. Many variants of the hypercube have been introduced by researchers, mainly to improve communication. The twisted hypercube is one of the most attractive of these variants: it preserves the important features of the hypercube while reducing its diameter by a factor of two. This dissertation investigates relations and transformations between various interconnection networks and the twisted hypercube, and explores the twisted hypercube's efficiency in parallel computation. The capability of the twisted hypercube to simulate complete binary trees, complete quad trees, and rings is demonstrated and compared with that of the hypercube. Finally, the fault tolerance of the twisted hypercube is investigated. We present optimal algorithms to simulate rings in a faulty twisted hypercube environment and compare them with the hypercube.
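    The abstract does not define the twisted hypercube, so this sketch covers only the ordinary-hypercube baseline for ring simulation. The binary reflected Gray code embeds a ring of 2^n nodes into the n-cube with dilation 1. Ring position i maps to node i XOR (i >> 1), and consecutive positions, including the wrap-around, differ in exactly one bit. The function names are hypothetical, and the sketch says nothing about the dissertation's twisted-hypercube or fault-tolerant constructions.

```python
from typing import List

def gray_ring(n: int) -> List[int]:
    """Ring of 2**n nodes embedded in the ordinary n-cube Q_n via the binary
    reflected Gray code: ring position i is mapped to node i XOR (i >> 1)."""
    return [i ^ (i >> 1) for i in range(1 << n)]

def is_dilation_one_ring(ring: List[int], n: int) -> bool:
    """True if the ring visits every node of Q_n exactly once and every pair of
    cyclically consecutive ring nodes are hypercube neighbours (differ in one bit)."""
    if sorted(ring) != list(range(1 << n)):
        return False
    m = len(ring)
    return all(bin(ring[i] ^ ring[(i + 1) % m]).count("1") == 1 for i in range(m))

print(gray_ring(3))  # [0, 1, 3, 2, 6, 7, 5, 4]
```

    The wrap-around edge works because the last code word, 2^(n-1), differs from the first, 0, only in the top bit. This is the kind of hypercube result that a ring simulation on the twisted hypercube, with or without faults, would be compared against.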