108 research outputs found

    Interconnection Networks Embeddings and Efficient Parallel Computations.

    To achieve greater performance, many processors cooperate to solve a single problem. These processors communicate via an interconnection network or a bus. The most essential function of the underlying interconnection network is the efficient exchange of messages between processes on different processors. Parallel machines based on the hypercube topology have gained great respect in parallel computation because of their many attractive properties. Many researchers have introduced variants of the hypercube, mainly to enhance communication. The twisted hypercube is one of the most attractive of these variants. It preserves the important features of the hypercube and reduces its diameter by a factor of two. This dissertation investigates relations and transformations between various interconnection networks and the twisted hypercube, and explores its efficiency in parallel computation. The capability of the twisted hypercube to simulate complete binary trees, complete quad trees, and rings is demonstrated and compared with that of the hypercube. Finally, the fault tolerance of the twisted hypercube is investigated. We present optimal algorithms to simulate rings in a faulty twisted hypercube environment and compare them with the hypercube.
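
As a point of reference for the ring simulations mentioned above, the following is a minimal sketch of the classical dilation-1 ring embedding in the ordinary (untwisted) hypercube using the reflected Gray code. It is not the dissertation's twisted-hypercube algorithm and does not handle faults; the function names `gray_code`, `is_hypercube_edge`, and `ring_embedding` are hypothetical and chosen only for illustration.

```python
def gray_code(n: int) -> list[int]:
    """Return the n-bit reflected Gray code as a list of 2**n node labels."""
    return [i ^ (i >> 1) for i in range(2 ** n)]


def is_hypercube_edge(u: int, v: int) -> bool:
    """Hypercube nodes are adjacent iff their labels differ in exactly one bit."""
    diff = u ^ v
    return diff != 0 and diff & (diff - 1) == 0


def ring_embedding(n: int) -> list[int]:
    """Map ring position i to the hypercube node at index i of the Gray code."""
    cycle = gray_code(n)
    # Check every consecutive pair, including the wrap-around edge.
    for i, u in enumerate(cycle):
        v = cycle[(i + 1) % len(cycle)]
        if not is_hypercube_edge(u, v):
            raise ValueError(f"{u} and {v} are not adjacent in the hypercube")
    return cycle


if __name__ == "__main__":
    print(ring_embedding(3))
```

Each ring edge maps onto a single hypercube link, so neighbouring ring processes stay one hop apart.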

    Efficient embedding of virtual hypercubes in irregular WDM optical networks

    This thesis addresses one of the important issues in designing future WDM optical networks. Such networks are expected to employ an all-optical control plane for dissemination of network state information. It has recently been suggested that an efficient control plane will require a non-blocking communication infrastructure and non-blocking routing techniques. However, the irregular nature of most WDM networks does not lend itself to efficient non-blocking communication. It has recently been shown that hypercubes offer some very efficient non-blocking solutions for all-to-all broadcast operations, which would be very attractive for control plane implementation. Such results can be utilized by embedding virtual structures in the physical network and routing according to the properties of the virtual architecture. We emphasize the hypercube due to its proven usefulness. In this thesis we propose three efficient heuristic methods for embedding a virtual hypercube in an irregular host network such that each node in the host network is either a hypercube node or a neighbor of a hypercube node. The latter is called a "satellite" or "secondary" node. These schemes follow a step-by-step procedure for the embedding and for finding the physical path implementation of the virtual links, while attempting to optimize certain metrics such as the number of wavelengths on each link and the average length of virtual link mappings. We have designed software that takes the adjacency list of an irregular topology as input and provides the adjacency list of a hypercube embedded in the original network. We executed this software on a number of irregular networks with different connectivities and compared the behavior of the three algorithms with respect to how well each optimizes several metrics. We also compare our algorithms to an existing algorithm from the literature.

    Separator-based graph embedding into multidimensional grids with small edge-congestion

    We study the problem of embedding a guest graph with minimum edge-congestion into a multidimensional grid with the same size as that of the guest graph. Based on a well-known notion of graph separators, we show that an embedding with a smaller edge-congestion can be obtained if the guest graph has a smaller separator, and if the host grid has a higher but constant dimension. Specifically, we prove that any graph with $N$ nodes, maximum node degree $\Delta$, and with a node-separator of size $s$, where $s$ is a function such that $s(n) = O(n^{\alpha})$ with $0 \le \alpha < 1$, can be embedded into a grid of a fixed dimension $d \ge 2$ with at least $N$ nodes, with an edge-congestion of $O(\Delta)$ if $d > 1/(1-\alpha)$, $O(\Delta \log N)$ if $d = 1/(1-\alpha)$, and $O(\Delta N^{\alpha - 1 + 1/d})$ if $d < 1/(1-\alpha)$. This edge-congestion achieves constant ratio approximation if $d > 1/(1-\alpha)$, and matches an existential lower bound within a constant factor if $d \le 1/(1-\alpha)$. Our result implies that if the guest graph has an excluded minor of a fixed size, such as a planar graph, then we can obtain an edge-congestion of $O(\Delta \log N)$ for $d = 2$ and $O(\Delta)$ for any fixed $d \ge 3$. Moreover, if the guest graph has a fixed treewidth, such as a tree, an outerplanar graph, and a series-parallel graph, then we can obtain an edge-congestion of $O(\Delta)$ for any fixed $d \ge 2$. To design our embedding algorithm, we introduce edge-separators bounding extension, such that in partitioning a graph into isolated nodes using edge-separators recursively, the number of outgoing edges from a subgraph to be partitioned in a recursive step is bounded. We present an algorithm to construct an edge-separator with extension of $O(\Delta n^{\alpha})$ from a node-separator of size $O(n^{\alpha})$.

    Optical control plane: theory and algorithms

    In this thesis we propose a novel way to achieve global network information dissemination in which some wavelengths are reserved exclusively for global control information exchange. We study the routing and wavelength assignment (RWA) problem for the special communication pattern of non-blocking all-to-all broadcast in WDM optical networks. We provide efficient solutions to reduce the number of wavelengths needed for non-blocking all-to-all broadcast, in the absence of wavelength converters, for network information dissemination. We adopt an approach in which we consider all nodes to be tap-and-continue capable, thus studying light-trees rather than lightpaths. To the best of our knowledge, this thesis is the first to consider "tap-and-continue" capable nodes in the context of conflict-free all-to-all broadcast. The problem of all-to-all broadcast using individual lightpaths has been proven to be NP-complete [6]. We provide optimal RWA solutions for conflict-free all-to-all broadcast for some particular cases of regular topologies, namely the ring, the torus and the hypercube. We make an important contribution on hypercube decomposition into edge-disjoint structures. We also present near-optimal polynomial-time solutions for the general case of arbitrary topologies. Furthermore, we apply for the first time the "cactus" representation of all minimum edge-cuts of graphs with arbitrary topologies to the problem of all-to-all broadcast in optical networks. Using this representation recursively, we obtain near-optimal results for the number of wavelengths needed by the non-blocking all-to-all broadcast. The second part of this thesis focuses on the more practical case of multi-hop RWA for non-blocking all-to-all broadcast in the presence of Optical-Electrical-Optical conversion. We propose two simple but efficient multi-hop RWA models. In addition to reducing the number of wavelengths, we also concentrate on reducing the number of optical receivers, another important optical resource. We analyze these models on the ring and the hypercube, as special cases of regular topologies. Lastly, we develop a good upper bound on the number of wavelengths in the case of non-blocking multi-hop all-to-all broadcast on networks with arbitrary topologies and offer a heuristic algorithm to achieve it. We propose a novel network partitioning method based on "virtual perfect matching" for use in the RWA heuristic algorithm.

    Interconnection networks for parallel and distributed computing

    Parallel computers are generally either shared-memory machines or distributed-memory machines. There are currently technological limitations on shared-memory architectures, and so parallel computers utilizing a large number of processors tend to be distributed-memory machines. We are concerned solely with distributed-memory multiprocessors. In such machines, the dominant factor inhibiting faster global computations is inter-processor communication. Communication is dependent upon the topology of the interconnection network, the routing mechanism, the flow control policy, and the method of switching. We are concerned with issues relating to the topology of the interconnection network. The choice of how we connect processors in a distributed-memory multiprocessor is a fundamental design decision. There are numerous, often conflicting, considerations to bear in mind. However, there does not exist an interconnection network that is optimal on all counts, and trade-offs have to be made. A multitude of interconnection networks have been proposed, with each of these networks having some good (topological) properties and some not so good. Existing noteworthy networks include trees, fat-trees, meshes, cube-connected cycles, butterflies, Möbius cubes, hypercubes, augmented cubes, k-ary n-cubes, twisted cubes, n-star graphs, (n, k)-star graphs, alternating group graphs, de Bruijn networks, and bubble-sort graphs, to name but a few. We mainly focus on k-ary n-cubes and (n, k)-star graphs in this thesis. We also propose a new interconnection network called the augmented k-ary n-cube. The following results are given in the thesis. 1. Let $k \ge 4$ be even and let $n \ge 2$. Consider a faulty k-ary n-cube $Q_n^k$ in which the number of node faults $f_n$ and the number of link faults $f_e$ are such that $f_n + f_e \le 2n - 2$. We prove that given any two healthy nodes $s$ and $e$ of $Q_n^k$, there is a path from $s$ to $e$ of length at least $k^n - 2f_n - 1$ (resp. $k^n - 2f_n - 2$) if the nodes $s$ and $e$ have different (resp. the same) parities (the parity of a node in $Q_n^k$ is the sum modulo 2 of the elements in the n-tuple over $\{0, 1, \ldots, k-1\}$ representing the node). Our result is optimal in the sense that there are pairs of nodes and fault configurations for which these bounds cannot be improved, and it answers questions recently posed by Yang, Tan and Hsu, and by Fu. Furthermore, we extend known results, obtained by Kim and Park, for the case when $n = 2$. 2. We give precise solutions to problems posed by Wang, An, Pan, Wang and Qu and by Hsieh, Lin and Huang. In particular, we show that $Q_n^k$ is bi-panconnected and edge-bipancyclic when $k \ge 3$ and $n \ge 2$, and we also show that when $k$ is odd, $Q_n^k$ is $m$-panconnected, for $m = \frac{n(k-1) + 2k - 6}{2}$, and $(k-1)$-pancyclic (these bounds are optimal). We introduce a path-shortening technique, called progressive shortening, and strengthen existing results, showing that when paths are formed using progressive shortening, these paths can be efficiently constructed and used to solve a problem relating to the distributed simulation of linear arrays and cycles in a parallel machine whose interconnection network is $Q_n^k$, even in the presence of a faulty processor. 3. We define an interconnection network $AQ_n^k$, which we call the augmented k-ary n-cube, by extending a k-ary n-cube in a manner analogous to the existing extension of an n-dimensional hypercube to an n-dimensional augmented cube. We prove that the augmented k-ary n-cube $AQ_n^k$ has a number of attractive properties (in the context of parallel computing). For example, we show that the augmented k-ary n-cube $AQ_n^k$: is a Cayley graph (and so is vertex-symmetric); has connectivity $4n - 2$, and is such that we can build a set of $4n - 2$ mutually disjoint paths joining any two distinct vertices so that the path of maximal length has length at most $\max\{(n-1)k - (n-2),\ k + 7\}$; has diameter $\lfloor k/3 \rfloor + \lceil (k-1)/3 \rceil$ when $n = 2$; and has diameter at most $\frac{k}{4}(n+1)$ for $n \ge 3$ and $k$ even, and at most $\frac{k}{4}(n+1) + \frac{n}{4}$ for $n \ge 3$ and $k$ odd. 4. We present an algorithm which, given a source node and a set of $n - 1$ target nodes in the (n, k)-star graph $S_{n,k}$, where all nodes are distinct, builds a collection of $n - 1$ node-disjoint paths, one from each target node to the source. The collection of paths output by the algorithm is such that each path has length at most $6k - 7$, and the algorithm has time complexity $O(k^3 n^4)$.
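
The parity notion used in result 1 can be made concrete with a small sketch. It assumes the standard definition of the k-ary n-cube (nodes are n-tuples over 0..k-1, and two nodes are adjacent when they differ by plus or minus 1 modulo k in exactly one coordinate). The helper names `parity`, `neighbours`, and `parity_alternates` are hypothetical. The sketch only checks that, for even k, every link joins nodes of opposite parity, which is why the path-length bounds are stated separately for endpoints of different and equal parity; it does not construct the fault-tolerant paths themselves.

```python
from itertools import product


def parity(node: tuple[int, ...]) -> int:
    """Parity of a node: the sum of its coordinates modulo 2."""
    return sum(node) % 2


def neighbours(node: tuple[int, ...], k: int) -> set[tuple[int, ...]]:
    """Neighbours in the k-ary n-cube: change one coordinate by +1 or -1 mod k."""
    result = set()
    for i, x in enumerate(node):
        for step in (1, -1):
            nb = list(node)
            nb[i] = (x + step) % k
            result.add(tuple(nb))
    return result


def parity_alternates(k: int, n: int) -> bool:
    """True if every link of the k-ary n-cube joins nodes of opposite parity."""
    return all(
        parity(u) != parity(v)
        for u in product(range(k), repeat=n)
        for v in neighbours(u, k)
    )


if __name__ == "__main__":
    print(parity_alternates(4, 2), parity_alternates(3, 2))
```

For odd k the wrap-around links join nodes of equal parity, so the check fails there.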

    Diamond-based models for scientific visualization

    Hierarchical spatial decompositions are a basic modeling tool in a variety of application domains, including scientific visualization, finite element analysis, and shape modeling and analysis. A popular class of such approaches is based on the regular simplex bisection operator, which bisects simplices (e.g. line segments, triangles, tetrahedra) along the midpoint of a predetermined edge. Regular simplex bisection produces adaptive simplicial meshes of high geometric quality, while simplifying the extraction of crack-free, or conforming, approximations to the original dataset. Efficient multiresolution representations for such models have been achieved in 2D and 3D by clustering sets of simplices sharing the same bisection edge into structures called diamonds. In this thesis, we introduce several diamond-based approaches for scientific visualization. We first formalize the notion of diamonds in arbitrary dimensions in terms of two related simplicial decompositions of hypercubes. This enables us to enumerate the vertices, simplices, parents and children of a diamond. In particular, we identify the number of simplices involved in conforming updates to be factorial in the dimension and group these into a linear number of subclusters of simplices that are generated simultaneously. The latter form the basis for a compact pointerless representation for conforming meshes generated by regular simplex bisection and for efficiently navigating the topological connectivity of these meshes. Secondly, we introduce the supercube as a high-level primitive on such nested meshes based on the atomic units within the underlying triangulation grid. We propose the use of supercubes to associate information with coherent subsets of the full hierarchy and demonstrate the effectiveness of such a representation for modeling multiresolution terrain and volumetric datasets. Next, we introduce Isodiamond Hierarchies, a general framework for spatial access structures on a hierarchy of diamonds that exploits the implicit hierarchical and geometric relationships of the diamond model. We use an isodiamond hierarchy to encode irregular updates to a multiresolution isosurface or interval volume in terms of regular updates to diamonds. Finally, we consider nested hypercubic meshes, such as quadtrees, octrees and their higher-dimensional analogues, through the lens of diamond hierarchies. This allows us to determine the relationships involved in generating balanced hypercubic meshes and to propose a compact pointerless representation of such meshes. We also provide a local diamond-based triangulation algorithm to generate high-quality conforming simplicial meshes.

    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared-memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple architecture-independent model and a good programming environment. Theoreticians in the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks, and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large-scale general-purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software, and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple, and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks, and the 2-dimensional mesh. In the embeddings, the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms, and matrix multiplications.

    A tetrahedral space-filling curve for non-conforming adaptive meshes

    We introduce a space-filling curve for triangular and tetrahedral red-refinement that can be computed using bitwise interleaving operations similar to the well-known Z-order or Morton curve for cubical meshes. To store sufficient information for random access, we define a low-memory encoding using 10 bytes per triangle and 14 bytes per tetrahedron. We present algorithms that compute the parent, children, and face-neighbors of a mesh element in constant time, as well as the next and previous element in the space-filling curve and whether a given element is on the boundary of the root simplex or not. Our presentation concludes with a scalability demonstration that creates and adapts selected meshes on a large distributed-memory system. Comment: 33 pages, 12 figures, 8 tables
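
For readers unfamiliar with the bitwise interleaving that the abstract refers to, here is a minimal sketch of the classical Z-order (Morton) index for square cells in 2D. It illustrates only the well-known cubical-mesh case, not the paper's triangular or tetrahedral curve or its 10-byte and 14-byte encodings. The function names and the default of 16 bits per coordinate are assumptions made for this example.

```python
def morton_encode_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y; x takes the even bit positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode_2d(code: int, bits: int = 16) -> tuple[int, int]:
    """Invert morton_encode_2d by de-interleaving even and odd bit positions."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


if __name__ == "__main__":
    # Visit the cells of a 4 x 4 grid in Z-order.
    cells = sorted(
        ((x, y) for x in range(4) for y in range(4)),
        key=lambda c: morton_encode_2d(*c),
    )
    print(cells)
```

Sorting cells by this index keeps spatially close cells close in memory, which is the property such curves exploit when partitioning meshes.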