115 research outputs found

    Multiphase complete exchange on a circuit switched hypercube

    Get PDF
    On a distributed memory parallel computer, the complete exchange (all-to-all personalized) communication pattern requires each of n processors to send a different block of data to each of the remaining n - 1 processors. This pattern is at the heart of many important algorithms, most notably the matrix transpose. For a circuit switched hypercube of dimension d(n = 2(sup d)), two algorithms for achieving complete exchange are known. These are (1) the Standard Exchange approach that employs d transmissions of size 2(sup d-1) blocks each and is useful for small block sizes, and (2) the Optimal Circuit Switched algorithm that employs 2(sup d) - 1 transmissions of 1 block each and is best for large block sizes. A unified multiphase algorithm is described that includes these two algorithms as special cases. The complete exchange on a hypercube of dimension d and block size m is achieved by carrying out k partial exchange on subcubes of dimension d(sub i) Sigma(sup k)(sub i=1) d(sub i) = d and effective block size m(sub i) = m2(sup d-di). When k = d and all d(sub i) = 1, this corresponds to algorithm (1) above. For the case of k = 1 and d(sub i) = d, this becomes the circuit switched algorithm (2). Changing the subcube dimensions d, varies the effective block size and permits a compromise between the data permutation and block transmission overhead of (1) and the startup overhead of (2). For a hypercube of dimension d, the number of possible combinations of subcubes is p(d), the number of partitions of the integer d. This is an exponential but very slowly growing function and it is feasible over these partitions to discover the best combination for a given message size. The approach was analyzed for, and implemented on, the Intel iPSC-860 circuit switched hypercube. Measurements show good agreement with predictions and demonstrate that the multiphase approach can substantially improve performance for block sizes in the 0 to 160 byte range. This range, which corresponds to 0 to 40 floating point numbers per processor, is commonly encountered in practical numeric applications. The multiphase technique is applicable to all circuit-switched hypercubes that use the common e-cube routing strategy

    General broadcasting algorithms in one-port wormhole routed hypercubes

    Full text link
    Wormhole routing has been accepted as an efficient switching mechanism in point-to-point interconnection networks. Here the network resource, i.e. node buffers and communication channels, are effectively utilized to deliver message across the network; We consider the problem of broadcasting a message in the hypercue equipped with the wormhole switching mechanism. The model is a generalization of an earlier work and considers a broadcast path-length of {dollar}m\ (1\leq m\leq n{dollar}) in the n-cube with a single-port communication capability. In this thesis, the scheme of e-cube and a Gray code path routing and intermediate reception capability have been adopted in order to solve the problem of broadcasting in one-port wormhole routed hypercubes. Two methods have been suggested; one is based on utilizing the Gray codes (Gray code path-based routing), while the other is based on the recursive partitioning of the cube (cube-based routing). The number of routing steps in both methods are compared to those in the previous results, as well as to the lower bounds derived based on the path-length m assumption. A cube-based and a path-based algorithm give {dollar}T(R)+(k\sb{c}+1)T(m){dollar} and {dollar}k\sb{G} +T(m){dollar} routing steps, respectively. By comparison with routing steps of both algorithms, the performance of the path-based algorithm shows better than that of the cube-based; The results of this work are significant and can be used for immediate implementation in contemporary machines most of which are equipped with wormhole routing and serial communication capability

    Near-optimal broadcast in all-port wormhole-routed hypercubes using error-correcting codes

    Full text link
    A new broadcasting method is presented for hypercubes with wormhole routing mechanism. The communication model assumed allows an n-dimensional hypercube to have at most n concurrent I/O communication along its ports. It assumes a distance insensitivity of (n + 1) with no intermediate reception capability for the nodes. The approach is based on determination of the set of nodes called stations in the hypercube. Once stations are identified, node disjoint paths are formed from the source to all stations. The broadcasting is accomplished first by sending the message to all stations, which will inform the rest of the nodes. To establish node-disjoint paths between the source node and all stations, we introduce a new routing strategy. We prove that multicasting can be done in one routing step as long as the number of destination nodes are at most n in an n-dimensional hypercube. The number of broadcasting steps using our routing is equal to or smaller than that obtained in an earlier work; this number is optimal for all hypercube dimensions n ≤ 12, except for n = 10

    I/O embedding and broadcasting in star interconnection networks

    Full text link
    The issues of communication between a host or central controller and processors, in large interconnection networks are very important and have been studied in the past by several researchers. There is a plethora of problems that arise when processors are asked to exchange information on parallel computers on which processors are interconnected according to a specific topology. In robust networks, it is desirable at times to send (receive) data/control information to (from) all the processors in minimal time. This type of communication is commonly referred to as broadcasting. To speed up broadcasting in a given network without modifying its topology, certain processors called stations can be specified to act as relay agents. In this thesis, broadcasting issues in a star-based interconnection network are studied. The model adopted assumes all-port communication and wormhole switching mechanism. Initially, the problem treated is one of finding the minimum number of stations required to cover all the nodes in the star graph with i-adjacency. We consider 1-, 2-, and 3-adjacencies and determine the upper bound on the number of stations required to cover the nodes for each case. After deriving the number of stations, two algorithms are designed to broadcast the messages first from the host to stations, and then from stations to remaining nodes; In addition, a Binary-based Algorithm is designed to allow routing in the network by directly working on the binary labels assigned to the star graph. No look-up table is consulted during routing and minimum number of bits are used to represent a node label. At the end, the thesis sheds light on another algorithm for routing using parallel paths in the star network

    Broadcasting in Hypercubes under Circuit Switched Model

    Get PDF
    International audienceIn this paper, we propose a method which enables to construct almost optimal broadcast schemes on an n-dimensional hypercube in the circuit switched,-port model. In this model, an initiator must inform all the nodes of the network in a sequence of rounds. During a round, vertices communicate along arc-disjoint dipaths. Our construction is based on particular sequences of nested binary codes having the property that each code can inform the next one in a single round. This last property is insured by a ow technique and results about symmetric ow networks. We apply the method to design optimal schemes improving and generalizing the previous results

    Symmetric flows and broadcasting in hypercubes

    Get PDF
    International audienceIn this paper, we propose a method which enables to construct almost optimal broadcast schemes on an n-dimensional hypercube in the circuit switched,-port model. In this model, an initiator must inform all the nodes of the network in a sequence of rounds. During a round, vertices communicate along arc-disjoint dipaths. Our construction is based on particular sequences of nested binary codes having the property that each code can inform the next one in a single round. This last property is insured by a ow technique and results about symmetric ow networks. We apply the method to design optimal schemes improving and generalizing the previous results

    Highly parallel computation

    Get PDF
    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed

    A powerful heuristic for telephone gossiping

    Get PDF
    A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges, requiring R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption allows to compute gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information dispersion patterns. It can handle both the unit-cost and the linear-cost model. Actually, the heuristic is so good, that for CCC, shuffle-exchange, butterfly de Bruijn, star and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. Also for broadcasting the heuristic improves on many formerly known results. A second heuristic, works even better for CCC, butterfly, star and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, 2 fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but by refined search techniques it can tackle even larger problems, the main limitation being the storage capacity. Another advantage is that the constructed schedules can be represented concisely
    • …
    corecore