
    Optimal Networks from Error Correcting Codes

    To address the growth challenges facing large data centers and supercomputing clusters, a new construction is presented for scalable, high-throughput, low-latency networks. The resulting networks require 1.5-5 times fewer switches and 2-6 times fewer cables, and have 1.2-2 times lower latency with correspondingly lower congestion and packet loss, than the best present or proposed networks providing the same number of ports at the same total bisection. These advantage ratios increase with network size. The key new ingredient is an exact equivalence between the problem of maximizing network bisection for large classes of practically interesting Cayley graphs and the problem of maximizing codeword distance for linear error-correcting codes. The resulting translation recipe converts existing optimal error-correcting codes into optimal-throughput networks. Comment: 14 pages, accepted at the ANCS 2013 conference

    Symmetric Interconnection Networks from Cubic Crystal Lattices

    Torus networks of moderate degree have been widely used in the supercomputer industry. Tori are superb when used for executing applications that require near-neighbor communications. Nevertheless, they are not so good when dealing with global communications. Hence, typical 3D implementations have evolved to 5D networks, among other reasons, to reduce network distances. Most of these big systems are mixed-radix tori, which are not the best option for minimizing distances and efficiently using network resources. This paper is focused on improving the topological properties of these networks. By using integral matrices to deal with Cayley graphs over Abelian groups, we have been able to propose and analyze a family of high-dimensional grid-based interconnection networks. As they are built over n-dimensional grids that induce a regular tiling of the space, these topologies have been denoted lattice graphs. We will focus on cubic crystal lattices for modeling symmetric 3D networks. Other higher-dimensional networks can be composed over these graphs, as illustrated in this research. Easy network partitioning can also take advantage of this network composition operation. Minimal routing algorithms are also provided for these new topologies. Finally, some practical issues such as implementability and preliminary performance evaluations have been addressed
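
    The remark about mixed-radix tori and network distances can be illustrated with a minimal Python sketch. It covers ordinary tori only, not the lattice graphs proposed in the paper; the function name `torus_distance_stats` and the two example shapes are assumptions chosen for illustration.

```python
from itertools import product


def torus_distance_stats(radices):
    """Return (diameter, average distance) of a torus with the given side lengths.

    A torus is vertex-symmetric, so distances measured from node (0, ..., 0)
    are representative of every node.
    """
    # Distance from coordinate 0 to coordinate c on a ring of length k.
    per_dim = [[min(c, k - c) for c in range(k)] for k in radices]
    # The distance to a node is the sum of its per-dimension ring distances.
    dists = [sum(combo) for combo in product(*per_dim)]
    # Average over the other nodes (the source itself contributes distance 0).
    return max(dists), sum(dists) / (len(dists) - 1)


# Two hypothetical 512-node 3D tori: one mixed-radix, one with equal sides.
mixed = torus_distance_stats((16, 8, 4))
cubic = torus_distance_stats((8, 8, 8))
print("mixed-radix:", mixed)
print("equal sides:", cubic)
```

    With the same number of nodes, the equal-sided torus has both a smaller diameter and a smaller average distance than the mixed-radix one in this example.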

    Optical control plane: theory and algorithms

    In this thesis we propose a novel way to achieve global network information dissemination in which some wavelengths are reserved exclusively for global control information exchange. We study the routing and wavelength assignment (RWA) problem for the special communication pattern of non-blocking all-to-all broadcast in WDM optical networks. We provide efficient solutions to reduce the number of wavelengths needed for non-blocking all-to-all broadcast, in the absence of wavelength converters, for network information dissemination. We adopt an approach in which we consider all nodes to be tap-and-continue capable, thus studying light-trees rather than lightpaths. To the best of our knowledge, this thesis is the first to consider “tap-and-continue” capable nodes in the context of conflict-free all-to-all broadcast. The problem of all-to-all broadcast using individual lightpaths has been proven to be NP-complete [6]. We provide optimal RWA solutions for conflict-free all-to-all broadcast for some particular cases of regular topologies, namely the ring, the torus and the hypercube. We make an important contribution on hypercube decomposition into edge-disjoint structures. We also present near-optimal polynomial-time solutions for the general case of arbitrary topologies. Furthermore, we apply for the first time the “cactus” representation of all minimum edge-cuts of graphs with arbitrary topologies to the problem of all-to-all broadcast in optical networks. Using this representation recursively, we obtain near-optimal results for the number of wavelengths needed by the non-blocking all-to-all broadcast. The second part of this thesis focuses on the more practical case of multi-hop RWA for non-blocking all-to-all broadcast in the presence of Optical-Electrical-Optical conversion. We propose two simple but efficient multi-hop RWA models. In addition to reducing the number of wavelengths, we also concentrate on reducing the number of optical receivers, another important optical resource. We analyze these models on the ring and the hypercube, as special cases of regular topologies. Lastly, we develop a good upper bound on the number of wavelengths in the case of non-blocking multi-hop all-to-all broadcast on networks with arbitrary topologies and offer a heuristic algorithm to achieve it. We propose a novel network partitioning method based on “virtual perfect matching” for use in the RWA heuristic algorithm

    Interconnection networks for parallel and distributed computing

    Parallel computers are generally either shared-memory machines or distributed-memory machines. There are currently technological limitations on shared-memory architectures, and so parallel computers utilizing a large number of processors tend to be distributed-memory machines. We are concerned solely with distributed-memory multiprocessors. In such machines, the dominant factor inhibiting faster global computations is inter-processor communication. Communication is dependent upon the topology of the interconnection network, the routing mechanism, the flow control policy, and the method of switching. We are concerned with issues relating to the topology of the interconnection network. The choice of how we connect processors in a distributed-memory multiprocessor is a fundamental design decision. There are numerous, often conflicting, considerations to bear in mind. However, there does not exist an interconnection network that is optimal on all counts, and trade-offs have to be made. A multitude of interconnection networks have been proposed, with each of these networks having some good (topological) properties and some not so good. Existing noteworthy networks include trees, fat-trees, meshes, cube-connected cycles, butterflies, Möbius cubes, hypercubes, augmented cubes, k-ary n-cubes, twisted cubes, n-star graphs, (n, k)-star graphs, alternating group graphs, de Bruijn networks, and bubble-sort graphs, to name but a few. We will mainly focus on k-ary n-cubes and (n, k)-star graphs in this thesis. Meanwhile, we propose a new interconnection network called augmented k-ary n-cubes. The following results are given in the thesis.
    1. Let $k \geq 4$ be even and let $n \geq 2$. Consider a faulty k-ary n-cube $Q_n^k$ in which the number of node faults $f_n$ and the number of link faults $f_e$ are such that $f_n + f_e \leq 2n - 2$. We prove that given any two healthy nodes $s$ and $e$ of $Q_n^k$, there is a path from $s$ to $e$ of length at least $k^n - 2f_n - 1$ (resp. $k^n - 2f_n - 2$) if the nodes $s$ and $e$ have different (resp. the same) parities (the parity of a node in $Q_n^k$ is the sum modulo 2 of the elements in the n-tuple over $\{0, 1, \ldots, k - 1\}$ representing the node). Our result is optimal in the sense that there are pairs of nodes and fault configurations for which these bounds cannot be improved, and it answers questions recently posed by Yang, Tan and Hsu, and by Fu. Furthermore, we extend known results, obtained by Kim and Park, for the case when $n = 2$.
    2. We give precise solutions to problems posed by Wang, An, Pan, Wang and Qu and by Hsieh, Lin and Huang. In particular, we show that $Q_n^k$ is bi-panconnected and edge-bipancyclic when $k \geq 3$ and $n \geq 2$, and we also show that when $k$ is odd, $Q_n^k$ is $m$-panconnected, for $m = \frac{n(k - 1) + 2k - 6}{2}$, and $(k - 1)$-pancyclic (these bounds are optimal). We introduce a path-shortening technique, called progressive shortening, and strengthen existing results, showing that when paths are formed using progressive shortening, these paths can be efficiently constructed and used to solve a problem relating to the distributed simulation of linear arrays and cycles in a parallel machine whose interconnection network is $Q_n^k$, even in the presence of a faulty processor.
    3. We define an interconnection network $AQ_n^k$, which we call the augmented k-ary n-cube, by extending a k-ary n-cube in a manner analogous to the existing extension of an n-dimensional hypercube to an n-dimensional augmented cube. We prove that the augmented k-ary n-cube $AQ_n^k$ has a number of attractive properties (in the context of parallel computing). For example, we show that the augmented k-ary n-cube $AQ_n^k$: is a Cayley graph (and so is vertex-symmetric); has connectivity $4n - 2$, and is such that we can build a set of $4n - 2$ mutually disjoint paths joining any two distinct vertices so that the path of maximal length has length at most $\max\{(n - 1)k - (n - 2),\, k + 7\}$; has diameter $\lfloor k/3 \rfloor + \lceil (k - 1)/3 \rceil$ when $n = 2$; and has diameter at most $\frac{k}{4}(n + 1)$ for $n \geq 3$ and $k$ even, and at most $\frac{k}{4}(n + 1) + \frac{n}{4}$ for $n \geq 3$ and $k$ odd.
    4. We present an algorithm which, given a source node and a set of $n - 1$ target nodes in the (n, k)-star graph $S_{n,k}$ where all nodes are distinct, builds a collection of $n - 1$ node-disjoint paths, one from each target node to the source. The collection of paths output from the algorithm is such that each path has length at most $6k - 7$, and the algorithm has time complexity $O(k^3 n^4)$
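
    The parity definition in result 1 can be made concrete with a small Python sketch. It assumes the usual definition of the k-ary n-cube, in which two nodes are adjacent when they differ by 1 modulo k in exactly one coordinate; the helper names are illustrative and not taken from the thesis.

```python
from itertools import product


def kary_ncube_neighbors(node, k):
    """Neighbors of a node in the k-ary n-cube: one coordinate changes by +1 or -1 mod k."""
    result = set()
    for i, x in enumerate(node):
        for step in (1, -1):
            neighbor = node[:i] + ((x + step) % k,) + node[i + 1:]
            if neighbor != node:
                result.add(neighbor)
    return result


def parity(node):
    """Parity of a node: the sum of its coordinates modulo 2."""
    return sum(node) % 2


def edges_alternate_parity(k, n):
    """True if every edge of the k-ary n-cube joins nodes of different parity."""
    return all(
        parity(u) != parity(v)
        for u in product(range(k), repeat=n)
        for v in kary_ncube_neighbors(u, k)
    )


print(edges_alternate_parity(4, 2))  # even k: checked over all edges
print(edges_alternate_parity(5, 2))  # odd k: wrap-around edges keep the parity
```

    For even k, every edge joins nodes of opposite parity, so the healthy graph is bipartite with respect to this parity; this is consistent with the abstract stating separate length bounds for endpoints of the same and of different parities.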

    Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

    The all-to-all collective communications primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This is mainly because of the quadratic scaling in the number of messages that must be simultaneously serviced combined with large message sizes. This paper takes a holistic approach to optimize the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology, lowering the schedules to various backends and fabrics that may or may not expose additional forwarding bandwidth, establishing an upper bound on all-to-all throughput, and exploring novel topologies that deliver near-optimal all-to-all performance

    Efficient embedding of virtual hypercubes in irregular WDM optical networks

    This thesis addresses one of the important issues in designing future WDM optical networks. Such networks are expected to employ an all-optical control plane for dissemination of network state information. It has recently been suggested that an efficient control plane will require non-blocking communication infrastructure and routing techniques. However, the irregular nature of most WDM networks does not lend itself to efficient non-blocking communications. It has recently been shown that hypercubes offer some very efficient non-blocking solutions for all-to-all broadcast operations, which would be very attractive for control plane implementation. Such results can be utilized by embedding virtual structures in the physical network and routing according to the properties of the virtual architecture. We will emphasize the hypercube due to its proven usefulness. In this thesis we propose three efficient heuristic methods for embedding a virtual hypercube in an irregular host network such that each node in the host network is either a hypercube node or a neighbor of a hypercube node. The latter will be called a “satellite” or “secondary” node. These schemes follow a step-by-step procedure for the embedding and for finding the physical path implementation of the virtual links while attempting to optimize certain metrics such as the number of wavelengths on each link and the average length of virtual link mappings. We have designed software that takes the adjacency list of an irregular topology as input and provides the adjacency list of a hypercube embedded in the original network. We executed this software on a number of irregular networks with different connectivities and compared the behavior of each of the three algorithms. The algorithms are compared with respect to their performance in trying to optimize several metrics. We also compare our algorithms to an existing algorithm from the literature
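
    The covering condition stated above (every host node is either a hypercube node or a neighbor of one) can be checked with a short Python sketch. This is not the thesis software or any of its three heuristics; the function names, the adjacency-list format (a dict of lists) and the six-node host network are assumptions made for illustration.

```python
def hypercube_adjacency(dim):
    """Adjacency list of the dim-dimensional hypercube; nodes are the ints 0 .. 2**dim - 1."""
    # Two nodes are adjacent when their binary labels differ in exactly one bit.
    return {v: [v ^ (1 << bit) for bit in range(dim)] for v in range(1 << dim)}


def classify_host_nodes(host_adj, hypercube_hosts):
    """Split host nodes outside hypercube_hosts into (satellites, uncovered).

    A satellite is adjacent to at least one chosen hypercube node; an
    uncovered node is not, so it violates the covering condition.
    """
    chosen = set(hypercube_hosts)
    satellites, uncovered = [], []
    for node, neighbors in host_adj.items():
        if node in chosen:
            continue
        if chosen.intersection(neighbors):
            satellites.append(node)
        else:
            uncovered.append(node)
    return satellites, uncovered


# Hypothetical irregular host: a 4-cycle 0-1-2-3 with a tail 0-4-5.
host = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3], 3: [0, 2], 4: [0, 5], 5: [4]}
satellites, uncovered = classify_host_nodes(host, hypercube_hosts=[0, 1, 2, 3])
print(hypercube_adjacency(2))
print(satellites, uncovered)
```

    In this example node 4 qualifies as a satellite, while node 5 is left uncovered, so choosing nodes 0 to 3 as the hypercube nodes would not satisfy the condition for this host.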