38 research outputs found

    Scheduling multicasts on unit-capacity trees and meshes

    This paper studies the multicast routing and admission control problem on unit-capacity tree and mesh topologies in the throughput model. The problem is a generalization of the edge-disjoint paths problem and is NP-hard on both trees and meshes. We study both the offline and the online version of the problem. In the offline setting, we give the first constant-factor approximation algorithm for trees, and an O((log log n)^2)-factor approximation algorithm for meshes. In the online setting, we give the first polylogarithmic-competitive online algorithm for tree and mesh topologies. No polylogarithmic-competitive algorithm is possible on general network topologies [Bartal, Fiat, Leonardi, 96], and there exists a polylogarithmic lower bound on the competitive ratio of any online algorithm on tree topologies [Awerbuch, Azar, Fiat, Leighton, 96]. We prove the same lower bound for meshes.

    Scheduling multicasts on unit-capacity trees and meshes

    This paper studies the multicast routing and admission control problem on unit-capacity tree and mesh topologies in the throughput model. The problem is a generalization of the edge-disjoint paths problem and is NP-hard on both trees and meshes. We study both the offline and the online version of the problem. In the offline setting, we give the first constant-factor approximation algorithm for trees, and an O((log log n)^2)-factor approximation algorithm for meshes. In the online setting, we give the first polylogarithmic-competitive online algorithm for tree and mesh topologies. No polylogarithmic-competitive algorithm is possible on general network topologies (Lower bounds for on-line graph problems with application to on-line circuits and optical routing, in: Proceedings of the 28th ACM Symposium on Theory of Computing, 1996, pp. 531-540), and there exists a polylogarithmic lower bound on the competitive ratio of any online algorithm on tree topologies (Making commitments in the face of uncertainty: how to pick a winner almost every time, in: Proceedings of the 28th Annual ACM Symposium on Theory of Computing, 1996, pp. 519-530). We prove the same lower bound for meshes.

    Low Latency Multimedia Broadcast in Multi-rate Wireless Meshes

    In a multi-rate wireless network, a node can dynamically adjust its link transmission rate by switching between different modulation schemes. In the current IEEE 802.11a/b/g standards, this rate adjustment is limited to unicast traffic, while multicast and broadcast traffic is always transmitted at the lowest possible rate. In this paper, we consider a novel type of multi-rate mesh network in which a node can dynamically adjust its link-layer multicast rates to its neighbours. In particular, we consider the problem of realising low-latency network-wide broadcast in this type of multi-rate wireless mesh. We first show that the multi-rate broadcast problem is significantly different from the single-rate case. We then present an algorithm for achieving low-latency broadcast in a multi-rate mesh which exploits both the wireless broadcast advantage and the multi-rate nature of the network.

    Low-Latency Broadcast in Multirate Wireless Mesh Networks

    Special Issue on “Multi-Hop Wireless Mesh Networks”

    Optimizing Communication for Massively Parallel Processing

    Current trends in high-performance computing show that large machines with tens of thousands of processors will soon be readily available. The IBM Blue Gene/L machine with 128k processors (which is currently being deployed) is an important step in this direction. In this scenario, manually scaling applications will be a significant burden for the programmer. Scaling involves addressing issues such as load imbalance and communication overhead. In this thesis, we explore several communication optimizations that help parallel applications scale easily to a large number of processors. We also present automatic runtime techniques that relieve the programmer of the burden of optimizing communication in applications. This thesis explores processor virtualization to improve communication performance in applications. With processor virtualization, the computation is mapped to virtual processors (VPs). After one VP has finished computation and is waiting for responses to its messages, another VP can compute, thus overlapping communication with computation. This overlap is only effective if the processor overhead of the communication operation is a small fraction of the total communication time. Fortunately, with network interfaces that have co-processors, this is the case, and processor virtualization has a natural advantage on such interconnects. The communication optimizations we present in this thesis are motivated by applications such as NAMD (a classical molecular dynamics application) and CPAIMD (a quantum chemistry application). Applications like NAMD and CPAIMD consume a fair share of the time available on supercomputers, so improving their performance would be of great value. We have successfully scaled NAMD to 1 TF of peak performance on 3000 processors of PSC Lemieux, using the techniques presented in this thesis. 
We study both point-to-point communication and collective communication (specifically all-to-all communication). On a large number of processors, all-to-all communication can take several milliseconds to finish. With the synchronous collectives defined in MPI, the processor idles while the collective messages are in flight. We therefore demonstrate an asynchronous collective communication framework that lets the CPU compute while the all-to-all messages are in flight. We also show that the best strategy for all-to-all communication depends on the message size, the number of processors and other dynamic parameters. This suggests that these parameters can be observed at runtime and used to choose the optimal strategy for all-to-all communication. In this thesis, we demonstrate adaptive strategy switching for all-to-all communication. The communication optimization framework presented in this thesis has been designed to optimize communication in the context of processor virtualization and dynamically migrating objects. We present the streaming strategy to optimize fine-grained object-to-object communication. We also motivate the need for hardware collectives, as processor-based collectives can be delayed by intermediate processors that are busy with computation. We explore a next-generation interconnect that supports collectives in the switching hardware. We show the performance gains of hardware collectives through synthetic benchmarks.
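
The dependence of the best all-to-all strategy on message size and processor count can be illustrated with a minimal sketch. It assumes a textbook latency/bandwidth cost model that is not taken from the thesis: `alpha` (per-message overhead in seconds), `beta` (per-byte transfer time), the two strategy names and all function names are hypothetical, and the thesis's own strategies and runtime measurements may differ.

```python
import math

def direct_cost(p, m, alpha, beta):
    """Direct strategy: each processor sends p-1 separate messages of m bytes."""
    return (p - 1) * (alpha + m * beta)

def combining_cost(p, m, alpha, beta):
    """Hypercube-style combining: ceil(log2 p) rounds, each moving
    p/2 combined blocks of m bytes in a single message."""
    rounds = math.ceil(math.log2(p))
    return rounds * (alpha + (p / 2) * m * beta)

def choose_strategy(p, m, alpha=5e-6, beta=1e-9):
    """Pick the cheaper strategy under the assumed cost model.
    The default alpha and beta are illustrative values, not measurements."""
    costs = {
        "direct": direct_cost(p, m, alpha, beta),
        "combining": combining_cost(p, m, alpha, beta),
    }
    return min(costs, key=costs.get)

# Small messages are dominated by per-message overhead, large ones by bandwidth.
small = choose_strategy(p=1024, m=8)
large = choose_strategy(p=1024, m=65536)
```

Under these assumed parameters, combining is cheaper for 8-byte messages and direct sending is cheaper for 64 KB messages, which is the kind of crossover an adaptive runtime would have to detect.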

    C4 - TOC


    Scheduling and reconfiguration of interconnection network switches

    Interconnection networks are important parts of modern computing systems, facilitating communication between a system's components. Switches connecting various nodes of an interconnection network serve to move data in the network. The switch's delay and throughput impact the overall performance of the network and thus the system. Scheduling efficient movement of data through a switch and configuring the switch to realize a schedule are the main themes of this research. We consider various interconnection network switches including (i) crossbar-based switches, (ii) circuit-switched tree switches, and (iii) fat-tree switches. For crossbar-based input-queued switches, a recent result established that logarithmic packet delay is possible. However, this result assumes that packet transmission time through the switch is no less than schedule-generation time. We prove that without this assumption (as is the case in practice) packet delay becomes linear. We also report results of simulations that bear out our result for practical switch sizes and indicate that a fast scheduling algorithm reduces not only packet delay but also buffer size. We also propose a fast mesh-of-trees based distributed switch scheduling (maximal-matching based) algorithm that has polylog complexity. A circuit-switched tree (CST) can serve as an interconnect structure for various computing architectures and models such as the self-reconfigurable gate array and the reconfigurable mesh. A CST is a tree structure with source and destination processing elements as leaves and switches as internal nodes. We design several scheduling and configuration algorithms that distributedly partition a given set of communications into non-conflicting subsets and then establish switch settings and paths on the CST corresponding to the communications. A fat-tree is another widely used interconnection structure in many of today's high-performance clusters. 
We embed a reconfigurable mesh inside a fat-tree switch to generate efficient connections. We present an R-Mesh-based algorithm for a fat-tree switch that creates buses connecting input and output ports corresponding to various communications using that switch.
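
To make the notion of maximal-matching-based switch scheduling concrete, here is a minimal sequential sketch. It is not the distributed mesh-of-trees algorithm described in the abstract: it is a plain greedy pass, and the function name and the representation of requests as `(input_port, output_port)` pairs are assumptions made for illustration.

```python
def greedy_maximal_matching(requests):
    """Return a conflict-free schedule for one time slot of a crossbar.

    requests: iterable of (input_port, output_port) pairs, one per queued packet.
    Each input and each output is used at most once. The result is maximal:
    no remaining request has both its input and its output still free.
    """
    matched_inputs = set()
    matched_outputs = set()
    schedule = []
    for inp, out in requests:
        # Accept a request only if neither of its ports is already taken.
        if inp not in matched_inputs and out not in matched_outputs:
            schedule.append((inp, out))
            matched_inputs.add(inp)
            matched_outputs.add(out)
    return schedule

requests = [(0, 1), (0, 2), (1, 1), (2, 0)]
schedule = greedy_maximal_matching(requests)
```

A maximal matching is not necessarily a maximum one: in the example, the greedy pass schedules two packets although three could be sent in the same slot by choosing (0, 2), (1, 1) and (2, 0).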

    Improving broadcast performance in multi-radio multi-channel multi-rate wireless mesh networks.

    This thesis addresses the problem of 'efficient' broadcast in a multi-radio multi-channel multi-rate wireless mesh network (MR^2-MC WMN). In such an MR^2-MC WMN, nodes are equipped with multiple radio network interface cards, each tuned to an orthogonal channel, that can dynamically adjust their transmission rate by choosing a modulation scheme appropriate for the channel conditions. We choose 'broadcast latency', defined as the maximum delay between a packet's network-wide broadcast at the source and its eventual reception at all network nodes, as the 'efficiency' metric of broadcast performance. The problem of constructing a broadcast forwarding structure with minimal broadcast latency is referred to as the 'minimum-latency-broadcasting' (MLB) problem. While previous research on broadcast in single-radio single-rate wireless networks has highlighted the wireless medium's 'wireless broadcast advantage' (WBA), little is known about how the new features of MR^2-MC WMNs may be exploited. We study in this thesis how the availability of multiple radio interfaces (tuned to orthogonal channels) at WMN nodes, together with the WMN's multi-rate transmission capability and WBA, might be exploited to improve 'broadcast latency' performance. We show the MLB problem for MR^2-MC WMNs to be NP-hard, and resort to heuristics for its solution. We divide the overall problem into two sub-problems, which we address in two separate parts of this thesis. In the first part of this thesis, the MLB problem is defined for the case of single-radio single-channel multi-rate WMNs, where WMN nodes are equipped with a single radio tuned to a common channel. In the second part of this thesis, the MLB problem is defined for MR^2-MC WMNs, where WMN nodes are equipped with multiple radios tuned to multiple orthogonal channels. 
We demonstrate that broadcasting in multi-rate WMNs is significantly different from broadcasting in single-rate WMNs, and that broadcast performance in multi-rate WMNs can be significantly improved by exploiting the multi-rate feature and the availability of multiple interfaces. We also present two alternative MLB broadcast frameworks and, for each framework, specific centralized and distributed algorithms that can exploit multiple interfaces at a WMN node, as well as the multi-rate feature and WBA of MR^2-MC WMNs, to return improved 'broadcast latency' performance.
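
Why multi-rate broadcast differs from the single-rate case can be seen in a toy calculation. The sketch below assumes a simplified model that is not taken from the thesis: a neighbour decodes a transmission only if the sending rate does not exceed the maximum rate its link supports, transmissions are back to back on one channel, and the node names, rates and packet size are hypothetical.

```python
def reception_times(neighbour_max_rate, tx_rates, packet_bits):
    """Time at which each neighbour first receives the packet.

    neighbour_max_rate: dict mapping neighbour -> highest decodable rate (bit/s).
    tx_rates: rates used for consecutive transmissions of the same packet.
    packet_bits: packet size in bits.
    """
    times = {}
    clock = 0.0
    for rate in tx_rates:
        clock += packet_bits / rate  # airtime of this transmission
        for node, max_rate in neighbour_max_rate.items():
            # One transmission reaches every neighbour able to decode this rate.
            if node not in times and rate <= max_rate:
                times[node] = clock
    return times

neighbours = {"near": 54e6, "far": 6e6}  # hypothetical link rates
one_shot = reception_times(neighbours, [6e6], packet_bits=12000)
two_shot = reception_times(neighbours, [54e6, 6e6], packet_bits=12000)
```

With a single transmission at the lowest rate, both neighbours receive the packet at the same time. Sending first at the high rate lets the near neighbour receive (and possibly start forwarding) much earlier, at the cost of delaying the far neighbour slightly, a trade-off that does not arise when only one rate is available.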