3,925 research outputs found

    On the design and implementation of broadcast and global combine operations using the postal model

    A number of models have been proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. During each round, each node can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines where λ is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves the known implementation by more than 20%.
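
The arrival rule in this abstract (a message sent in round r arrives at round r + λ - 1) can be turned into a small round-counting sketch. The Python below is an illustration only, not the paper's algorithm: it assumes an integer λ, assumes every informed node sends to a fresh, uninformed node in every round, and uses the hypothetical name `rounds_to_broadcast`. Under those assumptions the number of informed nodes after round t grows by the number of nodes that were informed λ rounds earlier.

```python
def rounds_to_broadcast(n: int, lam: int) -> int:
    """Rounds needed to inform n nodes under the postal model (integer lam).

    Assumption: each informed node sends to a new node every round, and a
    message sent in round r is received at the end of round r + lam - 1.
    """
    if n < 1 or lam < 1:
        raise ValueError("n and lam must be positive integers")
    counts = [1]  # counts[t] = informed nodes after round t; only the source at t = 0
    while counts[-1] < n:
        t = len(counts)
        # Messages arriving at the end of round t were sent in round t - lam + 1
        # by every node that was already informed after round t - lam.
        arrivals = counts[t - lam] if t >= lam else 0
        counts.append(counts[-1] + arrivals)
    return len(counts) - 1


if __name__ == "__main__":
    for lam in (1, 2, 3):
        print(lam, rounds_to_broadcast(64, lam))
```

With λ = 1 the count doubles every round, which is the familiar binomial-tree broadcast; larger λ gives slower, Fibonacci-like growth, which is why the best tree shape depends on both λ and n.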

    Latency Optimal Broadcasting in Noisy Wireless Mesh Networks

    In this paper, we adopt a new noisy wireless network model introduced very recently by Censor-Hillel et al. in [ACM PODC 2017, CHHZ17]. More specifically, for a given noise parameter $p \in [0,1]$, any sender has a probability $p$ of transmitting noise or any receiver of a single transmission in its neighborhood has a probability $p$ of receiving noise. In this paper, we first propose a new asymptotically latency-optimal approximation algorithm (under faultless model) that can complete single-message broadcasting task in $D+O(\log^2 n)$ time units/rounds in any WMN of size $n$ and diameter $D$. We then show this diameter-linear broadcasting algorithm remains robust under the noisy wireless network model and also improves the currently best known result in CHHZ17 by a $\Theta(\log\log n)$ factor. In this paper, we also further extend our robust single-message broadcasting algorithm to $k$ multi-message broadcasting scenario and show it can broadcast $k$ messages in $O(D+k\log n+\log^2 n)$ time rounds. This new robust multi-message broadcasting scheme is not only asymptotically optimal but also answers affirmatively the problem left open in CHHZ17 on the existence of an algorithm that is robust to sender and receiver faults and can broadcast $k$ messages in $O(D+k\log n+\mathrm{polylog}(n))$ time rounds. Comment: arXiv admin note: text overlap with arXiv:1705.07369 by other author

    Throughput-Optimal Multihop Broadcast on Directed Acyclic Wireless Networks

    We study the problem of efficiently broadcasting packets in multi-hop wireless networks. At each time slot the network controller activates a set of non-interfering links and forwards selected copies of packets on each activated link. A packet is considered jointly received only when all nodes in the network have obtained a copy of it. The maximum rate of jointly received packets is referred to as the broadcast capacity of the network. Existing policies achieve the broadcast capacity by balancing traffic over a set of spanning trees, which are difficult to maintain in a large and time-varying wireless network. We propose a new dynamic algorithm that achieves the broadcast capacity when the underlying network topology is a directed acyclic graph (DAG). This algorithm is decentralized, utilizes local queue-length information only and does not require the use of global topological structures such as spanning trees. The principal technical challenge inherent in the problem is the absence of work-conservation principle due to the duplication of packets, which renders traditional queuing modelling inapplicable. We overcome this difficulty by studying relative packet deficits and imposing in-order delivery constraints to every node in the network. Although in-order packet delivery, in general, leads to degraded throughput in graphs with cycles, we show that it is throughput optimal in DAGs and can be exploited to simplify the design and analysis of optimal algorithms. Our characterization leads to a polynomial time algorithm for computing the broadcast capacity of any wireless DAG under the primary interference constraints. Additionally, we propose an extension of our algorithm which can be effectively used for broadcasting in any network with arbitrary topology

    Constant-Time Algorithms for Minimum Spanning Tree and Related Problems on Processor Array with Reconfigurable Bus Systems

    A processor array with a reconfigurable bus system is a parallel computation model that consists of a processor array and a reconfigurable bus system. In this paper, a constant-time algorithm is proposed on this model for finding the cycles in an undirected graph. We can use this algorithm to decide whether a specified edge belongs to the minimum spanning tree of the graph or not. This cycle-finding algorithm is designed on a two-dimensional $n\times n$ processor array with a reconfigurable bus system, where $n$ is the number of vertices in the graph. Based on this cycle-finding algorithm, the minimum spanning tree problem and the spanning tree problem can be solved in O(1) time by using fewer processors than before, $O(n\times m\times n)$ and $O(n^3)$ processors respectively. This is a substantial improvement over previous known results. Moreover, we also propose two constant-time algorithms for solving the minimum spanning tree verification problem and spanning tree verification problem by using $O(n^3)$ and $O(n^2)$ processors, respectively.
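
The link between cycle finding and minimum spanning tree membership mentioned in this abstract can be illustrated sequentially. The sketch below is not the constant-time reconfigurable-bus algorithm; it is an ordinary union-find check that assumes distinct edge weights and uses the hypothetical name `edge_in_mst`. It relies on the cycle property: an edge is in the minimum spanning tree exactly when its endpoints are not already connected by strictly lighter edges, since otherwise it would be the heaviest edge on some cycle.

```python
def edge_in_mst(n, edges, target):
    """Return True if `target` = (u, v, w) belongs to the minimum spanning tree.

    Assumptions: vertices are 0..n-1, `edges` is a list of (a, b, weight)
    tuples, and all weights are distinct (so the tree is unique).
    """
    u, v, w = target
    parent = list(range(n))

    def find(x):
        # Find the component representative, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge components using only edges strictly lighter than the target.
    for a, b, weight in edges:
        if weight < w:
            parent[find(a)] = find(b)

    # If u and v are already connected, the target closes a cycle on which
    # it is the heaviest edge, so it cannot be in the tree.
    return find(u) != find(v)


if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
    for e in triangle:
        print(e, edge_in_mst(3, triangle, e))
```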