On distributed scheduling in wireless networks exploiting broadcast and network coding
In this paper, we consider cross-layer optimization in wireless networks with wireless broadcast advantage, focusing on the problem of distributed scheduling of broadcast links. The wireless broadcast advantage is most useful in multicast scenarios. As such, we include network coding in our design to exploit the throughput gain that network coding brings to multicasting. We derive a subgradient algorithm for joint rate control, network coding and scheduling, which, however, requires centralized link scheduling. Under the primary interference model, the link scheduling problem is equivalent to a maximum weighted hypergraph matching problem, which is NP-complete. To solve the scheduling problem in a distributed manner, locally greedy and randomized approximation algorithms are proposed and shown to have bounded worst-case performance. With random network coding, we obtain a fully distributed cross-layer design. Numerical results show promising throughput gain using the proposed algorithms and, surprisingly, in some cases even with less complexity than cross-layer design without broadcast advantage.
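To make the hypergraph-matching view of scheduling concrete, the following is a minimal sketch of a greedy heuristic. It is a centralized illustration only, not the locally greedy distributed algorithm of the paper, whose details the abstract does not give. It assumes each broadcast link is a hyperedge (a transmitter together with its receivers) carrying a weight, and that under the primary interference model two links conflict whenever they share a node. The function name and the example data are hypothetical.

```python
def greedy_hypergraph_matching(hyperedges):
    """Pick pairwise node-disjoint hyperedges, heaviest first.

    hyperedges: iterable of (nodes, weight) pairs, where nodes is any
    iterable of hashable node identifiers (assumed input format).
    Returns the list of chosen (frozenset_of_nodes, weight) pairs.
    """
    chosen = []
    used = set()  # nodes already busy in a chosen broadcast link
    for nodes, weight in sorted(hyperedges, key=lambda e: e[1], reverse=True):
        nodes = frozenset(nodes)
        # A link is schedulable only if none of its nodes is already in use.
        if weight > 0 and used.isdisjoint(nodes):
            chosen.append((nodes, weight))
            used |= nodes
    return chosen


# Hypothetical example: four broadcast links on nodes a..f.
links = [
    ({"a", "b", "c"}, 5.0),
    ({"c", "d"}, 4.0),
    ({"d", "e"}, 3.0),
    ({"e", "f"}, 2.0),
]
schedule = greedy_hypergraph_matching(links)
total_weight = sum(w for _, w in schedule)
```

Greedy selection of this kind trades optimality for simplicity; the abstract states that the paper's own approximation algorithms have bounded worst-case performance, a claim this sketch does not attempt to reproduce.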
Optimal Networks from Error Correcting Codes
To address growth challenges facing large Data Centers and supercomputing
clusters a new construction is presented for scalable, high throughput, low
latency networks. The resulting networks require 1.5-5 times fewer switches,
2-6 times fewer cables, have 1.2-2 times lower latency and correspondingly
lower congestion and packet losses than the best present or proposed networks
providing the same number of ports at the same total bisection. These advantage
ratios increase with network size. The key new ingredient is the exact
equivalence discovered between the problem of maximizing network bisection for
large classes of practically interesting Cayley graphs and the problem of
maximizing codeword distance for linear error correcting codes. The resulting
translation recipe converts existing optimal error correcting codes into
optimal throughput networks. Comment: 14 pages, accepted at ANCS 2013 conference
An efficient task-based all-reduce for machine learning applications
All-Reduce is a collective-combine operation frequently utilised in synchronous parameter updates in parallel machine learning algorithms. The performance of this operation - and subsequently of the algorithm itself - is heavily dependent on its implementation, configuration and on the supporting hardware on which it is run. Given the pivotal role of all-reduce, a failure in any of these regards will significantly impact the resulting scientific output.
In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object-serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate the effectiveness of the new butterfly algorithm with a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method - a 9x speed-up is observed for vector lengths on the order of 10^8. This improvement results from both algorithmic changes (65%) and parallel-processing optimization (35%).
The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (CIFAR and MNIST) and larger (ImageNet) datasets.
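The butterfly communication pattern can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the Spark implementation described in the abstract: it assumes the number of workers is a power of two, uses element-wise addition as the reduction, and models each worker's vector as a plain Python list. The function name is hypothetical.

```python
def butterfly_all_reduce(vectors):
    """Simulate a butterfly (recursive-doubling) all-reduce with summation.

    vectors: one equal-length list of numbers per worker; the number of
    workers must be a power of two (an assumption of this sketch).
    Returns (per-worker results, number of exchange rounds).
    """
    p = len(vectors)
    if p == 0 or p & (p - 1):
        raise ValueError("number of workers must be a power of two")
    state = [list(v) for v in vectors]
    rounds = 0
    distance = 1
    while distance < p:
        # In each round, worker `rank` exchanges its partial sum with the
        # worker whose rank differs in exactly one bit (rank XOR distance),
        # and both keep the element-wise sum.
        state = [
            [a + b for a, b in zip(state[rank], state[rank ^ distance])]
            for rank in range(p)
        ]
        distance *= 2
        rounds += 1
    return state, rounds


# Four simulated workers, each holding a length-2 vector.
result, rounds = butterfly_all_reduce([[1, 2], [3, 4], [5, 6], [7, 8]])
```

After log2(p) rounds every worker holds the full sum, so no separate broadcast step is needed; this is the structural difference from a reduce followed by a broadcast.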
Building Fault Tollrence within Clouds at Network Level
Cloud computing technologies and infrastructure facilities are growing rapidly, making it cost-effective for users to implement the IT-based solutions that run their businesses. Many intricate issues have cropped up, however, and these must be addressed before clouds can be used for the purpose for which they are designed and implemented. Among these, fault tolerance and the security of data stored on clouds are the most important. Continuous availability of services depends on many factors. Faults are bound to happen within the network, software, platform, or infrastructure used to establish the cloud. The network that connects the various servers, devices, peripherals, etc. has to be fault tolerant to begin with, so that the intended, uninterrupted services can be made available to the user. A novel network design method that achieves high availability of the network, and thereby of the cloud itself, is presented in this paper.
Wireless Inter-Session Network Coding - An Approach Using Virtual Multicasts
This paper addresses the problem of inter-session network coding to maximize throughput for multiple communication sessions in wireless networks. We introduce virtual multicast connections which can extract packets from original sessions and code them together. Random linear network codes can be used for these virtual multicasts. The problem can be stated as a flow-based convex optimization problem with side constraints. The proposed formulation provides a rate region which is at least as large as the region without inter-session network coding. We show the benefits of our technique for several scenarios by means of simulation. United States. Defense Advanced Research Projects Agency (Subcontract 18870740-37362-C
Optimal performance of distributed simulation programs
Journal Article. This paper describes a technique to analyze the potential speedup of distributed simulation programs. A distributed simulation strategy is proposed which minimizes execution time through the use of an oracle to control the simulation. Because the strategy relies on an oracle, it cannot be used for practical simulations. However, the strategy facilitates performance evaluations of distributed simulation strategies by providing a useful point of comparison, and it can be used to determine the suitability of specific applications for implementation on a parallel computer. Based on the proposed strategy, a tool has been developed to determine the maximum performance which can be achieved from a distributed simulation program. In this paper we describe the technique and its use in evaluating the parallelism available in distributed simulators of parallel computer systems.
A Linear Network Code Construction for General Integer Connections Based on the Constraint Satisfaction Problem
The problem of finding network codes for general connections is inherently
difficult in capacity constrained networks. Resource minimization for general
connections with network coding is further complicated. Existing methods for
identifying solutions mainly rely on highly restricted classes of network
codes, and are almost all centralized. In this paper, we introduce linear
network mixing coefficients for code constructions of general connections that
generalize random linear network coding (RLNC) for multicast connections. For
such code constructions, we pose the problem of cost minimization for the
subgraph involved in the coding solution and relate this minimization to a
path-based Constraint Satisfaction Problem (CSP) and an edge-based CSP. While
CSPs are NP-complete in general, we present a path-based probabilistic
distributed algorithm and an edge-based probabilistic distributed algorithm
with almost sure convergence in finite time by applying Communication Free
Learning (CFL). Our approach allows fairly general coding across flows,
guarantees no greater cost than routing, and shows a possible distributed
implementation. Numerical results illustrate the performance improvement of our
approach over existing methods. Comment: submitted to TON (conference version published at IEEE GLOBECOM 2015)
Broadcasting in Hyper-cylinder graphs
Broadcasting in computer networking means the dissemination of information, which is known initially only at some nodes, to all network members. The goal is to inform every node in the minimum possible time. There are a few models for broadcasting; the simplest, and historically the first, is called the Classical model. In the Classical model, dissemination happens in synchronous rounds, in each of which an informed node may inform only one of its neighbors. The broadcast question is: What is the minimum number of rounds needed for broadcasting, and what broadcast scheme achieves it?
For general graphs, these questions are NP-hard, and the broadcast time is known to be inapproximable within a factor of 3 - ε for any real ε > 0. Even for some very restricted classes of graphs, the questions remain NP-hard. Little is known about broadcasting in restricted graphs, and only a few classes have a polynomial solution.
Parallel and distributed computing is one of the important domains that rely on efficient broadcasting. The hypercube and the torus are the most widely used network topologies in this domain. Their widespread use is due not only to their simplicity but also to their efficiency and high robustness (e.g., fault tolerance) with an acceptable number of links. In this thesis, it is observed that the Cartesian product of a number of path and cycle graphs produces a valuable set of topologies, which we call hyper-cylinders and which include the hypercube and the torus. Any hyper-cylinder shares many of the beneficial features of the hypercube and the torus and might be a suitable substitute in some cases. Some hyper-cylinders are also similar to other topologies used in practice, such as cube-connected cycles. In this thesis, the effect of the Cartesian product on broadcasting, and the broadcasting of hyper-cylinders under the Classical and Messy models, are studied. This adds a valuable class of graphs to the limited classes of graphs that have a polynomially computable broadcast time. Finally, the relation between worst-case originators and diameters in trees is studied, which may help in the broadcast study of a larger class of graphs where any tree is allowed instead of a path in the Cartesian product.
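The construction of a hyper-cylinder as a Cartesian product of paths and cycles can be sketched as follows. This is an illustrative sketch, not the thesis's method: it builds the product graph and computes only the standard lower bound on Classical-model broadcast time, max(ceil(log2 n), diameter), which holds because the number of informed nodes can at most double per round and information travels one hop per round. It does not compute the exact broadcast time. All function names are hypothetical.

```python
from collections import deque
from itertools import product
from math import ceil, log2


def path_graph(n):
    """Adjacency lists of the path on vertices 0..n-1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def cycle_graph(n):
    """Adjacency lists of the cycle on vertices 0..n-1 (n >= 3)."""
    if n < 3:
        raise ValueError("a cycle needs at least 3 vertices")
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def cartesian_product(factors):
    """Cartesian product of graphs: vertices are tuples, and two tuples are
    adjacent when they differ in one coordinate by an edge of that factor."""
    adjacency = {}
    for vertex in product(*(sorted(f) for f in factors)):
        neighbours = []
        for i, factor in enumerate(factors):
            for w in factor[vertex[i]]:
                neighbours.append(vertex[:i] + (w,) + vertex[i + 1:])
        adjacency[vertex] = neighbours
    return adjacency


def eccentricity(adjacency, source):
    """Largest breadth-first-search distance from source (connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def broadcast_lower_bound(adjacency):
    """Lower bound on worst-case Classical broadcast time (not the exact value)."""
    n = len(adjacency)
    diameter = max(eccentricity(adjacency, v) for v in adjacency)
    return max(ceil(log2(n)), diameter)


hypercube = cartesian_product([path_graph(2)] * 3)               # Q3
torus = cartesian_product([cycle_graph(4), cycle_graph(4)])      # C4 x C4
cylinder = cartesian_product([path_graph(3), cycle_graph(4)])    # P3 x C4
```

Here the hypercube arises from paths on two vertices and the torus from cycles only, matching the abstract's remark that hyper-cylinders contain both families.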