
    Communicating the sum of sources over a network

    We consider the network communication scenario over directed acyclic networks with unit-capacity edges, in which a number of sources $s_i$, each holding independent unit-entropy information $X_i$, wish to communicate the sum $\sum_i X_i$ to a set of terminals $t_j$. We show that when there are only two sources or only two terminals, communication is possible if and only if each source-terminal pair $s_i/t_j$ is connected by at least a single path. For the more general communication problem with three sources and three terminals, we prove that a single path connecting each source-terminal pair does not suffice to communicate $\sum_i X_i$. We then present an efficient encoding scheme that enables the communication of $\sum_i X_i$ in the three-source, three-terminal case, provided that each source-terminal pair is connected by two edge-disjoint paths. Comment: 12 pages, IEEE JSAC: Special Issue on In-network Computation: Exploring the Fundamental Limits (to appear).
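
    For the two-source or two-terminal case, the abstract states that the sum is communicable exactly when every $s_i/t_j$ pair is joined by at least one path. The sketch below only tests that connectivity condition with a breadth-first search; it does not construct the network code itself. The adjacency-dict representation and the example network are hypothetical, chosen for illustration.

```python
from collections import deque

def reachable(graph, src):
    """Return the set of nodes reachable from src; graph maps node -> list of successors."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def every_pair_connected(graph, sources, terminals):
    """True iff every source s_i has at least one directed path to every terminal t_j."""
    return all(set(terminals) <= reachable(graph, s) for s in sources)

# Hypothetical two-source, two-terminal DAG (illustration only).
g = {
    "s1": ["a"],
    "s2": ["a", "t2"],
    "a": ["t1", "t2"],
}
print(every_pair_connected(g, ["s1", "s2"], ["t1", "t2"]))  # True
```

    Removing the shared node, e.g. `{"s1": ["t1"], "s2": ["t2"]}`, leaves $s_1$ disconnected from $t_2$, and the check returns `False`.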

    Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach

    Distributed computing systems are well known to suffer from the problem of slow or failed nodes, referred to as stragglers. Straggler mitigation for distributed matrix computations has recently been investigated from the standpoint of erasure coding in several works. In this work we present a strategy for distributed matrix-vector multiplication based on convolutional coding. Our scheme can be decoded using a low-complexity peeling decoder. The recovery process enjoys excellent numerical stability compared to Reed-Solomon coding based approaches, which exhibit significant problems owing to their badly conditioned decoding matrices. Finally, our schemes are better matched than many previous schemes to the practically important case of sparse matrix-vector multiplication. Extensive simulation results corroborate our findings.
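
    To make the idea of peeling concrete, here is a minimal sketch of a generic peeling decoder, not the convolutional code from the abstract. It assumes, hypothetically, that $A$ is split into row blocks $A_i$ and that each worker returns $(\sum_{i \in S} A_i)x$ for some index set $S$. Decoding only subtracts already-recovered blocks and never inverts a matrix.

```python
import numpy as np

def peel(received, k):
    """Peeling decoder for block sums.

    received: list of (index_set, value) pairs, value = sum of A_i @ x over i in index_set.
    Returns {i: A_i @ x} for every block that peeling manages to recover.
    """
    equations = [(set(idx), np.asarray(val, dtype=float)) for idx, val in received]
    known = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        for idx, val in equations:
            unknown = idx - set(known)
            if len(unknown) == 1:
                (j,) = unknown
                # Subtract the already-decoded blocks to isolate the single unknown block.
                known[j] = val - sum((known[i] for i in idx - {j}), np.zeros_like(val))
                progress = True
    return known

A = np.arange(24, dtype=float).reshape(6, 4)
x = np.array([1.0, -1.0, 2.0, 0.5])
blocks = np.split(A, 3)                       # row blocks A_0, A_1, A_2
assignments = [{0}, {0, 1}, {1, 2}, {2}]      # hypothetical worker assignments
results = [(s, sum(blocks[i] for i in s) @ x) for s in assignments]
decoded = peel(results[:3], k=3)              # the fourth worker straggles
y = np.concatenate([decoded[i] for i in range(3)])
print(np.allclose(y, A @ x))                  # True
```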

    Leveraging Coding Techniques for Speeding up Distributed Computing

    Large-scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data on the order of petabytes or more. The sheer size of the data precludes processing it on a single computer. The philosophy in these methods is to partition the overall job into smaller tasks that are executed on different servers; this is called the map phase. It is followed by a data shuffling phase, in which appropriate data is exchanged between the servers. The final, so-called reduce phase completes the computation. One approach for reducing the overall execution time, explored in prior work, is to operate on a natural tradeoff between computation and communication. Specifically, the idea is to run redundant copies of map tasks on judiciously chosen servers; the shuffle phase then exploits the locations of the nodes and uses coded transmission. The main drawback of this approach is that it requires the original job to be split into a number of map tasks that grows exponentially in the system parameters. This is problematic, as we demonstrate that splitting jobs too finely can in fact adversely affect the overall execution time. In this work we show that one can simultaneously obtain low communication loads while ensuring that jobs do not need to be split too finely. Our approach uncovers a deep relationship between this problem and a class of combinatorial structures called resolvable designs. Appropriate interpretation of resolvable designs can enable coded distributed computing schemes whose splitting levels are exponentially lower than those of prior work. We present experimental results obtained on Amazon EC2 clusters for a widely known distributed algorithm, namely TeraSort. We obtain over 4.69× improvement in speedup over the baseline approach and more than 2.6× over the current state of the art.
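
    The map, shuffle and reduce phases described above can be illustrated with a toy, uncoded word count. This is only the baseline pipeline: it has no redundant map tasks and no coded shuffling. Routing keys to reducers by `hash(key) % num_reducers` is an assumption made for the sketch, and each entry of `splits` plays the role of one map task.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in split.split()]

def shuffle_phase(mapped_outputs, num_reducers):
    """Shuffle: send each key to a reducer chosen by hashing, grouping values per key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets

def reduce_phase(bucket):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}

splits = ["a b a", "b c", "a c c"]            # hypothetical input, one split per map task
mapped = [map_phase(s) for s in splits]
buckets = shuffle_phase(mapped, num_reducers=2)
counts = {}
for bucket in buckets:
    counts.update(reduce_phase(bucket))
print(sorted(counts.items()))                 # [('a', 3), ('b', 2), ('c', 3)]
```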

    On the multiple unicast capacity of 3-source, 3-terminal directed acyclic networks

    We consider the multiple unicast problem with three source-terminal pairs over directed acyclic networks with unit-capacity edges. The three $s_i$-$t_i$ pairs wish to communicate at unit rate via network coding. The connectivity between the $s_i$-$t_i$ pairs is quantified by means of a connectivity level vector $[k_1\ k_2\ k_3]$ such that there exist $k_i$ edge-disjoint paths between $s_i$ and $t_i$. In this work we attempt to classify networks based on the connectivity level. It can be observed that unit-rate transmission can be supported by routing if $k_i \geq 3$ for all $i = 1, \dots, 3$. Here, we consider connectivity level vectors such that $\min_{i = 1, \dots, 3} k_i < 3$. For all such connectivity level vectors except $[1\ 2\ 4]$ (and its permutations), we present either a constructive linear network coding scheme or an instance of a network that cannot support the desired unit-rate requirement. The benefits of our schemes extend to networks with higher and potentially different edge capacities. Specifically, our experimental results indicate that for networks where the different source-terminal paths overlap significantly, our constructive unit-rate schemes can be packed along with routing to provide higher throughput than a pure routing approach. Comment: To appear in the IEEE/ACM Transactions on Networking.
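
    Each entry $k_i$ of the connectivity level vector is the number of edge-disjoint $s_i$-$t_i$ paths, which by Menger's theorem equals the unit-capacity $s_i$-$t_i$ max flow. The sketch below computes it with BFS augmenting paths (Edmonds-Karp); the edge-list format and the example network are hypothetical.

```python
from collections import defaultdict, deque

def edge_disjoint_paths(edges, s, t):
    """Number of edge-disjoint s-t paths in a directed graph with unit-capacity edges."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1          # parallel edges add capacity
        adj[u].add(v)
        adj[v].add(u)             # allow traversal of residual (reverse) arcs
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:   # push one unit along the augmenting path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def connectivity_level(edges, pairs):
    return [edge_disjoint_paths(edges, s, t) for s, t in pairs]

# Hypothetical network: pair 1 has two disjoint paths; pairs 2 and 3 share edge m->n.
edges = [("s1", "a"), ("a", "t1"), ("s1", "b"), ("b", "t1"),
         ("s2", "m"), ("s3", "m"), ("m", "n"), ("n", "t2"), ("n", "t3")]
k = connectivity_level(edges, [("s1", "t1"), ("s2", "t2"), ("s3", "t3")])
print(k)                          # [2, 1, 1]
print(all(ki >= 3 for ki in k))   # False: the sufficient routing condition does not hold
```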

    Fractional repetition codes with flexible repair from combinatorial designs

    Fractional repetition (FR) codes are a class of regenerating codes for distributed storage systems with an exact (table-based) repair process that is also uncoded, i.e., upon failure, a node is regenerated by simply downloading packets from the surviving nodes. In our work, we present constructions of FR codes based on Steiner systems and resolvable combinatorial designs such as affine geometries, Hadamard designs and mutually orthogonal Latin squares. The failure resilience of our codes can be varied in a simple manner. We construct codes with normalized repair bandwidth ($\beta$) strictly larger than one; these cannot be obtained trivially from codes with $\beta = 1$. Furthermore, we present the Kronecker product technique for generating new codes from existing ones and elaborate on their properties. FR codes with locality are those where the repair degree is smaller than the number of nodes contacted for reconstructing the stored file. For these codes we establish a tradeoff between the local repair property and failure resilience, and construct codes that meet this tradeoff. Much of the prior work only provided lower bounds on the FR code rate. In our work, for most of our constructions we determine the code rate for certain parameter ranges. Comment: 27 pages in IEEE two-column format. IEEE Transactions on Information Theory (to appear).
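
    As a small illustration of an FR code built from a Steiner system, the sketch below uses the Fano plane $S(2,3,7)$: storage nodes are its blocks $\{i, i+1, i+3\} \bmod 7$ and packets are its points. This particular choice is an assumption for illustration, not a construction taken from the abstract. Because any two blocks meet in exactly one point, each helper sends a single packet, so this is the $\beta = 1$ case that the abstract's constructions go beyond.

```python
from collections import Counter
from itertools import combinations

def fano_fr_code():
    """Storage nodes = blocks {i, i+1, i+3} (mod 7) of the Fano plane, a Steiner system S(2, 3, 7).
    Packets = points 0..6; each node stores 3 packets and each packet is stored on 3 nodes."""
    return [frozenset({i, (i + 1) % 7, (i + 3) % 7}) for i in range(7)]

def repair_table(nodes, failed):
    """Table-based, uncoded repair: for each packet on the failed node, name one
    surviving node that stores a copy (the first one in index order)."""
    return {p: next(j for j, blk in enumerate(nodes) if j != failed and p in blk)
            for p in sorted(nodes[failed])}

nodes = fano_fr_code()
repetition = Counter(p for blk in nodes for p in blk)
print(set(repetition.values()))                                  # {3}
print(all(len(a & b) == 1 for a, b in combinations(nodes, 2)))   # True
print(repair_table(nodes, failed=0))                             # {0: 4, 1: 1, 3: 2}
```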

    Replication based storage systems with local repair

    We consider the design of regenerating codes for distributed storage systems that enjoy the property of local, exact and uncoded repair, i.e., (a) upon failure, a node can be regenerated by simply downloading packets from the surviving nodes, and (b) the number of surviving nodes contacted is strictly smaller than the number of nodes that need to be contacted for reconstructing the stored file. Our codes consist of an outer MDS code and an inner fractional repetition code that specifies the placement of the encoded symbols on the storage nodes. For our class of codes, we identify the tradeoff between the local repair property and the minimum distance. We present codes based on graphs of high girth, affine resolvable designs and projective planes that meet the minimum distance bound for specific choices of file sizes.
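
    With an outer MDS code, a file of $M$ coded symbols is recoverable from any $k$ nodes as long as every $k$ nodes together hold at least $M$ distinct packets of the inner FR code. The sketch below brute-forces that minimum for an FR code obtained from a graph (nodes are vertices, packets are edges, so each packet is stored twice). The complete graph $K_5$ is a hypothetical choice made for brevity; the abstract's graph-based codes use graphs of high girth instead.

```python
from itertools import combinations

def graph_fr_code(vertices, edges):
    """Storage node v stores every packet (edge) incident to v; repetition degree 2."""
    return {v: {e for e in edges if v in e} for v in vertices}

def reconstruction_size(nodes, k):
    """Minimum number of distinct packets held by any k storage nodes."""
    return min(len(set().union(*(nodes[v] for v in subset)))
               for subset in combinations(nodes, k))

V = range(5)
E = [frozenset(p) for p in combinations(V, 2)]   # K_5: 10 packets
nodes = graph_fr_code(V, E)
print([reconstruction_size(nodes, k) for k in (1, 2, 3)])   # [4, 7, 9]
```

    For $K_5$ the counts follow from $\binom{5}{2} - \binom{5-k}{2}$, the number of edges touching at least one of $k$ chosen vertices.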

    Repairable Replication-based Storage Systems Using Resolvable Designs

    We consider the design of regenerating codes for distributed storage systems at the minimum bandwidth regeneration (MBR) point. The codes allow for a repair process that is exact and uncoded, but table-based. These codes were introduced in prior work and consist of an outer MDS code followed by an inner fractional repetition (FR) code, in which copies of the coded symbols are placed on the storage nodes. The main challenge in this domain is the design of the inner FR code. In our work, we consider generalizations of FR codes by establishing their connection with a family of combinatorial structures known as resolvable designs. Our constructions based on affine geometries, Hadamard designs and mutually orthogonal Latin squares allow the design of systems where a new node can be exactly regenerated by downloading $\beta \geq 1$ packets from a subset of the surviving nodes (prior work only considered the case of $\beta = 1$). Our techniques allow the design of systems over a large range of parameters. Specifically, the repetition degree of a symbol, which dictates the resilience of the system, can be varied over a large range in a simple manner. Moreover, the table needed for repair can be implemented in a straightforward way. Furthermore, we answer an open question posed in prior work by demonstrating the existence of codes with parameters that are not covered by Steiner systems.
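
    One way to see why a resolvable design makes the repetition degree easy to vary: each parallel class covers every point exactly once, so keeping $r$ classes stores every point on exactly $r$ blocks. The sketch below builds the parallel classes of the affine plane $AG(2,q)$ for prime $q$ and, as a hypothetical mapping, takes packets to be points and storage nodes to be lines. It illustrates only the repetition degree, not the paper's $\beta > 1$ constructions.

```python
from collections import Counter

def affine_plane_parallel_classes(q):
    """Parallel classes of the affine plane AG(2, q), q prime.

    Points are pairs (x, y) with x, y in Z_q. Classes: for each slope m the lines
    y = m*x + c (c in Z_q), plus the vertical lines x = a. Each class partitions the
    q^2 points into q disjoint lines, which is what makes the design resolvable.
    """
    classes = [[frozenset((x, (m * x + c) % q) for x in range(q)) for c in range(q)]
               for m in range(q)]
    classes.append([frozenset((a, y) for y in range(q)) for a in range(q)])
    return classes

def fr_code_from_classes(q, r):
    """Hypothetical FR code: packets = points, storage nodes = lines of the first r classes."""
    return [line for cls in affine_plane_parallel_classes(q)[:r] for line in cls]

for r in (2, 3, 4):
    nodes = fr_code_from_classes(3, r)
    repetition = Counter(p for line in nodes for p in line)
    print(r, len(nodes), set(repetition.values()))
# 2 6 {2}
# 3 9 {3}
# 4 12 {4}
```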

    Protection against link errors and failures using network coding

    We propose a network-coding-based scheme to protect multiple bidirectional unicast connections against adversarial errors and failures in a network. The network consists of a set of bidirectional primary path connections that carry the uncoded traffic. The end nodes of the bidirectional connections are connected by a set of shared protection paths that provide the redundancy required for protection. Such protection strategies are employed in optical networks for recovery from failures. In this work we consider the problem of simultaneous protection against adversarial errors and failures. Suppose that $n_e$ paths are corrupted by an omniscient adversary. Under our proposed protocol, the errors can be corrected at all the end nodes with $4n_e$ protection paths. More generally, if there are $n_e$ adversarial errors and $n_f$ failures, $4n_e + 2n_f$ protection paths are sufficient. The number of protection paths depends only on the number of errors and failures being protected against and is independent of the number of unicast connections. Comment: The first version of this paper was accepted by IEEE Int'l Symp. on Info. Theory 2009. The second version of this paper is submitted to IEEE Transactions on Communications (under minor revision). The third version of this paper has been accepted by IEEE Transactions on Communications.
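
    The sufficiency bound stated above can be evaluated directly. The helper below, whose name is hypothetical, only computes $4n_e + 2n_f$; it does not implement the protocol. Note that the number of unicast connections never enters the formula.

```python
def protection_paths_needed(n_e: int, n_f: int = 0) -> int:
    """Protection paths sufficient per the abstract: 4*n_e + 2*n_f.
    Independent of the number of bidirectional unicast connections."""
    if n_e < 0 or n_f < 0:
        raise ValueError("n_e and n_f must be non-negative")
    return 4 * n_e + 2 * n_f

for n_e, n_f in [(1, 0), (0, 1), (2, 1)]:
    print(n_e, n_f, protection_paths_needed(n_e, n_f))
# 1 0 4
# 0 1 2
# 2 1 10
```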

    Capacity of Sum-networks for Different Message Alphabets

    A sum-network is a directed acyclic network in which all terminal nodes demand the 'sum' of the independent information observed at the source nodes. Many characteristics of the well-studied multiple unicast network communication problem also hold for sum-networks, due to a known reduction between instances of these two problems. Our main result is that, unlike for a multiple unicast network, the coding capacity of a sum-network depends on the message alphabet. We demonstrate this using a construction procedure and show that the choice of message alphabet can reduce the coding capacity of a sum-network from $1$ to close to $0$.

    Universally Decodable Matrices for Distributed Matrix-Vector Multiplication

    Coded computation is an emerging research area that leverages concepts from erasure coding to mitigate the effect of stragglers (slow nodes) in distributed computation clusters, especially for matrix computation problems. In this work, we present a class of distributed matrix-vector multiplication schemes that are based on codes in the Rosenbloom-Tsfasman metric and universally decodable matrices. Our schemes take into account the inherent computation order within a worker node. In particular, they allow us to effectively leverage partial computations performed by stragglers (a feature that many prior works lack). Another main contribution of our work is a companion matrix-based embedding of these codes that allows us to obtain sparse and numerically stable schemes for the problem at hand. Experimental results confirm the effectiveness of our techniques. Comment: 6 pages, 1 figure.
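
    For readers unfamiliar with the Rosenbloom-Tsfasman metric, the sketch below computes it under one common convention (an assumption; other sources index from the opposite end): the weight of a length-$n$ row is the 1-based position of its last nonzero entry, the weight of an $s \times n$ array is the sum of its row weights, and the distance is the weight of the difference. Arithmetic over $\mathbb{Z}_q$ with $q$ prime is likewise an illustrative choice.

```python
def rt_weight_row(row):
    """RT weight of one row: 1-based index of its last nonzero entry, 0 if the row is zero."""
    for i in range(len(row), 0, -1):
        if row[i - 1] != 0:
            return i
    return 0

def rt_weight(matrix):
    """RT weight of an s x n array: sum of the row weights."""
    return sum(rt_weight_row(r) for r in matrix)

def rt_distance(a, b, q):
    """RT distance over Z_q (q prime): weight of the entrywise difference mod q."""
    diff = [[(x - y) % q for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return rt_weight(diff)

print(rt_weight([[1, 0, 0], [0, 2, 0], [0, 0, 0]]))   # 3  (1 + 2 + 0)
print(rt_distance([[1, 1, 0]], [[1, 0, 0]], q=3))     # 2  (difference is [0, 1, 0])
```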