
    Fault-Tolerant Strassen-Like Matrix Multiplication

    In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplication. The proposed method uses two distinct Strassen-like algorithms instead of replicating a single one. We observe that when two different algorithms are used, new check relations arise, resulting in more local computations; these local computations are found using computer-aided search. To improve performance, two special parity (extra) sub-matrix multiplications (PSMMs) are generated at the expense of increased communication/computation cost. Our preliminary results demonstrate that the proposed method outperforms a Strassen-like algorithm with two copies and achieves performance very close to the three-copy version using only 2 PSMMs, reducing the total number of compute nodes by around 24%, i.e., from 21 to 16. Comment: 6 pages, 2 figures
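    The paper's two-algorithm construction is not reproduced here, but the notion of a "check relation" for matrix multiplication can be illustrated with the classical ABFT checksum scheme: augment A with a column-sum row and B with a row-sum column, so the product carries redundant rows and columns that a faulty sub-multiplication will violate. A minimal sketch (function names are illustrative, not from the paper):

    ```python
    import numpy as np

    def checksum_encode(A, B):
        """Append a column-sum row to A and a row-sum column to B (ABFT)."""
        Ac = np.vstack([A, A.sum(axis=0)])                 # (m+1) x k
        Br = np.hstack([B, B.sum(axis=1, keepdims=True)])  # k x (n+1)
        return Ac, Br

    def checks_hold(Cf):
        """Check relations: the last row/column of the full product
        Cf = Ac @ Br must equal the column/row sums of the data part."""
        C = Cf[:-1, :-1]
        return (np.allclose(Cf[-1, :-1], C.sum(axis=0)) and
                np.allclose(Cf[:-1, -1], C.sum(axis=1)))

    A = np.arange(6.0).reshape(2, 3)
    B = np.arange(12.0).reshape(3, 4)
    Ac, Br = checksum_encode(A, B)
    Cf = Ac @ Br            # in practice each block of Cf may come from
                            # a different (possibly faulty) compute node
    assert checks_hold(Cf)  # the fault-free product passes the checks
    Cf[0, 0] += 1.0         # inject a single fault
    assert not checks_hold(Cf)  # the check relations expose it
    ```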

    Exploitation of Stragglers in Coded Computation

    In cloud computing systems, slow processing nodes, often referred to as "stragglers", can significantly extend the computation time. Recent results have shown that error-correction coding can be used to reduce the effect of stragglers. In this work we introduce a scheme that, in addition to using error correction to distribute mixed jobs across nodes, is also able to exploit the work completed by all nodes, including stragglers. We first consider vector-matrix multiplication and apply maximum distance separable (MDS) codes to small blocks of sub-matrices. The worker nodes process blocks sequentially, transmitting partial per-block results to the master as they are completed. Sub-blocking yields a more continuous completion process, which allows us to exploit the work of a much broader spectrum of processors and reduces computation time. We then apply this technique to matrix-matrix multiplication using product codes. In this case, we show that the order in which sub-tasks are computed is a new degree of design freedom that can be exploited to reduce computation time further. We propose a novel approach to analyzing the finishing time, which differs from typical order statistics. Simulation results show that the expected computation time decreases by a factor of at least two compared to previous methods.
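    The MDS-coded vector-matrix step can be sketched as follows. This is a simplified illustration, not the paper's implementation: a real-valued Vandermonde generator stands in for an MDS code, and the block counts n, k and function names are assumptions. The master encodes k row-blocks of A into n coded blocks, each worker multiplies its coded block by x, and any k results suffice to recover A @ x:

    ```python
    import numpy as np

    def encode_blocks(A, n, k):
        """Encode k row-blocks of A into n coded blocks using a real-valued
        Vandermonde generator: any k coded results suffice to decode."""
        blocks = np.split(A, k)  # assumes A's row count is divisible by k
        G = np.vander(np.arange(1.0, n + 1), k, increasing=True)  # n x k
        coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(n)]
        return coded, G

    def decode(done, G, k):
        """done maps worker index -> its result coded[i] @ x; any k entries
        give an invertible Vandermonde subsystem to recover each A_j @ x."""
        idx = sorted(done)[:k]
        Y = np.stack([done[i] for i in idx])   # k x (rows per block)
        Z = np.linalg.solve(G[idx, :], Y)      # rows of Z are A_j @ x
        return Z.reshape(-1)

    n, k = 6, 4
    A = np.arange(32.0).reshape(8, 4)
    x = np.array([1.0, 2.0, 3.0, 4.0])
    coded, G = encode_blocks(A, n, k)
    # Workers 1 and 4 straggle; the master decodes from the other four.
    done = {i: coded[i] @ x for i in (0, 2, 3, 5)}
    assert np.allclose(decode(done, G, k), A @ x)
    ```

    The per-block variant described in the abstract applies this same code to many small sub-blocks, so each worker streams partial results back as individual blocks finish rather than after its entire share is done.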

    Hierarchical Coded Computation

    Coded computation is a method for mitigating "stragglers" in distributed computing systems through error-correction coding, and it has lately received significant attention. First used in vector-matrix multiplication, its range of application was later extended to matrix-matrix multiplication, heterogeneous networks, convolution, and approximate computing. A drawback of previous results is that they completely ignore the work completed by stragglers. While stragglers are slower compute nodes, in many settings the amount of work they complete can be non-negligible. Thus, in this work, we propose a hierarchical coded computation method that exploits the work completed by all compute nodes. We partition each node's computation into layers of sub-computations such that each layer can be treated as a (distinct) erasure channel. We then design different erasure codes for each layer so that all layers have the same failure exponent, and we propose design guidelines to optimize the parameters of such codes. Numerical results show that the proposed scheme improves the expected finishing time by a factor of 1.5 compared to previous work.
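    The layered idea can be sketched with a toy decodability check. This is a pure illustration under assumed parameters: the layer rates ks and the completion counts are hypothetical, and the paper's failure-exponent matching is not modeled. If layer l of every worker is protected by an (n, ks[l]) erasure code, that layer decodes as soon as ks[l] workers have finished it:

    ```python
    def layers_decodable(finished_layers, ks):
        """finished_layers[i] = number of layers worker i completed.
        Layer l (0-indexed, coded with an (n, ks[l]) MDS code) decodes
        iff at least ks[l] workers have finished it."""
        return [sum(f > l for f in finished_layers) >= k_l
                for l, k_l in enumerate(ks)]

    # 8 workers; later layers use a lower k since fewer workers reach them.
    finished = [3, 3, 3, 2, 2, 1, 1, 0]
    print(layers_decodable(finished, ks=[6, 4, 2]))  # → [True, True, True]
    ```

    Choosing smaller k for deeper layers is what lets stragglers contribute: a node that only finishes its first layer still feeds the high-rate code protecting that layer.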