
    A distributed algorithm to find k-dominating sets

    We consider a connected undirected graph $G(n,m)$ with $n$ nodes and $m$ edges. A $k$-dominating set $D$ in $G$ is a set of nodes having the property that every node in $G$ is at most $k$ edges away from at least one node in $D$. Finding a $k$-dominating set of minimum size is NP-hard. We give a new synchronous distributed algorithm to find a $k$-dominating set in $G$ of size no greater than $\lfloor n/(k+1)\rfloor$. Our algorithm requires $O(k\log^* n)$ time and $O(m\log k+n\log k\log^* n)$ messages to run. It has the same time complexity as the best currently known algorithm, but improves on that algorithm's message complexity and is, in addition, conceptually simpler. Comment: To appear in Discrete Applied Mathematics.
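    The definition of a $k$-dominating set can be checked with a multi-source breadth-first search. The sketch below is a centralized checker for the definition only, not the distributed algorithm of the paper; the function name `is_k_dominating` and the adjacency-dict representation are assumptions made for illustration.

```python
from collections import deque


def is_k_dominating(adj, dominators, k):
    """Return True if every node of the graph is within k edges of some dominator.

    adj: dict mapping each node to an iterable of its neighbours (undirected graph).
    dominators: candidate set D.
    """
    # Multi-source BFS: all dominators start at distance 0.
    dist = {v: 0 for v in dominators}
    queue = deque(dominators)
    while queue:
        v = queue.popleft()
        if dist[v] == k:
            continue  # do not expand beyond k edges
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    # D is k-dominating exactly when the bounded BFS reached every node.
    return len(dist) == len(adj)


# Example: a path on n = 7 nodes, 0 - 1 - 2 - 3 - 4 - 5 - 6, with k = 2.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
print(is_k_dominating(path, {1, 4}, 2))  # two nodes, matching floor(7 / 3) = 2
print(is_k_dominating(path, {3}, 2))     # the middle node alone misses both ends
```

    On this path, the set {1, 4} meets the size bound $\lfloor n/(k+1)\rfloor = 2$ for $n = 7$, $k = 2$.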

    A Scalable Asynchronous Distributed Algorithm for Topic Modeling

    Learning meaningful topic models with massive document collections which contain millions of documents and billions of tokens is challenging for two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper we present a novel algorithm, F+Nomad LDA, which simultaneously tackles both these problems. In order to handle a large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over $T$ items in $O(\log T)$ time. Moreover, when topic counts change, the data structure can be updated in $O(\log T)$ time. In order to distribute the computation across multiple processors we present a novel asynchronous framework inspired by the Nomad algorithm of \cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperforms the state of the art on massive problems which involve millions of documents, billions of words, and thousands of topics.
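    The $O(\log T)$ sampling and update costs can be illustrated with a standard Fenwick (binary indexed) tree. This is a minimal sketch of the textbook structure, not the modified tree used by F+Nomad LDA; the class name `FenwickSampler` and its methods are hypothetical, and at least one item is assumed.

```python
import random


class FenwickSampler:
    """Standard Fenwick tree over T non-negative weights (assumes T >= 1)."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)  # 1-indexed internal array
        for i, w in enumerate(weights):
            self.add(i, w)

    def add(self, i, delta):
        """Add delta to the weight of item i (0-based) in O(log T)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def total(self):
        """Sum of all weights in O(log T)."""
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def find(self, u):
        """Smallest 0-based index whose cumulative weight exceeds u, in O(log T)."""
        pos = 0
        step = 1 << (self.n.bit_length() - 1)  # largest power of two <= n
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= u:
                pos = nxt
                u -= self.tree[nxt]
            step >>= 1
        return min(pos, self.n - 1)  # clamp guards against floating-point edge cases

    def sample(self, rng=random):
        """Draw an index with probability proportional to its weight."""
        return self.find(rng.random() * self.total())


sampler = FenwickSampler([1, 2, 3, 4])  # e.g. unnormalized topic weights
sampler.add(0, 9)                       # a count changes: weights become [10, 2, 3, 4]
print(sampler.sample(random.Random(0)))
```

    Both `add` and `find` touch one tree entry per bit of the index, which is where the logarithmic cost comes from.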

    An Improved Distributed Algorithm for Maximal Independent Set

    The Maximal Independent Set (MIS) problem is one of the basics in the study of locality in distributed graph algorithms. This paper presents an extremely simple randomized algorithm providing a near-optimal local complexity for this problem, which incidentally, when combined with some recent techniques, also leads to a near-optimal global complexity. Classical algorithms of Luby [STOC'85] and Alon, Babai and Itai [JALG'86] provide the global complexity guarantee that, with high probability, all nodes terminate after $O(\log n)$ rounds. In contrast, our initial focus is on the local complexity, and our main contribution is to provide a very simple algorithm guaranteeing that each particular node $v$ terminates after $O(\log \mathsf{deg}(v)+\log 1/\epsilon)$ rounds, with probability at least $1-\epsilon$. The guarantee holds even if the randomness outside the $2$-hop neighborhood of $v$ is determined adversarially. This degree-dependency is optimal, due to a lower bound of Kuhn, Moscibroda, and Wattenhofer [PODC'04]. Interestingly, this local complexity smoothly transitions to a global complexity: by adding techniques of Barenboim, Elkin, Pettie, and Schneider [FOCS'12, arXiv: 1202.1983v3], we get a randomized MIS algorithm with a high probability global complexity of $O(\log \Delta) + 2^{O(\sqrt{\log \log n})}$, where $\Delta$ denotes the maximum degree. This improves over the $O(\log^2 \Delta) + 2^{O(\sqrt{\log \log n})}$ result of Barenboim et al., and gets close to the $\Omega(\min\{\log \Delta, \sqrt{\log n}\})$ lower bound of Kuhn et al. Corollaries include improved algorithms for MIS in graphs of upper-bounded arboricity, or lower-bounded girth, for Ruling Sets, for MIS in the Local Computation Algorithms (LCA) model, and a faster distributed algorithm for the Lovász Local Lemma.
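    For context, the classical round-based approach that the abstract compares against can be simulated sequentially. The sketch below is a Luby-style random-priority variant, not the algorithm contributed by this paper; the function name `luby_mis`, the adjacency-dict input, and the tie-breaking by node id are assumptions for illustration.

```python
import random


def luby_mis(adj, seed=0):
    """Simulate a Luby-style randomized MIS computation round by round.

    adj: dict mapping each node to an iterable of neighbours (undirected graph,
    node ids must be mutually comparable). Returns (mis, rounds).
    """
    rng = random.Random(seed)
    active = set(adj)
    mis = set()
    rounds = 0
    while active:
        rounds += 1
        # Each active node draws a fresh random priority; the id breaks ties.
        prio = {v: (rng.random(), v) for v in active}
        # A node joins the MIS if it beats all of its still-active neighbours.
        winners = {
            v for v in active
            if all(prio[v] < prio[u] for u in adj[v] if u in active)
        }
        mis |= winners
        # Winners and their neighbours leave the computation.
        removed = set(winners)
        for v in winners:
            removed.update(u for u in adj[v] if u in active)
        active -= removed
    return mis, rounds


# Example: a cycle on 5 nodes.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
mis, rounds = luby_mis(cycle)
print(sorted(mis), rounds)
```

    Two adjacent nodes can never both win a round, so the output is independent, and a node only leaves when it or one of its neighbours has joined, so the output is maximal.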

    A Distributed Algorithm for Directed Minimum-Weight Spanning Tree


    Computational Limits of A Distributed Algorithm For Smoothing Spline

    In this paper, we explore the statistical versus computational trade-off to address a basic question in the application of a distributed algorithm: what is the minimal computational cost in obtaining statistical optimality? In the smoothing spline setup, we observe a phase transition phenomenon for the number of deployed machines, which ends up being a simple proxy for computing cost. Specifically, a sharp upper bound for the number of machines is established: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible. These sharp bounds partly capture intrinsic computational limits of the distributed algorithm considered in this paper, and turn out to be fully determined by the smoothness of the regression function. As a side remark, we argue that sample splitting may be viewed as an alternative form of regularization, playing a similar role to the smoothing parameter. Comment: To appear in Journal of Machine Learning Research.