A distributed algorithm to find k-dominating sets
We consider a connected undirected graph $G$ with $n$ nodes and $m$ edges. A $k$-dominating set $D$ in $G$ is a set of nodes with the property
that every node in $G$ is at most $k$ edges away from at least one node in $D$.
Finding a $k$-dominating set of minimum size is NP-hard. We give a new
synchronous distributed algorithm to find a $k$-dominating set in $G$ of size
no greater than $\lceil n/(k+1) \rceil$. Our algorithm requires $O(k \log^* n)$
time to run. It has the same time
complexity as the best currently known algorithm, but improves on that
algorithm's message complexity and is, in addition, conceptually simpler.
Comment: To appear in Discrete Applied Mathematics
A Scalable Asynchronous Distributed Algorithm for Topic Modeling
Learning meaningful topic models from massive document collections containing
millions of documents and billions of tokens is challenging for two reasons.
First, one needs to deal with a large number of topics (typically
on the order of thousands). Second, one needs a scalable and efficient way of
distributing the computation across multiple machines. In this paper we present
a novel algorithm, F+Nomad LDA, which simultaneously tackles both problems.
To handle a large number of topics we use an appropriately modified
Fenwick tree. This data structure allows us to sample from a multinomial
distribution over $K$ items in $O(\log K)$ time. Moreover, when topic counts
change, the data structure can be updated in $O(\log K)$ time. To
distribute the computation across multiple processors, we present a novel
asynchronous framework inspired by the Nomad algorithm of
\cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperforms the
state-of-the-art on massive problems involving millions of documents,
billions of words, and thousands of topics.
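The Fenwick-tree trick the abstract relies on is standard and can be sketched: store the (unnormalized) topic counts in the tree, so that both updating a count and drawing a topic proportionally to its count take $O(\log K)$. This is a generic sketch under my own naming, not the paper's implementation:

```python
import random

class FenwickSampler:
    """Sample indices proportionally to mutable non-negative counts.

    Both update() and sample() run in O(log K), the property the F+Nomad LDA
    abstract exploits. Class and method names are illustrative.
    """

    def __init__(self, k):
        self.n = 1
        while self.n < k:
            self.n <<= 1          # power-of-two size simplifies the descent
        self.tree = [0.0] * (self.n + 1)
        self.total = 0.0

    def update(self, i, delta):
        """Add delta to the count of item i (0-based)."""
        self.total += delta
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def sample(self, rng=random):
        """Draw an index i with probability count[i] / total."""
        u = rng.random() * self.total
        pos = 0
        step = self.n
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] < u:
                u -= self.tree[nxt]   # skip the whole subtree's mass
                pos = nxt
            step >>= 1
        return pos  # 0-based index of the sampled item
```

The `sample` loop descends the implicit tree instead of binary-searching a rebuilt prefix-sum array, which is what makes count updates cheap.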
An Improved Distributed Algorithm for Maximal Independent Set
The Maximal Independent Set (MIS) problem is one of the basics in the study
of locality in distributed graph algorithms. This paper presents an extremely
simple randomized algorithm providing a near-optimal local complexity for this
problem, which incidentally, when combined with some recent techniques, also
leads to a near-optimal global complexity.
Classical algorithms of Luby [STOC'85] and Alon, Babai and Itai [JALG'86]
provide the global complexity guarantee that, with high probability, all nodes
terminate after $O(\log n)$ rounds. In contrast, our initial focus is on the
local complexity, and our main contribution is to provide a very simple
algorithm guaranteeing that each particular node $v$ terminates after
$O(\log \deg(v) + \log 1/\epsilon)$ rounds, with probability at least
$1-\epsilon$. The guarantee holds even if the randomness outside the $2$-hop
neighborhood of $v$ is determined adversarially. This degree-dependency is
optimal, due to a lower bound of Kuhn, Moscibroda, and Wattenhofer [PODC'04].
Interestingly, this local complexity smoothly transitions to a global
complexity: by adding techniques of Barenboim, Elkin, Pettie, and Schneider
[FOCS'12, arXiv:1202.1983v3], we get a randomized MIS algorithm with a high
probability global complexity of $O(\log \Delta) + 2^{O(\sqrt{\log \log n})}$,
where $\Delta$ denotes the maximum degree. This improves over the result of
Barenboim et al., and gets close to the lower bound of Kuhn et al.
Corollaries include improved algorithms for MIS in graphs of upper-bounded
arboricity or lower-bounded girth, for Ruling Sets, for MIS in the Local
Computation Algorithms (LCA) model, and a faster distributed algorithm for the
Lovász Local Lemma.
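For contrast with the paper's local guarantee, the classical Luby-style scheme the abstract cites is simple enough to simulate sequentially: in each synchronous round every live node draws a random value, local minima join the MIS, and they are removed together with their neighbours. A sketch of that baseline (the paper's own algorithm differs and is not shown here):

```python
import random

def luby_mis(adj, rng=random):
    """Simulate synchronous Luby-style randomized MIS rounds.

    adj: dict node -> set of neighbours (undirected graph).
    Returns a maximal independent set. With high probability the simulation
    finishes after O(log n) rounds, matching the classical guarantee.
    """
    live = {v: set(adj[v]) for v in adj}  # remaining graph, mutated per round
    mis = set()
    while live:
        r = {v: rng.random() for v in live}
        # A node joins the MIS if its value beats all live neighbours'.
        winners = {v for v in live if all(r[v] < r[u] for u in live[v])}
        mis |= winners
        # Winners and their neighbours leave the graph together.
        dead = winners | {u for v in winners for u in live[v]}
        for v in dead:
            live.pop(v)
        for v in live:
            live[v] -= dead
    return mis
```

Independence holds because two adjacent nodes cannot both be local minima in the same round; maximality holds because every removed node is in the MIS or adjacent to it.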
Computational Limits of A Distributed Algorithm For Smoothing Spline
In this paper, we explore the statistical versus computational trade-off to
address a basic question in the application of a distributed algorithm: what is
the minimal computational cost of attaining statistical optimality? In the
smoothing spline setup, we observe a phase-transition phenomenon in the number
of deployed machines, which ends up being a simple proxy for computing cost.
Specifically, a sharp upper bound on the number of machines is established:
when the number is below this bound, statistical optimality (in terms of
nonparametric estimation or testing) is achievable; otherwise, statistical
optimality becomes impossible. These sharp bounds partly capture the intrinsic
computational limits of the distributed algorithm considered in this paper, and
turn out to be fully determined by the smoothness of the regression function.
As a side remark, we argue that sample splitting may be viewed as an
alternative form of regularization, playing a role similar to that of the
smoothing parameter.
Comment: To appear in Journal of Machine Learning Research
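The divide-and-conquer setup the abstract studies can be sketched numerically: shard the $n$ samples over $s$ "machines", fit a smoother on each shard, and average the $s$ local estimates. Here a polynomial least-squares fit is a stand-in for the smoothing spline, and all names are illustrative; the point is only that the averaged estimator degrades once shards become too small:

```python
import numpy as np

def distributed_polyfit(x, y, s, deg=5):
    """Divide-and-conquer sketch: shard (x, y) over s 'machines', fit a
    degree-deg polynomial on each shard, and average the coefficients
    (averaging coefficients equals averaging the fitted curves)."""
    coefs = [np.polyfit(x[i::s], y[i::s], deg) for i in range(s)]
    return np.mean(coefs, axis=0)

rng = np.random.default_rng(0)
n = 600
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

grid = np.linspace(0, 1, 101)
truth = np.sin(2 * np.pi * grid)

def mse(s):
    """Squared error of the s-machine estimate against the true function."""
    fit = np.polyval(distributed_polyfit(x, y, s), grid)
    return float(np.mean((fit - truth) ** 2))
```

With a handful of machines each shard still supports a stable fit; with 100 machines each shard has only 6 points, the local fits interpolate noise, and averaging no longer rescues the estimate, loosely mirroring the phase transition described above.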