81,872 research outputs found
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F , in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows to model natural problems arising in the Web and in social media
applications. It also allows to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm
Sample-And-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model
Motivated by recent progress on symmetry breaking problems such as maximal independent set (MIS) and maximal matching in the low-memory Massively Parallel Computation (MPC) model (e.g., Behnezhad et al. PODC 2019; Ghaffari-Uitto SODA 2019), we investigate the complexity of ruling set problems in this model. The MPC model has become very popular as a model for large-scale distributed computing and it comes with the constraint that the memory-per-machine is strongly sublinear in the input size. For graph problems, extremely fast MPC algorithms have been designed assuming ??(n) memory-per-machine, where n is the number of nodes in the graph (e.g., the O(log log n) MIS algorithm of Ghaffari et al., PODC 2018). However, it has proven much more difficult to design fast MPC algorithms for graph problems in the low-memory MPC model, where the memory-per-machine is restricted to being strongly sublinear in the number of nodes, i.e., O(n^?) for constant 0 < ? < 1.
In this paper, we present an algorithm for the 2-ruling set problem, running in O?(log^{1/6} ?) rounds whp, in the low-memory MPC model. Here ? is the maximum degree of the graph. We then extend this result to ?-ruling sets for any integer ? > 1. Specifically, we show that a ?-ruling set can be computed in the low-memory MPC model with O(n^?) memory-per-machine in O?(? ? log^{1/(2^{?+1}-2)} ?) rounds, whp. From this it immediately follows that a ?-ruling set for ? = ?(log log log ?)-ruling set can be computed in in just O(? log log n) rounds whp. The above results assume a total memory of O?(m + n^{1+?}). We also present algorithms for ?-ruling sets in the low-memory MPC model assuming that the total memory over all machines is restricted to O?(m). For ? > 1, these algorithms are all substantially faster than the Ghaffari-Uitto O?(?{log ?})-round MIS algorithm in the low-memory MPC model.
All our results follow from a Sample-and-Gather Simulation Theorem that shows how random-sampling-based Congest algorithms can be efficiently simulated in the low-memory MPC model. We expect this simulation theorem to be of independent interest beyond the ruling set algorithms derived here
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Fast Local Computation Algorithms
For input , let denote the set of outputs that are the "legal"
answers for a computational problem . Suppose and members of are
so large that there is not time to read them in their entirety. We propose a
model of {\em local computation algorithms} which for a given input ,
support queries by a user to values of specified locations in a legal
output . When more than one legal output exists for a given
, the local computation algorithm should output in a way that is consistent
with at least one such . Local computation algorithms are intended to
distill the common features of several concepts that have appeared in various
algorithmic subfields, including local distributed computation, local
algorithms, locally decodable codes, and local reconstruction.
We develop a technique, based on known constructions of small sample spaces
of -wise independent random variables and Beck's analysis in his algorithmic
approach to the Lov{\'{a}}sz Local Lemma, which under certain conditions can be
applied to construct local computation algorithms that run in {\em
polylogarithmic} time and space. We apply this technique to maximal independent
set computations, scheduling radio network broadcasts, hypergraph coloring and
satisfying -SAT formulas.Comment: A preliminary version of this paper appeared in ICS 2011, pp. 223-23
Distributed Symmetry Breaking in Hypergraphs
Fundamental local symmetry breaking problems such as Maximal Independent Set
(MIS) and coloring have been recognized as important by the community, and
studied extensively in (standard) graphs. In particular, fast (i.e.,
logarithmic run time) randomized algorithms are well-established for MIS and
-coloring in both the LOCAL and CONGEST distributed computing
models. On the other hand, comparatively much less is known on the complexity
of distributed symmetry breaking in {\em hypergraphs}. In particular, a key
question is whether a fast (randomized) algorithm for MIS exists for
hypergraphs.
In this paper, we study the distributed complexity of symmetry breaking in
hypergraphs by presenting distributed randomized algorithms for a variety of
fundamental problems under a natural distributed computing model for
hypergraphs. We first show that MIS in hypergraphs (of arbitrary dimension) can
be solved in rounds ( is the number of nodes of the
hypergraph) in the LOCAL model. We then present a key result of this paper ---
an -round hypergraph MIS algorithm in
the CONGEST model where is the maximum node degree of the hypergraph
and is any arbitrarily small constant.
To demonstrate the usefulness of hypergraph MIS, we present applications of
our hypergraph algorithm to solving problems in (standard) graphs. In
particular, the hypergraph MIS yields fast distributed algorithms for the {\em
balanced minimal dominating set} problem (left open in Harris et al. [ICALP
2013]) and the {\em minimal connected dominating set problem}. We also present
distributed algorithms for coloring, maximal matching, and maximal clique in
hypergraphs.Comment: Changes from the previous version: More references adde
- …