112,516 research outputs found
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F , in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows to model natural problems arising in the Web and in social media
applications. It also allows to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm
Graph sparsification for derandomizing massively parallel computation with low space
Massively Parallel Computation (MPC) is an emerging model which distills core aspects of distributed and parallel computation. It was developed as a tool to solve (typically graph) problems in systems where input is distributed over many machines with limited space. Recent work has focused on the regime in which machines have sublinear (in n, number of nodes in the input graph) space, with randomized algorithms presented for the fundamental problems of Maximal Matching and Maximal Independent Set. There are, however, no prior corresponding deterministic algorithms.
A major challenge in the sublinear space setting is that the local space of each machine may be too small to store all the edges incident to a single node. To overcome this barrier we introduce a new graph sparsification technique that deterministically computes a low-degree subgraph with additional desired properties: degrees in the subgraph are sufficiently small that nodes’ neighborhoods can be stored on single machines, and solving the problem on the subgraph provides significant global progress towards solving the problem for the original input graph.
Using this framework to derandomize the well-known randomized algorithm of Luby [SICOMP’86], we obtain O(log(\Delta) + loglog(n))- round deterministic MPC algorithms for solving the fundamental problems of Maximal Matching and Maximal Independent Set with O(n epsilon) space on each machine for any constant epsilon > 0. Based on the recent work of Ghaffari et al. [FOCS’18], this additive O(loglog(n)) factor is conditionally essential. These algorithms can also be shown
to run in O(log(\Delta)) rounds in the closely related model of CONGESTED CLIQUE, improving upon the state-of-the-art bound of O(log^2(\Delta)) rounds by Censor-Hillel et al. [DISC’17]
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
On the Enumeration of all Minimal Triangulations
We present an algorithm that enumerates all the minimal triangulations of a
graph in incremental polynomial time. Consequently, we get an algorithm for
enumerating all the proper tree decompositions, in incremental polynomial time,
where "proper" means that the tree decomposition cannot be improved by removing
or splitting a bag
Distributed and Parallel Algorithms for Set Cover Problems with Small Neighborhood Covers
In this paper, we study a class of set cover problems that satisfy a special
property which we call the {\em small neighborhood cover} property. This class
encompasses several well-studied problems including vertex cover, interval
cover, bag interval cover and tree cover. We design unified distributed and
parallel algorithms that can handle any set cover problem falling under the
above framework and yield constant factor approximations. These algorithms run
in polylogarithmic communication rounds in the distributed setting and are in
NC, in the parallel setting.Comment: Full version of FSTTCS'13 pape
Fast Local Computation Algorithms
For input , let denote the set of outputs that are the "legal"
answers for a computational problem . Suppose and members of are
so large that there is not time to read them in their entirety. We propose a
model of {\em local computation algorithms} which for a given input ,
support queries by a user to values of specified locations in a legal
output . When more than one legal output exists for a given
, the local computation algorithm should output in a way that is consistent
with at least one such . Local computation algorithms are intended to
distill the common features of several concepts that have appeared in various
algorithmic subfields, including local distributed computation, local
algorithms, locally decodable codes, and local reconstruction.
We develop a technique, based on known constructions of small sample spaces
of -wise independent random variables and Beck's analysis in his algorithmic
approach to the Lov{\'{a}}sz Local Lemma, which under certain conditions can be
applied to construct local computation algorithms that run in {\em
polylogarithmic} time and space. We apply this technique to maximal independent
set computations, scheduling radio network broadcasts, hypergraph coloring and
satisfying -SAT formulas.Comment: A preliminary version of this paper appeared in ICS 2011, pp. 223-23
- …