Average Sensitivity of Graph Algorithms
In modern applications of graph algorithms, where the graphs of interest are
large and dynamic, it is unrealistic to assume that an input representation
contains the full information of a graph being studied. Hence, it is desirable
to use algorithms that, even when only a (large) subgraph is available, output
solutions that are close to the solutions output when the whole graph is
available. We formalize this idea by introducing the notion of average
sensitivity of graph algorithms, which is the average earth mover's distance
between the output distributions of an algorithm on a graph and its subgraph
obtained by removing an edge, where the average is over the edges removed and
the distance between two outputs is the Hamming distance.
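As a minimal sketch of the definition above, assume the algorithm is deterministic and outputs an edge set. The earth mover's distance between two point distributions then reduces to the Hamming distance, taken here as the size of the symmetric difference. The names `spanning_forest` and `average_sensitivity` are hypothetical and do not come from the paper, and the forest routine is a plain union-find scan, not the paper's low-sensitivity algorithm.

```python
def spanning_forest(n, edges):
    """Deterministic spanning forest: scan edges in sorted order with union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = set()
    for u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            forest.add((u, v))
    return forest


def average_sensitivity(algorithm, n, edges):
    """Average, over removed edges e, of |A(G) symmetric-difference A(G - e)|.

    Assumes `algorithm` is deterministic and returns a set of edges.
    """
    edges = list(edges)
    base = algorithm(n, edges)
    total = 0
    for e in edges:
        reduced = [f for f in edges if f != e]
        total += len(base ^ algorithm(n, reduced))
    return total / len(edges)


# Usage: a 4-cycle, with each edge written as (smaller, larger).
cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]
print(average_sensitivity(spanning_forest, 4, cycle))  # 1.5
```

On the 4-cycle, three of the four removals change two edges of the output and the fourth changes none, so the average is 6/4 = 1.5.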
In this work, we initiate a systematic study of average sensitivity. After
deriving basic properties of average sensitivity such as composition, we
provide efficient approximation algorithms with low average sensitivities for
concrete graph problems, including the minimum spanning forest problem, the
global minimum cut problem, the minimum $s$-$t$ cut problem, and the maximum
matching problem. In addition, we prove that the average sensitivity of our
global minimum cut algorithm is almost optimal, by showing a nearly matching
lower bound. We also show that every algorithm for the 2-coloring problem has
average sensitivity linear in the number of vertices. One of the main ideas
involved in designing our algorithms with low average sensitivity is the
following fact: if the presence of a vertex or an edge in the solution output
by an algorithm can be decided locally, then the algorithm has a low average
sensitivity, allowing us to reuse the analyses of known sublinear-time
algorithms and local computation algorithms (LCAs). Using this connection, we
show that every LCA for 2-coloring has linear query complexity, thereby
answering an open question.
Comment: 39 pages, 1 figure
A Simple Deterministic Distributed MST Algorithm, with Near-Optimal Time and Message Complexities
The distributed minimum spanning tree (MST) problem is one of the most central
and fundamental problems in distributed graph algorithms. Garay et al.
\cite{GKP98,KP98} devised an algorithm with running time $O(D + \sqrt{n} \cdot \log^* n)$, where $D$ is the hop-diameter of the input $n$-vertex $m$-edge
graph, and with message complexity $O(m + n^{3/2})$. Peleg and Rubinovich
\cite{PR99} showed that the running time of the algorithm of \cite{KP98} is
essentially tight, and asked if one can achieve near-optimal running time
**together with near-optimal message complexity**.
In a recent breakthrough, Pandurangan et al. \cite{PRS16} answered this
question in the affirmative, and devised a **randomized** algorithm with time
$\tilde{O}(D + \sqrt{n})$ and message complexity $\tilde{O}(m)$. They asked if
such a simultaneous time- and message-optimality can be achieved by a
**deterministic** algorithm.
In this paper, building upon the work of \cite{PRS16}, we answer this
question in the affirmative, and devise a **deterministic** algorithm that
computes an MST in time $O((D + \sqrt{n}) \cdot \log n)$, using $O(m \cdot \log n + n \log n \cdot \log^* n)$ messages. The polylogarithmic factors in the time
and message complexities of our algorithm are significantly smaller than the
respective factors in the result of \cite{PRS16}. Also, our algorithm and its
analysis are very **simple** and self-contained, as opposed to the rather
complicated previous sublinear-time algorithms \cite{GKP98,KP98,E04b,PRS16}.
Almost-Smooth Histograms and Sliding-Window Graph Algorithms
We study algorithms for the sliding-window model, an important variant of the
data-stream model, in which the goal is to compute some function of a
fixed-length suffix of the stream. We extend the smooth-histogram framework of
Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, which includes
all subadditive functions. Specifically, we show that if a subadditive function
can be $(1+\epsilon)$-approximated in the insertion-only streaming model, then
it can be $(2+\epsilon)$-approximated also in the sliding-window model with
space complexity larger by factor $O(\epsilon^{-1} \log w)$, where $w$ is the
window size.
We demonstrate how our framework yields new approximation algorithms with
relatively little effort for a variety of problems that do not admit the
smooth-histogram technique. For example, in the frequency-vector model, a
symmetric norm is subadditive and thus we obtain a sliding-window
$(2+\epsilon)$-approximation algorithm for it. Another example is for streaming
matrices, where we derive a new sliding-window
$(\sqrt{2}+\epsilon)$-approximation algorithm for Schatten $4$-norm. We then
consider graph streams and show that many graph problems are subadditive,
including maximum submodular matching, minimum vertex-cover, and maximum
$k$-cover, thereby deriving sliding-window $O(1)$-approximation algorithms for
them almost for free (using known insertion-only algorithms). Finally, we
design for every $d \in (1,2]$ an artificial function, based on the
maximum-matching size, whose almost-smoothness parameter is exactly $d$.
Parallel Algorithms for Geometric Graph Problems
We give algorithms for geometric graph problems in the modern parallel models
inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem
over a set of points in the two-dimensional space, our algorithm computes a
$(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of
rounds of communication, while using total space and communication proportional
to the size of the data (linear space and near linear time algorithms). In
contrast, for general graphs, achieving the same result for MST (or even
connectivity) remains a challenging open problem, despite drawing significant
attention in recent years.
We develop a general algorithmic framework that, besides MST, also applies to
Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic
framework has implications beyond the MapReduce model. For example, it yields a
new algorithm for computing EMD cost in the plane in near-linear time,
$n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal
developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD,
our algorithm is fundamentally different, and, for example, also solves the
transportation (cost) problem, raised as an open question in their work.
Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation
algorithm with $n^{\delta}$ space in the streaming-with-sorting model with
$1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the
parallel models may also constitute a concrete playground in the quest for
efficient algorithms for EMD (and other similar problems) in the vanilla
streaming model, a well-known open problem.
Large induced subgraphs via triangulations and CMSO
We obtain an algorithmic meta-theorem for the following optimization problem.
Let \phi\ be a Counting Monadic Second Order Logic (CMSO) formula and t be an
integer. For a given graph G, the task is to maximize |X| subject to the
following: there is a set of vertices F of G, containing X, such that the
subgraph G[F] induced by F is of treewidth at most t, and structure (G[F],X)
models \phi.
Some special cases of this optimization problem are the following generic
examples. Each of these cases contains various problems as a special subcase:
1) "Maximum induced subgraph with at most l copies of cycles of length 0
modulo m", where for fixed nonnegative integers m and l, the task is to find a
maximum induced subgraph of a given graph with at most l vertex-disjoint cycles
of length 0 modulo m.
2) "Minimum \Gamma-deletion", where for a fixed finite set of graphs \Gamma\
containing a planar graph, the task is to find a maximum induced subgraph of a
given graph containing no graph from \Gamma\ as a minor.
3) "Independent \Pi-packing", where for a fixed finite set of connected
graphs \Pi, the task is to find an induced subgraph G[F] of a given graph G
with the maximum number of connected components, such that each connected
component of G[F] is isomorphic to some graph from \Pi.
We give an algorithm solving the optimization problem on an n-vertex graph G
in time O(#pmc n^{t+4} f(t,\phi)), where #pmc is the number of all potential
maximal cliques in G and f is a function depending on t and \phi\ only. We also
show how a similar running time can be obtained for the weighted version of the
problem. Pipelined with known bounds on the number of potential maximal
cliques, we deduce that our optimization problem can be solved in time
O(1.7347^n) for arbitrary graphs, and in polynomial time for graph classes with
a polynomial number of minimal separators.
Instance and Output Optimal Parallel Algorithms for Acyclic Joins
Massively parallel join algorithms have received much attention in recent
years, but most prior work has focused on worst-case optimal algorithms. However,
the worst-case optimality of these join algorithms relies on hard instances
having very large output sizes, which rarely appear in practice. A stronger
notion of optimality is {\em output-optimal}, which requires an algorithm to be
optimal within the class of all instances sharing the same input and output
size. An even stronger optimality is {\em instance-optimal}, i.e., the
algorithm is optimal on every single instance, but this may not always be
achievable.
In the traditional RAM model of computation, the classical Yannakakis
algorithm is instance-optimal on any acyclic join. But in the massively
parallel computation (MPC) model, the situation becomes much more complicated.
We first show that for the class of r-hierarchical joins, instance-optimality
can still be achieved in the MPC model. Then, we give a new MPC algorithm for
an arbitrary acyclic join with load O ({\IN \over p} + {\sqrt{\IN \cdot \OUT}
\over p}), where \IN,\OUT are the input and output sizes of the join, and $p$
is the number of servers in the MPC model. This improves the MPC version of
the Yannakakis algorithm by an O (\sqrt{\OUT \over \IN} ) factor.
Furthermore, we show that this is output-optimal when \OUT = O(p \cdot \IN),
for every acyclic but non-r-hierarchical join. Finally, we give the first
output-sensitive lower bound for the triangle join in the MPC model, showing
that it is inherently more difficult than acyclic joins.
Bicriteria Network Design Problems
We study a general class of bicriteria network design problems. A generic
problem in this class is as follows: Given an undirected graph and two
minimization objectives (under different cost functions), with a budget
specified on the first, find a subgraph from a given subgraph-class that
minimizes the second objective subject to the budget on the first. We consider
three different criteria - the total edge cost, the diameter and the maximum
degree of the network. Here, we present the first polynomial-time approximation
algorithms for a large class of bicriteria network design problems for the
above mentioned criteria. The following general types of results are presented.
First, we develop a framework for bicriteria problems and their
approximations. Second, when the two criteria are the same (note that the cost
functions continue to be different), we present a ``black box'' parametric
search technique. This black box takes in as input an (approximation) algorithm
for the unicriterion situation and generates an approximation algorithm for the
bicriteria case with only a constant factor loss in the performance guarantee.
Third, when the two criteria are the diameter and the total edge cost, we use a
cluster-based approach to devise approximation algorithms; the solutions
output violate both criteria by a logarithmic factor. Finally, for the
class of treewidth-bounded graphs, we provide pseudopolynomial-time algorithms
for a number of bicriteria problems using dynamic programming. We show how
these pseudopolynomial-time algorithms can be converted to fully
polynomial-time approximation schemes using a scaling technique.
Comment: 24 pages, 1 figure
Forest resampling for distributed sequential Monte Carlo
This paper brings explicit considerations of distributed computing
architectures and data structures into the rigorous design of Sequential Monte
Carlo (SMC) methods. A theoretical result established recently by the authors
shows that adapting interaction between particles to suitably control the
Effective Sample Size (ESS) is sufficient to guarantee stability of SMC
algorithms. Our objective is to leverage this result and devise algorithms
which are thus guaranteed to work well in a distributed setting. We make three
main contributions to achieve this. Firstly, we study mathematical properties
of the ESS as a function of matrices and graphs that parameterize the
interaction amongst particles. Secondly, we show how these graphs can be
induced by tree data structures which model the logical network topology of an
abstract distributed computing environment. Thirdly, we present efficient
distributed algorithms that achieve the desired ESS control, perform resampling
and operate on forests associated with these trees.
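For orientation, here is a minimal sketch of the conventional Effective Sample Size of a single vector of nonnegative particle weights, $(\sum_i w_i)^2 / \sum_i w_i^2$. This is only the textbook quantity and is an assumption about what the abstract generalizes: it does not reproduce the paper's treatment of the ESS as a function of interaction matrices and graphs. The function name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(weights):
    """Conventional ESS of nonnegative weights: (sum w)^2 / sum(w^2).

    Ranges from 1 (all mass on one particle) to len(weights) (equal weights).
    Illustrative only; not the generalized ESS studied in the paper.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return total * total / sum(w * w for w in weights)


# Usage: equal weights give the full sample size; a degenerate vector gives 1.
print(effective_sample_size([1, 1, 1, 1]))  # 4.0
print(effective_sample_size([1, 0, 0, 0]))  # 1.0
```

A resampling scheme would typically compare this value against a threshold to decide when particles need to interact; the threshold and the interaction pattern are choices left open here.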