Tight Space-Approximation Tradeoff for the Multi-Pass Streaming Set Cover Problem
We study the classic set cover problem in the streaming model: the sets that
comprise the instance are revealed one by one in a stream, and the goal is to
solve the problem by making one or a few passes over the stream while maintaining
space sublinear in the input size; here $m$ denotes the number of the
sets and $n$ is the universe size. Notice that in this model, we are mainly
concerned with the space requirement of the algorithms and hence do not
restrict their computation time.
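To make the streaming interface concrete, here is a minimal single-pass sketch in Python; the thresholding rule and all names are hypothetical illustrations of the model, not the algorithms or bounds discussed in this abstract.

    # Toy single-pass heuristic for streaming set cover: the sets arrive one by
    # one, and the algorithm keeps only O(n) bits for the uncovered universe
    # plus the indices of the sets it decides to keep. Illustration only.

    def one_pass_cover(stream_of_sets, universe_size, keep_threshold):
        uncovered = set(range(universe_size))   # n bits of state
        chosen = []                             # indices of sets kept for the cover
        for index, s in enumerate(stream_of_sets):
            newly_covered = uncovered & s
            # Hypothetical rule: keep a set only if it covers enough new elements.
            if len(newly_covered) >= keep_threshold:
                chosen.append(index)
                uncovered -= newly_covered
        return chosen, uncovered    # may leave elements uncovered: one pass is lossy

    # Example: m = 4 sets over a universe of size n = 6.
    sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
    print(one_pass_cover(iter(sets), universe_size=6, keep_threshold=2))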
Our main result is a resolution of the space-approximation tradeoff for the
streaming set cover problem: we show that any $\alpha$-approximation algorithm
for the set cover problem requires $\widetilde{\Omega}(mn^{1/\alpha})$ space,
even if it is allowed polylog$(n)$ passes over the stream, and even if the
sets are arriving in a random order in the stream. This space-approximation
tradeoff matches the best known bounds achieved by the recent algorithm of
Har-Peled et al. (PODS 2016) that requires only $O(\alpha)$ passes over the
stream in an adversarial order, hence settling the space complexity of
approximating the set cover problem in data streams in a quite robust manner.
Additionally, our approach yields tight lower bounds for the space complexity
of $(1-\epsilon)$-approximating the streaming maximum coverage problem studied
in several recent works.
Simple Round Compression for Parallel Vertex Cover
Recently, Czumaj et al. (arXiv 2017) presented a parallel (almost)
$2$-approximation algorithm for the maximum matching problem in only
$O((\log\log n)^2)$ rounds of the massively parallel computation (MPC)
framework, when the memory per machine is $O(n)$. The main approach in their
work is a way of compressing $O(\log n)$ rounds of a distributed algorithm for
maximum matching into only $O((\log\log n)^2)$ MPC rounds.
In this note, we present a similar algorithm for the closely related problem
of approximating the minimum vertex cover in the MPC framework. We show that
one can achieve an $O(\log n)$-approximation to minimum vertex cover in only
$O(\log\log n)$ MPC rounds when the memory per machine is $O(n)$. Our
algorithm for vertex cover is similar to the maximum matching algorithm of
Czumaj et al. but avoids many of the intricacies in their approach, and as a
result admits a considerably simpler analysis (at the cost of a worse
approximation guarantee). We obtain this result by modifying a previous
parallel algorithm by Khanna and the author (SPAA 2017) for vertex cover that
allowed for compressing $O(\log n)$ rounds of a distributed algorithm into
constant MPC rounds when the memory allowed per machine is $O(n\sqrt{n})$.
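For intuition, the following sequential sketch shows the kind of degree-threshold peeling step that parallel vertex cover algorithms in this line of work repeatedly simulate; the specific threshold rule below is a hypothetical stand-in and is not the MPC algorithm of this note.

    # Illustrative threshold peeling for vertex cover (sequential stand-in;
    # MPC round compression aims to simulate many such steps per round).

    def peel_vertex_cover(adj):
        """adj: dict vertex -> set of neighbors (undirected). Returns a vertex cover."""
        adj = {v: set(nbrs) for v, nbrs in adj.items()}   # local copy we can edit
        cover = set()
        while any(adj[v] for v in adj):
            max_deg = max(len(nbrs) for nbrs in adj.values())
            threshold = max_deg / 2                       # hypothetical peeling threshold
            high = {v for v, nbrs in adj.items() if nbrs and len(nbrs) >= threshold}
            cover |= high
            for v in high:                                # remove all covered edges
                for u in adj[v]:
                    adj[u].discard(v)
                adj[v].clear()
        return cover

    # Example: a 4-cycle 0-1-2-3-0 plus a pendant edge 3-4.
    graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {3}}
    print(peel_vertex_cover(graph))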
Randomized Composable Coresets for Matching and Vertex Cover
A common approach for designing scalable algorithms for massive data sets is
to distribute the computation across, say, $k$ machines and process the data
using limited communication between them. A particularly appealing framework
here is the simultaneous communication model whereby each machine constructs a
small representative summary of its own data and one obtains an
approximate/exact solution from the union of the representative summaries. If
the representative summaries needed for a problem are small, then this results
in a communication-efficient and round-optimal protocol. While many fundamental
graph problems admit efficient solutions in this model, two prominent problems
are notably absent from the list of successes, namely, the maximum matching
problem and the minimum vertex cover problem. Indeed, it was shown recently
that for both these problems, even achieving a polylog$(n)$-approximation
requires essentially sending the entire input graph from each machine.
The main insight of our work is that the intractability of matching and
vertex cover in the simultaneous communication model is inherently connected to
an adversarial partitioning of the underlying graph across machines. We show
that when the underlying graph is randomly partitioned across the machines, both
these problems admit randomized composable coresets of size $\widetilde{O}(n)$
that yield an $O(1)$-approximate solution. This results in an
$O(1)$-approximation simultaneous protocol for these problems with
$\widetilde{O}(nk)$ total communication when the input is randomly partitioned
across $k$ machines. We further prove the optimality of our results. Finally,
by a standard application of composable coresets, our results also imply
MapReduce algorithms with the same approximation guarantee in one or two rounds
of communication.
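A minimal sketch of the simultaneous-communication / composable-coreset pattern described above, written in Python with networkx; the choice of a maximum matching of each machine's local subgraph as its summary is an assumption made for illustration, and no approximation guarantee from the paper is reproduced here.

    # Sketch of the composable-coreset pattern: each machine summarizes its
    # random share of the edges and the coordinator solves the problem on the
    # union of the summaries. The summary used here (a maximum matching of the
    # local subgraph) is an illustrative assumption.

    import random
    import networkx as nx

    def random_partition(edges, k):
        shares = [[] for _ in range(k)]
        for e in edges:
            shares[random.randrange(k)].append(e)   # random edge partition
        return shares

    def machine_summary(edge_share):
        local = nx.Graph(edge_share)
        return list(nx.max_weight_matching(local))  # small summary: a matching

    def coordinator(summaries):
        combined = nx.Graph()
        for summary in summaries:
            combined.add_edges_from(summary)        # union of the k summaries
        return nx.max_weight_matching(combined)

    edges = [(i, i + 1) for i in range(20)]         # a path on 21 vertices
    shares = random_partition(edges, k=4)
    print(coordinator([machine_summary(s) for s in shares]))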
On Decidability of the Ordered Structures of Numbers
The ordered structures of natural, integer, rational and real numbers are
studied here. It is known that the theories of these numbers in the language of
order are decidable and finitely axiomatizable. Also, their theories in the
language of order and addition are decidable and infinitely axiomatizable. For
the language of order and multiplication, it is known that the theories of
$\mathbb{N}$ and $\mathbb{Z}$ are not decidable (and so not axiomatizable by
any computably enumerable set of sentences). By Tarski's theorem, the
multiplicative ordered structure of $\mathbb{R}$ is decidable as well; here we
prove this result directly and present an axiomatization. The structure of
$\mathbb{Q}$ in the language of order and multiplication seems to be missing in
the literature; here we show the decidability of its theory by the technique of
quantifier elimination, and after presenting an infinite axiomatization for this
structure we prove that it is not finitely axiomatizable.
Comment: 17 pages
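As a generic illustration of the quantifier-elimination technique mentioned above (a standard step over a dense linear order without endpoints, not an axiom or lemma taken from the paper), one can eliminate an existential quantifier as follows:

    \exists x \, (a < x \,\wedge\, x < b) \;\Longleftrightarrow\; a < b

Repeatedly eliminating quantifiers in this way reduces every sentence to a quantifier-free one whose truth can be checked directly, which is the route to decidability the abstract refers to.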
Polynomial Pass Lower Bounds for Graph Streaming Algorithms
We present new lower bounds that show that a polynomial number of passes are
necessary for solving some fundamental graph problems in the streaming model of
computation. For instance, we show that any streaming algorithm that finds a
weighted minimum $s$-$t$ cut in an $n$-vertex undirected graph requires
$n^{2-o(1)}$ space unless it makes $n^{\Omega(1)}$ passes over the stream.
To prove our lower bounds, we introduce and analyze a new four-player
communication problem that we refer to as the hidden-pointer chasing problem.
This problem is in the spirit of the standard pointer chasing problem, with the
key difference that the pointers in this problem are hidden from the players, and
finding each one of them requires solving another communication problem, namely
the set intersection problem. Our lower bounds for graph problems are then
obtained by reductions from the hidden-pointer chasing problem.
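For reference, the standard (non-hidden) pointer chasing problem that this construction builds on can be sketched as follows in Python; the hidden-pointer variant additionally forces the players to solve a set intersection instance to learn each pointer. Variable names below are illustrative.

    # Standard two-party pointer chasing: Alice holds f_a, Bob holds f_b
    # (functions [n] -> [n]); the goal is the k-th pointer obtained by applying
    # them alternately. Each application costs a message, which is what drives
    # pass lower bounds in the streaming reductions.

    def pointer_chase(f_a, f_b, k, start=0):
        p = start
        for step in range(k):
            p = f_a[p] if step % 2 == 0 else f_b[p]   # alternate between the players
        return p

    f_a = [2, 0, 3, 1]     # Alice's function on {0, 1, 2, 3}
    f_b = [1, 3, 0, 2]     # Bob's function
    print(pointer_chase(f_a, f_b, k=3))               # 0 -> 2 -> 0 -> 2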
Our hidden-pointer chasing problem appears flexible enough to find other
applications and is therefore interesting in its own right. To showcase this,
we further present an interesting application of this problem beyond streaming
algorithms. Using a reduction from hidden-pointer chasing, we prove that any
algorithm for submodular function minimization needs to make $n^{2-o(1)}$ value
queries to the function unless it has a polynomial degree of adaptivity.
Tight Bounds for Single-Pass Streaming Complexity of the Set Cover Problem
We resolve the space complexity of single-pass streaming algorithms for
approximating the classic set cover problem. For finding an
$\alpha$-approximate set cover (for any $\alpha = o(\sqrt{n})$) using a
single-pass streaming algorithm, we show that $\Theta(mn/\alpha)$ space is both
sufficient and necessary (up to an $O(\log n)$ factor); here $m$ denotes the
number of the sets and $n$ denotes the size of the universe. This provides a strong
negative answer to the open question posed by Indyk et al. (2015) regarding the
possibility of having a single-pass algorithm with a small approximation factor
that uses sub-linear space.
We further study the problem of estimating the size of a minimum set cover
(as opposed to finding the actual sets), and establish that an additional
factor of $\alpha$ saving in the space is achievable in this case and that this
is the best possible. In other words, we show that $\Theta(mn/\alpha^2)$ space
is both sufficient and necessary (up to logarithmic factors) for estimating the
size of a minimum set cover to within a factor of $\alpha$. Our algorithm in
fact works for the more general problem of estimating the optimal value of a
covering integer program. On the other hand, our lower bound holds even for set
cover instances where the sets are presented in a random order.
The Stochastic Matching Problem: Beating Half with a Non-Adaptive Algorithm
In the stochastic matching problem, we are given a general (not necessarily
bipartite) graph $G(V,E)$, where each edge in $E$ is realized with some
constant probability $p > 0$, and the goal is to compute a bounded-degree
(bounded by a function depending only on $p$) subgraph $H$ of $G$ such that the
expected maximum matching size in $H$ is close to the expected maximum matching
size in $G$. The algorithms in this setting are considered non-adaptive as they
have to choose the subgraph $H$ without knowing any information about the set
of realized edges in $G$. Originally motivated by an application to kidney
exchange, the stochastic matching problem and its variants have received
significant attention in recent years.
The state-of-the-art non-adaptive algorithms for stochastic matching achieve
an approximation ratio of $\frac{1}{2} - \epsilon$ for any $\epsilon > 0$,
naturally raising the question of whether $1/2$ is the limit of what can be
achieved with a non-adaptive algorithm. In this work, we resolve this question
by presenting the first algorithm for stochastic matching with an approximation
guarantee that is strictly better than $1/2$: the algorithm computes a subgraph
$H$ of $G$ with maximum degree $O(\log(1/p)/p)$ such that the
ratio of the expected size of a maximum matching in realizations of $H$ and $G$ is
at least $\frac{1}{2} + \delta_0$ for some absolute constant $\delta_0 > 0$. The degree
bound on $H$ achieved by our algorithm is essentially the best possible (up to
an $O(\log(1/p))$ factor) for any constant-factor approximation algorithm,
since a degree of $\Omega(1/p)$ in $H$ is necessary for a vertex to
acquire at least one incident edge in a realization.
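The objective described above can be estimated empirically. The following hedged Python sketch uses Monte Carlo sampling and networkx to compare the expected maximum matching size of a candidate subgraph against that of the full graph; the particular subgraph below is hypothetical and is not the construction of the paper.

    # Monte Carlo estimate of E[size of maximum matching] under independent
    # edge realizations with probability p. Illustration of the objective only.

    import random
    import networkx as nx

    def expected_matching_size(edges, p, trials=200, seed=0):
        rng = random.Random(seed)
        total = 0
        for _ in range(trials):
            realized = nx.Graph([e for e in edges if rng.random() < p])
            total += len(nx.max_weight_matching(realized))
        return total / trials

    G_edges = [(i, j) for i in range(6) for j in range(i + 1, 6)]   # K_6
    H_edges = [(0, 1), (2, 3), (4, 5), (0, 2), (1, 3)]   # a sparse, hypothetical H
    p = 0.5
    print(expected_matching_size(H_edges, p) / expected_matching_size(G_edges, p))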
Online Assignment of Heterogeneous Tasks in Crowdsourcing Markets
We investigate the problem of heterogeneous task assignment in crowdsourcing
markets from the point of view of the requester, who has a collection of tasks.
Workers arrive online one by one, and each declares a set of feasible tasks they
can solve and a desired payment for each feasible task. The requester must
decide on the fly which task (if any) to assign to the worker, while assigning
workers only to feasible tasks. The goal is to maximize the number of assigned
tasks with a fixed overall budget.
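As a point of reference for this setting (and not the competitive algorithm of the paper), a naive online baseline simply assigns each arriving worker the cheapest still-unassigned feasible task that fits in the remaining budget; here is a Python sketch with hypothetical task names.

    # Naive online baseline for budgeted task assignment: assign each arriving
    # worker the cheapest still-unassigned feasible task that fits the budget.
    # Illustration of the setting only, not the paper's competitive algorithm.

    def greedy_assign(workers, budget):
        assigned = {}                                # task -> (worker, payment)
        for worker_id, bids in workers:              # bids: {task: requested payment}
            options = [(pay, task) for task, pay in bids.items()
                       if task not in assigned and pay <= budget]
            if options:
                pay, task = min(options)             # cheapest feasible task
                assigned[task] = (worker_id, pay)
                budget -= pay
        return assigned

    stream = [("w1", {"label_images": 3, "translate": 5}),
              ("w2", {"translate": 2}),
              ("w3", {"label_images": 1, "audio": 4})]
    print(greedy_assign(stream, budget=6))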
We provide an online algorithm for this problem and prove an upper bound on
the competitive ratio of this algorithm against an arbitrary (possibly
worst-case) sequence of workers who want small payments relative to the
requester's total budget. We further show an almost matching lower bound on the
competitive ratio of any algorithm in this setting. Finally, we propose a
different algorithm that achieves an improved competitive ratio in the random
permutation model, where the order of arrival of the workers is chosen
uniformly at random. Apart from these strong theoretical guarantees, we carry
out experiments on simulated data which demonstrate the practical
applicability of our algorithms.
Comment: Extended version of a paper in HCOMP 201
Distributed and Streaming Linear Programming in Low Dimensions
We study linear programming and general LP-type problems in several big data
(streaming and distributed) models. We mainly focus on low dimensional problems
in which the number of constraints is much larger than the number of variables.
Low dimensional LP-type problems appear frequently in various machine learning
tasks such as robust regression, support vector machines, and core vector
machines. As supporting large-scale machine learning queries in database
systems has become an important direction for database research, obtaining
efficient algorithms for low dimensional LP-type problems on massive datasets
is of great value. In this paper we give both upper and lower bounds for
LP-type problems in distributed and streaming models. Our bounds are almost
tight when the dimensionality of the problem is a fixed constant.
Comment: To appear in PODS'19; 28 pages
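To make "many constraints, few variables" concrete, here is one such LP-type problem, L-infinity (Chebyshev) line fitting, written as a three-variable linear program over n data points and solved with scipy; this is a generic formulation for illustration, not the distributed or streaming algorithms of the paper.

    # L-infinity line fitting as a low-dimensional LP: variables (slope a,
    # intercept b, t); constraints -t <= a*x_i + b - y_i <= t for every data
    # point; minimize t. Many constraints, only three variables.

    import numpy as np
    from scipy.optimize import linprog

    def chebyshev_fit(x, y):
        n = len(x)
        c = np.array([0.0, 0.0, 1.0])                 # minimize t
        #  a*x_i + b - t <=  y_i   and   -a*x_i - b - t <= -y_i
        A_ub = np.vstack([np.column_stack([x, np.ones(n), -np.ones(n)]),
                          np.column_stack([-x, -np.ones(n), -np.ones(n)])])
        b_ub = np.concatenate([y, -y])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
        return res.x                                  # (slope, intercept, max residual)

    x = np.linspace(0, 1, 1000)                       # 2000 constraints, 3 variables
    y = 2 * x + 1 + 0.05 * np.sin(40 * x)
    print(chebyshev_fit(x, y))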
Stochastic Submodular Cover with Limited Adaptivity
In the submodular cover problem, we are given a non-negative monotone
submodular function $f$ over a ground set $E$ of items, and the goal is to
choose a smallest subset $S \subseteq E$ such that $f(S) = Q$, where $Q = f(E)$.
In the stochastic version of the problem, we are given $m$ stochastic items,
which are different random variables that independently realize to some item in
$E$, and the goal is to find a smallest set of stochastic items whose
realization $R$ satisfies $f(R) = Q$. The problem captures as a special case
the stochastic set cover problem and more generally, stochastic covering
integer programs.
We define an $r$-round adaptive algorithm to be an algorithm that chooses a
permutation of all available items in each round $k \in [r]$, and a threshold
$\tau_k$, and realizes items in the order specified by the permutation until
the function value is at least $\tau_k$. The permutation for each round is
chosen adaptively based on the realization in the previous rounds, but the
ordering inside each round remains fixed regardless of the realizations seen
inside the round. Our main result is that for any integer $r$, there exists a
poly-time $r$-round adaptive algorithm for stochastic submodular cover whose
expected cost is $\widetilde{O}(Q^{1/r})$ times the expected cost of a fully
adaptive algorithm. Prior to our work, such a result was not known even for the
case of $r=1$ and when $f$ is the coverage function. On the other hand, we show
that for any $r$, there exist instances of the stochastic submodular cover
problem where no $r$-round adaptive algorithm can achieve a better than
$\Omega(Q^{1/r})$ approximation to the expected cost of a fully adaptive
algorithm. Our lower bound result holds even for coverage functions and for
algorithms with unbounded computational power.
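A minimal Python sketch of the r-round adaptive framework defined in this abstract; the round-selection rule (the permutation and threshold chosen in each round) is a placeholder, since choosing it well is precisely the paper's contribution.

    # Skeleton of an r-round adaptive algorithm for stochastic submodular cover:
    # each round fixes a permutation of the remaining stochastic items and a
    # threshold, then realizes items in that order until the function value
    # reaches the threshold. choose_round() below is a placeholder rule.

    import random

    def r_round_cover(items, realize, f, Q, r, choose_round):
        realized, used = set(), set()
        for k in range(1, r + 1):
            remaining = [i for i in items if i not in used]
            permutation, threshold = choose_round(k, realized, remaining, Q)
            for i in permutation:                     # fixed order within the round
                if f(realized) >= threshold:
                    break
                realized.add(realize(i))
                used.add(i)
            if f(realized) >= Q:
                break
        return used, realized

    # Toy instance: a coverage function; each stochastic item realizes to one of
    # two random 3-element subsets of a universe of size 10.
    universe = set(range(10))
    pools = {i: [frozenset(random.sample(sorted(universe), 3)) for _ in range(2)]
             for i in range(12)}
    realize = lambda i: random.choice(pools[i])
    f = lambda realized: len(set().union(*realized)) if realized else 0
    choose_round = lambda k, realized, remaining, Q: (remaining, Q)  # placeholder
    print(r_round_cover(list(pools), realize, f, Q=len(universe), r=2,
                        choose_round=choose_round))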