
    Tight Space-Approximation Tradeoff for the Multi-Pass Streaming Set Cover Problem

    We study the classic set cover problem in the streaming model: the sets that comprise the instance are revealed one by one in a stream, and the goal is to solve the problem by making one or a few passes over the stream while maintaining sublinear space $o(mn)$ in the input size; here $m$ denotes the number of sets and $n$ is the universe size. Notice that in this model, we are mainly concerned with the space requirement of the algorithms and hence do not restrict their computation time. Our main result is a resolution of the space-approximation tradeoff for the streaming set cover problem: we show that any $\alpha$-approximation algorithm for the set cover problem requires $\widetilde{\Omega}(mn^{1/\alpha})$ space, even if it is allowed $\mathrm{polylog}(n)$ passes over the stream, and even if the sets arrive in a random order in the stream. This space-approximation tradeoff matches the best known bounds achieved by the recent algorithm of Har-Peled et al. (PODS 2016) that requires only $O(\alpha)$ passes over the stream in an adversarial order, hence settling the space complexity of approximating the set cover problem in data streams in a quite robust manner. Additionally, our approach yields tight lower bounds for the space complexity of $(1-\epsilon)$-approximating the streaming maximum coverage problem studied in several recent works.
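
    To make the streaming model concrete, below is a minimal Python sketch of a multi-pass thresholded greedy for set cover. It only illustrates the pass/space regime the abstract refers to (re-reading the stream while storing little beyond the uncovered elements); it is not the algorithm of Har-Peled et al., and the instance layout and parameter names are assumptions of the sketch.

```python
def streaming_greedy_set_cover(stream_passes, universe_size, alpha):
    """Thresholded greedy over a stream of sets: in pass i, pick any set
    that covers at least n^(1 - i/alpha) still-uncovered elements, so the
    algorithm finishes within about alpha passes. `stream_passes()`
    re-reads the stream, yielding (index, set) pairs once per call."""
    uncovered = set(range(universe_size))
    solution = []                       # indices of chosen sets
    for i in range(1, int(alpha) + 1):
        threshold = universe_size ** (1 - i / alpha)
        for idx, s in stream_passes():  # one pass over the stream
            if len(uncovered & s) >= threshold:
                solution.append(idx)
                uncovered -= s
        if not uncovered:
            break
    return solution

# e.g. sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
# streaming_greedy_set_cover(lambda: enumerate(sets), 6, alpha=2)
```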

    Simple Round Compression for Parallel Vertex Cover

    Recently, Czumaj et al. (arXiv 2017) presented a parallel (almost) $2$-approximation algorithm for the maximum matching problem in only $O((\log\log n)^2)$ rounds of the massively parallel computation (MPC) framework, when the memory per machine is $O(n)$. The main approach in their work is a way of compressing $O(\log n)$ rounds of a distributed algorithm for maximum matching into only $O((\log\log n)^2)$ MPC rounds. In this note, we present a similar algorithm for the closely related problem of approximating the minimum vertex cover in the MPC framework. We show that one can achieve an $O(\log n)$ approximation to minimum vertex cover in only $O(\log\log n)$ MPC rounds when the memory per machine is $O(n)$. Our algorithm for vertex cover is similar to the maximum matching algorithm of Czumaj et al. but avoids many of the intricacies in their approach and as a result admits a considerably simpler analysis (at the cost of a worse approximation guarantee). We obtain this result by modifying a previous parallel algorithm by Khanna and the author (SPAA 2017) for vertex cover that allowed for compressing $O(\log n)$ rounds of a distributed algorithm into constant MPC rounds when the memory allowed per machine is $O(n\sqrt{n})$.
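
    As a point of reference for the kind of distributed algorithm that round-compression results of this type start from, the following is a sequential Python sketch of the standard degree-peeling template for an $O(\log n)$-approximate vertex cover (repeatedly take all vertices of at least half the current maximum degree). This is a folklore baseline, not the MPC algorithm of this note; the adjacency-dict input format is an assumption.

```python
def peeling_vertex_cover(adj):
    """Degree peeling: add every vertex whose current degree is at least
    half the current maximum degree to the cover, delete them, repeat.
    Each phase more than halves the maximum degree, so there are at most
    O(log n) phases; each phase's batch can be charged against OPT."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # local mutable copy
    cover = set()
    while any(adj.values()):
        max_deg = max(len(nbrs) for nbrs in adj.values())
        high = {v for v, nbrs in adj.items() if len(nbrs) >= max_deg / 2}
        cover |= high
        for v in high:                  # delete the high-degree vertices
            for u in adj[v]:
                adj[u].discard(v)
            adj[v] = set()
    return cover
```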

    Randomized Composable Coresets for Matching and Vertex Cover

    A common approach to designing scalable algorithms for massive data sets is to distribute the computation across, say, $k$ machines and process the data using limited communication between them. A particularly appealing framework here is the simultaneous communication model, whereby each machine constructs a small representative summary of its own data and one obtains an approximate/exact solution from the union of the representative summaries. If the representative summaries needed for a problem are small, then this results in a communication-efficient and round-optimal protocol. While many fundamental graph problems admit efficient solutions in this model, two prominent problems are notably absent from the list of successes, namely, the maximum matching problem and the minimum vertex cover problem. Indeed, it was shown recently that for both these problems, even achieving a $\mathrm{polylog}(n)$ approximation requires essentially sending the entire input graph from each machine. The main insight of our work is that the intractability of matching and vertex cover in the simultaneous communication model is inherently connected to an adversarial partitioning of the underlying graph across machines. We show that when the underlying graph is randomly partitioned across machines, both these problems admit randomized composable coresets of size $\widetilde{O}(n)$ that yield an $\widetilde{O}(1)$-approximate solution. This results in an $\widetilde{O}(1)$-approximation simultaneous protocol for these problems with $\widetilde{O}(nk)$ total communication when the input is randomly partitioned across $k$ machines. We further prove the optimality of our results. Finally, by a standard application of composable coresets, our results also imply MapReduce algorithms with the same approximation guarantee in one or two rounds of communication.
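
    Purely to make the simultaneous communication pattern concrete, here is a toy Python protocol in which each machine's summary is a greedy maximal matching of its randomly assigned edges and the coordinator greedily combines the union. The paper's randomized composable coresets are more refined; the greedy stand-in and the data layout here are assumptions.

```python
import random

def greedy_maximal_matching(edges):
    """Scan the edges once, keeping every edge whose endpoints are free."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def simultaneous_matching(edges, k, seed=0):
    """Toy simultaneous protocol: randomly partition the edges across k
    machines, let each machine send a maximal matching of its piece as
    its summary, and combine the union of summaries at the coordinator."""
    rng = random.Random(seed)
    pieces = [[] for _ in range(k)]
    for e in edges:                     # random partition of the input
        pieces[rng.randrange(k)].append(e)
    summaries = [greedy_maximal_matching(p) for p in pieces]  # k summaries
    union = [e for s in summaries for e in s]
    return greedy_maximal_matching(union)  # coordinator's solution
```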

    On Decidability of the Ordered Structures of Numbers

    The ordered structures of the natural, integer, rational, and real numbers are studied here. It is known that the theories of these numbers in the language of order are decidable and finitely axiomatizable. Also, their theories in the language of order and addition are decidable and infinitely axiomatizable. For the language of order and multiplication, it is known that the theories of $\mathbb{N}$ and $\mathbb{Z}$ are not decidable (and so not axiomatizable by any computably enumerable set of sentences). By Tarski's theorem, the multiplicative ordered structure of $\mathbb{R}$ is also decidable; here we prove this result directly and present an axiomatization. The structure of $\mathbb{Q}$ in the language of order and multiplication seems to be missing in the literature; here we show the decidability of its theory by the technique of quantifier elimination, and after presenting an infinite axiomatization for this structure we prove that it is not finitely axiomatizable. Comment: 17 pages.
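
    As a flavor of the quantifier-elimination technique invoked here, two standard one-step eliminations are shown below (textbook facts, not taken from the paper); the second one separates the reals from the rationals.

```latex
% In the language of order alone (dense order without endpoints):
\exists x\,(a < x \wedge x < b) \;\Longleftrightarrow\; a < b
% With multiplication over the reals, non-negative elements are squares:
\exists x\,(x \cdot x = a) \;\Longleftrightarrow\; 0 \le a
% The second equivalence fails over the rationals (take a = 2), which is
% one reason (Q, <, x) needs its own elimination argument and axioms.
```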

    Polynomial Pass Lower Bounds for Graph Streaming Algorithms

    We present new lower bounds showing that a polynomial number of passes is necessary for solving some fundamental graph problems in the streaming model of computation. For instance, we show that any streaming algorithm that finds a weighted minimum $s$-$t$ cut in an $n$-vertex undirected graph requires $n^{2-o(1)}$ space unless it makes $n^{\Omega(1)}$ passes over the stream. To prove our lower bounds, we introduce and analyze a new four-player communication problem that we refer to as the hidden-pointer chasing problem. This problem is in the spirit of the standard pointer chasing problem, with the key difference that the pointers are hidden from the players, and finding each one of them requires solving another communication problem, namely the set intersection problem. Our lower bounds for graph problems are then obtained by reductions from the hidden-pointer chasing problem. The hidden-pointer chasing problem appears flexible enough to find other applications and is therefore interesting in its own right. To showcase this, we further present an interesting application of this problem beyond streaming algorithms: using a reduction from hidden-pointer chasing, we prove that any algorithm for submodular function minimization needs to make $n^{2-o(1)}$ value queries to the function unless it has a polynomial degree of adaptivity.
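
    The toy Python sketch below renders the shape of the problem as described in the abstract: in standard pointer chasing the next pointer is read off a table directly, while in the hidden variant it is the unique common element of two sets held by different players, so each step embeds a set intersection instance. The data layout is invented for illustration; the paper's four-player formalization differs.

```python
def chase(f_a, f_b, start, k):
    """Standard pointer chasing: alternately apply the two players'
    functions k times; each pointer is read off a table directly."""
    p = start
    for step in range(k):
        p = f_a[p] if step % 2 == 0 else f_b[p]
    return p

def hidden_step(sets_c, sets_d, p):
    """Hidden variant: the pointer out of p is written down nowhere; it
    is the unique common element of two sets held by different players,
    so revealing each pointer requires solving set intersection."""
    (nxt,) = set(sets_c[p]) & set(sets_d[p])  # unique by promise
    return nxt
```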

    Tight Bounds for Single-Pass Streaming Complexity of the Set Cover Problem

    We resolve the space complexity of single-pass streaming algorithms for approximating the classic set cover problem. For finding an $\alpha$-approximate set cover (for any $\alpha = o(\sqrt{n})$) using a single-pass streaming algorithm, we show that $\Theta(mn/\alpha)$ space is both sufficient and necessary (up to an $O(\log n)$ factor); here $m$ denotes the number of sets and $n$ the size of the universe. This provides a strong negative answer to the open question posed by Indyk et al. (2015) regarding the possibility of having a single-pass algorithm with a small approximation factor that uses sub-linear space. We further study the problem of estimating the size of a minimum set cover (as opposed to finding the actual sets), and establish that an additional factor-$\alpha$ saving in space is achievable in this case and that this is the best possible. In other words, we show that $\Theta(mn/\alpha^2)$ space is both sufficient and necessary (up to logarithmic factors) for estimating the size of a minimum set cover to within a factor of $\alpha$. Our algorithm in fact works for the more general problem of estimating the optimal value of a covering integer program. On the other hand, our lower bound holds even for set cover instances where the sets are presented in a random order.
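
    Element sampling is a common ingredient in this line of work; the hedged sketch below shows only where a factor-$\alpha$ space saving can come from (storing each streamed set restricted to a subsampled universe), not the paper's actual algorithm, which needs further steps to turn the subsampled instance into a valid answer.

```python
import random

def sample_restricted_instance(stream, universe_size, alpha, seed=0):
    """Keep each universe element independently with probability 1/alpha
    and store every streamed set restricted to the sampled elements, so
    the expected storage drops from O(mn) to O(mn/alpha) bits."""
    rng = random.Random(seed)
    sample = {e for e in range(universe_size) if rng.random() < 1 / alpha}
    restricted = [(idx, s & sample) for idx, s in stream]  # single pass
    return sample, restricted
```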

    The Stochastic Matching Problem: Beating Half with a Non-Adaptive Algorithm

    In the stochastic matching problem, we are given a general (not necessarily bipartite) graph $G(V,E)$, where each edge in $E$ is realized with some constant probability $p > 0$, and the goal is to compute a bounded-degree (bounded by a function depending only on $p$) subgraph $H$ of $G$ such that the expected maximum matching size in $H$ is close to the expected maximum matching size in $G$. Algorithms in this setting are considered non-adaptive as they have to choose the subgraph $H$ without knowing any information about the set of realized edges in $G$. Originally motivated by an application to kidney exchange, the stochastic matching problem and its variants have received significant attention in recent years. The state-of-the-art non-adaptive algorithms for stochastic matching achieve an approximation ratio of $\frac{1}{2}-\epsilon$ for any $\epsilon > 0$, naturally raising the question of whether $1/2$ is the limit of what can be achieved with a non-adaptive algorithm. In this work, we resolve this question by presenting the first algorithm for stochastic matching with an approximation guarantee that is strictly better than $1/2$: the algorithm computes a subgraph $H$ of $G$ with maximum degree $O(\frac{\log(1/p)}{p})$ such that the ratio of the expected size of a maximum matching in realizations of $H$ and $G$ is at least $1/2+\delta_0$ for some absolute constant $\delta_0 > 0$. The degree bound on $H$ achieved by our algorithm is essentially the best possible (up to an $O(\log(1/p))$ factor) for any constant-factor approximation algorithm, since an $\Omega(\frac{1}{p})$ degree in $H$ is necessary for a vertex to acquire at least one incident edge in a realization.
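
    One natural non-adaptive construction in this literature takes $H$ to be the union of several edge-disjoint matchings, which automatically bounds the degree of $H$. The Python sketch below implements that baseline with greedy maximal matchings (a crude stand-in for maximum matchings), together with a Monte Carlo estimate of the expected matching size; it is not the improved algorithm of this paper.

```python
import random

def greedy_matching(edges):
    """Greedy maximal matching; a stand-in for maximum matching."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def union_of_matchings(edges, rounds):
    """Non-adaptive baseline: H is the union of `rounds` edge-disjoint
    greedy matchings, so every vertex has degree at most `rounds` in H."""
    remaining, h = list(edges), []
    for _ in range(rounds):
        matching = greedy_matching(remaining)
        h.extend(matching)
        chosen = set(matching)
        remaining = [e for e in remaining if e not in chosen]
    return h

def estimate_matching_size(edges, p, trials=200, seed=0):
    """Monte Carlo estimate of the expected (greedy) matching size over
    realizations where each edge survives independently with prob p."""
    rng = random.Random(seed)
    sizes = [len(greedy_matching([e for e in edges if rng.random() < p]))
             for _ in range(trials)]
    return sum(sizes) / trials
```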

    Online Assignment of Heterogeneous Tasks in Crowdsourcing Markets

    We investigate the problem of heterogeneous task assignment in crowdsourcing markets from the point of view of the requester, who has a collection of tasks. Workers arrive online one by one, and each declares a set of feasible tasks they can solve and a desired payment for each feasible task. The requester must decide on the fly which task (if any) to assign to the worker, while assigning workers only to feasible tasks. The goal is to maximize the number of assigned tasks within a fixed overall budget. We provide an online algorithm for this problem and prove an upper bound on the competitive ratio of this algorithm against an arbitrary (possibly worst-case) sequence of workers who want small payments relative to the requester's total budget. We further show an almost matching lower bound on the competitive ratio of any algorithm in this setting. Finally, we propose a different algorithm that achieves an improved competitive ratio in the random permutation model, where the order of arrival of the workers is chosen uniformly at random. Apart from these strong theoretical guarantees, we carry out experiments on simulated data which demonstrate the practical applicability of our algorithms. Comment: Extended version of a paper in HCOMP 2015.
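
    For concreteness, here is a naive online baseline in Python for the setting just described: each arriving worker is greedily assigned their cheapest still-unassigned feasible task that fits the remaining budget. This is not the paper's algorithm, and the input layout is an assumption of the sketch.

```python
def greedy_online_assignment(workers, tasks, budget):
    """Naive online baseline: assign each arriving worker their cheapest
    still-unassigned feasible task whose payment fits the remaining
    budget. `workers` is an iterable of dicts mapping feasible task ->
    asked payment (layout assumed for this sketch)."""
    unassigned = set(tasks)
    remaining = budget
    assignment = []                    # (worker index, task, payment)
    for i, bids in enumerate(workers):
        feasible = [(pay, t) for t, pay in bids.items()
                    if t in unassigned and pay <= remaining]
        if feasible:
            pay, t = min(feasible)     # cheapest feasible task
            assignment.append((i, t, pay))
            unassigned.discard(t)
            remaining -= pay
    return assignment
```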

    Distributed and Streaming Linear Programming in Low Dimensions

    We study linear programming and general LP-type problems in several big data (streaming and distributed) models. We mainly focus on low dimensional problems in which the number of constraints is much larger than the number of variables. Low dimensional LP-type problems appear frequently in various machine learning tasks such as robust regression, support vector machines, and core vector machines. As supporting large-scale machine learning queries in database systems has become an important direction for database research, obtaining efficient algorithms for low dimensional LP-type problems on massive datasets is of great value. In this paper we give both upper and lower bounds for LP-type problems in distributed and streaming models. Our bounds are almost tight when the dimensionality of the problem is a fixed constant. Comment: To appear in PODS'19; 28 pages.
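
    To ground what a "low dimensional LP-type problem" looks like in the machine learning context mentioned above, here is a minimal offline sketch of robust (least-absolute-deviations) regression written as a linear program and solved with scipy; it illustrates the problem class only, not the streaming or distributed algorithms of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(X, y):
    """Least-absolute-deviations regression as a low dimensional LP:
    d weight variables plus one slack t_i per sample, minimizing the
    sum of slacks subject to |X @ w - y| <= t elementwise. The number
    of constraints (2n) dwarfs the dimension (d + n slacks)."""
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])     # minimize sum(t)
    eye = np.eye(n)
    A_ub = np.block([[X, -eye], [-X, -eye]])          # Xw - y <= t, y - Xw <= t
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n     # w free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]
```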

    Stochastic Submodular Cover with Limited Adaptivity

    In the submodular cover problem, we are given a non-negative monotone submodular function $f$ over a ground set $E$ of items, and the goal is to choose a smallest subset $S \subseteq E$ such that $f(S) = Q$ where $Q = f(E)$. In the stochastic version of the problem, we are given $m$ stochastic items, which are different random variables that independently realize to some item in $E$, and the goal is to find a smallest set of stochastic items whose realization $R$ satisfies $f(R) = Q$. The problem captures as a special case the stochastic set cover problem and, more generally, stochastic covering integer programs. We define an $r$-round adaptive algorithm to be an algorithm that chooses a permutation of all available items in each round $k \in [r]$, along with a threshold $\tau_k$, and realizes items in the order specified by the permutation until the function value is at least $\tau_k$. The permutation for each round $k$ is chosen adaptively based on the realizations in the previous rounds, but the ordering inside each round remains fixed regardless of the realizations seen inside the round. Our main result is that for any integer $r$, there exists a poly-time $r$-round adaptive algorithm for stochastic submodular cover whose expected cost is $\widetilde{O}(Q^{1/r})$ times the expected cost of a fully adaptive algorithm. Prior to our work, such a result was not known even for the case of $r=1$ and when $f$ is the coverage function. On the other hand, we show that for any $r$, there exist instances of the stochastic submodular cover problem where no $r$-round adaptive algorithm can achieve better than an $\Omega(Q^{1/r})$ approximation to the expected cost of a fully adaptive algorithm. Our lower bound result holds even for the coverage function and for algorithms with unbounded computational power.
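
    The $r$-round policy interface defined in the abstract translates almost directly into code. The Python skeleton below is such a transcription; the `choose` callback (which would encode the actual algorithm) and the data layout are assumptions, and a coverage-style instantiation is indicated in the comments.

```python
def run_r_round_policy(items, realize, f, q, rounds, choose):
    """Skeleton of an r-round adaptive policy: in round k, `choose` sees
    everything realized in earlier rounds and returns a permutation of
    the remaining items plus a threshold tau_k; items are then realized
    in that fixed order until the function value reaches tau_k."""
    realized, remaining = [], set(items)
    for k in range(rounds):
        order, tau = choose(k, list(realized), list(remaining))
        for item in order:              # order is fixed within the round
            if f(realized) >= tau or not remaining:
                break
            if item in remaining:
                remaining.discard(item)
                realized.append(realize(item))  # stochastic realization
        if f(realized) >= q:            # target Q = f(E) reached
            break
    return realized

# Example instantiation with a coverage function: each realized item is a
# set of covered elements, and f is the size of their union, e.g.
#   f = lambda realized: len(set().union(*realized))
```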