    Better Streaming Algorithms for the Maximum Coverage Problem

    We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe {1,...,n}, find the k sets that cover the largest number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon) approximation algorithms: one uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space, whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that any approximation factor better than (1-(1-1/k)^k) in constant passes requires space that is linear in m for constant k, even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, (1-epsilon) approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space. We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the largest number of distinct edges. We show that any constant approximation in constant passes requires space that is linear in N for constant k, whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a K-epsilon approximation when the ratio between the minimum and maximum degree is bounded below by K.
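
    The 1-1/e polynomial-time guarantee mentioned in this abstract is the one classically attained by the offline greedy rule: repeatedly pick the set with the largest marginal gain. The following is a minimal sketch of that offline baseline only, not of the streaming algorithms in the paper; the function name and the toy input are illustrative assumptions.

```python
def greedy_max_coverage(sets, k):
    """Offline greedy baseline: pick up to k sets, each time taking the
    set that covers the most elements not yet covered."""
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        best_idx, best_gain = None, 0
        for i, s in enumerate(sets):
            if i in chosen:
                continue
            gain = len(s - covered)  # marginal gain of set i
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # no remaining set adds anything new
            break
        chosen.append(best_idx)
        covered |= sets[best_idx]
    return chosen, covered


# Toy instance (hypothetical data): m = 4 sets over the universe {1,...,6}.
sets = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6}, {1, 6}]
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, sorted(covered))
```

    This baseline stores every set, so it needs space proportional to the input size; the abstract's results concern matching its quality in far less space.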

    Maximum Coverage in Sublinear Space, Faster

    Given a collection of m sets from a universe U, the Maximum Set Coverage problem consists of finding k sets whose union has the largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial-time algorithm up to a factor 1-1/e. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe n = |U|. However, one randomized streaming algorithm has been shown to produce a 1-1/e-ε approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to m and n. In order to achieve such a low space complexity, the authors used two techniques in their multi-pass approach: 1) F_0-sketching, which estimates with great accuracy the number of distinct elements in a set using less space than the set itself. 2) Subsampling, which consists of solving the problem only on a subspace of the universe. It is implemented using hash functions of limited independence. This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. Firstly, we give some optimizations that do not alter the space complexity, number of passes and approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of Ω(ε^{-2} k log m) can be fine-tuned to Ω(k log m); we also show how F_0-sketching can be removed. Secondly, we derive a new lower bound for the probability of producing a 1-1/e-ε approximation using only pairwise independence: 1 - 4/(c k log m), compared to 1 - 2e/m^{ck/6} with Ω(k log m)-independence. Although these theoretical guarantees are weaker, which would suggest that the approximation quality suffers, our algorithms perform well in practice on large streams. Finally, our experimental results show that even a pairwise-independent hash-function sampler does not produce worse solutions than the original algorithm, while running several orders of magnitude faster.
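
    To make the subsampling idea concrete, here is a hedged sketch of a pairwise-independent sampler of the kind the abstract refers to. It uses the textbook family h(x) = (a*x + b) mod P and keeps an element when its hash falls below a threshold. The prime P, the sampling rate, and all function names are assumptions for illustration; this is not the authors' implementation.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; assumed larger than every element id


def make_pairwise_sampler(rate, rng):
    """Return keep(x), which retains x with probability about `rate`,
    using the pairwise-independent family h(x) = (a*x + b) mod P."""
    a = rng.randrange(P)
    b = rng.randrange(P)
    threshold = int(rate * P)

    def keep(x):
        return (a * x + b) % P < threshold

    return keep


def subsample_sets(sets, keep):
    """Restrict every set to the sampled part of the universe."""
    return [{x for x in s if keep(x)} for s in sets]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
keep = make_pairwise_sampler(0.5, rng)
sets = [set(range(0, 1000)), set(range(500, 1500))]  # hypothetical input
sampled = subsample_sets(sets, keep)
print([len(s) for s in sampled])
```

    Because the keep/drop decision depends only on the element, an element shared by several sets is either kept in all of them or dropped from all of them, so coverage on the sample is a scaled-down proxy for coverage on the full universe. Only two random numbers are stored per sampler, which is what makes pairwise independence cheap to evaluate.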

    Optimal Bounds for Dominating Set in Graph Streams

    Maximum Coverage in Random-Arrival Streams

    Final report on the evaluation of RRM/CRRM algorithms

    Public deliverable of the EVEREST project. This deliverable provides a definition and a complete evaluation of the RRM/CRRM algorithms selected in D11 and D15, which were evolved and refined through an iterative process. The evaluation will be carried out by means of simulations using the simulators provided in D07 and D14. Preprint

    Maximum Coverage in the Data Stream Model: Parameterized and Generalized

    We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems is $m$ subsets of a universe of size $n$ and a value $k \in [m]$. In Max-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most $k$ sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: If the sets have size at most $d$, there exist single-pass algorithms using $\tilde{O}(d^{d+1} k^d)$ space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant $d$. If each element appears in at most $r$ sets, we present single-pass algorithms using $\tilde{O}(k^2 r/\epsilon^3)$ space that return a $1+\epsilon$ approximation in the case of Max-Cover. We also present a single-pass algorithm using slightly more memory, i.e., $\tilde{O}(k^3 r/\epsilon^{4})$ space, that $1+\epsilon$ approximates Max-Unique-Cover. In contrast to the above results, when $d$ and $r$ are arbitrary, any constant-pass $1+\epsilon$ approximation algorithm for either problem requires $\Omega(\epsilon^{-2} m)$ space, but a single-pass $O(\epsilon^{-2} m k)$ space algorithm exists. In fact, any constant-pass algorithm with an approximation better than $e/(e-1)$ and $e^{1-1/k}$ for Max-Cover and Max-Unique-Cover respectively requires $\Omega(m/k^2)$ space when $d$ and $r$ are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem. Comment: Conference version to appear at ICDT 202
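
    The difference between the two objectives defined in this abstract can be illustrated with a small brute-force sketch. It enumerates every collection of at most k sets, so it is exponential and serves only to make the definitions concrete; it is unrelated to the single-pass algorithms of the paper, and the helper names and toy input are assumptions.

```python
from collections import Counter
from itertools import combinations


def coverage(sets, chosen):
    """Max-Cover objective: elements covered by at least one chosen set."""
    return len(set().union(*(sets[i] for i in chosen)))


def unique_coverage(sets, chosen):
    """Max-Unique-Cover objective: elements covered by exactly one chosen set."""
    counts = Counter(x for i in chosen for x in sets[i])
    return sum(1 for c in counts.values() if c == 1)


def brute_force(sets, k, objective):
    """Try every collection of at most k sets (exponential; tiny inputs only)."""
    candidates = (
        c for size in range(k + 1) for c in combinations(range(len(sets)), size)
    )
    best = max(candidates, key=lambda c: objective(sets, c))
    return best, objective(sets, best)


# Toy instance (hypothetical data): element 3 is shared by sets 0 and 1.
sets = [{1, 2, 3}, {3, 4, 5}, {1, 2}]
print(coverage(sets, (0, 1)), unique_coverage(sets, (0, 1)))
print(brute_force(sets, 3, unique_coverage))
```

    On this instance, choosing sets 0 and 1 covers five elements but only four of them uniquely, because element 3 is counted twice. Adding more sets can therefore lower the unique-coverage objective, which is why the definition allows collections of at most k sets rather than exactly k.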

    Real-Time Scheduling for Content Broadcasting in LTE

    Broadcasting capabilities are one of the most promising features of upcoming LTE-Advanced networks. However, the task of scheduling broadcasting sessions is far from trivial, since it affects the available resources of several contiguous cells as well as the amount of resources that can be devoted to unicast traffic. In this paper, we present a compact, convenient model for broadcasting in LTE, as well as a set of efficient algorithms to define broadcasting areas and to perform content scheduling. We study the performance of our algorithms in a realistic scenario, deriving insights on the possible trade-offs between effectiveness and computational efficiency.

    The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

    A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and achieves provable, constant-factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency on large problems with different kinds of constraints, with objective values always close to what is achievable in the centralized setting.

    High-Dimensional Geometric Streaming in Polynomial Space

    Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d \geq \log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and Löwner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d, \log n)$ bits of space by trading off with a $\mathrm{poly}(d, \log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d, \log n)$ space and $\mathrm{poly}(d, \log n)$ distortion. Our algorithm also gives similar guarantees in the online coreset model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2 \log n)$ space and $O((d \log n)^{1/2-1/p})$ distortion for $p > 2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic. Comment: Abstract shortened to meet arXiv limits; v2 fix statements concerning online condition number