Better Streaming Algorithms for the Maximum Coverage Problem
We study the classic NP-hard problem of finding a maximum k-set coverage in the data stream model: given a set system of m sets, each a subset of a universe {1,...,n}, find the k sets that cover the largest number of distinct elements. The problem can be approximated up to a factor 1-1/e in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to 1-1/e, that use sublinear space o(mn). Our main results are: 1) Two (1-1/e-epsilon)-approximation algorithms: one uses O(1/epsilon) passes and O(k/epsilon^2 polylog(m,n)) space, whereas the other uses only a single pass but O(m/epsilon^2 polylog(m,n)) space. 2) We show that achieving any approximation factor better than (1-(1-1/k)^k) in a constant number of passes requires space linear in m, even for constant k and even if the algorithm is allowed unbounded processing time. We also give a single-pass (1-epsilon)-approximation algorithm using O(m/epsilon^2 min(k,1/epsilon) polylog(m,n)) space.
We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices, and the goal is to find k vertices that cover the largest number of distinct edges. We show that any constant-factor approximation in a constant number of passes requires space linear in N, even for constant k, whereas O(N/epsilon^2 polylog(m,n)) space is sufficient for a (1-epsilon) approximation and arbitrary k in a single pass. For regular graphs, we show that O(k/epsilon^3 polylog(m,n)) space is sufficient for a (1-epsilon) approximation in a single pass. We generalize this to a (K-epsilon) approximation when the ratio between the minimum and maximum degree is bounded below by K.
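For context, the (1-1/e) baseline that these results are measured against is the classical offline greedy: repeatedly add the set with the largest marginal coverage. A minimal sketch with illustrative names and a toy instance, not code from the paper:

```python
# Offline greedy for Maximum k-Set Coverage: repeatedly pick the set
# covering the most still-uncovered elements.  This classical algorithm
# achieves a (1 - 1/e) approximation; the streaming algorithms above aim
# to match that guarantee in sublinear space.

def greedy_max_coverage(sets, k):
    """Return indices of k greedily chosen sets and the elements they cover."""
    covered = set()
    chosen = []
    for _ in range(k):
        best_i, best_gain = None, 0
        for i, s in enumerate(sets):
            if i in chosen:
                continue
            gain = len(s - covered)  # marginal coverage of set i
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # no remaining set adds new elements
            break
        chosen.append(best_i)
        covered |= sets[best_i]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, 2)
```

On this toy instance the greedy first takes the 4-element set, then the 3-element set, covering the whole universe.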
Maximum Coverage in Sublinear Space, Faster
Given a collection of m sets from a universe U, the Maximum Set Coverage problem consists of finding k sets whose union has the largest cardinality. This problem is NP-hard, but its solution can be approximated up to a factor 1-1/e by a polynomial-time algorithm. However, this algorithm does not scale well with the input size.
In a streaming context, practical high-quality solutions can be found, but with a space complexity that scales linearly with the size of the universe n = |U|. However, one randomized streaming algorithm has been shown to produce a (1-1/e-epsilon) approximation of the optimal solution with a space complexity that scales only poly-logarithmically with m and n. To achieve such a low space complexity, the authors used two techniques in their multi-pass approach:
- F0-sketching allows one to estimate, with great accuracy, the number of distinct elements in a set using less space than the set itself.
- Subsampling consists of solving the problem only on a subspace of the universe. It is implemented using Theta(epsilon^{-2} k log m)-independent hash functions.
This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. First, we give optimizations that do not alter the space complexity, number of passes, or approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of Theta(epsilon^{-2} k log m) can be fine-tuned to Theta(k log m); we also show how F0-sketching can be removed. Second, we derive a new lower bound on the probability of producing a (1-1/e-epsilon) approximation using only pairwise independence: 1 - 4/(c k log m), compared to 1 - 2e/m^{ck/6} with Theta(k log m)-independence.
Although the theoretical guarantees are weaker, suggesting the approximation quality could suffer on large streams, our algorithms perform well in practice. Finally, our experimental results show that even a pairwise-independent hash-function sampler does not produce worse solutions than the original algorithm, while running several orders of magnitude faster.
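The subsampling technique above can be sketched with a pairwise-independent family of the form h(x) = ((a*x + b) mod p) mod 2^j, keeping only the elements that hash to zero. All constants and names below are illustrative, not taken from the paper:

```python
import random

# Subsampling via a pairwise-independent hash family:
# h(x) = ((a*x + b) mod p) mod 2**j.  Elements with h(x) == 0 survive,
# so each set shrinks by roughly a factor 2**j while the relative
# coverage of candidate solutions is approximately preserved.

P = 2_147_483_647  # a Mersenne prime larger than the universe size

def make_pairwise_hash(j, rng):
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) % (2 ** j)

def subsample(sets, j, rng):
    """Restrict every set to the elements that hash to 0."""
    h = make_pairwise_hash(j, rng)
    return [{x for x in s if h(x) == 0} for s in sets]

rng = random.Random(7)
sets = [set(range(i, i + 64)) for i in range(0, 256, 64)]
small = subsample(sets, 2, rng)  # keep roughly 1/4 of the universe
```

Solving the coverage problem on the subsampled instance and rescaling is what lets the space depend on the sample size rather than on n.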
Final report on the evaluation of RRM/CRRM algorithms
Public deliverable of the EVEREST project. This deliverable provides a definition and a complete evaluation of the RRM/CRRM algorithms selected in D11 and D15, and evolved and refined in an iterative process. The evaluation is carried out by means of simulations using the simulators provided in D07 and D14. Preprint
Maximum Coverage in the Data Stream Model: Parameterized and Generalized
We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems is a collection of m subsets of a universe of size n and a value k. In Max-Cover, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are:
If the sets have size at most d, there exist single-pass algorithms, using space that depends only on d up to polylogarithmic factors, that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d.
If each element appears in at most r sets, we present single-pass algorithms, using space parameterized by r, that return a constant-factor approximation in the case of Max-Cover. We also present a single-pass algorithm, using slightly more memory, that approximates Max-Unique-Cover.
In contrast to the above results, when d and r are arbitrary, any constant-pass approximation algorithm for either problem requires space polynomial in m, but a single-pass algorithm using space roughly linear in m exists. In fact, any constant-pass algorithm with an approximation better than a fixed constant for Max-Cover and Max-Unique-Cover, respectively, requires such space when d and r are unrestricted.
En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem. Comment: conference version to appear at ICDT 2021
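The two objectives can be made concrete with brute-force reference solvers that enumerate all k-subsets. These are exponential in m, so they serve only as a correctness baseline for tiny instances, not as the streaming algorithms from the paper; all names are illustrative:

```python
from itertools import combinations
from collections import Counter

# Reference solvers for the two objectives: Max-Cover counts elements
# covered at least once, Max-Unique-Cover counts elements covered
# exactly once.

def max_cover(sets, k):
    return max(
        (len(set().union(*combo)) for combo in combinations(sets, k)),
        default=0,
    )

def max_unique_cover(sets, k):
    best = 0
    for combo in combinations(sets, k):
        counts = Counter(x for s in combo for x in s)
        best = max(best, sum(1 for c in counts.values() if c == 1))
    return best

sets = [{1, 2, 3}, {3, 4}, {2, 3, 4}]
```

On this instance with k = 2 the optima differ: two sets can cover 4 elements, but at most 3 of them uniquely, which is exactly the gap between the two problems.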
Real-Time Scheduling for Content Broadcasting in LTE
Broadcasting capabilities are one of the most promising features of upcoming LTE-Advanced networks. However, the task of scheduling broadcasting sessions is far from trivial, since it affects the available resources of several contiguous cells as well as the amount of resources that can be devoted to unicast traffic. In this paper, we present a compact, convenient model for broadcasting in LTE, as well as a set of efficient algorithms to define broadcasting areas and to actually perform content scheduling. We study the performance of our algorithms in a realistic scenario, deriving interesting insights into the possible trade-offs between effectiveness and computational efficiency.
The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and achieves provable, constant-factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency on large problems with different kinds of constraints, with objective values always close to what is achievable in the centralized setting.
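The embarrassingly parallel pattern analyzed here, randomly partitioning the data, running greedy independently on each part, then running greedy once more on the pooled candidates, can be sketched for a coverage objective. This is a minimal illustration under assumed names, not the authors' implementation:

```python
import random

# Two-round distributed greedy for a cardinality-constrained coverage
# objective: round 1 runs greedy in parallel on random parts, round 2
# runs greedy on the union of the per-machine solutions.

def greedy(items, k, covered=frozenset()):
    covered = set(covered)
    picked = []
    for _ in range(k):
        best = max(items, key=lambda s: len(s - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left with positive marginal gain
        picked.append(best)
        covered |= best
        items = [s for s in items if s is not best]
    return picked

def distributed_greedy(items, k, machines, rng):
    parts = [[] for _ in range(machines)]
    for s in items:
        parts[rng.randrange(machines)].append(s)       # random partition
    pooled = [s for part in parts for s in greedy(part, k)]  # round 1
    return greedy(pooled, k)                           # round 2, one machine

rng = random.Random(0)
items = [set(range(i, i + 10)) for i in range(0, 100, 5)]
solution = distributed_greedy(items, 3, 4, rng)
```

The random partition is the key ingredient: it is what allows the two-round scheme to retain a constant-factor guarantee in expectation, which adversarial partitions do not.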
High-Dimensional Geometric Streaming in Polynomial Space
Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once the dimension d exceeds log n, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and L\"owner-John ellipsoids of n points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to poly(d, log n) bits of space by trading off with a poly(d, log n) factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for l_infinity subspace embeddings with poly(d, log n) space and poly(d, log n) distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log(condition number) dependence with a log n dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry.
For l_p subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using poly(d, log n) space and poly(d, log n) distortion, whereas previous deterministic algorithms incurred a factor exponential in d in the space or the distortion [CDW18].
Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic. Comment: abstract shortened to meet arXiv limits; v2 fixes statements concerning the online condition number
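Leverage scores, the object connecting numerical linear algebra and geometry above, are easy to illustrate offline: the i-th score is a_i^T (A^T A)^{-1} a_i, the importance of row i for the row space of A. A tiny pure-Python sketch for a two-column matrix, using the closed-form 2x2 inverse (real implementations use QR or SVD; the function name is illustrative):

```python
# Leverage scores of an n x 2 matrix A, given as a list of row tuples.
# The scores sum to rank(A) and bound each row's influence on the
# row space; high-leverage rows are the ones a coreset must keep.

def leverage_scores_2col(A):
    # Gram matrix G = A^T A for a two-column A
    g00 = sum(r[0] * r[0] for r in A)
    g01 = sum(r[0] * r[1] for r in A)
    g11 = sum(r[1] * r[1] for r in A)
    det = g00 * g11 - g01 * g01  # assumes A has full column rank
    inv = ((g11 / det, -g01 / det), (-g01 / det, g00 / det))
    scores = []
    for x, y in A:
        scores.append(
            x * (inv[0][0] * x + inv[0][1] * y)
            + y * (inv[1][0] * x + inv[1][1] * y)
        )
    return scores

A = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
scores = leverage_scores_2col(A)
```

For this symmetric example every row has score 2/3, and the scores sum to 2, the rank of A.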