Sketching and streaming algorithms for processing massive data
The rate at which electronic information is generated in the world is exploding. In this article we explore techniques known as sketching and streaming for processing massive data both quickly and memory-efficiently.
Engineering and Applied Science
Sketching Cuts in Graphs and Hypergraphs
Sketching and streaming algorithms are in the forefront of current research
directions for cut problems in graphs. In the streaming model, we show that
-approximation for Max-Cut must use space;
moreover, beating -approximation requires polynomial space. For the
sketching model, we show that -uniform hypergraphs admit a
-cut-sparsifier (i.e., a weighted subhypergraph that
approximately preserves all the cuts) with
edges. We also make first steps towards sketching general CSPs (Constraint
Satisfaction Problems)
Almost Optimal Streaming Algorithms for Coverage Problems
Maximum coverage and minimum set cover problems --collectively called
coverage problems-- have been studied extensively in streaming models. However,
previous research not only achieves sub-optimal approximation factors and space
complexities, but also studies a restricted set arrival model which makes an
explicit or implicit assumption on oracle access to the sets, ignoring the
complexity of reading and storing the whole set at once. In this paper, we
address the above shortcomings, and present algorithms with improved
approximation factor and improved space complexity, and prove that our results
are almost tight. Moreover, unlike most previous work, our results hold on a
more general edge arrival model. More specifically, we present (almost) optimal
approximation algorithms for maximum coverage and minimum set cover problems in
the streaming model with an (almost) optimal space complexity of
, i.e., the space is independent of the size of the sets or
the size of the ground set of elements. These results not only improve over
the best known algorithms for the set arrival model, but also are the first
such algorithms for the more powerful edge arrival model. In order to
achieve the above results, we introduce a new general sketching technique for
coverage functions: This sketching scheme can be applied to convert an
α-approximation algorithm for a coverage problem to a
(1-ε)α-approximation algorithm for the same problem in streaming, or
RAM models. We show the significance of our sketching technique by ruling out
the possibility of solving coverage problems via accessing (as a black box) a
(1 ± ε)-approximate oracle (e.g., a sketch function) that estimates the
coverage function on any subfamily of the sets
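
To make the edge arrival setting in the abstract above concrete, here is a minimal sketch in Python. It is not the paper's construction: it assumes a simple hash-based subsampling of ground-set elements (the names `keep`, `sketch_edges`, and `greedy_max_coverage`, and the sampling rate `p`, are hypothetical) and then runs the standard greedy algorithm for maximum coverage on the reduced instance.

```python
import hashlib


def keep(element, p):
    """Decide, via a fixed hash, whether an element survives subsampling.

    Assumption for illustration: each element is kept with probability ~p,
    and the decision is the same every time the element appears.
    """
    digest = hashlib.sha256(str(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < p


def sketch_edges(edge_stream, p):
    """Process (set_id, element) pairs one at a time (edge arrival).

    Only edges whose element survives subsampling are stored, so a set
    never has to be read or held in memory as a whole.
    """
    sketch = {}
    for set_id, element in edge_stream:
        if keep(element, p):
            sketch.setdefault(set_id, set()).add(element)
    return sketch


def greedy_max_coverage(sets, k):
    """Standard greedy: repeatedly pick the set covering most new elements."""
    covered, chosen = set(), []
    for _ in range(k):
        candidates = [s for s in sets if s not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda s: len(sets[s] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Edges of three sets arrive interleaved, in arbitrary order.
edges = [("A", 1), ("B", 2), ("A", 2), ("C", 3), ("B", 3), ("A", 3)]
reduced = sketch_edges(edges, p=0.5)
chosen, _ = greedy_max_coverage(reduced, k=1)
```

Whether such a reduced instance preserves the approximation guarantee depends on details (sampling rate, degree caps, analysis) that the abstract does not state, so this only illustrates the data flow.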
Fully decentralized computation of aggregates over data streams
In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is coping with duplicate detections. These arise both from the observation of the same event at several nodes of the network and from receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that locally answer continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. The proposed algorithms use sublinear space at every node, either in the worst case or on realistic distributions. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. We experimentally evaluate the efficiency and accuracy of our algorithms on realistic simulated scenarios.
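
The duplicate-detection problem described in this abstract can be illustrated with a duplicate-insensitive distinct-count sketch. The Python example below uses a k-minimum-values (KMV) sketch, which is one standard technique and not necessarily the one the paper uses; the class name `KMVSketch`, the helper `unit_hash`, and the default `k` are assumptions made for illustration. Because merging is a set union followed by truncation, seeing the same event at several nodes, or receiving the same summary along several paths, leaves the result unchanged.

```python
import hashlib


def unit_hash(item):
    """Map an item to a pseudo-random float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Keeps the k smallest hash values seen; space is O(k) per node."""

    def __init__(self, k=64):
        self.k = k
        self.values = set()

    def _trim(self):
        if len(self.values) > self.k:
            self.values = set(sorted(self.values)[: self.k])

    def add(self, item):
        # Re-adding an item inserts the same hash value, so it is a no-op.
        self.values.add(unit_hash(item))
        self._trim()

    def merge(self, other):
        # Union is idempotent: merging the same summary twice changes nothing.
        self.values |= other.values
        self._trim()

    def estimate(self):
        if len(self.values) < self.k:
            return float(len(self.values))  # fewer than k distinct items: exact
        return (self.k - 1) / max(self.values)


# Two nodes observe overlapping events, then exchange summaries.
node_a, node_b = KMVSketch(), KMVSketch()
for i in range(10):
    node_a.add(f"event-{i}")
for i in range(5, 15):
    node_b.add(f"event-{i}")
node_a.merge(node_b)
node_a.merge(node_b)  # the same summary arriving via a second path
```

The total event count and the second frequency moment mentioned in the abstract are not duplicate-insensitive in this simple way and are not covered by this example.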
Approximate F_2-Sketching of Valuation Functions
We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function f : F_2^n -> R with small expected squared error. We develop a general theory of linear sketching for such functions through which we analyze their dimension for most commonly studied types of valuation functions: additive, budget-additive, coverage, alpha-Lipschitz submodular and matroid rank functions. This gives a characterization of how many bits of information have to be stored about the input x so that one can compute f under additive updates to its coordinates.
Our results are tight in most cases and we also give extensions to the distributional version of the problem where the input x in F_2^n is generated uniformly at random. Using known connections with dynamic streaming algorithms, both upper and lower bounds on dimension obtained in our work extend to the space complexity of algorithms evaluating f(x) under long sequences of additive updates to the input x presented as a stream. Similar results hold for simultaneous communication in a distributed setting
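
As a rough illustration of what a linear sketch over F_2 under additive updates looks like, the Python example below stores M·x (mod 2) for a k×n binary matrix M. The random choice of M, the class name `F2LinearSketch`, and its parameters are assumptions for illustration; the abstract does not say how the sketch is constructed, and recovering an approximation of f from the sketch is not shown here.

```python
import random


class F2LinearSketch:
    """Maintains the k-bit vector M x over F_2 for an implicit x in F_2^n."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.k = k
        # columns[i] is column i of M, packed into a k-bit integer.
        self.columns = [rng.getrandbits(k) for _ in range(n)]
        self.state = 0  # sketch of the all-zero input

    def update(self, i):
        """Additive update x_i <- x_i + 1 (mod 2): XOR in column i."""
        self.state ^= self.columns[i]

    def sketch_of(self, x):
        """Compute M x directly from a full 0/1 vector, for comparison."""
        s = 0
        for i, bit in enumerate(x):
            if bit:
                s ^= self.columns[i]
        return s


# A stream of coordinate updates; coordinate 3 is flipped twice and cancels.
sk = F2LinearSketch(n=8, k=4)
for i in [0, 3, 3, 5]:
    sk.update(i)
final_x = [1, 0, 0, 0, 0, 1, 0, 0]
same = sk.state == sk.sketch_of(final_x)
```

By linearity, the stored k bits depend only on the final input x and not on the order or length of the update sequence, which is the property that connects linear sketches to dynamic streaming algorithms.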