Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover
Set cover, over a universe of size $n$, may be modelled as a data-streaming
problem, where the $m$ sets that comprise the instance are to be read one by
one. A semi-streaming algorithm is allowed only $O(n\,\mathrm{poly}\{\log n, \log m\})$
space to process this stream. For each $p \ge 1$, we give a very
simple deterministic algorithm that makes $p$ passes over the input stream and
returns an appropriately certified $(p+1)n^{1/(p+1)}$-approximation to the
optimum set cover. More importantly, we proceed to show that this approximation
factor is essentially tight, by showing that a factor better than
$0.99\,n^{1/(p+1)}/(p+1)^2$ is unachievable for a $p$-pass semi-streaming
algorithm, even allowing randomisation. In particular, this implies that
achieving a $\Theta(\log n)$-approximation requires $\Omega(\log n/\log\log n)$
passes, which is tight up to the $\log\log n$ factor. These results extend to a
relaxation of the set cover problem where we are allowed to leave an
$\varepsilon$ fraction of the universe uncovered: the tight bounds on the best
approximation factor achievable in $p$ passes turn out to be
$\Theta_p(\min\{n^{1/(p+1)}, \varepsilon^{-1/p}\})$. Our lower bounds are based
on a construction of a family of high-rank incidence geometries, which may be
thought of as vast generalisations of affine planes. This construction, based
on algebraic techniques, appears flexible enough to find other applications and
is therefore interesting in its own right.
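To make the multi-pass setting concrete, here is a minimal threshold-greedy sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm as published: the universe is taken to be `{0, ..., n-1}`, the stream is a list that can be re-read, and the per-pass threshold `n ** (1 - j / (p + 1))` is an assumed schedule. The function name `multipass_set_cover` is hypothetical.

```python
from typing import Dict, List, Set


def multipass_set_cover(stream: List[Set[int]], n: int, p: int) -> List[int]:
    """Return indices of a set cover found with p passes over `stream`.

    Assumes the sets in `stream` jointly cover the universe {0, ..., n-1}.
    Working memory is the uncovered set, the chosen indices, and one
    "witness" set index per element, i.e. roughly O(n) words.
    """
    uncovered: Set[int] = set(range(n))
    chosen: List[int] = []
    witness: Dict[int, int] = {}  # element -> index of some set containing it

    for j in range(1, p + 1):
        # Assumed threshold schedule: it shrinks geometrically with each pass.
        threshold = n ** (1 - j / (p + 1))
        for idx, s in enumerate(stream):
            if j == 1:
                # Remember one covering set per element during the first pass.
                for e in s:
                    witness.setdefault(e, idx)
            gain = s & uncovered
            if gain and len(gain) >= threshold:
                chosen.append(idx)
                uncovered -= gain

    # Elements still uncovered after the last pass get their witness set.
    chosen.extend(sorted({witness[e] for e in uncovered}))
    return chosen
```

A set chosen during a pass covers all of its then-uncovered elements, so a witness of a leftover element is never already in the solution and no index is added twice.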
Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering
Graph clustering, or community detection, is the task of identifying groups
of closely related objects in a large network. In this paper we introduce a new
community-detection framework called LambdaCC that is based on a specially
weighted version of correlation clustering. A key component in our methodology
is a clustering resolution parameter, $\lambda$, which implicitly controls the
size and structure of clusters formed by our framework. We show that, by
increasing this parameter, our objective effectively interpolates between two
different strategies in graph clustering: finding a sparse cut and forming
dense subgraphs. Our methodology unifies and generalizes a number of other
important clustering quality functions including modularity, sparsest cut, and
cluster deletion, and places them all within the context of an optimization
problem that has been well studied from the perspective of approximation
algorithms. Our approach is particularly relevant in the regime of finding
dense clusters, as it leads to a 2-approximation for the cluster deletion
problem. We use our approach to cluster several graphs, including large
collaboration networks and social networks.
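The interpolation controlled by the resolution parameter can be illustrated with a small Python sketch. The weighting used here is an assumption, since the abstract does not spell out the objective: every edge is treated as a "similar" pair of weight `1 - lam` and every non-adjacent pair as a "dissimilar" pair of weight `lam`, and the cost of a clustering is the total weight of violated pairs. The function name `lambda_cc_cost` is hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def lambda_cc_cost(
    nodes: Sequence[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    labels: Dict[Hashable, int],
    lam: float,
) -> float:
    """Correlation-clustering disagreement cost under the assumed weighting.

    A cut edge costs (1 - lam); a non-adjacent pair placed in the same
    cluster costs lam.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        if frozenset((u, v)) in edge_set:
            if not same_cluster:
                cost += 1 - lam  # similar pair separated
        elif same_cluster:
            cost += lam  # dissimilar pair merged
    return cost
```

Under this weighting, a small `lam` makes merging cheap and favours few large clusters, while a large `lam` penalises every missing edge inside a cluster and favours dense groups.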
Precedence-Constrained Min Sum Set Cover
We introduce a version of the Min Sum Set Cover (MSSC) problem in which there are "AND" precedence constraints on the m sets. In the Precedence-Constrained Min Sum Set Cover (PCMSSC) problem, when interpreted as directed edges, the constraints induce an acyclic directed graph. PCMSSC models the aim of scheduling software tests to prioritize the rate of fault detection subject to dependencies between tests.
Our greedy scheme for PCMSSC is similar to the approaches of Feige, Lovász, and Tetali for MSSC, and Chekuri and Motwani for precedence-constrained scheduling to minimize weighted completion time. With a factor-4 increase in approximation ratio, we reduce PCMSSC to the problem of finding a maximum-density precedence-closed sub-family of sets, where density is the ratio of sub-family union size to cardinality. We provide a greedy factor-sqrt(m) algorithm for maximizing density; on forests of in-trees, we show this algorithm finds an optimal solution. Harnessing an alternative greedy argument of Chekuri and Kumar for Maximum Coverage with Group Budget Constraints, on forests of out-trees, we design an algorithm with approximation ratio equal to maximum tree height.
Finally, with a reduction from the Planted Dense Subgraph detection problem, we show that its conjectured hardness implies there is no polynomial-time algorithm for PCMSSC with approximation factor in O(m^{1/12-epsilon}).
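The density notion used in the reduction (union size divided by the number of sets, over precedence-closed sub-families) can be checked on tiny instances with the following Python sketch. It is an exhaustive search meant only to illustrate the definition, not the greedy algorithm from the abstract. The encoding of a constraint as a pair `(a, b)`, meaning set `a` must precede set `b`, and all function names are assumptions.

```python
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple


def is_precedence_closed(family: Set[int], precedes: Iterable[Tuple[int, int]]) -> bool:
    """True if every predecessor of a chosen set is also chosen."""
    return all(a in family for a, b in precedes if b in family)


def density(family: Set[int], sets: Dict[int, Set[int]]) -> float:
    """Union size of the sub-family divided by its cardinality."""
    union = set().union(*(sets[i] for i in family))
    return len(union) / len(family)


def densest_closed_subfamily(
    sets: Dict[int, Set[int]], precedes: Iterable[Tuple[int, int]]
) -> Tuple[Optional[Set[int]], float]:
    """Exhaustively search all non-empty precedence-closed sub-families.

    Exponential in the number of sets; intended for toy instances only.
    """
    constraints = list(precedes)
    best, best_density = None, -1.0
    ids = sorted(sets)
    for r in range(1, len(ids) + 1):
        for combo in combinations(ids, r):
            family = set(combo)
            if is_precedence_closed(family, constraints):
                d = density(family, sets)
                if d > best_density:
                    best, best_density = family, d
    return best, best_density
```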
On Optimal Arrangements of Binary Sensors
A large range of monitoring applications can benefit from binary sensor networks. Binary sensors can detect the presence or absence of a particular target in their sensing regions. They can be used to partition a monitored area and provide localization functionality. If many of these sensors are deployed to monitor an area, the area is partitioned into sub-regions: each sub-region is characterized by the sensors detecting targets within it. We aim to maximize the number of unique, distinguishable sub-regions. Our goal is an optimal placement of both omni-directional and directional static binary sensors. We compute an upper bound on the number of unique sub-regions, which grows quadratically with respect to the number of sensors. In particular, we propose arrangements of sensors within a monitored area whose number of unique sub-regions is asymptotically equivalent to the upper bound.
On Approximating Target Set Selection
We study the Target Set Selection (TSS) problem introduced by Kempe, Kleinberg, and Tardos (2003). This problem models the propagation of influence in a network, in a sequence of rounds. A set of nodes is made "active" initially. In each subsequent round, a vertex is activated if at least a certain number of its neighbors are (already) active. In the minimization version, the goal is to activate a small set of vertices initially - a seed, or target, set - so that activation spreads to the entire graph. In the absence of a sublinear-factor algorithm for the general version, we provide a (sublinear) approximation algorithm for the bounded-round version, where the goal is to activate all the vertices in r rounds. Assuming a known conjecture on the hardness of Planted Dense Subgraph, we establish hardness-of-approximation results for the bounded-round version. We show that they translate to general Target Set Selection, leading to a hardness factor of n^(1/2-epsilon) for all epsilon > 0. This is the first polynomial hardness result for Target Set Selection, and the strongest conditional result known for a large class of monotone satisfiability problems. In the maximization version of TSS, the goal is to pick a target set of size k so as to maximize the number of nodes eventually active. We show an n^(1-epsilon) hardness result for the undirected maximization version of the problem, thus establishing that the undirected case is as hard as the directed case. Finally, we demonstrate an SETH lower bound for the exact computation of the optimal seed set.
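The round-based activation process described above can be simulated directly. The Python sketch below assumes an undirected graph given as an adjacency dictionary and a per-vertex threshold dictionary. The function name `spread` and the optional `max_rounds` cap (used to mimic the bounded-round version) are hypothetical choices for illustration.

```python
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple


def spread(
    adj: Dict[Hashable, Iterable[Hashable]],
    threshold: Dict[Hashable, int],
    seeds: Iterable[Hashable],
    max_rounds: Optional[int] = None,
) -> Tuple[Set[Hashable], int]:
    """Simulate threshold activation; return (active vertices, rounds used).

    In each round, an inactive vertex becomes active if at least
    threshold[v] of its neighbours were active at the start of the round.
    """
    active = set(seeds)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        newly_active = {
            v
            for v in adj
            if v not in active
            and sum(1 for u in adj[v] if u in active) >= threshold[v]
        }
        if not newly_active:
            break  # the process has stabilised
        active |= newly_active
        rounds += 1
    return active, rounds
```

A seed set is a valid target set for the minimization version exactly when the returned active set contains every vertex.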
Maximum Coverage in Sublinear Space, Faster
Given a collection of m sets from a universe U, the Maximum Set Coverage problem consists of finding k sets whose union has largest cardinality. This problem is NP-hard, but the solution can be approximated by a polynomial-time algorithm up to a factor 1-1/e. However, this algorithm does not scale well with the input size.
In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe n = |U|. However, one randomized streaming algorithm has been shown to produce a 1-1/e-ε approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to m and n. In order to achieve such a low space complexity, the authors used two techniques in their multi-pass approach:
- F₀-sketching, which estimates with high accuracy the number of distinct elements in a set using less space than the set itself.
- Subsampling, which solves the problem on only a subspace of the universe. It is implemented using hash functions of limited independence.
This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. Firstly, we give some optimizations that do not alter the space complexity, number of passes, or approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of Ω(ε^{-2} k log m) can be fine-tuned to Ω(k log m); we also show how F₀-sketching can be removed. Secondly, we derive a new lower bound for the probability of producing a 1-1/e-ε approximation using only pairwise independence: 1-(4/(c k log m)), compared to 1-(2e/(m^{ck/6})) with Ω(k log m)-independence.
Although the theoretical guarantees are weaker, suggesting that approximation quality would suffer on large streams, our algorithms perform well in practice. Finally, our experimental results show that even a pairwise-independent hash-function sampler produces solutions no worse than those of the original algorithm, while running several orders of magnitude faster.
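For reference, the classical offline greedy method that attains the 1-1/e factor mentioned at the start of this abstract can be sketched in a few lines of Python. This is the textbook baseline, not the sublinear-space streaming algorithm the article optimizes. The function name `greedy_max_coverage` is hypothetical.

```python
from typing import List, Set, Tuple


def greedy_max_coverage(sets: List[Set[int]], k: int) -> Tuple[List[int], Set[int]]:
    """Pick up to k sets, each time taking the one with the largest marginal gain.

    Returns (indices of chosen sets, elements covered). Ties are broken in
    favour of the lowest index.
    """
    covered: Set[int] = set()
    chosen: List[int] = []
    for _ in range(min(k, len(sets))):
        remaining = [i for i in range(len(sets)) if i not in chosen]
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

Each round rescans every remaining set against the covered elements, which is the kind of cost that becomes prohibitive on large inputs and motivates streaming alternatives.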
Tight Data Access Bounds for Private Top-$k$ Selection
We study the top-$k$ selection problem under the differential privacy model:
$m$ items are rated according to votes of a set of clients. We consider a
setting in which algorithms can retrieve data via a sequence of accesses, each
either a random access or a sorted access; the goal is to minimize the total
number of data accesses. Our algorithm requires only $O(\sqrt{mk})$ expected
accesses: to our knowledge, this is the first sublinear data-access upper bound
for this problem. Our analysis also shows that the well-known exponential
mechanism requires only $O(\sqrt{m})$ expected accesses. Accompanying this, we
develop the first lower bounds for the problem, in three settings: only random
accesses; only sorted accesses; a sequence of accesses of either kind. We show
that, to avoid $\Omega(m)$ access cost, supporting *both* kinds of access is
necessary, and that in this case our algorithm's access cost is optimal.
Result-Sensitive Binary Search with Noisy Information
We describe new algorithms for the predecessor problem in the Noisy Comparison Model. In this problem, given a sorted list L of n (distinct) elements and a query q, we seek the predecessor of q in L: denoted by u, the largest element less than or equal to q. In the Noisy Comparison Model, the result of a comparison between two elements is non-deterministic. Moreover, multiple comparisons of the same pair of elements might have different results: each is generated independently, and is correct with probability p > 1/2. Given an overall error tolerance Q, the cost of an algorithm is measured by the total number of noisy comparisons; these must guarantee the predecessor is returned with probability at least 1 - Q. Feige et al. showed that predecessor queries can be answered by a modified binary search with Theta(log (n/Q)) noisy comparisons.
We design result-sensitive algorithms for answering predecessor queries. The query cost is related to the index, k, of the predecessor u in L. Our first algorithm answers predecessor queries with O(log ((log^{*(c)} n)/Q) + log (k/Q)) noisy comparisons, for an arbitrarily large constant c. The function log^{*(c)} n iterates c times the iterated-logarithm function, log^* n. Our second algorithm is a genuinely result-sensitive algorithm whose expected query cost is bounded by O(log (k/Q)), and is guaranteed to terminate after at most O(log((log n)/Q)) noisy comparisons.
Our results strictly improve the state-of-the-art bounds when k is in omega(1) intersected with o(n^epsilon), where epsilon > 0 is some constant. Moreover, we show that our result-sensitive algorithms immediately improve not only predecessor-query algorithms, but also binary-search-like algorithms for solving key applications.
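The idea of result sensitivity, where cost depends on the index k of the answer rather than on n, can be illustrated in the noiseless setting with a doubling search followed by a binary search. The Python sketch below is only that simplified illustration: it assumes exact comparisons and is not one of the noisy-comparison algorithms from the abstract. The function name `predecessor_index` is hypothetical.

```python
from typing import List, Tuple


def predecessor_index(L: List[int], q: int) -> Tuple[int, int]:
    """Return (index of the largest element <= q, number of comparisons).

    The index is -1 if every element exceeds q. L must be sorted.
    Comparisons are exact here; the noisy model would need repetition.
    """
    comparisons = 0

    def leq(i: int) -> bool:
        nonlocal comparisons
        comparisons += 1
        return L[i] <= q

    n = len(L)
    if n == 0 or not leq(0):
        return -1, comparisons

    # Doubling phase: keep the invariant L[lo] <= q while growing the step.
    lo, step = 0, 1
    while lo + step < n and leq(lo + step):
        lo += step
        step *= 2
    hi = min(lo + step, n)  # either hi == n or L[hi] > q

    # Binary search inside (lo, hi), whose length is O(k).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if leq(mid):
            lo = mid
        else:
            hi = mid
    return lo, comparisons
```

Both phases use a number of comparisons logarithmic in the answer's index, so queries whose predecessor sits near the front of the list are cheap even when the list is very long.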
Recency Queries with Succinct Representation
In the context of the sliding-window set membership problem, and caching policies that require knowledge of item recency, we formalize the problem of Recency on a stream. Informally, the query asks, "when was the last time I saw item x?" Existing structures, such as hash tables, can support a recency query by augmenting item occurrences with timestamps. To support recency queries on a window of W items, this might require Θ(W log W) bits.
We propose a succinct data structure for Recency. By combining sliding-window dictionaries in a hierarchical structure, and careful design of the underlying hash tables, we achieve a data structure that returns a 1+ε approximation to the recency of every item in O(log(εW)) time, in only (1+o(1))(1+ε)(B+W log(ε^(-1))) bits. Here, B is the information-theoretic lower bound on the number of bits for a set of size W, in a universe of cardinality N.
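The timestamp-augmented hash table mentioned as the existing baseline can be sketched as follows in Python. This is an assumed, straightforward rendering of that baseline (exact recency, one stored timestamp per item in the window), not the succinct structure proposed in the abstract. The class name `TimestampRecency` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Optional, Tuple


class TimestampRecency:
    """Exact recency over a sliding window of the last `window` arrivals."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.clock = 0
        self.last_seen: Dict[Hashable, int] = {}
        self.arrivals: Deque[Tuple[int, Hashable]] = deque()

    def insert(self, item: Hashable) -> None:
        """Record an arrival and evict entries that left the window."""
        self.clock += 1
        self.last_seen[item] = self.clock
        self.arrivals.append((self.clock, item))
        while self.arrivals and self.arrivals[0][0] <= self.clock - self.window:
            t, old = self.arrivals.popleft()
            # Delete only if the item has not been seen again since time t.
            if self.last_seen.get(old) == t:
                del self.last_seen[old]

    def recency(self, item: Hashable) -> Optional[int]:
        """Arrivals since `item` was last seen (0 = most recent), or None."""
        t = self.last_seen.get(item)
        return None if t is None else self.clock - t
```

Each stored timestamp needs about log W bits once reduced modulo the window, which is where the W log W term for the baseline comes from.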