
    Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover

    Set cover, over a universe of size $n$, may be modelled as a data-streaming problem, where the $m$ sets that comprise the instance are to be read one by one. A semi-streaming algorithm is allowed only $O(n\,\mathrm{poly}\{\log n, \log m\})$ space to process this stream. For each $p \ge 1$, we give a very simple deterministic algorithm that makes $p$ passes over the input stream and returns an appropriately certified $(p+1)n^{1/(p+1)}$-approximation to the optimum set cover. More importantly, we proceed to show that this approximation factor is essentially tight, by showing that a factor better than $0.99\,n^{1/(p+1)}/(p+1)^2$ is unachievable for a $p$-pass semi-streaming algorithm, even allowing randomisation. In particular, this implies that achieving a $\Theta(\log n)$-approximation requires $\Omega(\log n/\log\log n)$ passes, which is tight up to the $\log\log n$ factor. These results extend to a relaxation of the set cover problem where we are allowed to leave an $\varepsilon$ fraction of the universe uncovered: the tight bounds on the best approximation factor achievable in $p$ passes turn out to be $\Theta_p(\min\{n^{1/(p+1)}, \varepsilon^{-1/p}\})$. Our lower bounds are based on a construction of a family of high-rank incidence geometries, which may be thought of as vast generalisations of affine planes. This construction, based on algebraic techniques, appears flexible enough to find other applications and is therefore interesting in its own right.
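    A minimal sketch of how such a $p$-pass threshold greedy can be organised (an illustration only: the thresholds are inferred from the stated $(p+1)n^{1/(p+1)}$ factor, and the paper's certified algorithm is not reproduced here):

```python
def multipass_set_cover(read_stream, universe, p):
    """Sketch of a p-pass threshold greedy for streaming set cover.

    read_stream() re-yields the instance's sets (as Python sets), one
    full pass per call. Pass i keeps any set that covers at least
    n^{(p+1-i)/(p+1)} still-uncovered elements; the final pass also
    remembers one covering set per leftover element as a fallback.
    """
    n = len(universe)
    uncovered = set(universe)
    solution = set()
    fallback = {}  # element -> index of some set containing it
    for i in range(1, p + 1):
        threshold = n ** ((p + 1 - i) / (p + 1))
        for idx, s in enumerate(read_stream()):  # one pass over the stream
            gain = uncovered & s
            if len(gain) >= threshold:
                solution.add(idx)
                uncovered -= gain
            elif i == p:
                for e in gain:
                    fallback.setdefault(e, idx)
    # cover whatever survived all passes, one recorded set per element
    solution.update(fallback[e] for e in uncovered if e in fallback)
    return solution
```

    Only the chosen indices, the uncovered set, and one fallback index per element are stored, which stays within the semi-streaming budget described above.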

    Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

    Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component in our methodology is a clustering resolution parameter, $\lambda$, which implicitly controls the size and structure of clusters formed by our framework. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions, including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
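    The interpolation the abstract describes can be made concrete with the objective itself. A minimal sketch, assuming the common $\lambda$-weighted form in which cutting an edge costs $1-\lambda$ and merging a non-edge costs $\lambda$ (the paper defines the exact LambdaCC weights):

```python
import itertools

def lambda_cc_cost(nodes, edges, clusters, lam):
    """Correlation-clustering disagreement cost with resolution lam:
    a cut edge costs (1 - lam); a non-edge kept inside a cluster costs
    lam. Small lam favours large, sparse-cut-like clusters; large lam
    pushes towards dense, cluster-deletion-like clusters."""
    cluster_of = {v: c for c, group in enumerate(clusters) for v in group}
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in itertools.combinations(list(nodes), 2):
        same = cluster_of[u] == cluster_of[v]
        if frozenset((u, v)) in edge_set:
            cost += 0.0 if same else 1.0 - lam
        else:
            cost += lam if same else 0.0
    return cost
```

    Sweeping lam between 0 and 1 traces out the interpolation between the two clustering strategies described above.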

    Precedence-Constrained Min Sum Set Cover

    We introduce a version of the Min Sum Set Cover (MSSC) problem in which there are "AND" precedence constraints on the $m$ sets. In the Precedence-Constrained Min Sum Set Cover (PCMSSC) problem, the constraints, when interpreted as directed edges, induce an acyclic directed graph. PCMSSC models the aim of scheduling software tests to prioritize the rate of fault detection subject to dependencies between tests. Our greedy scheme for PCMSSC is similar to the approaches of Feige, Lovász, and Tetali for MSSC, and of Chekuri and Motwani for precedence-constrained scheduling to minimize weighted completion time. With a factor-4 increase in the approximation ratio, we reduce PCMSSC to the problem of finding a maximum-density precedence-closed sub-family of sets, where density is the ratio of the sub-family's union size to its cardinality. We provide a greedy factor-$\sqrt{m}$ algorithm for maximizing density; on forests of in-trees, we show this algorithm finds an optimal solution. Harnessing an alternative greedy argument of Chekuri and Kumar for Maximum Coverage with Group Budget Constraints, on forests of out-trees, we design an algorithm with approximation ratio equal to the maximum tree height. Finally, via a reduction from the Planted Dense Subgraph detection problem, we show that its conjectured hardness implies there is no polynomial-time algorithm for PCMSSC with approximation factor in $O(m^{1/12-\varepsilon})$.
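    The two notions the reduction manipulates are easy to state in code. A minimal sketch (the names and representation are ours, not the paper's):

```python
def density(subfamily, sets):
    """Density of a sub-family of set indices: union size / cardinality."""
    if not subfamily:
        return 0.0
    union = set().union(*(sets[i] for i in subfamily))
    return len(union) / len(subfamily)

def is_precedence_closed(subfamily, predecessors):
    """True iff the sub-family contains every AND-predecessor of each
    member, i.e. it is downward closed under the precedence DAG."""
    return all(p in subfamily
               for i in subfamily
               for p in predecessors.get(i, ()))
```

    Roughly, the scheme repeatedly schedules a precedence-closed sub-family of near-maximum density ahead of the remaining sets.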

    On Optimal Arrangements of Binary Sensors

    A large range of monitoring applications can benefit from binary sensor networks. Binary sensors can detect the presence or absence of a particular target in their sensing regions. They can be used to partition a monitored area and provide localization functionality. If many of these sensors are deployed to monitor an area, the area is partitioned into sub-regions: each sub-region is characterized by the set of sensors detecting targets within it. We aim to maximize the number of unique, distinguishable sub-regions, seeking an optimal placement of both omni-directional and directional static binary sensors. We compute an upper bound on the number of unique sub-regions, which grows quadratically with the number of sensors, and propose arrangements of sensors within a monitored area whose number of unique sub-regions is asymptotically equivalent to this upper bound.
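    For intuition on the quadratic growth, consider the special case of omni-directional sensors with circular sensing boundaries in general position (an illustrative arrangement bound, not the paper's exact count): the $s$-th circle crosses the previous $s-1$ circles in at most $2(s-1)$ points, so it adds at most $2(s-1)$ regions, giving

$$R(s) \le R(s-1) + 2(s-1), \quad R(1) = 2 \;\Longrightarrow\; R(s) \le s(s-1) + 2 = O(s^2).$$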

    Maximum Coverage in Sublinear Space, Faster

    Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial-time algorithm up to a factor $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe $n = |\mathcal{U}|$. However, one randomized streaming algorithm has been shown to produce a $1-1/e-\varepsilon$ approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to $m$ and $n$. In order to achieve such a low space complexity, the authors used two techniques in their multi-pass approach:
    - $F_0$-sketching, which determines with great accuracy the number of distinct elements in a set, using less space than the set itself.
    - Subsampling, which solves the problem on a subspace of the universe, implemented using $\Theta(\varepsilon^{-2} k \log m)$-independent hash functions.
    This article focuses on the sublinear-space algorithm and highlights the time cost of these two techniques, especially subsampling. We present optimizations that significantly reduce the time complexity of the algorithm. Firstly, we give some optimizations that do not alter the space complexity, number of passes, or approximation quality of the original algorithm. In particular, we reanalyze the error bounds to show that the original independence factor of $\Theta(\varepsilon^{-2} k \log m)$ can be fine-tuned to $\Theta(k \log m)$; we also show how $F_0$-sketching can be removed. Secondly, we derive a new lower bound for the probability of producing a $1-1/e-\varepsilon$ approximation using only pairwise independence: $1 - 4/(c\,k\log m)$, compared to $1 - 2e/m^{ck/6}$ with $\Theta(k \log m)$-independence. Although the theoretical guarantees are weaker, suggesting the approximation quality could suffer on large streams, our algorithms perform well in practice. Finally, our experimental results show that even a pairwise-independent hash-function sampler produces solutions no worse than the original algorithm's, while running several orders of magnitude faster.
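    The sampler behind the pairwise-independent variant is the textbook two-parameter family. A minimal sketch (the modulus and the subsampling rule are illustrative assumptions, not the article's exact parameters):

```python
import random

def pairwise_independent_hash(p):
    """h(x) = (a*x + b) mod p over a prime field: the standard
    pairwise-independent family."""
    a = random.randrange(1, p)
    b = random.randrange(p)
    return lambda x: (a * x + b) % p

def subsample(elements, keep_bits, p=(1 << 61) - 1):
    """Keep an element iff the low keep_bits bits of its hash vanish,
    i.e. subsample the universe at rate about 2**-keep_bits."""
    h = pairwise_independent_hash(p)
    mask = (1 << keep_bits) - 1
    return [x for x in elements if h(x) & mask == 0]
```

    The experiments in the article compare exactly this kind of lightweight sampler against heavier $\Theta(k \log m)$-independent functions.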

    Maximum Coverage in Random-Arrival Streams


    Tight Data Access Bounds for Private Top-$k$ Selection

    We study the top-$k$ selection problem under the differential privacy model: $m$ items are rated according to votes of a set of clients. We consider a setting in which algorithms can retrieve data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only $O(\sqrt{mk})$ expected accesses: to our knowledge, this is the first sublinear data-access upper bound for this problem. Our analysis also shows that the well-known exponential mechanism requires only $O(\sqrt{m})$ expected accesses. Accompanying this, we develop the first lower bounds for the problem, in three settings: only random accesses; only sorted accesses; a sequence of accesses of either kind. We show that, to avoid $\Omega(m)$ access cost, supporting *both* kinds of access is necessary, and that in this case our algorithm's access cost is optimal.
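    For reference, the exponential mechanism the analysis revisits can be sketched as follows (the standard form with utility sensitivity 1; the paper's access-efficient implementation, not this linear scan, is what achieves the $O(\sqrt{m})$ bound):

```python
import math
import random

def exponential_mechanism(scores, epsilon, sensitivity=1.0):
    """Sample index i with probability proportional to
    exp(epsilon * scores[i] / (2 * sensitivity)); shifting by the
    maximum score keeps the exponentials numerically stable."""
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity))
               for s in scores]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(scores) - 1
```

    A top-$k$ variant can repeat the draw with each winner removed, one standard composition; the paper's contribution is running such selection with sublinearly many sorted and random accesses rather than a full scan.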

    Result-Sensitive Binary Search with Noisy Information

    We describe new algorithms for the predecessor problem in the Noisy Comparison Model. In this problem, given a sorted list $L$ of $n$ distinct elements and a query $q$, we seek the predecessor of $q$ in $L$: the largest element less than or equal to $q$, denoted $u$. In the Noisy Comparison Model, the result of a comparison between two elements is non-deterministic; moreover, repeated comparisons of the same pair of elements might have different results: each is generated independently, and is correct with probability $p > 1/2$. Given an overall error tolerance $Q$, the cost of an algorithm is measured by the total number of noisy comparisons; these must guarantee the predecessor is returned with probability at least $1 - Q$. Feige et al. showed that predecessor queries can be answered by a modified binary search with $\Theta(\log(n/Q))$ noisy comparisons. We design result-sensitive algorithms for answering predecessor queries, whose query cost is related to the index, $k$, of the predecessor $u$ in $L$. Our first algorithm answers predecessor queries with $O(\log((\log^{*(c)} n)/Q) + \log(k/Q))$ noisy comparisons, for an arbitrarily large constant $c$; the function $\log^{*(c)} n$ iterates the iterated-logarithm function, $\log^* n$, $c$ times. Our second algorithm is a genuinely result-sensitive algorithm whose expected query cost is bounded by $O(\log(k/Q))$, and which is guaranteed to terminate after at most $O(\log((\log n)/Q))$ noisy comparisons. Our results strictly improve the state-of-the-art bounds when $k$ is in $\omega(1) \cap o(n^{\varepsilon})$, where $\varepsilon > 0$ is some constant. Moreover, we show that our result-sensitive algorithms immediately improve not only predecessor-query algorithms, but also binary-search-like algorithms for solving key applications.
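    The standard way to tame noisy comparisons is majority vote, and galloping over the list is what ties the cost to the index $k$. A minimal sketch of both ingredients (constants are illustrative, this is not the paper's algorithm, and it assumes $L[0] \le q$ so a predecessor exists):

```python
import math

def reliable_leq(noisy_cmp, q_err):
    """Wrap a noisy comparison that is independently correct with
    probability p > 1/2 so it errs with probability <= q_err, via
    majority vote over O(log(1/q_err)) trials (the constant is
    illustrative and assumes p is bounded away from 1/2)."""
    trials = 4 * math.ceil(math.log2(2.0 / q_err)) + 1
    def leq(x, y):
        return 2 * sum(noisy_cmp(x, y) for _ in range(trials)) > trials
    return leq

def galloping_predecessor(L, q, leq):
    """Doubling search followed by binary search: roughly O(log k)
    reliable comparisons when the predecessor sits at index k."""
    hi = 1
    while hi < len(L) and leq(L[hi], q):
        hi *= 2                         # gallop until we overshoot
    lo, hi = hi // 2, min(hi, len(L)) - 1
    while lo < hi:                      # binary search inside [lo, hi]
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if leq(L[mid], q) else (lo, mid - 1)
    return lo
```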

    Recency Queries with Succinct Representation

    In the context of the sliding-window set membership problem, and caching policies that require knowledge of item recency, we formalize the problem of Recency on a stream. Informally, the query asks, "when was the last time I saw item $x$?" Existing structures, such as hash tables, can support a recency query by augmenting item occurrences with timestamps. To support recency queries on a window of $W$ items, this might require $\Theta(W \log W)$ bits. We propose a succinct data structure for Recency. By combining sliding-window dictionaries in a hierarchical structure, and careful design of the underlying hash tables, we achieve a data structure that returns a $(1+\varepsilon)$ approximation to the recency of every item in $O(\log(\varepsilon W))$ time, in only $(1+o(1))(1+\varepsilon)(\mathcal{B} + W\log(\varepsilon^{-1}))$ bits. Here, $\mathcal{B}$ is the information-theoretic lower bound on the number of bits for a set of size $W$, in a universe of cardinality $N$.
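    The baseline the abstract contrasts with is easy to picture. A minimal sketch of the timestamp-augmented hash table (exact answers, but $\Theta(W \log W)$ bits):

```python
class ExactRecency:
    """Timestamp-per-item baseline: O(1) updates and exact recency
    queries, at the cost of storing a full timestamp per stored item."""
    def __init__(self):
        self.now = 0
        self.last_seen = {}           # item -> time of last occurrence

    def observe(self, x):
        self.last_seen[x] = self.now
        self.now += 1

    def recency(self, x):
        """How many items ago was x last seen? (0 = the latest item.)"""
        if x not in self.last_seen:
            return None
        return self.now - 1 - self.last_seen[x]
```

    The succinct structure in the abstract instead combines hierarchical sliding-window dictionaries, trading exact timestamps for the stated $(1+\varepsilon)$ guarantee at near-optimal space.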