24,469 research outputs found

Testing bounded arboricity

Author: Eden Talya
Levi Reut
Ron Dana
Publication date: 16/07/2017

In this paper we consider the problem of testing whether a graph has bounded arboricity. The family of graphs with bounded arboricity includes, among others, bounded-degree graphs, all minor-closed graph classes (e.g. planar graphs, graphs with bounded treewidth) and randomly generated preferential attachment graphs. Graphs with bounded arboricity have been studied extensively in the past, in particular since for many problems they allow for much more efficient algorithms and/or better approximation ratios. We present a tolerant tester in the sparse-graphs model. The sparse-graphs model allows access to degree queries and neighbor queries, and the distance is defined with respect to the actual number of edges. More specifically, our algorithm distinguishes between graphs that are $\epsilon$-close to having arboricity $\alpha$ and graphs that are $c \cdot \epsilon$-far from having arboricity $3\alpha$, where $c$ is an absolute small constant. The query complexity and running time of the algorithm are $\tilde{O}\left(\frac{n}{\sqrt{m}}\cdot \frac{\log(1/\epsilon)}{\epsilon} + \frac{n\cdot \alpha}{m} \cdot \left(\frac{1}{\epsilon}\right)^{O(\log(1/\epsilon))}\right)$, where $n$ denotes the number of vertices and $m$ denotes the number of edges. In terms of the dependence on $n$ and $m$ this bound is optimal up to poly-logarithmic factors since $\Omega(n/\sqrt{m})$ queries are necessary (and $\alpha = O(\sqrt{m})$). We leave it as an open question whether the dependence on $1/\epsilon$ can be improved from quasi-polynomial to polynomial. Our techniques include an efficient local simulation for approximating the outcome of a global (almost) forest-decomposition algorithm as well as a tailored procedure of edge sampling.
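To make the notion of arboricity concrete, the following is a minimal Python sketch of a global peeling procedure that computes the degeneracy of a graph. It is not the tester from the abstract (which is sublinear and works through degree and neighbor queries); it reads the whole graph. It relies on the standard fact that the degeneracy $d$ of a graph with arboricity $\alpha$ satisfies $\alpha \le d \le 2\alpha - 1$. The function name `degeneracy` and the input format (vertex count plus edge list) are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def degeneracy(n, edges):
    """Return the degeneracy of an undirected simple graph on vertices 0..n-1.

    Repeatedly removes a vertex of minimum remaining degree; the largest
    degree seen at removal time is the degeneracy. This is a global
    algorithm (illustrative only), not a sublinear tester.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    deg = {v: len(adj[v]) for v in range(n)}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)

    removed = set()
    best = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale heap entry, a fresher one exists
        removed.add(v)
        best = max(best, d)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return best
```

For example, a path has degeneracy 1 (arboricity 1), while the complete graph on four vertices has degeneracy 3 and arboricity 2, which matches the bound $d \le 2\alpha - 1$.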

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Testing formula satisfaction

Author: D. Ron
E. Ben-Sasson
E. Fischer
I. Newman
M. Blum
N. Alon
O. Goldreich
R. Rubinfeld
S. Chakraborty
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

We study the query complexity of testing for properties defined by read-once formulae, as instances of massively parametrized properties, and prove several testability and non-testability results. First we prove the testability of any property accepted by a Boolean read-once formula involving any bounded-arity gates, with a number of queries exponential in $\epsilon$ and independent of all other parameters. When the gates are limited to being monotone, we prove that there is an estimation algorithm that outputs an approximation of the distance of the input from satisfying the property. For formulae only involving And/Or gates, we provide a more efficient test whose query complexity is only quasi-polynomial in $\epsilon$. On the other hand we show that such testability results do not hold in general for formulae over non-Boolean alphabets; specifically we construct a property defined by a read-once arity 2 (non-Boolean) formula over alphabets of size 4, such that any 1/4-test for it requires a number of queries depending on the formula size.

Crossref

Birkbeck Institutional Research Online

Metric Clustering and MST with Strong and Weak Distance Oracles

Author: Bateni MohammadHossein
Dharangutte Prathamesh
Jayaram Rajesh
Wang Chen
Publication date: 24/10/2023

We study optimization problems in a metric space $(\mathcal{X},d)$ where we can compute distances in two ways: via a "strong" oracle that returns exact distances $d(x,y)$, and a "weak" oracle that returns distances $\tilde{d}(x,y)$ which may be arbitrarily corrupted with some probability. This model captures the increasingly common trade-off between employing both an expensive similarity model (e.g. a large-scale embedding model), and a less accurate but cheaper model. Hence, the goal is to make as few queries to the strong oracle as possible. We consider both so-called "point queries", where the strong oracle is queried on a set of points $S \subset \mathcal{X}$ and returns $d(x,y)$ for all $x,y \in S$, and "edge queries" where it is queried for individual distances $d(x,y)$. Our main contributions are optimal algorithms and lower bounds for clustering and Minimum Spanning Tree (MST) in this model. For $k$-centers, $k$-median, and $k$-means, we give constant factor approximation algorithms with only $\tilde{O}(k)$ strong oracle point queries, and prove that $\Omega(k)$ queries are required for any bounded approximation. For edge queries, our upper and lower bounds are both $\tilde{\Theta}(k^2)$. Surprisingly, for the MST problem we give an $O(\sqrt{\log n})$ approximation algorithm using no strong oracle queries at all, and a matching $\Omega(\sqrt{\log n})$ lower bound. We empirically evaluate our algorithms, and show that their quality is comparable to that of the baseline algorithms that are given all true distances, but while querying the strong oracle on only a small fraction ($<1\%$) of points.
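The two-oracle access model described in this abstract can be sketched as a small Python class. This is only an illustration of the model under stated assumptions, not any algorithm from the paper: points are taken to be numbers on the real line, corruption is assumed to be fixed per pair (repeating a weak query returns the same value), and a corrupted answer is drawn uniformly from an arbitrary range. The class name `TwoOracleMetric` and all parameters are hypothetical.

```python
import random


class TwoOracleMetric:
    """Points on the real line with a counted strong oracle and a noisy weak oracle."""

    def __init__(self, points, corruption_prob, seed=0):
        self.points = list(points)
        self.corruption_prob = corruption_prob
        self.strong_queries = 0  # cost we want to minimise
        self._rng = random.Random(seed)
        self._weak_cache = {}  # assumption: corruption is persistent per pair

    def _true_distance(self, i, j):
        return abs(self.points[i] - self.points[j])

    def strong(self, i, j):
        """Exact distance d(x, y); every call is counted (an 'edge query')."""
        self.strong_queries += 1
        return self._true_distance(i, j)

    def weak(self, i, j):
        """Cheap distance that is corrupted with probability corruption_prob."""
        key = (min(i, j), max(i, j))
        if key not in self._weak_cache:
            value = self._true_distance(i, j)
            if self._rng.random() < self.corruption_prob:
                # Illustrative corruption: any value in a wide range.
                span = max(self.points) - min(self.points)
                value = self._rng.uniform(0.0, 10.0 * span + 1.0)
            self._weak_cache[key] = value
        return self._weak_cache[key]
```

An algorithm in this model would call `weak` freely and be judged by the final value of `strong_queries`.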

arXiv.org e-Print Archive

On the Complexity of Searching in Trees: Average-case Minimization

Author: A. Garsia
A. Schäffer
D. Dereniowski
D. Knuth
E. Arkin
E. Laber
L. Hyafil
M. Adler
M. Garey
M. Lipman
O. Ibarra
P. Torre de la
R. Carmo
R. Kosaraju
Y. Ben-Asher
Publication date: 01/01/2009

We focus on the average-case analysis: A function $w : V \to \mathbb{Z}^+$ is given which defines the likelihood for a node to be the one marked, and we want the strategy that minimizes the expected number of queries. Prior to this paper, very little was known about this natural question and the complexity of the problem had remained so far an open question. We close this question and prove that the above tree search problem is NP-complete even for the class of trees with diameter at most 4. This results in a complete characterization of the complexity of the problem with respect to the diameter size. In fact, for diameter not larger than 3 the problem can be shown to be polynomially solvable using a dynamic programming approach. In addition we prove that the problem is NP-complete even for the class of trees of maximum degree at most 16. To the best of our knowledge, the only known result in this direction is that the tree search problem is solvable in $O(|V| \log |V|)$ time for trees with degree at most 2 (paths). We match the above complexity results with a tight algorithmic analysis. We first show that a natural greedy algorithm attains a 2-approximation. Furthermore, for the bounded degree instances, we show that any optimal strategy (i.e., one that minimizes the expected number of queries) performs at most $O(\Delta(T) (\log |V| + \log w(T)))$ queries in the worst case, where $w(T)$ is the sum of the likelihoods of the nodes of $T$ and $\Delta(T)$ is the maximum degree of $T$. We combine this result with a non-trivial exponential time algorithm to provide an FPTAS for trees with bounded degree.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Catalogo dei prodotti della ricerca

Publications at Bielefeld University

Archivio della Ricerca - Università di Salerno

Querying big data with bounded data access

Author: Cao Yang
Publication venue: The University of Edinburgh
Publication date: 29/11/2016

Query answering over big data is cost-prohibitive. A linear scan of a dataset D may take days with a solid state device if D is of PB size and years if D is of EB size. In other words, polynomial-time (PTIME) algorithms for query evaluation are already not feasible on big data. To tackle this, we propose querying big data with bounded data access, such that the cost of query evaluation is independent of the scale of D. First of all, we propose a class of boundedly evaluable queries. A query Q is boundedly evaluable under a set A of access constraints if for any dataset D that satisfies constraints in A, there exists a subset DQ ⊆ D such that (a) Q(DQ) = Q(D), and (b) the time for identifying DQ from D, and hence the size |DQ| of DQ, are independent of |D|. That is, we can compute Q(D) by accessing a bounded amount of data no matter how big D grows. We study the problem of deciding whether a query is boundedly evaluable under A. It is known that the problem is undecidable for FO without access constraints. We show that, in the presence of access constraints, it is decidable in 2EXPSPACE for positive fragments of FO queries, but is already EXPSPACE-hard even for CQ. To handle the undecidability and high complexity of the analysis, we develop effective syntax for boundedly evaluable queries under A, referred to as queries covered by A, such that (a) any boundedly evaluable query under A is equivalent to a query covered by A, (b) each covered query is boundedly evaluable, and (c) it is efficient to decide whether Q is covered by A. On top of DBMS, we develop practical algorithms for checking whether queries are covered by A, and generating bounded plans if so. For queries that are not boundedly evaluable, we extend bounded evaluability to resource-bounded approximation and bounded query rewriting using views. (1) Resource-bounded approximation is parameterized with a resource ratio a ∈ (0,1], such that for any query Q and dataset D, it computes approximate answers with an accuracy bound h by accessing at most a|D| tuples. It is based on extended access constraints and a new accuracy measure. (2) Bounded query rewriting tackles the problem by incorporating bounded evaluability with views, such that the queries can be exactly answered by accessing cached views and a bounded amount of data in D. We study the problem of deciding whether a query has a bounded rewriting, establish its complexity bounds, and develop effective syntax for FO queries with a bounded rewriting. Finally, we extend bounded evaluability to graph pattern queries, by extending access constraints to graph data. We characterize bounded evaluability for subgraph and simulation patterns and develop practical algorithms for associated problems.
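The idea of bounded evaluability can be illustrated with a toy Python sketch. It assumes a hypothetical access constraint of the form "each person has at most `bound` friends, and an index returns them directly", and answers the query "friends of `p0` who live in `city`" by touching a number of tuples that depends only on `bound`, not on the dataset size. The relation names, the index layout and the function names are all assumptions for illustration and are not taken from the thesis.

```python
from collections import defaultdict


def build_friend_index(friend_pairs):
    """Index friend(person, friend) tuples by person (the assumed access path)."""
    index = defaultdict(list)
    for person, friend in friend_pairs:
        index[person].append(friend)
    return index


def bounded_query(p0, city, friend_index, city_of, bound):
    """Friends of p0 living in `city`, plus the number of tuples accessed.

    Under the assumed constraint (at most `bound` friends per person),
    at most 2 * bound tuples are touched, however large the dataset is.
    """
    friends = friend_index.get(p0, [])
    if len(friends) > bound:
        raise ValueError("dataset violates the assumed access constraint")
    accessed = 0
    answer = []
    for f in friends:
        accessed += 2  # one friend tuple and one city lookup
        if city_of.get(f) == city:
            answer.append(f)
    return answer, accessed
```

The subset of tuples touched here plays the role of DQ in the abstract: the answer equals the answer over the full dataset, while the access cost is bounded by the constraint alone.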

Edinburgh Research Archive

Lower Bounds on Query Complexity for Testing Bounded-Degree CSPs

Author: Yoshida Yuichi
Publication date: 19/07/2010

In this paper, we consider lower bounds on the query complexity for testing CSPs in the bounded-degree model. First, for any "symmetric" predicate $P:\{0,1\}^{k} \to \{0,1\}$ except EQU, where $k\geq 3$, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances $(|P^{-1}(0)|/2^k-\epsilon)$-far from satisfiability requires $\Omega(n^{1/2+\delta})$ queries, where $n$ is the number of variables and $\delta>0$ is a constant that depends on $P$ and $\epsilon$. This breaks a natural lower bound $\Omega(n^{1/2})$, which is obtained by the birthday paradox. We also show that every one-sided error tester requires $\Omega(n)$ queries for such $P$. These results are hereditary in the sense that the same results hold for any predicate $Q$ such that $P^{-1}(1) \subseteq Q^{-1}(1)$. For EQU, we give a one-sided error tester whose query complexity is $\tilde{O}(n^{1/2})$. Also, for 2-XOR (or, equivalently E2LIN2), we show an $\Omega(n^{1/2+\delta})$ lower bound for distinguishing between instances that are $\epsilon$-close to and instances that are $(1/2-\epsilon)$-far from satisfiability. Next, for the general k-CSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances $(1-2k/2^k-\epsilon)$-far from satisfiability requires $\Omega(n)$ queries. The matching NP-hardness is not known, even assuming the Unique Games Conjecture or the $d$-to-$1$ Conjecture. As a corollary, for Maximum Independent Set on graphs with $n$ vertices and a degree bound $d$, we show that every approximation algorithm within a factor $d/\mathrm{poly}\log d$ and an additive error of $\epsilon n$ requires $\Omega(n)$ queries. Previously, only super-constant lower bounds were known.

arXiv.org e-Print Archive

CiteSeerX

Efficient discrete-time simulations of continuous-time quantum query algorithms

Author: Cleve R.
Gottesman D.
Mosca M.
Somma R. D.
Yonge-Mallo D. L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/11/2008

The continuous-time query model is a variant of the discrete query model in which queries can be interleaved with known operations (called "driving operations") continuously in time. Interesting algorithms have been discovered in this model, such as an algorithm for evaluating NAND trees more efficiently than any classical algorithm. Subsequent work has shown that there also exists an efficient algorithm for NAND trees in the discrete query model; however, there is no efficient conversion known for continuous-time query algorithms for arbitrary problems. We show that any quantum algorithm in the continuous-time query model whose total query time is $T$ can be simulated by a quantum algorithm in the discrete query model that makes $O[T \log T / \log\log T]$ queries. This is the first upper bound that is independent of the driving operations (i.e., it holds even if the norm of the driving Hamiltonian is very large). A corollary is that any lower bound of $T$ queries for a problem in the discrete-time query model immediately carries over to a lower bound of $\Omega[T \log\log T / \log T]$ in the continuous-time query model.
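To get a feel for how small the simulation overhead is, the two expressions in this abstract can be evaluated numerically. The Python sketch below drops the hidden constants of the $O$ and $\Omega$ notation and uses natural logarithms, so the values only indicate growth rates; the function names are hypothetical.

```python
import math


def discrete_query_upper_bound(T):
    """T * log(T) / log(log(T)), with the O(.) constant set to 1 (assumption)."""
    if T <= math.e:
        raise ValueError("T must exceed e so that log(log(T)) is positive")
    return T * math.log(T) / math.log(math.log(T))


def continuous_time_lower_bound(T):
    """T * log(log(T)) / log(T), with the Omega(.) constant set to 1 (assumption)."""
    if T <= math.e:
        raise ValueError("T must exceed e so that log(log(T)) is positive")
    return T * math.log(math.log(T)) / math.log(T)


if __name__ == "__main__":
    for T in (1e3, 1e6, 1e9):
        overhead = discrete_query_upper_bound(T) / T
        print(f"T = {T:.0e}: overhead factor log T / log log T = {overhead:.2f}")
```

The overhead factor grows more slowly than $\log T$, which is why the simulation is described as efficient.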

arXiv.org e-Print Archive

Crossref

The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling

Author: Ron Dana
Tsur Gilad
Publication date: 20/04/2014

We study a basic problem of approximating the size of an unknown set $S$ in a known universe $U$. We consider two versions of the problem. In both versions the algorithm can specify subsets $T\subseteq U$. In the first version, which we refer to as the group query or subset query version, the algorithm is told whether $T\cap S$ is non-empty. In the second version, which we refer to as the subset sampling version, if $T\cap S$ is non-empty, then the algorithm receives a uniformly selected element from $T\cap S$. We study the difference between these two versions under different conditions on the subsets that the algorithm may query/sample, and in both the case that the algorithm is adaptive and the case where it is non-adaptive. In particular we focus on a natural family of allowed subsets, which correspond to intervals, as well as variants of this family.

arXiv.org e-Print Archive

CiteSeerX