Testing bounded arboricity
In this paper we consider the problem of testing whether a graph has bounded
arboricity. The family of graphs with bounded arboricity includes, among
others, bounded-degree graphs, all minor-closed graph classes (e.g. planar
graphs, graphs with bounded treewidth) and randomly generated preferential
attachment graphs. Graphs with bounded arboricity have been studied extensively
in the past, in particular since for many problems they allow for much more
efficient algorithms and/or better approximation ratios.
We present a tolerant tester in the sparse-graphs model. The sparse-graphs
model allows access to degree queries and neighbor queries, and the distance is
defined with respect to the actual number of edges. More specifically, our
algorithm distinguishes between graphs that are ε-close to having
arboricity α and graphs that are c·ε-far from having
arboricity …, where c is an absolute small constant. The query
complexity and running time of the algorithm are …, where n denotes
the number of vertices and m denotes the number of edges. In terms of the
dependence on n and m, this bound is optimal up to poly-logarithmic factors,
since … queries are necessary (and …).
We leave it as an open question whether the dependence on 1/ε can be
improved from quasi-polynomial to polynomial. Our techniques include an
efficient local simulation for approximating the outcome of a global (almost)
forest-decomposition algorithm, as well as a tailored edge-sampling
procedure.
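The following is a minimal sketch that makes the phrase "global (almost) forest-decomposition algorithm" concrete. It uses a classic peeling approach in the spirit of the H-partition of Barenboim and Elkin. It is not the paper's local simulation or its tester, and the function name, the slack parameter `eps`, and the target arboricity `alpha` are illustrative assumptions.

The sketch works as follows:
- In each round, it removes every vertex whose remaining degree is at most ⌊(2+ε)α⌋.
- Each edge is oriented toward the endpoint that was peeled later (or never).
- Each vertex spreads its out-edges over distinct labels, so every label class is a forest.
- Edges among vertices that never peel off are returned as leftovers.

```python
import math
from collections import defaultdict


def peel_forest_decomposition(adj, alpha, eps=0.5):
    """Peeling-based (almost) forest decomposition (illustrative sketch).

    adj: dict mapping each vertex (comparable, e.g. int) to the set of its
    neighbours in an undirected simple graph. Returns (forests, leftover):
    forests maps a label to a set of directed edges (v, u); leftover holds
    the edges between vertices that could never be peeled.
    """
    threshold = math.floor((2 + eps) * alpha)
    remaining = set(adj)
    deg = {v: len(adj[v]) for v in adj}  # degree inside `remaining`
    rank = {}  # removal order as (round, vertex id)
    rnd = 0
    while remaining:
        low = [v for v in remaining if deg[v] <= threshold]
        if not low:
            break  # peeling stalls: no low-degree vertex is left
        for v in low:
            rank[v] = (rnd, v)
        remaining.difference_update(low)
        for v in low:
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
        rnd += 1

    forests = defaultdict(set)
    for v, r in rank.items():
        # Orient edges toward the endpoint removed later (or never); these are
        # exactly v's neighbours still present when v was peeled.
        out = sorted(u for u in adj[v] if u not in rank or rank[u] > r)
        for label, u in enumerate(out):  # at most `threshold` labels per vertex
            forests[label].add((v, u))
    leftover = {(v, u) for v in remaining for u in adj[v]
                if u in remaining and v < u}
    return dict(forests), leftover
```

Two examples show the behavior:
- On the 5-cycle with `alpha=1`, every vertex peels in the first round and the five edges split into two forests.
- On K6 with `alpha=1`, nothing peels, so all 15 edges end up in `leftover`.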
Testing formula satisfaction
We study the query complexity of testing for properties defined by read-once formulae, as instances of massively parametrized properties, and prove several testability and non-testability results. First, we prove the testability of any property accepted by a Boolean read-once formula involving any bounded-arity gates, with a number of queries exponential in \epsilon and independent of all other parameters. When the gates are limited to being monotone, we prove that there is an estimation algorithm that outputs an approximation of the distance of the input from
satisfying the property. For formulae only involving And/Or gates, we provide a more efficient test whose query complexity is only quasi-polynomial in \epsilon. On the other hand, we show that such testability results do not hold in general for formulae over non-Boolean alphabets; specifically, we construct a property defined by a read-once arity-2 (non-Boolean) formula over alphabets of size 4, such that any 1/4-test for it requires a number of queries depending on the formula size.
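For a monotone And/Or read-once formula, the distance that the estimation algorithm approximates can be computed exactly by a bottom-up recursion, given full access to the input. Because each variable occurs once, sibling subtrees touch disjoint variables and their flip costs simply add.

The sketch below makes two assumptions:
- A hypothetical nested-tuple encoding: `("var", i)`, `("and", [...])`, `("or", [...])`.
- Distance is measured as the fraction of input bits that must be flipped.

It reads every bit, so it illustrates the target quantity rather than a query-efficient tester.

```python
def flip_costs(node, x):
    """Return (c0, c1): the minimum number of bit flips in x that make the
    sub-formula `node` evaluate to 0 and to 1, respectively.
    Summing child costs is valid only because the formula is read-once."""
    kind = node[0]
    if kind == "var":
        bit = x[node[1]]
        return (1, 0) if bit else (0, 1)
    costs = [flip_costs(child, x) for child in node[1]]
    if kind == "and":
        # AND is 0 if any child is 0; it is 1 only if all children are 1.
        return min(c0 for c0, _ in costs), sum(c1 for _, c1 in costs)
    if kind == "or":
        # OR is 1 if any child is 1; it is 0 only if all children are 0.
        return sum(c0 for c0, _ in costs), min(c1 for _, c1 in costs)
    raise ValueError(f"unknown gate {kind!r}")


def distance_to_satisfying(formula, x):
    """Relative Hamming distance from x to the nearest satisfying input."""
    return flip_costs(formula, x)[1] / len(x)
```

Consider the formula `("or", [("and", [("var", 0), ("var", 1)]), ("and", [("var", 2), ("var", 3)])])` with input `[1, 0, 0, 0]`. A single flip (of `x[1]`) satisfies it, so the distance is 0.25.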
Metric Clustering and MST with Strong and Weak Distance Oracles
We study optimization problems in a metric space (X, d) where we
can compute distances in two ways: via a "strong" oracle that returns exact
distances d(x, y), and a "weak" oracle that returns distances d̃(x, y)
which may be arbitrarily corrupted with some probability. This
model captures the increasingly common trade-off between an
expensive similarity model (e.g., a large-scale embedding model) and a less
accurate but cheaper model. Hence, the goal is to make as few queries to the
strong oracle as possible. We consider both so-called "point queries", where
the strong oracle is queried on a set of points Q ⊂ X and
returns d(x, y) for all x, y ∈ Q, and "edge queries", where it is queried
for individual distances d(x, y).
Our main contributions are optimal algorithms and lower bounds for clustering
and Minimum Spanning Tree (MST) in this model. For k-centers, k-median, and
k-means, we give constant-factor approximation algorithms with only …
strong oracle point queries, and prove that … queries
are required for any bounded approximation. For edge queries, our upper and
lower bounds are both …. Surprisingly, for the MST problem
we give a … approximation algorithm using no strong oracle
queries at all, and a matching lower bound. We
empirically evaluate our algorithms and show that their quality is comparable
to that of baseline algorithms that are given all true distances, while
querying the strong oracle on only a small fraction (…) of points.
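The two-oracle model is easy to simulate, which helps when reasoning about query counts. The sketch below is an illustration under stated assumptions, not the paper's algorithms. The following are all hypothetical choices:
- the class name;
- the uniform-noise corruption;
- per-pair caching of weak answers, so that repeating a weak query cannot average the noise away.

It runs the classic farthest-first traversal (Gonzalez's 2-approximation for k-center) using only strong edge queries. That costs n·k strong queries, the kind of baseline cost that strong-query-efficient algorithms aim to avoid.

```python
import math
import random


class TwoOracleMetric:
    """Hypothetical strong/weak distance oracles over a list of points."""

    def __init__(self, points, corrupt_prob, seed=0):
        self.points = points
        self.corrupt_prob = corrupt_prob
        self.rng = random.Random(seed)
        self.weak_cache = {}  # fixed answer per pair: repeating a query does not help
        self.strong_calls = 0

    def strong(self, i, j):
        """Exact distance between points i and j; every call is counted."""
        self.strong_calls += 1
        return math.dist(self.points[i], self.points[j])

    def weak(self, i, j):
        """Cheap distance that is arbitrarily wrong with probability corrupt_prob."""
        key = (min(i, j), max(i, j))
        if key not in self.weak_cache:
            value = math.dist(self.points[i], self.points[j])
            if self.rng.random() < self.corrupt_prob:
                value = self.rng.uniform(0.0, 1e6)  # assumed corruption model
            self.weak_cache[key] = value
        return self.weak_cache[key]


def farthest_first_strong(oracle, k):
    """Gonzalez's farthest-first traversal (2-approximation for k-center)
    using strong edge queries only: n * k strong calls in total."""
    n = len(oracle.points)
    centers = [0]
    dist = [oracle.strong(0, j) for j in range(n)]
    while len(centers) < k:
        nxt = max(range(n), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(dist[j], oracle.strong(nxt, j)) for j in range(n)]
    return centers, max(dist)
```

With `corrupt_prob=0.0` the weak oracle is exact. Raising it shows why relying on the weak oracle alone is unsafe, since a single corrupted pair can mislead the traversal.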
On the Complexity of Searching in Trees: Average-case Minimization
We focus on the average-case analysis: we are given a function w : V -> Z+ that
defines the likelihood of each node being the marked one, and we want the
strategy that minimizes the expected number of queries. Prior to this paper,
very little was known about this natural question, and the complexity of the
problem had remained open.
We close this question and prove that the above tree search problem is
NP-complete even for the class of trees with diameter at most 4. Since the problem
can be shown to be polynomially solvable for diameter at most 3 using a dynamic
programming approach, this yields a complete characterization of the complexity
of the problem with respect to the diameter.
In addition we prove that the problem is NP-complete even for the class of
trees of maximum degree at most 16. To the best of our knowledge, the only
known result in this direction is that the tree search problem is solvable in
O(|V| log|V|) time for trees with degree at most 2 (paths).
We match the above complexity results with a tight algorithmic analysis. We
first show that a natural greedy algorithm attains a 2-approximation.
Furthermore, for the bounded degree instances, we show that any optimal
strategy (i.e., one that minimizes the expected number of queries) performs at
most O(\Delta(T) (log |V| + log w(T))) queries in the worst case, where w(T) is
the sum of the likelihoods of the nodes of T and \Delta(T) is the maximum
degree of T. We combine this result with a non-trivial exponential-time
algorithm to provide an FPTAS for trees with bounded degree.
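The excerpt does not restate the query model. The sketch below therefore assumes the common edge-query variant of tree search, in which querying an edge reveals which side of it contains the marked node.

Under that assumption, searching a path is the classic optimal alphabetic-tree problem. A simple O(n³) interval dynamic program then gives the minimum expected number of queries. It is slower than the O(|V| log|V|) bound cited above but easy to check. The function name and the weight-list input are illustrative.

```python
from itertools import accumulate


def optimal_path_search(weights):
    """Minimum total weighted number of edge queries needed to find the marked
    node on a path whose i-th node has weight weights[i]. Divide by
    sum(weights) to get the expected number of queries."""
    n = len(weights)
    if n == 0:
        return 0
    prefix = [0, *accumulate(weights)]
    # cost[i][j]: optimal total weighted queries when the target lies in nodes i..j
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Querying edge (k, k+1) charges one query to every node in i..j,
            # then the search continues on i..k or on k+1..j.
            best = min(cost[i][k] + cost[k + 1][j] for k in range(i, j))
            cost[i][j] = best + prefix[j + 1] - prefix[i]
    return cost[0][n - 1]
```

Consider weights `[8, 1, 1]`. The best first query is the edge between nodes 0 and 1, giving a total of 8·1 + 1·2 + 1·2 = 12, that is, 12/10 queries in expectation.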
Querying big data with bounded data access
Query answering over big data is cost-prohibitive. A linear scan of a dataset D may
take days on a solid-state drive if D is of PB size, and years if D is of EB size. In
other words, even polynomial-time (PTIME) algorithms for query evaluation are
no longer feasible on big data. To tackle this, we propose querying big data with bounded
data access, such that the cost of query evaluation is independent of the scale of D.
First of all, we propose a class of boundedly evaluable queries. A query Q is boundedly
evaluable under a set A of access constraints if for any dataset D that satisfies
constraints in A, there exists a subset DQ ⊆ D such that (a) Q(DQ) = Q(D), and (b) the
time for identifying DQ from D, and hence the size |DQ| of DQ, are independent of |D|.
That is, we can compute Q(D) by accessing a bounded amount of data no matter how
big D grows. We study the problem of deciding whether a query is boundedly evaluable
under A. It is known that the problem is undecidable for FO without access constraints.
We show that, in the presence of access constraints, it is decidable in 2EXPSPACE for
positive fragments of FO queries, but is already EXPSPACE-hard even for CQ.
To handle the undecidability and high complexity of the analysis, we develop effective
syntax for boundedly evaluable queries under A, referred to as queries covered
by A, such that (a) any boundedly evaluable query under A is equivalent to a query
covered by A, (b) each covered query is boundedly evaluable, and (c) it is efficient to
decide whether Q is covered by A. On top of a DBMS, we develop practical algorithms
for checking whether queries are covered by A, and generating bounded plans if so.
For queries that are not boundedly evaluable, we extend bounded evaluability
to resource-bounded approximation and bounded query rewriting using views.
(1) Resource-bounded approximation is parameterized with a resource ratio α ∈ (0,1],
such that for any query Q and dataset D, it computes approximate answers with an
accuracy bound η by accessing at most α|D| tuples. It is based on extended access constraints
and a new accuracy measure. (2) Bounded query rewriting tackles the problem
by incorporating bounded evaluability with views, such that the queries can be exactly
answered by accessing cached views and a bounded amount of data in D. We study the
problem of deciding whether a query has a bounded rewriting, establish its complexity
bounds, and develop effective syntax for FO queries with a bounded rewriting.
Finally, we extend bounded evaluability to graph pattern queries, by extending
access constraints to graph data. We characterize bounded evaluability for subgraph
and simulation patterns and develop practical algorithms for associated problems.
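Here is a toy illustration of a boundedly evaluable query. It rests on assumptions not taken from the text:
- Two hypothetical access constraints: "each person has at most N friends" and "each person has exactly one city".
- Each constraint is backed by an index built once, offline.
- The query "cities of p's friends" is answered only through those indexes.

The number of tuples fetched at query time is bounded by the constraint bounds and does not depend on |D|. The class and function names are made up for this sketch.

```python
from collections import defaultdict


class ConstrainedIndex:
    """Index for a hypothetical access constraint X -> (Y, N): every X-value
    has at most N associated Y-values, retrievable by lookup on X."""

    def __init__(self, pairs, bound):
        self.bound = bound
        self.index = defaultdict(list)
        for x, y in pairs:
            self.index[x].append(y)
            if len(self.index[x]) > bound:
                raise ValueError(f"access constraint violated for {x!r}")
        self.accessed = 0  # tuples fetched at query time

    def fetch(self, x):
        ys = self.index.get(x, [])
        self.accessed += len(ys)
        return ys


def friend_cities(person, friends, city):
    """Q(p) = {c | friend(p, f) and lives_in(f, c)}, evaluated via bounded
    lookups only: at most N_friends + N_friends * 1 tuples are fetched."""
    result = set()
    for f in friends.fetch(person):
        result.update(city.fetch(f))
    return result


def build(n, max_friends=2):
    """Synthetic data: person i befriends i+1 and i+2 (mod n), lives in city i % 7."""
    friends = ConstrainedIndex(
        [(i, (i + d) % n) for i in range(n) for d in (1, 2)], max_friends)
    city = ConstrainedIndex([(i, i % 7) for i in range(n)], 1)
    return friends, city
```

Evaluating `friend_cities(0, ...)` fetches the same four tuples whether the synthetic dataset has 1,000 or 50,000 people. Only the offline index construction grows with the data.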
Lower Bounds on Query Complexity for Testing Bounded-Degree CSPs
In this paper, we consider lower bounds on the query complexity for testing
CSPs in the bounded-degree model.
First, for any "symmetric" predicate P except EQU where …, we show that every
(randomized) algorithm that distinguishes satisfiable instances of CSP(P) from
instances …-far from satisfiability requires … queries, where n is the
number of variables and … is a constant that depends on … and
…. This breaks a natural lower bound of Ω(√n), which is
obtained by the birthday paradox. We also show that every one-sided error
tester requires … queries for such P. These results are hereditary
in the sense that the same results hold for any predicate Q such that
…. For EQU, we give a one-sided error tester
whose query complexity is …. Also, for 2-XOR (or,
equivalently, E2LIN2), we show an … lower bound for
distinguishing instances that are …-close to satisfiability from those that are
…-far from satisfiability.
Next, for the general k-CSP over the binary domain, we show that every
algorithm that distinguishes satisfiable instances from instances
…-far from satisfiability requires … queries. The
matching NP-hardness is not known, even assuming the Unique Games Conjecture or
the d-to-1 Conjecture. As a corollary, for Maximum Independent Set on
graphs with n vertices and a degree bound d, we show that every
approximation algorithm within a factor d/poly(log d) and an additive error
of … requires … queries. Previously, only super-constant
lower bounds were known.
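For intuition about "far from satisfiability", the brute-force sketch below handles a small 2-XOR (E2LIN2) instance. It computes the minimum fraction of constraints that every assignment violates.

It normalizes by the number of constraints, which is an assumption made for this toy. The bounded-degree model may normalize differently, for example by the degree bound times n. Enumerating all 2^n assignments is viable only for tiny n.

```python
from itertools import product


def min_violated_fraction(n, constraints):
    """constraints: list of (i, j, b) meaning x_i XOR x_j == b.
    Returns the minimum, over all 2**n assignments, of the fraction of
    violated constraints (0.0 means the instance is satisfiable)."""
    best = len(constraints)
    for x in product((0, 1), repeat=n):
        violated = sum((x[i] ^ x[j]) != b for i, j, b in constraints)
        best = min(best, violated)
    return best / len(constraints)
```

An odd cycle of "not equal" constraints, `[(0, 1, 1), (1, 2, 1), (0, 2, 1)]`, always leaves one of its three constraints violated, so its distance is 1/3. An even cycle of such constraints is satisfiable.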
Efficient discrete-time simulations of continuous-time quantum query algorithms
The continuous-time query model is a variant of the discrete query model in
which queries can be interleaved with known operations (called "driving
operations") continuously in time. Interesting algorithms have been discovered
in this model, such as an algorithm for evaluating NAND trees more efficiently
than any classical algorithm. Subsequent work has shown that there also exists
an efficient algorithm for NAND trees in the discrete query model; however,
no efficient conversion is known for continuous-time query algorithms for
arbitrary problems.
We show that any quantum algorithm in the continuous-time query model whose
total query time is T can be simulated by a quantum algorithm in the discrete
query model that makes O(T log T / log log T) queries. This is the first
upper bound that is independent of the driving operations (i.e., it holds even
if the norm of the driving Hamiltonian is very large). A corollary is that any
lower bound of T queries for a problem in the discrete-time query model
immediately carries over to a lower bound of Ω(T log log T / log T) in
the continuous-time query model.
Comment: 12 pages, 6 figures
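The corollary follows by inverting the simulation bound, and the short derivation below spells out that step. The constant C and the function f are notation introduced here, not taken from the paper.

```latex
Let $f(x) = \dfrac{x \log x}{\log\log x}$, which is increasing for large $x$, and let
$C$ be the constant in the simulation: a continuous-time algorithm with total query
time $T'$ yields a discrete algorithm making at most $C\, f(T')$ queries.
If every discrete algorithm needs $T$ queries, then $C\, f(T') \ge T$.
For $T' = c\, \dfrac{T \log\log T}{\log T}$ with a constant $c > 0$,
\[
  \log T' = (1 - o(1)) \log T, \qquad \log\log T' = (1 + o(1)) \log\log T,
\]
so
\[
  f(T') = c\, \frac{T \log\log T}{\log T} \cdot \frac{(1 - o(1)) \log T}{(1 + o(1)) \log\log T}
        = (1 - o(1))\, c\, T .
\]
Taking $c < 1/C$ gives $C\, f(T') < T$ for large $T$. Since $f$ is increasing, every
valid $T'$ must exceed this value, i.e.
\[
  T' = \Omega\!\left( \frac{T \log\log T}{\log T} \right).
\]
```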
The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling
We study a basic problem of approximating the size of an unknown set S in a
known universe U. We consider two versions of the problem. In both versions,
the algorithm can specify subsets Q ⊆ U. In the first version, which
we refer to as the group query or subset query version, the algorithm is told
whether Q ∩ S is non-empty. In the second version, which we refer to as the
subset sampling version, if Q ∩ S is non-empty, then the algorithm receives
a uniformly selected element from Q ∩ S. We study the difference between
these two versions under different conditions on the subsets that the algorithm
may query or sample, both when the algorithm is adaptive and when it is
non-adaptive. In particular, we focus on a natural family of
allowed subsets, which correspond to intervals, as well as variants of this
family.
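Below is a minimal sketch of how group queries alone can approximate |S|. It assumes random (non-interval) query sets, which is a simplification; the interval-restricted setting and the paper's actual algorithms are not reproduced here.

The estimator rests on a simple probability calculation:
- If each element of U is placed in Q independently with probability p, then Pr[Q ∩ S = ∅] = (1 − p)^|S|.
- So the empirical fraction f of empty answers gives the estimate ln f / ln(1 − p).

The sketch tries p = 2^(−j) and keeps the level whose empty fraction is closest to 1/2. The oracle `group_query` and all parameters are hypothetical.

```python
import math
import random


def estimate_size_group_queries(universe_size, group_query, trials=200, seed=0):
    """Estimate |S| using only group queries group_query(Q) -> bool, which
    reports whether Q intersects S. Each random Q contains every element of
    U = {0, ..., universe_size - 1} independently with probability p = 2**-j."""
    rng = random.Random(seed)
    best = None  # (distance of the empty fraction from 1/2, size estimate)
    for j in range(1, universe_size.bit_length() + 1):
        p = 2.0 ** -j
        empty = 0
        for _ in range(trials):
            q = {u for u in range(universe_size) if rng.random() < p}
            if not group_query(q):
                empty += 1
        frac = empty / trials
        if 0 < frac < 1:
            # Pr[Q misses S] = (1 - p) ** |S|, so |S| ~ ln(frac) / ln(1 - p)
            candidate = (abs(frac - 0.5), math.log(frac) / math.log(1 - p))
            if best is None or candidate[0] < best[0]:
                best = candidate
    return None if best is None else best[1]
```

The oracle can be any predicate on sets, for example `lambda q: not hidden.isdisjoint(q)` for a known test set `hidden`. In the subset-sampling version, each answer additionally returns an element of Q ∩ S, which this sketch does not exploit.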
- …