
    Testing bounded arboricity

    In this paper we consider the problem of testing whether a graph has bounded arboricity. The family of graphs with bounded arboricity includes, among others, bounded-degree graphs, all minor-closed graph classes (e.g. planar graphs, graphs with bounded treewidth) and randomly generated preferential attachment graphs. Graphs with bounded arboricity have been studied extensively in the past, in particular since for many problems they allow for much more efficient algorithms and/or better approximation ratios. We present a tolerant tester in the sparse-graphs model. The sparse-graphs model allows access to degree queries and neighbor queries, and the distance is defined with respect to the actual number of edges. More specifically, our algorithm distinguishes between graphs that are $\epsilon$-close to having arboricity $\alpha$ and graphs that are $c \cdot \epsilon$-far from having arboricity $3\alpha$, where $c$ is an absolute small constant. The query complexity and running time of the algorithm are $\tilde{O}\left(\frac{n}{\sqrt{m}}\cdot \frac{\log(1/\epsilon)}{\epsilon} + \frac{n\cdot \alpha}{m} \cdot \left(\frac{1}{\epsilon}\right)^{O(\log(1/\epsilon))}\right)$, where $n$ denotes the number of vertices and $m$ denotes the number of edges. In terms of the dependence on $n$ and $m$ this bound is optimal up to poly-logarithmic factors since $\Omega(n/\sqrt{m})$ queries are necessary (and $\alpha = O(\sqrt{m})$). We leave it as an open question whether the dependence on $1/\epsilon$ can be improved from quasi-polynomial to polynomial. Our techniques include an efficient local simulation for approximating the outcome of a global (almost) forest-decomposition algorithm as well as a tailored procedure of edge sampling.
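For orientation on the arboricity parameter itself, here is a minimal sketch of a standard global baseline, not the sublinear tester described in the abstract. It computes the degeneracy $d$ of a graph by repeatedly peeling a minimum-degree vertex; for any graph with at least one edge the arboricity $\alpha$ satisfies $\alpha \le d \le 2\alpha - 1$. The adjacency-dictionary representation and the function name `degeneracy` are assumptions made for illustration, and the procedure reads the entire graph rather than using degree and neighbor queries sparingly.

```python
import heapq

def degeneracy(adj):
    """Degeneracy of an undirected graph given as {vertex: set_of_neighbors}.

    Repeatedly removes a vertex of minimum remaining degree; the largest
    degree seen at removal time is the degeneracy.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    removed = set()
    best = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # stale heap entry left over from an earlier degree
        removed.add(v)
        best = max(best, d)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                heapq.heappush(heap, (degree[u], u))
    return best

# Hypothetical example inputs: a triangle and a path on four vertices.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

Given the degeneracy $d$, the inequality above brackets the arboricity as $\lceil (d+1)/2 \rceil \le \alpha \le d$, which is a factor-2 approximation obtained by reading the whole graph.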

    Testing formula satisfaction

    We study the query complexity of testing for properties defined by read-once formulae, as instances of massively parametrized properties, and prove several testability and non-testability results. First we prove the testability of any property accepted by a Boolean read-once formula involving any bounded arity gates, with a number of queries exponential in $\epsilon$ and independent of all other parameters. When the gates are limited to being monotone, we prove that there is an estimation algorithm that outputs an approximation of the distance of the input from satisfying the property. For formulae only involving And/Or gates, we provide a more efficient test whose query complexity is only quasi-polynomial in $\epsilon$. On the other hand we show that such testability results do not hold in general for formulae over non-Boolean alphabets; specifically we construct a property defined by a read-once arity 2 (non-Boolean) formula over alphabets of size 4, such that any 1/4-test for it requires a number of queries depending on the formula size.

    Metric Clustering and MST with Strong and Weak Distance Oracles

    We study optimization problems in a metric space $(\mathcal{X},d)$ where we can compute distances in two ways: via a "strong" oracle that returns exact distances $d(x,y)$, and a "weak" oracle that returns distances $\tilde{d}(x,y)$ which may be arbitrarily corrupted with some probability. This model captures the increasingly common trade-off between employing both an expensive similarity model (e.g. a large-scale embedding model), and a less accurate but cheaper model. Hence, the goal is to make as few queries to the strong oracle as possible. We consider both so-called "point queries", where the strong oracle is queried on a set of points $S \subset \mathcal{X}$ and returns $d(x,y)$ for all $x,y \in S$, and "edge queries" where it is queried for individual distances $d(x,y)$. Our main contributions are optimal algorithms and lower bounds for clustering and Minimum Spanning Tree (MST) in this model. For $k$-centers, $k$-median, and $k$-means, we give constant factor approximation algorithms with only $\tilde{O}(k)$ strong oracle point queries, and prove that $\Omega(k)$ queries are required for any bounded approximation. For edge queries, our upper and lower bounds are both $\tilde{\Theta}(k^2)$. Surprisingly, for the MST problem we give a $O(\sqrt{\log n})$ approximation algorithm using no strong oracle queries at all, and a matching $\Omega(\sqrt{\log n})$ lower bound. We empirically evaluate our algorithms, and show that their quality is comparable to that of the baseline algorithms that are given all true distances, but while querying the strong oracle on only a small fraction ($<1\%$) of points.
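The query model in this abstract can be made concrete with a small simulation. The sketch below is only an illustration of the access model, not any of the paper's algorithms. The class name `DistanceOracles`, the way a corrupted value is drawn, the choice to make weak answers persistent across repeated queries, and the accounting (point queries charged per distinct point, edge queries per call) are all assumptions.

```python
import random
from itertools import combinations

class DistanceOracles:
    """Simulated strong and weak distance oracles over a metric `dist`."""

    def __init__(self, dist, corruption_prob, seed=0):
        self.dist = dist
        self.p = corruption_prob
        self.rng = random.Random(seed)
        self.weak_cache = {}          # assumption: a weak answer never changes
        self.strong_points = set()    # distinct points charged to point queries
        self.strong_edges = 0         # number of strong edge queries made

    def weak(self, x, y):
        """Cheap oracle: exact distance, or an arbitrary value with prob. p."""
        key = frozenset((x, y))
        if key not in self.weak_cache:
            value = self.dist(x, y)
            if self.rng.random() < self.p:
                # Stand-in for "arbitrarily corrupted": any value would do.
                value = self.rng.uniform(0.0, 10.0 * (value + 1.0))
            self.weak_cache[key] = value
        return self.weak_cache[key]

    def strong_edge(self, x, y):
        """Expensive oracle, edge-query form: one exact distance."""
        self.strong_edges += 1
        return self.dist(x, y)

    def strong_point_query(self, points):
        """Expensive oracle, point-query form: all exact distances within S."""
        points = list(points)
        self.strong_points.update(points)
        return {(x, y): self.dist(x, y) for x, y in combinations(points, 2)}
```

An algorithm written against this interface can then be scored by `len(oracles.strong_points)` or `oracles.strong_edges`, which is the quantity the abstract's bounds refer to.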

    On the Complexity of Searching in Trees: Average-case Minimization

    We focus on the average-case analysis: A function $w : V \to \mathbb{Z}^+$ is given which defines the likelihood for a node to be the one marked, and we want the strategy that minimizes the expected number of queries. Prior to this paper, very little was known about this natural question and the complexity of the problem had remained so far an open question. We close this question and prove that the above tree search problem is NP-complete even for the class of trees with diameter at most 4. This results in a complete characterization of the complexity of the problem with respect to the diameter size. In fact, for diameter not larger than 3 the problem can be shown to be polynomially solvable using a dynamic programming approach. In addition we prove that the problem is NP-complete even for the class of trees of maximum degree at most 16. To the best of our knowledge, the only known result in this direction is that the tree search problem is solvable in $O(|V| \log |V|)$ time for trees with degree at most 2 (paths). We match the above complexity results with a tight algorithmic analysis. We first show that a natural greedy algorithm attains a 2-approximation. Furthermore, for the bounded degree instances, we show that any optimal strategy (i.e., one that minimizes the expected number of queries) performs at most $O(\Delta(T) (\log |V| + \log w(T)))$ queries in the worst case, where $w(T)$ is the sum of the likelihoods of the nodes of $T$ and $\Delta(T)$ is the maximum degree of $T$. We combine this result with a non-trivial exponential time algorithm to provide an FPTAS for trees with bounded degree.

    Querying big data with bounded data access

    Query answering over big data is cost-prohibitive. A linear scan of a dataset D may take days with a solid state device if D is of PB size and years if D is of EB size. In other words, polynomial-time (PTIME) algorithms for query evaluation are already not feasible on big data. To tackle this, we propose querying big data with bounded data access, such that the cost of query evaluation is independent of the scale of D. First of all, we propose a class of boundedly evaluable queries. A query Q is boundedly evaluable under a set A of access constraints if for any dataset D that satisfies constraints in A, there exists a subset DQ ⊆ D such that (a) Q(DQ) = Q(D), and (b) the time for identifying DQ from D, and hence the size |DQ| of DQ, are independent of |D|. That is, we can compute Q(D) by accessing a bounded amount of data no matter how big D grows. We study the problem of deciding whether a query is boundedly evaluable under A. It is known that the problem is undecidable for FO without access constraints. We show that, in the presence of access constraints, it is decidable in 2EXPSPACE for positive fragments of FO queries, but is already EXPSPACE-hard even for CQ. To handle the undecidability and high complexity of the analysis, we develop effective syntax for boundedly evaluable queries under A, referred to as queries covered by A, such that (a) any boundedly evaluable query under A is equivalent to a query covered by A, (b) each covered query is boundedly evaluable, and (c) it is efficient to decide whether Q is covered by A. On top of DBMS, we develop practical algorithms for checking whether queries are covered by A, and generating bounded plans if so. For queries that are not boundedly evaluable, we extend bounded evaluability to resource-bounded approximation and bounded query rewriting using views.
(1) Resource-bounded approximation is parameterized with a resource ratio a ∈ (0,1], such that for any query Q and dataset D, it computes approximate answers with an accuracy bound h by accessing at most a|D| tuples. It is based on extended access constraints and a new accuracy measure. (2) Bounded query rewriting tackles the problem by incorporating bounded evaluability with views, such that the queries can be exactly answered by accessing cached views and a bounded amount of data in D. We study the problem of deciding whether a query has a bounded rewriting, establish its complexity bounds, and develop effective syntax for FO queries with a bounded rewriting. Finally, we extend bounded evaluability to graph pattern queries, by extending access constraints to graph data. We characterize bounded evaluability for subgraph and simulation patterns and develop practical algorithms for associated problems.
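To illustrate the definition of bounded evaluability, here is a minimal sketch under stated assumptions. The abstract does not spell out what an access constraint looks like; the sketch assumes one takes the form "for each key value there are at most N matching tuples, and an index retrieves them". The class `IndexedRelation`, the relations `friend` and `lives_in`, and the query `friends_in_city` are hypothetical names chosen for illustration. The point is that the plan touches at most a number of tuples fixed by the constraints, however large the relations are.

```python
from collections import defaultdict

class IndexedRelation:
    """Binary relation with an assumed access constraint: each key has at
    most `bound` associated values, retrievable through an index."""

    def __init__(self, tuples, bound):
        self.bound = bound
        self.index = defaultdict(list)
        self.accessed = 0  # number of tuples fetched so far
        for key, value in tuples:
            self.index[key].append(value)
            if len(self.index[key]) > bound:
                raise ValueError("dataset violates the access constraint")

    def fetch(self, key):
        """Index lookup; touches at most `bound` tuples."""
        values = self.index.get(key, [])
        self.accessed += len(values)
        return values

def friends_in_city(friend, lives_in, person, city):
    """Friends of `person` who live in `city`, using index lookups only."""
    result = []
    for f in friend.fetch(person):        # at most friend.bound tuples
        if city in lives_in.fetch(f):     # at most lives_in.bound tuples each
            result.append(f)
    return result
```

With bounds N on `friend` and 1 on `lives_in`, the plan fetches at most 2N tuples in total, so the fetched tuples play the role of DQ in the definition: their number does not depend on |D|.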

    Lower Bounds on Query Complexity for Testing Bounded-Degree CSPs

    In this paper, we consider lower bounds on the query complexity for testing CSPs in the bounded-degree model. First, for any "symmetric" predicate $P:\{0,1\}^{k} \to \{0,1\}$ except EQU, where $k\geq 3$, we show that every (randomized) algorithm that distinguishes satisfiable instances of CSP(P) from instances $(|P^{-1}(0)|/2^k-\epsilon)$-far from satisfiability requires $\Omega(n^{1/2+\delta})$ queries, where $n$ is the number of variables and $\delta>0$ is a constant that depends on $P$ and $\epsilon$. This breaks a natural lower bound $\Omega(n^{1/2})$, which is obtained by the birthday paradox. We also show that every one-sided error tester requires $\Omega(n)$ queries for such $P$. These results are hereditary in the sense that the same results hold for any predicate $Q$ such that $P^{-1}(1) \subseteq Q^{-1}(1)$. For EQU, we give a one-sided error tester whose query complexity is $\tilde{O}(n^{1/2})$. Also, for 2-XOR (or, equivalently E2LIN2), we show an $\Omega(n^{1/2+\delta})$ lower bound for distinguishing instances between $\epsilon$-close to and $(1/2-\epsilon)$-far from satisfiability. Next, for the general k-CSP over the binary domain, we show that every algorithm that distinguishes satisfiable instances from instances $(1-2k/2^k-\epsilon)$-far from satisfiability requires $\Omega(n)$ queries. The matching NP-hardness is not known, even assuming the Unique Games Conjecture or the $d$-to-$1$ Conjecture. As a corollary, for Maximum Independent Set on graphs with $n$ vertices and a degree bound $d$, we show that every approximation algorithm within a factor $d/\mathrm{poly}\log d$ and an additive error of $\epsilon n$ requires $\Omega(n)$ queries. Previously, only super-constant lower bounds were known.

    Efficient discrete-time simulations of continuous-time quantum query algorithms

    The continuous-time query model is a variant of the discrete query model in which queries can be interleaved with known operations (called "driving operations") continuously in time. Interesting algorithms have been discovered in this model, such as an algorithm for evaluating NAND trees more efficiently than any classical algorithm. Subsequent work has shown that there also exists an efficient algorithm for NAND trees in the discrete query model; however, there is no efficient conversion known for continuous-time query algorithms for arbitrary problems. We show that any quantum algorithm in the continuous-time query model whose total query time is $T$ can be simulated by a quantum algorithm in the discrete query model that makes $O[T \log T / \log\log T]$ queries. This is the first upper bound that is independent of the driving operations (i.e., it holds even if the norm of the driving Hamiltonian is very large). A corollary is that any lower bound of $T$ queries for a problem in the discrete-time query model immediately carries over to a lower bound of $\Omega[T \log\log T / \log T]$ in the continuous-time query model.

    The Power of an Example: Hidden Set Size Approximation Using Group Queries and Conditional Sampling

    We study a basic problem of approximating the size of an unknown set $S$ in a known universe $U$. We consider two versions of the problem. In both versions the algorithm can specify subsets $T\subseteq U$. In the first version, which we refer to as the group query or subset query version, the algorithm is told whether $T\cap S$ is non-empty. In the second version, which we refer to as the subset sampling version, if $T\cap S$ is non-empty, then the algorithm receives a uniformly selected element from $T\cap S$. We study the difference between these two versions under different conditions on the subsets that the algorithm may query/sample, and in both the case that the algorithm is adaptive and the case where it is non-adaptive. In particular we focus on a natural family of allowed subsets, which correspond to intervals, as well as variants of this family.
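The two access models in this abstract can be sketched as follows. This is an illustration of the oracles only, not the paper's approximation algorithms. The names `make_oracles` and `smallest_member`, the universe $U = \{0, \dots, n-1\}$, and the return value `None` for an empty intersection are assumptions. The second function shows a simple adaptive use of interval group queries: a binary search that locates the smallest element of $S$ with about $\log_2 n$ queries.

```python
import random

def make_oracles(hidden_set, seed=0):
    """Return (group_query, subset_sample) oracles for a hidden set S."""
    S = frozenset(hidden_set)
    rng = random.Random(seed)

    def group_query(T):
        # Group (subset) query: is T ∩ S non-empty?
        return any(x in S for x in T)

    def subset_sample(T):
        # Subset sampling: a uniform element of T ∩ S, or None if it is empty.
        hits = sorted(x for x in T if x in S)
        return rng.choice(hits) if hits else None

    return group_query, subset_sample

def smallest_member(group_query, n):
    """Smallest element of S in {0, ..., n-1}, using interval group queries."""
    if not group_query(range(0, n)):
        return None
    lo, hi = 0, n  # invariant: the smallest element of S lies in [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if group_query(range(lo, mid)):
            hi = mid
        else:
            lo = mid
    return lo
```

The group query reveals a single bit per call, whereas a subset sample also hands back a concrete element of $S$, which is the extra information referred to in the paper's title.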