2,019 research outputs found

    Deterministic and Probabilistic Binary Search in Graphs

    Full text link
    We consider the following natural generalization of Binary Search: in a given undirected, positively weighted graph, one vertex is a target. The algorithm's task is to identify the target by adaptively querying vertices. In response to querying a node qq, the algorithm learns either that qq is the target, or is given an edge out of qq that lies on a shortest path from qq to the target. We study this problem in a general noisy model in which each query independently receives a correct answer with probability p>12p > \frac{1}{2} (a known constant), and an (adversarial) incorrect one with probability 1p1-p. Our main positive result is that when p=1p = 1 (i.e., all answers are correct), log2n\log_2 n queries are always sufficient. For general pp, we give an (almost information-theoretically optimal) algorithm that uses, in expectation, no more than (1δ)log2n1H(p)+o(logn)+O(log2(1/δ))(1 - \delta)\frac{\log_2 n}{1 - H(p)} + o(\log n) + O(\log^2 (1/\delta)) queries, and identifies the target correctly with probability at leas 1δ1-\delta. Here, H(p)=(plogp+(1p)log(1p))H(p) = -(p \log p + (1-p) \log(1-p)) denotes the entropy. The first bound is achieved by the algorithm that iteratively queries a 1-median of the nodes not ruled out yet; the second bound by careful repeated invocations of a multiplicative weights algorithm. Even for p=1p = 1, we show several hardness results for the problem of determining whether a target can be found using KK queries. Our upper bound of log2n\log_2 n implies a quasipolynomial-time algorithm for undirected connected graphs; we show that this is best-possible under the Strong Exponential Time Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs with non-uniform node querying costs, the problem is PSPACE-complete. For a semi-adaptive version, in which one may query rr nodes each in kk rounds, we show membership in Σ2k1\Sigma_{2k-1} in the polynomial hierarchy, and hardness for Σ2k5\Sigma_{2k-5}

    S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification

    Full text link
    This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S2 needs in terms of a novel parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S2 algorithm to the theory of nonparametric active learning. In particular, we show that S2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems.Comment: A version of this paper appears in the Conference on Learning Theory (COLT) 201

    Simpler, faster and shorter labels for distances in graphs

    Full text link
    We consider how to assign labels to any undirected graph with n nodes such that, given the labels of two nodes and no other information regarding the graph, it is possible to determine the distance between the two nodes. The challenge in such a distance labeling scheme is primarily to minimize the maximum label lenght and secondarily to minimize the time needed to answer distance queries (decoding). Previous schemes have offered different trade-offs between label lengths and query time. This paper presents a simple algorithm with shorter labels and shorter query time than any previous solution, thereby improving the state-of-the-art with respect to both label length and query time in one single algorithm. Our solution addresses several open problems concerning label length and decoding time and is the first improvement of label length for more than three decades. More specifically, we present a distance labeling scheme with label size (log 3)/2 + o(n) (logarithms are in base 2) and O(1) decoding time. This outperforms all existing results with respect to both size and decoding time, including Winkler's (Combinatorica 1983) decade-old result, which uses labels of size (log 3)n and O(n/log n) decoding time, and Gavoille et al. (SODA'01), which uses labels of size 11n + o(n) and O(loglog n) decoding time. In addition, our algorithm is simpler than the previous ones. In the case of integral edge weights of size at most W, we present almost matching upper and lower bounds for label sizes. For r-additive approximation schemes, where distances can be off by an additive constant r, we give both upper and lower bounds. In particular, we present an upper bound for 1-additive approximation schemes which, in the unweighted case, has the same size (ignoring second order terms) as an adjacency scheme: n/2. We also give results for bipartite graphs and for exact and 1-additive distance oracles

    A Linear-Size Logarithmic Stretch Path-Reporting Distance Oracle for General Graphs

    Full text link
    In 2001 Thorup and Zwick devised a distance oracle, which given an nn-vertex undirected graph and a parameter kk, has size O(kn1+1/k)O(k n^{1+1/k}). Upon a query (u,v)(u,v) their oracle constructs a (2k1)(2k-1)-approximate path Π\Pi between uu and vv. The query time of the Thorup-Zwick's oracle is O(k)O(k), and it was subsequently improved to O(1)O(1) by Chechik. A major drawback of the oracle of Thorup and Zwick is that its space is Ω(nlogn)\Omega(n \cdot \log n). Mendel and Naor devised an oracle with space O(n1+1/k)O(n^{1+1/k}) and stretch O(k)O(k), but their oracle can only report distance estimates and not actual paths. In this paper we devise a path-reporting distance oracle with size O(n1+1/k)O(n^{1+1/k}), stretch O(k)O(k) and query time O(nϵ)O(n^\epsilon), for an arbitrarily small ϵ>0\epsilon > 0. In particular, our oracle can provide logarithmic stretch using linear size. Another variant of our oracle has size O(nloglogn)O(n \log\log n), polylogarithmic stretch, and query time O(loglogn)O(\log\log n). For unweighted graphs we devise a distance oracle with multiplicative stretch O(1)O(1), additive stretch O(β(k))O(\beta(k)), for a function β()\beta(\cdot), space O(n1+1/kβ)O(n^{1+1/k} \cdot \beta), and query time O(nϵ)O(n^\epsilon), for an arbitrarily small constant ϵ>0\epsilon >0. The tradeoff between multiplicative stretch and size in these oracles is far below girth conjecture threshold (which is stretch 2k12k-1 and size O(n1+1/k)O(n^{1+1/k})). Breaking the girth conjecture tradeoff is achieved by exhibiting a tradeoff of different nature between additive stretch β(k)\beta(k) and size O(n1+1/k)O(n^{1+1/k}). A similar type of tradeoff was exhibited by a construction of (1+ϵ,β)(1+\epsilon,\beta)-spanners due to Elkin and Peleg. However, so far (1+ϵ,β)(1+\epsilon,\beta)-spanners had no counterpart in the distance oracles' world. An important novel tool that we develop on the way to these results is a {distance-preserving path-reporting oracle}

    A Framework for Searching in Graphs in the Presence of Errors

    Get PDF
    We consider a problem of searching for an unknown target vertex t in a (possibly edge-weighted) graph. Each vertex-query points to a vertex v and the response either admits that v is the target or provides any neighbor s of v that lies on a shortest path from v to t. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et al. [STOC 2016]. In the latter, the authors provide algorithms for the error-less case and for the independent noise model (where each query independently receives an erroneous answer with known probability p<1/2 and a correct one with probability 1-p). We study this problem both with adversarial errors and independent noise models. First, we show an algorithm that needs at most (log_2 n)/(1 - H(r)) queries in case of adversarial errors, where the adversary is bounded with its rate of errors by a known constant r<1/2. Our algorithm is in fact a simplification of previous work, and our refinement lies in invoking an amortization argument. We then show that our algorithm coupled with a Chernoff bound argument leads to a simpler algorithm for the independent noise model and has a query complexity that is both simpler and asymptotically better than the one of Emamjomeh-Zadeh et al. [STOC 2016]. Our approach has a wide range of applications. First, it improves and simplifies the Robust Interactive Learning framework proposed by Emamjomeh-Zadeh and Kempe [NIPS 2017]. Secondly, performing analogous analysis for edge-queries (where a query to an edge e returns its endpoint that is closer to the target) we actually recover (as a special case) a noisy binary search algorithm that is asymptotically optimal, matching the complexity of Feige et al. [SIAM J. Comput. 1994]. Thirdly, we improve and simplify upon an algorithm for searching of unbounded domains due to Aslam and Dhagat [STOC 1991]

    An Indexing Framework for Queries on Probabilistic Graphs

    Get PDF
    postprin

    Reasoning & Querying – State of the Art

    Get PDF
    Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Get PDF
    Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”
    corecore