1,261 research outputs found

    Answering Conjunctive Queries under Updates

    Full text link
    We consider the task of enumerating and counting answers to kk-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear^\ast delay and sublinear update time (and arbitrary preprocessing time) is impossible. For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the query's homomorphic core is q-hierarchical, then size of the the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis. )^\ast) By sublinear we mean O(n1ε)O(n^{1-\varepsilon}) for some ε>0\varepsilon>0, where nn is the size of the active domain of the current database

    Constant-Delay Enumeration for Nondeterministic Document Spanners

    Get PDF
    We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm which is tractable in combined complexity, i.e., in the sizes of the input document and the VA; while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS'18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, in particular for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.Comment: 29 pages. Extended version of arXiv:1807.09320. Integrates all corrections following reviewer feedback. Outside of some minor formatting differences and tweaks, this paper is the same as the paper to appear in the ACM TODS journa

    Constant Delay Enumeration with FPT-Preprocessing for Conjunctive Queries of Bounded Submodular Width

    Get PDF
    Marx (STOC 2010, J. ACM 2013) introduced the notion of submodular width of a conjunctive query (CQ) and showed that for any class Phi of Boolean CQs of bounded submodular width, the model-checking problem for Phi on the class of all finite structures is fixed-parameter tractable (FPT). Note that for non-Boolean queries, the size of the query result may be far too large to be computed entirely within FPT time. We investigate the free-connex variant of submodular width and generalise Marx\u27s result to non-Boolean queries as follows: For every class Phi of CQs of bounded free-connex submodular width, within FPT-preprocessing time we can build a data structure that allows to enumerate, without repetition and with constant delay, all tuples of the query result. Our proof builds upon Marx\u27s splitting routine to decompose the query result into a union of results; but we have to tackle the additional technical difficulty to ensure that these can be enumerated efficiently

    On the Enumeration of all Minimal Triangulations

    Full text link
    We present an algorithm that enumerates all the minimal triangulations of a graph in incremental polynomial time. Consequently, we get an algorithm for enumerating all the proper tree decompositions, in incremental polynomial time, where "proper" means that the tree decomposition cannot be improved by removing or splitting a bag

    Fine-Grained Complexity Analysis of Two Classic TSP Variants

    Get PDF
    We analyze two classic variants of the Traveling Salesman Problem using the toolkit of fine-grained complexity. Our first set of results is motivated by the Bitonic TSP problem: given a set of nn points in the plane, compute a shortest tour consisting of two monotone chains. It is a classic dynamic-programming exercise to solve this problem in O(n2)O(n^2) time. While the near-quadratic dependency of similar dynamic programs for Longest Common Subsequence and Discrete Frechet Distance has recently been proven to be essentially optimal under the Strong Exponential Time Hypothesis, we show that bitonic tours can be found in subquadratic time. More precisely, we present an algorithm that solves bitonic TSP in O(nlog2n)O(n \log^2 n) time and its bottleneck version in O(nlog3n)O(n \log^3 n) time. Our second set of results concerns the popular kk-OPT heuristic for TSP in the graph setting. More precisely, we study the kk-OPT decision problem, which asks whether a given tour can be improved by a kk-OPT move that replaces kk edges in the tour by kk new edges. A simple algorithm solves kk-OPT in O(nk)O(n^k) time for fixed kk. For 2-OPT, this is easily seen to be optimal. For k=3k=3 we prove that an algorithm with a runtime of the form O~(n3ϵ)\tilde{O}(n^{3-\epsilon}) exists if and only if All-Pairs Shortest Paths in weighted digraphs has such an algorithm. The results for k=2,3k=2,3 may suggest that the actual time complexity of kk-OPT is Θ(nk)\Theta(n^k). We show that this is not the case, by presenting an algorithm that finds the best kk-move in O(n2k/3+1)O(n^{\lfloor 2k/3 \rfloor + 1}) time for fixed k3k \geq 3. This implies that 4-OPT can be solved in O(n3)O(n^3) time, matching the best-known algorithm for 3-OPT. Finally, we show how to beat the quadratic barrier for k=2k=2 in two important settings, namely for points in the plane and when we want to solve 2-OPT repeatedly.Comment: Extended abstract appears in the Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016

    First-Order Query Evaluation with Cardinality Conditions

    Full text link
    We study an extension of first-order logic that allows to express cardinality conditions in a similar way as SQL's COUNT operator. The corresponding logic FOC(P) was introduced by Kuske and Schweikardt (LICS'17), who showed that query evaluation for this logic is fixed-parameter tractable on classes of structures (or databases) of bounded degree. In the present paper, we first show that the fixed-parameter tractability of FOC(P) cannot even be generalised to very simple classes of structures of unbounded degree such as unranked trees or strings with a linear order relation. Then we identify a fragment FOC1(P) of FOC(P) which is still sufficiently strong to express standard applications of SQL's COUNT operator. Our main result shows that query evaluation for FOC1(P) is fixed-parameter tractable with almost linear running time on nowhere dense classes of structures. As a corollary, we also obtain a fixed-parameter tractable algorithm for counting the number of tuples satisfying a query over nowhere dense classes of structures

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    Full text link
    False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

    Conjunctive Queries for Logic-Based Information Extraction

    Full text link
    This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. The second approach is to consider the dynamic complexity of FC.Comment: Based on the author's PhD thesis and contains work from two conference publications (arXiv:2104.04758, arXiv:1909.10869) which are joint work with Dominik D. Freydenberge
    corecore