627 research outputs found

    Queries with Guarded Negation (full version)

    Full text link
    A well-established and fundamental insight in database theory is that negation (also known as complementation) tends to make queries difficult to process and difficult to reason about. Many basic problems are decidable and admit practical algorithms in the case of unions of conjunctive queries, but become difficult or even undecidable when queries are allowed to contain negation. Inspired by recent results in finite model theory, we consider a restricted form of negation, guarded negation. We introduce a fragment of SQL, called GN-SQL, as well as a fragment of Datalog with stratified negation, called GN-Datalog, that allow only guarded negation, and we show that these query languages are computationally well behaved, in terms of testing query containment, query evaluation, open-world query answering, and boundedness. GN-SQL and GN-Datalog subsume a number of well known query languages and constraint languages, such as unions of conjunctive queries, monadic Datalog, and frontier-guarded tgds. In addition, an analysis of standard benchmark workloads shows that most usage of negation in SQL in practice is guarded negation

    Monadic Datalog Containment on Trees

    Get PDF
    We show that the query containment problem for monadic datalog on finite unranked labeled trees can be solved in 2-fold exponential time when (a) considering unordered trees using the axes child and descendant, and when (b) considering ordered trees using the axes firstchild, nextsibling, child, and descendant. When omitting the descendant-axis, we obtain that in both cases the problem is EXPTIME-complete.Comment: This article is the full version of an article published in the proccedings of the 8th Alberto Mendelzon Workshop (AMW 2014

    Dynamic Complexity under Definable Changes

    Get PDF
    This paper studies dynamic complexity under definable change operations in the DynFO framework by Patnaik and Immerman. It is shown that for changes definable by parameter-free first-order formulas, all (uniform) AC1 queries can be maintained by first-order dynamic programs. Furthermore, many maintenance results for single-tuple changes are extended to more powerful change operations: (1) The reachability query for undirected graphs is first-order maintainable under single tuple changes and first-order defined insertions, likewise the reachability query for directed acyclic graphs under quantifier-free insertions. (2) Context-free languages are first-order maintainable under EFO-defined changes. These results are complemented by several inexpressibility results, for example, that the reachability query cannot be maintained by quantifier-free programs under definable, quantifier-free deletions

    Dynamic Graph Queries

    Get PDF
    Graph databases in many applications - semantic web, transport or biological networks among others - are not only large, but also frequently modified. Evaluating graph queries in this dynamic context is a challenging task, as those queries often combine first-order and navigational features. Motivated by recent results on maintaining dynamic reachability, we study the dynamic evaluation of traditional query languages for graphs in the descriptive complexity framework. Our focus is on maintaining regular path queries, and extensions thereof, by first-order formulas. In particular we are interested in path queries defined by non-regular languages and in extended conjunctive regular path queries (which allow to compare labels of paths based on word relations). Further we study the closely related problems of maintaining distances in graphs and reachability in product graphs. In this preliminary study we obtain upper bounds for those problems in restricted settings, such as undirected and acyclic graphs, or under insertions only, and negative results regarding quantifier-free update formulas. In addition we point out interesting directions for further research

    Queries with Arithmetic on Incomplete Databases

    Get PDF
    International audienceThe standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases,relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies,we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behaviorof volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach

    On the Decidability of Semilinearity for Semialgebraic Sets and Its Implications for Spatial Databases

    Get PDF
    AbstractSeveral authors have suggested using first-order logic over the real numbers to describe spatial database applications. Geometric objects are then described by polynomial inequalities with integer coefficients involving the coordinates of the objects. Such geometric objects are called semialgebraic sets. Similarly, queries are expressed by polynomial inequalities. The query language thus obtained is usually referred to as FO+poly. From a practical point of view, it has been argued that a linear restriction of this so-called polynomial model is more desirable. In the so-called linear model, geometric objects are described by linear inequalities and are called semilinear sets. The language of the queries expressible by linear inequalities is usually referred to as FO+linear. As part of a general study of the feasibility of the linear model, we show in this paper that semilinearity is decidable for semialgebraic sets. In doing so, we point out important subtleties related to the type of the coefficients in the linear inequalities used to describe semilinear sets. An important concept in the development of the paper is regular stratification. We point out the geometric significance, as well as its significance in the context of FO+linear and FO+poly computations. The decidability of semilinearity of semialgebraic sets has an important consequence. It has been shown that it is undecidable whether a query expressible in FO+poly is linear, i.e., maps spatial databases of the linear model into spatial databases of the linear model. It follows now that, despite this negative result, there exists a syntactically definable language precisely expressing the linear queries expressible in FO+poly

    Queries with Arithmetic on Incomplete Databases

    Get PDF
    The standard notion of query answering over incomplete database is that of certain answers, guaranteeing correctness regardless of how incomplete data is interpreted. In majority of real-life databases, relations have numerical columns and queries use arithmetic and comparisons. Even though the notion of certain answers still applies, we explain that it becomes much more problematic in situations when missing data occurs in numerical columns. We propose a new general framework that allows us to assign a measure of certainty to query answers. We test it in the agnostic scenario where we do not have prior information about values of numerical attributes, similarly to the predominant approach in handling incomplete data which assumes that each null can be interpreted as an arbitrary value of the domain. The key technical challenge is the lack of a uniform distribution over the entire domain of numerical attributes, such as real numbers. We overcome this by associating the measure of certainty with the asymptotic behavior of volumes of some subsets of the Euclidean space. We show that this measure is well-defined, and describe approaches to computing and approximating it. While it can be computationally hard, or result in an irrational number, even for simple constraints, we produce polynomial-time randomized approximation schemes with multiplicative guarantees for conjunctive queries, and with additive guarantees for arbitrary first-order queries. We also describe a set of experimental results to confirm the feasibility of this approach
    • …
    corecore