Query evaluation revised: parallel, distributed, via rewritings
This is a thesis on query evaluation in parallel and distributed settings, and structurally simple rewritings.
It consists of three parts.
In the first part, we investigate the efficiency of constant-time parallel evaluation algorithms, that is, the number of processors required or, asymptotically equivalently, the work required to evaluate queries in constant time. It is known that relational algebra queries can be evaluated in constant time. However, work-efficiency has not been a focus, and indeed known evaluation algorithms yield huge (polynomial) work bounds. We establish work-efficient constant-time algorithms for several query classes: (free-connex) acyclic, semi-join algebra, and natural join queries; the latter in the worst-case framework.
The second part is about deciding parallel-correctness of distributed evaluation strategies: given a query and policies specifying how data is distributed and communicated among multiple servers, does the distributed evaluation yield the same result as the classical evaluation, for every database? Ketsman et al. proved that parallel-correctness for Datalog is undecidable, by reduction from the undecidable containment problem for Datalog. We show that parallel-correctness is already undecidable for monadic and frontier-guarded Datalog queries, for which containment is decidable. However, deciding parallel-correctness for frontier-guarded Datalog and constraint-based communication policies satisfying a certain property is 2ExpTime-complete. Furthermore, we obtain the same bounds for the parallel-boundedness problem, which asks whether the number of required communication rounds is bounded, over all databases.
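To make the parallel-correctness question concrete, here is a minimal Python sketch that compares centralized and one-round distributed evaluation of a conjunctive query on a single database. It is an illustration only: the names `evaluate`, `distributed_eval`, `hash_on_y`, and `split_by_relation` are hypothetical, the query is a plain conjunctive query rather than Datalog, and checking one database is much weaker than the problem studied in the thesis, which quantifies over every database.

```python
def evaluate(body, head, facts):
    """Brute-force evaluation of a conjunctive query.

    body:  list of atoms (relation_name, tuple_of_variables)
    head:  tuple of output variables
    facts: set of facts (relation_name, tuple_of_values)
    """
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[v] for v in head))
            return
        rel, args = body[i]
        for fact_rel, values in facts:
            if fact_rel != rel or len(values) != len(args):
                continue
            new_binding = dict(binding)
            # A fact matches if it is consistent with the variables bound so far.
            if all(new_binding.setdefault(a, v) == v for a, v in zip(args, values)):
                extend(i + 1, new_binding)

    extend(0, {})
    return results


def distributed_eval(body, head, facts, policy, servers):
    """Each server evaluates the query on the facts the policy assigns to it;
    the overall result is the union of the local results."""
    output = set()
    for server in servers:
        local_facts = {f for f in facts if server in policy(f)}
        output |= evaluate(body, head, local_facts)
    return output


# Hypothetical example query: H(x, z) :- R(x, y), S(y, z)
body = [("R", ("x", "y")), ("S", ("y", "z"))]
head = ("x", "z")
facts = {("R", (1, 2)), ("S", (2, 3))}


def hash_on_y(fact):
    """Send every fact to the server chosen by its join value y."""
    rel, values = fact
    join_value = values[1] if rel == "R" else values[0]
    return {join_value % 2}


def split_by_relation(fact):
    """Send all R-facts to server 0 and all S-facts to server 1."""
    return {0} if fact[0] == "R" else {1}


centralized = evaluate(body, head, facts)
agrees_hash = distributed_eval(body, head, facts, hash_on_y, [0, 1]) == centralized
agrees_split = distributed_eval(body, head, facts, split_by_relation, [0, 1]) == centralized
```

On this database the hash-based policy agrees with the centralized result, while the relation-based split loses the join because the two matching facts never meet on one server.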
The third part is about structurally simple rewritings. The (classical) rewriting problem asks whether, for a given query and a set of views, there is a query, called a rewriting, over the views that is equivalent to the given query. We study the variant of this problem for (subclasses of) conjunctive queries and views that asks for a structurally simple rewriting. We prove that, if the given query is acyclic, an acyclic rewriting exists if there is any rewriting at all. Analogous statements hold for free-connex acyclic, hierarchical, and q-hierarchical queries. Furthermore, we prove that the problem is NP-hard, even if the given query and the views are acyclic or hierarchical. It becomes tractable if the views are free-connex acyclic or q-hierarchical (and the arity of the database schema is bounded).
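The structural notions used above can be tested mechanically. The following Python sketch checks acyclicity of a conjunctive query with the standard GYO reduction and checks free-connexity via the usual characterization (the query stays acyclic after adding an atom over the free variables). It is not taken from the thesis; the function names and the example query are assumptions made for illustration, and atoms are represented only by their variable tuples.

```python
def is_acyclic(atoms):
    """GYO reduction on the query hypergraph.

    atoms: iterable of variable tuples, one per atom of the query.
    Returns True if the hypergraph reduces to at most one hyperedge.
    """
    edges = [set(a) for a in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: remove variables that occur in exactly one hyperedge.
        for edge in edges:
            for var in list(edge):
                if sum(var in other for other in edges) == 1:
                    edge.remove(var)
                    changed = True
        # Rule 2: remove one hyperedge that is contained in another one.
        for i, edge in enumerate(edges):
            if any(i != j and edge <= other for j, other in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return len(edges) <= 1


def is_free_connex(atoms, free_vars):
    """Acyclic, and still acyclic with an extra atom over the free variables."""
    atoms = list(atoms)
    return is_acyclic(atoms) and is_acyclic(atoms + [tuple(free_vars)])


# Hypothetical examples.
path = [("x", "y"), ("y", "z")]                  # R(x, y), S(y, z)
triangle = [("x", "y"), ("y", "z"), ("z", "x")]  # R(x, y), S(y, z), T(z, x)

path_acyclic = is_acyclic(path)
triangle_acyclic = is_acyclic(triangle)
path_fc_xz = is_free_connex(path, ("x", "z"))    # head H(x, z)
path_fc_xy = is_free_connex(path, ("x", "y"))    # head H(x, y)
```

The path query is acyclic, but with head H(x, z) it is not free-connex, since the added atom over x and z closes a triangle.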
A Data Transformation System for Biological Data Sources
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as in sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data
Synthesizing nested relational queries from implicit specifications: via model theory and via proof theory
Derived datasets can be defined implicitly or explicitly. An implicit definition (of dataset O in terms of datasets I) is a logical specification involving two distinguished sets of relational symbols. One set of relations is for the "source data" I, and the other is for the "interface data" O. Such a specification is a valid definition of O in terms of I, if any two models of the specification agreeing on I agree on O. In contrast, an explicit definition is a transformation (or "query" below) that produces O from I. Variants of Beth's theorem [Bet53] state that one can convert implicit definitions to explicit ones. Further, this conversion can be done effectively given a proof witnessing implicit definability in a suitable proof system. We prove the analogous implicit-to-explicit result for nested relations: implicit definitions, given in the natural logic for nested relations, can be converted to explicit definitions in the nested relational calculus (NRC). We first provide a model-theoretic argument for this result, which makes some additional connections that may be of independent interest, between NRC queries and interpretations, a standard mechanism for defining structure-to-structure translation in logic, and between interpretations and implicit definability "up to unique isomorphism". The latter connection uses a variation of a result of Gaifman concerning "relatively categorical" theories. We also provide a proof-theoretic result that provides an effective argument: from a proof witnessing implicit definability, we can efficiently produce an NRC definition. This will involve introducing the appropriate proof system for reasoning with nested sets, along with some auxiliary Beth-type results for this system. As a consequence, we can effectively extract rewritings of NRC queries in terms of NRC views, given a proof witnessing that the query is determined by the views.
Synthesizing nested relational queries from implicit specifications: via model theory and via proof theory
Derived datasets can be defined implicitly or explicitly. An implicit
definition (of dataset O in terms of datasets I) is a logical specification
involving the source data I and the interface data O. It is a valid definition
of O in terms of I, if any two models of the specification agreeing on I agree
on O. In contrast, an explicit definition is a query that produces O from I.
Variants of Beth's theorem state that one can convert implicit definitions to
explicit ones. Further, this conversion can be done effectively given a proof
witnessing implicit definability in a suitable proof system.
We prove the analogous implicit-to-explicit result for nested relations:
implicit definitions, given in the natural logic for nested relations, can be
converted to explicit definitions in the nested relational calculus (NRC). We
first provide a model-theoretic argument for this result, which makes some
additional connections that may be of independent interest, between NRC
queries and interpretations, a standard mechanism for defining
structure-to-structure translation in logic, and between interpretations and
implicit definability "up to unique isomorphism". The latter connection
makes use of a variation of a result of Gaifman concerning "relatively
categorical" theories.
We also provide a proof-theoretic result that provides an effective argument:
from a proof witnessing implicit definability, we can efficiently produce an
NRC definition. This will involve introducing the appropriate proof system for
reasoning with nested sets, along with some auxiliary Beth-type results for
this system. As a consequence, we can effectively extract rewritings of NRC
queries in terms of NRC views, given a proof witnessing that the query is
determined by the views.
Comment: arXiv admin note: substantial text overlap with arXiv:2209.08299,
arXiv:2005.0650
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees.
Views and Queries: Determinacy and Rewriting
We investigate the question of whether a query Q can be answered using a set V of views. We first define the problem in information-theoretic terms: we say that V determines Q if V provides enough information to uniquely determine the answer to Q. Next, we look at the problem of rewriting Q in terms of V using a specific language. Given a view language 𝒱 and a query language 𝒬, we say that a rewriting language ℛ is complete for 𝒱-to-𝒬 rewritings if every Q ∈ 𝒬 can be rewritten in terms of V ∈ 𝒱 using a query in ℛ, whenever V determines Q. While query rewriting using views has been extensively investigated for some specific languages, the connection to the information-theoretic notion of determinacy and the question of completeness of a rewriting language have received little attention. In this article we systematically investigate the notion of determinacy and its connection to rewriting. The results concern decidability of determinacy for various view and query languages, as well as the power required of complete rewriting languages. We consider languages ranging from first-order to conjunctive queries.
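The definition of determinacy can be illustrated with a small bounded search in Python: two databases with the same view images but different query answers witness that the views do not determine the query. This sketch is only an illustration under strong assumptions: it enumerates databases over a single binary relation on a tiny domain, so finding no counterexample there does not establish determinacy in general. All names (`all_databases`, `find_counterexample`, and the example views and queries) are hypothetical.

```python
from itertools import combinations, product


def all_databases(domain):
    """Enumerate every instance of one binary relation over the given domain."""
    tuples = list(product(domain, repeat=2))
    for size in range(len(tuples) + 1):
        for subset in combinations(tuples, size):
            yield frozenset(subset)


def find_counterexample(views, query, domain):
    """Search for two databases that agree on all views but differ on the query.

    views: list of functions mapping a database to a set of tuples
    query: function mapping a database to a set of tuples
    Returns a pair of databases, or None if none exists over this domain.
    """
    seen = {}  # view image -> (first database with that image, its query answer)
    for db in all_databases(domain):
        image = tuple(frozenset(view(db)) for view in views)
        answer = frozenset(query(db))
        if image in seen and seen[image][1] != answer:
            return seen[image][0], db
        seen.setdefault(image, (db, answer))
    return None


# Hypothetical views and queries over a binary relation E.
def edges(db):
    return set(db)


def first_column(db):
    return {(a,) for a, _ in db}


def paths_of_length_two(db):
    return {(a, c) for a, b in db for b2, c in db if b == b2}


no_witness = find_counterexample([edges], paths_of_length_two, [0, 1])
witness = find_counterexample([first_column], edges, [0, 1])
```

Here the identity view trivially determines the path query, while the projection onto the first column does not determine the full relation, and the search returns a pair of databases that differ only in their second column.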