On the Complexity of Nonrecursive XQuery and Functional Query Languages on Complex Values
This paper studies the complexity of evaluating functional query languages
for complex values such as monad algebra and the recursion-free fragment of
XQuery.
We show that monad algebra with equality restricted to atomic values is
complete for the class TA[2^{O(n)}, O(n)] of problems solvable in linear
exponential time with a linear number of alternations. The monotone fragment of
monad algebra with atomic value equality but without negation is complete for
nondeterministic exponential time. For monad algebra with deep equality, we
establish TA[2^{O(n)}, O(n)] lower and exponential-space upper bounds.
Then we study a fragment of XQuery, Core XQuery, that seems to incorporate
all the features of a query language on complex values that are traditionally
deemed essential. A close connection between monad algebra on lists and Core
XQuery (with "child" as the only axis) is exhibited, and it is shown that
these languages are expressively equivalent up to representation issues. We
show that Core XQuery is just as hard as monad algebra w.r.t. combined
complexity, and that it is in TC0 if the query is assumed fixed.
Comment: Long version of PODS 2005 pape
Normal Forms And Conservative Extension Properties For Query Languages Over Collection Types
Strong normalization results are obtained for a general language for collection types. An induced normal form for sets and bags is then used to show that the class of functions whose input has height (that is, the maximal depth of nestings of sets/bags/lists in the complex object) at most i and output has height at most o definable in a nested relational query language without the powerset operator is independent of the height of intermediate expressions used. Our proof holds regardless of whether the language is used for querying sets, bags, or lists, even in the presence of variant types. Moreover, the normal forms are useful in a general approach to query optimization. Paredaens and Van Gucht proved a similar result for the special case when i = o = 1. Their result is complemented by Hull and Su, who demonstrated the failure of independence when the powerset operator is present and i = o = 1. The theorem of Hull and Su was generalized to all i and o by Grumbach and Vianu. Our result generali..
Functional Collection Programming with Semi-Ring Dictionaries
This paper introduces semi-ring dictionaries, a powerful class of
compositional and purely functional collections that subsume other collection
types such as sets, multisets, arrays, vectors, and matrices. We developed
SDQL, a statically typed language that can express relational algebra with
aggregations, linear algebra, and functional collections over data such as
relations and matrices using semi-ring dictionaries. Furthermore, thanks to the
algebraic structure behind these dictionaries, SDQL unifies a wide range of
optimizations commonly used in databases (DB) and linear algebra (LA). As a
result, SDQL enables efficient processing of hybrid DB and LA workloads, by
putting together optimizations that are otherwise confined to either DB systems
or LA frameworks. We show experimentally that a handful of DB and LA workloads
can take advantage of the SDQL language and optimizations. Overall, we observe
that SDQL achieves competitive performance relative to Typer and Tectorwise,
which are state-of-the-art in-memory DB systems for (flat, not nested)
relational data, and achieves an average 2x speedup over SciPy for LA
workloads. For hybrid workloads involving LA processing, SDQL achieves up to
one order of magnitude speedup over Trance, a state-of-the-art nested
relational engine for nested biomedical data, and gives an average 40% speedup
over LMFAO, a state-of-the-art in-DB machine learning engine for two (flat)
relational real-world retail datasets
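
As a rough illustration of the idea in the abstract above (dictionaries whose values live in a semi-ring, so that multisets, vectors, and matrices share one addition), here is a minimal Python sketch. It is not SDQL syntax or its implementation; the helper names `sd_add` and `sd_scale` are hypothetical and chosen only for this example.

```python
# Minimal sketch of dictionary addition and scaling over numeric values.
# Assumption: values are numbers or (nested) dicts of numbers; the names
# sd_add and sd_scale are hypothetical and do not come from SDQL.

def sd_add(x, y):
    """Pointwise addition: numbers add, dictionaries merge key by key."""
    if isinstance(x, dict) and isinstance(y, dict):
        out = dict(x)  # shallow copy; nested values are rebuilt below
        for key, value in y.items():
            out[key] = sd_add(out[key], value) if key in out else value
        return out
    return x + y


def sd_scale(c, x):
    """Multiply every numeric leaf of x by the scalar c."""
    if isinstance(x, dict):
        return {key: sd_scale(c, value) for key, value in x.items()}
    return c * x


# A multiset is a dictionary from elements to multiplicities.
bag_union = sd_add({"a": 2, "b": 1}, {"a": 1, "c": 3})

# A sparse matrix is a dictionary of rows, each a dictionary of columns.
matrix_sum = sd_add({0: {0: 1.0}}, {0: {0: 2.0, 1: 5.0}})
```

Under these assumptions, the same `sd_add` performs multiset union and sparse matrix addition, which hints at how a single algebraic structure can cover both database-style and linear-algebra-style collections.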
Generating collection transformations from proofs
Nested relations, built up from atomic types via product and set types, form a rich data model. Over the last decades the nested relational calculus, NRC, has emerged as a standard language for defining transformations on nested collections. NRC is a strongly-typed functional language which allows building up transformations using tupling and projections, a singleton-former, and a map operation that lifts transformations on tuples to transformations on sets. In this work we describe an alternative declarative method of describing transformations in logic. A formula with distinguished inputs and outputs gives an implicit definition if one can prove that for each input there is only one output that satisfies it. Our main result shows that one can synthesize transformations from proofs that a formula provides an implicit definition, where the proof is in an intuitionistic calculus that captures a natural style of reasoning about nested collections. Our polynomial time synthesis procedure is based on an analog of Craig's interpolation lemma, starting with a provable containment between terms representing nested collections and generating an NRC expression that interpolates between them. We further show that NRC expressions that implement an implicit definition can be found when there is a classical proof of functionality, not just when there is an intuitionistic one. That is, whenever a formula implicitly defines a transformation, there is an NRC expression that implements it.
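
The NRC building blocks named in the abstract above (tupling, projections, a singleton-former, and map over sets) can be sketched in Python as follows. This is only an illustration under the assumption that sets are modelled as `frozenset` and tuples as Python pairs; the function names are hypothetical, and the sketch says nothing about the paper's synthesis procedure.

```python
# Illustrative encoding of the NRC primitives mentioned in the abstract.
# Assumption: sets are frozensets and tuples are Python 2-tuples; all
# function names here are hypothetical, chosen for this example only.

def pair(a, b):
    """Tupling: build a pair from two values."""
    return (a, b)


def proj1(t):
    """First projection of a pair."""
    return t[0]


def proj2(t):
    """Second projection of a pair."""
    return t[1]


def singleton(a):
    """Singleton-former: the set containing exactly a."""
    return frozenset([a])


def set_map(f, s):
    """Lift a transformation on elements to a transformation on sets."""
    return frozenset(f(x) for x in s)


def swap_columns(relation):
    """Swap the two components of every pair in a binary relation."""
    return set_map(lambda t: pair(proj2(t), proj1(t)), relation)


def nest_second(relation):
    """Wrap the second component of every pair in a singleton set."""
    return set_map(lambda t: pair(proj1(t), singleton(proj2(t))), relation)
```

For example, `swap_columns(frozenset({(1, "a"), (2, "b")}))` yields the relation with each pair reversed, and `nest_second` produces a nested relation whose second column holds sets.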
Language-integrated provenance
Provenance, or information about the origin or derivation of data, is
important for assessing the trustworthiness of data and identifying and
correcting mistakes. Most prior implementations of data provenance have
involved heavyweight modifications to database systems and little attention has
been paid to how the provenance data can be used outside such a system. We
present extensions to the Links programming language that build on its support
for language-integrated query to support provenance queries by rewriting and
normalizing monadic comprehensions and extending the type system to distinguish
provenance metadata from normal data. The main contribution of this article is
to show that the two most common forms of provenance can be implemented
efficiently and used safely as a programming language feature with no changes
to the database system.
Comment: Accepted to Science of Computer Programming special issue on PPDP
201