Structural characterizations of the navigational expressiveness of relation algebras on a tree
Given a document D in the form of an unordered node-labeled tree, we study
the expressiveness on D of various basic fragments of XPath, the core
navigational language on XML documents. Working from the perspective of these
languages as fragments of Tarski's relation algebra, we give characterizations,
in terms of the structure of D, for when a binary relation on its nodes is
definable by an expression in these algebras. Since each pair of nodes in such
a relation represents a unique path in D, our results therefore capture the
sets of paths in D definable in each of the fragments. We refer to this
perspective on language semantics as the "global view." In contrast with this
global view, there is also a "local view" where one is interested in the nodes
to which one can navigate starting from a particular node in the document. In
this view, we characterize when a set of nodes in D can be defined as the
result of applying an expression to a given node of D. All these definability
results, both in the global and the local view, are obtained by using a robust
two-step methodology, which consists of first characterizing when two nodes
cannot be distinguished by an expression in the respective fragments of XPath,
and then bootstrapping these characterizations to the desired results. Comment: 58 pages
Relative Expressive Power of Navigational Querying on Graphs
Motivated by both established and new applications, we study navigational
query languages for graphs (binary relations). The simplest language has only
the two operators union and composition, together with the identity relation.
We make more powerful languages by adding any of the following operators:
intersection; set difference; projection; coprojection; converse; and the
diversity relation. All these operators map binary relations to binary
relations. We compare the expressive power of all resulting languages. We do
this not only for general path queries (queries where the result may be any
binary relation) but also for boolean or yes/no queries (expressed by the
nonemptiness of an expression). For both cases, we present the complete Hasse
diagram of relative expressiveness. In particular, the Hasse diagram for boolean
queries contains some nontrivial separations and a few surprising collapses. Comment: An extended abstract announcing the results of this paper was
presented at the 14th International Conference on Database Theory, Uppsala,
Sweden, March 201
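The operators listed in this abstract can be illustrated on finite binary relations represented as Python sets of pairs. This is only a minimal sketch: the function names are hypothetical, and `projection` is assumed here to mean the first projection (the identity pairs on nodes that have an outgoing edge), which may differ in detail from the paper's definitions. Union, intersection, and set difference are Python's built-in `|`, `&`, and `-` on sets.

```python
from itertools import product


def identity(nodes):
    """Identity relation: every node paired with itself."""
    return {(v, v) for v in nodes}


def diversity(nodes):
    """Diversity relation: all pairs of distinct nodes."""
    return {(u, v) for u, v in product(nodes, repeat=2) if u != v}


def compose(r, s):
    """Composition: (a, d) whenever a -r-> b and b -s-> d for some b."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def converse(r):
    """Converse: every pair reversed."""
    return {(b, a) for (a, b) in r}


def projection(r):
    """Assumed first projection: (a, a) for every a with an outgoing r-edge."""
    return {(a, a) for (a, _) in r}


def coprojection(r, nodes):
    """Assumed first coprojection: (a, a) for every a without an outgoing r-edge."""
    return identity(nodes) - projection(r)


def is_nonempty(r):
    """Boolean query: true exactly when the expression's result is nonempty."""
    return len(r) > 0


# A three-node path graph 1 -> 2 -> 3 (illustrative data only).
nodes = {1, 2, 3}
edges = {(1, 2), (2, 3)}

two_steps = compose(edges, edges)        # pairs connected by a path of length 2
three_steps = compose(two_steps, edges)  # no path of length 3 exists here
sinks = coprojection(edges, nodes)       # nodes without outgoing edges
```

In this representation a path query returns the relation itself, while the corresponding boolean query only asks whether that relation is nonempty, for example `is_nonempty(two_steps)` versus `is_nonempty(three_steps)`.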
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees.
Similarity and bisimilarity notions appropriate for characterizing indistinguishability in fragments of the calculus of relations
Motivated by applications in databases, this paper considers various
fragments of the calculus of binary relations. The fragments are obtained by
leaving out, or keeping in, some of the standard operators, along with some
derived operators such as set difference, projection, coprojection, and
residuation. For each considered fragment, a characterization is obtained for
when two given binary relational structures are indistinguishable by
expressions in that fragment. The characterizations are based on appropriately
adapted notions of simulation and bisimulation. Comment: 36 pages, Journal of Logic and Computation 201
Implementation of Web Query Languages Reconsidered
Visions of the next generation Web such as the "Semantic Web" or the "Web 2.0" have triggered the emergence of a multitude of data formats. These formats have different characteristics as far as the shape of data is concerned (for example tree- vs. graph-shaped). They are accompanied by a puzzlingly large number of query languages each limited to one data format. Thus, a key feature of the Web, namely to make it possible to access anything published by anyone, is compromised.
This thesis is devoted to versatile query languages capable of accessing data in a variety of Web formats. The issue is addressed from three angles: language design, common, yet uniform semantics, and common, yet uniform evaluation.
First, we consider the query language Xcerpt as an example of the advocated class of versatile Web query languages. Using this concrete exemplar allows us to clarify and discuss the vision of versatility in detail.
Second, a number of query languages, XPath, XQuery, SPARQL, and Xcerpt, are translated into a common intermediary language, CIQLog. This language has a purely logical semantics, which makes it easily amenable to optimizations. As a side effect, this provides what is, to the best of our knowledge, the first logical semantics for XQuery and SPARQL. It is a very useful tool for understanding the commonalities and differences of the considered languages.
Third, the intermediate logical language is translated into a query algebra, CIQCAG. The core feature of CIQCAG is that it scales from tree- to graph-shaped data and queries without efficiency losses when tree-data and -queries are considered: it is shown that, in these cases, optimal complexities are achieved. CIQCAG is also shown to evaluate each of the aforementioned query languages with a complexity at least as good as the best known evaluation methods so far. For example, navigational XPath is evaluated with space complexity O(q d) and time complexity O(q n) where q is the query size, n the data size, and d the depth of the (tree-shaped) data.
CIQCAG is further shown to provide linear time and space evaluation of tree-shaped queries for a larger class of graph-shaped data than any method previously proposed. This larger class of graph-shaped data, called continuous-image graphs (CIGs for short), is introduced for the first time in this thesis. A (directed) graph is a CIG if its nodes can be totally ordered in such a manner that, for this order, the children of any node form a continuous interval.
CIQCAG achieves these properties by employing a novel data structure, called sequence map, that allows an efficient evaluation of tree-shaped queries, or of tree-shaped cores of graph-shaped queries on any graph-shaped data. While being ideally suited to trees and CIGs, the data structure gracefully degrades to unrestricted graphs. It yields a remarkably efficient evaluation on graph-shaped data that only a few edges prevent from being trees or CIGs.
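The definition of continuous-image graphs given in this abstract can be made concrete with a small check. The following is a minimal sketch, not the thesis's algorithm: it only verifies whether one given total order witnesses the CIG property and does not search for such an order. The function name and the adjacency-dictionary representation are assumptions made for illustration.

```python
def is_continuous_image_order(order, children):
    """Return True if, under the given total order of the nodes, the
    children of every node occupy a contiguous interval of positions.

    order    -- list of all nodes, giving the candidate total order
    children -- dict mapping a node to an iterable of its child nodes
    """
    position = {node: index for index, node in enumerate(order)}
    for node in order:
        kids = set(children.get(node, ()))
        if not kids:
            continue  # a node without children trivially satisfies the condition
        positions = [position[kid] for kid in kids]
        # Distinct positions are contiguous exactly when the span they
        # cover has as many slots as there are children.
        if max(positions) - min(positions) + 1 != len(kids):
            return False
    return True


# Illustrative graph: node "c" has two parents, so this is not a tree.
children = {"a": ["b", "c"], "d": ["c", "d"]}

good_order = ["a", "b", "c", "d"]  # children of "a" and of "d" are contiguous
bad_order = ["b", "a", "c", "d"]   # "a" sits between its own children "b" and "c"

good = is_continuous_image_order(good_order, children)
bad = is_continuous_image_order(bad_order, children)
```

A graph is a CIG when at least one order passes this check; a single failing order, such as `bad_order` above, says nothing about the graph as a whole.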
Leveraging query logs for user-centric OLAP
OLAP (On-Line Analytical Processing), the process of efficiently enabling common analytical operations on the multidimensional view of data, is a cornerstone of Business Intelligence. While OLAP is now a mature, efficiently implemented technology, very little attention has been paid to the effectiveness of the analysis and the user-friendliness of this technology, which is often considered tedious to use. This dissertation is a contribution to developing user-centric OLAP, focusing on the use of former queries logged by an OLAP server to enhance subsequent analyses. It shows how logs of OLAP queries can be modeled, constructed, manipulated, compared, and finally leveraged for personalization and recommendation. Logs are modeled as sets of analytical sessions, sessions being modeled as sequences of OLAP queries. Three main approaches are presented for modeling queries: as unevaluated collections of fragments (e.g., group by sets, sets of selection predicates, sets of measures), as sets of references obtained by partially evaluating the query over dimensions, or as query answers. Such logs can be constructed even from sets of SQL query expressions, by translating these expressions into a multidimensional algebra, and bridging the translations to detect analytical sessions. Logs can be searched, filtered, compared, combined, modified and summarized with a language inspired by the relational algebra and parametrized by binary relations over sessions. In particular, these relations can be specialization relations or based on similarity measures tailored for OLAP queries and analytical sessions.
Logs can be mined for various hidden knowledge that, depending on the query model used, accurately represents the user behavior extracted. This knowledge includes simple preferences, navigational habits and discoveries made during former explorations, and can be used in various query personalization or query recommendation approaches. Such approaches vary in terms of formulation effort, proactiveness, prescriptiveness and expressive power: query personalization, i.e., coping with a current query that returns too few or too many results, can use dedicated operators for expressing preferences, or be based on query expansion; query recommendation, i.e., suggesting queries to pursue an analytical session, can be based on information extracted from the current state of the database and the query, or be purely history based, i.e., leveraging the query log. While they can be immediately integrated into a complete architecture for User-Centric Query Answering in data warehouses, the models and approaches introduced in this dissertation can also be seen as a starting point for assessing the effectiveness of analytical sessions, with the ultimate goal to enhance the overall decision making process.
Ontological foundations for structural conceptual models
In this thesis, we aim at contributing to the theory of conceptual modeling and ontology representation. Our main objective here is to provide ontological foundations for the most fundamental concepts in conceptual modeling. These foundations comprise a number of ontological theories, which are built on established work on philosophical ontology, cognitive psychology, philosophy of language and linguistics. Together these theories amount to a system of categories and formal relations known as a foundational ontology