137 research outputs found
A comparison between algebraic query languages for flat and nested databases
Abstract: Recently, much attention has been paid to query languages for nested relations. In this paper, we consider the nested algebra and the powerset algebra, and compare them with each other as well as with the traditional flat algebra. We show that either nest or difference can be removed as a primitive operator in the powerset algebra. While the redundancy of the nest operator might have been expected, the same cannot be said of difference. Essentially, this result shows that a single nonmonotonic operator suffices in the powerset algebra. An interesting consequence is that the nested algebra without difference remains complete in the sense of Bancilhon and Paredaens. Finally, we show that there are both similarities and fundamental differences between the expressiveness of query languages for nested relations and that of their counterparts for flat relations.
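To make the nest and powerset operators concrete, here is a minimal sketch. It assumes positional attributes, relations stored as Python frozensets of tuples, and set-valued attributes stored as frozensets of tuples. The function names `nest`, `unnest`, and `powerset` are hypothetical illustrations, not the paper's formal definitions.

```python
from itertools import chain, combinations

# Relations are frozensets of tuples (attributes are positional).
# A set-valued (nested) attribute is itself a frozenset of tuples.

def nest(relation, keep):
    """Group tuples that agree on the attribute positions in `keep` and
    collect the remaining attributes of each group into one set-valued attribute."""
    groups = {}
    for t in relation:
        key = tuple(t[i] for i in keep)
        rest = tuple(v for i, v in enumerate(t) if i not in keep)
        groups.setdefault(key, set()).add(rest)
    return frozenset(key + (frozenset(vals),) for key, vals in groups.items())

def unnest(relation):
    """Flatten the last (set-valued) attribute back into ordinary tuples."""
    return frozenset(t[:-1] + inner for t in relation for inner in t[-1])

def powerset(relation):
    """Unary relation whose tuples are all subsets of `relation`."""
    items = list(relation)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))
    return frozenset((frozenset(s),) for s in subsets)

R = frozenset({(1, "a"), (1, "b"), (2, "a")})
N = nest(R, keep=[0])
# N holds (1, {("a",), ("b",)}) and (2, {("a",)})
```

Here `unnest(N)` gives back `R`, and `powerset(R)` has 2^3 = 8 tuples. The exponential blow-up of the powerset operator is what separates the powerset algebra from the nested algebra.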
Similarity and bisimilarity notions appropriate for characterizing indistinguishability in fragments of the calculus of relations
Motivated by applications in databases, this paper considers various
fragments of the calculus of binary relations. The fragments are obtained by
leaving out, or keeping in, some of the standard operators, along with some
derived operators such as set difference, projection, coprojection, and
residuation. For each considered fragment, a characterization is obtained for
when two given binary relational structures are indistinguishable by
expressions in that fragment. The characterizations are based on appropriately
adapted notions of simulation and bisimulation. Comment: 36 pages, Journal of Logic and Computation 201
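The characterizations above build on the classical notion of bisimulation. A minimal sketch follows. It computes the largest bisimulation between the nodes of two finite structures, each with a single binary relation, as a greatest fixpoint: it starts from all pairs and removes any pair that violates the "forth" or "back" condition. This is only the textbook notion. It is not one of the paper's fragment-specific adaptations (for example, for converse or difference). The function name is hypothetical.

```python
def largest_bisimulation(nodes_a, edges_a, nodes_b, edges_b):
    """Greatest fixpoint: pairs (a, b) such that every edge out of a is matched
    by an edge out of b into a related pair (forth), and vice versa (back)."""
    succ_a = {x: {y for (u, y) in edges_a if u == x} for x in nodes_a}
    succ_b = {x: {y for (u, y) in edges_b if u == x} for x in nodes_b}
    z = {(a, b) for a in nodes_a for b in nodes_b}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(z):  # iterate over a copy while shrinking z
            forth = all(any((a2, b2) in z for b2 in succ_b[b]) for a2 in succ_a[a])
            back = all(any((a2, b2) in z for a2 in succ_a[a]) for b2 in succ_b[b])
            if not (forth and back):
                z.discard((a, b))
                changed = True
    return z

# A is a single edge 0 -> 1; B is a path 0 -> 1 -> 2.
Z = largest_bisimulation({0, 1}, {(0, 1)}, {0, 1, 2}, {(0, 1), (1, 2)})
```

Node 0 of A is related to node 1 of B, but not to node 0 of B, because only B's root can take two steps.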
Structural characterizations of the navigational expressiveness of relation algebras on a tree
Given a document D in the form of an unordered node-labeled tree, we study
the expressiveness on D of various basic fragments of XPath, the core
navigational language on XML documents. Working from the perspective of these
languages as fragments of Tarski's relation algebra, we give characterizations,
in terms of the structure of D, for when a binary relation on its nodes is
definable by an expression in these algebras. Since each pair of nodes in such
a relation represents a unique path in D, our results capture the
sets of paths in D definable in each of the fragments. We refer to this
perspective on language semantics as the "global view." In contrast with this
global view, there is also a "local view" where one is interested in the nodes
to which one can navigate starting from a particular node in the document. In
this view, we characterize when a set of nodes in D can be defined as the
result of applying an expression to a given node of D. All these definability
results, both in the global and the local view, are obtained by using a robust
two-step methodology, which consists of first characterizing when two nodes
cannot be distinguished by an expression in the respective fragments of XPath,
and then bootstrapping these characterizations to the desired results. Comment: 58 pages
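The contrast between the global and local views can be illustrated on a tiny tree. This is a minimal sketch under stated assumptions. The tree, its labels, and the helpers `compose`, `converse`, `label_test`, and `local_view` are hypothetical. Expressions are modeled as relation-algebra terms over the child relation, not as XPath syntax. In the global view, an expression denotes a set of node pairs. In the local view, it denotes the nodes reachable from one fixed start node.

```python
# Hypothetical unordered node-labeled tree: node -> (label, children).
tree = {
    "r": ("book", ["a", "b"]),
    "a": ("chapter", ["c"]),
    "b": ("chapter", []),
    "c": ("title", []),
}
nodes = set(tree)
child = {(p, c) for p, (_, cs) in tree.items() for c in cs}

def converse(r):
    return {(y, x) for (x, y) in r}

def compose(r, s):
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def label_test(lab):
    # Identity relation restricted to nodes carrying label `lab`.
    return {(n, n) for n in nodes if tree[n][0] == lab}

def local_view(relation, start):
    # Nodes reached from `start` by following pairs of the relation.
    return {m for (n, m) in relation if n == start}

# Global view: converse(child) composed with child relates nodes sharing a parent.
siblings = compose(converse(child), child)
# Grandchildren labeled "title".
title_grandchild = compose(compose(child, child), label_test("title"))
```

Here `siblings` also contains each non-root node paired with itself, because nothing in this fragment removes the identity. Its local view at `"a"` is `{"a", "b"}`.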
Relative Expressive Power of Navigational Querying on Graphs
Motivated by both established and new applications, we study navigational
query languages for graphs (binary relations). The simplest language has only
the two operators union and composition, together with the identity relation.
We obtain more powerful languages by adding any of the following operators:
intersection; set difference; projection; coprojection; converse; and the
diversity relation. All these operators map binary relations to binary
relations. We compare the expressive power of all resulting languages. We do
this not only for general path queries (queries where the result may be any
binary relation) but also for boolean or yes/no queries (expressed by the
nonemptiness of an expression). For both cases, we present the complete Hasse
diagram of relative expressiveness. In particular the Hasse diagram for boolean
queries contains some nontrivial separations and a few surprising collapses. Comment: An extended abstract announcing the results of this paper was
presented at the 14th International Conference on Database Theory, Uppsala,
Sweden, March 201
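The operators listed above can be sketched directly on a finite graph. This minimal sketch assumes the usual definitions. Projection keeps, as identity pairs, the nodes that have an outgoing (first projection) or incoming (second projection) edge. Coprojection keeps the nodes that have none. Diversity is the set of all pairs of distinct nodes. The helper names are hypothetical, and a boolean query is modeled as "the result is nonempty".

```python
# A graph is a node set V and a binary relation R (a set of pairs over V).

def identity(V):
    return {(v, v) for v in V}

def diversity(V):
    return {(u, v) for u in V for v in V if u != v}

def converse(r):
    return {(y, x) for (x, y) in r}

def compose(r, s):
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def projection1(r):
    # Nodes with an outgoing r-edge, as identity pairs.
    return {(x, x) for (x, _) in r}

def projection2(r):
    # Nodes with an incoming r-edge, as identity pairs.
    return {(y, y) for (_, y) in r}

def coprojection1(V, r):
    # Nodes with no outgoing r-edge.
    return identity(V) - projection1(r)

def coprojection2(V, r):
    # Nodes with no incoming r-edge.
    return identity(V) - projection2(r)

def boolean_query(result):
    return len(result) > 0

# Union, intersection, and difference are Python's |, &, and - on sets.
V, R = {1, 2, 3}, {(1, 2), (2, 3)}
two_steps = compose(R, R)                                      # path query
has_dead_end = boolean_query(coprojection1(V, R))
has_opposite_edges = boolean_query(R & converse(R))
# Does some node have two distinct out-neighbours?
has_branching = boolean_query(compose(R, converse(R)) & diversity(V))
```

On this path graph, `two_steps` is `{(1, 3)}`. Node 3 is a dead end, and neither opposite edges nor branching occur.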
On the effectiveness and efficiency of computing bounds on the support of item-sets in the frequent item-sets mining problem
A paper submitted to: OSDM '05, Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, pages 46-55, Chicago, Illinois, August 21, 2005.
We study the relative effectiveness and efficiency of computing
support-bounding rules that can be used to prune the search
space in algorithms to solve the frequent item-sets mining problem
(FIM). We develop a formalism wherein these rules can be stated
and analyzed using the concept of differentials and density functions
of the support function. We derive a general bounding theorem,
which provides lower and upper bounds on the supports of
item-sets in terms of the supports of their subsets. Since, in general,
many lower and upper bounds exist for the support of an item-set,
we show how to find the best bounds. The result of this optimization
shows that the best bounds are among those that involve the supports
of all the strict subsets of an item-set of a particular size q.
These bounds are determined on the basis of so-called q-rules. In
this way, we derive the bounding theorem established by Calders
[5]. We then compare these types of bounds with one another and
thereby determine the best bounds. Since
determining these bounds is combinatorially expensive, we study
heuristics that efficiently produce bounds that are usually the best.
These heuristics always produce the best bounds on the support of
item-sets for basket databases that satisfy independence properties.
In particular, we show that for an item-set I, determining which
bounds to compute in order to obtain the best lower and upper bounds on
freq(I) takes O(|I|) time. Even though, in practice,
basket databases do not have these independence properties, we argue
that our analysis carries over to a much larger set of basket
databases where local “near” independence holds. Finally, we conduct
an experimental study using real basket databases, where we
compute upper bounds in the context of generalizing the Apriori algorithm.
Both the analysis and the study confirm that the q-rule (q
odd and larger than 1) almost always does better than the 1-rule
(Apriori rule) on large, dense basket databases. Our experiments reveal that on these basket databases, the 3-rule prunes almost 100%
of the search space, while the 1-rule prunes 96% of the search space
in the early stages of the algorithm. We also observe a reduction in
wasted effort when applying the 3-rule to sparse basket databases.
In addition, we give experimental evidence that the combined use
of the lower and upper bounds determines the exact support of many
frequent item-sets without counting.
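For readers who want to experiment, here is a minimal sketch of support bounds derived from the supports of subsets. It uses the inclusion-exclusion deduction rules commonly attributed to Calders. We assume, but do not confirm, that these correspond to the bounding theorem cited as [5]. For each proper subset X of I, let s be the sum over all J with X ⊆ J ⊊ I of (-1)^(|I \ J| + 1) · supp(J). If |I \ X| is odd, s is an upper bound on supp(I). If it is even, s is a lower bound. With |I \ X| = 1 this reduces to the Apriori bound supp(I) ≤ supp(X). The grouping of rules by |I \ X| is not claimed to match the paper's q-rules. The function names and the toy database are hypothetical.

```python
import math
from itertools import combinations

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def deduction_bounds(itemset, transactions):
    """(lower, upper) bounds on supp(itemset) computed only from supports of
    its proper subsets; `itemset` is assumed to be nonempty."""
    itemset = frozenset(itemset)
    items = sorted(itemset)
    lower, upper = 0, math.inf
    for k in range(len(items)):                  # X ranges over proper subsets of I
        for xs in combinations(items, k):
            x = frozenset(xs)
            rest = sorted(itemset - x)           # I \ X
            s = 0
            for m in range(len(rest)):           # J = X ∪ Y, Y a proper subset of I \ X
                for ys in combinations(rest, m):
                    j = x | frozenset(ys)
                    sign = (-1) ** (len(itemset - j) + 1)
                    s += sign * support(j, transactions)
            if len(rest) % 2 == 1:               # odd depth: upper bound
                upper = min(upper, s)
            else:                                # even depth: lower bound
                lower = max(lower, s)
    return lower, upper

T = [frozenset(t) for t in ["abc", "ab", "ac", "bc", "a", "b"]]
```

On this toy database, `deduction_bounds("ab", T)` gives `(2, 4)`. For `"abc"`, the bounds coincide at `(1, 1)`, so its support is determined without counting. In an Apriori-style search, a candidate whose upper bound falls below the minimum support could be pruned without a database scan.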
On the Decidability of Semilinearity for Semialgebraic Sets and Its Implications for Spatial Databases
Abstract: Several authors have suggested using first-order logic over the real numbers to describe spatial database applications. Geometric objects are then described by polynomial inequalities with integer coefficients involving the coordinates of the objects. Such geometric objects are called semialgebraic sets. Similarly, queries are expressed by polynomial inequalities. The resulting query language is usually referred to as FO+poly. From a practical point of view, it has been argued that a linear restriction of this so-called polynomial model is more desirable. In the so-called linear model, geometric objects are described by linear inequalities and are called semilinear sets. The language of queries expressible by linear inequalities is usually referred to as FO+linear. As part of a general study of the feasibility of the linear model, we show in this paper that semilinearity is decidable for semialgebraic sets. In doing so, we point out important subtleties related to the type of the coefficients in the linear inequalities used to describe semilinear sets. An important concept in the development of the paper is regular stratification. We point out its geometric significance, as well as its significance in the context of FO+linear and FO+poly computations. The decidability of semilinearity of semialgebraic sets has an important consequence. It has been shown that it is undecidable whether a query expressible in FO+poly is linear, i.e., whether it maps spatial databases of the linear model to spatial databases of the linear model. It now follows that, despite this negative result, there exists a syntactically definable language that expresses exactly the linear queries expressible in FO+poly.
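A small example helps explain why semilinearity is a semantic question and not a syntactic one. This minimal sketch assumes a semialgebraic set is given as a finite union of conjunctions of sign conditions on integer-coefficient polynomials. The representation and helper names are hypothetical. A degree check on the description is sufficient for semilinearity but not necessary: x² ≥ 0 describes the whole plane, which is semilinear. The paper's decision procedure, based on regular stratification, is not reproduced here.

```python
from math import prod

# A polynomial: dict mapping exponent tuples to integer coefficients,
# e.g. {(2, 0): 1, (0, 2): 1, (0, 0): -1} stands for x^2 + y^2 - 1.
# A constraint (poly, op) means "poly op 0", with op in {">=", ">", "=="}.
# A set is a list of conjunctions (lists of constraints), read as their union.

def evaluate(poly, point):
    return sum(c * prod(x ** e for x, e in zip(point, exps))
               for exps, c in poly.items())

def holds(constraint, point):
    poly, op = constraint
    v = evaluate(poly, point)
    return {">=": v >= 0, ">": v > 0, "==": v == 0}[op]

def contains(dnf, point):
    return any(all(holds(c, point) for c in conj) for conj in dnf)

def is_syntactically_linear(dnf):
    # True if every monomial in the description has total degree <= 1.
    return all(sum(exps) <= 1
               for conj in dnf for poly, _ in conj for exps in poly)

disk = [[({(0, 0): 1, (2, 0): -1, (0, 2): -1}, ">=")]]   # 1 - x^2 - y^2 >= 0
halfplane = [[({(1, 0): 1, (0, 1): 1}, ">=")]]           # x + y >= 0
plane = [[({(2, 0): 1}, ">=")]]                          # x^2 >= 0, i.e. all of R^2
```

The description of `plane` fails the syntactic check even though the set it denotes is semilinear. The disk, by contrast, is genuinely not semilinear, and no rewriting of its description can pass the check.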
- …