
    A comparison between algebraic query languages for flat and nested databases

    Recently, much attention has been paid to query languages for nested relations. In the present paper, we consider the nested algebra and the powerset algebra, and compare them both with each other and with the traditional flat algebra. We show that either nest or difference can be removed as a primitive operator in the powerset algebra. While the redundancy of the nest operator might have been expected, the same cannot be said of the difference. Basically, this result shows that the presence of one nonmonotonic operator suffices in the powerset algebra. As an interesting consequence of this result, the nested algebra without the difference remains complete in the sense of Bancilhon and Paredaens. Finally, we show there are both similarities and fundamental differences between the expressiveness of query languages for nested relations and that of their counterparts for flat relations.
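To make the operators named in this abstract concrete, here is a minimal Python sketch of nest, unnest, and powerset. It is an illustration only, not the paper's formalization: relations are assumed to be sets of positional tuples, nested values are modeled as `frozenset`s, and the function names and the index parameter `i` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def nest(relation, i):
    """Group tuples that agree everywhere except position i,
    collecting the values at position i into one frozenset."""
    groups = defaultdict(set)
    for t in relation:
        rest = t[:i] + t[i + 1:]          # the tuple without position i
        groups[rest].add(t[i])
    return {rest[:i] + (frozenset(vals),) + rest[i:]
            for rest, vals in groups.items()}


def unnest(relation, i):
    """Flatten the set-valued position i back into one tuple per element."""
    return {t[:i] + (v,) + t[i + 1:] for t in relation for v in t[i]}


def powerset(relation):
    """Return every subset of the relation, each as a frozenset of tuples."""
    items = list(relation)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}


# Illustrative data (assumed, not from the paper).
flat = {("a", 1), ("a", 2), ("b", 1)}
nested = nest(flat, 1)
```

In this sketch `unnest(nest(flat, 1), 1)` returns the original flat relation, while `powerset` grows exponentially in the size of its input.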

    Similarity and bisimilarity notions appropriate for characterizing indistinguishability in fragments of the calculus of relations

    Motivated by applications in databases, this paper considers various fragments of the calculus of binary relations. The fragments are obtained by leaving out, or keeping in, some of the standard operators, along with some derived operators such as set difference, projection, coprojection, and residuation. For each considered fragment, a characterization is obtained for when two given binary relational structures are indistinguishable by expressions in that fragment. The characterizations are based on appropriately adapted notions of simulation and bisimulation. Comment: 36 pages, Journal of Logic and Computation 201

    Structural characterizations of the navigational expressiveness of relation algebras on a tree

    Given a document D in the form of an unordered node-labeled tree, we study the expressiveness on D of various basic fragments of XPath, the core navigational language on XML documents. Working from the perspective of these languages as fragments of Tarski's relation algebra, we give characterizations, in terms of the structure of D, for when a binary relation on its nodes is definable by an expression in these algebras. Since each pair of nodes in such a relation represents a unique path in D, our results therefore capture the sets of paths in D definable in each of the fragments. We refer to this perspective on language semantics as the "global view." In contrast with this global view, there is also a "local view" where one is interested in the nodes to which one can navigate starting from a particular node in the document. In this view, we characterize when a set of nodes in D can be defined as the result of applying an expression to a given node of D. All these definability results, both in the global and the local view, are obtained by using a robust two-step methodology, which consists of first characterizing when two nodes cannot be distinguished by an expression in the respective fragments of XPath, and then bootstrapping these characterizations to the desired results. Comment: 58 Page

    Relative Expressive Power of Navigational Querying on Graphs

    Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; coprojection; converse; and the diversity relation. All these operators map binary relations to binary relations. We compare the expressive power of all resulting languages. We do this not only for general path queries (queries where the result may be any binary relation) but also for boolean or yes/no queries (expressed by the nonemptiness of an expression). For both cases, we present the complete Hasse diagram of relative expressiveness. In particular, the Hasse diagram for boolean queries contains some nontrivial separations and a few surprising collapses. Comment: An extended abstract announcing the results of this paper was presented at the 14th International Conference on Database Theory, Uppsala, Sweden, March 201
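The operators listed in this abstract can be sketched over finite graphs as follows. This is a minimal illustration under stated assumptions: a binary relation is a Python set of pairs, projection and coprojection are taken on the first component only (the paper may define variants on both components), and all function names are hypothetical. Union, intersection, and set difference are simply Python's `|`, `&`, and `-` on sets.

```python
def identity(nodes):
    """Identity relation: every node paired with itself."""
    return {(v, v) for v in nodes}


def diversity(nodes):
    """Diversity relation: every pair of distinct nodes."""
    return {(u, v) for u in nodes for v in nodes if u != v}


def compose(r, s):
    """Composition: (a, d) whenever a -r-> b and b -s-> d for some b."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def converse(r):
    """Converse: reverse every edge."""
    return {(b, a) for (a, b) in r}


def projection(r):
    """First projection (assumed variant): nodes with an outgoing r-edge."""
    return {(a, a) for (a, _) in r}


def coprojection(r, nodes):
    """First coprojection (assumed variant): nodes with no outgoing r-edge."""
    return identity(nodes) - projection(r)


# Illustrative graph (assumed): a path 1 -> 2 -> 3.
nodes = {1, 2, 3}
edges = {(1, 2), (2, 3)}

two_step = compose(edges, edges)           # path query: pairs two edges apart
has_three_step = bool(compose(two_step, edges))  # boolean query via nonemptiness
```

Here a path query returns a relation such as `two_step`, whereas a boolean query only asks whether the resulting relation is nonempty, as in `has_three_step`.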

    On the effectiveness and efficiency of computing bounds on the support of item-sets in the frequent item-sets mining problem

    A paper submitted to OSDM '05: Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, pages 46-55, Chicago, Illinois, August 21, 2005. We study the relative effectiveness and the efficiency of computing support-bounding rules that can be used to prune the search space in algorithms to solve the frequent item-sets mining problem (FIM). We develop a formalism wherein these rules can be stated and analyzed using the concept of differentials and density functions of the support function. We derive a general bounding theorem, which provides lower and upper bounds on the supports of item-sets in terms of the supports of their subsets. Since, in general, many lower and upper bounds exist for the support of an item-set, we show how to find the best bounds. The result of this optimization shows that the best bounds are among those that involve the supports of all the strict subsets of an item-set of a particular size q. These bounds are determined on the basis of so-called q-rules. In this way, we derive the bounding theorem established by Calders [5]. For these types of bounds, we consider how they compare relative to each other, and in so doing determine the best bounds. Since determining these bounds is combinatorially expensive, we study heuristics that efficiently produce bounds that are usually the best. These heuristics always produce the best bounds on the support of item-sets for basket databases that satisfy independence properties. In particular, we show that for an item-set I, determining which bounds to compute that lead to the best lower and upper bounds on freq(I) can be done in time O(|I|). Even though, in practice, basket databases do not have these independence properties, we argue that our analysis carries over to a much larger set of basket databases where local "near" independence holds.
    Finally, we conduct an experimental study using real basket databases, where we compute upper bounds in the context of generalizing the Apriori algorithm. Both the analysis and the study confirm that the q-rule (q odd and larger than 1) will almost always do better than the 1-rule (Apriori rule) on large dense basket databases. Our experiments reveal that on these basket databases, the 3-rule prunes almost 100% of the search space, while the 1-rule prunes 96% of the search space in the early stages of the algorithm. We also observe a reduction in wasted effort when applying the 3-rule to sparse basket databases. In addition, we give experimental evidence that the combined use of the lower and upper bounds determines the exact support of many frequent item-sets without counting.
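The idea of bounding an item-set's support from the supports of its strict subsets can be sketched with the standard inclusion-exclusion bounds. This is an illustration, not the paper's q-rule formalism: for each X ⊆ I, the alternating sum over all J with X ⊆ J ⊊ I gives an upper bound when |I \ X| is odd and a lower bound when it is even. Treating |I \ X| as loosely analogous to the abstract's q is an assumption, and the function names and toy database are hypothetical.

```python
from itertools import combinations


def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def bounds(db, itemset):
    """Best lower and upper bounds on support(itemset), computed only
    from the supports of strict subsets of `itemset`."""
    itemset = frozenset(itemset)
    lower, upper = 0, len(db)             # trivial bounds
    items = sorted(itemset)
    for k in range(len(items)):           # X ranges over strict subsets of I
        for x in combinations(items, k):
            x = frozenset(x)
            rest = sorted(itemset - x)
            total = 0
            # J = X ∪ Y for every strict subset Y of I \ X
            for m in range(len(rest)):
                for y in combinations(rest, m):
                    j = x | frozenset(y)
                    sign = (-1) ** (len(itemset) - len(j) + 1)
                    total += sign * support(db, j)
            if len(rest) % 2 == 1:
                upper = min(upper, total)  # odd depth: upper bound
            else:
                lower = max(lower, total)  # even depth: lower bound
    return lower, upper


# Toy basket database (assumed for illustration).
db = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
      frozenset("bc"), frozenset("abc")]
lo, hi = bounds(db, "abc")
```

With |I \ X| = 1 the sum reduces to the support of a single subset one item smaller, which is the Apriori-style monotonicity bound. On this toy database the lower and upper bounds for `abc` coincide, so its support is determined without counting it directly.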

    On the Decidability of Semilinearity for Semialgebraic Sets and Its Implications for Spatial Databases

    Several authors have suggested using first-order logic over the real numbers to describe spatial database applications. Geometric objects are then described by polynomial inequalities with integer coefficients involving the coordinates of the objects. Such geometric objects are called semialgebraic sets. Similarly, queries are expressed by polynomial inequalities. The query language thus obtained is usually referred to as FO+poly. From a practical point of view, it has been argued that a linear restriction of this so-called polynomial model is more desirable. In the so-called linear model, geometric objects are described by linear inequalities and are called semilinear sets. The language of the queries expressible by linear inequalities is usually referred to as FO+linear. As part of a general study of the feasibility of the linear model, we show in this paper that semilinearity is decidable for semialgebraic sets. In doing so, we point out important subtleties related to the type of the coefficients in the linear inequalities used to describe semilinear sets. An important concept in the development of the paper is regular stratification. We point out its geometric significance, as well as its significance in the context of FO+linear and FO+poly computations. The decidability of semilinearity of semialgebraic sets has an important consequence. It has been shown that it is undecidable whether a query expressible in FO+poly is linear, i.e., maps spatial databases of the linear model into spatial databases of the linear model. It follows now that, despite this negative result, there exists a syntactically definable language precisely expressing the linear queries expressible in FO+poly.
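As a small illustration of the two classes (the sets D and S are chosen here for exposition and do not come from the paper), compare the closed unit disk with the unit square:

```latex
\[
  D = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\},
  \qquad
  S = \{(x, y) \in \mathbb{R}^2 : 0 \le x \le 1 \,\wedge\, 0 \le y \le 1\}.
\]
```

D is defined by a polynomial inequality with integer coefficients, so it is semialgebraic, but its curved boundary cannot be described by finitely many linear inequalities, so it is not semilinear. S is defined by linear inequalities only, so it is semilinear and therefore also semialgebraic.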