
    Order dependency in the relational model

    The relational model is formally extended to include fixed orderings on attribute domains. A new constraint, called order dependency, is then introduced to incorporate semantic information involving these orderings. It is shown that this constraint can be applied to enhance the efficiency of an implemented database. The thrust of the paper is to study logical implication for order dependency. The main theoretical results consist in (i) introducing a formalism analogous to propositional calculus for analyzing order dependency, (ii) exhibiting a sound and complete set of inference rules for order dependency, and (iii) demonstrating that determining logical implication for order dependency is co-NP-complete. It is also shown that there are sets of order dependencies for which no Armstrong relations exist.

    Fundamentals and applications of order dependencies

    Business-intelligence queries often involve SQL functions and algebraic expressions. There can be clear semantic relationships between a column's values and the values of a function over that column. A common property is monotonicity: as the column's values ascend, so do the function's values (or the other column's values). This we call an order dependency (OD). Queries can be evaluated more efficiently when the query optimizer uses order dependencies. They can be run even faster when the optimizer can also reason over known ODs to infer new ones. Order dependencies can be declared as integrity constraints, and they can be detected automatically for many types of SQL functions and algebraic expressions. We present optimization techniques using ODs for queries that involve join, order by, group by, partition by, and distinct. Essentially, ODs can further exploit interesting orders to eliminate or simplify potentially expensive sorts in the query plan. We evaluate these techniques over our prototype implementation in IBM® DB2® using the TPC-DS® benchmark schema and some customer-inspired queries. Our experimental results demonstrate a significant performance gain. Dependencies have played an important role in database theory. We study the theoretical aspects of order dependencies, and of unidirectional order dependencies (UODs), a proper sub-class of ODs, which describe the relationships among lexicographical orderings of sets of tuples. We investigate the inference problem for order dependencies.
We establish the following: (i) a sound and complete axiomatization for UODs which is sound for ODs; (ii) a hierarchy of order dependency classes; (iii) a proof of co-NP-completeness of the inference problem for ODs and for the subclass of UODs; (iv) a proof of co-NP-completeness of the inference problem of functional dependencies (FDs) from ODs in general, together with a demonstration of linear time complexity for the inference of FDs from UODs; (v) a sound and complete elimination procedure for testing logical implication over ODs; and (vi) a sound and complete polynomial inference algorithm for sets of UODs over natural domains.
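
To make the notion of an order dependency concrete, here is a minimal brute-force sketch in Python. It assumes the lexicographic reading given in the abstract: a list of attributes X orders a list Y if, for every pair of tuples s and t, s preceding or equalling t on X implies the same on Y. The function name `holds_od`, the sample rows, and the column names are hypothetical illustrations, not taken from the paper or from any DB2 interface.

```python
from itertools import permutations


def holds_od(rows, lhs, rhs):
    """Return True if the attribute list `lhs` orders the attribute list `rhs`.

    Brute-force check over all ordered pairs of rows (quadratic time):
    whenever s <= t lexicographically on lhs, s <= t must also hold on rhs.
    """
    def key(row, attrs):
        # Project a row onto a list of attributes, giving a tuple that
        # Python compares lexicographically.
        return tuple(row[a] for a in attrs)

    return all(
        key(s, rhs) <= key(t, rhs)
        for s, t in permutations(rows, 2)
        if key(s, lhs) <= key(t, lhs)
    )


# Hypothetical sample data: ISO date strings sort chronologically.
rows = [
    {"sale_date": "2020-01-15", "year": 2020, "amount": 30},
    {"sale_date": "2020-06-01", "year": 2020, "amount": 10},
    {"sale_date": "2021-03-09", "year": 2021, "amount": 20},
]

date_orders_year = holds_od(rows, ["sale_date"], ["year"])
year_orders_date = holds_od(rows, ["year"], ["sale_date"])
date_orders_amount = holds_od(rows, ["sale_date"], ["amount"])
```

On this sample the date orders the year, so a result already sorted by `sale_date` is also sorted by `year`, which is the kind of sort an optimizer could skip. The reverse fails because two rows share a year but differ in date.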

    Discovering Domain Orders through Order Dependencies

    Most real-world data come with explicitly defined domain orders; e.g., lexicographic order for strings, numeric for integers, and chronological for time. Our goal is to discover implicit domain orders that we do not already know; for instance, that the order of months in the Chinese Lunar calendar is Corner < Apricot < Peach. To do so, we enhance data profiling methods by discovering implicit domain orders in data through order dependencies. We enumerate tractable special cases and proceed towards the most general case, which we prove is NP-complete. We then consider discovering approximate implicit orders; i.e., those that exist with some exceptions. We propose definitions of approximate implicit orders and prove that all non-trivial cases are NP-complete. We show that the NP-complete cases nevertheless can be effectively handled by a SAT solver. We also devise an interestingness measure to rank the discovered implicit domain orders. Based on an extensive suite of experiments with real-world data, we establish the efficacy of our algorithms, and the utility of the domain orders discovered by demonstrating significant added value in two applications (data profiling and data mining)
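
As a rough illustration of what recovering an implicit domain order can look like in one easy case, the sketch below assumes a column with a known numeric order paired with a column of labels whose order is unknown. If each label occupies its own non-overlapping range of the known column, the ranges induce an order on the labels. This is a simplified special case written for illustration only; it is not the paper's algorithm, and the function name `implicit_order` and the sample numbers are hypothetical.

```python
def implicit_order(pairs):
    """Derive an order over labels from (known_value, label) pairs.

    Returns the labels sorted by the range of known values they cover,
    or None if two labels cover overlapping ranges (no consistent order).
    """
    spans = {}
    for known, label in pairs:
        lo, hi = spans.get(label, (known, known))
        spans[label] = (min(lo, known), max(hi, known))

    # Sort labels by the smallest known value seen with each label.
    ordered = sorted(spans.items(), key=lambda item: item[1][0])

    # Adjacent ranges must not touch or overlap.
    for (_, (_, prev_hi)), (_, (next_lo, _)) in zip(ordered, ordered[1:]):
        if prev_hi >= next_lo:
            return None
    return [label for label, _ in ordered]


# Hypothetical sequence numbers paired with the month names from the abstract.
observations = [
    (3, "Apricot"), (1, "Corner"), (5, "Peach"), (2, "Corner"), (4, "Apricot"),
]
months = implicit_order(observations)

# Overlapping ranges: "a" spans 1..3 and "b" sits at 2, so no order exists.
conflict = implicit_order([(1, "a"), (3, "a"), (2, "b")])
```

The general and approximate cases described in the abstract are NP-complete and are handled there with a SAT solver, which this sketch does not attempt.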

    The Completeness Problem of Ordered Relational Databases

    Support of order in query processing is a crucial component in relational database systems, not only because the output of a query is often required to be sorted in a specific order, but also because employing order properties can significantly reduce the query execution cost. Therefore, finding an effective approach to answer queries over ordered data is important to the efficiency of query processing in relational databases. In this dissertation, an ordered relational database model is proposed, which captures both data tuples of relations and tuple ordering in relations. Based on this conceptual model, ordered relational queries are formally defined in a two-sorted first-order calculus, which serves as a yardstick to evaluate the expressive power of other ordered query representations. The primary purpose of this dissertation is to investigate the expressive power of different ordered query representations. Particularly, the completeness problem of ordered relational algebras is studied with respect to the first-order calculus: does there exist an ordered algebra such that any first-order expressible ordered relational query can be expressed by a finite sequence of ordered operations? The significance of studying the completeness problem of ordered relational algebras is that the completeness of ordered relational algebras leads to the possibility of implementing a finite set of ordered operators to express all first-order expressible ordered queries in relational databases. The dissertation then focuses on the completeness problem of ordered conjunctive queries. This investigation is performed in an incremental manner: first, the ordered conjunctive queries with data-decided order are considered; then, the ordered conjunctive queries with t-decided order are studied; finally, the completeness problem for the general ordered conjunctive queries is explored.
The completeness theorem of ordered algebras is proven for all three classes of ordered conjunctive queries. Although this ordered relational database model is only conceptual, and ordered operators are not implemented in this dissertation, we do prove that a complete set of ordered operators exists to retrieve all first-order expressible ordered queries in the three classes of ordered conjunctive queries. This research sheds light on the possibility of implementing a complete set of ordered operators in relational databases to solve the performance problem of order-relevant queries.

    Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

    Data analysts today have at their disposal a seemingly endless supply of data and repositories, and hence datasets, from which to draw. New datasets become available daily, making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication for indexed collections, forcing analysts to choose one value for each of the available attributes for an item in the collection. Often analysts discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. This deconfliction is the source of a considerable amount of guesswork and speculation on the part of the analyst in the absence of professional intuition. One must consider what is lost by discarding those alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity or are the alternate values erroneous? If so, which values are erroneous? Is there a historical significance of the variances? The analysis of modern datasets requires the use of specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve the attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas.
This paper presents technologies and innovations that assist data analysts in discovering meaning within their data and preserving all of the original data for every entity in the RDBMS.