522 research outputs found

    Fundamentals and applications of order dependencies

    Business-intelligence queries often involve SQL functions and algebraic expressions. There can be clear semantic relationships between a column's values and the values of a function over that column. A common property is monotonicity: as the column's values ascend, so do the function's values (or the other column's values). This we call an order dependency (OD). Queries can be evaluated more efficiently when the query optimizer uses order dependencies. They can be run even faster when the optimizer can also reason over known ODs to infer new ones. Order dependencies can be declared as integrity constraints, and they can be detected automatically for many types of SQL functions and algebraic expressions. We present optimization techniques using ODs for queries that involve join, order by, group by, partition by, and distinct. Essentially, ODs can further exploit interesting orders to eliminate or simplify potentially expensive sorts in the query plan. We evaluate these techniques over our prototype implementation in IBM® DB2® using the TPC-DS® benchmark schema and some customer-inspired queries. Our experimental results demonstrate a significant performance gain. Dependencies have played an important role in database theory. We study the theoretical aspects of order dependencies, and of unidirectional order dependencies (UODs), a proper sub-class of ODs, which describe the relationships among lexicographical orderings of sets of tuples. We investigate the inference problem for order dependencies.
We establish the following: (i) a sound and complete axiomatization for UODs which is sound for ODs; (ii) a hierarchy of order dependency classes; (iii) a proof of co-NP-completeness of the inference problem for ODs and for the subclass of UODs; (iv) a proof of co-NP-completeness of the inference of functional dependencies (FDs) from ODs in general, together with a demonstration of linear time complexity for the inference of FDs from UODs; (v) a sound and complete elimination procedure for testing logical implication over ODs; and (vi) a sound and complete polynomial inference algorithm for sets of UODs over natural domains.
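As an illustration of the monotonicity property described in this abstract, the following minimal Python sketch checks whether an order dependency holds on a small in-memory table. It assumes the usual reading of an OD (whenever one row precedes or ties another under the lexicographic order of the left-hand columns, it also precedes or ties it under the right-hand columns). The function name `od_holds`, the column names, and the sample rows are hypothetical and do not come from the paper or its DB2 prototype.

```python
from datetime import date
from itertools import permutations


def od_holds(rows, lhs, rhs):
    """Return True if ordering `rows` by the `lhs` columns also orders them by `rhs`.

    Brute-force pairwise check, meant only to illustrate the definition;
    it is quadratic in the number of rows.
    """
    def key(row, cols):
        # Tuples compare lexicographically, matching a multi-column ORDER BY.
        return tuple(row[c] for c in cols)

    return all(
        key(s, rhs) <= key(t, rhs)
        for s, t in permutations(rows, 2)
        if key(s, lhs) <= key(t, lhs)
    )


# Hypothetical sample data: sale_year is a monotone function of sale_date.
rows = [
    {"sale_date": date(2019, 12, 31), "sale_year": 2019},
    {"sale_date": date(2020, 1, 15), "sale_year": 2020},
    {"sale_date": date(2020, 3, 2), "sale_year": 2020},
]

date_orders_year = od_holds(rows, ["sale_date"], ["sale_year"])
year_orders_date = od_holds(rows, ["sale_year"], ["sale_date"])
```

On this sample, `date_orders_year` is true and `year_orders_date` is false, because two rows share a year but differ in date. An optimizer that knows the first dependency could, in principle, reuse an existing order on `sale_date` to satisfy an `ORDER BY sale_year` without an extra sort.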

    A Call to Arms: Revisiting Database Design

    Good database design is crucial to obtain a sound, consistent database, and - in turn - good database design methodologies are the best way to achieve the right design. These methodologies are taught to most Computer Science undergraduates, as part of any Introduction to Database class. They can be considered part of the "canon", and indeed, the overall approach to database design has been unchanged for years. Moreover, none of the major database research assessments identify database design as a strategic research direction. Should we conclude that database design is a solved problem? Our thesis is that database design remains a critical unsolved problem. Hence, it should be the subject of more research. Our starting point is the observation that traditional database design is not used in practice - and if it were used it would result in designs that are not well adapted to current environments. In short, database design has failed to keep up with the times. In this paper, we put forth arguments to support our viewpoint, analyze the root causes of this situation and suggest some avenues of research.

    A systems thinking approach to business intelligence solutions based on cloud computing

    Thesis (S.M. in System Design and Management)--Massachusetts Institute of Technology, Engineering Systems Division, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-74). Business intelligence is the set of tools, processes, practices and people used to take advantage of information to support decision making in organizations. Cloud computing is a new paradigm for offering computing resources that work on demand, are scalable and are charged according to the time they are used. Organizations can save large amounts of money and effort using this approach. This document identifies the main challenges companies encounter while working on business intelligence applications in the cloud, such as security, availability, performance, integration, regulatory issues, and constraints on network bandwidth. All these challenges are addressed with a systems thinking approach, and several solutions are offered that can be applied according to the organization's needs. An evaluation of the main vendors of cloud computing technology is presented, so that business intelligence developers can identify the available tools and the companies they can depend on to migrate or build applications in the cloud. It is demonstrated how business intelligence applications can increase their availability with a cloud computing approach, by decreasing the mean time to recovery (handled by the cloud service provider) and increasing the mean time to failure (achieved by introducing more hardware redundancy). Innovative mechanisms for improving cloud applications are discussed, such as private, public and hybrid clouds, column-oriented databases, in-memory databases and the Data Warehouse 2.0 architecture. Finally, it is shown how the project management for a business intelligence application can be facilitated with a cloud computing approach.
Design structure matrices are dramatically simplified by avoiding unnecessary iterations while sizing, validating, and testing hardware and software resources. By Eumir P. Reyes. S.M. in System Design and Management.
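The availability argument in this abstract (a lower mean time to recovery and a higher mean time to failure both raise availability) can be made concrete with the standard steady-state formula, availability = MTTF / (MTTF + MTTR). The sketch below is an illustration only: the function name and the figures are hypothetical and are not taken from the thesis.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the fraction of time a system is up.

    Uses the standard formula MTTF / (MTTF + MTTR); both arguments are
    in the same unit (hours here).
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures, chosen only to show the direction of each effect.
baseline = availability(mttf_hours=1000, mttr_hours=10)
faster_recovery = availability(mttf_hours=1000, mttr_hours=1)   # lower MTTR
more_redundancy = availability(mttf_hours=2000, mttr_hours=10)  # higher MTTF
```

Both `faster_recovery` and `more_redundancy` come out higher than `baseline`, which matches the two levers the abstract names.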

    Adaptive query parallelization in multi-core column stores

    With the rise of multi-core CPU platforms, their optimal utilization for in-memory OLAP workloads using column store databases has become one of the biggest challenges. Some of the inherent limitations in the achievable query parallelism are due to the dependence of the degree of parallelism on data skew, the overheads incurred by thread coordination, and the hardware resource limits. Finding the right balance between the degree of parallelism and the multi-core utilization …