Fundamentals and applications of order dependencies
Business-intelligence queries often involve SQL functions and algebraic expressions. There can be clear semantic relationships between a column's values and the values of a function over that column. A common property is monotonicity: as the column's values ascend, so do the function's values (or the other column's values). This we call an order dependency (OD). Queries can be evaluated more efficiently when the query optimizer uses order dependencies. They can be run even faster when the optimizer can also reason over known ODs to infer new ones.
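To make the monotonicity property concrete, the following is a minimal sketch of a brute-force check for whether one list of columns orders another over a small in-memory table. It assumes the lexicographic reading of an order dependency (whenever one row precedes or ties with another on the left-hand columns, it also does so on the right-hand columns). The function name `holds_od`, the sample `sales` rows, and the column names are hypothetical illustrations, not part of the paper or of any database system.

```python
from itertools import permutations


def holds_od(rows, lhs, rhs):
    """Return True if ordering `rows` by columns `lhs` also orders them by `rhs`.

    Hypothetical helper: a quadratic pairwise check, meant only to
    illustrate the definition, not an optimizer-grade algorithm.
    """
    def project(row, cols):
        # Python compares tuples lexicographically, matching the assumed semantics.
        return tuple(row[c] for c in cols)

    for s, t in permutations(rows, 2):
        if project(s, lhs) <= project(t, lhs) and not project(s, rhs) <= project(t, rhs):
            return False
    return True


# Made-up sample data: `year` is a function of `sale_date`, `amount` is not.
sales = [
    {"sale_date": "2010-01-15", "year": 2010, "amount": 40},
    {"sale_date": "2010-06-30", "year": 2010, "amount": 10},
    {"sale_date": "2011-02-01", "year": 2011, "amount": 25},
]

date_orders_year = holds_od(sales, ["sale_date"], ["year"])
date_orders_amount = holds_od(sales, ["sale_date"], ["amount"])
```

On this sample, `sale_date` orders `year` but does not order `amount`. A result already sorted by `sale_date` could therefore answer an `order by year` without a further sort, which is the kind of saving the abstract describes.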
Order dependencies can be declared as integrity constraints, and they can be detected automatically for many types of SQL functions and algebraic expressions. We present optimization techniques using ODs for queries that involve join, order by, group by, partition by, and distinct. Essentially, ODs can further exploit interesting orders to eliminate or simplify potentially expensive sorts in the query plan. We evaluate these techniques over our prototype implementation in IBM® DB2® using the TPC-DS® benchmark schema and some customer-inspired queries. Our experimental results demonstrate a significant performance gain.
Dependencies have played an important role in database theory. We study the theoretical aspects of order dependencies, and of unidirectional order dependencies (UODs), a proper subclass of ODs, which describe the relationships among lexicographical orderings of sets of tuples. We investigate the inference problem for order dependencies. We establish the following: (i) a sound and complete axiomatization for UODs which is sound for ODs; (ii) a hierarchy of order dependency classes; (iii) a proof of co-NP-completeness of the inference problem for ODs and for the subclass of UODs; (iv) a proof of co-NP-completeness of the inference problem of functional dependencies (FDs) from ODs in general, together with a demonstration of linear time complexity for the inference of FDs from UODs; (v) a sound and complete elimination procedure for testing logical implication over ODs; and (vi) a sound and complete polynomial inference algorithm for sets of UODs over natural domains.
A Call to Arms: Revisiting Database Design
Good database design is crucial to obtain a sound, consistent database, and -
in turn - good database design methodologies are the best way to achieve the
right design. These methodologies are taught to most Computer Science
undergraduates, as part of any Introduction to Database class. They can be
considered part of the "canon", and indeed, the overall approach to database
design has been unchanged for years. Moreover, none of the major database
research assessments identify database design as a strategic research
direction.
Should we conclude that database design is a solved problem?
Our thesis is that database design remains a critical unsolved problem.
Hence, it should be the subject of more research. Our starting point is the
observation that traditional database design is not used in practice - and if
it were used it would result in designs that are not well adapted to current
environments. In short, database design has failed to keep up with the times.
In this paper, we put forth arguments to support our viewpoint, analyze the
root causes of this situation and suggest some avenues of research.
Integration with Ontologies
One of today’s hottest IT topics is integration, because bringing together information from different sources and structures remains only partly solved. The approach outlined here illustrates how ontologies [Gr93] could help support the integration process.
A systems thinking approach to business intelligence solutions based on cloud computing
Thesis (S.M. in System Design and Management)--Massachusetts Institute of Technology, Engineering Systems Division, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 73-74). Business intelligence is the set of tools, processes, practices and people that are used to take advantage of information to support decision making in organizations. Cloud computing is a new paradigm for offering computing resources that work on demand, are scalable and are billed for the time they are used. Organizations can save large amounts of money and effort using this approach. This document identifies the main challenges companies encounter while working on business intelligence applications in the cloud, such as security, availability, performance, integration, regulatory issues, and constraints on network bandwidth. All these challenges are addressed with a systems thinking approach, and several solutions are offered that can be applied according to the organization's needs. An evaluation of the main vendors of cloud computing technology is presented, so that business intelligence developers can identify the available tools and the companies they can depend on to migrate or build applications in the cloud. It is demonstrated how business intelligence applications can increase their availability with a cloud computing approach, by decreasing the mean time to recovery (handled by the cloud service provider) and increasing the mean time to failure (achieved by the introduction of more redundancy in the hardware). Innovative mechanisms are discussed in order to improve cloud applications, such as private, public and hybrid clouds, column-oriented databases, in-memory databases and the Data Warehouse 2.0 architecture. Finally, it is shown how the project management for a business intelligence application can be facilitated with a cloud computing approach.
Design structure matrices are dramatically simplified by avoiding unnecessary iterations while sizing, validating, and testing hardware and software resources. By Eumir P. Reyes. S.M. in System Design and Management.
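The availability argument in this abstract can be illustrated with a short calculation. The sketch below assumes the standard steady-state formula, availability = MTTF / (MTTF + MTTR); the thesis itself may define availability differently. The function name `availability` and all numeric values are hypothetical and chosen only for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability under the assumed formula MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures, not taken from the thesis.
baseline = availability(mttf_hours=1000.0, mttr_hours=10.0)

# More hardware redundancy is assumed to raise MTTF; provider-managed
# recovery is assumed to lower MTTR.
improved = availability(mttf_hours=2000.0, mttr_hours=2.0)

print(f"baseline: {baseline:.4%}")
print(f"improved: {improved:.4%}")
```

Under this formula, either change raises availability on its own, and the two effects compound when both are applied.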
Adaptive query parallelization in multi-core column stores
With the rise of multi-core CPU platforms, their optimal utilization
for in-memory OLAP workloads using column store databases has
become one of the biggest challenges. Some of the inherent limitations
in the achievable query parallelism are due to the dependence of the
degree of parallelism on data skew, the overheads incurred by
thread coordination, and the hardware resource limits. Finding the
right balance between the degree of parallelism and the multi-core
utilizati
- …