40,965 research outputs found

    Efficient Incremental View Maintenance for Data Warehousing

    Get PDF
    Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain the view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance. We propose a generic concurrency control framework to solve such maintenance anomalies. This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques put little extra overhead on existing concurrent data update processing techniques while allowing for this new functionality

    Incremental Query Evaluation in a Ring of Databases

    Get PDF
    This article approaches the incremental view maintenance problem from an algebraic perspective. The algebraic structure of a ring of databases is constructed and extended to form a powerful aggregate query calculus. The query calculus inherits the key properties of rings, such as distributivity and the existence of an additive inverse. As a consequence, the calculus has a normal form of polynomials and is closed under a universal difference operator. This difference operator allows to express the so-called delta queries of the incremental view maintenance literature, but also deltas to the deltas (second deltas), deltas to second deltas (third deltas), and so on. The k-th delta of a query of polynomial degree k is purely a function of the update, not of the database. This gives rise to a multi-layered incremental view maintenance scheme in which a view is maintained using a hierarchy of auxiliary materialized views of k-th deltas. What is gained by this hierarchy is that the work required to keep all views fresh given an update is extremely simple. The method allows to eliminate expensive query operators such as joins and aggregate sums entirely from programs that perform incremental view maintenance. The main result is that, for non-nested queries, each individual aggregate value can be incrementally maintained using a constant amount of work. This is not possible for nonincremental evaluation and provides a complexity separation between the incremental and nonincremental query evaluation problems. As a byproduct, we obtain a query language that is significant in its own right. It is an algebraic language in which queries, like in relational calculus, are built up from base objects (generalized relations) using an extremely small set of connectives -- addition, multiplication, and aggregation. It is based on a family of algebraic structures developed in this article -- called avalanche (semi)rings -- which algebraizes range-restriction. Thus these structures guarantee finite query results in the presence of inequalities, without making use of an explicit selection operation. The entire language behaves like a polynomial ring of relations and thus makes algebraic manipulation very easy. As a simple algebraic language of interesting expressive power -- relational algebra with SQL-style aggregation and unlimited nesting -- it is a natural internal representation language for query processors and compilers

    Using Fuzzy Linguistic Representations to Provide Explanatory Semantics for Data Warehouses

    Get PDF
    A data warehouse integrates large amounts of extracted and summarized data from multiple sources for direct querying and analysis. While it provides decision makers with easy access to such historical and aggregate data, the real meaning of the data has been ignored. For example, "whether a total sales amount 1,000 items indicates a good or bad sales performance" is still unclear. From the decision makers' point of view, the semantics rather than raw numbers which convey the meaning of the data is very important. In this paper, we explore the use of fuzzy technology to provide this semantics for the summarizations and aggregates developed in data warehousing systems. A three layered data warehouse semantic model, consisting of quantitative (numerical) summarization, qualitative (categorical) summarization, and quantifier summarization, is proposed for capturing and explicating the semantics of warehoused data. Based on the model, several algebraic operators are defined. We also extend the SQL language to allow for flexible queries against such enhanced data warehouses

    Incremental View Maintenance For Collection Programming

    Get PDF
    In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix
    • …
    corecore