72 research outputs found

    The Design of a Web Snapshot Management System for Decision Support Applications

    Get PDF
    Database snapshots that are defined and/or delivered via the Web are called web snapshots. This paper addresses the requirements for web snapshot management. A web snapshot management system is proposed; its architecture and functions of the major components are described; new commands are defined to perform web snapshot management activities

    Data warehouse stream view update with hash filter.

    Get PDF
    A data warehouse usually contains large amounts of information representing an integration of base data from one or more external data sources over a long period of time to provide fast-query response time. It stores materialized views which provide aggregation (SUM, MIX, MIN, COUNT and AVG) on some measure attributes of interest for data warehouse users. The process of updating materialized views in response to the modification of the base data is called materialized view maintenance. Some data warehouse application domains, like stock markets, credit cards, automated banking and web log domains depend on data sources updated as continuous streams of data. In particular, electronic stock trading markets such as the NASDAQ, generate large volumes of data, in bursts that are up to 4,200 messages per second. This thesis proposes a new view maintenance algorithm (StreamVup), which improves on semi join methods by using hash filters. The new algorithm first, reduce the amount of bytes transported through the network for streams tuples, and secondly reduces the cost of join operations during view update by eliminating the recompution of view updates caused by newly arriving duplicate tuples. (Abstract shortened by UMI.)Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .I85. Source: Masters Abstracts International, Volume: 42-05, page: 1753. Adviser: C. I. Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2003

    Data warehouse stream view update with multiple streaming.

    Get PDF
    The main objective of data warehousing is to store information representing an integration of base data from single or multiple data sources over an extended period of time. To provide fast access to the data, regardless of the availability of the data source, data warehouses often use materialized views. Materialized views are able to provide aggregation on some attributes to help Decision Support Systems. Updating materialized views in response to modifications in the base data is called materialized view maintenance. In some applications, for example, the stock market and banking systems, the source data is updated so frequently that we can consider them as a continuous stream of data. To keep the materialized view updated with respect to changes in the base tables in a traditional way will cause query response times to increase. This thesis proposes a new view maintenance algorithm for multiple streaming which improves semi-join methods and hash filter methods. Our proposed algorithm is able to update a view which joins two base tables where both of the base tables are in the form of data streams (always changing). By using a timestamp, building updategrams in parallel and by optimizing the joining cost between two data sources it can reduce the query response time or execution time significantly.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .A336. Source: Masters Abstracts International, Volume: 44-03, page: 1391. Thesis (M.Sc.)--University of Windsor (Canada), 2005

    Deferred Maintenance of Disk-Based Random Samples

    Get PDF
    Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file which follows the intuition that only a “sample” of the operations on the base data has to be considered to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm which updates the sample by using fast sequential disk access only, and which does not require any main memory. We conducted an extensive set of experiments and found, that our algorithms reduce maintenance cost by several orders of magnitude

    Efficient Incremental View Maintenance for Data Warehousing

    Get PDF
    Data warehousing and on-line analytical processing (OLAP) are essential elements for decision support applications. Since most OLAP queries are complex and are often executed over huge volumes of data, the solution in practice is to employ materialized views to improve query performance. One important issue for utilizing materialized views is to maintain the view consistency upon source changes. However, most prior work focused on simple SQL views with distributive aggregate functions, such as SUM and COUNT. This dissertation proposes to consider broader types of views than previous work. First, we study views with complex aggregate functions such as variance and regression. Such statistical functions are of great importance in practice. We propose a workarea function model and design a generic framework to tackle incremental view maintenance and answering queries using views for such functions. We have implemented this approach in a prototype system of IBM DB2. An extensive performance study shows significant performance gains by our techniques. Second, we consider materialized views with PIVOT and UNPIVOT operators. Such operators are widely used for OLAP applications and for querying sparse datasets. We demonstrate that the efficient maintenance of views with PIVOT and UNPIVOT operators requires more generalized operators, called GPIVOT and GUNPIVOT. We formally define and prove the query rewriting rules and propagation rules for such operators. We also design a novel view maintenance framework for applying these rules to obtain an efficient maintenance plan. Extensive performance evaluations reveal the effectiveness of our techniques. Third, materialized views are often integrated from multiple data sources. Due to source autonomicity and dynamicity, concurrency may occur during view maintenance. We propose a generic concurrency control framework to solve such maintenance anomalies. This solution extends previous work in that it solves the anomalies under both source data and schema changes and thus achieves full source autonomicity. We have implemented this technique in a data warehouse prototype developed at WPI. The extensive performance study shows that our techniques put little extra overhead on existing concurrent data update processing techniques while allowing for this new functionality

    Self Maintenance of Materialized XQuery Views via Query Containment and Re-Writing

    Get PDF
    In recent years XML, the eXtensible Markup Language has become the de-facto standard for publishing and exchanging information on the web and in enterprise data integration systems. Materialized views are often used in information integration systems to present a unified schema for efficient querying of distributed and possibly heterogenous data sources. On similar lines, ACE-XQ, an XQuery based semantic caching system shows the significant performance gains achieved by caching query results (as materialized views) and using these materialized views along with query containment techniques for answering future queries over distributed XML data sources. To keep data in these materialized views of ACE-XQ up-to-date, the view must be maintained i.e. whenever the base data changes, the corresponding cached data in the materialized view must also be updated. This thesis builds on the query containment ideas of ACE-XQ and proposes an efficient approach for self-maintenance of materialized views. Our experimental results illustrate the significant performance improvement achieved by this strategy over view re-computation for a variety of situations

    A solution for synchronous incremental maintenance of materialized views based on SQL recursive query

    Get PDF
    Materialized views are excessively stored query execution results in the database. They can be used to partially or completely answer queries which will be further appeared instead of re-executing query from the scratch. There is a large number of published works that address the maintenance, especially incremental update, of materialized views and query rewriting for using those ones. Some of them support materialized views based on recursive query in datalog language. Although most of datalog queries can be transferred into SQL queries and vise versa but it is not the case for recursive queries. Recursive queries in the data log try to find all possible transitive closures. Recursive queries in SQL (Common Table Expression – CTE) return direct links but not transitive closures. In this paper, we propose efficient methods for incremental update of materialized views based on CTE; and then propose an algorithm for generating source codes in C language for any input SQL recursive queries. The synthesized source codes implement our proposed incremental update algorithms according to inserted/deleted/updated record set in the base tables. This paper focuses mainly on the recursive queries whose execution results are directed tree-structured data. The two cases of tree node are considered. In the first case, a child node has only one parent node and in the second case, a child node can have many parent nodes. Those two cases represent the two types of relationships between entities in real world, that are one–to–many and many–to–many, respectively. For the one–to–many relationships, the relationship data is accompanied with the records describing the child using some fields. Those fields are set as null in deleting a concrete relationship. For the many–to–many relationships, it is stored in a separate table and the concrete relationships are removed by deleting describing records from that table. Considering of enforcing referential integrity may help to reduce the searching space and therefore, help to improve the performance. However, the set of tree nodes or tree edges can be manipulated. All those combinations lead to different algorithms. The experimental results are provided and discussed to confirm the effectiveness of our proposed method

    A solution for synchronous incremental maintenance of materialized views based on SQL recursive query

    Get PDF
    Materialized views are excessively stored query execution results in the database. They can be used to partially or completely answer queries which will be further appeared instead of re-executing query from the scratch. There is a large number of published works that address the maintenance, especially incremental update, of materialized views and query rewriting for using those ones. Some of them support materialized views based on recursive query in datalog language. Although most of datalog queries can be transferred into SQL queries and vise versa but it is not the case for recursive queries. Recursive queries in the data log try to find all possible transitive closures. Recursive queries in SQL (Common Table Expression – CTE) return direct links but not transitive closures. In this paper, we propose efficient methods for incremental update of materialized views based on CTE; and then propose an algorithm for generating source codes in C language for any input SQL recursive queries. The synthesized source codes implement our proposed incremental update algorithms according to inserted/deleted/updated record set in the base tables. This paper focuses mainly on the recursive queries whose execution results are directed tree-structured data. The two cases of tree node are considered. In the first case, a child node has only one parent node and in the second case, a child node can have many parent nodes. Those two cases represent the two types of relationships between entities in real world, that are one–to–many and many–to–many, respectively. For the one–to–many relationships, the relationship data is accompanied with the records describing the child using some fields. Those fields are set as null in deleting a concrete relationship. For the many–to–many relationships, it is stored in a separate table and the concrete relationships are removed by deleting describing records from that table. Considering of enforcing referential integrity may help to reduce the searching space and therefore, help to improve the performance. However, the set of tree nodes or tree edges can be manipulated. All those combinations lead to different algorithms. The experimental results are provided and discussed to confirm the effectiveness of our proposed method

    Algebraic incremental maintenance of XML views

    Get PDF
    International audienceMaterialized views can bring important performance benefits when querying XML documents. In the presence of XML document changes, materialized views need to be updated to faithfully reflect the changed document. In this work, we present an algebraic approach for propagating source updates to XML materialized views expressed in a powerful XML tree pattern formalism. Our approach differs from the state of the art in the area in two important ways. First, it relies on set-oriented, algebraic operations, to be contrasted with node-based previous approaches. Second, it exploits state-of-the-art features of XML stores and XML query evaluation engines, notably XML structural identifiers and associated structural join algorithms. We present algorithms for determining how updates should be propagated to views, and highlight the benefits of our approach over existing algorithms through a series of experiments

    Declarative Ajax Web Applications through SQL++ on a Unified Application State

    Full text link
    Implementing even a conceptually simple web application requires an inordinate amount of time. FORWARD addresses three problems that reduce developer productivity: (a) Impedance mismatch across the multiple languages used at different tiers of the application architecture. (b) Distributed data access across the multiple data sources of the application (SQL database, user input of the browser page, session data in the application server, etc). (c) Asynchronous, incremental modification of the pages, as performed by Ajax actions. FORWARD belongs to a novel family of web application frameworks that attack impedance mismatch by offering a single unifying language. FORWARD's language is SQL++, a minimally extended SQL. FORWARD's architecture is based on two novel cornerstones: (a) A Unified Application State (UAS), which is a virtual database over the multiple data sources. The UAS is accessed via distributed SQL++ queries, therefore resolving the distributed data access problem. (b) Declarative page specifications, which treat the data displayed by pages as rendered SQL++ page queries. The resulting pages are automatically incrementally modified by FORWARD. User input on the page becomes part of the UAS. We show that SQL++ captures the semi-structured nature of web pages and subsumes the data models of two important data sources of the UAS: SQL databases and JavaScript components. We show that simple markup is sufficient for creating Ajax displays and for modeling user input on the page as UAS data sources. Finally, we discuss the page specification syntax and semantics that are needed in order to avoid race conditions and conflicts between the user input and the automated Ajax page modifications. FORWARD has been used in the development of eight commercial and academic applications. An alpha-release web-based IDE (itself built in FORWARD) enables development in the cloud.Comment: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Ital
    corecore