    Detecting Redundancy in Data Warehouse Evolution


    Data warehouse stream view update with multiple streaming.

    The main objective of data warehousing is to store information representing an integration of base data from single or multiple data sources over an extended period of time. To provide fast access to the data, regardless of the availability of the data source, data warehouses often use materialized views. Materialized views are able to provide aggregation on some attributes to help Decision Support Systems. Updating materialized views in response to modifications in the base data is called materialized view maintenance. In some applications, for example, stock market and banking systems, the source data is updated so frequently that it can be considered a continuous stream of data. Keeping the materialized view updated with respect to changes in the base tables in the traditional way causes query response times to increase. This thesis proposes a new view maintenance algorithm for multiple streams which improves on semi-join and hash filter methods. Our proposed algorithm is able to update a view which joins two base tables where both base tables are in the form of data streams (always changing). By using timestamps, building updategrams in parallel, and optimizing the join cost between the two data sources, it can reduce the query response time or execution time significantly.
    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .A336. Source: Masters Abstracts International, Volume: 44-03, page: 1391. Thesis (M.Sc.)--University of Windsor (Canada), 2005.
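
    A minimal Python sketch of the core idea, with illustrative names and a single join key; this is not the thesis's actual algorithm, and it assumes tuples arrive in global timestamp order. Each arriving tuple is stamped with a timestamp and joined only against partner tuples that arrived no later than it did, so each join pair is emitted exactly once, and the per-tuple delta batches ("updategrams") for the two streams touch disjoint indexes and could be built in parallel:

        from collections import defaultdict

        class StreamJoinView:
            """Maintains the view V = R JOIN S on a shared key as tuples stream in."""

            def __init__(self):
                self.r_index = defaultdict(list)   # key -> [(ts, payload)]
                self.s_index = defaultdict(list)
                self.view = []                     # materialized join results

            def insert(self, table, key, payload, ts):
                own, other = ((self.r_index, self.s_index) if table == "R"
                              else (self.s_index, self.r_index))
                own[key].append((ts, payload))
                # Join the new tuple only with partner tuples whose timestamp
                # is not later, so each pair is produced exactly once.
                delta = [(key, payload, p) for (t, p) in other[key] if t <= ts]
                self.view.extend(delta)
                return delta                       # the "updategram" for this tuple

        v = StreamJoinView()
        v.insert("R", 42, "r1", ts=1)              # no partner yet, empty delta
        print(v.insert("S", 42, "s1", ts=2))       # [(42, 's1', 'r1')]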

    Data warehouse stream view update with hash filter.

    A data warehouse usually contains large amounts of information representing an integration of base data from one or more external data sources over a long period of time, in order to provide fast query response times. It stores materialized views which provide aggregation (SUM, MAX, MIN, COUNT and AVG) on some measure attributes of interest to data warehouse users. The process of updating materialized views in response to the modification of the base data is called materialized view maintenance. Some data warehouse application domains, like stock markets, credit cards, automated banking and web logs, depend on data sources updated as continuous streams of data. In particular, electronic stock trading markets such as the NASDAQ generate large volumes of data, in bursts of up to 4,200 messages per second. This thesis proposes a new view maintenance algorithm (StreamVup), which improves on semi-join methods by using hash filters. The new algorithm first reduces the number of bytes transported through the network for stream tuples, and second reduces the cost of join operations during view update by eliminating the recomputation of view updates caused by newly arriving duplicate tuples. (Abstract shortened by UMI.)
    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .I85. Source: Masters Abstracts International, Volume: 42-05, page: 1753. Adviser: C. I. Ezeife. Thesis (M.Sc.)--University of Windsor (Canada), 2003.
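
    A minimal Python sketch of the hash-filter idea behind the semi-join improvement, with illustrative sizes and names rather than StreamVup's actual design: one site ships a compact bit array of hashed join keys instead of full tuples, and the receiving site uses it to drop non-matching stream tuples early, also discarding duplicates before they can trigger a view recomputation. Hash filters can admit false positives in general, which the final join still removes:

        FILTER_BITS = 1 << 16

        def build_hash_filter(keys):
            """Bit array marking the hash buckets occupied by the join keys."""
            bits = bytearray(FILTER_BITS // 8)
            for k in keys:
                h = hash(k) % FILTER_BITS
                bits[h // 8] |= 1 << (h % 8)
            return bits

        def passes(bits, key):
            h = hash(key) % FILTER_BITS
            return bool(bits[h // 8] & (1 << (h % 8)))

        warehouse_keys = {10, 20, 30}          # join keys present in the base table
        f = build_hash_filter(warehouse_keys)  # only this small filter is shipped

        seen = set()                           # duplicate elimination for arrivals
        stream = [(10, "a"), (99, "b"), (10, "a"), (30, "c")]
        to_join = [t for t in stream
                   if passes(f, t[0]) and not (t in seen or seen.add(t))]
        print(to_join)                         # [(10, 'a'), (30, 'c')]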

    Automatic physical database design : recommending materialized views

    This work discusses physical database design while focusing on the problem of selecting materialized views for improving the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints. The results are used to support the construction of a search space for view selection problems. We propose an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization, from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study this same problem in the context of a commercial database management system in the presence of memory and time restrictions. We also suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements. In this case, views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches in most cases perform better than previous ones in terms of effectiveness and efficiency.
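
    As a hedged illustration of the selection step only (not the dissertation's multi-query optimization approach), the following Python sketch greedily picks a near-optimal set of candidate views under a memory budget, ranking each view by net benefit (query savings minus maintenance cost) per unit of space; all names and numbers are placeholders:

        def select_views(candidates, budget):
            """candidates: list of (name, size, query_savings, maintenance_cost)."""
            chosen, used = [], 0
            ranked = sorted(candidates,
                            key=lambda c: (c[2] - c[3]) / c[1], reverse=True)
            for name, size, savings, maint in ranked:
                if savings > maint and used + size <= budget:
                    chosen.append(name)
                    used += size
            return chosen

        candidates = [
            ("v_sales_by_month", 40, 120, 25),
            ("v_sales_by_day",  150, 180, 90),
            ("v_cust_region",    30,  50, 10),
        ]
        print(select_views(candidates, budget=100))
        # ['v_sales_by_month', 'v_cust_region']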

    The XFM view adaptation mechanism: An essential component for XML data warehouses

    In the past few years, with many organisations providing web services for business and communication purposes, large volumes of XML transactions take place on a daily basis. In many cases, organisations maintain these transactions in their native XML format due to its flexibility for exchanging data between heterogeneous systems. This XML data provides an important resource for decision support systems. As a consequence, XML technology has slowly been incorporated into the decision support components of data warehouse systems. The problem encountered is that existing native XML database systems suffer from poor performance in terms of managing data volume and response time for complex analytical queries. Although materialised XML views can be used to improve the performance of XML data warehouses, update problems then become the bottleneck of using materialised views. Specifically, synchronising materialised views in the face of changing view definitions remains a significant issue. In this dissertation, we provide a method for XML-based data warehouses to manage updates caused by changes of view definitions (view redefinitions), which is referred to as the view adaptation problem. In our approach, views are defined using XPath and then modelled using a set of novel algebraic operators and fragments. XPath views are integrated into a single view graph called the XML Fragment Materialisation (XFM) View Graph, where common parts between different views are shared and appear only once in the graph. Fragments within the view graph can be selected for materialisation to facilitate the view adaptation process. While changes are applied, our view adaptation algorithms can quickly determine what part of the XFM view graph is affected. The adaptation algorithms then perform a structural adaptation to update the view graph, followed by a data adaptation to update the materialised fragments.
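
    The following Python sketch illustrates the shared-fragment idea loosely, with hypothetical class and field names rather than the XFM operators themselves: fragments common to several XPath views appear once in a graph, and when a fragment changes, the part of the graph needing adaptation is found by walking only its consumers:

        class Fragment:
            def __init__(self, step):
                self.step = step           # e.g. an XPath step such as "child::order"
                self.consumers = []        # fragments built on top of this one
                self.materialized = None   # cached result, if selected for materialisation

        def add_edge(src, dst):
            src.consumers.append(dst)      # dst consumes src's result

        def affected(changed):
            """All fragments whose results depend on the changed fragment."""
            seen, stack = set(), [changed]
            while stack:
                f = stack.pop()
                if f not in seen:
                    seen.add(f)
                    stack.extend(f.consumers)
            return seen

        doc = Fragment("doc('orders.xml')")
        orders = Fragment("child::order")      # shared by both views below
        v1 = Fragment("child::total")          # final step of view 1
        v2 = Fragment("child::customer")       # final step of view 2
        add_edge(doc, orders)
        add_edge(orders, v1)
        add_edge(orders, v2)

        print({f.step for f in affected(orders)})
        # {'child::order', 'child::total', 'child::customer'} (set order varies)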

    A Strategy for Reducing I/O and Improving Query Processing Time in an Oracle Data Warehouse Environment

    In the current information age, as the saying goes, time is money. For the modern information worker, decisions must often be made quickly. Every extra minute spent waiting for critical data could mean the difference between financial gain and financial ruin. Despite the importance of timely data retrieval, many organizations lack even a basic strategy for improving the performance of their data warehouse-based reporting systems. This project explores the idea that a strategy making use of three database performance improvement techniques can reduce I/O (input/output operations) and improve query processing time in an information system designed for reporting. To demonstrate that these performance improvement goals can be achieved, queries were run on ordinary tables and then on tables utilizing the performance improvement techniques. The I/O statistics and processing times for the queries were compared to measure the amount of performance improvement. The measurements were also used to explain how these techniques may be more or less effective under certain circumstances, such as when a particular type of query is run. The collected I/O and time-based measurements showed a varying degree of improvement for each technique depending on the query used. Matching the types of queries commonly run on the system to the performance improvement technique being implemented was found to be an important consideration. The results indicated that in a reporting environment these performance improvement techniques have the potential to reduce I/O and improve query performance.
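
    A minimal Python sketch of the measurement loop described above; the connection handling, table names, and materialized-view variant are illustrative assumptions, and real I/O statistics would come from the database's own counters (here only wall-clock time is measured):

        import time

        def time_query(cursor, sql, runs=5):
            """Best-of-N elapsed time for executing and fetching a query."""
            best = float("inf")
            for _ in range(runs):
                start = time.perf_counter()
                cursor.execute(sql)
                cursor.fetchall()
                best = min(best, time.perf_counter() - start)
            return best

        # plain = time_query(cur, "SELECT region, SUM(amt) FROM sales GROUP BY region")
        # tuned = time_query(cur, "SELECT region, total FROM sales_by_region_mv")
        # print(f"speedup: {plain / tuned:.1f}x")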

    10381 Summary and Abstracts Collection -- Robust Query Processing

    Dagstuhl seminar 10381 on robust query processing (held 19.09.10 - 24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing. In order to pave the way to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes from some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers contributed by some participants.

    Emerging model species driven by transcriptomics

    This work is focused on 'emerging model species', i.e. question-driven model species which have sufficient molecular resources to investigate a specific phenomenon in molecular biology, developmental biology, molecular ecology and evolution, or related molecular fields. This thesis shows how transcriptomic data can be generated, analyzed, and used to investigate such phenomena of interest even in species lacking a reference genome. The initial ButterflyBase resource has proven useful to researchers of species without a reference genome, but it is limited to the Lepidoptera and supports only the older Sanger sequencing technologies. Thanks to Next Generation Sequencing, transcriptome sequencing is more cost-effective, but the bottleneck of transcriptomic projects is now the bioinformatic analysis and data mining/dissemination. This work therefore continues by presenting novel and innovative approaches which effectively overcome this bottleneck. The est2assembly software produces deeply annotated reference transcriptomes stored in the Chado database. The Drupal Bioinformatic Server Framework and genes4all provide a species-neutral and innovative approach to building standardized online databases and associated web services. All public insect mRNA data were analyzed with est2assembly and genes4all to produce InsectaCentral. With InsectaCentral, a powerful resource is now available to assist molecular biology in any question-driven model insect species. The software presented here was developed according to specifications of the General Model Organism Database (GMOD) community. All software specifications are species-neutral and can be seamlessly deployed to assist any research community. A chapter of case studies further makes it apparent that the transcriptomic approach is more cost-effective than a genomic approach, so sequence-driven evolutionary biology stands to benefit more quickly from this field.

    Efficient Process Data Warehousing

    This dissertation presents a data processing architecture for efficient data warehousing from historical data sources. The present work has three primary contributions. The first contribution is the development of a generalized process data warehousing (PDW) architecture that includes multilayer data processing steps to transform raw data streams into useful information that facilitates data-driven decision making. The second contribution is exploring the applicability of the proposed architecture to the case of sparse process data. We have tested the proposed approach in a medical monitoring system, which takes physiological data and predicts the clinical setting in which the data is most likely to be seen. We have performed a set of experiments with real clinical data (from Children’s Hospital of Pittsburgh) that demonstrate the high utility of the present approach. The third contribution is exploring the applicability of the proposed PDW architecture to the case of redundant process data. We have designed and developed a conflict-aware data fusion strategy for the efficient aggregation of historical data. We have elaborated a simulation-based study of the tradeoffs between the data fusion solutions and data accuracy, and have also evaluated the solutions on a large-scale integrated framework (Tycho data) that includes historical data from heterogeneous sources in different subject areas. Finally, we propose and evaluate a state sequence recovery (SSR) framework, which integrates the work of the two preceding studies on sparse and redundant process data. Our experimental results are based on several algorithms that have been developed and tested in different simulation set-up scenarios under both normal and exponential data distributions.
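
    A minimal Python sketch of conflict-aware fusion of redundant historical records, assuming a majority-vote resolution rule purely for illustration (the dissertation's actual strategy and its accuracy tradeoffs are more involved): duplicate reports from overlapping sources are grouped per key, a fused value is chosen, and the conflict itself is recorded so its effect on accuracy can be studied:

        from collections import Counter, defaultdict

        def fuse(records):
            """records: iterable of (key, source, value) from overlapping sources."""
            by_key = defaultdict(list)
            for key, source, value in records:
                by_key[key].append(value)

            fused, conflicts = {}, []
            for key, values in by_key.items():
                counts = Counter(values)
                value, votes = counts.most_common(1)[0]
                if len(counts) > 1:              # sources disagree on this key
                    conflicts.append((key, dict(counts)))
                fused[key] = value
            return fused, conflicts

        records = [
            (("city_a", 1923), "src1", 210),     # e.g. weekly case counts
            (("city_a", 1923), "src2", 210),
            (("city_a", 1923), "src3", 180),     # conflicting report
        ]
        fused, conflicts = fuse(records)
        print(fused)      # {('city_a', 1923): 210}
        print(conflicts)  # [(('city_a', 1923), {210: 2, 180: 1})]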