
    Concepts and Techniques for Flexible and Effective Music Data Management


    A Strategy for Reducing I/O and Improving Query Processing Time in an Oracle Data Warehouse Environment

    In the current information age, as the saying goes, time is money. For the modern information worker, decisions must often be made quickly. Every extra minute spent waiting for critical data could mean the difference between financial gain and financial ruin. Despite the importance of timely data retrieval, many organizations lack even a basic strategy for improving the performance of their data-warehouse-based reporting systems. This project explores the idea that a strategy making use of three database performance improvement techniques can reduce I/O (input/output operations) and improve query processing time in an information system designed for reporting. To demonstrate that these performance improvement goals can be achieved, queries were run on ordinary tables and then on tables utilizing the performance improvement techniques. The I/O statistics and processing times for the queries were compared to measure the amount of performance improvement. The measurements were also used to explain how these techniques may be more or less effective under certain circumstances, such as when a particular type of query is run. The collected I/O and time-based measurements showed a varying degree of improvement for each technique, depending on the query used. Matching the performance improvement technique being implemented to the types of queries commonly run on the system was found to be an important consideration. The results indicated that in a reporting environment these performance improvement techniques have the potential to reduce I/O and improve query performance.


    Data warehouses store large amounts of data, usually accessed by complex decision-making queries with many selection, join and aggregation operations. To optimize the performance of the data warehouse, the administrator has to produce a physical design. During the physical design phase, the data warehouse administrator has to select optimization techniques to speed up queries. This involves many choices: which optimization techniques to apply, their selection algorithms, the parameters of these algorithms, and the attributes and tables used by some of these techniques. We describe in this paper the nature of the difficulties encountered by the administrator during physical design. We subsequently present a tool which helps the administrator to make the right choices for optimization. We demonstrate the interactive use of this tool using a relational data warehouse created and populated from the APB-1 Benchmark.

    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.

    PatchIndex: exploiting approximate constraints in distributed databases

    Cloud data warehouse systems lower the barrier to access data analytics. These applications often lack a database administrator and integrate data from various sources, potentially leading to data not satisfying strict constraints. Automatic schema optimization in self-managing databases is difficult in these environments without prior data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Perfect constraints might not exist in these unclean datasets due to a small set of values violating the constraints. Therefore, we introduce the concept of a generic PatchIndex structure, which handles exceptions to given constraints and enables database systems to define these approximate constraints. We apply the concept to the environment of distributed databases, providing parallel index creation approaches and optimization techniques for parallel queries using PatchIndexes. Furthermore, we describe heuristics for automatic discovery of PatchIndex candidate columns and prove the performance benefit of using PatchIndexes in our evaluation.
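
The idea of an approximate constraint with an exception set can be illustrated with a small sketch. This is not the PatchIndex implementation from the paper; it is a simplified, single-node illustration that assumes a "nearly unique" column and stores as patches the ids of all rows whose value occurs more than once. The function names `build_patches` and `distinct_with_patches` are hypothetical.

```python
from collections import Counter


def build_patches(column):
    """Return the row ids whose value violates the uniqueness constraint.

    Assumption: every occurrence of a duplicated value is recorded as a patch.
    """
    counts = Counter(column)
    return {i for i, v in enumerate(column) if counts[v] > 1}


def distinct_with_patches(column, patches):
    """Compute the distinct values, deduplicating only the patch rows."""
    # Rows outside the patch set are unique by construction, so they
    # can be passed through without any duplicate elimination.
    clean = [v for i, v in enumerate(column) if i not in patches]
    # Only the exception rows need real duplicate elimination.
    # dict.fromkeys removes duplicates while keeping first-seen order.
    exceptions = list(dict.fromkeys(column[i] for i in sorted(patches)))
    return clean + exceptions


column = [10, 11, 12, 11, 13, 14]
patches = build_patches(column)
result = distinct_with_patches(column, patches)
```

If the exception set is small relative to the table, most rows skip the expensive deduplication step, which is the kind of benefit an approximate uniqueness constraint is meant to provide.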

    A Generalized Approach to Optimization of Relational Data Warehouses Using Hybrid Greedy and Genetic Algorithms

    As far as we know, in the open scientific literature, there is no generalized framework for the optimization of relational data warehouses that includes view and index selection and vertical view fragmentation. In this paper we offer such a framework. We propose a formalized multidimensional model, based on relational schemas, which provides complete vertical view fragmentation and presents an approach to transforming a fragmented snowflake schema into a defragmented star schema through the process of denormalization. We define the generalized system of relational data warehouse optimization by including vertical fragmentation of the implementation schema (F), indexes (I) and view selection (S) for materialization. We consider the Genetic Algorithm as an optimization method and introduce the technique of "recessive bits" for handling the infeasible solutions that are obtained by a Genetic Algorithm. We also present two novel hybrid algorithms, i.e. combinations of Greedy and Genetic Algorithms. Finally, we present our experimental results, show the performance improvements and benefits of the generalized approach (SFI), and show that our novel algorithms significantly improve the efficiency of the optimization process for different input parameters.

    Auto-administration des entrepôts de données complexes (Self-Administration of Complex Data Warehouses)

    Queries defined on data warehouses are often complicated and use several join operations that are costly in terms of computation time. In the context of complex data warehousing, the adaptations made to classical warehouse schemas induce additional joins when the data are accessed. This cost becomes even greater when queries operate on very large volumes of data. It is therefore essential to reduce this computation time. To do so, data warehouse administrators generally use indexing techniques such as star join indexes or bitmap join indexes. This nevertheless remains complex and tedious. The solution we propose takes a self-administration approach to data warehouses. Within this framework, we propose an automatic index selection strategy. To this end, we used a data mining technique, specifically frequent pattern mining, to determine a set of candidate indexes from a given workload. We then propose cost models for selecting, among these indexes, those that yield the best benefit. In particular, these cost models evaluate data access time through bitmap join indexes, as well as the maintenance and storage cost of these indexes.
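
The candidate-generation step of this abstract (frequent patterns extracted from a workload) can be sketched as follows. This is a brute-force illustration, not the authors' algorithm: it assumes each query is reduced to the set of attributes it restricts, and it enumerates every attribute subset instead of using a dedicated frequent-itemset miner. The function name `candidate_indexes`, the attribute names, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_indexes(workload, min_support):
    """Return attribute sets used together in at least min_support of the queries."""
    counts = Counter()
    for attrs in workload:
        # Count every non-empty subset of the attributes used by this query.
        for size in range(1, len(attrs) + 1):
            for combo in combinations(sorted(attrs), size):
                counts[combo] += 1
    threshold = min_support * len(workload)
    return {combo: n for combo, n in counts.items() if n >= threshold}


# Hypothetical workload: one set of restricted attributes per query.
workload = [
    {"time.year", "product.category"},
    {"time.year", "product.category", "store.region"},
    {"time.year"},
    {"store.region"},
]
candidates = candidate_indexes(workload, min_support=0.5)
```

Each frequent attribute set is only a candidate; in the approach described above, a cost model weighing access time against maintenance and storage cost would then decide which candidates are actually built.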