40 research outputs found

    Heuristic Algorithms for Designing a Data Warehouse with SPJ Views


    An overview of data warehouse design approaches and techniques

    A Data Warehouse (DW) is a database that stores information oriented to satisfying decision-making requests. It is a database with particular features concerning the data it contains and its utilisation. These features cause the DW design process and strategies to differ from those used for OLTP systems. This work presents a brief description of different approaches and techniques that address the DW design problem.

    Automatic physical database design : recommending materialized views

    This work discusses physical database design, focusing on the problem of selecting materialized views to improve the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints. The results are used to support the construction of a search space for view selection problems. We propose an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization, from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study this same problem in the context of a commercial database management system in the presence of memory and time restrictions. We also suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements. In this case, views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches perform better than previous ones in most cases, in terms of both effectiveness and efficiency.
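
    The abstract describes the approach only at a high level. As a rough, hypothetical illustration of the benefit-per-cost style of heuristic commonly used for view selection (not this dissertation's actual algorithm), the following Python sketch picks candidate views under a storage budget; all names and cost figures are invented.

        # Hypothetical sketch of a greedy view-selection heuristic: pick
        # candidate views in order of (query-cost savings) / (storage +
        # maintenance cost) until a space budget is exhausted.
        from dataclasses import dataclass

        @dataclass
        class CandidateView:
            name: str
            size: float          # storage footprint if materialized
            maintenance: float   # estimated maintenance cost per update batch
            savings: float       # estimated total query-cost reduction

        def select_views(candidates, space_budget):
            chosen, used = [], 0.0
            # Rank by benefit per unit of total cost.
            ranked = sorted(candidates,
                            key=lambda v: v.savings / (v.size + v.maintenance),
                            reverse=True)
            for v in ranked:
                if used + v.size <= space_budget:
                    chosen.append(v)
                    used += v.size
            return chosen

        views = [CandidateView("q1_q2_join", 40, 5, 120),
                 CandidateView("q3_agg", 10, 2, 30),
                 CandidateView("q1_filter", 25, 4, 35)]
        print([v.name for v in select_views(views, space_budget=60)])
        # -> ['q1_q2_join', 'q3_agg']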

    A Metadata-Based Platform for View Computation in Multi-Source Information Systems

    A Multi-Source Information System (MSIS) consists of a set of independent data sources and a set of views or queries that define the users' requirements. Its differences from classical information systems introduce new design activities and motivate the development of new techniques. In this article we study a particular case of an MSIS, a Data Warehouse (DW), and propose a meta-model to represent its metadata from two points of view: the representation of the schemas, and the inter-schema relationships that allow a view to be computed from the source data. The meta-model is the core of a general platform for MSIS development. The platform enables the easy integration of design and maintenance tools through a common data model that centralizes the data flow and the integrity-control routines among the tools.

    Metadata-Aware Query Processing over Data Streams

    Many modern applications need to process queries over potentially infinite data streams to provide answers in real time. This dissertation proposes novel techniques to optimize CPU and memory utilization in stream processing by exploiting metadata on streaming data or queries. It focuses on four topics: 1) exploiting stream metadata to optimize SPJ query operators via operator configuration, 2) exploiting stream metadata to optimize SPJ query plans via query rewriting, 3) exploiting workload metadata to optimize parameterized queries via indexing, and 4) exploiting event constraints to optimize event stream processing via run-time early termination. The first part of this dissertation proposes algorithms for one of the most common and expensive query operators, namely join, to identify and purge no-longer-needed data from operator state at runtime based on punctuations. We also study how punctuations can be exploited in combination with commonly used window constraints. Extensive experimental evaluations demonstrate both reductions in memory usage and improvements in execution time due to the proposed strategies. The second part proposes herald-driven runtime query plan optimization techniques. We identify four query optimization techniques and design a lightweight algorithm to efficiently detect optimization opportunities at runtime upon receiving heralds. We propose a novel execution paradigm that supports multiple concurrent logical plans while maintaining a single physical plan. An extensive experimental study confirms that our techniques significantly reduce query execution times. The third part deals with the shared execution of parameterized queries instantiated from a query template. We design a lightweight index mechanism that provides multiple access paths to the data to facilitate a wide range of parameterized queries. To withstand workload fluctuations, we propose an index tuning framework that tunes the index configurations in a timely manner. Extensive experimental evaluations demonstrate the effectiveness of the proposed strategies. The last part proposes event query optimization techniques that exploit event constraints, such as exclusiveness or ordering relationships among events, extracted from workflows. Significant performance gains are shown to be achieved by our proposed constraint-aware event processing techniques.
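
    The punctuation-based state purging in the first part lends itself to a small illustration. The sketch below is a hypothetical stand-in, not the dissertation's operator: a symmetric hash join over two streams in which a punctuation ("no more tuples with key k will arrive on this stream") lets the operator drop the opposite side's state for that key.

        # Hypothetical sketch: symmetric hash join with punctuation-driven
        # state purging.
        from collections import defaultdict

        class PunctuatedJoin:
            def __init__(self):
                self.state = {"L": defaultdict(list), "R": defaultdict(list)}

            def on_tuple(self, side, key, value):
                other = "R" if side == "L" else "L"
                self.state[side][key].append(value)
                # Probe the opposite side's state and emit join results.
                return [(value, v) if side == "L" else (v, value)
                        for v in self.state[other].get(key, [])]

            def on_punctuation(self, side, key):
                other = "R" if side == "L" else "L"
                # No further tuples with this key will arrive on `side`, so
                # the other side's state for the key can never produce new
                # results; purge it.
                self.state[other].pop(key, None)

        j = PunctuatedJoin()
        j.on_tuple("L", 1, "a")
        print(j.on_tuple("R", 1, "x"))   # [('a', 'x')]
        j.on_punctuation("L", 1)         # R-side state for key 1 is purged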

    Delta-based Storage and Querying for Versioned Datasets

    Data-driven methods and products are becoming increasingly common in a variety of communities, leading to a huge diversity of datasets being continuously generated, modified, and analyzed. An increasingly important consideration for the underlying data management systems is that all of these datasets and their versions over time need to be stored and queried, for reasons including, but not limited to, reproducibility, collaboration, provenance, auditing, introspective analysis, and backups. However, most solutions today resort to highly ad hoc and manual version management and sharing techniques that lead to friction in collaborative data science workflows and introduce inefficiencies. In this dissertation, we introduce a framework for dataset version management and address the systems-building, operator design, and optimization challenges involved in building a dataset version control system. We describe the various challenges and solutions in the context of our system, called DEX, which we have developed to support increasingly complex version management tasks. We show how to use delta encoding, a key component in managing redundancy, to provide efficient storage and retrieval for thousands of dataset versions, and we develop a formalism to understand the various trade-offs in a principled manner. We study the storage–recreation trade-off in detail and provide a suite of inexpensive heuristics to obtain high-quality solutions under different settings. To provide a rich interface for specifying version management tasks, we design a new query language, called VQUEL, with the ability to query dataset versions and provenance in a unified manner. We study how assumptions on the delta format can help in the design of a logical algebra, which we then use to execute increasingly complex queries efficiently. A key characteristic of our query execution methods is that the computational cost depends primarily on the size and number of deltas in the expression (typically small), not on the input dataset versions (which can be very large). Finally, we demonstrate the effectiveness of the developed techniques through an extensive evaluation of DEX on a mixture of real-world and synthetic datasets.
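
    Purely as an illustration of the delta-chain idea behind this style of storage (a minimal sketch under assumed semantics, not the DEX system itself), the following code materializes a root version, stores every other version as a patch against its parent, and recreates a version by replaying the chain; real systems use binary deltas rather than dictionary patches.

        # Hypothetical sketch of delta-based version storage.
        class VersionStore:
            def __init__(self, root_data):
                self.materialized = {"v0": dict(root_data)}
                self.deltas = {}   # version -> (parent, sets, deletes)

            def commit(self, version, parent, sets=None, deletes=None):
                self.deltas[version] = (parent, sets or {}, set(deletes or ()))

            def checkout(self, version):
                # Walk up to a materialized ancestor, then replay deltas down.
                chain = []
                while version not in self.materialized:
                    parent, sets, deletes = self.deltas[version]
                    chain.append((sets, deletes))
                    version = parent
                data = dict(self.materialized[version])
                for sets, deletes in reversed(chain):
                    data.update(sets)
                    for k in deletes:
                        data.pop(k, None)
                return data

        store = VersionStore({"a": 1, "b": 2})
        store.commit("v1", "v0", sets={"b": 3})
        store.commit("v2", "v1", deletes=["a"])
        print(store.checkout("v2"))   # {'b': 3}

    Recreation cost grows with the length of the delta chain while storage stays small, which is exactly the storage–recreation trade-off the abstract studies.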

    i3MAGE: Incremental, Interactive, Inter-Model Mapping Generation

    Data integration is a highly important prerequisite for most enterprise data analyses. While hard in general, a particular concern is the human effort required to design a global integration schema, author queries against that schema, and create mappings that connect data sources with the global schema. Ontology-based data integration (OBDI), which employs ontologies as a target model, reduces the effort for schema design and usage. On the other hand, it requires mappings that are particularly difficult to create. Architects who work with OBDI hence need systems that support the process of mapping development. One key type of tooling to support mapping development is automatic or semi-automatic generation of mapping suggestions. While many such tools exist in the wider sphere of data integration, few are built to work in the case of OBDI, where the inter-model gap between relational input schemata and a target ontology has to be bridged. Among those that support OBDI at all, none so far are fully optimized for this specific case by performing a truly inter-model matching while also leveraging distinct but corresponding aspects of both models. We propose i3MAGE, an approach and a system for automatic and semi-automatic generation of mappings in OBDI. The system is built on generic inter-model matching, and it is optimized in various ways for matching relational source schemata to target ontology schemata. To be truly semi-automatic in every respect, i3MAGE works both incrementally, building mappings pay-as-you-go, and interactively, in exchange with a human user. We introduce a specialized benchmark and evaluate i3MAGE against a number of other approaches. In addition, we provide examples of how i3MAGE can be deployed in holistic data integration environments.
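
    As a toy stand-in for the inter-model matching that i3MAGE performs (the real system also exploits structure, types, and correspondences between the two models), the sketch below scores relational column names against ontology property names by normalized string similarity and keeps the best match above a threshold; all schema and property names here are invented.

        # Hypothetical sketch: name-based matching of relational columns to
        # ontology properties. Real inter-model matchers use far more signals.
        from difflib import SequenceMatcher

        def normalize(name):
            return name.lower().replace("_", "").replace("-", "")

        def match(columns, properties, threshold=0.7):
            mappings = {}
            for col in columns:
                best, score = None, 0.0
                for prop in properties:
                    s = SequenceMatcher(None, normalize(col),
                                        normalize(prop)).ratio()
                    if s > score:
                        best, score = prop, s
                if score >= threshold:
                    mappings[col] = best
            return mappings

        cols = ["cust_name", "birth_date", "order_total"]
        props = ["customerName", "birthDate", "hasAddress"]
        print(match(cols, props))
        # -> {'cust_name': 'customerName', 'birth_date': 'birthDate'}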

    Scalable Integration View Computation and Maintenance with Parallel, Adaptive and Grouping Techniques

    Materialized integration views constructed by integrating data from multiple distributed data sources help to achieve better access, reliable performance, and high availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address scalability challenges in high-performance integration view computation and maintenance caused by increasingly large data sources and high rates of source updates. State-of-the-art parallel integration view computation makes the common assumption that maximal pipelined parallelism leads to superior performance. We instead propose segmented bushy parallel processing, which combines pipelined parallelism with alternate forms of parallelism to achieve an overall more effective strategy. Experimental studies conducted over a cluster of high-performance PCs confirm that the proposed strategy yields an average improvement of 50% in total processing time compared to existing solutions. Run-time adaptation becomes critical for parallel integration view computation due to its long-running and memory-intensive nature. We investigate two types of state-level adaptations, namely state spill and state relocation, to address run-time memory shortage. We propose lazy-disk and active-disk approaches that integrate both adaptations to maximize run-time query throughput in a memory-constrained environment. We also propose global throughput-oriented state adaptation strategies for computation plans with multiple state-intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions. Once results have been computed and materialized, it is typically more efficient to maintain them incrementally instead of recomputing them in full. However, state-of-the-art incremental view maintenance requires O(n^2) maintenance queries, with n being the number of data sources that the view is defined upon. Moreover, existing approaches do not exploit view definitions and data source processing capabilities to further improve view maintenance performance. We propose novel grouping maintenance algorithms that dramatically reduce the number of maintenance queries to O(n). A cost-based view maintenance framework is proposed to generate optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of our maintenance algorithms as well as the maintenance framework.
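
    To make the incremental-maintenance idea concrete (a minimal sketch, not the dissertation's grouping algorithms), the code below maintains a two-source join view from source deltas instead of recomputing it. With n sources, each source delta must naively be joined against the state of every other source, which is roughly where the O(n^2) maintenance-query count the abstract mentions comes from; the proposed grouping reduces it to O(n).

        # Hypothetical sketch of incremental maintenance for a join view
        # V = R join S on a shared key. The view delta is derived from the
        # source deltas: dV = (dR join S_old) + (R_old join dS) + (dR join dS)
        def join(left, right):
            return [(l, r) for (kl, l) in left for (kr, r) in right
                    if kl == kr]

        def maintain(R_old, S_old, dR, dS):
            # Three small maintenance queries instead of a full recompute.
            return join(dR, S_old) + join(R_old, dS) + join(dR, dS)

        R_old = [(1, "r1")]
        S_old = [(1, "s1"), (2, "s2")]
        dR = [(2, "r2")]
        dS = [(1, "s1b")]
        print(maintain(R_old, S_old, dR, dS))
        # -> [('r2', 's2'), ('r1', 's1b')]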