
    Integrating data warehouses with web data: a survey

    This paper surveys the most relevant research on combining Data Warehouses (DWs) and Web data. It studies the XML technologies currently used to integrate, store, query, and retrieve Web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing (OLAP) techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to uncover the main limitations and opportunities offered by combining the DW and Web fields, as well as to identify open research lines.

    Computer-Aided Warehouse Engineering (CAWE): Leveraging MDA and ADM for the Development of Data Warehouses

    During the last decade, data warehousing has reached a high level of maturity and is a well-accepted technology in decision support systems. Nevertheless, development and maintenance are still tedious tasks, since the systems grow over time and complex architectures have become established. This paper adapts the concepts of Model Driven Architecture (MDA) and Architecture Driven Modernization (ADM) from the software engineering discipline to the data warehousing discipline. We present the work already available, outline further research directions, and give hints for implementing Computer-Aided Warehouse Engineering systems.

    Personalization of Annotated OLAP Systems

    National audience. This paper deals with the personalization of annotated OLAP systems. The data constellation is extended to support annotations and user preferences. Annotations capture the decision-maker's experience, whereas user preferences enable users to focus on the most interesting data. User preferences also allow annotated, contextual recommendations that help the decision-maker during his/her multidimensional navigation.
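
    To make the interplay between annotations and preferences concrete, here is a minimal sketch, assuming a toy two-dimensional cube and preferences expressed as numeric weights on dimension members. The names AnnotatedCube, annotate, and recommend are hypothetical and do not reproduce the authors' model; the sketch only ranks annotated cells by how strongly they match the user's preferred members.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical cell coordinates: (product, region).
Cell = Tuple[str, str]


@dataclass
class AnnotatedCube:
    """A toy fact table whose cells can carry free-text annotations."""
    measures: Dict[Cell, float]
    annotations: Dict[Cell, List[str]] = field(default_factory=dict)

    def annotate(self, cell: Cell, note: str) -> None:
        # Attach a decision-maker's note to a cell that exists in the cube.
        if cell not in self.measures:
            raise KeyError(f"unknown cell {cell}")
        self.annotations.setdefault(cell, []).append(note)


def recommend(cube: AnnotatedCube,
              preferences: Dict[str, float]) -> List[Tuple[float, Cell, List[str]]]:
    """Rank annotated cells by the summed preference weights of their members.

    Cells whose members carry no positive preference weight are left out.
    """
    scored = []
    for cell, notes in cube.annotations.items():
        score = sum(preferences.get(member, 0.0) for member in cell)
        if score > 0:
            scored.append((score, cell, notes))
    # Highest score first; sorted() is stable, so ties keep insertion order.
    return sorted(scored, key=lambda item: item[0], reverse=True)


cube = AnnotatedCube(measures={("Laptop", "East"): 120.0,
                               ("Laptop", "West"): 80.0,
                               ("Phone", "East"): 200.0})
cube.annotate(("Laptop", "West"), "Stock shortage in March")
cube.annotate(("Phone", "East"), "Promotion effect")
print(recommend(cube, {"Laptop": 1.0, "West": 0.5}))
```

    A simple additive score was chosen for readability; a real preference model would likely support ordering constraints and context, which this sketch does not attempt.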

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time-consuming. In previous work, we proposed a vendor-independent language for reducing the design complexity caused by disparate ETL languages that are tailored to specific design tools and have steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation and improved maintenance support for our ETL language. We present a set of model-to-text transformations that produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models in order to keep the generated code aligned with data source evolution. An example-based demonstration serves as an initial validation, showing that the framework, which covers modeling, code generation, and maintenance, could be used in practice.
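
    The two kinds of transformation described above can be illustrated with a minimal sketch, assuming a deliberately tiny vendor-neutral model of a single load step and plain SQL as the target language. LoadTask, to_sql, and rename_source_column are hypothetical names; they do not reproduce the paper's BPMN-based language or its generators for commercial tools.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class LoadTask:
    """Hypothetical, vendor-neutral model of one extract-and-load step."""
    source_table: str
    target_table: str
    column_map: Dict[str, str]  # target column -> source column


def to_sql(task: LoadTask) -> str:
    """Model-to-text: render the task as one SQL INSERT ... SELECT statement."""
    targets = ", ".join(task.column_map.keys())
    sources = ", ".join(task.column_map.values())
    return (f"INSERT INTO {task.target_table} ({targets}) "
            f"SELECT {sources} FROM {task.source_table};")


def rename_source_column(task: LoadTask, old: str, new: str) -> LoadTask:
    """Model-to-model: return an updated task after a source column is renamed."""
    new_map = {target: (new if source == old else source)
               for target, source in task.column_map.items()}
    return replace(task, column_map=new_map)


task = LoadTask("crm_customers", "dim_customer",
                {"customer_id": "id", "name": "full_name"})
print(to_sql(task))
# The source schema evolves: full_name becomes customer_name.
print(to_sql(rename_source_column(task, "full_name", "customer_name")))
```

    The point of the design is that maintenance happens on the model, not on the generated SQL: after the source change, the code is simply regenerated from the updated model.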

    Pattern tree-based XOLAP rollup operator for XML complex hierarchies

    With the rise of XML as a standard for representing business data, XML data warehousing appears to be a suitable solution for decision-support applications. In this context, OLAP analyses must be possible on XML data cubes, which requires XQuery extensions. To define a formal framework and allow much-needed performance optimizations of analytical queries expressed in XQuery, an algebra is desirable. However, the XML-OLAP (XOLAP) algebras in the literature still largely rely on the relational model. Hence, in this paper we propose a rollup operator based on a pattern tree in order to handle multidimensional XML data expressed within complex hierarchies.
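
    As background for what a rollup does on XML data, here is a minimal sketch, assuming a simple, strict city-to-country hierarchy and sale facts stored as attributes. It uses only Python's standard xml.etree.ElementTree module and plain aggregation; it is not the pattern-tree operator proposed in the paper and does not handle complex (ragged or non-strict) hierarchies. The element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict

# Hypothetical XML cube: a city -> country hierarchy plus city-level sale facts.
SAMPLE = """
<cube>
  <geography>
    <country name="France"><city name="Lyon"/><city name="Paris"/></country>
    <country name="Spain"><city name="Madrid"/></country>
  </geography>
  <facts>
    <sale city="Lyon" amount="10"/>
    <sale city="Paris" amount="5"/>
    <sale city="Madrid" amount="7"/>
    <sale city="Paris" amount="3"/>
  </facts>
</cube>
"""


def rollup_city_to_country(xml_text: str) -> Dict[str, float]:
    """Aggregate sale amounts from the city level up to the country level."""
    root = ET.fromstring(xml_text.strip())
    # Map each city to its parent country (assumes every city has one parent).
    parent_of = {}
    for country in root.iter("country"):
        for city in country.findall("city"):
            parent_of[city.get("name")] = country.get("name")
    totals = defaultdict(float)
    for sale in root.iter("sale"):
        # Facts whose city is missing from the hierarchy go to "Unknown".
        country = parent_of.get(sale.get("city"), "Unknown")
        totals[country] += float(sale.get("amount"))
    return dict(totals)


print(rollup_city_to_country(SAMPLE))
```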

    A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

    Today, extraction, transformation, and loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, difficult, and time-consuming. Organisations nowadays are concerned with vast quantities of data, and data quality raises technical issues in the data warehouse environment. Research over the last few decades has placed increasing emphasis on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data before loading it into a warehouse. Since the data is collected from various sources, it comes in various formats, so standardizing those formats and cleaning the data are necessary for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency, and timeliness are required for a knowledge discovery process. The purpose of this state-of-the-art research work is to address data quality issues at each of the following stages of data warehousing: 1) data sources, 2) data integration, 3) data staging, and 4) data warehouse modelling and schematic design, and to formulate a descriptive classification of their causes. The discovered knowledge is used to repair data deficiencies. This work proposes a framework for the quality of extraction, transformation, and loading of data into a warehouse.
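
    The cleaning and format-standardization step mentioned above can be sketched as follows, assuming records arrive as dictionaries and that the accepted source date formats are known in advance. The field names, the DATE_FORMATS list, and the helper functions are hypothetical; the sketch standardizes values before loading and reports fields that fail a basic completeness check, rather than implementing the paper's framework.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Assumed source date formats; each source may use a different one.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def standardize_date(raw: str) -> Optional[str]:
    """Convert a date in any known format to ISO 8601, or return None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    return None


def clean_record(record: Dict[str, str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Standardize one source record and list the fields that are still unusable."""
    cleaned = {
        # Trim whitespace and normalize capitalization of names.
        "customer": record.get("customer", "").strip().title(),
        "order_date": standardize_date(record.get("order_date", "")),
    }
    issues = [name for name, value in cleaned.items() if not value]
    return cleaned, issues


print(clean_record({"customer": "  acme corp ", "order_date": "05/03/2024"}))
print(clean_record({"customer": "", "order_date": "n/a"}))
```

    Records with a non-empty issue list could be routed to a quarantine table instead of the warehouse; that routing policy is an assumption, not something the abstract specifies.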