
    Comparison of distributed technologies for defining a distributable and interoperable ETL framework

    Extraction, Transformation and Loading (ETL) are major functionalities in data warehousing. Lack of component distribution and interoperability is the main problem in the ETL area, because ETL components are tightly coupled in the traditional ETL framework. This paper explores and discusses five popular distributed technologies in order to identify the one best able to close the distribution and interoperability gaps of the traditional ETL framework. Based on the comparison of distributed technologies discussed in this paper, several benefits can be obtained when Service Oriented Architecture (SOA) is used to redefine the ETL framework, such as distribution, interoperability, reusability, portability, and compatibility with legacy systems. These advantages and other SOA specifications are discussed in this paper.

    Framework for Interoperable and Distributed Extraction-Transformation-Loading (ETL) Based on Service Oriented Architecture

    Extraction, Transformation and Loading (ETL) are the major functionalities in data warehouse (DW) solutions. Lack of component distribution and interoperability is a gap that leads to many problems in the ETL domain, caused by the tightly coupled components of the current ETL framework. This research discusses how to distribute the Extraction, Transformation and Loading components so as to achieve distribution and interoperability of these ETL components. In addition, it shows how the ETL framework can be extended. To achieve this, Service Oriented Architecture (SOA) is adopted to supply the missing distribution and interoperability features by restructuring the current ETL framework. This research contributes to the field of ETL by adding the distribution and interoperability concepts to the ETL framework. This in turn contributes to the area of data warehousing and business intelligence, because ETL is a core concept in this area. The Design Science Approach (DSA) and Scrum methodologies were adopted to achieve the research goals; their integration provides suitable methods for achieving the research objectives. The new ETL framework is realized by developing and testing a prototype based on it. This prototype is successfully evaluated through three case studies conducted with the data and tools of three different organizations. These organizations use data warehouse solutions to generate statistical reports that help their top management make decisions. Results of the case studies show that distribution and interoperability can be achieved using the new ETL framework.
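    The decoupling of the Extraction, Transformation and Loading components described above can be illustrated with a minimal Python sketch. It assumes nothing about the paper's actual prototype: the names `Extractor`, `Transformer`, `Loader`, `run_pipeline` and the three toy components are hypothetical, and everything runs in-process here, whereas in an SOA-based framework each component would presumably sit behind its own service endpoint.

```python
from typing import Iterable, Iterator, Protocol

Record = dict  # hypothetical: one row as a column-name -> value mapping


class Extractor(Protocol):
    def extract(self) -> Iterable[Record]: ...


class Transformer(Protocol):
    def transform(self, records: Iterable[Record]) -> Iterable[Record]: ...


class Loader(Protocol):
    def load(self, records: Iterable[Record]) -> int: ...


class ListExtractor:
    """Hypothetical source: yields rows from an in-memory list."""

    def __init__(self, rows: list) -> None:
        self.rows = rows

    def extract(self) -> Iterator[Record]:
        return iter(self.rows)


class UppercaseNames:
    """Hypothetical transformation: normalizes the 'name' column."""

    def transform(self, records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, "name": str(r["name"]).upper()}


class InMemoryLoader:
    """Hypothetical target: a list standing in for a warehouse table."""

    def __init__(self) -> None:
        self.table: list = []

    def load(self, records: Iterable[Record]) -> int:
        before = len(self.table)
        self.table.extend(records)
        return len(self.table) - before  # number of rows loaded


def run_pipeline(extractor: Extractor, transformer: Transformer,
                 loader: Loader) -> int:
    # The orchestrator depends only on the three interfaces, so any
    # component can be replaced (or moved behind a remote service)
    # without touching the others.
    return loader.load(transformer.transform(extractor.extract()))
```

    For example, `run_pipeline(ListExtractor([{"id": 1, "name": "a"}]), UppercaseNames(), InMemoryLoader())` returns 1 and leaves the loader's table holding `{"id": 1, "name": "A"}`. Structural `Protocol` types are used so that components need no shared base class, loosely mirroring the loose coupling that SOA aims for.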

    A Grid Services-Oriented Architecture for Efficient Operation of Distributed Data Warehouses on Globus

    Data warehouses store large volumes of data according to a multidimensional model that provides fast access for online analysis. The constant growth in the quantity and complexity of data stored in data warehouses has led to a variety of data warehouse applications on distributed systems. The main benefits of these architectures are parallelized query execution and higher storage capacity. Computing grids in particular are built to combine a large number of heterogeneous distributed resources. Their lack of centralized control, however, conflicts with the centralized structure of classical data warehouses. Autonomous data management on grid nodes requires efficient communication during query evaluation. The architecture we present supports a global data localization method with the help of a specialized catalog service. Our work is based on a model for unique identification and efficient local indexing of the warehouse data. Local indexes integrate computable aggregates to make maximal use of locally materialized data and thus facilitate cost-optimized query execution. The grid services implementing these functionalities are deployed on the GGM project's test environment.
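    One possible reading of "local indexes integrate computable aggregates" is that each node keeps distributive partial aggregates (sum and count) per data fragment, from which a non-distributive result such as an average can be computed on demand. The Python sketch below combines that idea with a toy catalog service for data localization. `Catalog`, `GridNode`, `PartialAggregate`, `global_average` and the notion of string fragment IDs are hypothetical illustrations, not the paper's Globus grid services or its identification model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PartialAggregate:
    """Sum and count merge exactly across nodes; the average is derived."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        return PartialAggregate(self.total + other.total,
                                self.count + other.count)

    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None


class Catalog:
    """Hypothetical catalog service: fragment ID -> nodes holding it."""

    def __init__(self) -> None:
        self._locations: Dict[str, set] = {}

    def register(self, fragment_id: str, node: str) -> None:
        self._locations.setdefault(fragment_id, set()).add(node)

    def locate(self, fragment_id: str) -> List[str]:
        return sorted(self._locations.get(fragment_id, set()))


class GridNode:
    """Hypothetical grid node with a local index of per-fragment aggregates."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, PartialAggregate] = {}

    def store(self, fragment_id: str, values: List[float],
              catalog: Catalog) -> None:
        agg = self.index.setdefault(fragment_id, PartialAggregate())
        for v in values:
            agg.add(v)
        catalog.register(fragment_id, self.name)


def global_average(fragment_ids: List[str], catalog: Catalog,
                   nodes: Dict[str, GridNode]) -> Optional[float]:
    combined = PartialAggregate()
    for fid in fragment_ids:
        holders = catalog.locate(fid)
        if not holders:
            raise KeyError(f"fragment {fid!r} is not registered")
        # Read a single replica so replicated fragments are not double-counted.
        combined = combined.merge(nodes[holders[0]].index[fid])
    return combined.average()
```

    Storing `[1, 2, 3]` for fragment `f1` on one node and `[10]` for `f2` on another gives a global average of 4.0, computed from two merged (sum, count) pairs instead of shipping raw rows between nodes. Averaging the two local averages directly (2.0 and 10.0) would give the wrong answer of 6.0, which is why the sketch keeps distributive partials.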
