6 research outputs found

    Enhancing XML Data Warehouse Query Performance by Fragmentation

    XML data warehouses form an interesting basis for decision-support applications that exploit heterogeneous data from multiple sources. However, XML-native database systems currently suffer from limited performance in terms of manageable data volume and response time for complex analytical queries. Fragmenting and distributing XML data warehouses (e.g., on data grids) makes it possible to address both of these issues. In this paper, we work on XML warehouse fragmentation. In relational data warehouses, several studies recommend the use of derived horizontal fragmentation. Hence, we propose to adapt it to the XML context. We particularly focus on the initial horizontal fragmentation of dimensions' XML documents and exploit two alternative algorithms. We experimentally validate our proposal and compare these alternatives with respect to a unified XML warehouse model we advocate for.

    Framework for Interoperable and Distributed Extraction-Transformation-Loading (ETL) Based on Service Oriented Architecture

    Extraction, Transformation and Loading (ETL) are the major functionalities in data warehouse (DW) solutions. Lack of component distribution and interoperability is a gap that leads to many problems in the ETL domain, and it is due to tightly coupled components in the current ETL framework. This research discusses how to distribute the Extraction, Transformation and Loading components so as to achieve distribution and interoperability of these ETL components. In addition, it shows how the ETL framework can be extended. To achieve that, Service Oriented Architecture (SOA) is adopted to address the missing features of distribution and interoperability by restructuring the current ETL framework. This research contributes to the field of ETL by adding the distribution and interoperability concepts to the ETL framework. This leads to contributions to the area of data warehousing and business intelligence, because ETL is a core concept in this area. The Design Science Approach (DSA) and Scrum methodologies were adopted for achieving the research goals. The integration of DSA and Scrum provides suitable methods for achieving the research objectives. The new ETL framework is realized by developing and testing a prototype based on it. This prototype is successfully evaluated using three case studies that are conducted using the data and tools of three different organizations. These organizations use data warehouse solutions to generate statistical reports that help their top management make decisions. Results of the case studies show that distribution and interoperability can be achieved by using the new ETL framework.

    Vertical Fragmentation for Database Using FPClose Algorithm

    The vertical fragmentation technique is used to enhance the performance of a database system and reduce the number of accesses to irrelevant instances by splitting a table or relation vertically into different fragments. The partitioning design can be derived using the FPClose algorithm, a data mining algorithm used to extract the frequent closed itemsets in a dataset. A new design approach is implemented to perform fragmentation. A benchmark with different minimum support levels is tested. The results obtained from the FPClose algorithm are compared with those of the Apriori algorithm.

    Internship Report on data merging at the Bank of Portugal Internship Experience at the Bank of Portugal: A Comprehensive Dive into Full Stack Development - Leveraging Modern Technology to Innovate Financial Infrastructure and Enhance User Experience

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. This report details my full-stack development internship experiences at the Bank of Portugal, with a particular emphasis on the creation of a website intended to increase operational effectiveness in the DAS Department. My main contributions met a clear need, which was the absence of a reliable platform that could manage and combine data from many sources. I was actively involved in creating functionality for the applications Integrator and BAII using Django, a high-level Python web framework. Several problems were addressed by the distinctive features I planned and programmed, including daily data extraction from several SQL databases, entity error detection, data merging, and user-friendly interfaces for data manipulation. A feature that enables the attribution of litigation to certain entities was also developed. The developed features have proven useful, giving the Institutional Intervention Area, the Sanctioning Action Area, the Illicit Financial Activity Investigation Area, and the Money Laundering Preventive Supervision Area for Capital and Financing of Terrorism tools to carry out their duties more effectively. This internship experience has contributed to the advancement and use of full-stack development approaches in the banking industry, notably in data management and web application development.

    The XFM view adaptation mechanism: An essential component for XML data warehouses

    In the past few years, with many organisations providing web services for business and communication purposes, large volumes of XML transactions take place on a daily basis. In many cases, organisations maintain these transactions in their native XML format due to its flexibility for exchanging data between heterogeneous systems. This XML data provides an important resource for decision support systems. As a consequence, XML technology has slowly been included within decision support systems of data warehouse systems. The problem encountered is that existing native XML database systems suffer from poor performance in terms of managing data volume and response time for complex analytical queries. Although materialised XML views can be used to improve the performance of XML data warehouses, update problems then become the bottleneck of using materialised views. Specifically, synchronising materialised views in the face of changing view definitions remains a significant issue. In this dissertation, we provide a method for XML-based data warehouses to manage updates caused by changes to view definitions (view redefinitions), which is referred to as the view adaptation problem. In our approach, views are defined using XPath and then modelled using a set of novel algebraic operators and fragments. XPath views are integrated into a single view graph called the XML Fragment Materialisation (XFM) View Graph, where common parts between different views are shared and appear only once in the graph. Fragments within the view graph can be selected for materialisation to facilitate the view adaptation process. While changes are applied, our view adaptation algorithms can quickly determine what part of the XFM view graph is affected. The adaptation algorithms then perform a structural adaptation to update the view graph, followed by data adaptation to update materialised fragments.

    Enhancing XML data warehouse query performance by fragmentation
