97 research outputs found

    A family of experiments to validate measures for UML activity diagrams of ETL processes in data warehouses

    In data warehousing, Extract, Transform, and Load (ETL) processes are in charge of extracting the data from the data sources that will be contained in the data warehouse. Their design and maintenance are thus a cornerstone of any data warehouse development project. Given their relevance, the quality of these processes should be formally assessed early in development in order to avoid populating the data warehouse with incorrect data. To this end, this paper presents a set of measures with which to evaluate the structural complexity of ETL process models at the conceptual level. The study is accompanied by the application of formal frameworks and by a family of experiments, whose aims are, respectively, to validate the proposed measures theoretically and empirically. Our experiments show that these measures can help designers predict the effort associated with the maintenance tasks of ETL processes and make ETL process models more usable. Our work is based on Unified Modeling Language (UML) activity diagrams for modeling ETL processes, and on the Framework for the Modeling and Evaluation of Software Processes (FMESP) for the definition and validation of the measures.
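
    The abstract does not enumerate the measures themselves; as a rough illustration of the kind of structural-complexity counts such a proposal might define over a UML activity diagram of an ETL process, consider the sketch below. All names and metrics here are assumptions for illustration, not the paper's actual measures.

```python
# Illustrative sketch only: counts over a toy activity-diagram model of an
# ETL process. The metric names (NA, ND, NE, CC) are hypothetical examples
# of structural-complexity measures, not the ones defined in the paper.
from dataclasses import dataclass, field

@dataclass
class ActivityDiagram:
    activities: set = field(default_factory=set)  # ETL steps
    decisions: set = field(default_factory=set)   # decision/merge nodes
    edges: set = field(default_factory=set)       # control-flow edges (src, dst)

def structural_measures(d: ActivityDiagram) -> dict:
    """Simple size/complexity counts of the kind used to predict maintenance effort."""
    nodes = len(d.activities) + len(d.decisions)
    return {
        "NA": len(d.activities),          # number of activities
        "ND": len(d.decisions),           # number of decision nodes
        "NE": len(d.edges),               # number of control-flow edges
        "CC": len(d.edges) - nodes + 2,   # McCabe-style cyclomatic complexity: E - N + 2
    }

etl = ActivityDiagram(
    activities={"extract_orders", "clean_nulls", "surrogate_keys", "load_fact"},
    decisions={"valid_row?"},
    edges={("extract_orders", "clean_nulls"), ("clean_nulls", "valid_row?"),
           ("valid_row?", "surrogate_keys"), ("valid_row?", "load_fact"),
           ("surrogate_keys", "load_fact")},
)
print(structural_measures(etl))  # {'NA': 4, 'ND': 1, 'NE': 5, 'CC': 2}
```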

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time consuming. In a previous work, we proposed a vendor-independent language to reduce the design complexity caused by disparate ETL languages tailored to specific design tools with steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation and improves the maintenance support of our ETL language. We present a set of model-to-text transformations able to produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models so as to keep the generated code consistent with evolving data sources. A demonstration using an example is conducted as an initial validation, showing that the framework, which covers modeling, code generation, and maintenance, could be used in practice.
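
    The transformations themselves are not shown in the abstract; the sketch below illustrates the general model-to-text idea with an invented toy ETL model and plain SQL as the generated target. It is not the paper's BPMN-based language or its actual transformation rules.

```python
# Hypothetical model-to-text transformation: walk an ETL model and emit code
# for a target tool (plain SQL here). Model shape and field names are invented.
ETL_MODEL = {
    "name": "load_customers",
    "source": {"table": "crm.customers_raw"},
    "transforms": [
        {"op": "filter", "predicate": "email IS NOT NULL"},
        {"op": "rename", "mapping": {"cust_nm": "customer_name"}},
    ],
    "target": {"table": "dw.dim_customer"},
}

def model_to_text(model: dict) -> str:
    """Emit an INSERT...SELECT statement from the model (illustrative only)."""
    cols, where = "*", []
    for t in model["transforms"]:
        if t["op"] == "filter":
            where.append(t["predicate"])
        elif t["op"] == "rename":
            cols = ", ".join(f"{old} AS {new}" for old, new in t["mapping"].items())
    sql = f"INSERT INTO {model['target']['table']}\nSELECT {cols}\nFROM {model['source']['table']}"
    if where:
        sql += "\nWHERE " + " AND ".join(where)
    return sql + ";"

print(model_to_text(ETL_MODEL))
```

    A model-to-model transformation for maintenance would, in the same spirit, rewrite ETL_MODEL itself (for example, updating the rename mapping when a source column changes) and then regenerate the code.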

    Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web

    If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore the GDP per capita from several data sources. However, the heterogeneity and size of the data remain a problem. This work presents methods to query a uniform view, the Global Cube, of the datasets available on the Web, building on Linked Data query approaches.
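
    As a loose illustration of the uniform-view idea (not the thesis's Linked Data machinery), the sketch below maps two invented datasets with different schemas onto shared dimensions and then queries the merged view.

```python
# Illustrative sketch: building a uniform "global cube" view by mapping two
# heterogeneous datasets onto shared dimensions (country, year) and a single
# measure, then querying the union. All data and column names are invented.
import pandas as pd

# Dataset A, e.g. GDP per capita from one publisher.
a = pd.DataFrame({"country": ["DE", "FR"], "year": [2012, 2012],
                  "gdp_pc": [44000, 41000]})
# Dataset B, the same indicator from another publisher with a different schema.
b = pd.DataFrame({"geo": ["DE", "IT"], "period": [2013, 2013],
                  "value": [45500, 36000]})

# Map both onto the global cube's shared dimension and measure names.
cube = pd.concat([
    a.rename(columns={"gdp_pc": "value"}),
    b.rename(columns={"geo": "country", "period": "year"}),
], ignore_index=True)

# A query over the uniform view: average value per country across sources.
print(cube.groupby("country")["value"].mean())
```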

    Enhancing Data Warehouse management through semi-automatic data integration and complex graph generation

    Strategic information is one of the main assets of many organizations and, in the near future, it will become increasingly important in enabling decision-makers to answer questions about their business, such as how to increase profitability. A proper decision-making process benefits from information that is frequently scattered among several heterogeneous databases. Such databases may come from several organization systems and even from external sources. As a result, organization managers have to deal with the issue of integrating several databases from independent data sources that contain semantic differences and no specific or canonical concept description. Data Warehouse Systems were born to integrate this kind of heterogeneous data so that it can subsequently be extracted and analyzed according to the manager's needs and business plans. Besides being difficult and onerous to design, integrate, and build, Data Warehouse Systems present another issue: the difficulty of representing the multidimensional information typical of OLAP operations, such as aggregations on data cubes, extraction of sub-cubes, or rotations of the data axes, through easy-to-understand views... [edited by author]
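
    The OLAP operations the abstract names can be made concrete with a small example. The sketch below uses an invented sales cube and pandas, purely for illustration; none of the names come from the thesis.

```python
# Hedged illustration of the three OLAP operations mentioned above:
# aggregation on a data cube, sub-cube extraction, and axis rotation.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US", "EU", "US"],
    "product": ["A",  "B",  "A",  "B",  "A",  "B"],
    "year":    [2013, 2013, 2013, 2014, 2014, 2014],
    "amount":  [10,   7,    12,   5,    9,    8],
})

# Aggregation (roll-up): total amount per region and year.
rollup = sales.groupby(["region", "year"])["amount"].sum()

# Sub-cube extraction (dice): restrict to one region and one year.
subcube = sales[(sales.region == "EU") & (sales.year == 2013)]

# Rotation of the data axes (pivot): products become columns.
rotated = sales.pivot_table(index=["region", "year"], columns="product",
                            values="amount", aggfunc="sum")
print(rollup, subcube, rotated, sep="\n\n")
```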

    DACTyL: Towards providing the missing link between clinical and telehealth data

    This document conveys the findings of the Data Analytics, Clinical, Telehealth, Link (DACTyL) project. This nine-month project started in January 2013 and was conducted at Philips Research, in the Care Management Solution group, as part of the Data Analysis for Home Healthcare (DA4HH) project. The DA4HH charter is to perform and support retrospective analyses of data from Home Healthcare products, such as Motiva telehealth. These studies provide valid insights into the actual clinical aspects, usage, and behavior of installed products and services. The insights will help to improve service offerings, create clinical algorithms for better outcomes, and validate and substantiate claims of efficacy and cost-effectiveness. The DACTyL project itself aims at developing and implementing an architecture and infrastructure to meet the most pressing need of Motiva telehealth customers: insight into return on investment (ROI). These customers are hospitals that offer Motiva telehealth to their patients. In order to provide the Motiva service cost-effectively, they need insight into the actual cost, benefit, and resource utilization of Motiva deployment compared with their usual routine care. Additional stakeholders for these ROI-related data are Motiva customer consultants and Philips research scientists, who can use them to strengthen their messaging and service delivery and so arrive at better patient care.

    Interoperability of Enterprise Software and Applications

    Exploring the use of routine healthcare data through process mining to inform the management of musculoskeletal diseases

    Healthcare informatics can help address some of the challenges faced by both healthcare providers and patients. The medical domain is characterised by inherently complex and intricate issues, data can often be of poor quality, and novel techniques are required. Process mining is a discipline that uses techniques to extract insights from event data generated during the execution of processes. It has had good results in various branches of medical science, but applications to musculoskeletal diseases remain largely unexplored. This research commenced with a review of the healthcare and technical literature and applied a variety of process mining techniques in order to investigate approaches to the healthcare plans of patients with musculoskeletal conditions. The analysis involved three datasets: 1) from a private hospital in Boston, US, where data was used to create disease trajectory models; the results suggest the method may be of interest to healthcare researchers, as it enables more rapid modelling and visualisation; 2) from a mobile healthcare application for patients receiving physiotherapy in Sheffield, UK, where data was used to identify possible indicators of health outcomes; after evaluation of the results, it was found that the indicators identified may be down to chance; and 3) from the population of Wales, where data was used to explore knee pain surgery pathways; the results suggest that process mining is an effective technique. This work demonstrates how routine healthcare data can be analysed using process mining techniques to provide insights that may benefit patients suffering from musculoskeletal conditions, and it explores how strict criteria for analysis can be applied. The work is intended to expand the breadth of process mining methods available to the data science community; it has contributed recommendations for service utilisation within physiotherapy at Sheffield Hospital and helped to define a roadmap for a leading healthcare software company.
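
    As a minimal illustration of what process mining extracts from event data (not the thesis's actual methods or data), the sketch below discovers a directly-follows graph, one of the simplest process-discovery artefacts, from an invented event log.

```python
# Illustrative sketch of a core process mining step: discovering a
# directly-follows graph from an event log of (case_id, activity, timestamp)
# tuples. The log and activity names are invented for the example.
from collections import Counter

event_log = [
    ("pat1", "referral", 1), ("pat1", "physio",  2), ("pat1", "discharge", 3),
    ("pat2", "referral", 1), ("pat2", "surgery", 2), ("pat2", "physio",    3),
    ("pat3", "referral", 1), ("pat3", "physio",  2), ("pat3", "surgery",   3),
]

def directly_follows(log):
    """Count how often activity b directly follows activity a within a case."""
    traces = {}
    for case, activity, ts in sorted(log, key=lambda e: (e[0], e[2])):
        traces.setdefault(case, []).append(activity)
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

for (a, b), n in directly_follows(event_log).most_common():
    print(f"{a} -> {b}: {n}")
```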