    Quality measures for ETL processes: from goals to implementation

    Extraction-transformation-loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model of ETL process quality characteristics and quantitative measures for each characteristic, based on the existing literature. Our model captures dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, in which a goal model with quantitative components (i.e., indicators) is used to evaluate and analyse alternative design decisions.
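
    The paper defines its own characteristics and measures; purely as a hedged illustration of the idea of indicator-based evaluation of alternative designs, the following Python sketch scores hypothetical ETL designs against quantitative indicators. All characteristic names, thresholds, and measured values below are invented for the example, not taken from the paper.

```python
# Hypothetical sketch: scoring ETL design alternatives against
# quantitative quality indicators. All names and numbers here are
# illustrative; the paper defines its own model of characteristics.

# Indicator thresholds per quality characteristic (invented values).
indicators = {
    "freshness_minutes": 30,    # data should be at most 30 min old
    "error_rate": 0.01,         # at most 1% of rows rejected
    "throughput_rows_s": 5000,  # minimum sustained load rate
}

# Measured values for two alternative designs (invented values).
designs = {
    "incremental_load": {"freshness_minutes": 10, "error_rate": 0.02,
                         "throughput_rows_s": 8000},
    "full_reload":      {"freshness_minutes": 60, "error_rate": 0.005,
                         "throughput_rows_s": 12000},
}

def failed_indicators(measures: dict) -> list:
    """Return the indicators a design fails to satisfy."""
    failures = []
    if measures["freshness_minutes"] > indicators["freshness_minutes"]:
        failures.append("freshness_minutes")
    if measures["error_rate"] > indicators["error_rate"]:
        failures.append("error_rate")
    if measures["throughput_rows_s"] < indicators["throughput_rows_s"]:
        failures.append("throughput_rows_s")
    return failures

for name, measures in designs.items():
    failed = failed_indicators(measures)
    print(name, "meets all indicators" if not failed else f"fails: {failed}")
```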

    A unified view of data-intensive flows in business intelligence systems : a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources with more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still need to be addressed and showing how current solutions can be applied to address them.
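
    The survey itself stays at the conceptual level; as a loose sketch of the batch/real-time combination it describes, the code below merges a batch-loaded warehouse table with runtime deltas into a single analysis-ready view. The table names, record shapes, and the "newest wins" merge rule are assumptions for the example.

```python
# Illustrative sketch (not from the survey): serving a unified view over
# a batch-populated warehouse table plus real-time deltas that arrived
# after the last batch load. All names and the merge rule are assumed.

batch_table = {  # keyed by customer_id, loaded nightly by the ETL process
    1: {"customer_id": 1, "balance": 100, "loaded_at": "2024-01-01T02:00"},
    2: {"customer_id": 2, "balance": 250, "loaded_at": "2024-01-01T02:00"},
}

realtime_deltas = [  # operational events integrated at runtime
    {"customer_id": 2, "balance": 300, "event_time": "2024-01-01T09:15"},
    {"customer_id": 3, "balance": 40,  "event_time": "2024-01-01T09:20"},
]

def unified_view(batch: dict, deltas: list) -> dict:
    """Newer runtime events override the last batch snapshot."""
    view = {key: dict(row) for key, row in batch.items()}
    for event in deltas:
        view[event["customer_id"]] = dict(event)
    return view

for row in unified_view(batch_table, realtime_deltas).values():
    print(row)
```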

    Frequent patterns in ETL workflows: An empirical approach

    The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. Apart from fostering automation and maintainability, such modeling also provides the building blocks to identify and represent frequently recurring patterns. Despite some existing work on classifying ETL components and functionality archetypes, the issue of systematically mining such patterns and relating them to quality attributes such as performance has not yet been addressed. In this work, we propose a methodology for the identification of ETL structural patterns. We logically model ETL workflows as labeled graphs and employ graph algorithms to identify candidate patterns and to recognize them in different workflows. We showcase our approach through a use case applied to implemented ETL processes from the TPC-DI specification and present the mined ETL patterns. By decomposing ETL processes into the identified patterns, our approach provides a stepping stone for automatically translating ETL logical models to their conceptual representation and for generating fine-grained cost models at the granularity of patterns.
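
    The paper's mining methodology is more involved than this, but the core building block it names, recognizing a labeled pattern inside a workflow graph, can be sketched with networkx's label-aware subgraph isomorphism. The workflows, operation labels, and the lookup-join-project pattern below are invented for illustration.

```python
# Minimal sketch (assumed workflows and labels, not the paper's algorithm):
# ETL workflows as labeled directed graphs; a candidate pattern is
# recognized in a workflow via label-aware subgraph isomorphism.
import networkx as nx
from networkx.algorithms import isomorphism

def workflow(edges):
    """Build a DiGraph whose nodes carry an 'op' label."""
    g = nx.DiGraph()
    for (src, s_op), (dst, d_op) in edges:
        g.add_node(src, op=s_op)
        g.add_node(dst, op=d_op)
        g.add_edge(src, dst)
    return g

# A workflow containing the shape lookup -> join -> project.
w1 = workflow([(("e1", "extract"), ("l1", "lookup")),
               (("l1", "lookup"), ("j1", "join")),
               (("j1", "join"), ("p1", "project"))])
pattern = workflow([(("a", "lookup"), ("b", "join")),
                    (("b", "join"), ("c", "project"))])

matcher = isomorphism.DiGraphMatcher(
    w1, pattern, node_match=isomorphism.categorical_node_match("op", None))
print(matcher.subgraph_is_isomorphic())   # True: the pattern occurs in w1
for mapping in matcher.subgraph_isomorphisms_iter():
    print(mapping)                        # which workflow nodes instantiate it
```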

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time consuming. In previous work, we proposed a vendor-independent language for reducing the design complexity caused by disparate ETL languages tailored to specific design tools with steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation and improved maintenance support for our ETL language. We present a set of model-to-text transformations able to produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models in order to keep the generated code consistent with evolving data sources. A demonstration using an example serves as an initial validation, showing that the framework, which covers modeling, code generation, and maintenance, can be used in practice.
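
    The framework itself relies on model-driven engineering tooling; purely as a hedged illustration of the model-to-text idea, the sketch below walks a tool-independent ETL model and emits code for two hypothetical targets from per-tool templates. The model shape, step types, target names, and templates are all assumptions, not the paper's transformations.

```python
# Loose illustration of a model-to-text transformation (all step types,
# templates, and target names are assumptions, not the paper's framework).

# A tool-independent ETL model: an ordered list of typed steps.
model = [
    {"type": "extract", "source": "crm.customers"},
    {"type": "filter",  "condition": "country = 'ES'"},
    {"type": "load",    "target": "dw.dim_customer"},
]

# One text template per step type, per (hypothetical) target tool.
templates = {
    "sql_script": {
        "extract": "CREATE TEMP TABLE stage AS SELECT * FROM {source};",
        "filter":  "DELETE FROM stage WHERE NOT ({condition});",
        "load":    "INSERT INTO {target} SELECT * FROM stage;",
    },
    "shell_tool": {
        "extract": "etl-cli extract --from {source}",
        "filter":  'etl-cli filter --where "{condition}"',
        "load":    "etl-cli load --into {target}",
    },
}

def generate(model, target):
    """Model-to-text: render each step through the target's template."""
    return "\n".join(templates[target][step["type"]].format(**step)
                     for step in model)

print(generate(model, "sql_script"))
print(generate(model, "shell_tool"))
```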

    Design of Secure and Reliable ETL Systems in Alloy

    Over the last few years, several proposals have been presented for supporting the conceptual and logical modelling of data warehouse populating processes, i.e., ETL processes. However, these processes usually have a high degree of specificity, which entails very complex data requirements and elaborate processing routines that are often difficult to validate. In ETL process modelling, the use of the Alloy specification language introduces an innovative formalism compared to traditional approaches, while maintaining the flexibility needed to handle the specific behaviours of an ETL process. Additionally, Alloy specifications can be analysed and validated, offering greater confidence in their correctness, which is essential for the success of complex software products. In this paper, inspired by advances in this area of research that show the potential of using a formal language in the ETL process modelling domain, we present and discuss how to specify and validate ETL processes, i.e., blocks of operations and their dependencies, using Alloy. This work was supported by COMPETE: POCI-01-0145-FEDER-007043 and by FCT – Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013.
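
    Alloy expresses such constraints declaratively and checks them with its automatic analyzer; the Python sketch below is only an imperative analogue of two properties an Alloy specification of this kind might verify, namely that every dependency refers to a declared operation and that the dependency relation is acyclic. The operation names and dependencies are invented.

```python
# Illustrative analogue only: Alloy specifies such constraints
# declaratively and checks them automatically; here two of the same
# properties are checked imperatively over an assumed ETL model.

# An ETL process as blocks of operations and their dependencies.
operations = {"extract", "clean", "lookup", "load"}
depends_on = {          # op -> set of ops that must run before it
    "clean":  {"extract"},
    "lookup": {"extract"},
    "load":   {"clean", "lookup"},
}

def dependencies_are_valid(ops, deps):
    """Check that every dependency names a known operation and that
    the dependency relation is acyclic."""
    if any(d not in ops for targets in deps.values() for d in targets):
        return False
    seen, done = set(), set()
    def acyclic(op):
        if op in done:
            return True
        if op in seen:          # back-edge: dependency cycle
            return False
        seen.add(op)
        ok = all(acyclic(d) for d in deps.get(op, ()))
        done.add(op)
        return ok
    return all(acyclic(op) for op in ops)

print(dependencies_are_valid(operations, depends_on))  # True for this model
```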

    Towards a Modeling Method for Managing Node.js Projects and Dependencies

    This paper proposes a domain-specific and technology-specific modeling method for managing Node.js projects. It addresses the challenge of managing dependencies in the NPM and REST ecosystems, while also providing a specialized workflow model type as a process-centric view of a software project. With the continuous growth of the Node.js environment, managing complex projects that use this technology can become chaotic, especially when it comes to planning dependencies and module integration. The deprecation of a module can lead to a serious crisis in the projects where that module was used; consequently, traceability of deprecation propagation becomes a key requirement in Node.js project management. The modeling method introduced in this paper provides a diagrammatic solution to managing module and API dependencies in a Node.js project. It is deployed as a modeling tool that can also generate REST API documentation and Node.js project configuration files that can be executed to install the graphically designed dependencies.
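
    The paper's solution is diagrammatic; as a minimal sketch of the deprecation-traceability idea it motivates, the code below walks a (wholly invented) dependency graph in reverse to find every project transitively affected when one module is deprecated.

```python
# Minimal sketch of deprecation propagation (invented dependency graph,
# not the paper's tool): find everything transitively affected when a
# module is deprecated.

# project/module -> modules it depends on
dependencies = {
    "app-frontend": {"ui-kit", "http-client"},
    "app-backend":  {"http-client", "orm"},
    "ui-kit":       {"http-client"},
    "orm":          set(),
    "http-client":  set(),
}

def affected_by(deprecated: str, deps: dict) -> set:
    """Walk the reversed dependency graph from the deprecated module."""
    dependents = {}
    for project, used in deps.items():
        for module in used:
            dependents.setdefault(module, set()).add(project)
    affected, stack = set(), [deprecated]
    while stack:
        for parent in dependents.get(stack.pop(), ()):
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)
    return affected

print(affected_by("http-client", dependencies))
# -> ui-kit, app-frontend, and app-backend are all affected
```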

    Sustainability Reporting Process Model using Business Intelligence

    Sustainability, including its reporting requirements, is one of the most relevant topics for companies. In recent years, many software providers have launched new software tools targeting companies committed to implementing sustainability reporting. Beyond companies' willingness to use their existing Business Intelligence (BI) solutions, basic principles such as the single source of truth, as well as the tendency to combine sustainability reporting with financial reporting (Integrated Reporting), also argue for integration. The IT integration of sustainability reporting has received limited attention from scientific research and can be facilitated using BI systems. This is necessary both to anticipate the economic demand for integrated reporting from an IT perspective and to ensure the reporting of auditable data. By adapting BI systems, necessary environmental and social changes can be addressed, rather than merely displaying sustainability data from additional, detached systems or generic spreadsheet applications. This thesis presents research in the two domains of sustainability reporting and Business Intelligence and provides a method to support companies willing to implement sustainability reporting with BI. SureBI, presented within this thesis, is developed to address experts from both sustainability and BI. First, BI is examined from an IT and project perspective, and a novel BI reporting process is developed. Then, sustainability reporting is examined with a focus on reporting content, and a sustainability reporting process is derived. Based on these two reporting processes, SureBI is developed: a step-by-step process method aiming to guide companies through the implementation of sustainability reporting within their BI environment. Finally, an evaluation and an implementation assess the suitability and correctness of the process model and implement crucial IT tasks of the process as examples. The novel combination of these two topics reveals challenges from both fields. For BI, users face problems with historically grown systems and missing implementation strategies. For sustainability, the largely voluntary nature of this reporting leads to uncertainty about which indicators have to be reported. The resulting SureBI addresses these challenges, provides methods for identifying and prioritizing new stakeholders and for prioritizing the reporting content, and describes ways to integrate the large number of estimated figures using BI. The results show that sustainability reporting can and should be implemented using existing BI solutions.
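
    The thesis contributes a process method rather than code; purely as a hedged illustration of one IT task it mentions, integrating the large number of estimated figures, the sketch below tags each sustainability record with its provenance so a BI report can show the estimated share behind each indicator. All indicators, sites, and values are invented.

```python
# Invented illustration of one IT task the thesis mentions: keeping
# estimated and measured sustainability figures apart in a BI dataset
# so reports can flag the share of estimation behind each indicator.

records = [
    {"indicator": "energy_kwh", "site": "plant_a", "value": 120_000, "estimated": False},
    {"indicator": "energy_kwh", "site": "plant_b", "value": 95_000,  "estimated": True},
    {"indicator": "water_m3",   "site": "plant_a", "value": 3_400,   "estimated": False},
]

def report(rows):
    """Aggregate per indicator and report the estimated share."""
    totals = {}
    for row in rows:
        t = totals.setdefault(row["indicator"], {"value": 0, "estimated": 0})
        t["value"] += row["value"]
        t["estimated"] += row["value"] if row["estimated"] else 0
    for name, t in totals.items():
        share = t["estimated"] / t["value"]
        print(f"{name}: {t['value']} ({share:.0%} estimated)")

report(records)
```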