
    Quality measures for ETL processes: from goals to implementation

    Extraction, transformation and loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model of ETL process quality characteristics, together with quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, in which we employ a goal model that includes quantitative components (i.e., indicators) to evaluate and analyze alternative design decisions. Peer Reviewed. Postprint (author's final draft).
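The abstract pairs quality characteristics with quantitative measures and evaluates designs through goal-model indicators. As a minimal sketch, assuming the common goal-modeling convention of scoring an indicator against target, threshold and worst values by linear interpolation (not necessarily the paper's exact scheme), a measure such as data completeness can be turned into a satisfaction score. The `completeness` function, the `Indicator` class and all values are hypothetical.

```python
from dataclasses import dataclass


def completeness(rows: list[dict], column: str) -> float:
    """Quality measure: fraction of rows whose `column` value is not None."""
    if not rows:
        raise ValueError("completeness is undefined for an empty data set")
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)


@dataclass
class Indicator:
    """Hypothetical goal-model indicator mapping a raw measure to a score in [-1, 1]."""
    name: str
    target: float     # value at which the goal is fully satisfied (score 1)
    threshold: float  # boundary between satisfied and denied (score 0)
    worst: float      # value at which the goal is fully denied (score -1)

    def satisfaction(self, value: float) -> float:
        """Piecewise-linear interpolation between worst, threshold and target.

        Handles "higher is better" (target > worst) and "lower is better"
        (target < worst) measures; assumes target, threshold and worst are distinct.
        """
        # Flip signs for "lower is better" so the comparisons below always
        # treat larger numbers as better.
        sign = 1.0 if self.target >= self.worst else -1.0
        v, t, th, w = (sign * x for x in (value, self.target, self.threshold, self.worst))
        if v >= t:
            return 1.0
        if v <= w:
            return -1.0
        if v >= th:
            return (v - th) / (t - th)   # between threshold and target: [0, 1)
        return -(th - v) / (th - w)      # between worst and threshold: (-1, 0)
```

For instance, with target 0.99, threshold 0.95 and worst 0.80, a measured completeness of 0.97 maps to 0.5, halfway between threshold and target; comparing such scores across alternative designs is one way to feed indicators into a goal model.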

    Design of Secure and Reliable ETL Systems in Alloy (Concepção de Sistemas ETL Seguros e Confiáveis em Alloy)

    Over the last few years, several proposals have been presented to support the conceptual and logical modelling of data warehouse population processes, known as ETL processes. However, these processes usually have a high degree of specificity, which entails very complex data requirements and elaborate processing routines whose correctness is often difficult to validate. In ETL process modelling, the Alloy specification language introduces an innovative formalism compared with traditional approaches, while maintaining the flexibility needed to handle the specific behaviours of an ETL process. Additionally, Alloy specifications can be analysed and validated, offering greater confidence in their correctness, which is essential for the success of complex software products. In this paper, inspired by advances in this area of research that show the potential of using a formal language in the ETL process modelling domain, we present and discuss how to specify and validate ETL processes (blocks of operations and their dependencies) using Alloy. This work was supported by COMPETE: POCI-01-0145-FEDER-007043 and by FCT – Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013.
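The paper specifies ETL processes in Alloy as blocks of operations and their dependencies. Alloy itself is not shown here; as a loose illustration only, this Python sketch checks two structural properties one might assert about such a model: every dependency refers to a declared block, and the dependencies are acyclic, so an execution order exists. Unlike the Alloy Analyzer, which explores all instances within a bounded scope, this is a plain check on one concrete instance. The `execution_order` function and the block names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an execution order for ETL blocks, or raise ValueError if the model is invalid.

    `depends_on` maps each block name to the names of the blocks it depends on.
    """
    # Property 1: every dependency refers to a declared block.
    declared = set(depends_on)
    referenced = set().union(*depends_on.values())
    undeclared = referenced - declared
    if undeclared:
        raise ValueError(f"undeclared blocks: {sorted(undeclared)}")
    # Property 2: dependencies are acyclic, so some valid execution order exists.
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"cyclic dependency: {exc.args[1]}") from exc
```

A chain such as extract → clean → load yields exactly one order, while a model in which two blocks depend on each other is rejected.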

    Frequent patterns in ETL workflows: An empirical approach

    The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. Apart from fostering automation and maintainability, such modeling also provides the building blocks to identify and represent frequently recurring patterns. Despite some existing work on classifying ETL components and functionality archetypes, the systematic mining of such patterns and their connection to quality attributes such as performance have not yet been addressed. In this work, we propose a methodology for identifying ETL structural patterns. We logically model ETL workflows as labeled graphs and employ graph algorithms to identify candidate patterns and to recognize them in different workflows. We showcase our approach through a use case applied to implemented ETL processes from the TPC-DI specification, and we present the mined ETL patterns. By decomposing ETL processes into the identified patterns, our approach provides a stepping stone towards automatically translating ETL logical models into their conceptual representation and generating fine-grained cost models at the granularity level of patterns. Peer Reviewed. Postprint (author's final draft).
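The paper models ETL workflows as labeled graphs and mines candidate patterns with graph algorithms. The sketch below is a deliberately simpler stand-in, not the authors' method: it counts operation-label sequences along directed paths of a fixed length and keeps those occurring in at least a minimum number of workflows (their support). Full frequent-subgraph mining would also capture branching structures such as joins; the workflow encoding and the operation labels are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: node id -> (operation label, successor node ids).
Workflow = dict[str, tuple[str, list[str]]]


def label_paths(wf: Workflow, length: int) -> set[tuple[str, ...]]:
    """Label sequences along directed paths of `length` nodes (assumes an acyclic workflow)."""
    def extend(node: str, remaining: int) -> list[tuple[str, ...]]:
        label, succs = wf[node]
        if remaining == 1:
            return [(label,)]
        return [(label, *rest) for s in succs for rest in extend(s, remaining - 1)]

    return {p for node in wf for p in extend(node, length)}


def frequent_patterns(workflows: list[Workflow], length: int,
                      min_support: int) -> dict[tuple[str, ...], int]:
    """Label paths found in at least `min_support` workflows, mapped to their support."""
    # Each workflow contributes a set, so a pattern counts at most once per workflow.
    support = Counter(p for wf in workflows for p in label_paths(wf, length))
    return {p: n for p, n in support.items() if n >= min_support}
```

Counting support per workflow rather than per occurrence keeps one large workflow with many repeats from dominating the result.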

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time-consuming. In previous work, we proposed a vendor-independent language to reduce the design complexity caused by disparate ETL languages that are tailored to specific design tools and have steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation and improved maintenance support for our ETL language. We present a set of model-to-text transformations that produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models so that the generated code can be maintained as data sources evolve. A demonstration on an example serves as an initial validation, showing that the framework, covering modeling, code generation, and maintenance, could be used in practice.
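The framework combines model-to-text transformations (generating executable code) with model-to-model transformations (updating models as data sources evolve). The following Python sketch illustrates both ideas on a toy model; it is not the paper's ETL language or its transformations, and the `EtlFlow` model, the plain-SQL output (the paper targets commercial ETL tools) and all table and column names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EtlFlow:
    """Hypothetical platform-independent ETL model: one source, one target, column mapping."""
    source_table: str
    target_table: str
    mapping: tuple[tuple[str, str], ...]              # (source column, target column)
    condition: Optional[tuple[str, str, str]] = None  # (source column, operator, literal)


def to_sql(flow: EtlFlow) -> str:
    """Model-to-text transformation: render the flow as one INSERT ... SELECT statement.

    Identifiers come from a trusted design model; this is not safe for untrusted input.
    """
    targets = ", ".join(t for _, t in flow.mapping)
    sources = ", ".join(s for s, _ in flow.mapping)
    lines = [
        f"INSERT INTO {flow.target_table} ({targets})",
        f"SELECT {sources}",
        f"FROM {flow.source_table}",
    ]
    if flow.condition:
        column, op, literal = flow.condition
        lines.append(f"WHERE {column} {op} {literal}")
    return "\n".join(lines) + ";"


def rename_source_column(flow: EtlFlow, old: str, new: str) -> EtlFlow:
    """Model-to-model transformation: propagate a source column rename through the model."""
    mapping = tuple((new if s == old else s, t) for s, t in flow.mapping)
    condition = flow.condition
    if condition and condition[0] == old:
        condition = (new, condition[1], condition[2])
    return replace(flow, mapping=mapping, condition=condition)
```

Keeping the model immutable and deriving a new model after each source change mirrors the idea of regenerating code from an updated model rather than patching generated code by hand.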

    Sustainability Reporting Process Model using Business Intelligence

    Sustainability, including its reporting requirements, is one of the most relevant topics for companies. In recent years, many software providers have launched new tools targeting companies committed to implementing sustainability reporting. However, it is not only that companies want to use their existing Business Intelligence (BI) solutions; there are also basic principles such as the single source of truth, and a tendency to combine sustainability reporting with financial reporting (Integrated Reporting). The IT integration of sustainability reporting has received limited attention in scientific research and can be facilitated using BI systems. This integration is needed both to anticipate the economic demand for integrated reporting from an IT perspective and to ensure that reported data is auditable. By adapting BI systems, necessary environmental and social changes can be addressed, rather than merely displaying sustainability data from additional, detached systems or generic spreadsheet applications. This thesis presents research in the two domains of sustainability reporting and Business Intelligence and provides a method to support companies that want to implement sustainability reporting with BI. SureBI, presented in this thesis, is designed to address experts in both sustainability and BI. First, BI is researched from an IT and project perspective, and a novel BI reporting process is developed. Then, sustainability reporting is researched with a focus on reporting content, and a sustainability reporting process is derived. Based on these two reporting processes, SureBI is developed as a step-by-step process method that aims to guide companies through implementing sustainability reporting in their BI environment. Finally, an evaluation and implementation assess the suitability and correctness of the process model and implement crucial IT tasks of the process as examples.
The novel combination of these two topics reveals challenges from both fields. In BI, users face problems with historically grown systems and a lack of implementation strategies. In sustainability, the largely voluntary nature of reporting leads to uncertainty about which indicators have to be reported. SureBI addresses and highlights these challenges: it provides methods for addressing and prioritizing new stakeholders and for prioritizing reporting content, and it describes ways to integrate the large number of estimated figures using BI. The results show that sustainability reporting could and should be implemented using existing BI solutions.

    A BPMN Framework for Modelling ETL Processes (Framework BPMN para a Modelação de Processos de ETL)

    Extract-Transform-Load (ETL) is a critical component of Data Warehousing Systems (DWS), responsible for extracting, transforming, and loading data to support decision-making requirements. Due to the complexity of data management, these processes consume a large share of the resources needed to implement a DWS. ETL is a critical component that can compromise the suitability of the whole system: if it does not provide data quality guarantees, trust in the system is compromised. Despite its importance, ETL development is essentially ad hoc, which does not help ensure that sound practices are followed to guarantee coherent and cohesive system development.
In recent years, Business Process Model and Notation (BPMN) has been proposed and used to support conceptual ETL models. BPMN is an expressive language that allows different approaches to representing the data population requirements of ETL processes. In this work, the use of BPMN for conceptual ETL modeling is explored by analyzing existing approaches and proposing a set of guidelines for using BPMN more consistently.

    BPMN4sML: A BPMN Extension for Serverless Machine Learning. Technology Independent and Interoperable Modeling of Machine Learning Workflows and their Serverless Deployment Orchestration

    Machine learning (ML) continues to permeate all layers of academia, industry and society. Despite its successes, mental frameworks to capture and represent machine learning workflows in a consistent and coherent manner are lacking. For instance, the de facto process modeling standard, Business Process Model and Notation (BPMN), managed by the Object Management Group (OMG), is widely accepted and applied. However, it lacks specific support for representing machine learning workflows. Further, the number of heterogeneous tools for deploying machine learning solutions can easily overwhelm practitioners. Research is needed to align the process from modeling to deploying ML workflows. We analyze requirements for standards-based conceptual modeling of machine learning workflows and their serverless deployment. Confronting the shortcomings with respect to consistent and coherent modeling of ML workflows in a technology-independent and interoperable manner, we extend BPMN's Meta-Object Facility (MOF) metamodel and the corresponding notation and introduce BPMN4sML (BPMN for serverless machine learning). Our extension BPMN4sML follows the same outline referenced by the OMG for BPMN. We further address the heterogeneity in deployment by proposing a conceptual mapping to convert BPMN4sML models to corresponding deployment models using TOSCA. BPMN4sML allows technology-independent and interoperable modeling of machine learning workflows of various granularity and complexity across the entire machine learning lifecycle. It helps establish a shared and standardized language to communicate ML solutions. Moreover, it takes the first steps toward converting ML workflow model diagrams into corresponding deployment models for serverless deployment via TOSCA. Comment: 105 pages, 3 tables, 33 figures.

    Semantic Bridging between Conceptual Modeling Standards and Agile Software Projects Conceptualizations

    Software engineering has benefited from modeling standards (e.g., UML, BPMN), but agile software project management tends to marginalize most forms of documentation, including diagrammatic modeling, focusing instead on tracking a project's backlog and related issues. Only limited means are available for annotating Jira items with diagrams, and not at a granular, semantically traceable level. Business processes tend to get lost between process analysis (if any) and backlog items; UML design decisions are often disconnected from the issue-tracking environment. This paper proposes domain-specific conceptual modeling to obtain a diagrammatic view of a Jira project, motivated by past conceptualizations of the agile paradigm, while also offering basic interoperability with Jira to switch between environments and views. The underlying conceptualization extends conceptual modeling languages (BPMN, UML) with an agile project management perspective to enrich the contextual traceability of a project's elements, while ensuring that data structures handled by Jira can be captured and exposed to Jira if needed. To this end, concepts underlying typical software development project management are integrated with established modeling concepts and tailored (by metamodeling means) to the domain specificity of agile project management. A Design Science approach was pursued to develop a modeling method artifact, resulting in a domain-specific modeling tool for software project managers who want to augment agile practices and enrich issue annotation.