
    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time-consuming. In a previous work, we proposed a vendor-independent language to reduce the design complexity caused by disparate ETL languages tailored to specific design tools with steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation and improves the maintenance support of our ETL language. We present a set of model-to-text transformations able to produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models so that the generated code stays consistent with evolving data sources. A demonstration using an example is conducted as an initial validation to show that the framework, covering modeling, code generation, and maintenance, could be used in practice.
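
    As a rough illustration of the code-generation side, the sketch below shows what a model-to-text transformation over a vendor-independent ETL model might look like. The step schema, the to_sql emitter, and the generic SQL target are invented for illustration; they are not the language or the transformations proposed in the paper.

        # Minimal sketch: a vendor-independent ETL model (a list of step
        # dictionaries) is turned into executable code for one target
        # (here, generic SQL). Model schema and emitter are hypothetical.

        ETL_MODEL = [
            {"op": "extract", "source": "crm.customers", "target": "stg_customers"},
            {"op": "filter", "source": "stg_customers",
             "predicate": "country = 'UK'", "target": "stg_customers_uk"},
            {"op": "load", "source": "stg_customers_uk", "target": "dw.dim_customer"},
        ]

        def to_sql(step: dict) -> str:
            """Emit SQL text for one model step (a model-to-text rule)."""
            if step["op"] == "extract":
                return f"CREATE TABLE {step['target']} AS SELECT * FROM {step['source']};"
            if step["op"] == "filter":
                return (f"CREATE TABLE {step['target']} AS SELECT * "
                        f"FROM {step['source']} WHERE {step['predicate']};")
            if step["op"] == "load":
                return f"INSERT INTO {step['target']} SELECT * FROM {step['source']};"
            raise ValueError(f"unknown operation: {step['op']}")

        print("\n".join(to_sql(step) for step in ETL_MODEL))

    Swapping to_sql for a different emitter targets a different tool, which is the essence of producing code for several commercial ETL platforms from a single model; a model-to-model transformation would instead rewrite the step dictionaries themselves, e.g. when a source table is renamed.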

    Quality measures for ETL processes: from goals to implementation

    Extraction-transformation-loading (ETL) processes play an increasingly important role in supporting modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and a more human-centric approach is needed to bring these processes closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for the evaluation and analysis of alternative design decisions.
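
    To make the indicator-based evaluation concrete, the following minimal sketch compares two alternative ETL designs by a weighted sum over quality indicators. The characteristic names, weights, and scores are invented and do not come from the paper's model.

        # Hypothetical comparison of two alternative ETL designs using
        # quantitative indicators; a weighted sum mimics the kind of
        # evaluation a goal model with indicators enables.

        WEIGHTS = {"freshness": 0.5, "reliability": 0.3, "maintainability": 0.2}

        DESIGNS = {
            "batch_nightly":  {"freshness": 0.4, "reliability": 0.9, "maintainability": 0.8},
            "micro_batching": {"freshness": 0.8, "reliability": 0.7, "maintainability": 0.6},
        }

        def score(indicators: dict) -> float:
            """Aggregate indicator values into one comparable number."""
            return sum(WEIGHTS[name] * value for name, value in indicators.items())

        for name, indicators in DESIGNS.items():
            print(f"{name}: {score(indicators):.2f}")  # pick the higher score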

    A unified view of data-intensive flows in business intelligence systems: a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next-generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time, operational data flows that integrate source data at runtime. Both academia and industry thus need a clear understanding of the foundations of data-intensive flows and of the challenges of moving towards next-generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next-generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, identifying the challenges that remain open and showing how current solutions can be applied to address them.

    Towards a Modeling Method for Managing Node.js Projects and Dependencies

    This paper proposes a domain-specific and technology-specific modeling method for managing Node.js projects. It addresses the challenge of managing dependencies in the NPM and REST ecosystems, while also providing a specialized workflow model type as a process-centric view on a software project. With the continuous growth of the Node.js environment, managing complex projects that use this technology can be chaotic, especially when it comes to planning dependencies and module integration. The deprecation of a module can lead to a serious crisis in the projects where that module was used; consequently, traceability of deprecation propagation becomes a key requirement in Node.js project management. The modeling method introduced in this paper provides a diagrammatic solution for managing module and API dependencies in a Node.js project. It is deployed as a modeling tool that can also generate REST API documentation and Node.js project configuration files that can be executed to install the graphically designed dependencies.
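
    The traceability problem described above is essentially reachability in the dependency graph: every module or project that transitively depends on a deprecated module is affected. A minimal sketch follows; the graph and module names are invented, and a real tool would derive them from package.json files rather than a hard-coded dictionary.

        # Hypothetical dependency graph: edges point from a project/module
        # to the modules it depends on. Deprecation propagates against the
        # edges, reaching every transitive dependent.
        from collections import deque

        DEPENDS_ON = {
            "my-app": ["web-framework", "legacy-utils"],
            "web-framework": ["http-lib"],
            "legacy-utils": ["http-lib"],
            "http-lib": [],
        }

        def affected_by(deprecated: str) -> set:
            """Return all modules transitively depending on `deprecated`."""
            dependents = {m: [] for m in DEPENDS_ON}   # invert the graph
            for mod, deps in DEPENDS_ON.items():
                for dep in deps:
                    dependents[dep].append(mod)
            seen, queue = set(), deque([deprecated])
            while queue:
                for parent in dependents[queue.popleft()]:
                    if parent not in seen:
                        seen.add(parent)
                        queue.append(parent)
            return seen

        print(affected_by("http-lib"))  # {'web-framework', 'legacy-utils', 'my-app'}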

    Frequent patterns in ETL workflows: An empirical approach

    The complexity of Business Intelligence activities has driven the proposal of several approaches for the effective modeling of Extract-Transform-Load (ETL) processes, based on the conceptual abstraction of their operations. Apart from fostering automation and maintainability, such modeling also provides the building blocks to identify and represent frequently recurring patterns. Despite some existing work on classifying ETL components and functionality archetypes, the issue of systematically mining such patterns and their connection to quality attributes such as performance has not yet been addressed. In this work, we propose a methodology for the identification of ETL structural patterns. We logically model the ETL workflows using labeled graphs and employ graph algorithms to identify candidate patterns and to recognize them in different workflows. We showcase our approach through a use case applied to implemented ETL processes from the TPC-DI specification, and we present the mined ETL patterns. By decomposing ETL processes into identified patterns, our approach provides a stepping stone for the automatic translation of ETL logical models to their conceptual representation and for the generation of fine-grained cost models at the granularity of patterns.
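
    As a toy stand-in for the mining step, the sketch below treats each workflow as a labeled directed graph and counts how often each chain of three operation labels occurs across workflows; chains recurring in several workflows are candidate patterns. The workflows and labels are invented, and the paper's actual algorithms over the TPC-DI processes are certainly richer than simple label chains.

        # Count length-3 operation-label chains across ETL workflows
        # modeled as labeled directed graphs.
        from collections import Counter

        WORKFLOWS = {
            "wf1": {"edges": [("a", "b"), ("b", "c"), ("c", "d")],
                    "label": {"a": "extract", "b": "filter", "c": "join", "d": "load"}},
            "wf2": {"edges": [("x", "y"), ("y", "z")],
                    "label": {"x": "filter", "y": "join", "z": "load"}},
        }

        def chains(wf, length=3):
            adj = {}
            for u, v in wf["edges"]:
                adj.setdefault(u, []).append(v)
            def walk(node, labels):
                labels = labels + [wf["label"][node]]
                if len(labels) == length:
                    yield tuple(labels)
                    return
                for succ in adj.get(node, []):
                    yield from walk(succ, labels)
            for start in wf["label"]:
                yield from walk(start, [])

        counts = Counter(c for wf in WORKFLOWS.values() for c in chains(wf))
        print(counts.most_common(1))
        # [(('filter', 'join', 'load'), 2)] -- occurs in both workflows,
        # hence a candidate pattern.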

    LOD object content specification for manufacturers within the UK using the IDM standard

    UK manufacturers are gradually embracing the adoption of Level 2 Building Information Modelling (BIM) standards (3D models and embedded data) within their product model elements. However, these are not always well defined, owing to inaccuracies in the scope and content of the model attributes. Product Data Templates (PDTs) are currently being created as a solution to provide structured model element data to manufacturers’ clients. However, defining PDT data has been particularly challenging for manufacturers, as there is a scarcity of content knowledge covering BIM uses (e.g. electrical design) and processes (e.g. cable tray sizing) that support clients’ lifecycle processes. Similarly, few studies have investigated the Level of Development (LOD) that manufacturers should use to create their model element product data. In this paper, we therefore propose a generic industry approach to create and maintain model element product data at different LODs using the Information Delivery Manual (IDM), and we evaluate it for future improvement. The IDM can capture processes at the informational (e.g. attributes), behavioural (e.g. project stage), organisational (e.g. actor), and functional (e.g. business rules) levels. A case study on Made to Stock Products for the Design use was created to draw recommendations from the behavioural and informational IDM perspectives. In order to implement the LOD on an industry basis and for ease of use, we recommend matching the IDM Exchange models to a LOD graphical standard and keeping the BPMN free of stage bindings. This issue should be further studied for standardisation purposes. The benefit of this approach is that manufacturers could use the IDM to create product model element data in relation to their clients’ processes at different LODs, for inclusion within BIM Information Systems (IS).
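
    A Product Data Template is, at its core, a structured attribute set whose required content grows with the LOD. The minimal sketch below illustrates that idea; the attribute names and LOD thresholds are invented, not taken from any published PDT or from the IDM standard.

        # Hypothetical PDT for a cable tray: each attribute declares the
        # LOD from which it must be provided, so the required content of a
        # model element can be derived per LOD.
        PDT_CABLE_TRAY = {
            "manufacturer": {"required_from_lod": 200},
            "width_mm": {"required_from_lod": 300},
            "load_capacity_kg_per_m": {"required_from_lod": 350},
            "serial_number": {"required_from_lod": 500},
        }

        def required_attributes(pdt: dict, lod: int) -> list:
            """Attributes that must be filled in at the given LOD."""
            return [name for name, spec in pdt.items()
                    if spec["required_from_lod"] <= lod]

        print(required_attributes(PDT_CABLE_TRAY, 300))
        # ['manufacturer', 'width_mm']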

    Sustainability Reporting Process Model using Business Intelligence

    Sustainability, including its reporting requirements, is one of the most relevant topics for companies. In recent years, many software providers have launched new software tools targeting companies committed to implementing sustainability reporting. But companies are not only willing to use their Business Intelligence (BI) solutions for this purpose; there are also basic principles, such as the single source of truth, and tendencies to combine sustainability reporting with financial reporting (Integrated Reporting). The IT integration of sustainability reporting has received limited attention in scientific research and can be facilitated using BI systems. This is necessary both to anticipate the economic demand for integrated reporting from an IT perspective and to ensure the reporting of revisable data. Through the adaptation of BI systems, necessary environmental and social changes can be addressed, rather than merely displaying sustainability data from additional, detached systems or generic spreadsheet applications. This thesis presents research in the two domains of sustainability reporting and Business Intelligence and provides a method to support companies willing to implement sustainability reporting with BI. SureBI, presented within this thesis, is developed to address experts from both sustainability and BI. First, BI is researched from an IT and project perspective and a novel BI reporting process is developed. Then, sustainability reporting is researched with a focus on the reporting content, and a sustainability reporting process is derived. Based on these two reporting processes, SureBI is developed: a step-by-step process method aiming to guide companies through the implementation of sustainability reporting using their BI environment. Concluding, an evaluation and implementation assess the suitability and correctness of the process model and exemplarily implement crucial IT tasks of the process. The novel combination of these two topics reveals challenges from both fields. In the case of BI, users face problems regarding historically grown systems and lacking implementation strategies. In the case of sustainability, the mostly voluntary nature of this reporting leads to uncertainty as to which indicators have to be reported. The resulting SureBI addresses and highlights these challenges, provides methods for identifying and prioritizing new stakeholders and for prioritizing the reporting content, and describes possibilities for integrating the large number of estimated figures using BI. Results show that sustainability reporting could and should be implemented using existing BI solutions.