1,098 research outputs found

    A Domain Specific Model for Generating ETL Workflows from Business Intents

    Extract-Transform-Load (ETL) tools have provided organizations with the ability to build and maintain workflows (consisting of graphs of data transformation tasks) that can process the flood of digital data. Currently, however, the specification of ETL workflows is largely manual, human time intensive, and error prone. As these workflows become increasingly complex, the users that build and maintain them must retain an increasing amount of knowledge specific to how to produce solutions to business objectives using their domain's ETL workflow system. A program that can reduce the human time and expertise required to define such workflows, producing accurate ETL solutions with fewer errors, would therefore be valuable. This dissertation presents a means to automate the specification of ETL workflows using a domain-specific modeling language. To provide such a solution, the knowledge relevant to the construction of ETL workflows for the operations and objectives of a given domain is identified and captured. The approach provides a rich model of ETL workflow capable of representing such knowledge. This knowledge representation is leveraged by a domain-specific modeling language which maps declarative statements into workflow requirements. Users are then provided with the ability to assertionally express the intents that describe a desired ETL solution at a high level of abstraction, from which procedural workflows satisfying the intent specification are automatically generated using a planner.

    Model-Driven Development of Complex and Data-Intensive Integration Processes

    Due to the changing scope of data management from centrally stored data towards the management of distributed and heterogeneous systems, integration takes place on different levels. The lack of standards for information integration as well as application integration has resulted in a large number of different integration models and proprietary solutions. With the aim of a high degree of portability and the reduction of development efforts, model-driven development, following the Model-Driven Architecture (MDA), is advantageous in this context as well. Hence, in the GCIP project (Generation of Complex Integration Processes), we focus on the model-driven generation and optimization of integration tasks using a process-based approach. In this paper, we contribute detailed generation aspects and finally discuss open issues and further challenges.

    Quality measures for ETL processes: from goals to implementation

    Extraction transformation loading (ETL) processes play an increasingly important role for the support of modern business operations. These business processes are centred around artifacts with high variability and diverse lifecycles, which correspond to key business entities. The apparent complexity of these activities has been examined through the prism of business process management, mainly focusing on functional requirements and performance optimization. However, the quality dimension has not yet been thoroughly investigated, and there is a need for a more human-centric approach to bring them closer to business users' requirements. In this paper, we take a first step in this direction by defining a sound model for ETL process quality characteristics and quantitative measures for each characteristic, based on existing literature. Our model shows dependencies among quality characteristics and can provide the basis for subsequent analysis using goal modeling techniques. We showcase the use of goal modeling for ETL process design through a use case, where we employ a goal model that includes quantitative components (i.e., indicators) for evaluation and analysis of alternative design decisions. Peer reviewed. Postprint (author's final draft).

    International conference on software engineering and knowledge engineering: Session chair

    The Thirtieth International Conference on Software Engineering and Knowledge Engineering (SEKE 2018) will be held at the Hotel Pullman, San Francisco Bay, USA, from July 1 to July 3, 2018. SEKE 2018 will also be dedicated to the memory of Professor Lotfi Zadeh, a great scholar, pioneer and leader in fuzzy sets theory and soft computing. The conference aims at bringing together experts in software engineering and knowledge engineering to discuss relevant results in either software engineering or knowledge engineering or both. Special emphasis will be put on the transference of methods between both domains. The theme this year is soft computing in software engineering & knowledge engineering. Submissions of papers and demos are both welcome.

    Big Data Analytics for QoS Prediction Through Probabilistic Model Checking

    As competitiveness increases, being able to guarantee the QoS of delivered services is key for business success. The ability to continuously monitor the workflow providing a service and to recognize breaches in the agreed QoS level in a timely manner is thus of paramount importance. The ideal condition would be the possibility to anticipate, and thus predict, a breach and act to avoid it, or at least to mitigate its effects. In this paper we propose a model checking based approach to predict the QoS of a formally described process. The continuous model checking is enabled by the usage of a parametrized model of the monitored system, where the actual value of parameters is continuously evaluated and updated by means of big data tools. The paper also describes a prototype implementation of the approach and shows its usage in a case study. Comment: EDCC-2014, BIG4CIP-2014, Big Data Analytics, QoS Prediction, Model Checking, SLA compliance monitoring.

    Cost-Based Optimization of Integration Flows

    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas such as real-time ETL and data synchronization between operational systems. Because of an increasing amount of data, highly distributed IT infrastructures, and high requirements for data consistency and up-to-dateness of query results, many instances of integration flows are executed over time. Due to this high load and blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To tackle these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows that relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine the periodical re-optimization to on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and adaptation delays, where we miss optimization opportunities. This approach ensures low optimization overhead and fast workload adaptation.