2 research outputs found

    Cost-Based Optimization of Integration Flows

    Get PDF
    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas such as real-time ETL and data synchronization between operational systems. For the reasons of an increasing amount of data, highly distributed IT infrastructures, and high requirements for data consistency and up-to-dateness of query results, many instances of integration flows are executed over time. Due to this high load and blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To tackle these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows that relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine the periodical re-optimization to on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and adaptation delays, where we miss optimization opportunities. This approach ensures low optimization overhead and fast workload adaptation

    On-demand re-optimization of integration flows

    Get PDF
    Integration flows are used to propagate data between heterogeneous operational systems or to consolidate data into data warehouse infrastructures. In order to meet the increasing need of up-to-date information, many messages are exchanged over time. The efficiency of those integration flows is therefore crucial to handle the high load of messages and to reduce message latency. State-of-the-art strategies to address this performance bottleneck are based on incremental statistic maintenance and periodic cost-based re-optimization. This also achieves adaptation to unknown statistics and changing workload characteristics, which is important since integration flows are deployed for long time horizons. However, the major drawbacks of periodic re-optimization are many unnecessary re-optimization steps and missed optimization opportunities due to adaptation delays. In this paper, we therefore propose the novel concept of on-demand re-optimization. We exploit optimality conditions from the optimizer in order to (1) monitor optimality of the current plan, and (2) trigger directed re-optimization only if necessary. Furthermore, we introduce the PlanOptimalityTree as a compact representation of optimality conditions that enables efficient monitoring and exploitation of these conditions. As a result and in contrast to existing work, re-optimization is immediately triggered but only if a new plan is certain to be found. Our experiments show that we achieve near-optimal re-optimization overhead and fast workload adaptation
    corecore