6,088 research outputs found

    Multi-objective scheduling for real-time data warehouses

    Get PDF
    The issue of write-read contention is one of the most prevalent problems when deploying real-time data warehouses. With increasing load, updates are increasingly delayed, and previously fast queries tend to slow down considerably. However, depending on the user requirements, we can improve either the response time or the data quality by scheduling the queries and updates appropriately. If both criteria are to be considered simultaneously, we face a so-called multi-objective optimization problem. We transformed this problem into a knapsack problem with additional inequalities and solved it efficiently. Based on our solution, we developed a scheduling approach that provides the optimal schedule with regard to the user requirements at any given point in time. We evaluated our approach in an extensive experimental study, comparing it with the optimal scheduling policy for each individual optimization objective.
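    As a minimal sketch of the knapsack reduction mentioned in the abstract, the fragment below casts the choice of which pending updates to apply within a fixed time budget as a 0/1 knapsack (cost = load time, value = data-quality gain). All names and the integer-budget simplification are our assumptions, not the paper's actual formulation.

        # Hypothetical sketch: pick pending updates to apply before a deadline,
        # cast as a 0/1 knapsack. Costs and gains are illustrative assumptions.
        def select_updates(updates, time_budget):
            """updates: list of (cost, quality_gain) pairs with integer costs."""
            best = [0] * (time_budget + 1)            # best[b] = max gain within budget b
            chosen = [set() for _ in range(time_budget + 1)]
            for i, (cost, gain) in enumerate(updates):
                for b in range(time_budget, cost - 1, -1):   # backwards: each update used once
                    if best[b - cost] + gain > best[b]:
                        best[b] = best[b - cost] + gain
                        chosen[b] = chosen[b - cost] | {i}
            return best[time_budget], chosen[time_budget]

        gain, picked = select_updates([(3, 5), (2, 3), (4, 6)], time_budget=5)
        print(gain, sorted(picked))   # 8 [0, 1]

    The "additional inequalities" the abstract refers to (e.g., per-objective bounds) would plausibly appear as extra feasibility checks inside the inner loop.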

    Age-Optimal Information Updates in Multihop Networks

    Full text link
    The problem of reducing the age-of-information has been extensively studied in single-hop networks. In this paper, we minimize the age-of-information in general multihop networks. If the packet transmission times over the network links are exponentially distributed, we prove that a preemptive Last Generated First Served (LGFS) policy results in smaller age processes at all nodes of the network (in a stochastic ordering sense) than any other causal policy. In addition, for arbitrary general distributions of packet transmission times, the non-preemptive LGFS policy is shown to minimize the age processes at all nodes of the network among all non-preemptive work-conserving policies (again in a stochastic ordering sense). It is surprising that such simple policies can achieve optimality of the joint distribution of the age processes at all nodes even under arbitrary network topologies, as well as arbitrary packet generation and arrival times. These optimality results hold not only for the age processes, but also for any non-decreasing functional of the age processes.
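    To make the LGFS policy concrete, here is a small self-contained sketch (our illustration, not the paper's model or proofs) of preemptive Last Generated First Served at a single node: the packet with the most recent generation time is always the one in service, and a fresher arrival preempts an older packet mid-transmission.

        import heapq

        class LGFSNode:
            """Toy preemptive LGFS queue: serve the most recently generated packet."""
            def __init__(self):
                self._heap = []           # max-heap on generation time (negated)
                self.in_service = None    # (gen_time, packet_id) currently transmitting

            def arrive(self, gen_time, packet_id):
                if self.in_service is None:
                    self.in_service = (gen_time, packet_id)
                elif gen_time > self.in_service[0]:
                    # Preempt: the fresher packet takes over; the old one is queued.
                    heapq.heappush(self._heap, (-self.in_service[0], self.in_service[1]))
                    self.in_service = (gen_time, packet_id)
                else:
                    heapq.heappush(self._heap, (-gen_time, packet_id))

            def depart(self):
                """Called when the in-service packet finishes transmission."""
                done = self.in_service
                if self._heap:
                    neg_gen, pid = heapq.heappop(self._heap)
                    self.in_service = (-neg_gen, pid)
                else:
                    self.in_service = None
                return done

    The non-preemptive variant described in the abstract would differ only in arrive: a fresher arrival waits for the current transmission to finish instead of preempting it.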

    Continuous Workflows: From Model to Enactment System

    Get PDF
    Workflows are actively being used in both business and scientific domains to automate processes and facilitate collaboration. A workflow management (or enactment) system (WfMS) defines, creates and manages the execution of workflows on one or more workflow engines, which are able to interpret workflow definitions, allocate resources, interact with workflow participants and, where required, invoke the needed tools (e.g., databases, job schedulers) and applications. Traditional WfMSs and workflow design processes view the workflow as a one-time interaction with the various data sources, i.e., when a workflow is invoked, its steps are executed once and in order. The fundamental underlying assumption has been that data sources are passive and all interactions are structured along the request/reply (query) model. Hence, traditional WfMSs cannot effectively support business or scientific monitoring applications that require the processing of data streams, such as those generated by sensing devices as well as mobile and web applications.

    It is the hypothesis of this dissertation that workflow management systems can be extended to support data stream semantics to enable monitoring applications. This includes the ability to apply flexible bounds on unbounded data streams and the ability to facilitate on-the-fly processing of bounded bundles of data (window semantics). To support this hypothesis, this dissertation has produced new specifications, a design, an implementation and a thorough evaluation of a novel Continuous Workflows (CWf) model, which is backwards compatible with currently available workflow models. The CWf model was implemented in a CONtinuous workFLow ExeCution Engine, CONFLuEnCE, as an extension of Kepler, a popular scientific WfMS. The applicability of the CWf model in both scientific and business applications was demonstrated by utilizing CONFLuEnCE in Astroshelf to support live annotations (i.e., monitoring of astronomical data), and to support supply chain monitoring and management.

    The implementation of CONFLuEnCE led to the realization that different applications have different performance requirements, and hence an integrated workflow scheduling framework is essential. Towards meeting this need, STAFiLOS, a Stream FLOw Scheduling framework for Continuous Workflows, was designed and implemented within CONFLuEnCE. The performance of STAFiLOS was evaluated using the Linear Road Benchmark for continuous workflows.
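    As a minimal sketch of the window semantics mentioned above (bounding an unbounded stream into finite bundles that a workflow step can process on the fly), the generator below implements a generic tumbling window; it is our illustration, not CONFLuEnCE's actual API.

        def tumbling_windows(stream, size):
            """Group an unbounded iterable into fixed-size bounded bundles."""
            window = []
            for item in stream:
                window.append(item)
                if len(window) == size:
                    yield window          # hand a bounded bundle to the next workflow step
                    window = []
            if window:
                yield window              # flush the final partial window

        for bundle in tumbling_windows(range(7), size=3):
            print(bundle)                 # [0, 1, 2] [3, 4, 5] [6]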

    Scheduling of Updates in Data Warehouses

    Get PDF
    A stream warehouse enables queries that seamlessly range from real-time alerting and diagnostics to long-term data mining. Continuously loading data from many different and uncontrolled sources into a real-time stream warehouse introduces a new consistency problem: users want results in as timely a fashion as possible, but "stable" results often require lengthy synchronization delays. In this paper we develop a theory of temporal consistency for stream warehouses that allows for multiple consistency levels. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables and whose objective is to minimize data staleness over time.
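    A hedged sketch of the scheduling view taken in the abstract: each job loads newly arrived data into one table, and the scheduler repeatedly picks the job that currently contributes the most staleness. The priority weighting and field names below are our assumptions, not the paper's model.

        import time

        def pick_next_load(tables, now=None):
            """tables: dicts with 'name', 'oldest_pending' (timestamp of the oldest
            unloaded data, or None) and 'priority' (higher = more important)."""
            now = time.time() if now is None else now
            pending = [t for t in tables if t['oldest_pending'] is not None]
            if not pending:
                return None
            # A table's staleness grows with how long its oldest unloaded data has waited.
            return max(pending, key=lambda t: t['priority'] * (now - t['oldest_pending']))

        tables = [
            {'name': 'alerts', 'oldest_pending': 100.0, 'priority': 5},
            {'name': 'mining', 'oldest_pending': 40.0,  'priority': 1},
        ]
        print(pick_next_load(tables, now=110.0)['name'])   # 'mining' (1*70 > 5*10)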

    Optimising HYBRIDJOIN to Process Semi-Stream Data in Near-real-time Data Warehousing

    Get PDF
    Near-real-time data warehousing plays an essential role in decision making in organizations where the latest data must be fed from various data sources on a near-real-time basis. The stream of sales data coming from data sources needs to be transformed to the data warehouse format using disk-based master data. This transformation is a challenging task because the disk access rate is slow compared with the arrival rate of the stream data. For this purpose, an adaptive semi-stream join algorithm called HYBRIDJOIN (Hybrid Join) has been presented in the literature. The algorithm uses a single buffer to load partitions from the master data; therefore, it has to wait until the next disk partition overwrites the existing partition in the buffer. As the cost of loading a disk partition into the buffer is a major component of the algorithm's total processing cost, this leaves the performance of the algorithm sub-optimal. This paper presents an optimisation of the existing HYBRIDJOIN that introduces a second buffer, enabling the algorithm to load one buffer while the other is in use by the join execution. This reduces the time the algorithm waits for master-data partitions to load and, consequently, improves its performance significantly.
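    The double-buffering idea is easy to sketch (our illustration of the optimisation, not the authors' code): a background thread prefetches the next master-data partition while the current one is joined against the stream, so the join no longer stalls on disk I/O.

        import threading, queue

        def read_partition(pid):
            """Stand-in for a slow disk read of master-data partition `pid`."""
            return {'id': pid, 'rows': list(range(pid * 10, pid * 10 + 10))}

        def prefetcher(partition_ids, buf):
            for pid in partition_ids:
                buf.put(read_partition(pid))   # blocks once both buffers are full
            buf.put(None)                      # end-of-data marker

        def hybrid_join_double_buffered(partition_ids, join_fn):
            buf = queue.Queue(maxsize=2)       # two buffers: one joining, one loading
            threading.Thread(target=prefetcher, args=(partition_ids, buf),
                             daemon=True).start()
            while (part := buf.get()) is not None:
                join_fn(part)                  # join proceeds while the next load runs

        hybrid_join_double_buffered(range(3),
                                    lambda p: print('joined partition', p['id']))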