298 research outputs found

    Model-driven Scheduling for Distributed Stream Processing Systems

    Full text link
    Distributed Stream Processing frameworks are being commonly used with the evolution of Internet of Things(IoT). These frameworks are designed to adapt to the dynamic input message rate by scaling in/out.Apache Storm, originally developed by Twitter is a widely used stream processing engine while others includes Flink, Spark streaming. For running the streaming applications successfully there is need to know the optimal resource requirement, as over-estimation of resources adds extra cost.So we need some strategy to come up with the optimal resource requirement for a given streaming application. In this article, we propose a model-driven approach for scheduling streaming applications that effectively utilizes a priori knowledge of the applications to provide predictable scheduling behavior. Specifically, we use application performance models to offer reliable estimates of the resource allocation required. Further, this intuition also drives resource mapping, and helps narrow the estimated and actual dataflow performance and resource utilization. Together, this model-driven scheduling approach gives a predictable application performance and resource utilization behavior for executing a given DSPS application at a target input stream rate on distributed resources.Comment: 54 page

    Workload Schedulers - Genesis, Algorithms and Comparisons

    Get PDF
    In this article we provide brief descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems, Jobs Schedulers and Big Data Schedulers. We describe their evolution from early adoptions to modern implementations, considering both the use and features of algorithms. In summary, we discuss differences between all presented classes of schedulers and discuss their chronological development. In conclusion, we highlight similarities in the focus of scheduling strategies design, applicable to both local and distributed systems

    CPU Energy-Aware Parallel Real-Time Scheduling

    Get PDF
    Both energy-efficiency and real-time performance are critical requirements in many embedded systems applications such as self-driving car, robotic system, disaster response, and security/safety control. These systems entail a myriad of real-time tasks, where each task itself is a parallel task that can utilize multiple computing units at the same time. Driven by the increasing demand for parallel tasks, multi-core embedded processors are inevitably evolving to many-core. Existing work on real-time parallel tasks mostly focused on real-time scheduling without addressing energy consumption. In this paper, we address hard real-time scheduling of parallel tasks while minimizing their CPU energy consumption on multicore embedded systems. Each task is represented as a directed acyclic graph (DAG) with nodes indicating different threads of execution and edges indicating their dependencies. Our technique is to determine the execution speeds of the nodes of the DAGs to minimize the overall energy consumption while meeting all task deadlines. It incorporates a frequency optimization engine and the dynamic voltage and frequency scaling (DVFS) scheme into the classical real-time scheduling policies (both federated and global) and makes them energy-aware. The contributions of this paper thus include the first energy-aware online federated scheduling and also the first energy-aware global scheduling of DAGs. Evaluation using synthetic workload through simulation shows that our energy-aware real-time scheduling policies can achieve up to 68% energy-saving compared to classical (energy-unaware) policies. We have also performed a proof of concept system evaluation using physical hardware demonstrating the energy efficiency through our proposed approach

    Adaptive space-time sharing with SCOJO.

    Get PDF
    Coscheduling is a technique used to improve the performance of parallel computer applications under time sharing, i.e., to provide better response times than standard time sharing or space sharing. Dynamic coscheduling and gang scheduling are two main forms of coscheduling. In SCOJO (Share-based Job Coscheduling), we have introduced our own original framework to employ loosely coordinated dynamic coscheduling and a dynamic directory service in support of scheduling cross-site jobs in grid scheduling. SCOJO guarantees effective CPU shares by taking coscheduling effects into consideration and supports both time and CPU share reservation for cross-site job. However, coscheduling leads to high memory pressure and still involves problems like fragmentation and context-switch overhead, especially when applying higher multiprogramming levels. As main part of this thesis, we employ gang scheduling as more directly suitable approach for combined space-time sharing and extend SCOJO for clusters to incorporate adaptive space sharing into gang scheduling. We focus on taking advantage of moldable and malleable characteristics of realistic job mixes to dynamically adapt to varying system workloads and flexibly reduce fragmentation. In addition, our adaptive scheduling approach applies standard job-scheduling techniques like a priority and aging system, backfilling or easy backfilling. We demonstrate by the results of a discrete-event simulation that this dynamic adaptive space-time sharing approach can deliver better response times and bounded relative response times even with a lower multiprogramming level than traditional gang scheduling.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .H825. Source: Masters Abstracts International, Volume: 43-01, page: 0237. Adviser: A. Sodan. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    Multi-resource management in embedded real-time systems

    Get PDF
    This thesis addresses the problem of online multi-resource management in embedded real-time systems. It focuses on three research questions. The first question concentrates on how to design an efficient hierarchical scheduling framework for supporting independent development and analysis of component based systems, to provide temporal isolation between components. The second question investigates how to change the mapping of resources to tasks and components during run-time efficiently and predictably, and how to analyze the latency of such a system mode change in systems comprised of several scalable components. The third question deals with the scheduling and analysis of a set of parallel-tasks with real-time constraints which require simultaneous access to several different resources. For providing temporal isolation we chose a reservation-based approach. We first focused on processor reservations, where timed events play an important role. Common examples are task deadlines, periodic release of tasks, budget replenishment and budget depletion. Efficient timer management is therefore essential. We investigated the overheads in traditional timer management techniques and presented a mechanism called Relative Timed Event Queues (RELTEQ), which provides an expressive set of primitives at a low processor and memory overhead. We then leveraged RELTEQ to create an efficient, modular and extensible design for enhancing a real-time operating system with periodic tasks, polling, idling periodic and deferrable servers, and a two-level fixed-priority Hierarchical Scheduling Framework (HSF). The HSF design provides temporal isolation and supports independent development of components by separating the global and local scheduling, and allowing each server to define a dedicated scheduler. Furthermore, the design addresses the system overheads inherent to an HSF and prevents undesirable interference between components. It limits the interference of inactive servers on the system level by means of wakeup events and a combination of inactive server queues with a stopwatch queue. Our implementation is modular and requires only a few modifications of the underlying operating system. We then investigated scalable components operating in a memory-constrained system. We first showed how to reduce the memory requirements in a streaming multimedia application, based on a particular priority assignment of the different components along the processing chain. Then we investigated adapting the resource provisions to tasks during runtime, referred to as mode changes. We presented a novel mode change protocol called Swift Mode Changes, which relies on Fixed Priority with Deferred preemption Scheduling to reduce the mode change latency bound compared to existing protocols based on Fixed Priority Preemptive Scheduling. We then presented a new partitioned parallel-task scheduling algorithm called Parallel-SRP (PSRP), which generalizes MSRP for multiprocessors, and the corresponding schedulability analysis for the problem of multi-resource scheduling of parallel tasks with real-time constraints. We showed that the algorithm is deadlock-free, derived a maximum bound on blocking, and used this bound as a basis for a schedulability test. We then demonstrated how PSRP can exploit the inherent parallelism of a platform comprised of multiple heterogeneous resources. Finally, we presented Grasp, which is a visualization toolset aiming to provide insight into the behavior of complex real-time systems. Its flexible plugin infrastructure allows for easy extension with custom visualization and analysis techniques for automatic trace verification. Its capabilities include the visualization of hierarchical multiprocessor systems, including partitioned and global multiprocessor scheduling with migrating tasks and jobs, communication between jobs via shared memory and message passing, and hierarchical scheduling in combination with multiprocessor scheduling. For tracing distributed systems with asynchronous local clocks Grasp also supports the synchronization of traces from different processors during the visualization and analysis

    Doctor of Philosophy

    Get PDF
    dissertationDataflow pipeline models are widely used in visualization systems. Despite recent advancements in parallel architecture, most systems still support only a single CPU or a small collection of CPUs such as a SMP workstation. Even for systems that are specifically tuned towards parallel visualization, their execution models only provide support for data-parallelism while ignoring taskparallelism and pipeline-parallelism. With the recent popularization of machines equipped with multicore CPUs and multi-GPU units, these visualization systems are undoubtedly falling further behind in reaching maximum efficiency. On the other hand, there exist several libraries that can schedule program executions on multiple CPUs and/or multiple GPUs. However, due to differences in executing a task graph and a pipeline along with their APIs being considerably low-level, it still remains a challenge to integrate these run-time libraries into current visualization systems. Thus, there is a need for a redesigned dataflow architecture to fully support and exploit the power of highly parallel machines in large-scale visualization. The new design must be able to schedule executions on heterogeneous platforms while at the same time supporting arbitrarily large datasets through the use of streaming data structures. The primary goal of this dissertation work is to develop a parallel dataflow architecture for streaming large-scale visualizations. The framework includes supports for platforms ranging from multicore processors to clusters consisting of thousands CPUs and GPUs. We achieve this in our system by introducing the notion of Virtual Processing Elements and Task-Oriented Modules along with a highly customizable scheduler that controls the assignment of tasks to elements dynamically. This creates an intuitive way to maintain multiple CPU/GPU kernels yet still provide coherency and synchronization across module executions. We have implemented these techniques into HyperFlow which is made of an API with all basic dataflow constructs described in the dissertation, and a distributed run-time library that can be used to deploy those pipelines on multicore, multi-GPU and cluster-based platforms

    Evaluation of the parallel computational capabilities of embedded platforms for critical systems

    Get PDF
    Modern critical systems need higher performance which cannot be delivered by the simple architectures used so far. Latest embedded architectures feature multi-cores and GPUs, which can be used to satisfy this need. In this thesis we parallelise relevant applications from multiple critical domains represented in the GPU4S benchmark suite, and perform a comparison of the parallel capabilities of candidate platforms for use in critical systems. In particular, we port the open source GPU4S Bench benchmarking suite in the OpenMP programming model, and we benchmark the candidate embedded heterogeneous multi-core platforms of the H2020 UP2DATE project, NVIDIA TX2, NVIDIA Xavier and Xilinx Zynq Ultrascale+, in order to drive the selection of the research platform which will be used in the next phases of the project. Our result indicate that in terms of CPU and GPU performance, the NVIDIA Xavier is the highest performing platform

    Deterministic Scheduling of Real-Time Tasks on Heterogeneous Multicore Platforms

    Get PDF
    In recent years, the problem of real-time scheduling has increasingly become more important as well as more complicated. The former is due to the proliferation of safety critical systems into our day-to-day life; such as autonomous vehicles, fueled by the recent advances in artificial intelligence. The latter is caused by the increasing demand for high performance which is driving the adoption of highly integrated complex heterogeneous system-on-chip (SoC) processors to deliver the performance while meeting strict size, weight, power (SWaP) and cost constraints. Motivated by these trends, this dissertation tackles the following main question: how can we guarantee predictable real-time execution on heterogeneous multicore SoCs while preserving high utilization? The fundamental problem in preserving the determinism of the real-time system realized on a heterogeneous multicore SoC is ensuring that the worst-case execution time (WCET) of each task, measured in isolation, will stay within a reasonable bound during the actual execution of the system. The primary challenge in achieving this goal---tightly bounding task WCETs---is that the execution time of a task can be highly non-deterministic, often varying significantly depending on which tasks are co-scheduled and how they contend on various shared hardware resources in the memory hierarchy. The particular scheduling requirements (e.g., non-preemption) of the different computing resources (e.g., integrated GPU) in the heterogeneous SoC and the possible cross-contention among their workloads can also exacerbate this problem. In light of these considerations, this dissertation presents new real-time scheduling techniques for predictable and efficient scheduling of mixed criticality workloads on heterogeneous SoCs. The contributions of this dissertation include the following: 1) A novel CPU-GPU scheduling framework that ensures predictable execution of critical GPU kernels on integrated CPU-GPU platforms. 2) A novel gang scheduling framework which guarantees deterministic execution of parallel real-time tasks on the multicore CPU cluster of a heterogeneous SoC. 3) Optimal and heuristic algorithms for gang formation that increase real-time schedulability under the RT-Gang framework and their extension to incorporate scheduling on accelerators in a heterogeneous SoC. 4) Concrete evaluation results using simulated tasksets as well as real-world workloads that demonstrate the analytical and practical benefits of the proposed techniques

    A flexible simulation framework for processor scheduling algorithms in multicore systems.

    Get PDF
    In traditional uniprocessor systems, processor scheduling is the responsibility of the operating system. In high performance computing (HPC) domains that largely involve parallel processors, the responsibility of scheduling is usually left to the applications. So far, parallel computing has been confined to a small group of specialized HPC users. In this context, the hardware, operating system, and the applications have been mostly designed independently with minimal interactions. As the multicore processors are becoming the norm, parallel programming is expected to emerge as the mainstream software development approach. This new trend poses several challenges including performance, power management, system utilization, and predictable response. Such a demand is hard to meet without the cooperation from hardware, operating system, and applications. Particularly, an efficient scheduling of cores to the application threads is fundamentally important in assuring the above mentioned characteristics. We believe, operating system requires to take a larger responsibility in ensuring efficient multicore scheduling of application threads. To study the performance of a new scheduling algorithm for the future multicore systems with hundreds and thousands of cores, we need a flexible scheduling simulation testbed. Designing such a multicore scheduling simulation testbed and illustrating its functionality by studying some well known scheduling algorithms Linux and Solaris are the main contributions of this thesis. In addition to studying Linux and Solaris scheduling algorithms, we demonstrate the power, flexibility, and use of the proposed scheduling testbed by simulating two popular gang scheduling algorithms - adaptive first-come-first-served (AFCFS) and largest gang first served (LGFS). As a result of this performance study, we designed a new gang scheduling algorithm and we compared its performance with AFCFS. The proposed scheduling simulation testbed is developed using Java and expected to be released for public use.The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b180562
    corecore