432 research outputs found
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load
Model-driven Scheduling for Distributed Stream Processing Systems
Distributed Stream Processing frameworks are being commonly used with the
evolution of Internet of Things(IoT). These frameworks are designed to adapt to
the dynamic input message rate by scaling in/out.Apache Storm, originally
developed by Twitter is a widely used stream processing engine while others
includes Flink, Spark streaming. For running the streaming applications
successfully there is need to know the optimal resource requirement, as
over-estimation of resources adds extra cost.So we need some strategy to come
up with the optimal resource requirement for a given streaming application. In
this article, we propose a model-driven approach for scheduling streaming
applications that effectively utilizes a priori knowledge of the applications
to provide predictable scheduling behavior. Specifically, we use application
performance models to offer reliable estimates of the resource allocation
required. Further, this intuition also drives resource mapping, and helps
narrow the estimated and actual dataflow performance and resource utilization.
Together, this model-driven scheduling approach gives a predictable application
performance and resource utilization behavior for executing a given DSPS
application at a target input stream rate on distributed resources.Comment: 54 page
Generalizing List Scheduling for Stochastic Soft Real-time Parallel Applications
Advanced architecture processors provide features such as caches and branch prediction that result in improved, but variable, execution time of software. Hard real-time systems require tasks to complete within timing constraints. Consequently, hard real-time systems are typically designed conservatively through the use of tasks? worst-case execution times (WCET) in order to compute deterministic schedules that guarantee task?s execution within giving time constraints. This use of pessimistic execution time assumptions provides real-time guarantees at the cost of decreased performance and resource utilization. In soft real-time systems, however, meeting deadlines is not an absolute requirement (i.e., missing a few deadlines does not severely degrade system performance or cause catastrophic failure). In such systems, a guaranteed minimum probability of completing by the deadline is sufficient. Therefore, there is considerable latitude in such systems for improving resource utilization and performance as compared with hard real-time systems, through the use of more realistic execution time assumptions. Given probability distribution functions (PDFs) representing tasks? execution time requirements, and tasks? communication and precedence requirements, represented as a directed acyclic graph (DAG), this dissertation proposes and investigates algorithms for constructing non-preemptive stochastic schedules. New PDF manipulation operators developed in this dissertation are used to compute tasks? start and completion time PDFs during schedule construction. PDFs of the schedules? completion times are also computed and used to systematically trade the probability of meeting end-to-end deadlines for schedule length and jitter in task completion times. Because of the NP-hard nature of the non-preemptive DAG scheduling problem, the new stochastic scheduling algorithms extend traditional heuristic list scheduling and genetic list scheduling algorithms for DAGs by using PDFs instead of fixed time values for task execution requirements. The stochastic scheduling algorithms also account for delays caused by communication contention, typically ignored in prior DAG scheduling research. Extensive experimental results are used to demonstrate the efficacy of the new algorithms in constructing stochastic schedules. Results also show that through the use of the techniques developed in this dissertation, the probability of meeting deadlines can be usefully traded for performance and jitter in soft real-time systems
A Study Of Genetic Representation Schemes For Scheduling Soft Real-Time Systems
This research presents a hybrid algorithm that combines List Scheduling (LS) with a Genetic Algorithm (GA) for constructing non-preemptive schedules for soft real-time parallel applications represented as directed acyclic graphs (DAGs). The execution time requirements of the applications\u27 tasks are assumed to be stochastic and are represented as probability distribution functions. The performance in terms of schedule lengths for three different genetic representation schemes are evaluated and compared for a number of different DAGs. The approaches presented in this research produce shorter schedules than HLFET, a popular LS approach for all of the sample problems. Of the three genetic representation schemes investigated, PosCT, the technique that allows the GA to learn which tasks to delay in order to allow other tasks to complete produced the shortest schedules for a majority of the sample DAGs
Response-Time Analysis of Limited-Preemptive Parallel DAG Tasks Under Global Scheduling
Most recurrent real-time applications can be modeled as a set of sequential code segments (or blocks) that must be (repeatedly) executed in a specific order. This paper provides a schedulability analysis for such systems modeled as a set of parallel DAG tasks executed under any limited-preemptive global job-level fixed priority scheduling policy. More precisely, we derive response-time bounds for a set of jobs subject to precedence constraints, release jitter, and execution-time uncertainty, which enables support for a wide variety of parallel, limited-preemptive execution models (e.g., periodic DAG tasks, transactional tasks, generalized multi-frame tasks, etc.). Our analysis explores the space of all possible schedules using a powerful new state abstraction and state-pruning technique. An empirical evaluation shows the analysis to identify between 10 to 90 percentage points more schedulable task sets than the state-of-the-art schedulability test for limited-preemptive sporadic DAG tasks. It scales to systems of up to 64 cores with 20 DAG tasks. Moreover, while our analysis is almost as accurate as the state-of-the-art exact schedulability test based on model checking (for sequential non-preemptive tasks), it is three orders of magnitude faster and hence capable of analyzing task sets with more than 60 tasks on 8 cores in a few seconds
- …