Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load.
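To make the "minimize average job completion time" objective concrete, here is a minimal sketch under loudly simplified assumptions: every job arrives at time zero, runs without preemption on a single executor, and has no internal DAG structure. The job durations are hypothetical. The sketch is not Decima's method. Decima learns a scheduling policy with RL, whereas this code only evaluates two fixed orderings, FIFO and shortest-job-first, to show that ordering alone changes the metric.

```python
# Hypothetical illustration of the average job completion time (JCT) objective.
# Assumptions: all jobs arrive at t=0, one executor, no preemption, no DAGs.

def average_jct(durations):
    """Average completion time when jobs run back-to-back in the given order."""
    clock = 0.0
    total = 0.0
    for d in durations:
        clock += d       # this job finishes after all earlier jobs plus itself
        total += clock   # completion time equals finish time, since arrival is 0
    return total / len(durations)

jobs = [8.0, 1.0, 2.0]            # hypothetical durations (arbitrary time units)

fifo = average_jct(jobs)          # run in list (arrival) order
sjf = average_jct(sorted(jobs))   # shortest job first

print(f"FIFO: {fifo:.2f}, SJF: {sjf:.2f}")  # FIFO: 9.33, SJF: 5.00
```

Even in this toy setting the ordering matters. Decima's harder problem is learning such orderings, plus executor allocation, across DAG-structured jobs with continuous stochastic arrivals.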
An Expressive Language and Efficient Execution System for Software Agents
Software agents can be used to automate many of the tedious, time-consuming
information processing tasks that humans currently have to complete manually.
However, to do so, agent plans must be capable of representing the myriad of
actions and control flows required to perform those tasks. In addition, since
these tasks can require integrating multiple sources of remote information
(typically a slow, I/O-bound process), it is desirable to make execution as
efficient as possible. To address both of these needs, we present a flexible
software agent plan language and a highly parallel execution system that enable
the efficient execution of expressive agent plans. The plan language allows
complex tasks to be more easily expressed by providing a variety of operators
for flexibly processing the data as well as supporting subplans (for
modularity) and recursion (for indeterminate looping). The executor is based on
a streaming dataflow model of execution to maximize the amount of operator and
data parallelism possible at runtime. We have implemented both the language and
executor in a system called THESEUS. Our results from testing THESEUS show that
streaming dataflow execution can yield significant speedups over both
traditional serial (von Neumann) and non-streaming dataflow-style
execution that existing software and robot agent execution systems currently
support. In addition, we show how plans written in the language we present can
represent certain types of subtasks that cannot be accomplished using the
languages supported by network query engines. Finally, we demonstrate that the
increased expressivity of our plan language does not hamper performance;
specifically, we show how data can be integrated from multiple remote sources
just as efficiently using our architecture as is possible with a
state-of-the-art streaming-dataflow network query engine.
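The difference between streaming and non-streaming dataflow can be illustrated with a small, hypothetical sketch built on Python generators. It is not the THESEUS executor. The operator names `source` and `transform` and the event log are illustrative assumptions, and everything runs in one thread, so the sketch shows only pipelining order (a tuple flows downstream as soon as it is produced), not the concurrent operator and data parallelism the abstract describes.

```python
# Hypothetical sketch contrasting streaming and non-streaming (batch) dataflow.
# Each operator is a generator; in streaming mode a tuple flows downstream as
# soon as it is produced instead of waiting for the whole upstream relation.

events = []  # log of (operator, item) events, used to show the interleaving

def source(items):
    """Hypothetical 'fetch' operator standing in for a remote data source."""
    for item in items:
        events.append(("fetch", item))
        yield item

def transform(stream):
    """Hypothetical per-tuple operator (e.g. a select or formatting step)."""
    for item in stream:
        events.append(("transform", item))
        yield item * 10

def run_streaming(items):
    # Operators are chained lazily: each tuple passes through both operators
    # before the next tuple is fetched.
    return list(transform(source(items)))

def run_batch(items):
    # Materialize the entire upstream output before the next operator starts.
    fetched = list(source(items))
    return list(transform(iter(fetched)))

events.clear()
streaming_out = run_streaming([1, 2, 3])
streaming_events = list(events)

events.clear()
batch_out = run_batch([1, 2, 3])
batch_events = list(events)

print(streaming_events[:2])  # [('fetch', 1), ('transform', 1)]
print(batch_events[:2])      # [('fetch', 1), ('fetch', 2)]
```

Both modes compute the same result, but in streaming mode downstream work on the first tuple starts before the source has finished, which is what lets slow, I/O-bound fetches overlap with later processing once operators run in parallel.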
Hyperprofile-based Computation Offloading for Mobile Edge Networks
In recent studies, researchers have developed various computation offloading
frameworks for bringing cloud services closer to the user via edge networks.
Specifically, an edge device needs to offload computationally intensive tasks
because of energy and processing constraints. These constraints present the
challenge of identifying which edge nodes should receive tasks to reduce
overall resource consumption. We propose a unique solution to this problem
which incorporates elements from Knowledge-Defined Networking (KDN) to make
intelligent predictions about offloading costs based on historical data. Each
server instance can be represented in a multidimensional feature space where
each dimension corresponds to a predicted metric. We compute features for a
"hyperprofile" and position nodes based on the predicted costs of offloading a
particular task. We then perform a k-Nearest Neighbor (kNN) query within the
hyperprofile to select nodes for offloading computation. This paper formalizes
our hyperprofile-based solution and explores the viability of using machine
learning (ML) techniques to predict metrics useful for computation offloading.
We also investigate the effects of using different distance metrics for the
queries. Our results show various network metrics can be modeled accurately
with regression, and there are circumstances where kNN queries using Euclidean
distance, as opposed to rectilinear distance, are more favorable.
Comment: 5 pages, NSF REU Site publication
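A kNN query over a hyperprofile, and why the choice of distance metric can change the answer, can be sketched as follows. Everything concrete here is an assumption for illustration: the node names, the two cost dimensions (e.g. predicted latency and energy), their values, and the query point at zero predicted cost. None of these are taken from the paper. The sketch uses plain Python rather than an ML library, and it only ranks nodes by distance. It does not model the regression step that produces the predicted metrics.

```python
import math

# Hypothetical hyperprofile: each edge node is a point whose coordinates are
# predicted costs of offloading one task (e.g. latency, energy).
# Names and values are illustrative assumptions, not data from the paper.
hyperprofile = {
    "node_a": (2.0, 2.0),
    "node_b": (0.0, 3.0),
    "node_c": (3.5, 0.0),
}

def euclidean(p, q):
    """L2 distance between two cost vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rectilinear(p, q):
    """L1 (Manhattan) distance between two cost vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def knn(query, k, distance):
    """Return the k node names closest to `query` under `distance`."""
    ranked = sorted(hyperprofile, key=lambda n: distance(hyperprofile[n], query))
    return ranked[:k]

ideal = (0.0, 0.0)  # hypothetical query: the point of zero predicted cost

print(knn(ideal, 1, euclidean))    # ['node_a']  (2.83 < 3.0 < 3.5)
print(knn(ideal, 1, rectilinear))  # ['node_b']  (3.0 < 3.5 < 4.0)
```

Euclidean distance favors node_a, whose costs are moderate on both axes, while rectilinear distance favors node_b, which is free on one axis but expensive on the other. This is a small example of how the metric choice alone can change which node receives the offloaded task.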