Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load.
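To make the objective above concrete, the sketch below computes average job completion time (JCT), the metric Decima is trained to minimize, for two simple single-server schedulers. The job durations and schedulers are hypothetical illustrations of the metric only, not Decima's RL policy or its DAG job model.

```python
# Illustrative only: average job completion time (JCT) under two toy
# scheduling orders. Decima learns a policy over DAG jobs; this merely
# shows why ordering decisions change the objective being minimized.

def average_jct(durations):
    """Run jobs back-to-back in the given order; return mean completion time."""
    t, total = 0.0, 0.0
    for d in durations:
        t += d           # this job finishes at cumulative time t
        total += t
    return total / len(durations)

jobs = [8.0, 1.0, 3.0]            # hypothetical job durations
fifo = average_jct(jobs)          # serve in arrival order
sjf = average_jct(sorted(jobs))   # shortest-job-first

print(fifo, sjf)
```

Even this toy case shows shortest-job-first beating arrival order on average JCT, which is the kind of workload-specific ordering advantage a learned policy can discover automatically.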
Hyperprofile-based Computation Offloading for Mobile Edge Networks
In recent studies, researchers have developed various computation offloading
frameworks for bringing cloud services closer to the user via edge networks.
Specifically, an edge device needs to offload computationally intensive tasks
because of energy and processing constraints. These constraints present the
challenge of identifying which edge nodes should receive tasks to reduce
overall resource consumption. We propose a unique solution to this problem
which incorporates elements from Knowledge-Defined Networking (KDN) to make
intelligent predictions about offloading costs based on historical data. Each
server instance can be represented in a multidimensional feature space where
each dimension corresponds to a predicted metric. We compute features for a
"hyperprofile" and position nodes based on the predicted costs of offloading a
particular task. We then perform a k-Nearest Neighbor (kNN) query within the
hyperprofile to select nodes for offloading computation. This paper formalizes
our hyperprofile-based solution and explores the viability of using machine
learning (ML) techniques to predict metrics useful for computation offloading.
We also investigate the effects of using different distance metrics for the
queries. Our results show that various network metrics can be modeled
accurately with regression, and that there are circumstances where kNN queries
using Euclidean distance, as opposed to rectilinear distance, are more favorable.
Comment: 5 pages, NSF REU Site publication
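The node-selection step described above can be sketched as a plain kNN query over predicted-cost vectors. The node names, metrics, and the choice of the origin as the query point are assumptions for illustration; the paper's hyperprofile construction and ML predictors are not reproduced here.

```python
import math

# Minimal sketch of hyperprofile-based node selection (all names and
# metrics hypothetical): each edge node is a point whose coordinates are
# predicted offloading costs, and we pick the k nodes nearest the origin
# (i.e., lowest predicted cost) under a chosen distance metric.

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rectilinear(p, q):
    # Manhattan / L1 distance
    return sum(abs(a - b) for a, b in zip(p, q))

def knn(nodes, query, k, dist):
    """Return ids of the k nodes whose profiles are nearest to `query`."""
    return sorted(nodes, key=lambda n: dist(nodes[n], query))[:k]

# Hypothetical predicted metrics per node: (latency, energy, cpu_load)
profiles = {
    "edge-a": (4.0, 2.0, 0.6),
    "edge-b": (1.0, 5.0, 0.2),
    "edge-c": (2.0, 2.5, 0.4),
}

print(knn(profiles, (0.0, 0.0, 0.0), 2, euclidean))
print(knn(profiles, (0.0, 0.0, 0.0), 2, rectilinear))
```

Note that the two metrics can select different node sets for the same profiles, which is exactly why the choice of distance metric matters in the experiments above.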
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The overwhelming growth of stored data has spurred researchers to seek methods
that exploit it optimally, most of which face a response-time problem caused by
the sheer volume of data. Most existing solutions propose materialization as
the preferred approach; however, materialization alone cannot deliver real-time
answers. In this paper we propose a framework that identifies the barriers to
achieving real-time OLAP answers, which are widely needed in decision support
systems and data warehouses, and suggests solutions for overcoming them.
Efficient Multi-way Theta-Join Processing Using MapReduce
Multi-way Theta-join queries are powerful in describing complex relations and
therefore widely employed in real practices. However, existing solutions from
traditional distributed and parallel databases for multi-way Theta-join queries
cannot be easily extended to fit a shared-nothing distributed computing
paradigm, which is proven to be able to support OLAP applications over immense
data volumes. In this work, we study the problem of efficient processing of
multi-way Theta-join queries using MapReduce from a cost-effective perspective.
Although there have been some works using the (key,value) pair-based
programming model to support join operations, efficient processing of multi-way
Theta-join queries has never been fully explored. The substantial challenge
lies in, given a number of processing units (that can run Map or Reduce tasks),
mapping a multi-way Theta-join query to a number of MapReduce jobs and having
them executed in a well scheduled sequence, such that the total processing time
span is minimized. Our solution mainly includes two parts: 1) cost metrics for
both single MapReduce job and a number of MapReduce jobs executed in a certain
order; 2) the efficient execution of a chain-typed Theta-join with only one
MapReduce job. Compared with the query evaluation strategy proposed in [23]
and the widely adopted Pig Latin and Hive SQL solutions, our method achieves a
significant improvement in join processing efficiency.
Comment: VLDB201
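The core difficulty the abstract describes, mapping a theta-join's tuple-pair comparisons onto a fixed set of reducers, can be sketched with a toy simulation of a 1-Bucket-Theta-style partitioning. The grid shape, relation contents, and randomization here are illustrative assumptions; the paper's cost metrics and multi-job scheduling are not modeled.

```python
import random
from collections import defaultdict

# Toy simulation of a 1-Bucket-Theta-style theta-join on MapReduce
# (illustrative, not the paper's cost-optimized scheme). The R x S join
# matrix is tiled into a ROWS x COLS reducer grid: each R tuple goes to
# one random row (all columns), each S tuple to one random column (all
# rows), so every (r, s) pair meets at exactly one reducer, where the
# theta condition is evaluated.

R = [("r", x) for x in [1, 4, 7]]
S = [("s", y) for y in [3, 5]]
ROWS, COLS = 2, 2

def map_phase(tuples, rng):
    out = []
    for tag, val in tuples:
        if tag == "r":
            i = rng.randrange(ROWS)                      # one row, all cols
            out += [((i, j), (tag, val)) for j in range(COLS)]
        else:
            j = rng.randrange(COLS)                      # one col, all rows
            out += [((i, j), (tag, val)) for i in range(ROWS)]
    return out

def reduce_phase(groups, theta):
    result = []
    for _, vals in groups.items():
        rs = [v for t, v in vals if t == "r"]
        ss = [v for t, v in vals if t == "s"]
        result += [(a, b) for a in rs for b in ss if theta(a, b)]
    return sorted(result)

rng = random.Random(0)
groups = defaultdict(list)
for key, val in map_phase(R, rng) + map_phase(S, rng):
    groups[key].append(val)

print(reduce_phase(groups, lambda a, b: a < b))  # theta: R.x < S.y
```

Because each pair is examined by exactly one reducer, the output is correct and duplicate-free regardless of the random placement; the cost question the paper studies is how to shape and schedule such grids across multiple MapReduce jobs to minimize the total time span.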