Experience-driven Networking: A Deep Reinforcement Learning based Approach
Modern communication networks have become very complicated and highly
dynamic, which makes them hard to model, predict and control. In this paper, we
develop a novel experience-driven approach that can learn to control a
communication network well from its own experience rather than from an
accurate mathematical model, just as a human learns a new skill (such as
driving or swimming). Specifically, we propose, for the first time, to leverage
emerging Deep Reinforcement Learning (DRL) for enabling model-free control in
communication networks; and present a novel and highly effective DRL-based
control framework, DRL-TE, for a fundamental networking problem: Traffic
Engineering (TE). The proposed framework maximizes a widely-used utility
function by jointly learning the network environment and its dynamics, and making
decisions under the guidance of powerful Deep Neural Networks (DNNs). We
propose two new techniques, TE-aware exploration and actor-critic-based
prioritized experience replay, to optimize the general DRL framework
particularly for TE. To validate and evaluate the proposed framework, we
implemented it in ns-3, and tested it comprehensively with both representative
and randomly generated network topologies. Extensive packet-level simulation
results show that 1) compared to several widely-used baseline methods, DRL-TE
significantly reduces end-to-end delay and consistently improves the network
utility, while offering better or comparable throughput; 2) DRL-TE is robust to
network changes; and 3) DRL-TE consistently outperforms a state-of-the-art DRL
method for continuous control, Deep Deterministic Policy Gradient (DDPG),
which on its own does not offer satisfactory performance.

Comment: 9 pages, 12 figures, paper is accepted as a conference paper at IEEE
Infocom 201
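One of the two techniques the abstract names, prioritized experience replay, can be illustrated with a minimal sketch. This is a hypothetical simplification for intuition only: the paper's actor-critic-based variant derives priorities from critic feedback, whereas the stand-in below uses plain TD-error magnitude, and all names (`PrioritizedReplayBuffer`, `alpha`) are illustrative, not from the paper.

```python
import random

class PrioritizedReplayBuffer:
    """Sketch of prioritized experience replay: transitions with larger
    errors are sampled more often than in uniform replay."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.buffer = []            # stored (state, action, reward, next_state)
        self.priorities = []

    def add(self, transition, priority=1.0):
        # Evict the oldest transition once the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        return random.choices(self.buffer, weights=weights, k=batch_size)

    def update_priority(self, index, td_error):
        # Larger recent errors make a transition more likely to be replayed.
        self.priorities[index] = abs(td_error) + 1e-6
```

A DRL training loop would call `sample` for each minibatch and `update_priority` after each critic update.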
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load.
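The kind of decision loop Decima automates can be sketched as: at each scheduling event, enumerate the job stages whose DAG dependencies have completed, score each one, and dispatch the highest-scoring stage. In Decima the scoring function is a learned neural policy over graph embeddings; the `score` below is a hand-written stand-in (a shortest-stage-first heuristic), and the job/stage data layout is an assumption made for this sketch, not the paper's representation.

```python
def runnable_stages(jobs):
    """Yield (job, stage) pairs whose dependencies have all completed."""
    for job in jobs:
        for stage, deps in job["dag"].items():
            if stage not in job["done"] and all(d in job["done"] for d in deps):
                yield job, stage

def score(job, stage):
    # Placeholder for a learned policy network's output; here we simply
    # favor the stage with the least remaining work.
    return -job["work"][stage]

def schedule_next(jobs):
    """Pick the runnable stage with the highest score, or None if idle."""
    candidates = list(runnable_stages(jobs))
    if not candidates:
        return None
    return max(candidates, key=lambda js: score(*js))
```

Swapping the heuristic `score` for a trained network output is what turns this fixed policy into a workload-specific, learned one.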
Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning
Recent advances in combining deep neural network architectures with
reinforcement learning techniques have shown promising results in
solving complex control problems with high dimensional state and action spaces.
Inspired by these successes, in this paper, we build two kinds of reinforcement
learning algorithms: deep policy-gradient and value-function based agents which
can predict the best possible traffic signal for a traffic intersection. At
each time step, these adaptive traffic light control agents receive a snapshot
of the current state of a graphical traffic simulator and produce control
signals. The policy-gradient based agent maps its observation directly to the
control signal, whereas the value-function based agent first estimates values
for all legal control signals and then selects the control action with the
highest value. Our methods show promising results in a traffic
network simulated in the SUMO traffic simulator, without suffering from
instability issues during the training process.
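The contrast between the two agents comes down to how an action is chosen from the observation. The sketch below shows only that selection step, under assumed names: `PHASES` is a hypothetical two-phase signal set, and the logits/Q-values would in practice come from the agents' neural networks, which are omitted here.

```python
import math
import random

PHASES = ["NS_green", "EW_green"]   # hypothetical traffic-signal phases

def policy_gradient_act(logits):
    """Policy-gradient agent: map the observation to a probability
    distribution over control signals and sample from it."""
    # Softmax with max-subtraction for numerical stability.
    exps = [math.exp(l - max(logits)) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(PHASES)), weights=probs, k=1)[0]

def value_based_act(q_values):
    """Value-function agent: estimate a value for every legal control
    signal, then act greedily on the highest value."""
    return max(range(len(PHASES)), key=lambda a: q_values[a])
```

The stochastic sampling in `policy_gradient_act` gives the policy-gradient agent built-in exploration, while the value-based agent typically bolts exploration on (e.g. epsilon-greedy) around its greedy `argmax`.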