Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
This paper presents a comprehensive literature review on applications of deep
reinforcement learning in communications and networking. Modern networks, e.g.,
Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks, are becoming
more decentralized and autonomous. In such networks, network entities need to
make decisions locally to maximize the network performance under uncertainty of
network environment. Reinforcement learning has been efficiently used to enable
the network entities to obtain the optimal policy including, e.g., decisions or
actions, given their states when the state and action spaces are small.
However, in complex and large-scale networks, the state and action spaces are
usually large, and the reinforcement learning may not be able to find the
optimal policy in reasonable time. Therefore, deep reinforcement learning, a
combination of reinforcement learning with deep learning, has been developed to
overcome the shortcomings. In this survey, we first give a tutorial of deep
reinforcement learning from fundamental concepts to advanced models. Then, we
review deep reinforcement learning approaches proposed to address emerging
issues in communications and networking. The issues include dynamic network
access, data rate control, wireless caching, data offloading, network security,
and connectivity preservation which are all important to next generation
networks such as 5G and beyond. Furthermore, we present applications of deep
reinforcement learning for traffic routing, resource sharing, and data
collection. Finally, we highlight important challenges, open issues, and future
research directions of applying deep reinforcement learning.
Comment: 37 pages, 13 figures, 6 tables, 174 reference papers
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
We introduce two tactics to attack agents trained by deep reinforcement
learning algorithms using adversarial examples, namely the strategically-timed
attack and the enchanting attack. In the strategically-timed attack, the
adversary aims at minimizing the agent's reward by only attacking the agent at
a small subset of time steps in an episode. Limiting the attack activity to
this subset helps prevent detection of the attack by the agent. We propose a
novel method to determine when an adversarial example should be crafted and
applied. In the enchanting attack, the adversary aims at luring the agent to a
designated target state. This is achieved by combining a generative model and a
planning algorithm: while the generative model predicts the future states, the
planning algorithm generates a preferred sequence of actions for luring the
agent. A sequence of adversarial examples is then crafted to lure the agent to
take the preferred sequence of actions. We apply the two tactics to the agents
trained by state-of-the-art deep reinforcement learning algorithms, including
DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much
reward as the uniform attack (i.e., attacking at every time step) does by
attacking the agent 4 times less often. Our enchanting attack lures the agent
toward designated target states with a more than 70% success rate. Videos are
available at http://yenchenlin.me/adversarial_attack_RL/
Comment: To appear at IJCAI 2017. Project website: http://yenchenlin.me/adversarial_attack_RL
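A minimal sketch of the timing idea behind the strategically-timed attack described above: perturb the agent only at steps where a trigger condition holds. The abstract does not state the trigger; the criterion used here (the gap between the most and least preferred action probabilities exceeding a threshold) and the names `should_attack`, `attack_steps`, and `threshold` are assumptions for illustration only, and no adversarial example is actually crafted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def should_attack(action_logits, threshold):
    """Assumed trigger: attack when the policy strongly prefers one action.

    The preference gap is max(prob) - min(prob); this criterion is a
    hypothetical choice, not taken from the abstract.
    """
    probs = softmax(action_logits)
    return (max(probs) - min(probs)) > threshold


def attack_steps(logits_per_step, threshold):
    """Return the indices of the time steps selected for an attack."""
    return [
        t for t, logits in enumerate(logits_per_step)
        if should_attack(logits, threshold)
    ]
```

With a higher `threshold`, fewer steps are selected, which is the lever that trades attack frequency against impact in this sketch.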
Two Can Play That Game: An Adversarial Evaluation of a Cyber-alert Inspection System
Cyber-security is an important societal concern. Cyber-attacks have increased
in numbers as well as in the extent of damage caused in every attack. Large
organizations operate a Cyber Security Operation Center (CSOC), which forms the
first line of cyber-defense. The inspection of cyber-alerts is a critical part
of CSOC operations. A recent work, in collaboration with the Army Research Lab, USA,
proposed a reinforcement learning (RL) based approach to prevent the
cyber-alert queue length from growing large and overwhelming the defender.
Given the potential deployment of this approach to CSOCs run by US defense
agencies, we perform a red team (adversarial) evaluation of this approach.
Further, with the recent attacks on learning systems, it is even more important
to test the limits of this RL approach. Towards that end, we learn an
adversarial alert generation policy that is a best response to the defender
inspection policy. Surprisingly, we find the defender policy to be quite robust
to the best response of the attacker. In order to explain this observation, we
extend the earlier RL model to a game model and show that there exist defender
policies that can be robust against any adversarial policy. We also derive a
competitive baseline from the game theory model and compare it to the RL
approach. However, we go further to exploit assumptions made in the MDP in the
RL model and discover an attacker policy that overwhelms the defender. We use a
double oracle approach to retrain the defender with episodes from this
discovered attacker policy. This made the defender robust to the discovered
attacker policy and no further harmful attacker policies were discovered.
Overall, the adversarial RL and double oracle approaches in RL are general
techniques that are applicable to other RL usage in adversarial environments.
Deep Neural Networks in High Frequency Trading
The ability to give precise and fast prediction for the price movement of
stocks is the key to profitability in High Frequency Trading. The main
objective of this paper is to propose a novel way of modeling the high
frequency trading problem using Deep Neural Networks at its heart and to argue
why Deep Learning methods can have a lot of potential in the field of High
Frequency Trading. The paper goes on to analyze the model's performance based
on its prediction accuracy as well as prediction speed across full-day trading
simulations.
Comment: Submitted in IEEE Transactions on Neural Networks and Learning
Systems. Copyright 2018 IEEE
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise
Recent developments have established the vulnerability of deep reinforcement
learning to policy manipulation attacks via intentionally perturbed inputs,
known as adversarial examples. In this work, we propose a technique for
mitigation of such attacks based on addition of noise to the parameter space of
deep reinforcement learners during training. We experimentally verify the
effect of parameter-space noise in reducing the transferability of adversarial
examples, and demonstrate the promising performance of this technique in
mitigating the impact of whitebox and blackbox attacks at both test and
training times.
Comment: arXiv admin note: substantial text overlap with arXiv:1701.04143,
arXiv:1712.0934
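As a rough illustration of what adding noise to the parameter space means, the sketch below perturbs the weights of a toy linear Q-function with Gaussian noise before choosing an action. The linear Q-function, the noise scale `sigma`, and all function names are assumptions made for this example; the abstract concerns deep Q-networks and does not specify these details.

```python
import random


def perturb_parameters(params, sigma, rng):
    """Return a copy of params with i.i.d. Gaussian noise (std sigma) added."""
    return [w + rng.gauss(0.0, sigma) for w in params]


def q_values(weights, features):
    """Toy linear Q-function: one weight vector per action (an assumption)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def act_with_parameter_noise(weights, features, sigma, rng):
    """Act greedily with respect to a noise-perturbed copy of the weights."""
    noisy = [perturb_parameters(row, sigma, rng) for row in weights]
    q = q_values(noisy, features)
    return max(range(len(q)), key=lambda a: q[a])
```

Unlike noise added to the chosen action, the perturbation here is applied to the weights themselves, so the original `weights` are left unchanged and each call samples a fresh perturbed policy.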
Towards Understanding Chinese Checkers with Heuristics, Monte Carlo Tree Search, and Deep Reinforcement Learning
The game of Chinese Checkers is a challenging traditional board game of
perfect information that differs from other traditional games in two main
aspects: first, unlike Chess, all checkers remain indefinitely in the game and
hence the branching factor of the search tree does not decrease as the game
progresses; second, unlike Go, there are also no upper bounds on the depth of
the search tree since repetitions and backward movements are allowed.
Therefore, even in a restricted game instance, the state-space of the game can
still be unbounded, making it challenging for a computer program to excel. In
this work, we present an approach that effectively combines the use of
heuristics, Monte Carlo tree search, and deep reinforcement learning for
building a Chinese Checkers agent without the use of any human game-play data.
Experimental results show that our agent is competent under different scenarios
and reaches the level of experienced human players.
Deep Reinforcement learning for real autonomous mobile robot navigation in indoor environments
Deep Reinforcement Learning has been successfully applied in various computer
games [8]. However, it is still rarely used in real-world applications,
especially for the navigation and continuous control of real mobile robots
[13]. Previous approaches lack safety and robustness and/or need a structured
environment. In this paper we present our proof of concept for autonomous
self-learning robot navigation in an unknown environment for a real robot
without a map or planner. The input for the robot is only the fused data from a
2D laser scanner and an RGB-D camera as well as the orientation to the goal. The
map of the environment is unknown. The output actions of an Asynchronous
Advantage Actor-Critic network (GA3C) are the linear and angular velocities for
the robot. The navigator/controller network is pretrained in a high-speed,
parallel, and self-implemented simulation environment to speed up the learning
process and then deployed to the real robot. To avoid overfitting, we train
relatively small networks, and we add random Gaussian noise to the input laser
data. The sensor data fusion with the RGB-D camera allows the robot to navigate
in real environments with real 3D obstacle avoidance and without the need to
fit the environment to the sensory capabilities of the robot. To further
increase the robustness, we train on environments of varying difficulties and
run 32 training instances simultaneously. Video: supplementary File / YouTube,
Code: GitHub
Comment: 7 pages, report
Adaptive Power System Emergency Control using Deep Reinforcement Learning
Power system emergency control is generally regarded as the last safety net
for grid security and resiliency. Existing emergency control schemes are
usually designed off-line based on either the conceived "worst" case scenario
or a few typical operation scenarios. These schemes are facing significant
adaptiveness and robustness issues as increasing uncertainties and variations
occur in modern electrical grids. To address these challenges, for the first
time, this paper developed novel adaptive emergency control schemes using deep
reinforcement learning (DRL), by leveraging the high-dimensional feature
extraction and non-linear generalization capabilities of DRL for complex power
systems. Furthermore, an open-source platform named RLGC has been designed for
the first time to assist the development and benchmarking of DRL algorithms for
power system control. Details of the platform and DRL-based emergency control
schemes for generator dynamic braking and under-voltage load shedding are
presented. Extensive case studies performed on both the two-area four-machine
system and the IEEE 39-Bus system have demonstrated the excellent performance and
robustness of the proposed schemes.
Comment: 12 pages
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger
Recent developments have established the vulnerability of deep Reinforcement
Learning (RL) to policy manipulation attacks via adversarial perturbations. In
this paper, we investigate the robustness and resilience of deep RL to
training-time and test-time attacks. Through experimental results, we
demonstrate that under noncontiguous training-time attacks, Deep Q-Network
(DQN) agents can recover and adapt to the adversarial conditions by reactively
adjusting the policy. Our results also show that policies learned under
adversarial perturbations are more robust to test-time attacks. Furthermore, we
compare the performance of ε-greedy and parameter-space noise
exploration methods in terms of robustness and resilience against adversarial
perturbations.
Comment: arXiv admin note: text overlap with arXiv:1701.0414
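For readers unfamiliar with the ε-greedy exploration mentioned above, a minimal sketch of the standard rule follows: with probability ε pick a uniformly random action, otherwise pick the action with the highest estimated value. The function name and the list-based Q-value representation are illustrative choices, not taken from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Select an action index from a list of estimated action values.

    With probability epsilon, explore by sampling a uniformly random action;
    otherwise exploit by returning the index of the largest Q-value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Here the randomness enters at the action level, in contrast to parameter-space noise, which perturbs the weights that produce the Q-values.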
Deep Model Predictive Control with Online Learning for Complex Physical Systems
The control of complex systems is of critical importance in many branches of
science, engineering, and industry. Controlling an unsteady fluid flow is
particularly important, as flow control is a key enabler for technologies in
energy (e.g., wind, tidal, and combustion), transportation (e.g., planes,
trains, and automobiles), security (e.g., tracking airborne contamination), and
health (e.g., artificial hearts and artificial respiration). However, the
high-dimensional, nonlinear, and multi-scale dynamics make real-time feedback
control infeasible. Fortunately, these high-dimensional systems exhibit
dominant, low-dimensional patterns of activity that can be exploited for
effective control in the sense that knowledge of the entire state of a system
is not required. Advances in machine learning have the potential to
revolutionize flow control given its ability to extract principled, low-rank
feature spaces characterizing such complex systems. We present a novel deep
learning model predictive control (DeepMPC) framework that exploits low-rank
features of the flow in order to achieve considerable improvements to control
performance. Instead of predicting the entire fluid state, we use a recurrent
neural network (RNN) to accurately predict the control relevant quantities of
the system. The RNN is then embedded into an MPC framework to construct a
feedback loop, and incoming sensor data is used to perform online updates to
improve prediction accuracy. The results are validated using varying fluid flow
examples of increasing complexity.
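The receding-horizon loop that an MPC framework builds around a learned predictor can be sketched as below. This is a generic illustration under stated assumptions, not the DeepMPC method: `predict` is a hypothetical stand-in for the trained RNN, the candidate controls are a small discrete set searched exhaustively, and the online model updates described in the abstract are omitted.

```python
from itertools import product


def plan_first_action(state, predict, cost, actions, horizon):
    """One MPC step: return the first control of the cheapest sequence.

    predict(state, u) is the (assumed) learned model of the next state and
    cost(state, u) scores a predicted state; both are supplied by the caller.
    """
    best_cost, best_first = float("inf"), None
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for u in seq:
            s = predict(s, u)      # roll the model forward
            total += cost(s, u)    # accumulate predicted cost
        if total < best_cost:
            best_cost, best_first = total, seq[0]
    return best_first


def run_closed_loop(state, step, predict, cost, actions, horizon, n_steps):
    """Feedback loop: replan from the newly measured state at every step."""
    trajectory = [state]
    for _ in range(n_steps):
        u = plan_first_action(state, predict, cost, actions, horizon)
        state = step(state, u)     # apply only the first planned control
        trajectory.append(state)
    return trajectory
```

For example, with `predict = lambda s, u: s + u` and `cost = lambda s, u: s * s`, the planner drives a scalar state toward zero. Exhaustive search is only practical for tiny action sets and short horizons; a real controller would use an optimizer instead.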