Model-based Deep Reinforcement Learning for Dynamic Portfolio Optimization
Dynamic portfolio optimization is the process of sequentially allocating
wealth to a collection of assets in some consecutive trading periods, based on
investors' return-risk profile. Automating this process with machine learning
remains a challenging problem. Here, we design a deep reinforcement learning
(RL) architecture with an autonomous trading agent that makes investment
decisions and takes actions periodically, guided by a global objective. In
particular, rather than relying on a purely model-free RL agent, we
train our trading agent using a novel RL architecture consisting of an infused
prediction module (IPM), a generative adversarial data augmentation module
(DAM) and a behavior cloning module (BCM). Our model-based approach works with
both on-policy and off-policy RL algorithms. We further design a back-testing
and execution engine that interacts with the RL agent in real time. Using
historical {\em real} financial market data, we simulate trading with practical
constraints, and demonstrate that our proposed model is robust, profitable and
risk-sensitive, as compared to baseline trading strategies and model-free RL
agents from prior work.
DRAG: Deep Reinforcement Learning Based Base Station Activation in Heterogeneous Networks
Heterogeneous Network (HetNet), where Small cell Base Stations (SBSs) are
densely deployed to offload traffic from macro Base Stations (BSs), is
identified as a key solution to meet the unprecedented mobile traffic demand.
The high density of SBSs is designed for peak traffic hours, and the SBSs
consume an unnecessarily large amount of energy during off-peak times. In this paper, we
propose a deep reinforcement-learning based SBS activation strategy that
activates the optimal subset of SBSs to significantly lower the energy
consumption without compromising the quality of service. In particular, we
formulate the SBS on/off switching problem into a Markov Decision Process that
can be solved by Actor Critic (AC) reinforcement learning methods. To avoid
prohibitively high computational and storage costs of conventional
tabular-based approaches, we propose to use deep neural networks to approximate
the policy and value functions in the AC approach. Moreover, to expedite the
training process, we adopt a Deep Deterministic Policy Gradient (DDPG) approach
together with a novel action refinement scheme. Through extensive numerical
simulations, we show that the proposed scheme greatly outperforms the existing
methods in terms of both energy efficiency and computational efficiency. We
also show that the proposed scheme can scale to large systems with polynomial
complexity in both storage and computation.
Comment: 12 pages, 13 figures.
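The abstract does not specify the exact action refinement rule, but the core problem it addresses is standard: a DDPG actor emits a continuous vector, while the SBS on/off decision is binary. A minimal sketch of one plausible refinement scheme (both the thresholding and top-k rules here are illustrative assumptions, not the paper's method):

```python
import numpy as np

def refine_action(raw_action, k=None, threshold=0.5):
    """Map a DDPG actor's continuous output in [0, 1]^N to a binary
    SBS on/off vector.  Illustrative only: the paper's exact
    refinement rule is not given in the abstract."""
    raw_action = np.asarray(raw_action, dtype=float)
    if k is not None:
        # Keep only the k highest-scoring SBSs active.
        on = np.zeros(raw_action.shape, dtype=int)
        on[np.argsort(raw_action)[-k:]] = 1
        return on
    # Otherwise switch each SBS on or off independently by threshold.
    return (raw_action >= threshold).astype(int)

raw = [0.9, 0.2, 0.65, 0.4]
print(refine_action(raw))        # threshold rule -> [1 0 1 0]
print(refine_action(raw, k=1))   # top-1 rule     -> [1 0 0 0]
```

Either rule keeps the actor differentiable while the environment only ever sees valid discrete activation patterns.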
A Model-Based Reinforcement Learning Approach for a Rare Disease Diagnostic Task
In this work, we present our various contributions to the objective of
building a decision support tool for the diagnosis of rare diseases. Our goal
is to achieve a state of knowledge where the uncertainty about the patient's
disease is below a predetermined threshold. We aim to reach such states while
minimizing the average number of medical tests to perform. In doing so, we take
into account the need, in many medical applications, to avoid, as much as
possible, any misdiagnosis. To solve this optimization task, we investigate
several reinforcement learning algorithms and make them operable in our
high-dimensional and sparse-reward setting. We also present a way to combine
expert knowledge, expressed as conditional probabilities, with real clinical
data. This is crucial because the scarcity of data in the field of rare
diseases prevents any approach based solely on clinical data. Finally we show
that it is possible to integrate the ontological information about symptoms
while remaining within our probabilistic reasoning. This enables our decision
support tool to process information given at different levels of precision by the user.
Comment: 24 pages.
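The probabilistic reasoning described above, where each medical test reduces uncertainty about the patient's disease, can be illustrated with a standard Bayesian belief update. The disease names and probabilities below are toy values, not from the paper:

```python
def update_beliefs(prior, likelihoods, positive):
    """One Bayesian update after a single medical test.
    `prior` maps disease -> probability; `likelihoods` maps
    disease -> P(test positive | disease).  Illustrative sketch."""
    post = {}
    for disease, p in prior.items():
        like = likelihoods[disease] if positive else 1.0 - likelihoods[disease]
        post[disease] = p * like
    z = sum(post.values())                 # normalize to a distribution
    return {d: v / z for d, v in post.items()}

prior = {"disease_A": 0.5, "disease_B": 0.5}
lik = {"disease_A": 0.9, "disease_B": 0.1}   # P(positive | disease)
post = update_beliefs(prior, lik, positive=True)
print(round(post["disease_A"], 2))  # -> 0.9
```

An RL policy in this setting would choose which test to run next so that the posterior entropy falls below the predetermined threshold in as few tests as possible.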
Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement learning
Variable speed limits (VSL) control is a flexible way to improve traffic
conditions, increase safety, and reduce emissions. There is an emerging trend of
using reinforcement learning techniques for VSL control, and recent studies have
shown promising results. Currently, deep learning is enabling reinforcement
learning to develop autonomous control agents for problems that were
previously intractable. In this paper, we propose a more effective deep
reinforcement learning (DRL) model for differential variable speed limits
(DVSL) control, in which the dynamic and different speed limits among lanes can
be imposed. The proposed DRL models use a novel actor-critic architecture which
can learn a large number of discrete speed limits in a continuous action space.
Different reward signals, e.g., total travel time, bottleneck speed, emergency
braking, and vehicular emissions, are used to train the DVSL controller, and
comparisons between these reward signals are conducted. We test the proposed
DRL-based DVSL controllers on a simulated freeway recurrent bottleneck. Results
show that the proposed method improves efficiency and safety and reduces
emissions. We also show some interesting findings through the visualization of
the control policies generated by the DRL models.
Comment: 24 pages, 7 figures, 1 table.
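Learning "a large number of discrete speed limits in a continuous action space" suggests the actor outputs a continuous value per lane that is then snapped to a discrete limit. A minimal sketch of one such mapping (the candidate limits and the rounding rule are assumptions for illustration):

```python
import numpy as np

# Candidate discrete speed limits (km/h); values are illustrative.
SPEED_LIMITS = np.array([40, 50, 60, 70, 80, 90, 100])

def actor_output_to_limits(actor_out):
    """Snap a continuous actor output in [0, 1] per lane to the nearest
    candidate limit, so one continuous dimension per lane can select
    among many discrete speed limits."""
    actor_out = np.clip(np.asarray(actor_out, dtype=float), 0.0, 1.0)
    idx = np.round(actor_out * (len(SPEED_LIMITS) - 1)).astype(int)
    return SPEED_LIMITS[idx]

# Three lanes with different actor outputs.
print(actor_output_to_limits([0.0, 0.5, 1.0]))  # -> [ 40  70 100]
```

This keeps the action space continuous for the actor-critic updates while the simulated freeway only ever receives valid per-lane limits.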
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
This paper presents a comprehensive literature review on applications of deep
reinforcement learning in communications and networking. Modern networks, e.g.,
Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks, become
more decentralized and autonomous. In such networks, network entities need to
make decisions locally to maximize the network performance under uncertainty of
the network environment. Reinforcement learning has been used efficiently to
enable the network entities to obtain the optimal policy, e.g., the best
decisions or actions given their states, when the state and action spaces are small.
However, in complex and large-scale networks, the state and action spaces are
usually large, and reinforcement learning may not be able to find the
optimal policy in a reasonable time. Therefore, deep reinforcement learning, a
combination of reinforcement learning with deep learning, has been developed to
overcome these shortcomings. In this survey, we first give a tutorial on deep
reinforcement learning from fundamental concepts to advanced models. Then, we
review deep reinforcement learning approaches proposed to address emerging
issues in communications and networking. The issues include dynamic network
access, data rate control, wireless caching, data offloading, network security,
and connectivity preservation which are all important to next generation
networks such as 5G and beyond. Furthermore, we present applications of deep
reinforcement learning for traffic routing, resource sharing, and data
collection. Finally, we highlight important challenges, open issues, and future
research directions for applying deep reinforcement learning.
Comment: 37 pages, 13 figures, 6 tables, 174 reference papers.
Reinforcement Learning
Reinforcement learning (RL) is a general framework for adaptive control,
which has proven to be efficient in many domains, e.g., board games, video
games or autonomous vehicles. In such problems, an agent faces a sequential
decision-making problem where, at every time step, it observes its state,
performs an action, receives a reward and moves to a new state. An RL agent
learns by trial and error a good policy (or controller) based on observations
and numeric reward feedback on the previously performed action. In this
chapter, we present the basic framework of RL and recall the two main families
of approaches that have been developed to learn a good policy. The first one,
which is value-based, consists in estimating the value of an optimal policy,
value from which a policy can be recovered, while the other, called policy
search, directly works in a policy space. Actor-critic methods can be seen as a
policy search technique where the policy value that is learned guides the
policy improvement. Besides, we give an overview of some extensions of the
standard RL framework, notably when risk-averse behavior needs to be taken into
account or when rewards are not available or not known.
Comment: Chapter in "A Guided Tour of Artificial Intelligence Research",
Springer.
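The value-based family described above can be made concrete with tabular Q-learning on a toy chain MDP: the agent estimates the value of the optimal policy, then recovers the policy greedily from the learned values. The environment and hyperparameters below are illustrative, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES = 5
ACTIONS = (-1, +1)                       # 0 = step left, 1 = step right
Q = np.zeros((N_STATES, len(ACTIONS)))   # tabular action-value estimates
alpha, gamma = 0.5, 0.9                  # learning rate, discount factor

def step(s, a):
    """Deterministic chain: reward 1 only at the rightmost (goal) state."""
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return s2, float(s2 == N_STATES - 1)

for _ in range(2000):                       # episodes
    s = int(rng.integers(N_STATES - 1))     # random non-goal start state
    for _ in range(20):                     # time steps per episode
        a = int(rng.integers(2))            # uniform exploration; Q-learning
        s2, r = step(s, a)                  # is off-policy, so this suffices
        # Temporal-difference update toward the Bellman optimality target.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        if r:                               # goal reached: episode ends
            break
        s = s2

# Recover the greedy policy from the learned values.
greedy = [int(Q[s].argmax()) for s in range(N_STATES - 1)]
print(greedy)  # -> [1, 1, 1, 1]: always step right, toward the goal
```

Policy-search methods, by contrast, would parameterize and improve the policy directly; actor-critic methods combine both, using a learned value function to guide the policy update.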
Generating Text with Deep Reinforcement Learning
We introduce a novel schema for sequence to sequence learning with a Deep
Q-Network (DQN), which decodes the output sequence iteratively. The aim here is
to enable the decoder to first tackle easier portions of the sequences, and
then turn to cope with difficult parts. Specifically, in each iteration, an
encoder-decoder Long Short-Term Memory (LSTM) network is employed to
automatically create, from the input sequence, features that represent the
internal states of the DQN and to formulate a list of potential actions for it.
Take rephrasing a natural sentence as an example: this list can contain ranked
candidate words. Next, the DQN learns to decide which action (e.g., word) will be
selected from the list to modify the current decoded sequence. The newly
modified output sequence is subsequently used as the input to the DQN for the
next decoding iteration. In each iteration, we also bias the reinforcement
learning's attention to explore sequence portions that were previously
difficult to decode. For evaluation, the proposed strategy was trained to
decode ten thousand natural sentences. Our experiments indicate that, when
compared to a left-to-right greedy beam search LSTM decoder, the proposed
method performed competitively well when decoding sentences from the training
set, but significantly outperformed the baseline when decoding unseen
sentences, in terms of the BLEU score obtained.
Comment: Accepted to the NIPS2015 Deep Reinforcement Learning Workshop.
Intelligent Residential Energy Management System using Deep Reinforcement Learning
The rising demand for electricity and its essential nature in today's world
calls for intelligent home energy management (HEM) systems that can reduce
energy usage. This involves shifting loads from peak hours of the day, when
energy consumption is at its highest, to leaner off-peak periods, when
energy consumption is relatively lower, thereby reducing the system's peak load
demand, which consequently results in lower energy bills and an improved
load demand profile. This work introduces a novel way to develop a learning
system that can learn from experience to shift loads from one time instance to
another and achieve the goal of minimizing the aggregate peak load. This paper
proposes a Deep Reinforcement Learning (DRL) model for demand response where
the virtual agent learns the task as humans do. The agent gets feedback for
every action it takes in the environment; this feedback drives the agent
to learn about the environment and take smarter steps later in its
learning stages. Our method outperformed the state-of-the-art mixed-integer
linear programming (MILP) approach for load peak reduction. The authors have
also designed an agent that learns to minimize both consumers' electricity
bills and the utility's system peak load demand simultaneously. The proposed
model was analyzed with loads from five different residential consumers; when
time-shiftable loads are handled by the proposed method, it increases each
consumer's monthly savings by drastically reducing their electricity bill while
minimizing the peak load on the system.
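The load-shifting objective above can be made concrete with a simple non-learning baseline: place a time-shiftable load where it raises the system peak the least. This greedy sketch is illustrative only; the paper replaces such hand-crafted rules with a learned DRL policy:

```python
def schedule_shiftable_load(base_profile, load_kw, duration):
    """Greedy baseline: choose the start hour for a time-shiftable load
    of `load_kw` running `duration` consecutive hours so the resulting
    peak demand is minimized.  Illustrative, not the paper's method."""
    best_start, best_peak = 0, float("inf")
    for start in range(len(base_profile) - duration + 1):
        trial = list(base_profile)
        for t in range(start, start + duration):
            trial[t] += load_kw            # add the load to this window
        if max(trial) < best_peak:         # keep the flattest schedule
            best_start, best_peak = start, max(trial)
    return best_start, best_peak

profile = [2.0, 5.0, 6.0, 3.0, 1.0, 1.5]   # hourly demand in kW (toy data)
print(schedule_shiftable_load(profile, load_kw=2.0, duration=2))  # -> (3, 6.0)
```

A DRL agent improves on this by also accounting for time-varying prices and consumer constraints, learning the trade-off between bill reduction and peak shaving from feedback rather than from an explicit rule.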
Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back
Sparse rewards present a difficult problem in reinforcement learning and may
be inevitable in certain domains with complex dynamics such as real-world
robotics. Hindsight Experience Replay (HER) is a recent replay memory
development that allows agents to learn in sparse settings by altering memories
to show them as successful even though they may not be. While, empirically, HER
has shown some success, it does not provide guarantees around the makeup of
samples drawn from an agent's replay memory. This may result in minibatches
that contain only memories with zero-valued rewards or agents learning an
undesirable policy that completes HER-adjusted goals instead of the actual
goal.
In this paper, we introduce Or Your Money Back (OYMB), a replay memory
sampler designed to work with HER. OYMB improves training efficiency in sparse
settings by providing a direct interface to the agent's replay memory that
allows for control over minibatch makeup, as well as a preferential lookup
scheme that prioritizes real-goal memories before HER-adjusted memories. We
test our approach on five tasks across three unique environments. Our results
show that using HER in combination with OYMB outperforms using HER alone and
leads to agents that learn to complete the real goal more quickly.
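The key idea, controlling minibatch makeup and preferring real-goal memories over HER-adjusted ones, can be sketched as a replay sampler. The field names and the `real_frac` parameter are illustrative assumptions, not the paper's interface:

```python
import random

def oymb_sample(replay, batch_size, real_frac=0.5, seed=0):
    """Sketch of an OYMB-style minibatch sampler: memories tagged as
    real-goal successes are drawn preferentially before HER-adjusted
    ones, giving direct control over minibatch makeup.  The 'kind'
    field and real_frac knob are illustrative, not from the paper."""
    rng = random.Random(seed)
    real = [m for m in replay if m["kind"] == "real"]
    her = [m for m in replay if m["kind"] == "her"]
    # Take as many real-goal memories as available, up to the quota.
    n_real = min(len(real), int(batch_size * real_frac))
    batch = rng.sample(real, n_real)
    # Fill the remainder of the minibatch with HER-adjusted memories.
    batch += rng.sample(her, min(len(her), batch_size - n_real))
    return batch

replay = ([{"kind": "real", "r": 1.0}] * 3
          + [{"kind": "her", "r": 1.0}] * 20)
batch = oymb_sample(replay, batch_size=8)
print(sum(m["kind"] == "real" for m in batch))  # -> 3 real memories included
```

Unlike uniform sampling, this guarantees that scarce real-goal transitions appear in every minibatch, which is the failure mode of plain HER the paper targets.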
Personalized Cancer Chemotherapy Schedule: a numerical comparison of performance and robustness in model-based and model-free scheduling methodologies
Reinforcement learning algorithms are gaining popularity in fields in which
optimal scheduling is important, and oncology is not an exception. The complex
and uncertain dynamics of cancer limit the performance of traditional
model-based scheduling strategies like Optimal Control. Motivated by the recent
success of model-free Deep Reinforcement Learning (DRL) in challenging control
tasks and in the design of medical treatments, we use Deep Q-Network (DQN) and
Deep Deterministic Policy Gradient (DDPG) to design a personalized cancer
chemotherapy schedule. We show that both of them succeed in the task and
outperform the Optimal Control solution in the presence of uncertainty.
Furthermore, we show that DDPG can exterminate cancer more efficiently than DQN
presumably due to its continuous action space. Finally, we provide some insight
regarding the amount of samples required for the training.Comment: Minor change