Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework
Machine learning applied to architecture design presents a promising
opportunity with broad applications. Recent deep reinforcement learning (DRL)
techniques, in particular, enable efficient exploration in vast design spaces
where conventional design strategies may be inadequate. This paper proposes a
novel deep reinforcement learning framework, taking routerless networks-on-chip
(NoC) as an evaluation case study. The new framework resolves the shortcomings
of prior design approaches, which are either unreliable due to random searches
or inflexible due to severe design space restrictions. The framework learns
(near-)optimal loop placement for routerless NoCs with various design
constraints. A deep neural network is developed using parallel threads that
efficiently explore the immense routerless NoC design space with a Monte Carlo
search tree. Experimental results show that, compared with a conventional mesh,
the proposed DRL routerless design achieves a
3.25x increase in throughput, 1.6x reduction in packet latency, and 5x
reduction in power. Compared with the state-of-the-art routerless NoC, DRL
achieves a 1.47x increase in throughput, 1.18x reduction in packet latency, and
1.14x reduction in average hop count, albeit with slightly more power overhead.
Comment: 13 pages, 15 figures
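The search side of such a framework typically selects which design to expand next with the UCT rule of Monte Carlo tree search. A minimal sketch of that rule is below; the child ids ("loop_a", ...) and their (total value, visit count) statistics are invented for illustration, not taken from the paper.

```python
import math

# Hedged sketch of UCT child selection in Monte Carlo tree search.
# Children map an id to (total_value, visits); all numbers are toy data.
def uct_score(total_value, visits, parent_visits, c=1.4):
    if visits == 0:
        return float("inf")          # explore unvisited children first
    exploit = total_value / visits   # average design quality seen so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

children = {"loop_a": (3.0, 4), "loop_b": (1.0, 1), "loop_c": (0.0, 0)}
parent_visits = 5

best = max(children, key=lambda k: uct_score(*children[k], parent_visits))
# "loop_c" is selected: unvisited children score infinitely high.
```

The exploration constant `c` trades off revisiting good placements against trying rarely visited ones; 1.4 (≈√2) is a conventional default, not the paper's setting.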
Distributed Deep Q-Learning
We propose a distributed deep learning model to successfully learn control
policies directly from high-dimensional sensory input using reinforcement
learning. The model is based on the deep Q-network, a convolutional neural
network trained with a variant of Q-learning. Its input is raw pixels and its
output is a value function estimating future rewards from taking an action
given a system state. To distribute the deep Q-network training, we adapt the
DistBelief software framework to the context of efficiently training
reinforcement learning agents. As a result, the method is completely
asynchronous and scales well with the number of machines. We demonstrate that
the deep Q-network agent, receiving only the pixels and the game score as
inputs, was able to achieve reasonable success on a simple game with minimal
parameter tuning.
Comment: Updated figure of distributed deep learning architecture, updated content throughout paper including dealing with minor grammatical issues and highlighting differences of our paper with respect to prior work. arXiv admin note: text overlap with arXiv:1312.5602 by other authors
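The Q-learning variant underneath a deep Q-network can be sketched with a table standing in for the network; the 3-state toy MDP, learning rate, and reward below are illustrative only and omit the distributed training entirely.

```python
# Hedged sketch of the Q-learning update a deep Q-network is trained
# with: y = r + gamma * max_a' Q(s', a'). A dict stands in for the
# network; the toy MDP and constants are invented.
GAMMA = 0.99   # discount factor
ALPHA = 0.1    # learning rate

Q = {(s, a): 0.0 for s in range(3) for a in range(2)}

def td_update(s, a, r, s_next, done):
    """Move Q(s, a) toward the one-step bootstrapped target."""
    target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in range(2))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Repeatedly observing (state 0, action 1) -> reward 1.0, next state 1
# drives Q[(0, 1)] toward 1.0, since state 1's values stay at zero here.
for _ in range(100):
    td_update(0, 1, 1.0, 1, done=False)
```

In the distributed setting many such updates run asynchronously against shared parameters; this sketch shows only the single-worker rule.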
Policy Distillation
Policies for complex visual tasks have been successfully learned with deep
reinforcement learning, using an approach called deep Q-networks (DQN), but
relatively large (task-specific) networks and extensive training are needed to
achieve good performance. In this work, we present a novel method called policy
distillation that can be used to extract the policy of a reinforcement learning
agent and train a new network that performs at the expert level while being
dramatically smaller and more efficient. Furthermore, the same method can be
used to consolidate multiple task-specific policies into a single policy. We
demonstrate these claims using the Atari domain and show that the multi-task
distilled agent outperforms the single-task teachers as well as a
jointly-trained DQN agent.
Comment: Submitted to ICLR 201
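The distillation objective can be sketched as matching a student softmax to a sharpened softmax over the teacher's Q-values; the Q-values, temperature, and step size below are toy choices, and the paper's networks are reduced to bare logit vectors.

```python
import math

# Hedged sketch of policy distillation: minimize KL between a softened
# softmax over the teacher's Q-values and the student's policy.
def softmax(xs, tau=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_q = [2.0, 0.5, -1.0]              # teacher's Q-values for 3 actions
target = softmax(teacher_q, tau=0.01)     # low temperature sharpens the target

student_logits = [0.0, 0.0, 0.0]
for _ in range(500):
    p = softmax(student_logits)
    grad = [pi - ti for pi, ti in zip(p, target)]   # d KL / d logits
    student_logits = [l - 0.5 * g for l, g in zip(student_logits, grad)]

distill_loss = kl(target, softmax(student_logits))
```

After training, the student's policy concentrates on the teacher's best action even though the student can be a much smaller network.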
A Distributed Reinforcement Learning Solution With Knowledge Transfer Capability for A Bike Rebalancing Problem
Rebalancing is a critical service bottleneck for many transportation
services, such as Citi Bike. Citi Bike relies on manual orchestration of bike
rebalancing between dispatchers and field agents. Motivated by this problem
and the lack of smart autonomous solutions in this area, this project
explored a new RL architecture called Distributed RL (DiRL) with Transfer
Learning (TL) capability. The DiRL solution is adaptive to changing traffic
dynamics when keeping bike stock under control at the minimum cost. DiRL
achieved a 350% improvement in bike rebalancing autonomously and TL offered a
62.4% performance boost in managing an entire bike network. Lastly, a field
trip to the dispatch office of Chariot, a ride-sharing service, provided
insights into overcoming the challenges of deploying an RL solution in the real world.
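The transfer-learning step can be sketched as warm-starting a new agent from a trained agent's value estimates instead of zeros; the state/action keys and values below are placeholders, not the project's actual representation.

```python
# Hedged sketch of knowledge transfer between value-based agents: the
# new agent copies a trained agent's Q-table and fine-tunes from there.
# All keys and numbers are invented placeholders.
trained_q = {
    ("low_stock", "restock"): 4.2,
    ("low_stock", "wait"): -1.0,
}

def transfer(source_q):
    """Warm-start a new agent with a copy of the source Q-table."""
    return dict(source_q)

new_agent_q = transfer(trained_q)
new_agent_q[("low_stock", "restock")] += 0.1   # fine-tuning is independent
```

Because the copy is independent, the source agent's estimates are untouched while the new agent adapts to its own station's traffic.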
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Efficient exploration remains a challenging research problem in reinforcement
learning, especially when an environment contains large state spaces, deceptive
local optima, or sparse rewards. To tackle this problem, we present a
diversity-driven approach for exploration, which can be easily combined with
both off- and on-policy reinforcement learning algorithms. We show that by
simply adding a distance measure to the loss function, the proposed methodology
significantly enhances an agent's exploratory behaviors, thus preventing
the policy from being trapped in local optima. We further propose an adaptive
scaling method for stabilizing the learning process. Our experimental results
on Atari 2600 games show that our method outperforms baseline approaches in
several tasks in terms of mean scores and exploration efficiency.
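The modified objective can be sketched directly: subtract an expected distance between the current policy and recent policies from the loss, L_D = L − α·E[D(π, π′)]. KL as the distance, the distributions, and α below are illustrative choices, not the paper's exact settings.

```python
import math

# Hedged sketch of the diversity-driven objective: augment the task
# loss with a distance term to recently seen policies.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def diversity_loss(task_loss, policy, prior_policies, alpha=0.1):
    """Lower loss for policies that differ from recent ones."""
    if not prior_policies:
        return task_loss
    mean_dist = sum(kl(policy, q) for q in prior_policies) / len(prior_policies)
    return task_loss - alpha * mean_dist

current = [0.7, 0.2, 0.1]                       # current action distribution
priors = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]     # recent policies

augmented = diversity_loss(1.0, current, priors)
```

A policy far from its recent predecessors receives a lower loss, which pushes the agent away from repeatedly visiting the same local optimum.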
Adaptive Genomic Evolution of Neural Network Topologies (AGENT) for State-to-Action Mapping in Autonomous Agents
Neuroevolution is a process of training neural networks (NN) through an
evolutionary algorithm, usually to serve as a state-to-action mapping model in
control or reinforcement learning-type problems. This paper builds on the
NeuroEvolution of Augmenting Topologies (NEAT) formalism, which allows designing
topology- and weight-evolving NNs. Fundamental advancements are made to the
neuroevolution process to address premature stagnation and convergence issues,
central among which is the incorporation of automated mechanisms to control the
population diversity and average fitness improvement within the neuroevolution
process. Insights into the performance and efficiency of the new algorithm is
obtained by evaluating it on three benchmark problems from the Open AI platform
and an Unmanned Aerial Vehicle (UAV) collision avoidance problem.
Comment: Accepted for presentation in (and publication in the proceedings of) the 2019 IEEE International Conference on Robotics and Automation (ICRA)
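The evolutionary loop at the heart of neuroevolution can be sketched in a few lines. This toy version evolves a single weight by selection and Gaussian mutation; NEAT-style topology evolution, speciation, and the paper's diversity and stagnation controls are all omitted.

```python
import random

# Hedged, minimal neuroevolution loop: evolve the weight of a
# one-neuron "network" by truncation selection and Gaussian mutation.
# The task and all constants are toy choices.
random.seed(0)

def fitness(w):
    # Toy task: map input 1.0 to output 0.5, so the ideal weight is 0.5.
    return -abs(w * 1.0 - 0.5)

population = [random.uniform(-1.0, 1.0) for _ in range(20)]
for _ in range(50):                              # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                     # truncation selection
    children = [p + random.gauss(0.0, 0.1)       # Gaussian mutation
                for p in parents for _ in range(3)]
    population = parents + children

best = max(population, key=fitness)
```

The mechanisms the paper adds (controlling population diversity and average fitness improvement) would act on exactly this loop, adapting selection pressure and mutation when progress stalls.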
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
The ability to act in multiple environments and transfer previous knowledge
to new situations can be considered a critical aspect of any intelligent agent.
Towards this goal, we define a novel method of multitask and transfer learning
that enables an autonomous agent to learn how to behave in multiple tasks
simultaneously, and then generalize its knowledge to new domains. This method,
termed "Actor-Mimic", exploits the use of deep reinforcement learning and model
compression techniques to train a single policy network that learns how to act
in a set of distinct tasks by using the guidance of several expert teachers. We
then show that the representations learnt by the deep policy network are
capable of generalizing to new tasks with no prior expert guidance, speeding up
learning in novel environments. Although our method can in general be applied
to a wide range of problems, we use Atari games as a testing environment to
demonstrate these methods.
Comment: Accepted as a conference paper at ICLR 201
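The multitask objective can be sketched as a sum of per-task cross-entropies between the student's policy and each expert teacher's action distribution on that teacher's own game; all distributions below are invented stand-ins for learned policies.

```python
import math

# Hedged sketch of the multitask mimic objective: one student matches
# each expert teacher's action distribution on that teacher's own game
# by summing per-task cross-entropies. All distributions are made up.
def cross_entropy(target, pred):
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

teachers = {"pong":     [0.9, 0.1],      # expert action distributions
            "breakout": [0.2, 0.8]}
student  = {"pong":     [0.6, 0.4],      # student's current outputs
            "breakout": [0.5, 0.5]}

multitask_loss = sum(cross_entropy(teachers[g], student[g]) for g in teachers)
```

Minimizing this single loss trains one network to act across all the teachers' tasks, after which its representations can be fine-tuned on new games.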
Reinforcement Learning-based Application Autoscaling in the Cloud: A Survey
Reinforcement Learning (RL) has demonstrated a great potential for
automatically solving decision-making problems in complex uncertain
environments. RL proposes a computational approach that allows learning through
interaction in an environment with stochastic behavior, where agents take
actions to maximize some cumulative short-term and long-term rewards. Some of
the most impressive results have been achieved in games such as Go and
StarCraft 2, where agents exhibited superhuman performance, which led to RL's
gradual adoption in many other domains, including Cloud Computing.
Therefore, RL appears to be a promising approach for autoscaling in the Cloud, since it
is possible to learn transparent (with no human intervention), dynamic (no
static plans), and adaptable (constantly updated) resource management policies
to execute applications. These are three important distinctive aspects to
consider in comparison with other widely used autoscaling policies that are
defined in an ad-hoc way or statically computed as in solutions based on
meta-heuristics. Autoscaling exploits the Cloud elasticity to optimize the
execution of applications according to given optimization criteria, which
requires deciding when and how to scale computational resources up or down, and
how to assign them to the upcoming processing workload. Such actions have to be
taken considering that the Cloud is a dynamic and uncertain environment.
Motivated by this, many works apply RL to the autoscaling problem in the Cloud.
In this work, we exhaustively survey those proposals from major venues, and
uniformly compare them based on a set of proposed taxonomies. We also discuss
open problems and prospective research in the area.
Comment: 40 pages, 9 figures
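A tabular toy version of the idea: a Q-learning agent learns whether to scale replicas down, hold, or up for a discretized load level. The environment, reward weights, and discretization below are invented for illustration; the surveyed systems use far richer state and learning machinery.

```python
import random

# Hedged toy autoscaler: tabular Q-learning over (load level, replica
# count) states with scale-down / hold / scale-up actions. The reward
# trades an SLA penalty against per-replica cost; everything here is an
# invented stand-in for the surveyed approaches.
random.seed(1)
ACTIONS = [-1, 0, 1]                 # change in replica count
GAMMA, ALPHA, EPS = 0.9, 0.2, 0.1

Q = {(l, n, a): 0.0 for l in range(3) for n in range(1, 5) for a in range(3)}

def reward(load, n):
    sla_penalty = max(0, load + 1 - n) * 10.0    # under-provisioning hurts
    return -sla_penalty - 1.0 * n                # plus resource cost

def step(load, n, a_idx):
    n2 = min(4, max(1, n + ACTIONS[a_idx]))      # clip replicas to [1, 4]
    load2 = random.randrange(3)                  # load fluctuates randomly
    return load2, n2, reward(load2, n2)

load, n = 0, 1
for _ in range(20000):
    if random.random() < EPS:
        a = random.randrange(3)                  # explore
    else:
        a = max(range(3), key=lambda i: Q[(load, n, i)])
    load2, n2, r = step(load, n, a)
    best_next = max(Q[(load2, n2, i)] for i in range(3))
    Q[(load, n, a)] += ALPHA * (r + GAMMA * best_next - Q[(load, n, a)])
    load, n = load2, n2
```

The learned policy is exactly the transparent, dynamic, adaptable behavior the survey highlights: no static scaling plan is written down; the value table is updated from interaction alone.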
Decision Making Agent Searching for Markov Models in Near-Deterministic World
Reinforcement learning has solid foundations, but becomes inefficient in
partially observed (non-Markovian) environments. Thus, a learning agent, born
with a representation and a policy, might wish to investigate to what extent
the Markov property holds. We propose a learning architecture that utilizes
combinatorial policy optimization to overcome non-Markovity and to develop
efficient behaviors that are easy to inherit, tests the Markov property of
the behavioral states, and corrects for non-Markovity by running a
deterministic factored Finite State Model, which can be learned. We illustrate
the properties of the architecture in the near-deterministic Ms. Pac-Man game. We
analyze the architecture from the point of view of evolutionary, individual,
and social learning.
Comment: Draft
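An empirical flavor of the Markov-property test: if a behavioral state is Markov, the next-state distribution should not change when conditioned additionally on the previous state. The 3-state toy chain and the crude count-based check below are invented for illustration.

```python
import random

# Hedged sketch of an empirical Markov-property check: estimate the
# next-state distribution with and without conditioning on the previous
# state; for a Markov chain the estimates should agree. The toy chain
# is Markov by construction.
random.seed(3)

def next_state(s):
    return (s + random.choice([0, 1])) % 3      # stay or advance

states = [0]
for _ in range(30000):
    states.append(next_state(states[-1]))

triples = list(zip(states, states[1:], states[2:]))

def frac_advance(ts):
    """Fraction of transitions that advanced (next == cur + 1 mod 3)."""
    return sum(1 for _, cur, nxt in ts if nxt == (cur + 1) % 3) / len(ts)

at_zero = [t for t in triples if t[1] == 0]
p_any    = frac_advance(at_zero)
p_prev_0 = frac_advance([t for t in at_zero if t[0] == 0])
p_prev_2 = frac_advance([t for t in at_zero if t[0] == 2])
# For a Markov state, p_prev_0 and p_prev_2 estimate the same value.
```

When the two conditional estimates diverge beyond sampling noise, the state is evidence of non-Markovity and the representation (or the factored model the paper runs) needs to be extended.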
Run, skeleton, run: skeletal model in a physics-based simulation
In this paper, we present our approach to solving the physics-based
reinforcement learning challenge "Learning to Run", whose objective is to train a
physiologically-based human model to navigate a complex obstacle course as
quickly as possible. The environment is computationally expensive, has a
high-dimensional continuous action space, and is stochastic. We benchmark
state-of-the-art policy-gradient methods and test several improvements, such as
layer normalization, parameter noise, and action and state reflection, to
stabilize training and improve its sample efficiency. We found that the Deep
Deterministic Policy Gradient method is the most efficient for this
environment and that the improvements we introduced help stabilize training.
Learned models are able to generalize to new physical scenarios, e.g. different
obstacle courses.
Comment: Corrected typos and spelling
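One of the tweaks, parameter noise, can be sketched as perturbing a copy of the policy weights once per episode instead of adding noise to each action; the linear "policy" and the noise scale below are toy stand-ins for the actor network.

```python
import random

# Hedged sketch of parameter-space noise: sample one perturbation of
# the policy weights per episode, so exploration is consistent within
# the episode. The linear policy and sigma are invented stand-ins.
random.seed(2)

def perturb(weights, sigma=0.05):
    """Noisy copy of the weights, sampled once per episode."""
    return [w + random.gauss(0.0, sigma) for w in weights]

def act(weights, state):
    return sum(w * s for w, s in zip(weights, state))  # linear policy

weights = [0.5, -0.2]
episode_weights = perturb(weights)     # exploration via weights, not actions

# Within the episode the perturbed policy is deterministic:
a1 = act(episode_weights, (1.0, 0.0))
a2 = act(episode_weights, (1.0, 0.0))
```

Compared with per-step action noise, the perturbed policy reacts consistently to similar states for the whole episode, which can make exploration in continuous control less jittery.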