Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework
Machine learning applied to architecture design presents a promising
opportunity with broad applications. Recent deep reinforcement learning (DRL)
techniques, in particular, enable efficient exploration in vast design spaces
where conventional design strategies may be inadequate. This paper proposes a
novel deep reinforcement learning framework, taking routerless networks-on-chip
(NoC) as an evaluation case study. The new framework resolves the shortcomings
of prior design approaches, which are either unreliable due to random searches
or inflexible due to severe design space restrictions. The framework learns
(near-)optimal loop placement for routerless NoCs with various design
constraints. A deep neural network is developed using parallel threads that
efficiently explore the immense routerless NoC design space with a Monte Carlo
search tree. Experimental results show that, compared with a conventional mesh,
the proposed DRL routerless design achieves a
3.25x increase in throughput, 1.6x reduction in packet latency, and 5x
reduction in power. Compared with the state-of-the-art routerless NoC, DRL
achieves a 1.47x increase in throughput, 1.18x reduction in packet latency, and
1.14x reduction in average hop count, albeit with slightly more power overhead.
Comment: 13 pages, 15 figures
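The search side of such a framework typically selects which design to expand next with the UCT rule of Monte Carlo tree search. A minimal sketch of that rule is below; the child ids ("loop_a", ...) and their (total value, visit count) statistics are invented for illustration, not taken from the paper.

```python
import math

# Hedged sketch of UCT child selection in Monte Carlo tree search.
# Children map an id to (total_value, visits); all numbers are toy data.
def uct_score(total_value, visits, parent_visits, c=1.4):
    if visits == 0:
        return float("inf")          # explore unvisited children first
    exploit = total_value / visits   # average design quality seen so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

children = {"loop_a": (3.0, 4), "loop_b": (1.0, 1), "loop_c": (0.0, 0)}
parent_visits = 5

best = max(children, key=lambda k: uct_score(*children[k], parent_visits))
# "loop_c" is selected: unvisited children score infinitely high.
```

The exploration constant `c` trades off revisiting good placements against trying rarely visited ones; 1.4 (≈√2) is a conventional default, not the paper's setting.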
Distributed Deep Q-Learning
We propose a distributed deep learning model to successfully learn control
policies directly from high-dimensional sensory input using reinforcement
learning. The model is based on the deep Q-network, a convolutional neural
network trained with a variant of Q-learning. Its input is raw pixels and its
output is a value function estimating future rewards from taking an action
given a system state. To distribute the deep Q-network training, we adapt the
DistBelief software framework to the context of efficiently training
reinforcement learning agents. As a result, the method is completely
asynchronous and scales well with the number of machines. We demonstrate that
the deep Q-network agent, receiving only the pixels and the game score as
inputs, was able to achieve reasonable success on a simple game with minimal
parameter tuning.
Comment: Updated figure of distributed deep learning architecture, updated content throughout paper including dealing with minor grammatical issues and highlighting differences of our paper with respect to prior work. arXiv admin note: text overlap with arXiv:1312.5602 by other authors
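The Q-learning variant underneath a deep Q-network can be sketched with a table standing in for the network; the 3-state toy MDP, learning rate, and reward below are illustrative only and omit the distributed training entirely.

```python
# Hedged sketch of the Q-learning update a deep Q-network is trained
# with: y = r + gamma * max_a' Q(s', a'). A dict stands in for the
# network; the toy MDP and constants are invented.
GAMMA = 0.99   # discount factor
ALPHA = 0.1    # learning rate

Q = {(s, a): 0.0 for s in range(3) for a in range(2)}

def td_update(s, a, r, s_next, done):
    """Move Q(s, a) toward the one-step bootstrapped target."""
    target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in range(2))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Repeatedly observing (state 0, action 1) -> reward 1.0, next state 1
# drives Q[(0, 1)] toward 1.0, since state 1's values stay at zero here.
for _ in range(100):
    td_update(0, 1, 1.0, 1, done=False)
```

In the distributed setting many such updates run asynchronously against shared parameters; this sketch shows only the single-worker rule.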
Policy Distillation
Policies for complex visual tasks have been successfully learned with deep
reinforcement learning, using an approach called deep Q-networks (DQN), but
relatively large (task-specific) networks and extensive training are needed to
achieve good performance. In this work, we present a novel method called policy
distillation that can be used to extract the policy of a reinforcement learning
agent and train a new network that performs at the expert level while being
dramatically smaller and more efficient. Furthermore, the same method can be
used to consolidate multiple task-specific policies into a single policy. We
demonstrate these claims using the Atari domain and show that the multi-task
distilled agent outperforms the single-task teachers as well as a
jointly-trained DQN agent.
Comment: Submitted to ICLR 201
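The distillation objective can be sketched as matching a student softmax to a sharpened softmax over the teacher's Q-values; the Q-values, temperature, and step size below are toy choices, and the paper's networks are reduced to bare logit vectors.

```python
import math

# Hedged sketch of policy distillation: minimize KL between a softened
# softmax over the teacher's Q-values and the student's policy.
def softmax(xs, tau=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_q = [2.0, 0.5, -1.0]              # teacher's Q-values for 3 actions
target = softmax(teacher_q, tau=0.01)     # low temperature sharpens the target

student_logits = [0.0, 0.0, 0.0]
for _ in range(500):
    p = softmax(student_logits)
    grad = [pi - ti for pi, ti in zip(p, target)]   # d KL / d logits
    student_logits = [l - 0.5 * g for l, g in zip(student_logits, grad)]

distill_loss = kl(target, softmax(student_logits))
```

After training, the student's policy concentrates on the teacher's best action even though the student can be a much smaller network.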
A Distributed Reinforcement Learning Solution With Knowledge Transfer Capability for A Bike Rebalancing Problem
Rebalancing is a critical service bottleneck for many transportation
services, such as Citi Bike. Citi Bike relies on manual orchestration of bike
rebalancing between dispatchers and field agents. Motivated by this problem
and the lack of smart autonomous solutions in this area, this project
explored a new RL architecture called Distributed RL (DiRL) with Transfer
Learning (TL) capability. The DiRL solution is adaptive to changing traffic
dynamics when keeping bike stock under control at the minimum cost. DiRL
achieved a 350% improvement in bike rebalancing autonomously and TL offered a
62.4% performance boost in managing an entire bike network. Lastly, a field
trip to the dispatch office of Chariot, a ride-sharing service, provided
insights into overcoming the challenges of deploying an RL solution in the real world.
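The transfer-learning step can be sketched as warm-starting a new agent from a trained agent's value estimates instead of zeros; the state/action keys and values below are placeholders, not the project's actual representation.

```python
# Hedged sketch of knowledge transfer between value-based agents: the
# new agent copies a trained agent's Q-table and fine-tunes from there.
# All keys and numbers are invented placeholders.
trained_q = {
    ("low_stock", "restock"): 4.2,
    ("low_stock", "wait"): -1.0,
}

def transfer(source_q):
    """Warm-start a new agent with a copy of the source Q-table."""
    return dict(source_q)

new_agent_q = transfer(trained_q)
new_agent_q[("low_stock", "restock")] += 0.1   # fine-tuning is independent
```

Because the copy is independent, the source agent's estimates are untouched while the new agent adapts to its own station's traffic.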
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Efficient exploration remains a challenging research problem in reinforcement
learning, especially when an environment contains large state spaces, deceptive
local optima, or sparse rewards. To tackle this problem, we present a
diversity-driven approach for exploration, which can be easily combined with
both off- and on-policy reinforcement learning algorithms. We show that by
simply adding a distance measure to the loss function, the proposed methodology
significantly enhances an agent's exploratory behaviors, thus preventing
the policy from being trapped in local optima. We further propose an adaptive
scaling method for stabilizing the learning process. Our experimental results
on Atari 2600 games show that our method outperforms baseline approaches in
several tasks in terms of mean scores and exploration efficiency.
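The modified objective can be sketched directly: subtract an expected distance between the current policy and recent policies from the loss, L_D = L − α·E[D(π, π′)]. KL as the distance, the distributions, and α below are illustrative choices, not the paper's exact settings.

```python
import math

# Hedged sketch of the diversity-driven objective: augment the task
# loss with a distance term to recently seen policies.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def diversity_loss(task_loss, policy, prior_policies, alpha=0.1):
    """Lower loss for policies that differ from recent ones."""
    if not prior_policies:
        return task_loss
    mean_dist = sum(kl(policy, q) for q in prior_policies) / len(prior_policies)
    return task_loss - alpha * mean_dist

current = [0.7, 0.2, 0.1]                       # current action distribution
priors = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]     # recent policies

augmented = diversity_loss(1.0, current, priors)
```

A policy far from its recent predecessors receives a lower loss, which pushes the agent away from repeatedly visiting the same local optimum.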
Adaptive Genomic Evolution of Neural Network Topologies (AGENT) for State-to-Action Mapping in Autonomous Agents
Neuroevolution is a process of training neural networks (NN) through an
evolutionary algorithm, usually to serve as a state-to-action mapping model in
control or reinforcement learning-type problems. This paper builds on the
NeuroEvolution of Augmenting Topologies (NEAT) formalism, which allows designing
topology- and weight-evolving NNs. Fundamental advancements are made to the
neuroevolution process to address premature stagnation and convergence issues,
central among which is the incorporation of automated mechanisms to control the
population diversity and average fitness improvement within the neuroevolution
process. Insights into the performance and efficiency of the new algorithm is
obtained by evaluating it on three benchmark problems from the Open AI platform
and an Unmanned Aerial Vehicle (UAV) collision avoidance problem.
Comment: Accepted for presentation in (and publication in the proceedings of) the 2019 IEEE International Conference on Robotics and Automation (ICRA)
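The evolutionary loop at the heart of neuroevolution can be sketched in a few lines. This toy version evolves a single weight by selection and Gaussian mutation; NEAT-style topology evolution, speciation, and the paper's diversity and stagnation controls are all omitted.

```python
import random

# Hedged, minimal neuroevolution loop: evolve the weight of a
# one-neuron "network" by truncation selection and Gaussian mutation.
# The task and all constants are toy choices.
random.seed(0)

def fitness(w):
    # Toy task: map input 1.0 to output 0.5, so the ideal weight is 0.5.
    return -abs(w * 1.0 - 0.5)

population = [random.uniform(-1.0, 1.0) for _ in range(20)]
for _ in range(50):                              # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                     # truncation selection
    children = [p + random.gauss(0.0, 0.1)       # Gaussian mutation
                for p in parents for _ in range(3)]
    population = parents + children

best = max(population, key=fitness)
```

The mechanisms the paper adds (controlling population diversity and average fitness improvement) would act on exactly this loop, adapting selection pressure and mutation when progress stalls.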
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
The ability to act in multiple environments and transfer previous knowledge
to new situations can be considered a critical aspect of any intelligent agent.
Towards this goal, we define a novel method of multitask and transfer learning
that enables an autonomous agent to learn how to behave in multiple tasks
simultaneously, and then generalize its knowledge to new domains. This method,
termed "Actor-Mimic", exploits the use of deep reinforcement learning and model
compression techniques to train a single policy network that learns how to act
in a set of distinct tasks by using the guidance of several expert teachers. We
then show that the representations learnt by the deep policy network are
capable of generalizing to new tasks with no prior expert guidance, speeding up
learning in novel environments. Although our method can in general be applied
to a wide range of problems, we use Atari games as a testing environment to
demonstrate these methods.
Comment: Accepted as a conference paper at ICLR 201
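The multitask objective can be sketched as a sum of per-task cross-entropies between the student's policy and each expert teacher's action distribution on that teacher's own game; all distributions below are invented stand-ins for learned policies.

```python
import math

# Hedged sketch of the multitask mimic objective: one student matches
# each expert teacher's action distribution on that teacher's own game
# by summing per-task cross-entropies. All distributions are made up.
def cross_entropy(target, pred):
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

teachers = {"pong":     [0.9, 0.1],      # expert action distributions
            "breakout": [0.2, 0.8]}
student  = {"pong":     [0.6, 0.4],      # student's current outputs
            "breakout": [0.5, 0.5]}

multitask_loss = sum(cross_entropy(teachers[g], student[g]) for g in teachers)
```

Minimizing this single loss trains one network to act across all the teachers' tasks, after which its representations can be fine-tuned on new games.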
Reinforcement Learning-based Application Autoscaling in the Cloud: A Survey
Reinforcement Learning (RL) has demonstrated a great potential for
automatically solving decision-making problems in complex uncertain
environments. RL proposes a computational approach that allows learning through
interaction in an environment with stochastic behavior, where agents take
actions to maximize some cumulative short-term and long-term rewards. Some of
the most impressive results have been achieved in games such as Go and
StarCraft 2, where agents exhibited superhuman performance, which led to RL's
gradual adoption in many other domains, including Cloud Computing.
Therefore, RL appears to be a promising approach for autoscaling in the Cloud, since it
is possible to learn transparent (with no human intervention), dynamic (no
static plans), and adaptable (constantly updated) resource management policies
to execute applications. These are three important distinctive aspects to
consider in comparison with other widely used autoscaling policies that are
defined in an ad-hoc way or statically computed as in solutions based on
meta-heuristics. Autoscaling exploits the Cloud elasticity to optimize the
execution of applications according to given optimization criteria, which
requires deciding when and how to scale computational resources up or down, and
how to assign them to the upcoming processing workload. Such actions have to be
taken considering that the Cloud is a dynamic and uncertain environment.
Motivated by this, many works apply RL to the autoscaling problem in the Cloud.
In this work, we exhaustively survey those proposals from major venues, and
uniformly compare them based on a set of proposed taxonomies. We also discuss
open problems and prospective research in the area.
Comment: 40 pages, 9 figures
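A tabular toy version of the idea: a Q-learning agent learns whether to scale replicas down, hold, or up for a discretized load level. The environment, reward weights, and discretization below are invented for illustration; the surveyed systems use far richer state and learning machinery.

```python
import random

# Hedged toy autoscaler: tabular Q-learning over (load level, replica
# count) states with scale-down / hold / scale-up actions. The reward
# trades an SLA penalty against per-replica cost; everything here is an
# invented stand-in for the surveyed approaches.
random.seed(1)
ACTIONS = [-1, 0, 1]                 # change in replica count
GAMMA, ALPHA, EPS = 0.9, 0.2, 0.1

Q = {(l, n, a): 0.0 for l in range(3) for n in range(1, 5) for a in range(3)}

def reward(load, n):
    sla_penalty = max(0, load + 1 - n) * 10.0    # under-provisioning hurts
    return -sla_penalty - 1.0 * n                # plus resource cost

def step(load, n, a_idx):
    n2 = min(4, max(1, n + ACTIONS[a_idx]))      # clip replicas to [1, 4]
    load2 = random.randrange(3)                  # load fluctuates randomly
    return load2, n2, reward(load2, n2)

load, n = 0, 1
for _ in range(20000):
    if random.random() < EPS:
        a = random.randrange(3)                  # explore
    else:
        a = max(range(3), key=lambda i: Q[(load, n, i)])
    load2, n2, r = step(load, n, a)
    best_next = max(Q[(load2, n2, i)] for i in range(3))
    Q[(load, n, a)] += ALPHA * (r + GAMMA * best_next - Q[(load, n, a)])
    load, n = load2, n2
```

The learned policy is exactly the transparent, dynamic, adaptable behavior the survey highlights: no static scaling plan is written down; the value table is updated from interaction alone.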
Decision Making Agent Searching for Markov Models in Near-Deterministic World
Reinforcement learning has solid foundations, but becomes inefficient in
partially observed (non-Markovian) environments. Thus, a learning agent, born
with a representation and a policy, might wish to investigate to what extent
the Markov property holds. We propose a learning architecture that utilizes
combinatorial policy optimization to overcome non-Markovity and to develop
efficient behaviors that are easy to inherit, tests the Markov property of
the behavioral states, and corrects for non-Markovity by running a
deterministic factored Finite State Model, which can be learned. We illustrate
the properties of the architecture in the near-deterministic Ms. Pac-Man game. We
analyze the architecture from the point of view of evolutionary, individual,
and social learning.
Comment: Draft
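An empirical flavor of the Markov-property test: if a behavioral state is Markov, the next-state distribution should not change when conditioned additionally on the previous state. The 3-state toy chain and the crude count-based check below are invented for illustration.

```python
import random

# Hedged sketch of an empirical Markov-property check: estimate the
# next-state distribution with and without conditioning on the previous
# state; for a Markov chain the estimates should agree. The toy chain
# is Markov by construction.
random.seed(3)

def next_state(s):
    return (s + random.choice([0, 1])) % 3      # stay or advance

states = [0]
for _ in range(30000):
    states.append(next_state(states[-1]))

triples = list(zip(states, states[1:], states[2:]))

def frac_advance(ts):
    """Fraction of transitions that advanced (next == cur + 1 mod 3)."""
    return sum(1 for _, cur, nxt in ts if nxt == (cur + 1) % 3) / len(ts)

at_zero = [t for t in triples if t[1] == 0]
p_any    = frac_advance(at_zero)
p_prev_0 = frac_advance([t for t in at_zero if t[0] == 0])
p_prev_2 = frac_advance([t for t in at_zero if t[0] == 2])
# For a Markov state, p_prev_0 and p_prev_2 estimate the same value.
```

When the two conditional estimates diverge beyond sampling noise, the state is evidence of non-Markovity and the representation (or the factored model the paper runs) needs to be extended.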
Run, skeleton, run: skeletal model in a physics-based simulation
In this paper, we present our approach to solving the physics-based
reinforcement learning challenge "Learning to Run", whose objective is to train a
physiologically-based human model to navigate a complex obstacle course as
quickly as possible. The environment is computationally expensive, has a
high-dimensional continuous action space, and is stochastic. We benchmark
state-of-the-art policy-gradient methods and test several improvements, such as
layer normalization, parameter noise, and action and state reflection, to
stabilize training and improve its sample efficiency. We found that the Deep
Deterministic Policy Gradient method is the most efficient for this
environment and that the improvements we introduced help stabilize training.
Learned models are able to generalize to new physical scenarios, e.g. different
obstacle courses.
Comment: Corrected typos and spelling
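One of the tweaks, parameter noise, can be sketched as perturbing a copy of the policy weights once per episode instead of adding noise to each action; the linear "policy" and the noise scale below are toy stand-ins for the actor network.

```python
import random

# Hedged sketch of parameter-space noise: sample one perturbation of
# the policy weights per episode, so exploration is consistent within
# the episode. The linear policy and sigma are invented stand-ins.
random.seed(2)

def perturb(weights, sigma=0.05):
    """Noisy copy of the weights, sampled once per episode."""
    return [w + random.gauss(0.0, sigma) for w in weights]

def act(weights, state):
    return sum(w * s for w, s in zip(weights, state))  # linear policy

weights = [0.5, -0.2]
episode_weights = perturb(weights)     # exploration via weights, not actions

# Within the episode the perturbed policy is deterministic:
a1 = act(episode_weights, (1.0, 0.0))
a2 = act(episode_weights, (1.0, 0.0))
```

Compared with per-step action noise, the perturbed policy reacts consistently to similar states for the whole episode, which can make exploration in continuous control less jittery.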