Composable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good
performance in domains ranging from video games to simulated robotic
manipulation and locomotion. However, model-free methods are known to perform
poorly when the interaction time with the environment is limited, as is the
case for most real-world robotic tasks. In this paper, we study how maximum
entropy policies trained using soft Q-learning can be applied to real-world
robotic manipulation. The application of this method to real-world manipulation
is facilitated by two important features of soft Q-learning. First, soft
Q-learning can learn multimodal exploration strategies by learning policies
represented by expressive energy-based models. Second, we show that policies
learned with soft Q-learning can be composed to create new policies, and that
the optimality of the resulting policy can be bounded in terms of the
divergence between the composed policies. This compositionality provides an
especially valuable tool for real-world manipulation, where constructing new
policies by composing existing skills can provide a large gain in efficiency
over training from scratch. Our experimental evaluation demonstrates that soft
Q-learning is substantially more sample efficient than prior model-free deep
reinforcement learning methods, and that compositionality can be performed for
both simulated and real-world tasks.

Comment: Videos: https://sites.google.com/view/composing-real-world-policies
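A minimal sketch of the policy-composition idea described in the abstract, assuming tabular Q-values and the averaged-Q composition rule (the composed max-entropy policy is proportional to exp of the averaged Q-functions). All Q-values, the temperature `alpha`, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def soft_policy(q, alpha=1.0):
    """Boltzmann (maximum-entropy) policy induced by a Q-row: pi ∝ exp(Q/alpha)."""
    z = np.exp((q - q.max()) / alpha)   # subtract max for numerical stability
    return z / z.sum()

def compose(q1, q2, alpha=1.0):
    """Compose two soft policies by averaging their Q-functions."""
    return soft_policy((q1 + q2) / 2.0, alpha)

# Toy example: 4 discrete actions, two tasks that each prefer a different action.
q_task1 = np.array([2.0, 0.5, 0.0, 1.5])   # task 1 favours action 0
q_task2 = np.array([0.0, 0.5, 2.0, 1.5])   # task 2 favours action 2
pi = compose(q_task1, q_task2)
print(pi.argmax())  # the composed policy favours the compromise action 3
```

The point of the sketch is that composition needs only the learned Q-functions, not retraining, which is what makes reusing existing skills cheaper than learning from scratch.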
Centralised rehearsal of decentralised cooperation: Multi-agent reinforcement learning for the scalable coordination of residential energy flexibility
This paper investigates how deep multi-agent reinforcement learning can
enable the scalable and privacy-preserving coordination of residential energy
flexibility. The coordination of distributed resources such as electric
vehicles and heating will be critical to the successful integration of large
shares of renewable energy in our electricity grid and, thus, to help mitigate
climate change. The pre-learning of individual reinforcement learning policies
can enable distributed control with no sharing of personal data required during
execution. However, previous approaches for multi-agent reinforcement
learning-based distributed energy resources coordination impose an ever greater
training computational burden as the size of the system increases. We therefore
adopt a deep multi-agent actor-critic method which uses a \emph{centralised but
factored critic} to rehearse coordination ahead of execution. Results show that
coordination is achieved at scale, with minimal information and communication
infrastructure requirements, no interference with daily activities, and privacy
protection. Significant savings are obtained for energy users, the distribution
network and greenhouse gas emissions. Moreover, training times are nearly 40
times shorter than with a previous state-of-the-art reinforcement learning
approach without the factored critic for 30 homes.
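A hypothetical sketch of a "centralised but factored" critic: the joint action-value is approximated as a sum of per-agent terms conditioned on a shared global state, so critic size grows linearly rather than exponentially with the number of homes. All shapes, the linear critic heads, and the random values are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, state_dim, action_dim = 30, 8, 2

# One small linear critic head per agent, all conditioned on the global state.
weights = [rng.normal(size=state_dim + action_dim) for _ in range(n_agents)]

def factored_q(global_state, actions):
    """Q_total(s, a_1..a_N) = sum_i q_i(s, a_i) — linear in agent count."""
    return sum(w @ np.concatenate([global_state, a])
               for w, a in zip(weights, actions))

state = rng.normal(size=state_dim)
actions = [rng.normal(size=action_dim) for _ in range(n_agents)]
q_joint = factored_q(state, actions)
```

Because each head sees only its own agent's action, adding a 31st home adds one head rather than enlarging a joint-action input, which is the scalability property the abstract emphasises.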
Base Station Power Optimization for Green Networks Using Reinforcement Learning
The next generation of mobile networks must provide high data rates, extremely low latency, and support for high connection density. To meet these requirements, the number of base stations will have to increase, and this increase will lead to an energy consumption problem. Therefore, "green" approaches to network operation will gain importance. Reducing the energy consumption of base stations is essential for going green, and it also helps service providers reduce operational expenses. However, achieving energy savings without degrading quality of service is a major challenge. To address this issue, we propose a machine learning based intelligent solution that also incorporates a network simulator. We develop a reinforcement learning model using the deep deterministic policy gradient (DDPG) algorithm. Our model frequently updates the policy of the network switches so that packets are forwarded to base stations operating at an optimized power level. The policies chosen by the network controller are evaluated with a network simulator to ensure a balance between energy consumption reduction and quality of service. The reinforcement learning model allows us to continuously learn and adapt to changing conditions in the dynamic network environment, yielding a more robust and realistic intelligent network management policy set. Our results demonstrate that energy efficiency can be enhanced by 32% and 67% in dense and sparse scenarios, respectively.
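An illustrative sketch (not the paper's code) of the DDPG-style continuous action at the heart of such a controller: a deterministic actor maps the network state to a transmit-power level in [0, 1], exploration adds Gaussian noise, and the target actor is updated by Polyak averaging. The linear actor, state dimension, and noise scale are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(42)
state_dim = 4
actor_w = rng.normal(scale=0.1, size=state_dim)   # toy linear actor
target_w = actor_w.copy()

def act(state, noise_std=0.1):
    """Deterministic policy plus exploration noise, clipped to a valid power level."""
    a = 1.0 / (1.0 + np.exp(-(actor_w @ state)))  # squash to (0, 1)
    return float(np.clip(a + rng.normal(scale=noise_std), 0.0, 1.0))

def soft_update(tau=0.005):
    """Polyak averaging of target-actor weights, as in DDPG."""
    global target_w
    target_w = tau * actor_w + (1 - tau) * target_w

power = act(rng.normal(size=state_dim))
soft_update()
```

DDPG fits this problem because transmit power is continuous; a discrete-action method would have to quantize the power levels.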
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by soft Q-learning, uses entropy regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less frequently encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents playing StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
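A hypothetical sketch of count-based exploration via hashing, the second category of exploration methods mentioned in the abstract: continuous states are hashed with random sign projections (SimHash-style), visits to each hash bucket are counted, and a bonus proportional to 1/sqrt(count) rewards rarely visited regions. The hash width, `beta`, and state dimension are illustrative assumptions.

```python
import math
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)
proj = rng.normal(size=(16, 4))   # 16-bit hash of 4-dimensional states
counts = Counter()

def state_hash(state):
    """Sign pattern of random projections -> discrete hash code."""
    return tuple((proj @ state > 0).astype(int))

def exploration_bonus(state, beta=0.1):
    """Less-visited hash buckets receive larger bonuses: beta / sqrt(n)."""
    h = state_hash(state)
    counts[h] += 1
    return beta / math.sqrt(counts[h])

s = np.zeros(4)
b1 = exploration_bonus(s)   # first visit to this bucket
b2 = exploration_bonus(s)   # repeat visit: smaller bonus
```

Hashing makes the counting tractable in continuous or high-dimensional state spaces, where exact state counts would almost never repeat.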