142 research outputs found
On overfitting and asymptotic bias in batch reinforcement learning with partial observability
This paper provides an analysis of the tradeoff between asymptotic bias
(suboptimality with unlimited data) and overfitting (additional suboptimality
due to limited data) in the context of reinforcement learning with partial
observability. Our theoretical analysis formally characterizes that while
potentially increasing the asymptotic bias, a smaller state representation
decreases the risk of overfitting. This analysis relies on expressing the
quality of a state representation by bounding L1 error terms of the associated
belief states. Theoretical results are empirically illustrated when the state
representation is a truncated history of observations, both on synthetic POMDPs
and on a large-scale POMDP in the context of smartgrids, with real-world data.
Finally, similarly to known results in the fully observable setting, we also
briefly discuss and empirically illustrate how using function approximators and
adapting the discount factor may enhance the tradeoff between asymptotic bias
and overfitting in the partially observable context.
Comment: Accepted at the Journal of Artificial Intelligence Research (JAIR),
31 pages.
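As an illustration of the truncated-history representation the paper evaluates empirically, the sketch below keeps only the last h observations as an approximate state for a POMDP agent. The class name, padding value, and toy observation stream are illustrative assumptions, not the paper's code:

```python
from collections import deque

class TruncatedHistory:
    """Maintain the last `h` observations as an approximate POMDP state.

    A larger `h` gives a richer state representation (lower asymptotic
    bias) but enlarges the state space, increasing the risk of
    overfitting with limited data -- the tradeoff the paper analyses.
    """

    def __init__(self, h, pad=0):
        self.h = h
        self.buffer = deque([pad] * h, maxlen=h)  # oldest entries drop off

    def update(self, observation):
        self.buffer.append(observation)

    def state(self):
        # The agent conditions its policy on this tuple instead of the
        # unobservable true state or the full belief state.
        return tuple(self.buffer)

rep = TruncatedHistory(h=3)
for obs in [1, 0, 1, 1]:
    rep.update(obs)
print(rep.state())  # the 3 most recent observations: (0, 1, 1)
```

Shrinking h trades overfitting risk for asymptotic bias, which is the knob the paper's L1 belief-state bounds characterise.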
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
Visual reinforcement learning (RL) has shown promise in continuous control
tasks. Despite this progress, current algorithms remain unsatisfactory in
virtually every aspect of performance, including sample efficiency, asymptotic
performance, and robustness to the choice of random seeds. In this paper, we
identify a major shortcoming of existing visual RL methods: agents often
exhibit sustained inactivity during early training, which limits their ability
to explore effectively. Expanding upon this crucial
observation, we additionally unveil a significant correlation between the
agents' inclination towards motorically inactive exploration and the absence of
neuronal activity within their policy networks. To quantify this inactivity, we
adopt the dormant ratio as a metric to measure inactivity in the RL agent's
network. Empirically, we also recognize that the dormant ratio can act as a
standalone indicator of an agent's activity level, regardless of the received
reward signals. Leveraging the aforementioned insights, we introduce DrM, a
method that uses three core mechanisms to guide agents'
exploration-exploitation trade-offs by actively minimizing the dormant ratio.
Experiments demonstrate that DrM achieves significant improvements in sample
efficiency and asymptotic performance with no broken seeds (76 seeds in total)
across three continuous control benchmark environments, including DeepMind
Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first
model-free algorithm that consistently solves tasks in both the Dog and
Manipulator domains from the DeepMind Control Suite as well as three dexterous
hand manipulation tasks without demonstrations in Adroit, all based on pixel
observations.
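The dormant ratio the abstract refers to can be sketched as the fraction of neurons in a layer whose normalised mean absolute activation falls below a threshold. The function below is an illustrative NumPy approximation; the threshold value and the synthetic activations are assumptions, not DrM's implementation:

```python
import numpy as np

def dormant_ratio(activations, tau=0.025):
    """Fraction of 'dormant' neurons in one layer.

    `activations`: array of shape (batch, neurons) holding
    post-activation outputs. A neuron counts as dormant when its mean
    absolute activation, normalised by the layer-wide average, falls
    below the threshold `tau`.
    """
    score = np.abs(activations).mean(axis=0)   # per-neuron activity
    score = score / (score.mean() + 1e-9)      # normalise by layer mean
    return float((score <= tau).mean())        # share of dormant neurons

rng = np.random.default_rng(0)
acts = rng.random((64, 100))   # a healthy, active layer
acts[:, :30] = 0.0             # silence 30 of its 100 neurons
print(dormant_ratio(acts))     # 0.3
```

A ratio near zero indicates a lively network; a rising ratio is the inactivity signal that DrM's mechanisms are designed to minimise.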
Reactive Stepping for Humanoid Robots using Reinforcement Learning: Application to Standing Push Recovery on the Exoskeleton Atalante
State-of-the-art reinforcement learning is now able to learn versatile
locomotion, balancing and push-recovery capabilities for bipedal robots in
simulation. Yet the reality gap has mostly been overlooked, and simulated
results hardly transfer to real hardware. Either transfer fails in practice
because the physics is over-simplified and hardware limitations are ignored,
or regularity is not guaranteed and unexpected, hazardous motions can occur. This
paper presents a reinforcement learning framework capable of learning robust
standing push recovery for bipedal robots that transfers smoothly to reality,
using only instantaneous proprioceptive observations. By combining original
termination conditions and policy smoothness conditioning, we achieve stable
learning, sim-to-real transfer, and safety with a policy that has neither
memory nor an explicit history. Reward engineering is then used to give
insights into how to keep balance. We demonstrate the framework's performance
in reality on the lower-limb medical exoskeleton Atalante.
A minimalist approach to deep multi-task learning
Multi-task learning is critical for real-life applications of machine learning. Modern approaches are characterised by often unjustified algorithmic complexity, leading to impractical solutions. In contrast, this thesis demonstrates that a minimalistic alternative is possible, showing the attractiveness of simple methods. 'In Defence of the Unitary Scalarisation for Deep Multi-task Learning' motivates the rest of the thesis, showing that none of the more complex multi-task optimisers outperforms simple per-task gradient summation when compared on fair grounds. Furthermore, it proposes a novel view of multi-task optimisers from the regularisation standpoint. The rest of this thesis focuses on deep reinforcement learning, a general framework for sequential decision-making. In particular, we look at the setting where observations (inputs to the model) are represented as graphs, i.e., collections of interconnected nodes. In 'Scaling GNNs to High-Dimensional Continuous Control' and 'The Role of Morphology in Graph-Based Incompatible Control', we learn a single control policy for agents of different morphologies by representing the observation set elements as graphs and deploying graph neural networks (including transformers). In the former chapter, we devise a simple method to scale graph networks by freezing parts of the network to stabilise learning and prevent overfitting. In the latter chapter, we show that graph connectivity might be suboptimal for the downstream task, demonstrating that less-constrained transformers perform significantly better without access to the graph connectivity information. Finally, in 'Generalisable Branching Heuristic for a SAT Solver', we apply multi-task reinforcement learning to Boolean satisfiability, a fundamental problem in academia and industry.
We demonstrate that Q-learning, a staple reinforcement learning algorithm equipped with graph neural networks for function approximation, can learn a generalisable branching heuristic.
We hope our findings will steer the further development of the field: creating more complex benchmarks, adding assumptions on task similarities and model capacity, and exploring objective functions other than average performance across tasks.
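The GNN function approximator aside, the Q-learning algorithm the thesis builds on is the classic tabular update. Below is a minimal sketch on a toy three-state chain; the environment, hyperparameters, and function names are illustrative assumptions, not the thesis's SAT setup:

```python
import random
from collections import defaultdict

def q_learning(env_step, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Classic tabular Q-learning with epsilon-greedy exploration.
    The thesis replaces this table with a graph neural network so the
    learned branching heuristic generalises across SAT instances."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # maps (state, action) -> value estimate
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:
                a = rng.randrange(n_actions)                          # explore
            else:
                a = max(range(n_actions), key=lambda a_: Q[(s, a_)])  # exploit
            s2, r, done = env_step(s, a, rng)
            # Temporal-difference update toward the bootstrapped target.
            best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in range(n_actions))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

# Toy chain 0 - 1 - 2: action 1 moves right, action 0 moves left;
# reaching state 2 yields reward 1 and ends the episode.
def step(s, a, rng):
    s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 2 else 0.0), s2 == 2

Q = q_learning(step, n_actions=2)
assert Q[(0, 1)] > Q[(0, 0)]  # moving toward the reward is valued higher
```

Swapping the `(state, action)` table for a graph network over the formula's variable-clause structure is what lets the learned heuristic transfer to SAT instances of unseen size.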
Application of Deep Learning Methods in Monitoring and Optimization of Electric Power Systems
This PhD thesis thoroughly examines the utilization of deep learning
techniques as a means to advance the algorithms employed in the monitoring and
optimization of electric power systems. The first major contribution of this
thesis involves the application of graph neural networks to enhance power
system state estimation. The second key aspect of this thesis focuses on
utilizing reinforcement learning for dynamic distribution network
reconfiguration. The effectiveness of the proposed methods is affirmed through
extensive experimentation and simulations.
Comment: PhD thesis.