Simulation-Based Inference for Global Health Decisions
The COVID-19 pandemic has highlighted the importance of in-silico
epidemiological modelling in predicting the dynamics of infectious diseases to
inform health policy and decision makers about suitable prevention and
containment strategies. Work in this setting involves solving challenging
inference and control problems in individual-based models of ever-increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, CovidSim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient, interpretable Bayesian inference within those simulators.
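As a concrete illustration of the simulation-based inference workflow discussed above, here is a minimal sketch of rejection-based approximate Bayesian computation against a toy epidemic model; the simulate function, its parameters, and the tolerance are illustrative stand-ins, not the CovidSim or OpenMalaria interfaces.

```python
import numpy as np

# Hypothetical stand-in for an expensive epidemic simulator (e.g. CovidSim):
# a toy SIR-like recursion returning the peak infected fraction, plus noise.
def simulate(beta, gamma, rng):
    s, i = 0.99, 0.01
    peak = i
    for _ in range(200):
        new_inf = beta * s * i
        s, i = s - new_inf, i + new_inf - gamma * i
        peak = max(peak, i)
    return peak + rng.normal(0.0, 0.005)  # observation noise

def abc_rejection(observed, n_draws=20_000, eps=0.01, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary is close to the data."""
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.1, 1.0)    # prior over transmission rate
        gamma = rng.uniform(0.05, 0.5)  # prior over recovery rate
        if abs(simulate(beta, gamma, rng) - observed) < eps:
            accepted.append((beta, gamma))
    return np.array(accepted)  # samples from the approximate posterior

posterior = abc_rejection(observed=0.25)
if len(posterior):
    print(len(posterior), "accepted; posterior mean:", posterior.mean(axis=0))
```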
Hierarchical Imitation Learning for Stochastic Environments
Many applications of imitation learning require the agent to generate the
full distribution of behaviour observed in the training data. For example, to
evaluate the safety of autonomous vehicles in simulation, accurate and diverse
behaviour models of other road users are paramount. Existing methods that
improve this distributional realism typically rely on hierarchical policies.
These condition the policy on types such as goals or personas that give rise to
multi-modal behaviour. However, such methods are often inappropriate for
stochastic environments where the agent must also react to external factors:
because agent types are inferred from the observed future trajectory during training, these environments require disentangling the contributions of internal and external factors to the agent's behaviour, so that only internal factors, i.e., those under the agent's control, are encoded in the type.
Encoding future information about external factors leads to inappropriate agent
reactions during testing, when the future is unknown and types must be drawn
independently from the actual future. We formalize this challenge as
distribution shift in the conditional distribution of agent types under
environmental stochasticity. We propose Robust Type Conditioning (RTC), which
eliminates this shift with adversarial training under randomly sampled types.
Experiments on two domains, including the large-scale Waymo Open Motion
Dataset, show improved distributional realism while maintaining or improving
task performance compared to state-of-the-art baselines.
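To make the hierarchical setup concrete, the following is a minimal sketch (our illustration, not the RTC reference implementation) of a type-conditioned policy with two training branches: types encoded from the observed future, and types drawn independently of the future as they would be at test time. All dimensions, modules, and losses are assumed placeholders.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, TYPE_DIM = 16, 2, 8  # hypothetical sizes, for illustration

class TypeEncoder(nn.Module):
    """Infers a latent agent type from the observed future trajectory (training only)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, TYPE_DIM))
    def forward(self, future_traj):
        return self.net(future_traj)

class Policy(nn.Module):
    """Action prediction conditioned on the current observation and agent type."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + TYPE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))
    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

encoder, policy = TypeEncoder(), Policy()
obs = torch.randn(32, OBS_DIM)
future = torch.randn(32, OBS_DIM)
expert_action = torch.randn(32, ACT_DIM)

# Branch 1: types inferred from the future, trained by imitation (MSE stand-in).
z_posterior = encoder(future)
bc_loss = nn.functional.mse_loss(policy(obs, z_posterior), expert_action)

# Branch 2: types drawn independently of the actual future, as at test time;
# the full method scores these rollouts with an adversarial realism objective.
z_prior = torch.randn(32, TYPE_DIM)
actions_prior = policy(obs, z_prior)  # would be fed to a discriminator
```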
Reusable Options through Gradient-based Meta Learning
Hierarchical methods in reinforcement learning have the potential to reduce the number of decisions the agent needs to make when learning new tasks. However, finding reusable temporal abstractions that facilitate fast learning remains a challenging problem. Recently, several deep learning approaches were proposed to learn such temporal abstractions in the form of options in an end-to-end manner. In this work, we point out several shortcomings of these methods and discuss their potential negative consequences. Subsequently, we formulate the desiderata for reusable options and use these to frame the problem of learning options as a gradient-based meta-learning problem. This allows us to formulate an objective that explicitly incentivizes options that let a higher-level decision maker adjust to different tasks in few steps. Experimentally, we show that our method learns transferable components that accelerate learning, and that it performs better than prior methods developed for this setting. Additionally, we perform ablations to quantify the impact of gradient-based meta-learning as well as the other proposed changes.
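As a rough illustration of framing option learning as gradient-based meta-learning, the sketch below follows a generic MAML-style loop: shared low-level parameters are meta-trained so that a per-task higher-level learner adapts in a few inner gradient steps. The regression stand-in, dimensions, and learning rates are assumptions, not the paper's setup.

```python
import torch

# Shared option parameters (theta) are meta-learned so that task-specific
# high-level weights (phi) adapt quickly; loss_fn is a toy regression stand-in.
theta = torch.randn(8, 4, requires_grad=True)
meta_opt = torch.optim.Adam([theta], lr=1e-3)
inner_lr, inner_steps = 0.1, 3

def loss_fn(theta, phi, task):
    x, y = task
    return ((x @ theta @ phi - y) ** 2).mean()

tasks = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(4)]

for _ in range(100):                                  # meta-training iterations
    meta_loss = 0.0
    for task in tasks:
        phi = torch.zeros(4, 1, requires_grad=True)   # fresh high level per task
        for _ in range(inner_steps):                  # fast adaptation, high level only
            g, = torch.autograd.grad(loss_fn(theta, phi, task), phi,
                                     create_graph=True)
            phi = phi - inner_lr * g
        meta_loss = meta_loss + loss_fn(theta, phi, task)  # post-adaptation loss
    meta_opt.zero_grad()
    meta_loss.backward()                              # backprop through inner updates
    meta_opt.step()
```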
Learning to Modulate pre-trained Models in RL
Reinforcement Learning (RL) has been successful in various domains like
robotics, game playing, and simulation. While RL agents have shown impressive capabilities in their specific tasks, they adapt poorly to new tasks. In supervised learning, this adaptation problem is addressed by large-scale pre-training followed by fine-tuning on new downstream tasks. Recently,
pre-training on multiple tasks has been gaining traction in RL. However,
fine-tuning a pre-trained model often suffers from catastrophic forgetting.
That is, the performance on the pre-training tasks deteriorates when
fine-tuning on new tasks. To investigate the catastrophic forgetting
phenomenon, we first jointly pre-train a model on datasets from two benchmark
suites, namely Meta-World and DMControl. Then, we evaluate and compare a
variety of fine-tuning methods prevalent in natural language processing, both
in terms of performance on new tasks, and how well performance on pre-training
tasks is retained. Our study shows that with most fine-tuning approaches, the
performance on pre-training tasks deteriorates significantly. Therefore, we
propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation
of learned skills by modulating the information flow of the frozen pre-trained
model via a learnable modulation pool. Our method achieves state-of-the-art
performance on the Continual-World benchmark, while retaining performance on
the pre-training tasks. Finally, to aid future research in this area, we
release a dataset encompassing 50 Meta-World and 16 DMControl tasks.
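As a loose analogue of modulating a frozen pre-trained model (illustrative only; L2M's actual mechanism selects modulators from a learnable pool keyed by the task), here is a FiLM-style sketch in which only small scale-and-shift parameters train while the backbone stays frozen.

```python
import torch
import torch.nn as nn

class ModulatedLayer(nn.Module):
    """Frozen pre-trained linear layer whose output is rescaled and shifted by
    small learnable modulation vectors; only the modulators receive gradients."""
    def __init__(self, frozen: nn.Linear):
        super().__init__()
        self.frozen = frozen
        for p in self.frozen.parameters():
            p.requires_grad_(False)               # keep pre-trained skills intact
        d = frozen.out_features
        self.scale = nn.Parameter(torch.ones(d))  # learnable modulation
        self.shift = nn.Parameter(torch.zeros(d))
    def forward(self, x):
        return self.frozen(x) * self.scale + self.shift

backbone = nn.Linear(32, 32)                      # stand-in for a pre-trained layer
layer = ModulatedLayer(backbone)
opt = torch.optim.Adam([p for p in layer.parameters() if p.requires_grad], lr=1e-3)
out = layer(torch.randn(4, 32))                   # backbone frozen, modulators train
```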
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL promises to yield better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics.
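For readers wanting the flavour of such a characterization, one common formalization (our notation, not necessarily the article's) treats continual RL as a time-indexed MDP whose dynamics and reward are allowed to drift:

```latex
% A non-stationary MDP: transition kernel and reward vary with time t.
\[
  \mathcal{M}_t = (\mathcal{S}, \mathcal{A}, P_t, R_t, \gamma), \qquad
  P_t(s' \mid s, a), \quad R_t(s, a),
\]
% and the agent maximizes return under dynamics it must track as they change:
\[
  J(\pi) = \mathbb{E}\Big[\textstyle\sum_{t \ge 0} \gamma^{t} R_t(s_t, a_t)\Big].
\]
```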