5 research outputs found
Shrinkage in the Time-Varying Parameter Model Framework Using the R Package shrinkTVP
Time-varying parameter (TVP) models are widely used in time series analysis
to flexibly deal with processes which gradually change over time. However, the
risk of overfitting in TVP models is well known. This issue can be dealt with
using appropriate global-local shrinkage priors, which pull time-varying
parameters towards static ones. In this paper, we introduce the R package
shrinkTVP (Knaus, Bitto-Nemling, Cadonna, and Frühwirth-Schnatter 2019),
which provides a fully Bayesian implementation of shrinkage priors for TVP
models, taking advantage of recent developments in the literature, in
particular that of Bitto and Frühwirth-Schnatter (2019). The package
shrinkTVP allows for posterior simulation of the parameters through an
efficient Markov Chain Monte Carlo (MCMC) scheme. Moreover, summary and
visualization methods, as well as the possibility of assessing predictive
performance through log predictive density scores (LPDSs), are provided. The
computationally intensive tasks have been implemented in C++ and interfaced
with R. The paper includes a brief overview of the models and shrinkage priors
implemented in the package. Furthermore, core functionalities are illustrated with both simulated and real data.
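To make the model class concrete, here is a minimal Python sketch (ours, purely illustrative; the package itself is driven from R) of the data-generating process: each regression coefficient follows a random walk whose innovation variance theta_j the shrinkage prior can pull toward zero, collapsing that coefficient to a static one.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 200, 2

    # State equation: beta_t = beta_{t-1} + w_t, w_t ~ N(0, diag(theta)).
    # A global-local shrinkage prior pulls each theta_j toward zero;
    # theta_j = 0 makes coefficient j static.
    theta = np.array([0.1, 0.0])   # here the second coefficient is static
    beta0 = np.array([1.0, -0.5])
    beta = beta0 + np.cumsum(rng.normal(0.0, np.sqrt(theta), size=(T, d)), axis=0)

    # Observation equation: y_t = x_t' beta_t + eps_t.
    X = rng.normal(size=(T, d))
    y = np.einsum("td,td->t", X, beta) + rng.normal(0.0, 0.5, size=T)

Given such data, the sampler's task is to decide, for each coefficient, whether the posterior concentrates theta_j near zero (static) or away from it (time-varying).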
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
In lifelong learning, an agent learns throughout its entire life without
resets, in a constantly changing environment, as we humans do. Consequently,
lifelong learning comes with a plethora of research problems such as continual
domain shifts, which result in non-stationary rewards and environment dynamics.
These non-stationarities are difficult to detect and cope with due to their
continuous nature. Therefore, exploration strategies and learning methods are required that can track these gradual domain shifts and adapt to them. We propose Reactive Exploration to track and react to continual domain
shifts in lifelong reinforcement learning, and to update the policy
correspondingly. To this end, we conduct experiments to investigate different exploration strategies. We empirically show that representatives of
the policy-gradient family are better suited for lifelong learning, as they
adapt more quickly to distribution shifts than Q-learning. Accordingly, policy-gradient methods profit the most from Reactive Exploration and show strong results in lifelong learning with continual domain shifts. Our code is
available at: https://github.com/ml-jku/reactive-exploration
Comment: CoLLAs 2022
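The abstract does not spell out the mechanism, so the following Python sketch is our own hypothetical illustration of the general idea: treat a rising prediction error as a drift signal and convert it into an intrinsic exploration bonus (the paper's actual method may differ).

    class DriftReactiveBonus:
        """Hypothetical sketch: raise an exploration bonus when a running
        estimate of prediction error rises, signalling a domain shift."""

        def __init__(self, alpha: float = 0.01, scale: float = 1.0):
            self.alpha = alpha      # EMA smoothing factor
            self.scale = scale      # bonus magnitude
            self.slow_err = 0.0     # long-horizon error average (baseline)
            self.fast_err = 0.0     # short-horizon error average

        def bonus(self, prediction_error: float) -> float:
            # Fast EMA reacts to the current error, slow EMA is the baseline.
            self.fast_err += 10 * self.alpha * (prediction_error - self.fast_err)
            self.slow_err += self.alpha * (prediction_error - self.slow_err)
            # Fast average above the slow one suggests non-stationarity:
            # return an intrinsic reward that re-triggers exploration.
            return self.scale * max(0.0, self.fast_err - self.slow_err)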
A Dataset Perspective on Offline Reinforcement Learning
The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky due to sub-optimal policies during training. In Offline
RL, this problem is avoided since interactions with an environment are
prohibited. Policies are learned from a given dataset, which solely determines
their performance. Despite this fact, how dataset characteristics influence Offline RL algorithms has hardly been investigated. The dataset characteristics
are determined by the behavioral policy that samples this dataset. Therefore,
we characterize behavioral policies as exploratory when they yield high expected information in their interaction with the Markov Decision Process (MDP), and as exploitative when they achieve high expected return. We implement two
corresponding empirical measures for the datasets sampled by the behavioral
policy in deterministic MDPs. The first empirical measure, SACo, is defined as the normalized number of unique state-action pairs and captures exploration. The second empirical measure, TQ, is defined as the normalized average trajectory return and captures exploitation. Empirical evaluations show the effectiveness of TQ and
SACo. In large-scale experiments using our proposed measures, we show that the
unconstrained off-policy Deep Q-Network family requires datasets with high SACo
to find a good policy. Furthermore, experiments show that policy constraint
algorithms perform well on datasets with high TQ and SACo. Finally, the experiments show that purely dataset-constrained Behavioral Cloning performs competitively with the best Offline RL algorithms for datasets with high TQ.
Comment: Code: https://github.com/ml-jku/OfflineRL
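Both measures are straightforward to compute from a dataset. The Python sketch below is ours, with the normalization constants (a reference pair count for SACo; random and expert returns for TQ) assumed for illustration rather than taken from the paper.

    import numpy as np

    def saco(states, actions, n_reference_pairs: int) -> float:
        """SACo: unique state-action pairs in the dataset, normalized by a
        reference count (the exact normalization is assumed here)."""
        unique_pairs = {(tuple(s), a) for s, a in zip(states, actions)}
        return len(unique_pairs) / n_reference_pairs

    def tq(trajectory_returns, random_return: float, expert_return: float) -> float:
        """TQ: average trajectory return, normalized here between a random
        and an expert policy return (normalization assumed)."""
        mean_return = float(np.mean(trajectory_returns))
        return (mean_return - random_return) / (expert_return - random_return)

With these two numbers, datasets can be placed on an exploration/exploitation plane, which is how the large-scale comparison above is organized.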
History Compression via Language Models in Reinforcement Learning
In a partially observable Markov decision process (POMDP), an agent typically
uses a representation of the past to approximate the underlying MDP. We propose
to utilize a frozen Pretrained Language Transformer (PLT) for history
representation and compression to improve sample efficiency. To avoid training the Transformer, we introduce FrozenHopfield, which automatically associates
observations with pretrained token embeddings. To form these associations, a
modern Hopfield network stores these token embeddings, which are retrieved by
queries that are obtained by a random but fixed projection of observations. Our
new method, HELM, enables actor-critic network architectures that contain a
pretrained language Transformer for history representation as a memory module.
Since a representation of the past need not be learned, HELM is much more
sample efficient than competitors. On Minigrid and Procgen environments, HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm
Comment: ICML 2022
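The FrozenHopfield step described above reduces to a single softmax retrieval over the stored token embeddings. The Python sketch below is ours: the embeddings are random placeholders (HELM uses a pretrained language model's), and the inverse temperature beta is an assumed parameter.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stored patterns: token embeddings of the frozen language model
    # (random placeholders here; HELM retrieves from the pretrained ones).
    n_tokens, emb_dim, obs_dim = 1000, 768, 64 * 64 * 3
    E = rng.normal(size=(n_tokens, emb_dim)).astype(np.float32)

    # Random but fixed projection from observation space to query space.
    P = rng.normal(0.0, 1.0 / np.sqrt(obs_dim),
                   size=(emb_dim, obs_dim)).astype(np.float32)

    def frozen_hopfield(obs: np.ndarray, beta: float = 1.0) -> np.ndarray:
        """One modern-Hopfield retrieval: project the observation to a query,
        then return the softmax-weighted mixture of stored token embeddings."""
        q = P @ obs.ravel().astype(np.float32)  # query from fixed projection
        logits = beta * (E @ q)                 # similarity to each stored pattern
        w = np.exp(logits - logits.max())
        w /= w.sum()
        return w @ E                            # retrieved embedding, fed to the PLT

Because P is fixed and E is frozen, this step has no trainable parameters, which is why no gradient ever needs to flow through the Transformer.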
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
CLIP yielded impressive results on zero-shot transfer learning tasks and is considered a foundation model like BERT or GPT-3. CLIP vision models that
have a rich representation are pre-trained using the InfoNCE objective and
natural language supervision before they are fine-tuned on particular tasks.
Though CLIP excels at zero-shot transfer learning, it suffers from an
explaining away problem, that is, it focuses on one or a few features while neglecting other relevant features. This problem is caused by insufficiently
extracting the covariance structure of the original multi-modal data. We suggest using modern Hopfield networks to tackle the problem of explaining
away. Their retrieved embeddings have an enriched covariance structure derived
from co-occurrences of features in the stored embeddings. However, modern
Hopfield networks increase the saturation effect of the InfoNCE objective, which hampers learning. We propose to use the InfoLOOB objective to mitigate this saturation effect. We introduce the novel "Contrastive Leave One Out Boost" (CLOOB), which uses modern Hopfield networks for covariance enrichment together
with the InfoLOOB objective. In experiments, we compare CLOOB to CLIP after pre-training on the Conceptual Captions and YFCC datasets with respect to their zero-shot transfer learning performance on other datasets. CLOOB
consistently outperforms CLIP at zero-shot transfer learning across all
considered architectures and datasets.
Comment: 15 pages (+ appendix); Blog: https://ml-jku.github.io/cloob; GitHub: https://github.com/ml-jku/cloob
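For concreteness, here is our own sketch of the InfoLOOB objective for a batch of paired embeddings (not the authors' implementation; the inverse temperature value is an arbitrary assumption). The key difference to InfoNCE is that the positive pair is left out of the denominator, so the positive-to-negative ratio is no longer capped at one and the objective does not saturate in the same way.

    import numpy as np

    def info_loob(x: np.ndarray, y: np.ndarray, inv_tau: float = 30.0) -> float:
        """InfoLOOB for row-normalized embedding batches x, y of equal shape.
        The denominator contains only negatives (diagonal masked out)."""
        sims = inv_tau * (x @ y.T)                   # pairwise similarities
        pos = np.diag(sims)                          # matched (positive) pairs
        neg = np.where(np.eye(len(x), dtype=bool), -np.inf, sims)
        lse_rows = np.logaddexp.reduce(neg, axis=1)  # x_i against y_j, j != i
        lse_cols = np.logaddexp.reduce(neg, axis=0)  # y_i against x_j, j != i
        # Minimizing pushes positives up relative to the negatives only.
        return float(np.mean(lse_rows - pos) + np.mean(lse_cols - pos))

In CLOOB, the x and y batches would be the Hopfield-retrieved (covariance-enriched) image and text embeddings rather than the raw encoder outputs.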