Teaching Research on Cultivating College Students' Awareness of Public Crisis and Subjective Prevention of Educational Cognitive Impairment
Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
The learned policy of model-free offline reinforcement learning (RL) methods is often constrained to stay within the support of the dataset to avoid potentially dangerous out-of-distribution actions or states, making it difficult to handle out-of-support regions. Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either a trained forward or reverse dynamics model. However, the imagined transitions may be inaccurate, degrading the performance of the underlying offline RL method. In this paper, we propose to augment the offline dataset using trained bidirectional dynamics models and rollout policies with a double check. We introduce conservatism by trusting only samples on which the forward model and backward model agree. Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method. Experimental results on the D4RL benchmarks demonstrate that our method significantly boosts the performance of existing model-free offline RL algorithms and achieves competitive or better scores than baseline methods.
Comment: NeurIPS 202
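A minimal sketch of the double-check idea, assuming hypothetical `forward_model`, `backward_model`, and `rollout_policy` interfaces; the paper's actual confidence measure and rollout scheme may differ:

```python
import numpy as np

def double_check_rollout(forward_model, backward_model, rollout_policy,
                         states, horizon=5, tol=0.1):
    """Roll out the forward model, but keep a transition only when the
    backward model can reconstruct the state it started from."""
    accepted = []
    for _ in range(horizon):
        actions = rollout_policy(states)
        next_states, rewards = forward_model.predict(states, actions)
        # Double check: predict the current state back from the imagined
        # next state and measure the disagreement between the two models.
        states_back = backward_model.predict(next_states, actions)
        disagreement = np.linalg.norm(states - states_back, axis=-1)
        mask = disagreement < tol  # trust only low-disagreement samples
        accepted.append((states[mask], actions[mask],
                         rewards[mask], next_states[mask]))
        states = next_states[mask]  # continue only from trusted states
        if states.shape[0] == 0:
            break
    return accepted
```

The accepted transitions would then be appended to the offline dataset consumed by whatever model-free offline RL method sits underneath.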
The Primacy Bias in Model-Based RL
The primacy bias in deep reinforcement learning (DRL), the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly degrade the performance of DRL algorithms. Previous studies have shown that simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias. However, we observe that resetting the agent's parameters harms performance in the context of model-based reinforcement learning (MBRL). On further investigation, we find that the primacy bias in MBRL differs from that in model-free RL. In this work, we investigate the primacy bias in MBRL and propose world model resetting, a variant suited to MBRL. We apply our method to two different MBRL algorithms, MBPO and DreamerV2, and validate its effectiveness on multiple continuous control tasks on MuJoCo and the DeepMind Control Suite, as well as discrete control tasks on the Atari 100k benchmark. The results show that world model resetting can significantly alleviate the primacy bias in the model-based setting and improve the algorithm's performance. We also give a guide on how to perform world model resetting effectively.
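A rough illustration of what world model resetting could look like in code; `world_model`, the reset interval, and the loop variables are assumptions for the sketch, not the paper's implementation:

```python
import torch.nn as nn

def reset_world_model(world_model: nn.Module) -> None:
    """Re-initialize the world model's parameters in place, leaving the
    replay buffer and the agent's policy/value networks untouched."""
    for layer in world_model.modules():
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()

# Hypothetical usage inside an MBRL training loop:
# if step % reset_interval == 0:
#     reset_world_model(agent.world_model)
```

The contrast with the model-free recipe is that only the world model is re-initialized; resetting the agent's own parameters is what the abstract reports as harmful in MBRL.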
Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse
Sample efficiency is one of the most critical issues for online reinforcement learning (RL). Existing methods achieve higher sample efficiency by adopting model-based methods, Q-ensembles, or better exploration mechanisms. We instead propose to train an off-policy RL agent by updating on a fixed sampled batch multiple times, reusing these samples and better exploiting them within a single optimization loop. We name our method sample multiple reuse (SMR). We theoretically show the properties of Q-learning with SMR, e.g., convergence. Furthermore, we incorporate SMR with off-the-shelf off-policy RL algorithms and conduct experiments on a variety of continuous control benchmarks. Empirical results show that SMR significantly boosts the sample efficiency of the base methods across most of the evaluated tasks without any hyperparameter tuning or additional tricks.
Comment: 37 pages
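The core of SMR is small enough to sketch directly: sample one batch, then update on it several times per environment step. The `agent` and `replay_buffer` methods below are hypothetical placeholders for whatever base off-policy algorithm SMR wraps:

```python
def smr_train_step(agent, replay_buffer, env, state,
                   batch_size=256, num_reuse=10):
    """One environment step followed by `num_reuse` gradient updates on
    a single fixed batch, instead of the usual one update per step."""
    action = agent.act(state)
    next_state, reward, done, _ = env.step(action)
    replay_buffer.add(state, action, reward, next_state, done)

    batch = replay_buffer.sample(batch_size)  # sampled once...
    for _ in range(num_reuse):
        agent.update(batch)                   # ...reused many times
    return env.reset() if done else next_state
```

Because the batch is fixed across the inner loop, the extra cost is pure computation; no additional environment samples are consumed.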
Uncertain Lifetime, Bequest, Annuity and Capital Accumulation under Different Motives of Bequests
Master'sMASTER OF SOCIAL SCIENCE
Robustness of Utilizing Feedback in Embodied Visual Navigation
This paper presents a framework for training an agent to actively request
help in object-goal navigation tasks, with feedback indicating the location of
the target object in its field of view. To make the agent more robust in
scenarios where a teacher may not always be available, the proposed training
curriculum includes a mix of episodes with and without feedback. The results
show that this approach improves the agent's performance, even in the absence
of feedback.
Comment: Accepted at the ICRA Workshop for Communicating Robot Learning across Human-Robot Interaction
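A hedged sketch of the mixed curriculum: some training episodes grant teacher feedback on the target's location, others do not. The object names (`env`, `agent`, `teacher`) and the 50/50 mix are illustrative assumptions:

```python
import random

def run_curriculum_episode(env, agent, teacher, feedback_prob=0.5):
    """Run one object-goal navigation episode; with probability
    `feedback_prob` the agent may actively request the target's
    location in its field of view from the teacher."""
    teacher_available = random.random() < feedback_prob
    obs = env.reset()
    done = False
    while not done:
        hint = None
        if teacher_available and agent.wants_help(obs):
            hint = teacher.locate_target(obs)  # target-location feedback
        obs, reward, done, _ = env.step(agent.act(obs, hint))
    return reward
```

Training on both branches is what the abstract credits for robustness when no teacher is present at test time.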