360 research outputs found

    Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

    Full text link
    The learned policy of model-free offline reinforcement learning (RL) methods is often constrained to stay within the support of the dataset to avoid potentially dangerous out-of-distribution actions or states, which makes it challenging to handle out-of-support regions. Model-based RL methods can enrich the dataset and improve generalization by generating imaginary trajectories with a trained forward or reverse dynamics model. However, the imagined transitions may be inaccurate, thus degrading the performance of the underlying offline RL method. In this paper, we propose to augment the offline dataset using trained bidirectional dynamics models and rollout policies with a double check. We introduce conservatism by trusting only samples on which the forward and backward models agree. Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method. Experimental results on the D4RL benchmarks demonstrate that our method significantly boosts the performance of existing model-free offline RL algorithms and achieves competitive or better scores than baseline methods. Comment: NeurIPS 2022
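
    A minimal sketch of the double-check filtering described above, assuming pre-trained forward/backward dynamics models and rollout policies; every name here (forward_model, backward_policy, threshold, etc.) is an illustrative assumption rather than the paper's actual code:

```python
import numpy as np

def double_check_augment(states, forward_model, backward_model,
                         forward_policy, backward_policy, threshold=0.1):
    """Keep only imagined transitions on which both models roughly agree."""
    augmented = []
    for s in states:                       # states drawn from the offline dataset
        # Forward imagination: roll one step ahead with the forward rollout policy.
        a = forward_policy(s)
        s_next = forward_model(s, a)

        # Backward check: reconstruct the current state from the imagined next state.
        a_back = backward_policy(s_next)
        s_recon = backward_model(s_next, a_back)

        # Confidence-aware filter: trust the sample only when the forward and
        # backward models agree (small reconstruction error).
        if np.linalg.norm(s - s_recon) < threshold:
            augmented.append((s, a, s_next))
    return augmented
```

    The filtered transitions can then be appended to the offline dataset consumed by any model-free offline RL algorithm.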

    The primacy bias in Model-based RL

    Full text link
    The primacy bias in deep reinforcement learning (DRL), the agent's tendency to overfit to early data and lose the ability to learn from new data, can significantly decrease the performance of DRL algorithms. Previous studies have shown that simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias. However, we observe that resetting the agent's parameters harms performance in the context of model-based reinforcement learning (MBRL). On further investigation, we find that the primacy bias in MBRL differs from that in model-free RL. In this work, we investigate the primacy bias in MBRL and propose world model resetting, which works in the model-based setting. We apply our method to two different MBRL algorithms, MBPO and DreamerV2, and validate its effectiveness on multiple continuous control tasks from MuJoCo and the DeepMind Control Suite, as well as discrete control tasks from the Atari 100k benchmark. The results show that world model resetting can significantly alleviate the primacy bias in the model-based setting and improve algorithm performance. We also provide guidance on how to perform world model resetting effectively.
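
    A hedged sketch of world model resetting as described above: periodically re-initialize the world model's parameters while leaving the agent (policy and value networks) and the replay buffer untouched. The module structure and reset interval are assumptions for illustration, not the paper's exact implementation:

```python
import torch.nn as nn

def maybe_reset_world_model(world_model: nn.Module, step: int,
                            reset_interval: int = 100_000) -> None:
    """Re-initialize the world model every `reset_interval` environment steps."""
    if step > 0 and step % reset_interval == 0:
        for module in world_model.modules():
            # Only layers that define reset_parameters (Linear, Conv, GRU, ...)
            # are re-initialized; containers and activations are left untouched.
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()
```

    In an MBPO- or DreamerV2-style training loop this would be called once per environment step, keeping the actor and critic networks intact.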

    Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse

    Full text link
    Sample efficiency is one of the most critical issues in online reinforcement learning (RL). Existing methods achieve higher sample efficiency by adopting model-based approaches, Q-ensembles, or better exploration mechanisms. We instead propose to train an off-policy RL agent by updating on a fixed sampled batch multiple times, reusing these samples and exploiting them more thoroughly within a single optimization loop. We name our method sample multiple reuse (SMR). We theoretically analyze the properties of Q-learning with SMR, e.g., its convergence. Furthermore, we incorporate SMR into off-the-shelf off-policy RL algorithms and conduct experiments on a variety of continuous control benchmarks. Empirical results show that SMR significantly boosts the sample efficiency of the base methods across most of the evaluated tasks without any hyperparameter tuning or additional tricks. Comment: 37 pages
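
    A minimal sketch of sample multiple reuse in a generic off-policy loop: one batch is sampled per environment step and the agent is updated on that same batch several times. The agent, env, and replay_buffer interfaces are illustrative assumptions:

```python
def train_with_smr(agent, env, replay_buffer, total_steps,
                   reuse_m=5, batch_size=256):
    """Generic off-policy training loop with sample multiple reuse (SMR)."""
    obs = env.reset()
    for step in range(total_steps):
        action = agent.act(obs)
        next_obs, reward, done, info = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        if len(replay_buffer) >= batch_size:
            batch = replay_buffer.sample(batch_size)   # one fixed batch ...
            for _ in range(reuse_m):                   # ... reused M times
                agent.update(batch)                    # off-the-shelf off-policy update
```

    Setting reuse_m = 1 recovers the base algorithm, so SMR adds only a single hyperparameter on top of an existing off-policy method.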

    Uncertain Lifetime, Bequest, Annuity and Capital Accumulation under Different Motives of Bequests

    Get PDF
    Master's thesis (Master of Social Science)

    Robustness of Utilizing Feedback in Embodied Visual Navigation

    Full text link
    This paper presents a framework for training an agent to actively request help in object-goal navigation tasks, with feedback indicating the location of the target object in its field of view. To make the agent more robust in scenarios where a teacher may not always be available, the proposed training curriculum includes a mix of episodes with and without feedback. The results show that this approach improves the agent's performance, even in the absence of feedback. Comment: Accepted at the ICRA Workshop for Communicating Robot Learning across Human-Robot Interaction
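
    An illustrative sketch of the mixed-feedback curriculum described in the abstract: with some probability an episode is run with a teacher available, otherwise the agent must navigate without help. The names and the feedback mechanism shown here are assumptions for illustration, not the paper's actual interface:

```python
import random

def run_training_episode(agent, env, feedback_prob=0.5):
    """Run one object-goal navigation episode, with or without teacher feedback."""
    teacher_available = random.random() < feedback_prob
    obs = env.reset()
    done = False
    while not done:
        action = agent.act(obs, can_request_help=teacher_available)
        if teacher_available and action == "request_help":
            # Teacher feedback: reveal the target object's location in the
            # agent's field of view (hypothetical environment API).
            obs = env.reveal_target(obs)
        else:
            obs, reward, done, info = env.step(action)
            agent.observe(reward, done)
```

    Varying feedback_prob across the curriculum exposes the agent to both regimes, which is the robustness mechanism the abstract refers to.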