202 research outputs found

Offline Experience Replay for Continual Offline Reinforcement Learning

Author: Gai Sibo
He Li
Wang Donglin
Publication date: 23/05/2023

The capability of continuously learning new skills via a sequence of pre-collected offline datasets is desired for an agent. However, consecutively learning a sequence of offline tasks likely leads to the catastrophic forgetting issue under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer, without exploring any of the environments of the sequential tasks. To learn consistently on all sequential tasks, an agent must acquire new knowledge while preserving old knowledge in an offline manner. To this end, we introduce continual learning algorithms and experimentally find experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL encounters a new distribution shift problem: the mismatch between the experiences in the replay buffer and trajectories from the learned policy. To address this issue, we propose a new model-based experience selection (MBES) scheme to build the replay buffer, where a transition model is learned to approximate the state distribution. This model is used to bridge the distribution bias between the replay buffer and the learned model by filtering, from the offline data, the data that most closely resembles the learned model for storage. Moreover, to enhance the ability to learn new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture to avoid the disturbance of the behavior-cloning loss on the Q-learning process. We call the overall algorithm offline experience replay (OER). Extensive experiments demonstrate that our OER method outperforms SOTA baselines in widely used MuJoCo environments.

Comment: 9 pages, 4 figures

arXiv.org e-Print Archive

Lugrandoside attenuates spinal cord injury by targeting peli1 and TLR4/NF-κB pathway to exert anti-inflammatory and anti-apoptotic effects

Author: Jiang Haitao
Li Sibo
Sheng Wenbo
Yuan Hantao
Publication venue: 'African Journals Online (AJOL)'
Publication date: 27/04/2022

Purpose: To investigate the curative effect and mechanism of lugrandoside (LG) on spinal cord injury (SCI). Methods: We probed the expression of Pellino1 (peli1) in microglia and spinal cord tissues with different treatments of LG. Lipopolysaccharide (LPS) was used to activate the microglia. Furthermore, rats were used to establish an SCI model, and LG, at low and high concentrations, was administered to injured animals to ascertain whether LG exerted a therapeutic effect on SCI. Results: LG inhibited the activation and recruitment of glial cells by acting as a negative regulator of glial inflammation, and this reversed the targeting of peli1 and the TLR4/NF-κB pathway. Furthermore, the in vivo data showed that LG exerted a neuroprotective effect following SCI via anti-inflammatory and anti-apoptotic effects. Peli1 and TLR4/NF-κB were also suppressed by LG stimuli. Conclusion: These results suggest that LG protects neural tissue against neuroinflammation and apoptosis by suppressing the TLR4/NF-κB pathway and negatively targeting peli1. The findings may provide new insights into the treatment of spinal cord injury.

AJOL - African Journals Online

Time-Sensitive Collaborative Filtering Algorithm with Feature Stability

Author: Li Guiling
Pang Shanchen
Qiao Sibo
Wang Min
Yu Shihang
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 29/02/2020

Collaborative filtering algorithms are widely used in recommendation systems. However, many problems in the recommendation field remain to be solved, such as low precision and the long tail of items. In this paper, we design an algorithm called FSTS to address low precision and the long tail. We adopt stability variables and time-sensitive factors to handle drift in user interest and to improve prediction accuracy. Experiments show that, compared with Item-CF, the FSTS algorithm significantly improves precision, recall, coverage, and popularity. At the same time, it can mine long-tail items and alleviate the long-tail phenomenon.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

A fuzzy-clustering based approach for MADM handover in 5G ultra-dense networks

Author: Kwong Chiew Foong
Li Lincan
Liu Qianyu
Wang Jing
Zhang Sibo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2019

As global data traffic has increased significantly in recent years, the ultra-dense deployment of cellular networks (UDN) has been proposed as one of the key technologies in the fifth-generation mobile communications system (5G) to provide a much higher density of radio resources. The densification of small base stations (BSs) could introduce much higher inter-cell interference and lead users to reach the edge of coverage more frequently. As the current handover scheme was originally proposed for macro BSs, it could cause serious handover issues in UDN, i.e. ping-pong handover, handover failures and frequent handover. To address these handover challenges and provide a high quality of service (QoS) to the user in UDN, this paper proposes a novel handover scheme that integrates the advantages of fuzzy logic and multiple attribute decision-making (MADM) algorithms to ensure that the handover process is triggered at the right time and the connection is switched to the optimal neighbouring BS. To further enhance the performance of the proposed scheme, this paper also adopts the subtractive clustering technique, using historical data to define the optimal membership functions within the fuzzy system. Performance results show that the proposed handover scheme outperforms traditional approaches and can significantly reduce the number of handovers and ping-pong handovers while maintaining QoS at a relatively high level. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.

Nottingham ePrints

Nottingham eTheses

What influenced the lesion patterns and hemodynamic characteristics in patients with internal carotid artery stenosis? A retrospective study

Author: Li Furong
Liu Jinjie
Liu Sibo
Liu Zanhua
Sui Xiaowen
Wang Hong
Zhang Meiyan
Zhao Hongling
Publication venue: 'Elsevier BV'
Publication date: 01/01/1970

• Blood perfusion influences ischemic lesions in patients with ICAS.
• Communicating arteries influence intracranial blood flow.
• TCD was a convenient and rapid tool to assess cerebral blood flow.

Via Medica Journals