
    Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

    Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits the empirical state-action visitation distribution of all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution of all historical policies does not necessarily benefit model prediction for the current policy, since the policy in use is constantly evolving. The evolving policy during training causes state-action visitation distribution shifts. We theoretically analyze how this distribution shift over historical policies affects model learning and model rollouts. We then propose a novel dynamics model learning method, named Policy-adapted Dynamics Model Learning (PDML). PDML dynamically adjusts the historical policy mixture distribution to ensure that the learned model can continually adapt to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PDML achieves significant improvements in sample efficiency and higher asymptotic performance when combined with state-of-the-art model-based RL methods. (Comment: 16 pages, 5 figures)
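
    The core idea lends itself to a short sketch. Below is a minimal, illustrative Python stand-in for policy-adapted replay weighting: transitions are tagged with the policy iteration that produced them, and the dynamics model samples data from recent policies more heavily. The class, the geometric decay scheme, and all names are assumptions for illustration, not the paper's actual PDML algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

class PolicyAdaptedBuffer:
    """Replay buffer that reweights historical transitions toward the
    current policy's visitation distribution (illustrative sketch)."""

    def __init__(self):
        self.transitions = []   # (state, action, next_state) tuples
        self.policy_ids = []    # which policy iteration produced each one

    def add(self, s, a, s_next, policy_id):
        self.transitions.append((s, a, s_next))
        self.policy_ids.append(policy_id)

    def sample(self, batch_size, current_policy_id, decay=0.9):
        # Upweight transitions from recent policies: older policies get
        # geometrically smaller weight, approximating adaptation to the
        # evolving policy's state-action distribution.
        ages = current_policy_id - np.asarray(self.policy_ids)
        weights = decay ** ages
        probs = weights / weights.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]

# The dynamics model would then be fit on these reweighted batches
# instead of uniform draws from the full buffer, e.g.:
#   batch = buffer.sample(256, current_policy_id=k)
#   model.train_step(batch)
```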

    Contrastive Brain Network Learning via Hierarchical Signed Graph Pooling Model

    Recently, brain networks have been widely adopted to study brain dynamics, brain development, and brain diseases. Graph representation learning techniques on brain functional networks can facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. However, current graph learning techniques have several issues when applied to brain network mining. First, most current graph learning models are designed for unsigned graphs, which hinders the analysis of many signed network datasets (e.g., brain functional networks). Meanwhile, the scarcity of brain network data limits model performance on clinical phenotype prediction. Moreover, few current graph learning models are interpretable, which limits their ability to provide biological insight into model outcomes. Here, we propose an interpretable hierarchical signed graph representation learning model to extract graph-level representations from brain functional networks, which can be used for different prediction tasks. To further improve model performance, we also propose a new strategy to augment functional brain network data for contrastive learning. We evaluate this framework on different classification and regression tasks using data from HCP and OASIS. Our results from extensive experiments demonstrate the superiority of the proposed model compared to several state-of-the-art techniques. Additionally, we use graph saliency maps derived from these prediction tasks to demonstrate the detection and interpretation of phenotypic biomarkers.
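
    To make the signed-graph point concrete, here is a tiny NumPy sketch of one signed graph convolution layer that aggregates positively and negatively weighted edges separately instead of discarding edge signs. The paper's hierarchical pooling and contrastive components are not reproduced, and all function and variable names are illustrative.

```python
import numpy as np

def signed_graph_conv(X, A, Wp, Wn):
    """One illustrative signed-graph convolution layer: aggregate
    positively and negatively correlated neighbors separately, so the
    sign of functional connectivity is not discarded."""
    A_pos = np.maximum(A, 0.0)           # positive edges (positive correlation)
    A_neg = np.maximum(-A, 0.0)          # negative edges, made non-negative
    H = A_pos @ X @ Wp - A_neg @ X @ Wn  # contrast the two aggregations
    return np.tanh(H)

# Toy brain network: 4 regions, signed correlation matrix, 3-d node features.
rng = np.random.default_rng(1)
A = np.array([[0.0,  0.8, -0.5,  0.0],
              [0.8,  0.0,  0.0, -0.3],
              [-0.5, 0.0,  0.0,  0.6],
              [0.0, -0.3,  0.6,  0.0]])
X = rng.standard_normal((4, 3))
Wp, Wn = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
H = signed_graph_conv(X, A, Wp, Wn)
graph_repr = H.mean(axis=0)  # crude graph-level readout in place of hierarchical pooling
```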

    COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

    Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration using the current policy for dynamics model learning. However, due to the complexity of real-world environments, it is inevitable to learn an imperfect dynamics model with prediction error, which can mislead policy learning and result in sub-optimal solutions. In this paper, we propose COPlanner, a planning-driven framework for model-based methods that addresses the inaccurately learned dynamics model problem with conservative model rollouts and optimistic environment exploration. COPlanner leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus during real-environment exploration, respectively, when choosing actions. Consequently, COPlanner can avoid model-uncertain regions through conservative model rollouts, thereby alleviating the influence of model error. Simultaneously, it explores high-reward model-uncertain regions to actively reduce model error through optimistic real-environment exploration. COPlanner is a plug-and-play framework that can be applied to any Dyna-style model-based method. Experimental results on a series of proprioceptive and visual continuous control tasks demonstrate that both the sample efficiency and the asymptotic performance of strong model-based methods are significantly improved when combined with COPlanner. (Comment: 22 pages, 17 figures)
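
    A hedged sketch of the conservative-rollout versus optimistic-exploration idea follows, using ensemble disagreement as the uncertainty signal. The function names, the candidate-search planner, and the toy models are assumptions standing in for the paper's UP-MPC component, not its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def ensemble_predict(models, s, a):
    """Each model returns (next_state, reward); disagreement across the
    ensemble serves as an epistemic-uncertainty estimate."""
    preds = np.array([m(s, a) for m in models])   # (E, state_dim + 1)
    mean = preds.mean(axis=0)
    uncertainty = preds.std(axis=0).sum()         # scalar disagreement
    return mean[:-1], mean[-1], uncertainty

def up_mpc_action(models, policy, s, horizon=5, n_candidates=32, beta=1.0,
                  optimistic=False):
    """Pick the candidate first action whose short policy-guided rollout
    scores best; uncertainty acts as a bonus when exploring the real
    environment (optimistic=True) and as a penalty during model rollouts."""
    sign = 1.0 if optimistic else -1.0
    best_a, best_score = None, -np.inf
    for _ in range(n_candidates):
        a0 = policy(s) + 0.1 * rng.standard_normal(s.shape)  # perturbed first action
        state, a, score = s, a0, 0.0
        for _ in range(horizon):
            state, r, u = ensemble_predict(models, state, a)
            score += r + sign * beta * u
            a = policy(state)
        if score > best_score:
            best_a, best_score = a0, score
    return best_a

# Toy usage: four random linear "models" and a damping policy on a 2-d state.
models = [lambda s, a, W=rng.standard_normal((2, 2)): np.append(W @ s + a, s.sum())
          for _ in range(4)]
policy = lambda s: -0.1 * s
a = up_mpc_action(models, policy, np.ones(2), optimistic=True)
```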

    3D bi-directional transformer U-Net for medical image segmentation

    As one of the most popular deep learning methods, deep convolutional neural networks (DCNNs) have been widely adopted in segmentation tasks with positive results. However, in segmentation tasks, DCNN-based frameworks are known to struggle with global relations within imaging features. Although several techniques have been proposed to enhance the global reasoning of DCNNs, these models either fail to achieve satisfactory performance compared with traditional fully convolutional structures or cannot exploit the basic advantage of CNN-based networks, namely local reasoning. In this study, in contrast with current attempts to combine FCNs and global reasoning methods, we fully exploit self-attention by designing a novel attention mechanism for 3D computation and propose a new segmentation framework (named 3DTU) for three-dimensional medical image segmentation tasks. This framework processes images in an end-to-end manner and performs 3D computation on both the encoder side (which contains a 3D transformer) and the decoder side (which is based on a 3D DCNN). We tested our framework on two independent datasets consisting of 3D MRI and CT images. Experimental results clearly demonstrate that our method outperforms several state-of-the-art segmentation methods on various metrics.
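
    The global-reasoning ingredient can be illustrated with a minimal 3D self-attention over voxel tokens, shown below. This is the generic mechanism the abstract alludes to, not the actual 3DTU block, and all names are illustrative.

```python
import numpy as np

def attention_3d(volume_feats, Wq, Wk, Wv):
    """Illustrative 3D self-attention: flatten a (D, H, W, C) feature
    volume into D*H*W tokens, attend globally across all voxels, and
    reshape back. Global attention complements the local reasoning of
    3D convolutions."""
    D, H, W, C = volume_feats.shape
    tokens = volume_feats.reshape(-1, C)               # (DHW, C)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax over all voxels
    return (attn @ V).reshape(D, H, W, -1)

rng = np.random.default_rng(3)
feats = rng.standard_normal((4, 4, 4, 8))              # tiny 3D feature volume
Wq = Wk = Wv = rng.standard_normal((8, 8))
out = attention_3d(feats, Wq, Wk, Wv)                  # shape (4, 4, 4, 8)
```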

    TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

    Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency remains a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future-state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces, thus overlooking the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. Theoretically, TACO can be shown to learn state and action representations that encompass sufficient information for control, thereby improving sample efficiency. For online RL, TACO achieves a 40% performance boost after one million environment interaction steps on average across nine challenging visual continuous control tasks from the DeepMind Control Suite. In addition, we show that TACO can also serve as a plug-and-play module added to existing offline visual RL methods, establishing new state-of-the-art performance for offline visual RL across offline datasets of varying quality.
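
    Mutual-information objectives of this kind are typically realized as an InfoNCE-style contrastive loss. The sketch below is an illustrative NumPy version under that assumption: embeddings of (state, action-sequence) pairs are aligned with embeddings of the resulting future states, with other batch elements serving as negatives. It is not TACO's exact loss or architecture.

```python
import numpy as np

def taco_style_infonce(z_sa, z_future, temperature=0.1):
    """Illustrative InfoNCE objective in the spirit of TACO: align each
    (state, action-sequence) embedding with the embedding of the state
    it leads to. Both inputs have shape (B, d); matched pairs share a
    row index, so positives sit on the diagonal of the logit matrix."""
    z_sa = z_sa / np.linalg.norm(z_sa, axis=1, keepdims=True)
    z_future = z_future / np.linalg.norm(z_future, axis=1, keepdims=True)
    logits = z_sa @ z_future.T / temperature       # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy on positives

rng = np.random.default_rng(4)
z_sa = rng.standard_normal((16, 32))       # embeddings of (s_t, a_t..a_{t+k-1})
z_future = rng.standard_normal((16, 32))   # embeddings of s_{t+k}
loss = taco_style_infonce(z_sa, z_future)
```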

    Robust output-feedback predictive control for proximity eddy current de-tumbling with constraints and uncertainty

    Proximity operation can significantly improve the efficiency of eddy-current de-tumbling. However, the tumbling motion and non-cooperative nature of space debris force the chaser to execute collision-avoidance maneuvers and expose it to model uncertainty. In this paper, an inertially oriented safety corridor is proposed that takes the debris' angular momentum as its central axis, which avoids frequent collision-avoidance maneuvers by the chaser. Meanwhile, a desired de-tumbling trajectory within this safety corridor is designed to reduce the angular velocity of the space debris. Then, a robust output-feedback controller accounting for the safety corridor and model uncertainty is proposed by combining moving horizon estimation and model predictive control. The moving horizon estimation is employed to estimate the system state and the model uncertainty, which is compensated by a feedforward control law. Furthermore, a model predictive control scheme without terminal ingredients is designed to achieve optimal fuel consumption and robust tracking stability of the system. Finally, taking the Chinese Sinosat-2 satellite as the simulation case, the effectiveness of the proposed scheme is verified.
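
    The estimation-plus-control loop can be illustrated on a toy one-dimensional plant, shown below. The averaging estimator and grid-search controller are crude stand-ins for the paper's moving horizon estimation and model predictive control, and every name and constant is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 1-d stand-in for the chaser dynamics: x_{k+1} = x_k + u_k + d,
# where d is an unknown constant model uncertainty and y = x + noise.
d_true, x, x_ref = 0.3, 5.0, 0.0
window = []  # recent (measurement, applied control) pairs

def mhe_estimate(window):
    """Crude moving-horizon estimate: the state is the latest measurement
    and the disturbance is the mean one-step prediction residual."""
    if len(window) < 2:
        return window[-1][0], 0.0
    residuals = [window[i + 1][0] - window[i][0] - window[i][1]
                 for i in range(len(window) - 1)]
    return window[-1][0], float(np.mean(residuals))

def mpc_action(x_est, d_est, x_ref=0.0, horizon=5,
               candidates=np.linspace(-1.0, 1.0, 41)):
    """Tiny receding-horizon search over constant nominal inputs, trading
    tracking error against a fuel penalty on the nominal model, then
    compensating the estimated disturbance by feedforward."""
    best_u, best_cost = 0.0, np.inf
    for u in candidates:
        x_pred, cost = x_est, 0.0
        for _ in range(horizon):
            x_pred = x_pred + u
            cost += (x_pred - x_ref) ** 2 + 0.1 * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u - d_est  # feedforward cancels the estimated uncertainty

for k in range(30):
    y = x + 0.01 * rng.standard_normal()           # noisy measurement
    x_est, d_est = mhe_estimate(window) if window else (y, 0.0)
    u = mpc_action(x_est, d_est, x_ref)
    window = (window + [(y, u)])[-10:]             # estimation horizon of 10
    x = x + u + d_true                             # true plant step
print(f"final tracking error: {abs(x - x_ref):.3f}")
```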

    Rare primary bladder mucosa-associated lymphoid tissue lymphoma: A case report and review of literature

    Primary bladder mucosa-associated lymphoid tissue (MALT) lymphoma is an extremely rare bladder tumor, with only scarce reports in the literature. We hereby report the case of an 81-year-old female patient with a bladder tumor presenting with frequent urination and dysuria, whose pelvic magnetic resonance imaging (MRI) suggested bladder cancer. She underwent transurethral resection of the bladder tumor (TURBT), and histopathology confirmed the mass to be bladder MALT lymphoma. The patient refused further treatment, and no disease recurrence was observed one year after surgery. The current data are insufficient to draw conclusions about the long-term efficacy of treatment for this tumor, so regular follow-up is necessary. To further characterize the clinical features, pathology, treatment, and prognosis of this tumor, we searched the literature from 1990 to the present, analyzing a total of 64 cases of primary MALT lymphoma.

    CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

    In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. CharacterGLM is designed for generating character-based dialogues (CharacterDial), aiming to equip a conversational AI system with character customization that satisfies people's inherent social desires and emotional needs. On top of CharacterGLM, we can customize various AI characters or social agents by configuring their attributes (identities, interests, viewpoints, experiences, achievements, social relationships, etc.) and behaviors (linguistic features, emotional expressions, interaction patterns, etc.). According to manual evaluations, our model outperforms most mainstream closed-source large language models, including the GPT series, especially in terms of consistency, human-likeness, and engagement. We will release our 6B version of CharacterGLM and a subset of the training data to facilitate further research on character-based dialogue generation. (Comment: Work in progress)
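
    The attribute/behavior configuration lends itself to a small sketch. Below is an illustrative Python stand-in that flattens such a profile into a system prompt; the field names mirror the categories listed in the abstract, but the structure and function are hypothetical and do not reflect CharacterGLM's actual training or prompting format.

```python
# Hypothetical character profile keyed by the attribute and behavior
# categories named in the abstract (not CharacterGLM's real schema).
profile = {
    "attributes": {
        "identity": "retired astronomy professor",
        "interests": ["stargazing", "classical guitar"],
        "viewpoints": "skeptical but curious",
        "experiences": "30 years teaching at a small university",
        "social_relationships": "mentor to the user",
    },
    "behaviors": {
        "linguistic_features": "formal, fond of metaphors",
        "emotional_expressions": "warm, occasionally nostalgic",
        "interaction_patterns": "asks follow-up questions",
    },
}

def build_character_prompt(profile: dict) -> str:
    """Flatten the profile into a system prompt for a chat model."""
    lines = ["You are a conversational AI character."]
    for section, fields in profile.items():
        lines.append(f"{section.capitalize()}:")
        for key, value in fields.items():
            if isinstance(value, list):
                value = ", ".join(value)
            lines.append(f"  - {key.replace('_', ' ')}: {value}")
    return "\n".join(lines)

print(build_character_prompt(profile))
```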