Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
Model-based reinforcement learning (RL) often achieves higher sample
efficiency in practice than model-free RL by learning a dynamics model to
generate samples for policy learning. Previous works learn a dynamics model
that fits the empirical state-action visitation distribution of all
historical policies, i.e., the replay buffer. However, in this paper, we
observe that fitting the dynamics model under the distribution for \emph{all
historical policies} does not necessarily benefit model prediction for the
\emph{current policy} since the policy in use is constantly evolving over time.
The evolving policy during training will cause state-action visitation
distribution shifts. We theoretically analyze how this distribution shift over
historical policies affects the model learning and model rollouts. We then
propose a novel dynamics model learning method, named \textit{Policy-adapted
Dynamics Model Learning (PDML)}. PDML dynamically adjusts the historical policy
mixture distribution to ensure the learned model can continually adapt to the
state-action visitation distribution of the evolving policy. Experiments on a
range of continuous control environments in MuJoCo show that PDML achieves
significant improvements in sample efficiency and higher asymptotic performance
when combined with state-of-the-art model-based RL methods.
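
To make the core idea concrete, here is a minimal Python sketch of policy-adapted replay reweighting; the likelihood-ratio weighting rule and all function names are illustrative assumptions, not the authors' exact PDML formulation:

import numpy as np

def policy_adapted_weights(log_pi_current, log_pi_behavior, temperature=1.0):
    """Weight each stored transition by the likelihood ratio between the
    current policy and the historical policy that generated it."""
    log_ratio = (log_pi_current - log_pi_behavior) / temperature
    log_ratio -= log_ratio.max()          # subtract max for numerical stability
    weights = np.exp(log_ratio)
    return weights / weights.sum()        # normalized sampling probabilities

def sample_model_batch(buffer, log_pi_current, log_pi_behavior, batch_size=256):
    """Draw a dynamics-model training batch biased toward the current
    policy's state-action visitation distribution."""
    probs = policy_adapted_weights(log_pi_current, log_pi_behavior)
    idx = np.random.choice(len(buffer), size=batch_size, p=probs)
    return [buffer[i] for i in idx]

Under this sketch, transitions likely under the evolving current policy dominate model training, which is the adaptation the abstract describes.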
Contrastive Brain Network Learning via Hierarchical Signed Graph Pooling Model
Recently, brain networks have been widely adopted to study brain dynamics,
brain development and brain diseases. Graph representation learning techniques
on brain functional networks can facilitate the discovery of novel biomarkers
for clinical phenotypes and neurodegenerative diseases. However, current graph
learning techniques face several issues in brain network mining. First, most
existing graph learning models are designed for unsigned graphs, which hinders
the analysis of signed network data such as brain functional networks.
Second, the scarcity of brain network data limits model performance on
clinical phenotype prediction. Moreover, few current graph learning models
are interpretable, so they may fail to provide biological insights into
model outcomes. Here, we propose an interpretable hierarchical signed graph
representation learning model to extract graph-level representations from brain
functional networks, which can be used for different prediction tasks. In order
to further improve the model performance, we also propose a new strategy to
augment functional brain network data for contrastive learning. We evaluate
this framework on different classification and regression tasks using the data
from HCP and OASIS. Our results from extensive experiments demonstrate the
superiority of the proposed model compared to several state-of-the-art
techniques. Additionally, we use graph saliency maps, derived from these
prediction tasks, to demonstrate detection and interpretation of phenotypic
biomarkers.
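
As a concrete illustration of the signed-graph aspect, below is a minimal PyTorch sketch of a signed graph convolution that aggregates positively and negatively weighted edges through separate branches; the layer design is an assumption for illustration, not the paper's exact architecture:

import torch
import torch.nn as nn

class SignedGraphConv(nn.Module):
    """Aggregate positively and negatively correlated neighbors through
    separate linear branches so that anticorrelated brain regions are not
    conflated with correlated ones."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_pos = nn.Linear(in_dim, out_dim)   # positive-edge branch
        self.lin_neg = nn.Linear(in_dim, out_dim)   # negative-edge branch

    def forward(self, x, adj):
        # x: (num_regions, in_dim) node features
        # adj: (num_regions, num_regions) signed functional connectivity
        a_pos = adj.clamp(min=0.0)        # keep positive correlations
        a_neg = (-adj).clamp(min=0.0)     # keep magnitudes of negative ones
        return torch.relu(self.lin_pos(a_pos @ x) - self.lin_neg(a_neg @ x))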
COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Dyna-style model-based reinforcement learning contains two phases: model
rollouts to generate samples for policy learning, and real environment
exploration using the current policy for dynamics model learning. However,
because real-world environments are complex, the learned dynamics model
inevitably suffers from prediction error, which can mislead policy
learning and result in sub-optimal solutions. In this paper, we propose
COPlanner, a planning-driven framework for model-based methods that
addresses the inaccurately learned dynamics model with conservative model
rollouts and optimistic environment exploration. COPlanner leverages
an uncertainty-aware policy-guided model predictive control (UP-MPC) component
to plan for multi-step uncertainty estimation. This estimated uncertainty then
serves as a penalty during model rollouts and as a bonus during real
environment exploration, respectively, to choose actions. Consequently,
COPlanner can avoid model-uncertain regions through conservative
model rollouts, thereby alleviating the influence of model error.
Simultaneously, it explores high-reward model-uncertain regions to actively
reduce model error through optimistic real environment exploration. COPlanner
is a plug-and-play framework that can be applied to any
Dyna-style model-based method. Experimental results on a series of
proprioceptive and visual continuous control tasks demonstrate that both sample
efficiency and asymptotic performance of strong model-based methods are
significantly improved when combined with COPlanner.
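
Below is a minimal Python sketch of the uncertainty-aware, policy-guided planning idea; the disagreement-based uncertainty measure and all interfaces are illustrative assumptions rather than the exact UP-MPC component:

import numpy as np

def up_mpc(state, ensemble, reward_fn, policy, horizon=5, n_candidates=64, coef=-1.0):
    """Pick the first action of the best-scoring candidate trajectory.
    coef < 0 penalizes uncertainty (conservative model rollouts);
    coef > 0 rewards it (optimistic real environment exploration)."""
    best_score, best_action = -np.inf, None
    for _ in range(n_candidates):
        s, score, first_action = state, 0.0, None
        for t in range(horizon):
            a = policy(s)  # stochastic policy samples a proposed action
            # ensemble of learned dynamics models predicts the next state
            preds = np.stack([model(s, a) for model in ensemble])
            uncertainty = preds.std(axis=0).mean()  # ensemble disagreement
            score += reward_fn(s, a) + coef * uncertainty
            if t == 0:
                first_action = a
            s = preds.mean(axis=0)  # roll forward with the mean prediction
        if score > best_score:
            best_score, best_action = score, first_action
    return best_action

Flipping the sign of coef switches the same planner between the conservative rollout mode and the optimistic exploration mode the abstract describes.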
3D bi-directional transformer U-Net for medical image segmentation
As one of the popular deep learning methods, deep convolutional neural networks (DCNNs) have been widely adopted in segmentation tasks with positive results. However, DCNN-based frameworks are known to struggle with modeling global relations among imaging features. Although several techniques have been proposed to enhance the global reasoning of DCNNs, these models either fail to achieve satisfactory performance compared with traditional fully convolutional structures or cannot exploit the basic advantage of CNN-based networks, namely local reasoning. In this study, in contrast to current attempts to combine FCNs with global reasoning methods, we fully exploit self-attention by designing a novel attention mechanism for 3D computation and propose a new segmentation framework, named 3DTU, for three-dimensional medical image segmentation tasks. This framework processes images in an end-to-end manner and performs 3D computation on both the encoder side (which contains a 3D transformer) and the decoder side (which is based on a 3D DCNN). We tested our framework on two independent datasets consisting of 3D MRI and CT images. Experimental results clearly demonstrate that our method outperforms several state-of-the-art segmentation methods on various metrics.
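
To illustrate the kind of global 3D reasoning involved, here is a minimal PyTorch sketch that applies self-attention over a flattened 3D feature volume; this is a generic illustration, not the exact 3DTU attention mechanism:

import torch
import torch.nn as nn

class Volume3DAttention(nn.Module):
    """Flatten voxels into a token sequence so attention can model the
    global relations that convolutions miss."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (batch, channels, D, H, W) feature volume from the encoder
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (batch, D*H*W, channels)
        out, _ = self.attn(tokens, tokens, tokens)  # global 3D self-attention
        tokens = self.norm(tokens + out)            # residual connection + norm
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)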
TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning
Despite recent progress in reinforcement learning (RL) from raw pixel data,
sample inefficiency continues to present a substantial obstacle. Prior works
have attempted to address this challenge by creating self-supervised auxiliary
tasks, aiming to enrich the agent's learned representations with
control-relevant information for future state prediction. However, these
objectives are often insufficient to learn representations that can represent
the optimal policy or value function, and they often consider tasks with small,
abstract discrete action spaces and thus overlook the importance of action
representation learning in continuous control. In this paper, we introduce
TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful
temporal contrastive learning approach that facilitates the concurrent
acquisition of latent state and action representations for agents. TACO
simultaneously learns a state and an action representation by optimizing the
mutual information between representations of current states paired with action
sequences and representations of the corresponding future states.
Theoretically, TACO can be shown to learn state and action representations that
encompass sufficient information for control, thereby improving sample
efficiency. For online RL, TACO achieves a 40% performance boost after one
million environment interaction steps on average across nine challenging visual
continuous control tasks from the DeepMind Control Suite. In addition, we show that
TACO can also serve as a plug-and-play module added to existing offline visual
RL methods, establishing new state-of-the-art performance for offline visual
RL across datasets of varying quality.
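
For concreteness, here is a minimal PyTorch sketch of the temporal contrastive objective described above, an InfoNCE loss between (state, action-sequence) embeddings and future-state embeddings; the shapes and projection head are assumptions, not the authors' released implementation:

import torch
import torch.nn.functional as F

def taco_loss(z_state, z_actions, z_future, proj):
    """InfoNCE between (current state, action sequence) pairs and the
    corresponding future states. z_state, z_actions, z_future: (B, d)
    embeddings; proj: a module mapping the concatenated pair back to d dims."""
    query = F.normalize(proj(torch.cat([z_state, z_actions], dim=-1)), dim=-1)
    key = F.normalize(z_future, dim=-1)
    logits = query @ key.t()                       # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)  # diagonal positives
    return F.cross_entropy(logits, labels)         # InfoNCE lower-bounds the MI

Maximizing this objective ties the learned state and action representations to mutual information with future states, which is the property the abstract credits for the sample-efficiency gains.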
Robust output-feedback predictive control for proximity eddy current de-tumbling with constraints and uncertainty
Proximity operation can significantly improve the efficiency of eddy current de-tumbling. However, the tumbling motion and non-cooperative nature of space debris force the chaser to execute collision avoidance maneuvers and expose it to model uncertainty. In this paper, an inertial-oriented safety corridor is proposed by taking the debris' angular momentum as the central axis, which avoids frequent collision avoidance maneuvers by the chaser. Meanwhile, a desired de-tumbling trajectory within this safety corridor is designed to reduce the angular velocity of the space debris. Then, a robust output-feedback controller accounting for the safety corridor and model uncertainty is proposed by combining moving horizon estimation and model predictive control. Moving horizon estimation is employed to estimate the system state and the model uncertainty, which is then compensated by a feedforward control law. Furthermore, a model predictive controller without terminal ingredients is designed to achieve optimal fuel consumption and robust tracking stability. Finally, taking the Chinese Sinosat-2 satellite as the simulation case, the effectiveness of the proposed scheme is verified.
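
As a structural illustration of the estimator/controller combination, the Python sketch below shows one receding-horizon step; the component interfaces (mhe, mpc) are hypothetical placeholders, not the paper's formulation:

def control_step(y_window, u_window, mhe, mpc, reference):
    """One receding-horizon step of the output-feedback scheme."""
    # 1. Moving horizon estimation recovers the state and a lumped model
    #    uncertainty from the recent window of outputs and inputs.
    x_hat, d_hat = mhe.estimate(y_window, u_window)
    # 2. A feedforward term compensates the estimated uncertainty.
    u_ff = -d_hat
    # 3. MPC (without terminal ingredients) tracks the de-tumbling
    #    trajectory, with the safety corridor encoded as state constraints.
    u_mpc = mpc.solve(x_hat, reference)
    return u_ff + u_mpc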
Rare primary bladder mucosa-associated lymphoid tissue lymphoma: A case report and review of literature
Primary bladder mucosa-associated lymphoid tissue (MALT) lymphoma is an extremely rare bladder tumor, and only scarce cases have been reported. We hereby report the case of an 81-year-old female patient with a bladder tumor presenting with frequent urination and dysuria, whose pelvic magnetic resonance imaging (MRI) suggested bladder cancer. She underwent transurethral resection of the bladder tumor (TURBT), and histopathology confirmed the mass to be bladder MALT lymphoma. The patient refused further treatment, and no disease recurrence was observed one year after surgery. The current data are insufficient to draw conclusions about the long-term efficacy of treatment for this tumor, so regular follow-up is necessary. To further understand the clinical features, pathology, treatment, and prognosis of this tumor, we searched the literature from 1990 to the present, analyzing a total of 64 cases of primary MALT lymphoma.
CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
In this paper, we present CharacterGLM, a series of models built upon
ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM
is designed for generating Character-based Dialogues (CharacterDial), which
aims to equip a conversational AI system with character customization for
satisfying people's inherent social desires and emotional needs. On top of
CharacterGLM, we can customize various AI characters or social agents by
configuring their attributes (identities, interests, viewpoints, experiences,
achievements, social relationships, etc.) and behaviors (linguistic features,
emotional expressions, interaction patterns, etc.). Our model outperforms most
mainstream closed-source large language models, including the GPT series,
especially in terms of consistency, human-likeness, and engagement, according to
manual evaluations. We will release the 6B version of CharacterGLM and a subset
of training data to facilitate further research on character-based dialogue
generation.
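
To illustrate the attribute-and-behavior configuration described above, here is a hypothetical Python example; the schema and all field values are invented for illustration, as the abstract does not specify a concrete format:

# Hypothetical character configuration; the field names mirror the attribute
# and behavior categories listed in the abstract, not an actual API.
character_profile = {
    "attributes": {
        "identity": "retired astronomy professor",
        "interests": ["stargazing", "classical music"],
        "viewpoints": "optimistic about science outreach",
        "experiences": "30 years of university teaching",
        "achievements": "popular-science author",
        "social_relationships": "mentor to the user",
    },
    "behaviors": {
        "linguistic_features": "formal tone, astronomy metaphors",
        "emotional_expressions": "warm and encouraging",
        "interaction_patterns": "asks reflective follow-up questions",
    },
}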