
    Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

    Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits the empirical state-action visitation distribution of all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution of all historical policies does not necessarily benefit model prediction for the current policy, since the policy in use is constantly evolving. The evolving policy during training causes state-action visitation distribution shifts. We theoretically analyze how this distribution shift over historical policies affects model learning and model rollouts. We then propose a novel dynamics model learning method, named Policy-adapted Dynamics Model Learning (PDML). PDML dynamically adjusts the historical policy mixture distribution to ensure that the learned model can continually adapt to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that PDML achieves significant improvements in sample efficiency and higher asymptotic performance when combined with state-of-the-art model-based RL methods. (Comment: 16 pages, 5 figures)
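
    The core idea lends itself to a short sketch. Below is a minimal, illustrative Python stand-in for policy-adapted replay weighting: transitions are tagged with the policy iteration that produced them, and the dynamics model samples data from recent policies more heavily. The class, the geometric decay scheme, and all names are assumptions for illustration, not the paper's actual PDML algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

class PolicyAdaptedBuffer:
    """Replay buffer that reweights historical transitions toward the
    current policy's visitation distribution (illustrative sketch)."""

    def __init__(self):
        self.transitions = []   # (state, action, next_state) tuples
        self.policy_ids = []    # which policy iteration produced each one

    def add(self, s, a, s_next, policy_id):
        self.transitions.append((s, a, s_next))
        self.policy_ids.append(policy_id)

    def sample(self, batch_size, current_policy_id, decay=0.9):
        # Upweight transitions from recent policies: older policies get
        # geometrically smaller weight, approximating adaptation to the
        # evolving policy's state-action distribution.
        ages = current_policy_id - np.asarray(self.policy_ids)
        weights = decay ** ages
        probs = weights / weights.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]

# The dynamics model would then be fit on these reweighted batches
# instead of uniform draws from the full buffer, e.g.:
#   batch = buffer.sample(256, current_policy_id=k)
#   model.train_step(batch)
```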

    Contrastive Brain Network Learning via Hierarchical Signed Graph Pooling Model

    Recently, brain networks have been widely adopted to study brain dynamics, brain development, and brain diseases. Graph representation learning techniques on brain functional networks can facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. However, current graph learning techniques have several issues when applied to brain network mining. First, most current graph learning models are designed for unsigned graphs, which hinders the analysis of many signed network datasets (e.g., brain functional networks). Meanwhile, the scarcity of brain network data limits model performance on clinical phenotype prediction. Moreover, few current graph learning models are interpretable, which limits their ability to provide biological insight into model outcomes. Here, we propose an interpretable hierarchical signed graph representation learning model to extract graph-level representations from brain functional networks, which can be used for different prediction tasks. To further improve model performance, we also propose a new strategy to augment functional brain network data for contrastive learning. We evaluate this framework on different classification and regression tasks using data from HCP and OASIS. Our results from extensive experiments demonstrate the superiority of the proposed model compared to several state-of-the-art techniques. Additionally, we use graph saliency maps derived from these prediction tasks to demonstrate the detection and interpretation of phenotypic biomarkers.
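
    To make the signed-graph point concrete, here is a tiny NumPy sketch of one signed graph convolution layer that aggregates positively and negatively weighted edges separately instead of discarding edge signs. The paper's hierarchical pooling and contrastive components are not reproduced, and all function and variable names are illustrative.

```python
import numpy as np

def signed_graph_conv(X, A, Wp, Wn):
    """One illustrative signed-graph convolution layer: aggregate
    positively and negatively correlated neighbors separately, so the
    sign of functional connectivity is not discarded."""
    A_pos = np.maximum(A, 0.0)           # positive edges (positive correlation)
    A_neg = np.maximum(-A, 0.0)          # negative edges, made non-negative
    H = A_pos @ X @ Wp - A_neg @ X @ Wn  # contrast the two aggregations
    return np.tanh(H)

# Toy brain network: 4 regions, signed correlation matrix, 3-d node features.
rng = np.random.default_rng(1)
A = np.array([[0.0,  0.8, -0.5,  0.0],
              [0.8,  0.0,  0.0, -0.3],
              [-0.5, 0.0,  0.0,  0.6],
              [0.0, -0.3,  0.6,  0.0]])
X = rng.standard_normal((4, 3))
Wp, Wn = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
H = signed_graph_conv(X, A, Wp, Wn)
graph_repr = H.mean(axis=0)  # crude graph-level readout in place of hierarchical pooling
```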

    COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

    Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration using the current policy for dynamics model learning. However, due to the complexity of real-world environments, it is inevitable to learn an imperfect dynamics model with prediction error, which can mislead policy learning and result in sub-optimal solutions. In this paper, we propose COPlanner, a planning-driven framework for model-based methods that addresses the inaccurately learned dynamics model problem with conservative model rollouts and optimistic environment exploration. COPlanner leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus during real-environment exploration, respectively, when choosing actions. Consequently, COPlanner can avoid model-uncertain regions through conservative model rollouts, thereby alleviating the influence of model error. Simultaneously, it explores high-reward model-uncertain regions to actively reduce model error through optimistic real-environment exploration. COPlanner is a plug-and-play framework that can be applied to any Dyna-style model-based method. Experimental results on a series of proprioceptive and visual continuous control tasks demonstrate that both the sample efficiency and the asymptotic performance of strong model-based methods are significantly improved when combined with COPlanner. (Comment: 22 pages, 17 figures)
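
    A hedged sketch of the conservative-rollout versus optimistic-exploration idea follows, using ensemble disagreement as the uncertainty signal. The function names, the candidate-search planner, and the toy models are assumptions standing in for the paper's UP-MPC component, not its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def ensemble_predict(models, s, a):
    """Each model returns (next_state, reward); disagreement across the
    ensemble serves as an epistemic-uncertainty estimate."""
    preds = np.array([m(s, a) for m in models])   # (E, state_dim + 1)
    mean = preds.mean(axis=0)
    uncertainty = preds.std(axis=0).sum()         # scalar disagreement
    return mean[:-1], mean[-1], uncertainty

def up_mpc_action(models, policy, s, horizon=5, n_candidates=32, beta=1.0,
                  optimistic=False):
    """Pick the candidate first action whose short policy-guided rollout
    scores best; uncertainty acts as a bonus when exploring the real
    environment (optimistic=True) and as a penalty during model rollouts."""
    sign = 1.0 if optimistic else -1.0
    best_a, best_score = None, -np.inf
    for _ in range(n_candidates):
        a0 = policy(s) + 0.1 * rng.standard_normal(s.shape)  # perturbed first action
        state, a, score = s, a0, 0.0
        for _ in range(horizon):
            state, r, u = ensemble_predict(models, state, a)
            score += r + sign * beta * u
            a = policy(state)
        if score > best_score:
            best_a, best_score = a0, score
    return best_a

# Toy usage: four random linear "models" and a damping policy on a 2-d state.
models = [lambda s, a, W=rng.standard_normal((2, 2)): np.append(W @ s + a, s.sum())
          for _ in range(4)]
policy = lambda s: -0.1 * s
a = up_mpc_action(models, policy, np.ones(2), optimistic=True)
```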

    3D bi-directional transformer U-Net for medical image segmentation

    As one of the most popular deep learning methods, deep convolutional neural networks (DCNNs) have been widely adopted in segmentation tasks with positive results. However, in segmentation tasks, DCNN-based frameworks are known to struggle with global relations within imaging features. Although several techniques have been proposed to enhance the global reasoning of DCNNs, these models either fail to achieve satisfactory performance compared with traditional fully convolutional structures or cannot exploit the basic advantage of CNN-based networks, namely local reasoning. In this study, in contrast with current attempts to combine FCNs and global reasoning methods, we fully exploit self-attention by designing a novel attention mechanism for 3D computation and propose a new segmentation framework (named 3DTU) for three-dimensional medical image segmentation tasks. This framework processes images in an end-to-end manner and performs 3D computation on both the encoder side (which contains a 3D transformer) and the decoder side (which is based on a 3D DCNN). We tested our framework on two independent datasets consisting of 3D MRI and CT images. Experimental results clearly demonstrate that our method outperforms several state-of-the-art segmentation methods on various metrics.
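
    The global-reasoning ingredient can be illustrated with a minimal 3D self-attention over voxel tokens, shown below. This is the generic mechanism the abstract alludes to, not the actual 3DTU block, and all names are illustrative.

```python
import numpy as np

def attention_3d(volume_feats, Wq, Wk, Wv):
    """Illustrative 3D self-attention: flatten a (D, H, W, C) feature
    volume into D*H*W tokens, attend globally across all voxels, and
    reshape back. Global attention complements the local reasoning of
    3D convolutions."""
    D, H, W, C = volume_feats.shape
    tokens = volume_feats.reshape(-1, C)               # (DHW, C)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax over all voxels
    return (attn @ V).reshape(D, H, W, -1)

rng = np.random.default_rng(3)
feats = rng.standard_normal((4, 4, 4, 8))              # tiny 3D feature volume
Wq = Wk = Wv = rng.standard_normal((8, 8))
out = attention_3d(feats, Wq, Wk, Wv)                  # shape (4, 4, 4, 8)
```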

    TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

    Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency remains a substantial obstacle. Prior works have attempted to address this challenge by creating self-supervised auxiliary tasks, aiming to enrich the agent's learned representations with control-relevant information for future-state prediction. However, these objectives are often insufficient to learn representations that can represent the optimal policy or value function, and they often consider tasks with small, abstract discrete action spaces, thus overlooking the importance of action representation learning in continuous control. In this paper, we introduce TACO: Temporal Action-driven Contrastive Learning, a simple yet powerful temporal contrastive learning approach that facilitates the concurrent acquisition of latent state and action representations. TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states. Theoretically, TACO can be shown to learn state and action representations that encompass sufficient information for control, thereby improving sample efficiency. For online RL, TACO achieves a 40% performance boost after one million environment interaction steps on average across nine challenging visual continuous control tasks from the DeepMind Control Suite. In addition, we show that TACO can also serve as a plug-and-play module added to existing offline visual RL methods, establishing new state-of-the-art performance for offline visual RL across offline datasets of varying quality.
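
    Mutual-information objectives of this kind are typically realized as an InfoNCE-style contrastive loss. The sketch below is an illustrative NumPy version under that assumption: embeddings of (state, action-sequence) pairs are aligned with embeddings of the resulting future states, with other batch elements serving as negatives. It is not TACO's exact loss or architecture.

```python
import numpy as np

def taco_style_infonce(z_sa, z_future, temperature=0.1):
    """Illustrative InfoNCE objective in the spirit of TACO: align each
    (state, action-sequence) embedding with the embedding of the state
    it leads to. Both inputs have shape (B, d); matched pairs share a
    row index, so positives sit on the diagonal of the logit matrix."""
    z_sa = z_sa / np.linalg.norm(z_sa, axis=1, keepdims=True)
    z_future = z_future / np.linalg.norm(z_future, axis=1, keepdims=True)
    logits = z_sa @ z_future.T / temperature       # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy on positives

rng = np.random.default_rng(4)
z_sa = rng.standard_normal((16, 32))       # embeddings of (s_t, a_t..a_{t+k-1})
z_future = rng.standard_normal((16, 32))   # embeddings of s_{t+k}
loss = taco_style_infonce(z_sa, z_future)
```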

    Robust output-feedback predictive control for proximity eddy current de-tumbling with constraints and uncertainty

    Proximity operation can significantly improve the efficiency of eddy-current de-tumbling. However, the tumbling motion and non-cooperative nature of space debris force the chaser to execute collision-avoidance maneuvers and expose it to model uncertainty. In this paper, an inertially oriented safety corridor is proposed that takes the debris' angular momentum as its central axis, which avoids frequent collision-avoidance maneuvers by the chaser. Meanwhile, a desired de-tumbling trajectory within this safety corridor is designed to reduce the angular velocity of the space debris. Then, a robust output-feedback controller accounting for the safety corridor and model uncertainty is proposed by combining moving horizon estimation and model predictive control. The moving horizon estimation is employed to estimate the system state and the model uncertainty, which is compensated by a feedforward control law. Furthermore, a model predictive control scheme without terminal ingredients is designed to achieve optimal fuel consumption and robust tracking stability of the system. Finally, taking the Chinese Sinosat-2 satellite as the simulation case, the effectiveness of the proposed scheme is verified.
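
    The estimation-plus-control loop can be illustrated on a toy one-dimensional plant, shown below. The averaging estimator and grid-search controller are crude stand-ins for the paper's moving horizon estimation and model predictive control, and every name and constant is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy 1-d stand-in for the chaser dynamics: x_{k+1} = x_k + u_k + d,
# where d is an unknown constant model uncertainty and y = x + noise.
d_true, x, x_ref = 0.3, 5.0, 0.0
window = []  # recent (measurement, applied control) pairs

def mhe_estimate(window):
    """Crude moving-horizon estimate: the state is the latest measurement
    and the disturbance is the mean one-step prediction residual."""
    if len(window) < 2:
        return window[-1][0], 0.0
    residuals = [window[i + 1][0] - window[i][0] - window[i][1]
                 for i in range(len(window) - 1)]
    return window[-1][0], float(np.mean(residuals))

def mpc_action(x_est, d_est, x_ref=0.0, horizon=5,
               candidates=np.linspace(-1.0, 1.0, 41)):
    """Tiny receding-horizon search over constant nominal inputs, trading
    tracking error against a fuel penalty on the nominal model, then
    compensating the estimated disturbance by feedforward."""
    best_u, best_cost = 0.0, np.inf
    for u in candidates:
        x_pred, cost = x_est, 0.0
        for _ in range(horizon):
            x_pred = x_pred + u
            cost += (x_pred - x_ref) ** 2 + 0.1 * u ** 2
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u - d_est  # feedforward cancels the estimated uncertainty

for k in range(30):
    y = x + 0.01 * rng.standard_normal()           # noisy measurement
    x_est, d_est = mhe_estimate(window) if window else (y, 0.0)
    u = mpc_action(x_est, d_est, x_ref)
    window = (window + [(y, u)])[-10:]             # estimation horizon of 10
    x = x + u + d_true                             # true plant step
print(f"final tracking error: {abs(x - x_ref):.3f}")
```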

    Rare primary bladder mucosa-associated lymphoid tissue lymphoma: A case report and review of literature

    Primary bladder mucosa-associated lymphoid tissue (MALT) lymphoma is an extremely rare bladder tumor, with only scarce reports in the literature. We hereby report the case of an 81-year-old female patient with a bladder tumor presenting with frequent urination and dysuria, whose pelvic magnetic resonance imaging (MRI) suggested bladder cancer. She underwent transurethral resection of the bladder tumor (TURBT), and histopathology confirmed the mass to be bladder MALT lymphoma. The patient refused further treatment, and no disease recurrence was observed one year after surgery. The current data are insufficient to draw conclusions about the long-term efficacy of treatment for this tumor, so regular follow-up is necessary. To further characterize the clinical features, pathology, treatment, and prognosis of this tumor, we searched the literature from 1990 to the present, analyzing a total of 64 cases of primary MALT lymphoma.

    CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

    In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. CharacterGLM is designed for generating character-based dialogues (CharacterDial), aiming to equip a conversational AI system with character customization that satisfies people's inherent social desires and emotional needs. On top of CharacterGLM, we can customize various AI characters or social agents by configuring their attributes (identities, interests, viewpoints, experiences, achievements, social relationships, etc.) and behaviors (linguistic features, emotional expressions, interaction patterns, etc.). According to manual evaluations, our model outperforms most mainstream closed-source large language models, including the GPT series, especially in terms of consistency, human-likeness, and engagement. We will release our 6B version of CharacterGLM and a subset of the training data to facilitate further research on character-based dialogue generation. (Comment: Work in progress)
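
    The attribute/behavior configuration lends itself to a small sketch. Below is an illustrative Python stand-in that flattens such a profile into a system prompt; the field names mirror the categories listed in the abstract, but the structure and function are hypothetical and do not reflect CharacterGLM's actual training or prompting format.

```python
# Hypothetical character profile keyed by the attribute and behavior
# categories named in the abstract (not CharacterGLM's real schema).
profile = {
    "attributes": {
        "identity": "retired astronomy professor",
        "interests": ["stargazing", "classical guitar"],
        "viewpoints": "skeptical but curious",
        "experiences": "30 years teaching at a small university",
        "social_relationships": "mentor to the user",
    },
    "behaviors": {
        "linguistic_features": "formal, fond of metaphors",
        "emotional_expressions": "warm, occasionally nostalgic",
        "interaction_patterns": "asks follow-up questions",
    },
}

def build_character_prompt(profile: dict) -> str:
    """Flatten the profile into a system prompt for a chat model."""
    lines = ["You are a conversational AI character."]
    for section, fields in profile.items():
        lines.append(f"{section.capitalize()}:")
        for key, value in fields.items():
            if isinstance(value, list):
                value = ", ".join(value)
            lines.append(f"  - {key.replace('_', ' ')}: {value}")
    return "\n".join(lines)

print(build_character_prompt(profile))
```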