60 research outputs found

    Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

    Full text link
    Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits the empirical state-action visitation distribution of all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution of all historical policies does not necessarily benefit model prediction for the current policy, since the policy in use is constantly evolving. This evolving policy causes shifts in the state-action visitation distribution during training. We theoretically analyze how this distribution shift over historical policies affects model learning and model rollouts. We then propose a novel dynamics model learning method, named Policy-adapted Dynamics Model Learning (PDML). PDML dynamically adjusts the historical policy mixture distribution so that the learned model can continually adapt to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that, when combined with state-of-the-art model-based RL methods, PDML significantly improves sample efficiency and achieves higher asymptotic performance. Comment: 16 pages, 5 figures
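    The abstract does not specify how PDML reweights the historical policy mixture, so the following is only a hypothetical sketch of the general idea: transitions are stored per historical policy, and each dynamics-model training batch is drawn from a mixture that favors recent policies. The exponential recency weighting, the `decay` parameter, and both function names are assumptions chosen for illustration, not PDML's actual rule.

```python
import numpy as np


def recency_mixture_weights(num_policies: int, decay: float) -> np.ndarray:
    """Hypothetical mixture over historical policies (index 0 = oldest).

    Each policy's weight is decay ** age, so with 0 < decay < 1 the
    newest policy (age 0) receives the largest weight.
    """
    ages = np.arange(num_policies - 1, -1, -1, dtype=float)
    weights = decay ** ages
    return weights / weights.sum()


def sample_model_batch(buffers, weights, batch_size, rng):
    """Draw a dynamics-model training batch from per-policy buffers.

    buffers: list of non-empty arrays of shape (n_i, transition_dim),
    one per historical policy. The number of samples taken from each
    buffer follows a multinomial draw with the mixture weights.
    """
    counts = rng.multinomial(batch_size, weights)
    parts = [buf[rng.integers(0, len(buf), size=c)] for buf, c in zip(buffers, counts)]
    return np.concatenate(parts, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy buffers of 4-dimensional transitions from successive policies.
    buffers = [np.full((50, 4), float(k)) for k in range(3)]
    w = recency_mixture_weights(len(buffers), decay=0.5)
    batch = sample_model_batch(buffers, w, batch_size=32, rng=rng)
    print(w, batch.shape)
```

    With `decay = 1` and equal-sized per-policy buffers, this reduces to uniform sampling over the whole replay buffer, which is the baseline the abstract argues against.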

    Ecological Discourse Analysis and Ecological Diplomacy Analysis of Coverage about Beijing Winter Olympics Based on Transitivity System

    Get PDF
    This study aims to interpret the ecological meaning of discourse from the perspective of systemic functional linguistics, to analyse the ecological diplomacy ideas embedded in the coverage, and, through the analysis of ecological orientations, to guide people toward an ecological consciousness of living in harmony with nature. The study explores the ecological factors in the participant, process and circumstantial components. The results show that, in terms of participant role distribution, participants in material and relational processes account for the largest proportion. In terms of transitivity processes, material and relational processes are used most. In terms of the distribution of circumstantial elements, there are significant differences in the use of ecological discourse in the coverage. The idea of ecological diplomacy is also implicitly reflected in the coverage.

    COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

    Full text link
    Dyna-style model-based reinforcement learning contains two phases: model rollouts that generate samples for policy learning, and real environment exploration with the current policy to collect data for dynamics model learning. However, because real-world environments are complex, learning an imperfect dynamics model with prediction error is inevitable, and this error can mislead policy learning and result in sub-optimal solutions. In this paper, we propose COPlanner, a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model through conservative model rollouts and optimistic environment exploration. COPlanner leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus during real environment exploration when choosing actions. Consequently, COPlanner can avoid model-uncertain regions through conservative model rollouts, thereby alleviating the influence of model error. Simultaneously, it actively reduces model error by exploring high-reward, model-uncertain regions through optimistic real environment exploration. COPlanner is a plug-and-play framework that can be applied to any Dyna-style model-based method. Experimental results on a series of proprioceptive and visual continuous control tasks demonstrate that COPlanner significantly improves both the sample efficiency and the asymptotic performance of strong model-based methods. Comment: 22 pages, 17 figures
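    The penalty/bonus idea can be illustrated with a single-step sketch. It assumes an ensemble of learned models, measures uncertainty as the disagreement (standard deviation) among the ensemble's next-state predictions, and scores each candidate action by its mean predicted reward minus (conservative rollout) or plus (optimistic exploration) a scaled uncertainty term. COPlanner's UP-MPC estimates uncertainty over multiple planning steps; this one-step version, its function names, and the coefficient `coef` are hypothetical simplifications.

```python
import numpy as np


def ensemble_disagreement(next_state_preds: np.ndarray) -> np.ndarray:
    """Uncertainty per candidate action.

    next_state_preds: shape (n_models, n_candidates, state_dim).
    Returns the std across ensemble members, averaged over state dims.
    """
    return next_state_preds.std(axis=0).mean(axis=-1)


def select_action(candidates, reward_preds, next_state_preds, coef, optimistic):
    """Pick the candidate with the best uncertainty-adjusted score.

    reward_preds: shape (n_models, n_candidates).
    optimistic=False -> uncertainty penalty (conservative model rollouts).
    optimistic=True  -> uncertainty bonus (optimistic real exploration).
    """
    uncertainty = ensemble_disagreement(next_state_preds)
    mean_reward = reward_preds.mean(axis=0)
    sign = 1.0 if optimistic else -1.0
    scores = mean_reward + sign * coef * uncertainty
    return candidates[int(np.argmax(scores))]


if __name__ == "__main__":
    candidates = np.array([[0.1], [0.9]])
    # Two models agree on candidate 0 but disagree on candidate 1.
    reward_preds = np.array([[1.0, 1.2], [1.0, 1.2]])
    next_state_preds = np.array([[[0.0], [1.0]], [[0.0], [-1.0]]])
    print(select_action(candidates, reward_preds, next_state_preds, 1.0, optimistic=False))  # -> [0.1]
    print(select_action(candidates, reward_preds, next_state_preds, 1.0, optimistic=True))   # -> [0.9]
```

    The same uncertainty estimate drives both phases; only the sign of the adjustment changes, which is what lets one planner serve conservative rollouts and optimistic exploration.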

    Trojan Horse nanotheranostics with dual transformability and multifunctionality for highly effective cancer treatment.

    Get PDF
    Nanotheranostics with integrated diagnostic and therapeutic functions show exciting potential for precision nanomedicine. However, targeted delivery of nanotheranostics is hindered by several biological barriers. Here, we report the development of a dual size/charge-transformable, Trojan-Horse nanoparticle (pPhD NP) for the delivery of ultra-small, full active pharmaceutical ingredient (API) nanotheranostics with integrated dual-modal imaging and trimodal therapeutic functions. pPhD NPs exhibit ideal size and charge for drug transportation. In the tumour microenvironment, pPhD NPs responsively transform into full-API nanotheranostics with ultra-small size and higher surface charge, which dramatically facilitates tumour penetration and cell internalisation. pPhD NPs enable visualisation of biodistribution by near-infrared fluorescence imaging, and of tumour accumulation and therapeutic effect by magnetic resonance imaging. Moreover, the synergistic photothermal, photodynamic and chemo-therapies achieve a 100% complete cure rate in both subcutaneous and orthotopic oral cancer models. This nanoplatform, with its powerful delivery efficiency and versatile theranostic functions, shows enormous potential to improve cancer treatment.

    Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

    Full text link
    Model-based reinforcement learning (RL) has demonstrated remarkable success on a range of continuous control tasks due to its high sample efficiency. To save the computational cost of planning online, recent practice tends to distill optimized action sequences into an RL policy during training. Although distillation can combine the foresight of planning with the exploration ability of RL policies, the theoretical understanding of these methods remains unclear. In this paper, we extend the policy improvement step of Soft Actor-Critic (SAC) by developing an approach to distill from model-based planning to the policy. We then show that this policy improvement approach has a theoretical guarantee of monotonic improvement and convergence to the maximum value defined in SAC. We discuss effective design choices and implement our theory as a practical algorithm, Model-based Planning Distilled to Policy (MPDP), that updates the policy jointly over multiple future time steps. Extensive experiments show that MPDP achieves better sample efficiency and asymptotic performance than both model-free and model-based planning algorithms on six continuous control benchmark tasks in MuJoCo.
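    For reference, the standard SAC policy improvement step that MPDP extends projects the policy onto the Boltzmann distribution of the current soft Q-function (temperature absorbed into $Q$ here), and SAC's analysis shows that the resulting soft Q-values do not decrease. The multi-step, planning-distilled extension itself is not given in the abstract and is not reproduced here.

```latex
\pi_{\text{new}} = \arg\min_{\pi' \in \Pi} \mathrm{D}_{\mathrm{KL}}\!\left( \pi'(\cdot \mid s_t) \,\middle\|\, \frac{\exp\!\big(Q^{\pi_{\text{old}}}(s_t, \cdot)\big)}{Z^{\pi_{\text{old}}}(s_t)} \right),
\qquad
Q^{\pi_{\text{new}}}(s_t, a_t) \ge Q^{\pi_{\text{old}}}(s_t, a_t) \quad \forall (s_t, a_t).
```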

    A quasi-experimental study of the volume-based procurement (VBP) effect on antiviral medications of hepatitis B virus in China

    Get PDF
    Background: The Pilot Plan of National Centralized Volume-Based Procurement (NCVBP) was adopted to cope with the rapid increase in drug expenditures. This research aimed to quantitatively evaluate the impact of the NCVBP on antiviral medications for the hepatitis B virus. Methods: Monthly procurement records of nucleoside analog (NA) medications for the hepatitis B virus in the pilot cities from January 2018 to December 2019 were extracted from the China Drug Supply Information Platform (CDSIP). The impacts of the NCVBP on purchased volumes, expenditures, and defined daily dose costs were evaluated by interrupted time-series (ITS) analysis using Stata 16.0. We constructed two segments with one interruption point (March 2019). Results: Comparing equal periods before and after the intervention, the purchased volume of NA medications increased by 92.85%, and that of selected medications increased by 119.09%. In terms of level change, NA medications showed a decrease in purchased expenditure (coefficient: 5364.88, p < 0.001), while purchased volume increased significantly (coefficient: 605.49, p < 0.001). The Defined Daily Dose cost (DDDc) of NA medications decreased (coefficient: 8.90, p < 0.001). The NCVBP reform was followed by a level increase of 618.41 ten thousand Defined Daily Doses (DDD) (p < 0.001) in purchased volume and a level reduction of 5273.84 ten thousand Chinese Yuan (CNY) (p < 0.001) in the purchased expenditure of selected medications. The DDDc of selected medications decreased in level (coefficient: 9.87, p < 0.001), while the DDDc of alternative medications increased in slope (coefficient: 0.07, p = 0.030). The purchased volume and expenditure of bid-winning products increased in level by 964.08 ten thousand DDD and 637.36 ten thousand CNY, respectively (p < 0.001). For generic drugs, a level increase of 633.46 ten thousand DDD (p < 0.001) in purchased volume and a level reduction of 4285.32 ten thousand CNY (p < 0.001) in purchased expenditure were observed. Conclusion: The NCVBP reduced the DDDc of NA medications, improved the utilization of the selected medications, and promoted the use of generic products.
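    A two-segment interrupted time-series analysis is usually fitted as a segmented regression with a level-change term and a slope-change term. The NumPy sketch below assumes monthly outcomes indexed 1 to 24 (January 2018 to December 2019), the interruption at month 15 (March 2019), and plain ordinary least squares without the autocorrelation adjustments a full analysis, such as the authors' Stata models, may include. The function names and the synthetic series are hypothetical, not data from the study.

```python
import numpy as np


def its_design(n_periods: int, interruption: int) -> np.ndarray:
    """Design matrix for two-segment segmented regression.

    Columns: intercept, time, post-intervention indicator (level change),
    and periods elapsed since the interruption (slope change).
    """
    t = np.arange(1, n_periods + 1, dtype=float)
    post = (t >= interruption).astype(float)
    t_since = np.where(post == 1.0, t - interruption, 0.0)
    return np.column_stack([np.ones_like(t), t, post, t_since])


def fit_its(y: np.ndarray, interruption: int) -> np.ndarray:
    """OLS estimates: [baseline level, baseline slope, level change, slope change]."""
    X = its_design(len(y), interruption)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    # Hypothetical noiseless series: 24 months, interruption at month 15.
    X = its_design(24, 15)
    y = X @ np.array([100.0, 2.0, -30.0, -1.5])
    print(np.round(fit_its(y, 15), 6))
```

    In this parameterization the third coefficient corresponds to the "level" changes and the fourth to the "slope" changes reported in the abstract.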

    Absolute monocyte counts could predict disease activity and secondary loss of response of patients with Crohn's disease treated with anti-TNF-α drug.

    No full text
    Background: Assessing Crohn's disease (CD) activity is critical for monitoring disease progression. In CD, monocytes can release TNF-α, so it is important to study their role in disease activity and in loss of response to anti-TNF-α biologics. Methods: In this study, we enrolled patients with CD treated with biologics from January 2017 to May 2022. Indicators associated with disease activity were evaluated by Spearman correlation analysis and the Mann-Whitney U test. Logistic regression analyses were used to explore predictors of primary nonresponse (PNR) and secondary loss of response (SLOR) within 1 year of anti-TNF-α therapy. In addition, a nomogram was developed to predict therapeutic effect. Results: A total of 283 patients with CD were identified. The active-disease group, defined as a CDAI score of 150 or greater, had significantly higher absolute monocyte counts than the remission group (p = 0.019, Z = -2.354). Logistic analyses showed that the absolute monocyte count could be an independent predictor of 1-year SLOR to anti-TNF-α agents in patients with CD (p = 0.013). A nomogram based on gender, absolute monocyte count, and hemoglobin reliably predicted SLOR within 1 year of anti-TNF-α therapy. Conclusion: The results of this study support the utility of the absolute monocyte count for detecting disease activity and anti-TNF-α therapy effect in patients with CD.
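    The Mann-Whitney U test used to compare monocyte counts between the active-disease and remission groups can be sketched in plain Python. This version assigns average ranks to ties and returns U for the first group together with the large-sample normal approximation Z, omitting the tie correction to the variance that statistical packages typically apply; the function names and toy values are illustrative, not patient data.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, normal-approximation Z without tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mu) / sigma


if __name__ == "__main__":
    remission = [0.31, 0.42, 0.38, 0.45, 0.29]  # illustrative values only
    active = [0.52, 0.47, 0.61, 0.40, 0.58]
    print(mann_whitney_u(remission, active))
```

    A negative Z means the first sample tends to rank lower than the second, so the sign depends on which group is passed first.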

    Spectral super-resolution reflectance retrieval from remotely sensed imaging spectrometer data

    Full text link
    Existing atmospheric correction methods retrieve surface reflectance with the same nominal spectral response functions (SRFs) as those of the airborne or spaceborne imaging spectrometer radiance data. Since SRFs vary with sensor type and configuration, the retrieved reflectance of the same ground object also varies from sensor to sensor. This limits data validation between sensors at the surface reflectance level. We propose a method to retrieve super-resolution reflectance at the surface by combining the first-principles atmospheric correction method FLAASH (fast line-of-sight atmospheric analysis of spectral hypercubes) with spectral super-resolution of imaging spectrometer radiance data. The approach is validated by comparing airborne AVIRIS (airborne visible/infrared imaging spectrometer) and spaceborne Hyperion data. The results demonstrate that super-resolution reflectance in spectral bands with a sufficiently high signal-to-noise ratio (SNR) serves as an intermediate quantity for cross-validating data originating from different imaging spectrometers.
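    Why the same surface yields sensor-dependent band reflectance can be shown with SRF-weighted band averaging. The sketch below assumes Gaussian SRFs parameterized by band center and full width at half maximum (FWHM), a uniform 1 nm wavelength grid, and a hypothetical reflectance spectrum; it illustrates the sensor dependence described above, not the FLAASH-based super-resolution method itself.

```python
import numpy as np


def gaussian_srf(wavelengths, center, fwhm):
    """Gaussian spectral response function (an assumed SRF shape)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)


def resample_to_bands(wavelengths, spectrum, centers, fwhms):
    """SRF-weighted band averages of a finely sampled spectrum.

    Assumes a uniform wavelength grid, so the grid spacing cancels in
    the ratio of sums.
    """
    out = []
    for center, fwhm in zip(centers, fwhms):
        srf = gaussian_srf(wavelengths, center, fwhm)
        out.append(np.sum(spectrum * srf) / np.sum(srf))
    return np.array(out)


if __name__ == "__main__":
    wl = np.arange(400.0, 1000.5, 1.0)  # nm, 1 nm grid
    # Hypothetical reflectance with a narrow absorption feature at 760 nm.
    refl = 0.4 - 0.2 * np.exp(-0.5 * ((wl - 760.0) / 2.0) ** 2)
    narrow = resample_to_bands(wl, refl, [760.0], [5.0])
    broad = resample_to_bands(wl, refl, [760.0], [20.0])
    print(narrow, broad)  # same surface, different band values
```

    The broader band averages in more of the surrounding continuum, so it reports a shallower absorption than the narrow band even though both view the same surface.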

    ROC curves of the single indicators and the combined indicator for predicting disease activity.

    No full text
    Abbreviations: CDAI, Crohn’s Disease Activity Index; Hb, hemoglobin; PLT, platelet count; HCT, hematocrit; PT, prothrombin time; APTT, activated partial thromboplastin time; ESR, erythrocyte sedimentation rate; CRP, C-reactive protein; ALB, albumin; TBIL, total bilirubin.
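    ROC curves and their area under the curve (AUC) can be computed in the same way for a single indicator or for a combined score, for example a fitted model's predicted probability. This plain-Python sketch treats higher scores as predicting active disease (label 1) and computes AUC as the probability that a randomly chosen positive case scores above a randomly chosen negative one, counting ties as one half; the function names and toy data are illustrative.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs for every distinct threshold, highest first."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # illustrative scores
    labels = [1, 1, 1, 0, 0, 0]
    print(roc_auc(scores, labels))  # 8 of 9 pairs ranked correctly -> 0.888...
```

    For an indicator whose values fall as disease activity rises, negate it before passing it in so that higher scores still mean "more likely active".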