
    Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

    Though deep reinforcement learning (DRL) has achieved substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transitions and observations. Most existing methods for safe reinforcement learning can handle only transition disturbance or only observation disturbance, since these two kinds of disturbance affect different parts of the agent; moreover, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first prove theoretically that the performance degradation under transition disturbance and observation disturbance depends on a novel metric, the Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on this analysis, we adopt conditional value-at-risk (CVaR) as a measure of risk and propose a novel reinforcement learning algorithm, CVaR-Proximal-Policy-Optimization (CPPO), which formalizes the risk-sensitive constrained optimization problem by keeping the CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
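
    The two quantities this abstract relies on, VFR and CVaR, can be made concrete in a few lines of Python. This is a minimal sketch, not the authors' CPPO code: it assumes a finite table of state values for the VFR, treats the loss as the negated episode return, and uses a simple sample estimator of CVaR. All function names are hypothetical.

```python
import math
from typing import Hashable, Mapping, Sequence


def value_function_range(values: Mapping[Hashable, float]) -> float:
    """VFR as the abstract describes it: V(best state) - V(worst state)."""
    return max(values.values()) - min(values.values())


def empirical_cvar_of_loss(returns: Sequence[float], alpha: float) -> float:
    """Sample CVaR_alpha of the loss L = -return.

    Averages the worst ceil(alpha * n) losses (i.e. the lowest returns). This
    simple estimator skips the fractional-atom correction of the exact
    Rockafellar-Uryasev definition.
    """
    if not returns:
        raise ValueError("need at least one return")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    losses = sorted((-r for r in returns), reverse=True)  # largest losses first
    # round() guards against float noise that pushes alpha * n just above an integer
    k = max(1, math.ceil(round(alpha * len(losses), 12)))
    return sum(losses[:k]) / k


def satisfies_cvar_constraint(returns: Sequence[float], alpha: float, threshold: float) -> bool:
    """Hypothetical check: is the sampled CVaR of the loss at or below the threshold?"""
    return empirical_cvar_of_loss(returns, alpha) <= threshold


if __name__ == "__main__":
    episode_returns = [10, 8, 6, 4, 2, 0, -2, -4, -6, -8]
    print(empirical_cvar_of_loss(episode_returns, 0.2))  # worst two losses are 8 and 6 -> 7.0
    print(value_function_range({"s0": 1.5, "s1": -0.5, "s2": 3.0}))  # 3.5
```

    Working on losses rather than returns lets the constraint read "CVaR under a threshold", matching the abstract's wording; how CPPO folds this constraint into the PPO update is not shown here.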

    Task Aware Dreamer for Task Generalization in Reinforcement Learning

    A long-standing goal of reinforcement learning is to acquire agents that can learn on training tasks and generalize well to unseen tasks that may share similar dynamics but have different reward functions. A general challenge is to quantitatively measure the similarities between these different tasks, which is vital for analyzing the task distribution and, further, for designing algorithms with stronger generalization. To address this, we present a novel metric named Task Distribution Relevance (TDR), defined via the optimal Q functions of different tasks, that quantitatively captures the relevance of the task distribution. For tasks with high TDR, i.e., tasks that differ significantly, we show that Markovian policies cannot differentiate them, which leads to poor performance. Based on this insight, we encode all historical information into policies to distinguish different tasks and propose Task Aware Dreamer (TAD), which extends world models into reward-informed world models that capture invariant latent features across different tasks. In TAD, we optimize the reward-informed world models with the corresponding variational lower bound of the data log-likelihood, which includes a novel term that distinguishes different tasks via states. Extensive experiments on both image-based and state-based control tasks demonstrate that TAD significantly improves performance when handling different tasks simultaneously, especially those with high TDR, and exhibits strong generalization to unseen tasks.
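
    The claim that a Markovian policy cannot tell apart tasks that share dynamics but differ in rewards can be illustrated with a toy example. This sketch does not compute TDR or implement TAD; the two-task, single-state setup, the reward functions and all function names are hypothetical. A history-conditioned policy probes once, identifies the task from the observed reward, and then acts greedily, whereas a state-only policy must take the same action in both tasks.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int
RewardFn = Callable[[State, Action], float]
History = List[Tuple[State, Action, float]]
Policy = Callable[[State, History, Dict[str, RewardFn]], Action]

ACTIONS: Tuple[Action, ...] = (0, 1)

# Hypothetical tasks: identical (trivial) dynamics, opposite reward functions.
REWARD_FNS: Dict[str, RewardFn] = {
    "left": lambda s, a: 1.0 if a == 0 else 0.0,
    "right": lambda s, a: 1.0 if a == 1 else 0.0,
}


def consistent_tasks(history: History, reward_fns: Dict[str, RewardFn]) -> List[str]:
    """Tasks whose reward function explains every (s, a, r) seen so far."""
    return [
        name
        for name, r in reward_fns.items()
        if all(abs(r(s, a) - rew) < 1e-9 for s, a, rew in history)
    ]


def markovian_policy(state: State, history: History, reward_fns: Dict[str, RewardFn]) -> Action:
    # Ignores history, so it takes the same action in every task.
    return 0


def history_policy(state: State, history: History, reward_fns: Dict[str, RewardFn]) -> Action:
    candidates = consistent_tasks(history, reward_fns)
    if len(candidates) == 1:
        r = reward_fns[candidates[0]]
        return max(ACTIONS, key=lambda a: r(state, a))  # greedy for the identified task
    return 0  # task still ambiguous: probe with action 0


def rollout(policy: Policy, task: str, reward_fns: Dict[str, RewardFn], horizon: int = 10) -> float:
    """Total reward of `policy` in `task`; the toy MDP has a single state 0."""
    r = reward_fns[task]
    history: History = []
    total = 0.0
    state = 0
    for _ in range(horizon):
        action = policy(state, history, reward_fns)
        reward = r(state, action)
        history.append((state, action, reward))
        total += reward
    return total


if __name__ == "__main__":
    for task in REWARD_FNS:
        print(task, rollout(markovian_policy, task, REWARD_FNS), rollout(history_policy, task, REWARD_FNS))
```

    Over a 10-step horizon the state-only policy collects 10 in one task and 0 in the other, while the history-based policy collects 10 and 9, paying one step for the probe.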

    A Search for Light Fermionic Dark Matter Absorption on Electrons in PandaX-4T

    We report a search for sub-MeV fermionic dark matter absorbed by electrons with an outgoing active neutrino, using the 0.63 tonne-year exposure collected by the PandaX-4T liquid xenon experiment. No significant signal is observed above the expected background. The data are interpreted as limits on the effective couplings between such dark matter and electrons. For axial-vector or vector interactions, our sensitivity is competitive with existing astrophysical bounds on the decay of such dark matter into photon final states. In particular, we present the first direct detection limits for an axial-vector (vector) interaction, which are the strongest in the mass range from 25 to 45 (35 to 50) keV/c².
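
    The absorption process χ + e → ν + e fixes the energy deposited on the electron through two-body kinematics. The sketch below works this out under simplifying assumptions that are ours, not the paper's: the electron is free and at rest (real xenon electrons are bound, which modifies the spectrum), the dark matter velocity is neglected, and the neutrino is massless. Function names are hypothetical.

```python
import math

ELECTRON_MASS_KEV = 510.99895  # electron rest energy m_e c^2, in keV


def electron_recoil_kev(m_chi_kev: float) -> float:
    """Electron kinetic energy T for chi + e -> nu + e, free electron at rest.

    Energy conservation gives E_nu + T = m_chi and momentum conservation gives
    |p_e| = E_nu; together they yield T = m_chi^2 / (2 (m_chi + m_e)).
    """
    return m_chi_kev ** 2 / (2.0 * (m_chi_kev + ELECTRON_MASS_KEV))


def neutrino_energy_kev(m_chi_kev: float) -> float:
    """Neutrino energy from energy conservation: E_nu = m_chi - T."""
    return m_chi_kev - electron_recoil_kev(m_chi_kev)


def electron_momentum_kev(t_kev: float) -> float:
    """Relativistic electron momentum (p c, in keV) for kinetic energy T."""
    return math.sqrt(t_kev ** 2 + 2.0 * t_kev * ELECTRON_MASS_KEV)


if __name__ == "__main__":
    for m_chi in (25.0, 35.0, 45.0, 50.0):
        t = electron_recoil_kev(m_chi)
        # Momentum balance: electron and neutrino recoil back to back.
        assert math.isclose(electron_momentum_kev(t), neutrino_energy_kev(m_chi))
        print(f"m_chi = {m_chi:.0f} keV/c^2 -> T_e = {t:.3f} keV")
```

    Under these assumptions the deposit T = m_χ²/(2(m_χ + m_e)) is only a small fraction of m_χ for sub-MeV masses; the loop evaluates it at the mass endpoints quoted in the abstract.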

    Research on changes and circulations of sea-ice in Eurasian in recent 50 years

    Using the monthly 1° × 1° sea-ice concentration data of the Hadley Centre and the monthly NCEP geopotential height data from January 1953 to February 2003, the temporal and spatial characteristics of sea-ice change are examined. The results show that sea ice was decreasing in almost all of the eight regions, especially in Europe, where the decrease occurred in all seasons. In the Asian part, however, sea ice displays some increasing trends in spring and winter. The abrupt changes of sea ice occurred at the end of the 1970s in Europe, while in Asia the abrupt changes in summer/fall (spring/winter) occurred at the end of the 1980s.
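
    Trends and abrupt-change times of this kind are commonly estimated from seasonal or annual means. The abstract does not name its methods, so the sketch below is an assumption: an ordinary least-squares slope for the trend and the Pettitt test, one standard nonparametric change-point test, for the abrupt-change time. Neither is claimed to be the authors' procedure, and the example series is synthetic.

```python
import math
from typing import Sequence, Tuple


def ols_slope(y: Sequence[float]) -> float:
    """Least-squares linear trend per time step for equally spaced samples."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def pettitt(y: Sequence[float]) -> Tuple[int, float, float]:
    """Pettitt change-point test.

    Returns (split, K, p): y[:split] and y[split:] are the segments before and
    after the most likely change, K = max |U_t|, and p is the usual
    approximation 2 * exp(-6 K^2 / (n^3 + n^2)), capped at 1.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two samples")

    def sign(d: float) -> int:
        return (d > 0) - (d < 0)

    best_split, best_k = 1, -1.0
    for t in range(1, n):
        # U_t = sum over i in the first segment and j in the second of sgn(y_i - y_j)
        u = sum(sign(y[i] - y[j]) for i in range(t) for j in range(t, n))
        if abs(u) > best_k:
            best_split, best_k = t, float(abs(u))
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_split, best_k, p


if __name__ == "__main__":
    # Synthetic annual means (NOT real data): roughly flat, then a step down.
    series = [0.90, 0.92, 0.91, 0.90, 0.89, 0.80, 0.79, 0.81, 0.78, 0.80]
    print("trend per decade:", 10 * ols_slope(series))
    print("change point (split, K, p):", pettitt(series))
```

    For annual samples the slope is per year, so multiplying by 10 gives a per-decade trend; the split index maps back to a calendar year once the series' start year is known.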