21 research outputs found

    Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

    Full text link
    Though deep reinforcement learning (DRL) has achieved substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transitions and observations. Most existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance, since these two kinds of disturbance affect different parts of the agent; moreover, the popular worst-case return criterion may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on this analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm, CVaR-Proximal-Policy-Optimization (CPPO), which formalizes a risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
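
    A schematic form of the constrained problem CPPO targets, with notation assumed here rather than taken from the paper: the policy maximizes expected return while keeping the CVaR of a loss variable (e.g., performance degradation) below a threshold.

        \max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t} r_t\Big]
        \quad \text{s.t.} \quad \mathrm{CVaR}_{\alpha}(L_{\pi}) \le \beta,
        \qquad \text{where} \quad
        \mathrm{CVaR}_{\alpha}(L) = \mathbb{E}\big[L \mid L \ge \mathrm{VaR}_{\alpha}(L)\big].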

    Consistent Attack: Universal Adversarial Perturbation on Embodied Vision Navigation

    Full text link
    Embodied agents for vision navigation, built on deep neural networks, have attracted increasing attention. However, deep neural networks are vulnerable to malicious adversarial noise, which may cause catastrophic failures in Embodied Vision Navigation. Among different kinds of adversarial noise, universal adversarial perturbations (UAP), i.e., a constant image-agnostic perturbation applied to every input frame of the agent, play a critical role in Embodied Vision Navigation since they are computation-efficient and practical to apply during an attack. However, existing UAP methods ignore the system dynamics of Embodied Vision Navigation and may be sub-optimal. To extend UAP to the sequential decision setting, we formulate the environment disturbed by the universal noise δ as a δ-disturbed Markov Decision Process (δ-MDP). Based on this formulation, we analyze the properties of the δ-MDP and propose two novel Consistent Attack methods, named Reward UAP and Trajectory UAP, for attacking embodied agents; these methods take the dynamics of the MDP into account and compute universal noise by estimating the disturbed distribution and the disturbed Q function. For various victim models, our Consistent Attack causes a significant performance drop on the PointGoal task in Habitat across different datasets and scenes. Extensive experimental results indicate serious potential risks in applying Embodied Vision Navigation methods to the real world.
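
    A minimal sketch of how one constant, image-agnostic perturbation could be optimized against an embodied agent, assuming a differentiable Q-network q_net, a buffer of collected frames, and an L-infinity bound eps; the names are illustrative, and the paper's Reward UAP and Trajectory UAP differ in how the disturbed distribution and disturbed Q function are estimated.

        import torch

        def universal_perturbation(q_net, frames, eps, steps=100, lr=1e-2):
            """One shared perturbation applied to every frame, optimized to
            push down the best Q-value available to the agent (illustrative)."""
            delta = torch.zeros_like(frames[0], requires_grad=True)
            opt = torch.optim.Adam([delta], lr=lr)
            for _ in range(steps):
                loss = sum(q_net(obs + delta).max() for obs in frames)
                opt.zero_grad()
                loss.backward()
                opt.step()
                with torch.no_grad():          # keep the noise inside the L-infinity ball
                    delta.clamp_(-eps, eps)
            return delta.detach()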

    Task Aware Dreamer for Task Generalization in Reinforcement Learning

    Full text link
    A long-standing goal of reinforcement learning is to acquire agents that can learn on training tasks and generalize well to unseen tasks that may share similar dynamics but have different reward functions. A general challenge is to quantitatively measure the similarity between such tasks, which is vital for analyzing the task distribution and for designing algorithms with stronger generalization. To address this, we present a novel metric named Task Distribution Relevance (TDR), defined via the optimal Q functions of different tasks, to quantitatively capture the relevance of the task distribution. For task distributions with high TDR, i.e., where the tasks differ significantly, we show that Markovian policies cannot distinguish them, leading to poor performance. Based on this insight, we encode all historical information into the policy to distinguish different tasks and propose Task Aware Dreamer (TAD), which extends world models to reward-informed world models that capture invariant latent features across tasks. In TAD, we derive the corresponding variational lower bound of the data log-likelihood, including a novel term that distinguishes different tasks via states, to optimize the reward-informed world models. Extensive experiments on both image-based and state-based control tasks demonstrate that TAD significantly improves performance when handling different tasks simultaneously, especially those with high TDR, and shows strong generalization to unseen tasks.
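
    For orientation only, a generic reward-informed world-model objective resembles a Dreamer-style evidence lower bound with an added reward-prediction term; the exact TAD bound also contains a task-distinguishing term over states that is not reproduced here.

        \mathcal{L} = \mathbb{E}_{q(s_{1:T} \mid o_{1:T}, a_{1:T})}\Big[
        \sum_{t} \big(\ln p(o_t \mid s_t) + \ln p(r_t \mid s_t)\big)
        - \sum_{t} \mathrm{KL}\big(q(s_t \mid s_{t-1}, a_{t-1}, o_t) \,\|\, p(s_t \mid s_{t-1}, a_{t-1})\big)\Big]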

    Search for light dark matter from atmosphere in PandaX-4T

    Full text link
    We report a search for light dark matter produced through the cascading decay of η mesons, which are created in inelastic collisions between cosmic rays and Earth's atmosphere. We introduce a new, general, and publicly accessible framework designed specifically for boosted dark matter, with which we perform a full, dedicated simulation of the Earth attenuation effect on dark matter particles arriving at the detector, including both elastic and quasi-elastic processes. In the PandaX-4T commissioning data with an exposure of 0.63 tonne·year, no significant excess over background is observed. We obtain the first constraints on the interaction between light dark matter generated in the atmosphere and nuclei through a light scalar mediator. The lowest excluded cross section is 5.9 × 10⁻³⁷ cm² for a dark matter mass of 0.1 MeV/c² and a mediator mass of 300 MeV/c². The lowest upper limit on the branching ratio of η decay to dark matter is 1.6 × 10⁻⁷.

    Document clustering using sample weighting

    Get PDF
    Clustering algorithms based on sample weighting have attracted attention recently. In this paper, a novel sample-weighting clustering algorithm is presented, based on the K-Means and fuzzy C-Means algorithms. The algorithm uses academic documents as the clustering objects. The PageRank value of each document is calculated from the citation relationships among the documents and is used as the sample weight in the algorithm. Experiments show that the proposed algorithm effectively improves the performance of document clustering.
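
    A minimal sketch of sample-weighted K-Means in the spirit described above, assuming document vectors X and per-document PageRank scores w derived from the citation graph; the fuzzy C-Means variant mentioned in the abstract is not shown.

        import numpy as np

        def weighted_kmeans(X, w, k, iters=50, seed=0):
            """K-Means where each document pulls its centroid with a force
            proportional to its PageRank weight."""
            rng = np.random.default_rng(seed)
            centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
            for _ in range(iters):
                dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
                labels = dists.argmin(axis=1)               # nearest-centroid assignment
                for j in range(k):
                    mask = labels == j
                    if mask.any():                          # PageRank-weighted mean
                        centers[j] = np.average(X[mask], axis=0, weights=w[mask])
            return labels, centers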

    A quantitative evaluation system of Chinese journals in the humanities and social sciences

    No full text
    Based on analyses of existing indicators for evaluating journals in the humanities and social sciences and on our experience in constructing the Chinese Social Science Citation Index (CSSCI), we propose a comprehensive system for evaluating Chinese academic journals in the humanities and social sciences. The system comprises 8 primary indicators, with 17 sub-indicators for multidisciplinary journals and 19 sub-indicators for discipline-specific journals. Each indicator or sub-indicator is assigned a weight according to its importance in measuring a journal's academic quality and/or impact.
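
    Read schematically (symbols assumed here, not part of the system's published definition), a journal's composite score under such a scheme is a weighted sum of its normalized indicator values:

        S_j = \sum_{i} w_i \, x_{ij}, \qquad \sum_{i} w_i = 1,

    where x_{ij} is the normalized value of indicator i for journal j and w_i the weight assigned to that indicator.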

    Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning

    Full text link
    Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease a victim's cumulative expected reward by manipulating the victim's observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward since they generally do not explore the environmental dynamics. In this paper, we provide a framework to better understand existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. Our reformulation generates an optimal adversary in the function space of the targeted attacks, repelling them via a generic two-stage framework. In the first stage, we train a deceptive policy by hacking the environment and discover a set of trajectories routing to the lowest reward or the worst-case performance. Next, the adversary misleads the victim into imitating the deceptive policy by perturbing the observations. Compared to existing approaches, we theoretically show that our adversary is stronger under an appropriate noise level. Extensive experiments demonstrate our method's superiority in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
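
    A minimal sketch of the two-stage idea under stated assumptions: the first stage is taken as given in the form of a trained "deceptive" policy, and the second stage runs a PGD-style loop that perturbs an observation so the victim's action distribution imitates the deceptive policy; the function and parameter names are illustrative, not the paper's exact procedure.

        import torch
        import torch.nn.functional as F

        def perturb_observation(victim, deceptive, obs, eps, steps=10):
            """Stage 2 sketch: bounded perturbation that pushes the victim's
            action distribution toward the deceptive policy's distribution."""
            alpha = eps / steps
            target = deceptive(obs).detach()                 # action probabilities to imitate
            delta = torch.zeros_like(obs, requires_grad=True)
            for _ in range(steps):
                loss = F.kl_div(victim(obs + delta).log(), target, reduction="batchmean")
                loss.backward()
                with torch.no_grad():
                    delta -= alpha * delta.grad.sign()       # descend the imitation loss
                    delta.clamp_(-eps, eps)
                delta.grad.zero_()
            return (obs + delta).detach()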

    Guest editorial

    No full text

    An Implementation of Actor-Critic Algorithm on Spiking Neural Network Using Temporal Coding Method

    No full text
    Taking advantage of the faster speed, lower resource consumption, and better biological interpretability of spiking neural networks, this paper developed a novel spiking neural network reinforcement learning method using an actor-critic architecture and temporal coding. A simple improved leaky integrate-and-fire (LIF) model was used to describe the behavior of a spiking neuron. The actor-critic network structure and the update formulas using temporally encoded information were then provided. The model was examined in a decision-making task, a gridworld task, a UAV flying-through-a-window task, and a flying-basketball-avoidance task. In the 5 × 5 grid map, the learned value function was close to the ideal one and the quickest path from one state to another was found. A UAV trained by this method was able to fly through the window quickly in simulation, and an actual flight test of a UAV avoiding a flying basketball was conducted. With this model, the success rate of the test was 96% and the average decision time was 41.3 ms. The results show the effectiveness and accuracy of the temporally coded spiking neural network RL method. In conclusion, an attempt was made to provide insights into developing spiking neural network reinforcement learning methods for decision-making and autonomous control of unmanned systems.
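
    A minimal leaky integrate-and-fire update in the spirit of the neuron model described above, with assumed constants (tau, v_thresh, v_reset); the paper's improved LIF model and its temporal-coding scheme are not reproduced here.

        import numpy as np

        def lif_step(v, input_current, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
            """One Euler step of a leaky integrate-and-fire neuron.
            Returns the updated membrane potentials and a boolean spike mask."""
            v = v + dt / tau * (-v + input_current)   # leak toward rest plus injected current
            spikes = v >= v_thresh                    # spike where the threshold is crossed
            v = np.where(spikes, v_reset, v)          # reset neurons that spiked
            return v, spikes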