149 research outputs found

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games

Authors: Ding Zihan; Jin Chi; Liu Qinghua; Su Dijia
Publication date: 18/07/2022

This paper proposes novel, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Our objective is to find Nash equilibrium policies, which cannot be exploited by adversarial opponents. Unlike prior efforts on finding Nash equilibria in extensive-form games such as Poker, which feature tree-structured transition dynamics and discrete state spaces, this paper focuses on Markov games with general transition dynamics and continuous state spaces. We propose (1) the Nash DQN algorithm, which integrates DQN with a Nash-finding subroutine for the joint value functions; and (2) the Nash DQN Exploiter algorithm, which additionally adopts an exploiter to guide the agent's exploration. Our algorithms are practical variants of theoretical algorithms that are guaranteed to converge to Nash equilibria in the basic tabular setting. Experimental evaluation on both tabular examples and two-player Atari games demonstrates the robustness of the proposed algorithms against adversarial opponents, as well as their advantageous performance over existing methods.

arXiv.org e-Print Archive
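
The abstract above mentions a Nash-finding subroutine applied to joint value functions. As a rough illustration only, the sketch below approximates a Nash equilibrium of a single zero-sum matrix game with fictitious play. The choice of fictitious play, the function names, and the example payoff matrix are assumptions made for this sketch; the abstract does not say which solver the authors use.

```python
# Assumption: payoffs are given from the row player's point of view,
# so the row player maximizes and the column player minimizes.
MATCHING_PENNIES = [[1, -1], [-1, 1]]  # hypothetical example game


def fictitious_play(payoff, iterations=20000):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    Each player repeatedly best-responds to the opponent's empirical
    action frequencies; the frequencies are returned as mixed strategies.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Cumulative payoff of each row action against the column player's history.
    row_scores = [0.0] * n_rows
    # Cumulative payoff (to the row player) of each column action
    # against the row player's history.
    col_scores = [0.0] * n_cols
    row_action, col_action = 0, 0
    for _ in range(iterations):
        row_counts[row_action] += 1
        col_counts[col_action] += 1
        for i in range(n_rows):
            row_scores[i] += payoff[i][col_action]
        for j in range(n_cols):
            col_scores[j] += payoff[row_action][j]
        row_action = max(range(n_rows), key=lambda i: row_scores[i])
        col_action = min(range(n_cols), key=lambda j: col_scores[j])
    row_strategy = [count / iterations for count in row_counts]
    col_strategy = [count / iterations for count in col_counts]
    return row_strategy, col_strategy


def exploitability(payoff, row_strategy, col_strategy):
    """Gap between the two best-response values; zero at an exact equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row_value = max(
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    )
    best_col_value = min(
        sum(payoff[i][j] * row_strategy[i] for i in range(n_rows))
        for j in range(n_cols)
    )
    return best_row_value - best_col_value


if __name__ == "__main__":
    x, y = fictitious_play(MATCHING_PENNIES)
    print("row strategy:", x)
    print("column strategy:", y)
    print("exploitability:", exploitability(MATCHING_PENNIES, x, y))
```

In a Nash-DQN-style loop one would presumably run such a solver on the matrix of joint action values at each state; that wiring is not shown here and is not taken from the paper.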

Rebalanced Zero-shot Learning

Authors: Huang Kaizhu; Jin Xiaobo; Liu Youfa; Yang Guanyu; Ye Zihan
Publication date: 13/07/2023

Zero-shot learning (ZSL) aims to identify unseen classes with zero samples during training. Broadly speaking, present ZSL methods usually adopt class-level semantic labels and compare them with instance-level semantic predictions to infer unseen classes. However, we find that such existing models mostly produce imbalanced semantic predictions, i.e., these models may predict some semantics precisely but not others. To address this drawback, we aim to introduce an imbalanced learning framework into ZSL. However, we find that imbalanced ZSL has two unique challenges: (1) its imbalanced predictions are highly correlated with the value of the semantic labels rather than with the number of samples, as typically considered in traditional imbalanced learning; (2) different semantics follow quite different error distributions between classes. To mitigate these issues, we first formalize ZSL as an imbalanced regression problem, which offers empirical evidence to interpret how semantic labels lead to imbalanced semantic predictions. We then propose a re-weighted loss termed Re-balanced Mean-Squared Error (ReMSE), which tracks the mean and variance of error distributions, thus ensuring rebalanced learning across classes. As a major contribution, we conduct a series of analyses showing that ReMSE is theoretically well established. Extensive experiments demonstrate that the proposed method effectively alleviates the imbalance in semantic prediction and outperforms many state-of-the-art ZSL methods. Our code is available at https://github.com/FouriYe/ReZSL-TIP23.

Comment: Accepted to IEEE Transactions on Image Processing (TIP) 202

arXiv.org e-Print Archive

CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

Authors: Ding Yaoyao; Feng Siyuan; Jin Haiming; Li Zhenhui; Liu Chang; Yu Yong; Zhang Huichu; Zhang Weinan; Zhou Zihan; Zhu Yichen
Publication venue: Association for Computing Machinery (ACM)
Publication date: 13/05/2019

Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commutes, traffic signal control poses unique challenges for reinforcement learning in terms of adapting to a dynamic traffic environment and coordinating thousands of agents, including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that can generate a large number of data samples for learning. The most commonly used open-source traffic simulator, SUMO, is, however, not scalable to large road networks and large traffic flows, which hinders the study of reinforcement learning on traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow supports flexible definitions of road networks and traffic flows based on synthetic and real-world data. It also provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and is capable of supporting city-wide traffic simulation with an interactive render for monitoring. Besides traffic signal control, CityFlow could serve as the base for other transportation studies and can create new possibilities to test machine learning methods in the intelligent transportation domain.

Comment: WWW 2019 Demo Paper

arXiv.org e-Print Archive

Crossref

Black-box Backdoor Defense via Zero-shot Image Purification

Authors: Du Mengnan; Guan Zihan; Liu Ninghao; Shi Yucheng; Sun Jin; Wu Xuansheng
Publication date: 27/10/2023

Backdoor attacks inject poisoned samples into the training data, resulting in the misclassification of poisoned inputs during a model's deployment. Defending against such attacks is challenging, especially for real-world black-box models where only query access is permitted. In this paper, we propose a novel defense framework against backdoor attacks through Zero-shot Image Purification (ZIP). Our framework can be applied to poisoned models without requiring internal information about the model or any prior knowledge of the clean/poisoned samples. Our defense framework involves two steps. First, we apply a linear transformation (e.g., blurring) to the poisoned image to destroy the backdoor pattern. Then, we use a pre-trained diffusion model to recover the semantic information removed by the transformation. In particular, we design a new reverse process that uses the transformed image to guide the generation of high-fidelity purified images, which works in zero-shot settings. We evaluate our ZIP framework on multiple datasets with different types of attacks. Experimental results demonstrate the superiority of our ZIP framework compared to state-of-the-art backdoor defense baselines. We believe that our results will provide valuable insights for future defense methods for black-box models. Our code is available at https://github.com/sycny/ZIP.

Comment: Accepted by NeurIPS 202

arXiv.org e-Print Archive
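
The first step described in the abstract above, applying a linear transformation such as blurring to weaken a backdoor pattern, can be illustrated with a toy example. This is only a sketch under stated assumptions: the box blur, the 5x5 grayscale image, and the single-pixel "trigger" are all hypothetical choices made here, and the second step (recovery with a pre-trained diffusion model) is not shown.

```python
def box_blur(image, radius=1):
    """Mean filter over a (2*radius+1)^2 window, averaging in-bounds pixels only.

    The averaging weights depend only on pixel position, so the operation
    is linear in the image values.
    """
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            blurred[y][x] = total / count
    return blurred


def make_poisoned_image(size=5, background=0.5, trigger_value=1.0):
    """Hypothetical poisoned input: flat gray image with one bright trigger pixel."""
    image = [[background] * size for _ in range(size)]
    image[size // 2][size // 2] = trigger_value
    return image


if __name__ == "__main__":
    poisoned = make_poisoned_image()
    purified_step_one = box_blur(poisoned)
    center = len(poisoned) // 2
    print("trigger contrast before blur:", poisoned[center][center] - 0.5)
    print("trigger contrast after blur:", purified_step_one[center][center] - 0.5)
```

The blur spreads the trigger's energy over its neighbours, which lowers its contrast but also removes legitimate detail; per the abstract, that lost detail is what the diffusion-based second step is meant to restore.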

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

Authors: Cheng Jiehan; Dou Zhicheng; Jin Jiajie; Liu Jiongnan; Wang Zihan; Wen Ji-Rong
Publication date: 08/06/2023

Although Large Language Models (LLMs) have demonstrated extraordinary capabilities in many domains, they still tend to hallucinate and generate fictitious responses to user requests. This problem can be alleviated by augmenting LLMs with information retrieval (IR) systems (also known as retrieval-augmented LLMs). With this strategy, LLMs can generate more factual text in response to user input, using the relevant content retrieved by IR systems from external corpora as references. In addition, by incorporating external knowledge, retrieval-augmented LLMs can answer in-domain questions that cannot be answered by relying solely on the world knowledge stored in their parameters. To support research in this area and facilitate the development of retrieval-augmented LLM systems, we develop RETA-LLM, a RETrieval-Augmented LLM toolkit. In RETA-LLM, we create a complete pipeline to help researchers and users build their customized in-domain LLM-based systems. Compared with previous retrieval-augmented LLM systems, RETA-LLM provides more plug-and-play modules to support better interaction between IR systems and LLMs, including request rewriting, document retrieval, passage extraction, answer generation, and fact checking modules. Our toolkit is publicly available at https://github.com/RUC-GSAI/YuLan-IR/tree/main/RETA-LLM.

Comment: Technical Report for RETA-LLM

arXiv.org e-Print Archive
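
The five module names listed in the abstract above suggest a staged pipeline. The sketch below chains toy stand-ins for those stages to show how data might flow between them. It does not use or reproduce the RETA-LLM API: every function name, the token-overlap scoring, the tiny in-memory corpus, and the string-matching "fact check" are hypothetical placeholders, and no LLM or real retriever is called.

```python
import re

# Hypothetical two-document corpus used only for this illustration.
CORPUS = [
    "CityFlow is a traffic simulator. It targets reinforcement learning research.",
    "SUMO is an open-source traffic simulator.",
]


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text):
    return re.split(r"(?<=[.!?])\s+", text.strip())


def rewrite_request(request, history):
    # Toy rewriting: prepend earlier turns so the request is self-contained.
    return " ".join(history + [request])


def retrieve_documents(query, corpus, top_k=1):
    # Toy retrieval: rank documents by token overlap with the query.
    query_tokens = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(query_tokens & tokenize(doc)), reverse=True)
    return ranked[:top_k]


def extract_passages(query, documents):
    # Toy extraction: keep the best-matching sentence of each document.
    query_tokens = tokenize(query)
    return [
        max(split_sentences(doc), key=lambda s: len(query_tokens & tokenize(s)))
        for doc in documents
    ]


def generate_answer(query, passages):
    # Placeholder for an LLM call: simply echo the extracted passages.
    return " ".join(passages)


def check_facts(answer, passages):
    # Toy check: every answer sentence must appear verbatim in a passage.
    return all(sentence in passages for sentence in split_sentences(answer))


def answer_request(request, history, corpus=CORPUS):
    query = rewrite_request(request, history)
    documents = retrieve_documents(query, corpus)
    passages = extract_passages(query, documents)
    answer = generate_answer(query, passages)
    return answer, check_facts(answer, passages)


if __name__ == "__main__":
    print(answer_request("Is it a traffic simulator?", ["Tell me about CityFlow."]))
```

Keeping each stage behind a plain function boundary is one way to get the plug-and-play behaviour the abstract describes, since any stage can be swapped without touching the others.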

Learning a Universal Human Prior for Dexterous Manipulation from Human Preference

Authors: Chen Yuanpei; Ding Zihan; Dong Hao; Gu Shixiang Shane; Jin Chi; Ren Allen Z.
Publication date: 10/04/2023

Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands. Even in simulation with no sample constraints, scripting controllers is intractable due to the high degrees of freedom, and manual reward engineering can also be hard and lead to non-realistic motions. Leveraging recent progress in Reinforcement Learning from Human Feedback (RLHF), we propose a framework to learn a universal human prior using direct human preference feedback over videos, for efficiently tuning the RL policy on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. One task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over the trajectories; it is then applied to regularize the behavior of policies in the fine-tuning stage. Our method empirically demonstrates more human-like behaviors on robot hands in diverse tasks, including unseen tasks, indicating its generalization capability.

arXiv.org e-Print Archive
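
The abstract above describes training a reward model from human preferences over trajectories. A common way to do this in RLHF work is a Bradley-Terry style pairwise loss; whether this paper uses exactly that form is an assumption here. The sketch fits a one-parameter reward model on made-up preference pairs, where each trajectory is summarized by a single hypothetical scalar feature.

```python
import math


def preference_probability(reward_a, reward_b):
    """Bradley-Terry probability that trajectory A is preferred over B."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def preference_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood of the observed human preference."""
    return -math.log(preference_probability(reward_preferred, reward_rejected))


def fit_reward_weight(pairs, learning_rate=0.1, epochs=200):
    """Fit reward(feature) = weight * feature by gradient descent on the pairwise loss.

    `pairs` holds (preferred_feature, rejected_feature) tuples; the scalar
    feature is a hypothetical stand-in for a learned trajectory embedding.
    """
    weight = 0.0
    for _ in range(epochs):
        for preferred, rejected in pairs:
            p = preference_probability(weight * preferred, weight * rejected)
            # Gradient of -log(p) with respect to weight is -(1 - p) * (preferred - rejected).
            weight += learning_rate * (1.0 - p) * (preferred - rejected)
    return weight


if __name__ == "__main__":
    # Made-up data: annotators always prefer the trajectory with the larger feature.
    example_pairs = [(0.9, 0.2), (0.7, 0.4), (0.8, 0.1)]
    learned_weight = fit_reward_weight(example_pairs)
    print("learned weight:", learned_weight)
```

A reward model fitted this way could then score new trajectories during policy fine-tuning, which is the regularizing role the abstract assigns to it.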

The roles and responsibilities of general practice nurses in China: a qualitative study

Authors: Brown Steven; Chi Chunhua; Dong Aimei; Hou Shuxiao; Hu Lin; Jin Xue; Pan Zihan; Pang Hui; Plester Gail
Publication date: 06/09/2024

Background: General hospitals in China have been establishing General Practice Departments (GPDs). Although General Practice Nurses (GPNs) are an important part of this medical system, their training has not kept pace. This study explored the working status of nurses in GPDs in general hospitals in Beijing to provide a theoretical basis for the training and development of GPNs in China. Methods: We conducted in-depth, individual interviews with outpatient nurses at 19 hospitals in Beijing between March and April 2021. We employed a qualitative analysis to interpret participant narratives and used a codebook thematic analysis to analyse the interview data and extract themes. Results: The analysis revealed four themes: (i) a lack of full-time GPNs in the GPDs of most tertiary hospitals, (ii) the inability of GPNs to fully express their potential and skills owing to their limited roles, (iii) insufficient standardised patient education provided by nurses in GPDs, and (iv) a lack of systematic and relevant training for nurses working in general practice settings. Conclusions: To promote the development of GPNs, GPDs in general hospitals in China should hire full-time GPNs, define their job duties in alignment with their values, and provide standardised training to strengthen their core competencies.

University of Birmingham Research Portal