A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games
This paper proposes novel, end-to-end deep reinforcement learning algorithms
for learning two-player zero-sum Markov games. Our objective is to find the
Nash Equilibrium policies, which are free from exploitation by adversarial
opponents. Distinct from prior efforts on finding Nash equilibria in
extensive-form games such as Poker, which feature tree-structured transition
dynamics and discrete state space, this paper focuses on Markov games with
general transition dynamics and continuous state space. We propose (1) the Nash DQN
algorithm, which integrates DQN with a Nash-finding subroutine for the joint
value functions; and (2) the Nash DQN Exploiter algorithm, which additionally
adopts an exploiter to guide the agent's exploration. Our algorithms are
practical variants of theoretical algorithms which are guaranteed to converge
to Nash equilibria in the basic tabular setting. Experimental evaluation on
both tabular examples and two-player Atari games demonstrates the robustness of
the proposed algorithms against adversarial opponents, as well as their
advantageous performance over existing methods.
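
To make the idea of a "Nash finding subroutine" concrete, the sketch below approximates a Nash equilibrium of a single zero-sum matrix game, which is the kind of problem such a subroutine would face at one state given joint action values. It uses fictitious play as a simple stand-in; the abstract does not say which solver the authors use, so the function name, the payoff-matrix convention, and the iteration count are all assumptions made for illustration.

```python
def fictitious_play(payoff, iterations=20000):
    """Approximate a Nash equilibrium of a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j. The row player maximizes and the column player minimizes.
    NOTE: fictitious play is an illustrative choice, not the paper's solver.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # How often each action has been played so far (arbitrary first move: 0).
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = 1
    col_counts[0] = 1

    for _ in range(iterations):
        # Payoff of each row action against the column player's empirical mix.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Payoff of each column action against the row player's empirical mix.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously to the opponent's history.
        best_row = max(range(n_rows), key=lambda i: row_values[i])
        best_col = min(range(n_cols), key=lambda j: col_values[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    row_strategy = [c / row_total for c in row_counts]
    col_strategy = [c / col_total for c in col_counts]
    # Expected payoff to the row player under the two empirical strategies.
    value = sum(
        row_strategy[i] * payoff[i][j] * col_strategy[j]
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return row_strategy, col_strategy, value


if __name__ == "__main__":
    # Matching pennies: the unique equilibrium mixes both actions equally.
    matching_pennies = [[1, -1], [-1, 1]]
    print(fictitious_play(matching_pennies))
```

In zero-sum games the empirical action frequencies of fictitious play converge to an equilibrium, but slowly; an exact linear-programming solver would be the more usual choice when precision matters.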
Rebalanced Zero-shot Learning
Zero-shot learning (ZSL) aims to identify unseen classes with zero samples
during training. Broadly speaking, present ZSL methods usually adopt
class-level semantic labels and compare them with instance-level semantic
predictions to infer unseen classes. However, we find that such existing models
mostly produce imbalanced semantic predictions, i.e. these models could perform
precisely for some semantics, but may not for others. To address the drawback,
we aim to introduce an imbalanced learning framework into ZSL. However, we find
that imbalanced ZSL has two unique challenges: (1) Its imbalanced predictions
are highly correlated with the value of semantic labels rather than the number
of samples as typically considered in the traditional imbalanced learning; (2)
Different semantics follow quite different error distributions between classes.
To mitigate these issues, we first formalize ZSL as an imbalanced regression
problem, which offers empirical evidence to interpret how semantic labels lead
to imbalanced semantic predictions. We then propose a re-weighted loss termed
Re-balanced Mean-Squared Error (ReMSE), which tracks the mean and variance of
error distributions, thus ensuring rebalanced learning across classes. As a
major contribution, we conduct a series of analyses showing that ReMSE is
theoretically well established. Extensive experiments demonstrate that the
proposed method effectively alleviates the imbalance in semantic prediction and
outperforms many state-of-the-art ZSL methods. Our code is available at
https://github.com/FouriYe/ReZSL-TIP23. Comment: Accepted to IEEE Transactions on Image Processing (TIP) 202
CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario
Traffic signal control is an emerging application scenario for reinforcement
learning. Besides being an important problem that affects people's daily
life in commuting, traffic signal control poses unique challenges for
reinforcement learning in terms of adapting to dynamic traffic environment and
coordinating thousands of agents including vehicles and pedestrians. A key
factor in the success of modern reinforcement learning is a good
simulator to generate a large number of data samples for learning. The most
commonly used open-source traffic simulator SUMO is, however, not scalable to
large road networks and large traffic flows, which hinders the study of
reinforcement learning on traffic scenarios. This motivates us to create a new
traffic simulator CityFlow with fundamentally optimized data structures and
efficient algorithms. CityFlow can support flexible definitions for road
network and traffic flow based on synthetic and real-world data. It also
provides a user-friendly interface for reinforcement learning. Most importantly,
CityFlow is more than twenty times faster than SUMO and is capable of
supporting city-wide traffic simulation with an interactive render for
monitoring. Besides traffic signal control, CityFlow could serve as the base
for other transportation studies and can create new possibilities to test
machine learning methods in the intelligent transportation domain. Comment: WWW 2019 Demo Pape
Black-box Backdoor Defense via Zero-shot Image Purification
Backdoor attacks inject poisoned samples into the training data, resulting in
the misclassification of the poisoned input during a model's deployment.
Defending against such attacks is challenging, especially for real-world
black-box models where only query access is permitted. In this paper, we
propose a novel defense framework against backdoor attacks through Zero-shot
Image Purification (ZIP). Our framework can be applied to poisoned models
without requiring internal information about the model or any prior knowledge
of the clean/poisoned samples. Our defense framework involves two steps. First,
we apply a linear transformation (e.g., blurring) to the poisoned image to
destroy the backdoor pattern. Then, we use a pre-trained diffusion model to
recover the missing semantic information removed by the transformation. In
particular, we design a new reverse process by using the transformed image to
guide the generation of high-fidelity purified images, which works in zero-shot
settings. We evaluate our ZIP framework on multiple datasets with different
types of attacks. Experimental results demonstrate the superiority of our ZIP
framework compared to state-of-the-art backdoor defense baselines. We believe
that our results will provide valuable insights for future defense methods for
black-box models. Our code is available at https://github.com/sycny/ZIP. Comment: Accepted by NeurIPS 202
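
As a rough illustration of the first step only, the sketch below applies a box blur (a linear transformation) to a tiny grayscale image containing a one-pixel "trigger" and shows how the trigger's peak intensity is spread out. The image, the trigger, and the `box_blur` helper are hypothetical; the abstract does not specify the exact transformation used, and the diffusion-based recovery step is not sketched here.

```python
def box_blur(image, radius=1):
    """Mean filter over a (2*radius+1)^2 window, averaging in-bounds pixels.

    `image` is a list of rows of floats. The operation is linear because each
    output pixel is a fixed weighted sum of input pixels.
    NOTE: a hypothetical helper for illustration, not the ZIP implementation.
    """
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    # Skip neighbours that fall outside the image.
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            blurred[y][x] = total / count
    return blurred


if __name__ == "__main__":
    # A 5x5 black image with a single bright pixel standing in for a trigger.
    poisoned = [[0.0] * 5 for _ in range(5)]
    poisoned[2][2] = 1.0
    smoothed = box_blur(poisoned)
    peak = max(max(row) for row in smoothed)
    print(f"peak intensity before: 1.0, after: {peak:.3f}")
```

The blur weakens a localized pattern but also removes legitimate detail, which is why the abstract pairs it with a generative model that restores the lost semantic content.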
RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit
Although Large Language Models (LLMs) have demonstrated extraordinary
capabilities in many domains, they still have a tendency to hallucinate and
generate fictitious responses to user requests. This problem can be alleviated
by augmenting LLMs with information retrieval (IR) systems (also known as
retrieval-augmented LLMs). Applying this strategy, LLMs can generate more
factual texts in response to user input according to the relevant content
retrieved by IR systems from external corpora as references. In addition, by
incorporating external knowledge, retrieval-augmented LLMs can answer in-domain
questions that cannot be answered by solely relying on the world knowledge
stored in parameters. To support research in this area and facilitate the
development of retrieval-augmented LLM systems, we develop RETA-LLM, a
RETrieval-Augmented LLM toolkit. In RETA-LLM, we create a complete pipeline
to help researchers and users build their customized in-domain LLM-based
systems. Compared with previous retrieval-augmented LLM systems, RETA-LLM
provides more plug-and-play modules to support better interaction between IR
systems and LLMs, including request rewriting, document retrieval, passage
extraction, answer generation, and fact checking modules. Our toolkit is
publicly available at https://github.com/RUC-GSAI/YuLan-IR/tree/main/RETA-LLM. Comment: Technical Report for RETA-LL
Learning a Universal Human Prior for Dexterous Manipulation from Human Preference
Generating human-like behavior on robots is a great challenge, especially in
dexterous manipulation tasks with robotic hands. Even in simulation with no
sample constraints, scripting controllers is intractable due to high degrees of
freedom, and manual reward engineering can also be hard and lead to
non-realistic motions. Leveraging the recent progress on Reinforcement Learning
from Human Feedback (RLHF), we propose a framework to learn a universal human
prior using direct human preference feedback over videos, for efficiently
tuning the RL policy on 20 dual-hand robot manipulation tasks in simulation,
without a single human demonstration. One task-agnostic reward model is trained
through iteratively generating diverse policies and collecting human preference
over the trajectories; it is then applied for regularizing the behavior of
policies in the fine-tuning stage. Our method empirically demonstrates more
human-like behaviors on robot hands in diverse tasks including even unseen
tasks, indicating its generalization capability.
The roles and responsibilities of general practice nurses in China: a qualitative study
Background: General hospitals in China have been establishing General Practice Departments (GPD). Although General Practice Nurses (GPNs) are an important part of this medical system, their training has not been synchronised. This study explored the working status of nurses in GPDs in general hospitals in Beijing to provide a theoretical basis for the training and development of GPNs in China. Methods: We conducted in-depth, individual interviews with outpatient nurses at 19 hospitals in Beijing between March and April 2021. We employed a qualitative analysis to interpret participant narratives and used a codebook thematic analysis to analyse the interview data and extract themes. Results: The analysis revealed four themes: (i) a lack of full-time GPNs in GPDs of most tertiary hospitals, (ii) the inability of GPNs to fully express their potential and skills owing to their limited roles, (iii) insufficient standardised patient education provided by nurses in GPDs, and (iv) a lack of systematic and relevant training for nurses working in general practice settings. Conclusions: To promote the development of GPNs, GPDs in general hospitals in China should hire full-time GPNs, define their job duties in alignment with their values, and provide standardised training to strengthen their core competencies.