9 research outputs found

Generative Exploration and Exploitation

Author: Jiang Jiechuan
Lu Zongqing
Publication date: 20/11/2019

Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively trade off exploration and exploitation according to the varying distributions of states experienced by the agent as learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards: Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing.
Comment: AAAI'2

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Model-Based Decentralized Policy Optimization

Author: Jiang Jiechuan
Lu Zongqing
Luo Hao
Publication date: 16/02/2023

Decentralized policy optimization has been commonly used in cooperative multi-agent tasks. However, since all agents update their policies simultaneously, the environment is non-stationary from the perspective of each individual agent, which makes it hard to guarantee monotonic policy improvement. To make policy improvement stable and monotonic, we propose model-based decentralized policy optimization (MDPO), which incorporates a latent variable function to help construct the transition and reward functions from an individual perspective. We show theoretically that the policy optimization of MDPO is more stable than model-free decentralized policy optimization. Moreover, due to non-stationarity, the latent variable function varies and is hard to model. We further propose a latent variable prediction method to reduce the error of the latent variable function, which theoretically contributes to monotonic policy improvement. Empirically, MDPO indeed outperforms model-free decentralized policy optimization in a variety of cooperative multi-agent tasks.
Comment: 24 page

arXiv.org e-Print Archive

Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning

Author: Jiang Jiechuan
Lu Zongqing
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 26/06/2023

Offline reinforcement learning can learn effective policies from a fixed dataset, which is promising for real-world applications. However, in offline decentralized multi-agent reinforcement learning, due to the discrepancy between the behavior policy and the learned policy, the transition dynamics in offline experiences do not match the transition dynamics in online execution. This mismatch creates severe errors in value estimates, leading to uncoordinated, low-performing policies. One way to overcome this problem is to bridge offline training and online tuning. However, considering both deployment efficiency and sample efficiency, we can collect only very limited online experiences, which makes it insufficient to update the agent policy with online data alone. To utilize both offline and online experiences to tune the policies of agents, we introduce online transition correction (OTC) to implicitly correct the offline transition dynamics by modifying sampling probabilities. We design two types of distances, i.e., embedding-based and value-based distance, to measure the similarity between transitions, and further propose an adaptive rank-based prioritization to sample transitions according to the transition similarity. OTC is a simple yet effective way to increase data efficiency and improve agent policies in online tuning. Empirically, OTC outperforms baselines in a variety of tasks

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning from Visual Observation via Offline Pretrained State-to-Go Transformer

Author: Jiang Jiechuan
Li Ke
Lu Zongqing
Zhou Bohan
Publication date: 22/06/2023

Learning from visual observation (LfVO), which aims to recover policies from only visual observation data, is a promising yet challenging problem. Existing LfVO approaches either adopt inefficient online learning schemes or require additional task-specific information such as goal states, making them unsuited for open-ended tasks. To address these issues, we propose a two-stage framework for learning from visual observation. In the first stage, we introduce and pretrain the State-to-Go (STG) Transformer offline to predict and differentiate latent transitions of demonstrations. In the second stage, the STG Transformer provides intrinsic rewards for downstream reinforcement learning tasks, where an agent learns from intrinsic rewards alone. Empirical results on Atari and Minecraft show that our proposed method outperforms baselines and in some tasks even achieves performance comparable to the policy learned from environmental rewards. These results shed light on the potential of utilizing video-only data to solve difficult visual reinforcement learning tasks rather than relying on complete offline datasets containing states, actions, and rewards. The project's website and code can be found at https://sites.google.com/view/stgtransformer.
Comment: 19 page

arXiv.org e-Print Archive

Association Between Interictal High-Frequency Oscillations and Slow Wave in Refractory Focal Epilepsy With Good Surgical Outcome

Author: Amiri
Blanco
Bragin
Brazdil
Buzsáki
Cepeda
Cho
Esser
Frauscher
Frauscher
Gonzalez Otarula
Grenier
Guoping Ren
Holler
Jacobs
Jiang
Jiaqing Yan
Jiechuan Ren
Jindong Dai
Jirsch
Jiruska
Klimes
Le Van Quyen
Leung
Motoi
Mukovski
Nazer
Pail
Qun Wang
Ren
Riedner
Sakuraba
Samiee
Schall
Shanshan Mei
Song
Thomschewski
Valderrama
van’t Klooster
von Ellenrieder
von Stein
Xiaofei Wang
Xiaofeng Yang
Yueqian Sun
Yunlin Li
Publication venue: Frontiers Media SA

Crossref