DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs
A major difficulty of solving continuous POMDPs is to infer the multi-modal
distribution of the unobserved true states and to make the planning algorithm
dependent on the perceived uncertainty. We cast POMDP filtering and planning
problems as two closely related Sequential Monte Carlo (SMC) processes, one
over the real states and the other over the future optimal trajectories, and
combine the merits of these two parts in a new model named the DualSMC network.
In particular, we first introduce an adversarial particle filter that leverages
the adversarial relationship between its internal components. Based on the
filtering results, we then propose a planning algorithm that extends the
previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with
an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex
observations such as image input but also it remains highly interpretable. It
is shown to be effective in three continuous POMDP domains: the floor
positioning domain, the 3D light-dark navigation domain, and a modified Reacher
domain.
Comment: IJCAI 2020
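The filtering half of DualSMC is a Sequential Monte Carlo process over the real states. A plain bootstrap particle filter illustrates the basic SMC step (a minimal sketch, not the adversarial variant proposed in the paper; the 1-D transition and observation models here are invented toy examples):

```python
import numpy as np

def particle_filter_step(particles, weights, action, observation,
                         transition, obs_likelihood, rng):
    """One SMC step: propagate particles through the dynamics, reweight
    by the observation likelihood, and resample when the effective
    sample size collapses."""
    # Propagate each particle through the (stochastic) transition model.
    particles = np.array([transition(p, action, rng) for p in particles])
    # Reweight by how well each particle explains the observation.
    weights = weights * np.array([obs_likelihood(observation, p)
                                  for p in particles])
    weights = weights / weights.sum()
    # Resample if the effective sample size drops below half.
    ess = 1.0 / np.sum(weights ** 2)
    if ess < 0.5 * len(particles):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Toy 1-D example: the state drifts by the action plus noise, and the
# observation is the state plus Gaussian noise.
rng = np.random.default_rng(0)
transition = lambda s, a, rng: s + a + rng.normal(0.0, 0.1)
obs_lik = lambda o, s: np.exp(-0.5 * ((o - s) / 0.2) ** 2)

particles = rng.normal(0.0, 1.0, size=500)
weights = np.full(500, 1.0 / 500)
true_state = 0.0
for _ in range(20):
    true_state += 0.1 + rng.normal(0.0, 0.1)
    obs = true_state + rng.normal(0.0, 0.2)
    particles, weights = particle_filter_step(
        particles, weights, 0.1, obs, transition, obs_lik, rng)

estimate = np.sum(weights * particles)  # posterior mean of the state
```

DualSMC replaces the hand-written likelihood with learned, adversarially trained components and runs a second SMC process over future trajectories, but the propagate/reweight/resample loop above is the shared skeleton.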
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past.
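The n-step ELBO objective mentioned above sums, over a window of observations, a reconstruction term minus a KL term between the approximate posterior and the latent transition prior. A simplified single-sample Gaussian sketch of that computation (the actual DVRL objective is particle-based and uses learned recurrent networks; the `encode`, `prior`, and `decode` functions below are hypothetical linear stand-ins):

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians."""
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0)

def n_step_elbo(observations, encode, prior, decode, rng):
    """Sum reconstruction log-likelihoods minus KL terms over n steps."""
    elbo = 0.0
    z = np.zeros(2)                        # initial latent state
    for o in observations:
        mu_q, logvar_q = encode(z, o)      # posterior q(z_t | z_{t-1}, o_t)
        mu_p, logvar_p = prior(z)          # prior p(z_t | z_{t-1})
        # Reparameterized sample from the posterior.
        z = mu_q + np.exp(0.5 * logvar_q) * rng.normal(size=2)
        recon_mu = decode(z)               # mean of p(o_t | z_t), unit variance
        elbo += -0.5 * np.sum((o - recon_mu) ** 2)  # log-lik. up to a constant
        elbo -= gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return elbo

# Toy usage with made-up linear model components.
rng = np.random.default_rng(0)
encode = lambda z, o: (0.5 * z + 0.1 * o, np.full(2, -1.0))
prior = lambda z: (0.5 * z, np.zeros(2))
decode = lambda z: z
observations = [rng.normal(size=2) for _ in range(5)]
elbo = n_step_elbo(observations, encode, prior, decode, rng)
```

In DVRL this objective is added to the policy-gradient loss so the latent representation is shaped jointly by reconstruction and control.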
Deep Reinforcement Learning with VizDoom First-Person Shooter
In this work, we study deep reinforcement learning algorithms for partially observable Markov decision processes (POMDPs) combined with Deep Q-Networks. To our knowledge, we are the first to apply standard Markov decision process architectures to POMDP scenarios. We propose an extension of DQN with Dueling Networks and several other model-free policies to train agents via deep reinforcement learning in the VizDoom environment, a replication of the Doom first-person shooter. We develop several agents for the following VizDoom first-person shooter (FPS) scenarios: Basic, Defend The Center, and Health Gathering. We compare our agent with a Recurrent DQN with Prioritized Experience Replay and a Snapshot Ensembling agent, and obtain approximately a threefold increase in per-episode reward. Notably, the POMDP scenarios close the gap between human and computer player settings, providing a more meaningful justification of Deep RL agent performance.
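The Dueling Networks extension referenced above splits the Q-function into a state-value stream and an advantage stream, recombined as Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a'). A minimal numpy sketch of that combining rule (the feature size, weights, and four-action space are illustrative, not taken from the paper):

```python
import numpy as np

def dueling_q_values(features, w_v, w_a):
    """Combine the value and advantage streams of a dueling head:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    Subtracting the mean advantage makes the decomposition identifiable."""
    v = features @ w_v            # scalar state value, shape (1,)
    a = features @ w_a            # per-action advantages, shape (n_actions,)
    return v + a - a.mean()

rng = np.random.default_rng(0)
features = rng.normal(size=8)     # output of a shared convolutional torso
w_v = rng.normal(size=(8, 1))     # value-stream head
w_a = rng.normal(size=(8, 4))     # advantage head for 4 discrete actions

q = dueling_q_values(features, w_v, w_a)
greedy_action = int(np.argmax(q))
```

In a real agent both streams are learned layers on top of a convolutional encoder of the game frames, but the recombination step is exactly this.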
Learning to Infer User Hidden States for Online Sequential Advertising
To drive purchase in online advertising, it is of the advertiser's great
interest to optimize the sequential advertising strategy whose performance and
interpretability are both important. The lack of interpretability in existing
deep reinforcement learning methods makes it not easy to understand, diagnose
and further optimize the strategy. In this paper, we propose our Deep Intents
Sequential Advertising (DISA) method to address these issues. The key part of
interpretability is to understand a consumer's purchase intent which is,
however, unobservable (called hidden states). In this paper, we model this
intention as a latent variable and formulate the problem as a Partially
Observable Markov Decision Process (POMDP) where the underlying intents are
inferred based on the observable behaviors. Large-scale industrial offline and
online experiments demonstrate our method's superior performance over several
baselines. The inferred hidden states are analyzed, and the results prove the
rationality of our inference.
Comment: to be published in CIKM 2020
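Inferring the hidden purchase intent from observable behaviors, as the POMDP formulation above requires, amounts to a discrete Bayes filter over the latent state. A minimal sketch (the two-intent model and the transition/likelihood matrices below are invented for illustration; DISA learns these quantities rather than hand-specifying them):

```python
import numpy as np

# Hypothetical two-intent model: state 0 = "browsing", state 1 = "ready to buy".
T = np.array([[0.9, 0.1],      # transition p(intent' | intent)
              [0.2, 0.8]])
# Observation likelihoods p(behavior | intent) for three observable behaviors:
# column 0 = ignore ad, 1 = click ad, 2 = add to cart.
O = np.array([[0.70, 0.25, 0.05],
              [0.20, 0.40, 0.40]])

def belief_update(belief, behavior):
    """One Bayes-filter step: predict with the transition model,
    then correct with the observed behavior's likelihood."""
    predicted = belief @ T
    posterior = predicted * O[:, behavior]
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])          # uninformative initial belief
for behavior in [1, 2, 2]:             # click, add-to-cart, add-to-cart
    belief = belief_update(belief, behavior)
```

After a click followed by two add-to-cart events, the belief mass shifts strongly toward the "ready to buy" intent, and an advertising policy conditioned on this belief can act accordingly.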