1,501 research outputs found

Learning policies for Markov decision processes from data

Authors: Hanawal Manjesh K.; Liu Hao; Paschalidis Ioannis Ch.; Zhu Henghui
Publication date: 01/01/2017

We consider the problem of learning a policy for a Markov decision process consistent with data captured on the state-action pairs followed by the policy. We assume that the policy belongs to a class of parameterized policies defined using features associated with the state-action pairs. The features are known a priori; however, only an unknown subset of them may be relevant. The policy parameters that correspond to an observed target policy are recovered using $\ell_1$-regularized logistic regression that best fits the observed state-action samples. We establish bounds on the difference between the average reward of the estimated and the original policy (regret) in terms of the generalization error and the ergodic coefficient of the underlying Markov chain. To that end, we combine sample complexity theory and sensitivity analysis of the stationary distribution of Markov chains. Our analysis suggests that to achieve regret within order $O(\sqrt{\epsilon})$, it suffices to use training sample size on the order of $\Omega(\log n \cdot \mathrm{poly}(1/\epsilon))$, where $n$ is the number of features. We demonstrate the effectiveness of our method on a synthetic robot navigation example.
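As a rough illustration of the fitting step described in this abstract, the sketch below recovers sparse policy parameters from observed state-action samples. It assumes a softmax (multinomial logistic) policy over feature scores and uses a plain proximal-gradient solver with soft-thresholding; both choices, along with the names `policy_probs`, `soft_threshold`, `fit_l1_policy` and the default hyperparameters, are assumptions made for illustration and are not taken from the paper.

```python
import math


def policy_probs(theta, action_features):
    """Softmax policy: probability of each action given its feature vector."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in action_features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def fit_l1_policy(samples, n_features, lam=0.01, step=0.5, iters=500):
    """Minimise average negative log-likelihood + lam * ||theta||_1.

    samples: list of (action_features, chosen_index) pairs, where
    action_features holds one feature vector per available action.
    Hyperparameter defaults are illustrative assumptions.
    """
    theta = [0.0] * n_features
    for _ in range(iters):
        grad = [0.0] * n_features
        for action_features, chosen in samples:
            probs = policy_probs(theta, action_features)
            for a, phi in enumerate(action_features):
                # Gradient of the log-loss: (p_a - 1[a == chosen]) * phi_a
                coeff = probs[a] - (1.0 if a == chosen else 0.0)
                for j in range(n_features):
                    grad[j] += coeff * phi[j] / len(samples)
        # Gradient step followed by soft-thresholding (the l1 proximal step)
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return theta
```

In this sketch, a feature whose value is identical across all actions in a state carries no information about the choice, so its weight stays at zero after thresholding. This mirrors the abstract's setting in which only an unknown subset of the features is relevant.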

Boston University Institutional Repository (OpenBU)

arXiv.org e-Print Archive

Generating Multi-Categorical Samples with Generative Adversarial Networks

Authors: Camino Ramiro; Hammerschmidt Christian; State Radu
Publication date: 01/07/2018

We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers taking into account the structure of the data. We evaluate the performance of our architecture on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models.
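To make the idea of multiple (Gumbel) softmax output layers more concrete, the sketch below shows only the sampling step: a flat vector of logits is split into one block per categorical variable, and each block is passed through a Gumbel-softmax relaxation. It is a minimal standard-library illustration, not the authors' architecture; the function names `gumbel_softmax` and `multi_categorical_output`, the `sizes` argument, and the temperature value are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample for a single categorical variable."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 to avoid log(0).
        u = max(rng.random(), 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    top = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def multi_categorical_output(logits, sizes, temperature=1.0, rng=random):
    """Apply one Gumbel-softmax per categorical variable and concatenate.

    sizes[i] is the number of categories of variable i (an assumed input).
    """
    if sum(sizes) != len(logits):
        raise ValueError("sizes must add up to the number of logits")
    output, start = [], 0
    for size in sizes:
        block = logits[start:start + size]
        output.extend(gumbel_softmax(block, temperature, rng))
        start += size
    return output
```

Each block of the returned vector sums to one, so the output keeps the per-variable structure of the data. Lower temperatures push each block closer to a one-hot vector.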

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg