1,503 research outputs found

    Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

    We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space $\mathcal{S}$ and action space $\mathcal{A}$, assuming access to a generative model. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$ (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an $\varepsilon$-optimal policy with an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^3\varepsilon^2}\log\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)\varepsilon}$ samples for any $\varepsilon \in (0, \frac{1}{1-\gamma}]$. Along the way, we derive improved (instance-dependent) guarantees for model-based policy evaluation. To the best of our knowledge, this work provides the first minimax-optimal guarantee in a generative model that accommodates the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically impossible).
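    To make the model-based (plug-in) approach concrete, here is a minimal numpy sketch, not the paper's exact perturbed algorithm: it draws the same number of samples per state-action pair from a generative model, forms an empirical transition kernel, and runs value iteration on the empirical MDP. The interface `sample_next_state` and all sizes are hypothetical placeholders.

```python
import numpy as np

def plug_in_planning(sample_next_state, num_states, num_actions, reward,
                     gamma=0.99, samples_per_pair=1000, num_iters=1000):
    """Plan on an empirical MDP estimated from a generative model.

    sample_next_state(s, a, n) is a hypothetical generative-model interface
    returning an array of n i.i.d. next-state indices for the pair (s, a);
    reward is a (num_states, num_actions) array of rewards in [0, 1].
    """
    # Estimate the transition kernel P_hat(s' | s, a) by empirical frequencies.
    P_hat = np.zeros((num_states, num_actions, num_states))
    for s in range(num_states):
        for a in range(num_actions):
            next_states = sample_next_state(s, a, samples_per_pair)
            P_hat[s, a] = np.bincount(next_states, minlength=num_states) / samples_per_pair

    # Plain value iteration on the empirical MDP (the paper additionally
    # perturbs the rewards; that step is omitted in this sketch).
    V = np.zeros(num_states)
    for _ in range(num_iters):
        Q = reward + gamma * (P_hat @ V)   # (S, A) Bellman backup
        V = Q.max(axis=1)
    policy = Q.argmax(axis=1)              # greedy policy w.r.t. the final Q
    return policy, V
```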

    Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

    The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of $\frac{K}{(1-\gamma)^{3}\varepsilon^{2}}$ (resp. $\frac{K}{(1-\gamma)^{4}\varepsilon^{2}}$), up to some logarithmic factor. Here $K$ is the feature dimension and $\gamma\in(0,1)$ is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for arbitrarily large-scale MDPs, both the model-based approach and Q-learning are sample-efficient when $K$ is relatively small, and hence the title of this paper.
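    As a point of reference, the feature-based structure referred to above is commonly written as follows; this is a sketch of the standard linear-kernel parameterization, with generic feature maps $\phi_k$ and $\psi_k$ as placeholders rather than the paper's exact notation.

```latex
% Linear parameterization of the transition kernel with K known
% state-action features \phi_k and K unknown next-state factors \psi_k:
\[
  P(s' \mid s, a) \;=\; \sum_{k=1}^{K} \phi_k(s, a)\, \psi_k(s'),
  \qquad (s, a) \in \mathcal{S} \times \mathcal{A},\ s' \in \mathcal{S}.
\]
```

    Under such a structure, learning the $K$ unknown factors replaces learning an $|\mathcal{S}||\mathcal{A}| \times |\mathcal{S}|$ transition table, which is why the sample complexity above depends on $K$ rather than on $|\mathcal{S}|\times|\mathcal{A}|$.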

    IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

    This paper provides a unified account of two schools of thinking in information retrieval modelling: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting relevancy given a query-document pair. We propose a game-theoretic minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance for training the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. Through the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results demonstrate significant performance gains of as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications, including web search, item recommendation, and question answering.
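    The following is a minimal numpy sketch of the alternating minimax training described above, under simplifying assumptions that are not from the paper: a single query, linear scoring functions, a softmax generator over a small toy document set, and a REINFORCE-style update that uses the discriminator's score as the generator's reward. The data and all names are hypothetical placeholders for illustration; the actual system uses neural scorers and many queries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical data): one query, 50 candidate documents.
num_docs, dim = 50, 8
doc_feats = rng.normal(size=(num_docs, dim))                  # document feature vectors
true_relevant = rng.choice(num_docs, size=5, replace=False)   # "labelled" relevant docs

theta_d = np.zeros(dim)   # discriminator: linear relevance scorer
theta_g = np.zeros(dim)   # generator: softmax distribution over documents
lr = 0.05

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(200):
    g_probs = softmax(doc_feats @ theta_g)   # generator's current ranking distribution

    # Discriminator step: push labelled docs up and generator-sampled docs down.
    fakes = rng.choice(num_docs, size=5, p=g_probs)
    for d, label in [(d, 1.0) for d in true_relevant] + [(d, 0.0) for d in fakes]:
        p = sigmoid(doc_feats[d] @ theta_d)
        theta_d += lr * (label - p) * doc_feats[d]   # logistic-regression gradient step

    # Generator step: REINFORCE, rewarding documents the discriminator scores as relevant.
    samples = rng.choice(num_docs, size=5, p=g_probs)
    for d in samples:
        reward = sigmoid(doc_feats[d] @ theta_d)
        grad_log_prob = doc_feats[d] - doc_feats.T @ g_probs   # grad of log softmax prob of d
        theta_g += lr * reward * grad_log_prob

# After training, the generator's softmax induces a document ranking for the query.
ranking = np.argsort(doc_feats @ theta_g)[::-1]
```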