1,503 research outputs found
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted infinite-horizon Markov decision process (MDP) with state space $\mathcal{S}$ and action space $\mathcal{A}$, assuming access to a generative model. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, prior results suffer from a sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$ (up to some log factor). The current paper overcomes this barrier by certifying the minimax optimality of model-based reinforcement learning as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). More specifically, a perturbed model-based planning algorithm provably finds an $\varepsilon$-optimal policy with an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^3\varepsilon^2}$ samples (up to some log factor) for any $\varepsilon \in (0, \frac{1}{1-\gamma}]$. Along the way, we derive improved (instance-dependent) guarantees for model-based policy evaluation. To the best of our knowledge, this work provides the first minimax-optimal guarantee in the generative-model setting that accommodates the entire range of sample sizes (beyond which finding a meaningful policy is information-theoretically impossible).
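As a rough, hypothetical illustration of the pipeline this abstract describes, the Python sketch below estimates an empirical MDP from generative-model samples, adds a small random reward perturbation (in the paper's analysis, a perturbation of this kind ensures the empirical MDP has a unique optimal policy), and then plans by value iteration. The interface (`sample_next_state`, a dense `reward` array) and all constants are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def perturbed_model_based_planning(sample_next_state, reward, n_states, n_actions,
                                   gamma=0.9, samples_per_pair=100,
                                   perturb_scale=1e-6, vi_iters=500, seed=0):
    """Minimal sketch: empirical model + reward perturbation + value iteration."""
    rng = np.random.default_rng(seed)

    # Empirical transition kernel P_hat[s, a, s'], built by calling the
    # generative model `samples_per_pair` times for every (s, a) pair.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(samples_per_pair):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    P_hat /= samples_per_pair

    # Tiny random perturbation of the (s, a) reward table to break ties.
    r_tilde = reward + perturb_scale * rng.uniform(size=(n_states, n_actions))

    # Plain value iteration in the empirical, perturbed MDP.
    Q = np.zeros((n_states, n_actions))
    for _ in range(vi_iters):
        Q = r_tilde + gamma * (P_hat @ Q.max(axis=1))  # (S, A, S) @ (S,) -> (S, A)
    return Q.argmax(axis=1)  # greedy policy w.r.t. the empirical model
```

The abstract's guarantee concerns exactly this kind of output: with `samples_per_pair` scaling like $\frac{1}{(1-\gamma)^3\varepsilon^2}$ per state-action pair (up to log factors), the greedy policy of the empirical model is $\varepsilon$-optimal.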
Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
The curse of dimensionality is a widely known issue in reinforcement learning
(RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with $|\mathcal{S}||\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of $\frac{K}{(1-\gamma)^3\varepsilon^2}$ (resp. $\frac{K}{(1-\gamma)^4\varepsilon^2}$), up to some logarithmic factor. Here $K$ is the feature dimension and $\gamma \in (0,1)$ is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for an arbitrarily large-scale MDP, both the model-based approach and Q-learning are sample-efficient when $K$ is relatively small, hence the title of this paper.
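For intuition on why the feature dimension $K$, rather than $|\mathcal{S}||\mathcal{A}|$, can drive the sample cost, the sketch below runs generic Q-learning with a $K$-dimensional linear parameterization fed by a generative model: every update touches only a length-$K$ parameter vector. The callables `phi`, `sample_next_state`, and `reward` are hypothetical, and this is not the paper's exact algorithm, only an assumption-laden illustration of the setting.

```python
import numpy as np

def linear_q_learning(phi, sample_next_state, reward, states, actions,
                      gamma=0.9, n_iters=50_000, lr=0.05, seed=0):
    """Sketch of Q-learning with Q(s, a) ~ phi(s, a) @ theta, where
    phi(s, a) is a K-dimensional feature vector."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(phi(states[0], actions[0]).shape[0])  # K parameters

    def q(s, a):
        return phi(s, a) @ theta

    for _ in range(n_iters):
        # Generative model: query any (s, a) pair of our choosing.
        s = states[rng.integers(len(states))]
        a = actions[rng.integers(len(actions))]
        s_next = sample_next_state(s, a)

        # Semi-gradient TD update on the K-dimensional parameter vector.
        target = reward(s, a) + gamma * max(q(s_next, b) for b in actions)
        theta += lr * (target - q(s, a)) * phi(s, a)
    return theta
```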
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
This paper provides a unified account of two schools of thinking in
information retrieval modelling: the generative retrieval focusing on
predicting relevant documents given a query, and the discriminative retrieval
focusing on predicting relevancy given a query-document pair. We propose a
game-theoretic minimax framework to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results demonstrate significant performance gains of as much as
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications, including web search, item recommendation, and question answering.
Comment: 12 pages; appendix added
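To make the adversarial loop concrete, here is a heavily simplified, hypothetical sketch of one IRGAN-style update for a single query over a fixed candidate list: the discriminator is nudged to separate labelled-relevant documents from generator-sampled ones, and the generator receives a REINFORCE (policy-gradient) signal from the discriminator's scores. The paper trains parameterised retrieval models; here, purely for illustration, the per-document score vectors themselves stand in as the parameters.

```python
import numpy as np

def irgan_step(gen_scores, disc_scores, labels, lr=0.05, n_mc=16, rng=None):
    """One toy IRGAN-style update for a single query.
    gen_scores, disc_scores: per-candidate relevance scores (the "parameters").
    labels: binary array marking the labelled-relevant documents."""
    rng = rng if rng is not None else np.random.default_rng(0)

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    # Generator's current distribution over candidate documents.
    p_gen = softmax(gen_scores)

    # Discriminator: score labelled-relevant docs up, generator samples down.
    fake = rng.choice(len(gen_scores), p=p_gen)
    disc_scores = disc_scores.copy()
    disc_scores[labels == 1] += lr
    disc_scores[fake] -= lr

    # Generator: REINFORCE, using the discriminator's log-sigmoid score of a
    # sampled document as the reward signal.
    reward = -np.log1p(np.exp(-disc_scores))  # log sigmoid(disc_scores)
    grad = np.zeros_like(gen_scores)
    for _ in range(n_mc):
        d = rng.choice(len(gen_scores), p=p_gen)
        grad += (np.eye(len(gen_scores))[d] - p_gen) * reward[d]  # grad log p_gen(d) * reward
    return gen_scores + lr * grad / n_mc, disc_scores
```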