Only Relevant Information Matters: Filtering Out Noisy Samples to Boost RL
In reinforcement learning, policy gradient algorithms optimize the policy
directly and rely on efficiently sampling an environment. Nevertheless, while
most sampling procedures are based on direct policy sampling, self-performance
measures could be used to improve such sampling prior to each policy update.
Following this line of thought, we introduce SAUNA, a method where
non-informative transitions are rejected from the gradient update. The level of
information is estimated according to the fraction of variance explained by the
value function: a measure of the discrepancy between V and the empirical
returns. In this work, we use this metric to select samples that are useful to
learn from, and we demonstrate that this selection can significantly improve
the performance of policy gradient methods. In this paper: (a) We define
SAUNA's metric and introduce its method to filter transitions. (b) We conduct
experiments on a set of benchmark continuous control problems. SAUNA
significantly improves performance. (c) We investigate how SAUNA reliably
selects samples with the most positive impact on learning and study its
improvement on both performance and sample efficiency.
Comment: Accepted at IJCAI 202
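The "fraction of variance explained" mentioned in the abstract above is commonly computed as 1 - Var(G - V) / Var(G), where G are the empirical returns and V the value predictions. The sketch below shows only this batch-level statistic. The function name `explained_variance` is hypothetical, and the per-transition score and rejection rule used by SAUNA are not given in the abstract, so they are not reproduced here.

```python
from statistics import pvariance


def explained_variance(returns, values):
    """Fraction of the variance of `returns` explained by `values`.

    Standard definition: 1 - Var(returns - values) / Var(returns).
    A result of 1.0 means the residuals have no variance; 0.0 means the
    predictions explain nothing beyond a constant; negative values mean
    they do worse than a constant.
    """
    residuals = [g - v for g, v in zip(returns, values)]
    var_returns = pvariance(returns)
    if var_returns == 0:
        # Undefined when the returns do not vary at all.
        return float("nan")
    return 1.0 - pvariance(residuals) / var_returns
```

For instance, predictions equal to the returns give a score of 1.0, while a constant prediction gives 0.0.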
A generative model for sparse, evolving digraphs
Generating graphs that are similar to real ones is an open problem, while the
similarity notion is quite elusive and hard to formalize. In this paper, we
focus on sparse digraphs and propose SDG, an algorithm that aims at generating
graphs similar to real ones. Since real graphs are evolving and this evolution
is important to study in order to understand the underlying dynamical system,
we tackle the problem of generating series of graphs. We propose SEDGE, an
algorithm meant to generate series of graphs similar to a real series. SEDGE is
an extension of SDG. We consider graphs that are representations of software
programs and show experimentally that our approach outperforms other existing
approaches. Experiments show the performance of both algorithms
Bandits Warm-up Cold Recommender Systems
We address the cold start problem in recommendation systems assuming no
contextual information is available about either users or items. We consider
the case in which we only have access to a set of ratings of items by users.
Most of the existing works consider a batch setting, and use cross-validation
to tune parameters. The classical method consists in minimizing the root mean
square error over a training subset of the ratings which provides a
factorization of the matrix of ratings, interpreted as a latent representation
of items and users. Our contribution in this paper is 5-fold. First, we
make explicit the issues raised by this kind of batch setting for users or items
with very few ratings. Then, we propose an online setting closer to the actual
use of recommender systems; this setting is inspired by the bandit framework.
The proposed methodology can be used to turn any recommender system dataset
(such as Netflix, MovieLens, ...) into a sequential dataset. Then, we make
explicit a strong and insightful link between contextual bandit algorithms and
matrix factorization; this leads us to a new algorithm that tackles the
exploration/exploitation dilemma associated with the cold start problem from a
strikingly new perspective. Finally, experimental evidence confirms that our
algorithm is effective in dealing with the cold start problem on publicly
available datasets. Overall, the goal of this paper is to bridge the gap
between recommender systems based on matrix factorizations and those based on
contextual bandits
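The batch criterion described in the abstract above, the root mean square error of a factorization over a set of observed ratings, can be sketched as follows. This is a minimal illustration, assuming each predicted rating is the dot product of a user vector and an item vector. The function name `rmse` and the dictionary-based data layout are hypothetical and are not taken from the paper.

```python
import math


def rmse(ratings, user_factors, item_factors):
    """Root mean square error of a factorization on observed ratings.

    ratings:      dict mapping (user, item) -> observed rating
    user_factors: dict mapping user -> list of latent coordinates
    item_factors: dict mapping item -> list of latent coordinates
    """
    if not ratings:
        raise ValueError("at least one observed rating is required")
    squared_errors = []
    for (user, item), rating in ratings.items():
        # Predicted rating: dot product of the two latent vectors.
        prediction = sum(
            p * q for p, q in zip(user_factors[user], item_factors[item])
        )
        squared_errors.append((rating - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A user or item with very few ratings contributes few terms to this sum, which is one way to see why its latent vector is poorly constrained in the batch setting.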
Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques
In many recommendation applications such as news recommendation, the items
that can be recommended come and go at a very fast pace. Facing this setting
is a challenge for recommender systems (RS). Online learning algorithms
seem to be the most straightforward solution. The contextual bandit framework
was introduced for that very purpose. In general, the evaluation of an RS is a
critical issue. Live evaluation is often avoided due to the potential loss of
revenue, hence the need for offline evaluation methods. Two options are
available. Model-based methods are biased by nature and are thus difficult to
trust when used alone. Data-driven methods are therefore what we consider here.
Evaluating online learning algorithms with past data is not simple, but some
methods exist in the literature. Nonetheless, their accuracy is not
satisfactory, mainly due to their mechanism of data rejection, which only
allows the exploitation of a small fraction of the data. We address precisely
this issue in this paper. After highlighting the limitations of the previous
methods, we present a new method based on bootstrapping techniques. This new
method comes with two important improvements: it is much more accurate and it
provides a measure of quality of its estimation. The latter is a highly
desirable property in order to minimize the risks entailed by putting an RS
online for the first time. We provide both theoretical and experimental proofs
of its superiority compared to state-of-the-art methods, as well as an analysis
of the convergence of the measure of quality
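The data-rejection mechanism criticized in the abstract above can be illustrated with a rejection-style replay estimator: a logged event is used only when the evaluated policy picks the same action as the one that was logged. The sketch below assumes a fixed (non-learning) policy and logs collected by choosing actions uniformly at random. The function name `replay_evaluate` and the event layout are hypothetical, and the bootstrapping method proposed in the paper is not reproduced here.

```python
def replay_evaluate(policy, logged_events):
    """Rejection-style offline evaluation of a fixed policy.

    policy:        callable mapping a context to an action
    logged_events: list of (context, logged_action, reward) tuples,
                   assumed to be logged under uniformly random actions

    Returns (estimated mean reward, fraction of events actually used).
    """
    if not logged_events:
        raise ValueError("at least one logged event is required")
    total_reward = 0.0
    matches = 0
    for context, logged_action, reward in logged_events:
        # Events where the policy disagrees with the log are rejected.
        if policy(context) == logged_action:
            total_reward += reward
            matches += 1
    estimate = total_reward / matches if matches else float("nan")
    return estimate, matches / len(logged_events)
```

With K equally likely logged actions, roughly 1/K of the events survive the rejection step, which is the small usable fraction the abstract refers to.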