Reducing statistical time-series problems to binary classification
We show how binary classification methods developed to work on i.i.d. data
can be used for solving statistical problems that are seemingly unrelated to
classification and concern highly-dependent time series. Specifically, the
problems of time-series clustering, homogeneity testing and the three-sample
problem are addressed. The algorithms that we construct for solving these
problems are based on a new metric between time-series distributions, which can
be evaluated using binary classification methods. Universal consistency of the
proposed algorithms is proven under the most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. (In proceedings of NIPS 2012, pp. 2069-207)
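A minimal sketch of the core idea, under illustrative assumptions (this is not the paper's metric, only a simple classifier-based discrepancy in the same spirit): label windows of two series by their series of origin and use a binary classifier's cross-validated accuracy as a measure of how different the two processes look. The window width, classifier, and toy series below are all assumptions.

```python
# Illustrative classifier-based discrepancy between two time series:
# label sliding windows by their series of origin and measure how well
# a binary classifier separates them. Accuracy near 0.5 means the two
# window distributions look alike; higher accuracy means they differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def windows(x, width):
    """Overlapping windows of a 1-D series (dependent samples, kept simple)."""
    return np.array([x[i:i + width] for i in range(len(x) - width + 1)])

def classification_discrepancy(x, y, width=20):
    """Map cross-validated accuracy to [0, 1]: 0 = indistinguishable."""
    wx, wy = windows(np.asarray(x), width), windows(np.asarray(y), width)
    data = np.vstack([wx, wy])
    labels = np.concatenate([np.zeros(len(wx)), np.ones(len(wy))])
    acc = cross_val_score(RandomForestClassifier(n_estimators=100),
                          data, labels, cv=5).mean()
    return max(0.0, 2.0 * (acc - 0.5))

rng = np.random.default_rng(0)
a = rng.normal(size=500)                    # white noise
b = 0.1 * np.cumsum(rng.normal(size=500))   # random-walk-like series
print(classification_discrepancy(a, a))     # close to 0
print(classification_discrepancy(a, b))     # typically well above 0
```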
Bandits Warm-up Cold Recommender Systems
We address the cold-start problem in recommender systems, assuming that no contextual information is available about either users or items. We consider the case in which we only have access to a set of ratings of items by users. Most of the existing works consider a batch setting and use cross-validation to tune parameters. The classical method consists in minimizing the root mean square error over a training subset of the ratings, which provides a factorization of the matrix of ratings, interpreted as a latent representation of items and users. Our contribution in this paper is 5-fold. First, we make explicit the issues raised by this kind of batch setting for users or items with very few ratings. Then, we propose an online setting closer to the actual use of recommender systems; this setting is inspired by the bandit framework. The proposed methodology can be used to turn any recommender system dataset (such as Netflix, MovieLens, ...) into a sequential dataset. Then, we make explicit a strong and insightful link between contextual bandit algorithms and matrix factorization; this leads us to a new algorithm that tackles the exploration/exploitation dilemma associated with the cold-start problem from a strikingly new perspective. Finally, experimental evidence confirms that our algorithm effectively deals with the cold-start problem on publicly available datasets. Overall, the goal of this paper is to bridge the gap between recommender systems based on matrix factorization and those based on contextual bandits.
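To illustrate the link mentioned above, here is a minimal, hypothetical sketch (not the paper's algorithm): item latent vectors produced by some prior matrix factorization are treated as contexts for a LinUCB-style bandit that learns a new user's latent taste vector online. The dimensions, names, and synthetic data are assumptions.

```python
# LinUCB-style cold-start sketch: item latent vectors act as contexts;
# the bandit estimates the new user's taste vector while exploring.
import numpy as np

class LinUCBUser:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)        # regularized design matrix
        self.b = np.zeros(dim)      # accumulated reward-weighted contexts
        self.alpha = alpha          # width of the confidence bonus

    def recommend(self, item_vectors):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # current estimate of the user's taste
        bonus = np.sqrt(np.einsum('ij,jk,ik->i',
                                  item_vectors, A_inv, item_vectors))
        return int(np.argmax(item_vectors @ theta + self.alpha * bonus))

    def update(self, item_vector, rating):
        self.A += np.outer(item_vector, item_vector)
        self.b += rating * item_vector

# Hypothetical usage: 50 items with 10-dimensional latent factors.
rng = np.random.default_rng(1)
items = rng.normal(size=(50, 10))
true_user = rng.normal(size=10)     # unknown taste vector to discover
agent = LinUCBUser(dim=10)
for t in range(200):
    i = agent.recommend(items)
    rating = items[i] @ true_user + rng.normal(scale=0.1)
    agent.update(items[i], rating)
```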
Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques
In many recommendation applications such as news recommendation, the items that can be recommended come and go at a very fast pace, which makes this a challenging setting for recommender systems (RS). Online learning algorithms seem to be the most straightforward solution, and the contextual bandit framework was introduced for that very purpose. In general, the evaluation of a RS is a critical issue. Live evaluation is often avoided due to the potential loss of revenue, hence the need for offline evaluation methods. Two options are available. Model-based methods are biased by nature and are thus difficult to trust when used alone. Data-driven methods are therefore what we consider here. Evaluating online learning algorithms with past data is not simple, but some methods exist in the literature. Nonetheless, their accuracy is not satisfactory, mainly due to their data-rejection mechanism, which only allows the exploitation of a small fraction of the data. We address precisely this issue in this paper. After highlighting the limitations of the previous methods, we present a new method based on bootstrapping techniques. This new method comes with two important improvements: it is much more accurate, and it provides a measure of the quality of its estimation. The latter is a highly desirable property for minimizing the risks entailed by putting a RS online for the first time. We provide both theoretical and experimental proofs of its superiority over state-of-the-art methods, as well as an analysis of the convergence of the measure of quality.
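A minimal sketch of the approach, assuming (as is standard for replay-style evaluation) that logged actions were chosen uniformly at random: plain replay keeps only the rounds where the evaluated policy agrees with the logged action, and bootstrapping the log yields both an estimate and a spread, the latter serving as the measure of quality mentioned above. The exact estimator and its analysis are in the paper; this is only an illustration.

```python
# Bootstrapped replay evaluation sketch (assumes uniform logging).
import numpy as np

def replay_estimate(policy, log):
    """Average reward over logged rounds where the policy agrees."""
    rewards = [r for (context, action, r) in log
               if policy(context) == action]
    return np.mean(rewards) if rewards else 0.0

def bootstrap_replay(policy, log, n_boot=200, seed=0):
    """Replay on bootstrap resamples: returns (estimate, spread)."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [log[i] for i in rng.integers(0, len(log), len(log))]
        estimates.append(replay_estimate(policy, sample))
    estimates = np.array(estimates)
    return estimates.mean(), estimates.std()

# Hypothetical log of (context, logged_action, observed_reward) triples.
rng = np.random.default_rng(0)
log = [(rng.normal(size=3), int(rng.integers(0, 5)), float(rng.random()))
       for _ in range(1000)]
policy = lambda ctx: int(np.argmax(ctx))
mean, spread = bootstrap_replay(policy, log)
```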
Learning for stochastic dynamic programming
We present experimental results about learning function values (i.e. Bellman values) in stochastic dynamic programming (SDP). All results come from openDP (opendp.sourceforge.net), freely available source code, and can therefore be reproduced. The goal is an independent comparison of learning methods in the framework of SDP.
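As a hypothetical illustration of the setting being compared, the sketch below runs backward induction where, at each time step, the Bellman value function is learned by a regressor from sampled states. The dynamics, cost, and regressor are placeholder assumptions, not openDP's implementation.

```python
# Backward induction with learned Bellman values (placeholder problem).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

T, n_states, n_noise = 4, 100, 20
rng = np.random.default_rng(2)
actions = np.linspace(-1.0, 1.0, 11)

def step(s, a, w):           # placeholder transition
    return s + 0.1 * a + 0.05 * w

def cost(s, a):              # placeholder instantaneous cost
    return s ** 2 + 0.1 * a ** 2

value_models = [None] * (T + 1)        # learned Bellman values per stage
for t in reversed(range(T)):
    states = rng.uniform(-2, 2, n_states)
    targets = []
    for s in states:
        q = []
        for a in actions:               # Monte Carlo expectation over noise
            nxt = step(s, a, rng.normal(size=n_noise))
            future = (value_models[t + 1].predict(nxt.reshape(-1, 1))
                      if value_models[t + 1] is not None
                      else np.zeros(n_noise))
            q.append(np.mean(cost(s, a) + future))
        targets.append(min(q))          # Bellman backup over actions
    value_models[t] = RandomForestRegressor(n_estimators=30).fit(
        states.reshape(-1, 1), targets)
```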
Taylor-based pseudo-metrics for random process fitting in dynamic programming
Stochastic optimization is the research of $x$ optimizing $\mathbb{E}\,C(x,A)$, the expectation of $C(x,A)$, where $A$ is a random variable. Typically, $C(x,a)$ is the cost related to a strategy $x$ which faces the realization $a$ of the random process. Many stochastic optimization problems deal with multiple time steps, leading to computationally difficult problems; efficient solutions exist, for example through Bellman's optimality principle, but only provided that the random process is represented by a well-structured process, typically an inhomogeneous Markovian process (hopefully with a finite number of states) or a scenario tree. The problem is that in the general case, $A$ is far from being Markovian. So, we look for $A'$, "looking like $A$", but belonging to a given family $\mathbb{A}'$ which does not at all contain $A$. The problem is then the numerical evaluation of how much "$A'$ looks like $A$". A classical method is the use of the Kantorovitch-Rubinstein distance or other transportation metrics \cite{Pflug}, justified by straightforward bounds on the deviation of expected costs through the use of the Kantorovitch-Rubinstein distance and uniform Lipschitz conditions. These approaches might be better than the use of high-level statistics \cite{Keefer}. We propose other (pseudo-)distances, based upon refined inequalities, guaranteeing a good choice of $A'$. Moreover, as in many cases we prefer optimization with risk management, e.g. optimization in the presence of a random noise modeling the lack of knowledge of the precise random variables, we propose distances which can deal with a user-defined noise. Tests on artificial data sets with realistic loss functions show the relevance of the method.
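The "straightforward bounds" alluded to above are the classical Kantorovitch-Rubinstein argument; in the reconstructed notation of this abstract (the symbols $C$, $A$, $A'$, $L$ are my reading of the garbled original), if $C(x,\cdot)$ is $L$-Lipschitz uniformly in $x$, then:

```latex
% Standard transportation bound (classical argument, not the paper's
% refined inequalities; notation follows the reconstruction above):
\[
  \sup_{x}\,\bigl|\,\mathbb{E}\,C(x,A) - \mathbb{E}\,C(x,A')\,\bigr|
  \;\le\; L \cdot W_1(A, A'),
  \qquad
  W_1(A, A') \;=\; \inf_{\gamma \in \Pi(A, A')}
    \mathbb{E}_{(a, a') \sim \gamma}\, d(a, a'),
\]
```

where $W_1$ is the Kantorovitch-Rubinstein (Wasserstein-1) distance and $\Pi(A, A')$ the set of couplings of the two processes; the paper's refined inequalities aim to improve on this kind of bound.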
Adaptive play in Texas Hold'em poker
We present a Texas Hold'em poker player for limit heads-up games. Our bot is designed to adapt automatically to the strategy of the opponent and is not based on Nash equilibrium computation. The main idea is to design a bot that builds beliefs about its opponent's hand. A forest of game trees is generated according to those beliefs, and the solutions of the trees are combined to make the best decision. The beliefs are updated during the game according to several methods, each of which corresponds to a basic strategy. We then use an exploration-exploitation bandit algorithm, namely UCB (Upper Confidence Bound), to select a strategy to follow. This results in a global play that takes the opponent's strategy into account and turns out to be rather unpredictable. Indeed, if a given strategy is exploited by an opponent, the UCB algorithm will detect it using change point detection and will choose another one. The resulting program, called Brennus, participated in the AAAI'07 Computer Poker Competition in both the online and equilibrium competitions and ranked eighth out of seventeen competitors.
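A minimal sketch of the strategy-selection layer described above, assuming hypothetical basic strategies and payoffs: plain UCB1 picks among the strategies by observed payoff plus a confidence bonus. (The paper additionally couples UCB with change point detection, which is omitted here.)

```python
# UCB1 over a small set of basic strategies (placeholder payoffs).
import math, random

class UCB1:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        for i, c in enumerate(self.counts):
            if c == 0:              # play each strategy once first
                return i
        total = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda i: self.sums[i] / self.counts[i]
                   + math.sqrt(2 * math.log(total) / self.counts[i]))

    def update(self, arm, payoff):
        self.counts[arm] += 1
        self.sums[arm] += payoff

strategies = ["tight", "loose", "aggressive"]   # hypothetical labels
bandit = UCB1(len(strategies))
for hand in range(1000):
    arm = bandit.select()
    payoff = random.random()        # stand-in for the hand's outcome
    bandit.update(arm, payoff)
```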
Active learning in regression, with an application to stochastic dynamic programming
We study active learning as a derandomized form of sampling. We show that full derandomization is not suitable in a robust framework, propose partially derandomized samplings, and develop new active learning methods (i) in which expert knowledge is easy to integrate, (ii) with a parameter for the exploration/exploitation dilemma, (iii) less randomized than full-random sampling (yet also not deterministic). Experiments are performed in the case of regression for value-function learning on a continuous domain. Our main results are (i) efficient partially derandomized point sets, (ii) moderate-derandomization theorems, (iii) experimental evidence of the importance of the frontier, and (iv) a new regression-specific user-friendly sampling tool, less robust than blind samplers, but that sometimes works very efficiently in large dimensions. All experiments can be reproduced by downloading the source code and running the provided command line.
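A minimal sketch of one partially derandomized sampler in this spirit, under illustrative assumptions (the paper studies its own point sets and theorems): stratified (jittered) sampling places one uniform point per grid cell, which is less randomized than blind i.i.d. sampling yet not deterministic.

```python
# Jittered (stratified) sampling: one uniform point per grid cell.
import numpy as np

def jittered_sample(n_per_axis, dim, rng):
    """One uniform point in each cell of a regular grid over [0, 1]^dim."""
    grids = np.meshgrid(*[np.arange(n_per_axis)] * dim, indexing='ij')
    corners = np.stack([g.ravel() for g in grids], axis=1) / n_per_axis
    jitter = rng.uniform(0, 1.0 / n_per_axis, size=corners.shape)
    return corners + jitter

rng = np.random.default_rng(3)
pts = jittered_sample(8, 2, rng)    # 64 points in the unit square
print(pts.shape)
```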