31 research outputs found

PAC-Bayes analysis beyond the usual bounds

Author: Kuzborskij I
Rivasplata O
Shawe-Taylor J
Szepesvári C
Publication venue: Neural Information Processing Systems (NeurIPS)
Publication date: 06/12/2020

We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed ‘data-free’ priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.

UCL Discovery

On the Role of Optimization in Double Descent: A Least Squares Study

Author: Kuzborskij I
Pascanu R
Rivasplata O
Szepesvári C
Triki AR
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2021

Empirically, it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomenon has been proposed to reconcile this observation with theory, suggesting that the test error has a second descent when the model becomes sufficiently overparameterized, as the model size itself acts as an implicit regularizer. In this paper we add to the growing body of work in this space, providing a careful study of learning dynamics as a function of model size for the least squares scenario. We show an excess risk bound for the gradient descent solution of the least squares objective. The bound depends on the smallest non-zero eigenvalue of the covariance matrix of the input features, via a functional form that has the double descent behavior. This gives a new perspective on the double descent curves reported in the literature. Our analysis of the excess risk allows us to decouple the effects of optimization and generalization error. In particular, we find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities, which was missed in studies focusing on the Moore-Penrose pseudoinverse solution. We believe that our derivation provides an alternative view compared to existing work, shedding some light on a possible cause of this phenomenon, at least in the considered least squares setting. We empirically explore whether our predictions hold for neural networks, in particular whether the covariance of intermediary hidden activations behaves similarly to the one predicted by our derivations.

arXiv.org e-Print Archive

UCL Discovery

Tighter risk certificates for neural networks

Author: Pérez-Ortiz M
Rivasplata O
Shawe-Taylor J
Szepesvári C
Publication date: 01/01/2021

This paper presents an empirical study regarding training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learned predictors, based on part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only to guide the learning algorithm through bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its risk on any unseen data (from the same distribution as the training data), potentially without the need for holding out test data.

UCL Discovery
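
To make the notion of a risk certificate in the abstract above more concrete, the following is a minimal sketch that evaluates the classical PAC-Bayes-kl inequality, kl(empirical risk || expected risk) <= (KL(Q||P) + ln(2*sqrt(n)/delta)) / n, by numerically inverting the binary KL divergence. It is not taken from the paper: the function names, the bisection scheme, and the example numbers are assumptions chosen for illustration, and the paper's own certificates may involve further steps not shown here.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """KL divergence between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_inverse_upper(q: float, budget: float, iters: int = 100) -> float:
    """Largest p in [q, 1] with binary_kl(q, p) <= budget, found by bisection."""
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binary_kl(q, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def risk_certificate(emp_risk: float, kl_qp: float, n: int, delta: float) -> float:
    """Upper bound on the expected risk from the classical PAC-Bayes-kl inequality.

    emp_risk: empirical risk of the randomized predictor (hypothetical input)
    kl_qp:    KL divergence between posterior Q and prior P (hypothetical input)
    """
    budget = (kl_qp + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, budget)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(risk_certificate(emp_risk=0.02, kl_qp=50.0, n=30000, delta=0.05))
```

Bisection works here because the binary KL divergence is increasing in its second argument on [q, 1], so the largest admissible risk value is well defined.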

Bounds and dynamics for empirical game theoretic analysis

Author: Everett R
Graepel T
Hughes E
Lanctot M
Leibo JZ
Pérolat J
Szepesvári C
Tuyls K
Publication date: 04/12/2019

This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for empirical game theoretical analysis of complex multi-agent interactions. In doing so we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state of the art has only considered evolutionary dynamics of symmetric HPTs in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture the flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.

UCL Discovery

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

Author: de Rijke M.
Globerson A.
Kveton B.
Lattimore T.
Li C.
Markov I.
Silva R.
Szepesvári C.
Zoghi M.
Publication date: 01/01/2019

In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists. Learning to rank has traditionally been studied in two settings. In the offline setting, rankers are typically learned from relevance labels created by judges. This approach has generally become standard in industrial applications of ranking, such as search. However, this approach lacks exploration and thus is limited by the information content of the offline training data. In the online setting, an algorithm can experiment with lists and learn from feedback on them in a sequential fashion. Bandit algorithms are well-suited for this setting, but they tend to learn user preferences from scratch, which results in a high initial cost of exploration. This poses an additional challenge of safe exploration in ranked lists. We propose BubbleRank, a bandit algorithm for safe re-ranking that combines the strengths of both the offline and online settings. The algorithm starts with an initial base list and improves it online by gradually exchanging higher-ranked, less attractive items for lower-ranked, more attractive items. We prove an upper bound on the n-step regret of BubbleRank that degrades gracefully with the quality of the initial base list. Our theoretical findings are supported by extensive experiments on a large-scale real-world click dataset.

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE
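
The idea described in the BubbleRank abstract above, of starting from a base list and gradually exchanging neighbouring items as click evidence accumulates, can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm: the class name `AdjacentSwapReRanker`, the click bookkeeping, and the confidence radius are all assumptions made for illustration, and the sketch leaves out any randomized exploration that a bandit algorithm would need.

```python
import math
from collections import defaultdict


class AdjacentSwapReRanker:
    """Simplified illustration: swap neighbours once click evidence is strong."""

    def __init__(self, base_list, delta=0.05):
        self.ranking = list(base_list)
        self.delta = delta             # hypothetical confidence parameter
        self.lead = defaultdict(int)   # lead[(a, b)]: clicks on a minus clicks on b
        self.count = defaultdict(int)  # count[(a, b)]: informative rounds for the pair

    def update(self, clicked):
        """Record one round of clicks (a set of item ids) and apply safe swaps."""
        # Gather evidence only from adjacent pairs where exactly one item was clicked.
        for upper, lower in zip(self.ranking, self.ranking[1:]):
            if (upper in clicked) != (lower in clicked):
                sign = 1 if lower in clicked else -1
                self.lead[(lower, upper)] += sign
                self.lead[(upper, lower)] -= sign
                self.count[(lower, upper)] += 1
                self.count[(upper, lower)] += 1
        # Exchange neighbours only when the lower item leads by a confident margin.
        k = 0
        while k < len(self.ranking) - 1:
            upper, lower = self.ranking[k], self.ranking[k + 1]
            n = self.count[(lower, upper)]
            radius = 2.0 * math.sqrt(n * math.log(1.0 / self.delta))  # assumed form
            if n > 0 and self.lead[(lower, upper)] > radius:
                self.ranking[k], self.ranking[k + 1] = lower, upper
                k += 2  # skip ahead so an item moves at most one position per round
            else:
                k += 1
        return self.ranking


if __name__ == "__main__":
    ranker = AdjacentSwapReRanker(["a", "b", "c"])
    for _ in range(30):
        ranker.update({"b"})  # users keep clicking the second item
    print(ranker.ranking)
```

Because an item moves at most one position per round and only after sustained evidence, the displayed list stays close to the base list early on, which is the safety property the abstract emphasises.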

Scenario trees and policy selection for multistage stochastic programming using machine learning

Author: Bertsekas DP
Birge JR
Boris Defourny
Busoniu L
Damien Ernst
Defourny B
Dempster MAH
Dupacova J
Frauendorfer K
Grant M
Hastie T
Heitsch H
Hilli P
Kallrath J
Kouwenberg R
Küchler C
Louis Wehenkel
Mak W-K
Mulvey JM
Nesterov Y
O'Hagan A
Pages G
Pennanen T
Peters J
Powell WB
Rasmussen CE
Shapiro A
Sutton RS
Szepesvári C
Thénié J
Wallace SW
Publication venue: Institute for Operations Research and the Management Sciences (INFORMS)
Publication date: 19/12/2011

We propose a hybrid algorithmic strategy for complex stochastic optimization problems, which combines the use of scenario trees from multistage stochastic programming with machine learning techniques for learning a policy in the form of a statistical model, in the context of constrained vector-valued decisions. Such a policy allows one to run out-of-sample simulations over a large number of independent scenarios, and obtain a signal on the quality of the approximation scheme used to solve the multistage stochastic program. We propose to apply this fast simulation technique to choose the best tree from a set of scenario trees. A solution scheme is introduced, where several scenario trees with random branching structure are solved in parallel, and where the tree from which the best policy for the true problem could be learned is ultimately retained. Numerical tests show that excellent trade-offs can be achieved between run times and solution quality.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Repository and Bibliography - Liège

Crowd computing as a cooperation problem: an evolutionary approach

Author: A. Fernández Anta
A. Mas-Colell
A. Szolnoki
A. Traulsen
Angel Sánchez
Antonio Fernández Anta
C. Castellano
C. Darwin
C. Gracia-Lázaro
C. Szepesvári
C.F. Camerer
C.P. Roca
Chryssis Georgiou
D. Anderson
D. Rose
D. Semmann
D. Stauffer
E. Christoforou
E. Fehr
E. Goffman
E. Korpela
E.M. Heien
Evgenia Christoforou
F. Vega-Redondo
F.G. Cross
G. Szabó
G.J. Stigler
H. Gintis
I. Abraham
I. Erev
J. Duffy
J. Grujić
J. Gómez-Gardeñes
J. Hofbauer
J. Maynard-Smith
J. Nash
J. von Neumann
J. Peña
J. Shneidman
L. Sarmenta
L.R. Izquierdo
M. Babaioff
M. Perc
M.A. Nowak
M.W. Macy
Miguel A. Mosteiro
N. Goldenfeld
P. Ball
P. Golle
P. Kollock
P. Taylor
P.W. Anderson
R. Boyd
R. Eidenbenz
R. Rees
R.N. Mantegna
R.R. Bush
S.S. Izquierdo
T.C. Schelling
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Cooperation is one of the socio-economic issues that has received more attention from the physics community. The problem has been mostly considered by studying games such as the Prisoner's Dilemma or the Public Goods Game. Here, we take a step forward by studying cooperation in the context of crowd computing. We introduce a model loosely based on principal-agent theory in which people (workers) contribute to the solution of a distributed problem by computing answers and reporting to the problem proposer (master). To go beyond classical approaches involving the concept of Nash equilibrium, we work on an evolutionary framework in which both the master and the workers update their behavior through reinforcement learning. Using a Markov chain approach, we show theoretically that under certain, not very restrictive, conditions, the master can ensure the reliability of the answer resulting from the process. Then, we study the model by numerical simulations, finding that convergence, meaning that the system reaches a point in which it always produces reliable answers, may in general be much faster than the upper bounds given by the theoretical calculation. We also discuss the effects of the master's level of tolerance to defectors, about which the theory does not provide information. The discussion shows that the system works even with very large tolerances. We conclude with a discussion of our results and possible directions to carry this research further. This work is supported by the Cyprus Research Promotion Foundation grant TE/HPO/0609(BE)/05, the National Science Foundation (CCF-0937829, CCF-1114930), Comunidad de Madrid grant S2009TIC-1692 and MODELICO-CM, Spanish MOSAICO, PRODIEVO and RESINEE grants and MICINN grant TEC2011-29688-C02-01, and National Natural Science Foundation of China grant 61020106002.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Kean Digital Learning Commons

Universidad Carlos III de Madrid e-Archivo

Universal parameter optimisation in games based on SPSA

Author: A. D. Anastasiadis
B. T. Polyak
C. Igel
Csaba Szepesvári
D. Billings
G. Tesauro
H. Chen
H. J. Kushner
H. Robbins
J. Baxter
J. C. Spall
J. Dippon
J. Kiefer
J. R. Blum
K. Chellapilla
Levente Kocsis
N. L. Kleinman
P. Glasserman
P. L’Ecuyer
R. J. Williams
R. S. Sutton
R. Y. Rubinstein
Y. Björnsson
Y. He
Publication venue: Springer Science and Business Media LLC

Crossref

A simpler approach to accelerated optimization: iterative averaging meets optimism

Author: György A.
Joulani P.
Raj A.
Szepesvári C.
Publication date: 01/03/2021

MPG.PuRe

Spatial Spreading Of Action Values In Q-Learning

Author: Carlos H. C. Ribeiro
Csaba Szepesvári

One of the problems associated with Q-learning is its inefficient use of training information: experience provides information that affects the learning process only locally both in space and time. We investigate here the use of spreading of action value updates towards regions of the state space different from the one visited by the agent at a certain instant of time. In particular, we consider the case when the strength of spreading decays over time. It is shown that this mechanism still provides convergence to the optimal action values and can generate superior policies provided that the similarity function underlying the spreading mechanism fits the world, at least weakly. In such cases the initial performance of the algorithm can be poor and the performance starts to improve when spreading is almost over. We can interpret the phenomenon as the realisation of a 'search-then-converge' method.

CiteSeerX
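
The spreading mechanism described in the Q-learning abstract above can be sketched in a few lines of tabular code. This is only an illustration under stated assumptions: the function name `spreading_q_update`, the geometric decay of the spreading strength, and the similarity function used in the demo are hypothetical choices, and the exact update rule in the paper may differ.

```python
def spreading_q_update(q, state, action, reward, next_state, step,
                       similarity, alpha=0.5, gamma=0.9, decay=0.5):
    """One Q-learning step whose update is spread to similar states.

    q is a dict mapping each state to a dict of action values.
    similarity(a, b) returns a weight in [0, 1] (assumed, user-supplied).
    """
    target = reward + gamma * max(q[next_state].values())
    strength = decay ** step  # spreading strength shrinks as learning proceeds
    for other in q:
        # The visited state gets the full update; others get a similarity-weighted share.
        weight = 1.0 if other == state else strength * similarity(other, state)
        q[other][action] += alpha * weight * (target - q[other][action])
    return q


if __name__ == "__main__":
    # Three states on a line; nearby states are assumed to be more similar.
    def line_similarity(a, b):
        return 0.5 ** abs(a - b)

    table = {s: {"left": 0.0, "right": 0.0} for s in range(3)}
    spreading_q_update(table, state=0, action="right", reward=1.0,
                       next_state=1, step=0, similarity=line_similarity)
    print(table)
```

As `step` grows, the spreading weight vanishes and the rule reduces to ordinary Q-learning, which matches the abstract's description of spreading whose strength decays over time.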