PAC-Bayes analysis beyond the usual bounds
We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed ‘data-free’ priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.
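For orientation, one canonical member of this family of bounds (the standard PAC-Bayes-kl inequality, not the paper's generalised stochastic-kernel result) reads as follows: for a data-free prior P, a loss bounded in [0,1], and an i.i.d. sample of size n, with probability at least 1 − δ, simultaneously for all posteriors Q,

```latex
\mathrm{kl}\!\left( \hat{L}(Q) \,\middle\|\, L(Q) \right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\frac{2\sqrt{n}}{\delta}}{n},
```

where \hat{L}(Q) and L(Q) denote the empirical and expected risks of the randomized predictor and kl is the binary relative entropy. The three requirements named above (data-free prior, bounded loss, i.i.d. data) are exactly the ones that enter through the exponential moment term.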
On the Role of Optimization in Double Descent: A Least Squares Study
Empirically it has been observed that the performance of deep neural networks
steadily improves as we increase model size, contradicting the classical view
on overfitting and generalization. Recently, the double descent phenomenon has
been proposed to reconcile this observation with theory, suggesting that the
test error has a second descent when the model becomes sufficiently
overparameterized, as the model size itself acts as an implicit regularizer. In
this paper we add to the growing body of work in this space, providing a
careful study of learning dynamics as a function of model size for the least
squares scenario. We show an excess risk bound for the gradient descent
solution of the least squares objective. The bound depends on the smallest
non-zero eigenvalue of the covariance matrix of the input features, via a
functional form that has the double descent behavior. This gives a new
perspective on the double descent curves reported in the literature. Our
analysis of the excess risk allows us to decouple the effects of optimization and
generalization error. In particular, we find that in the case of noiseless
regression, double descent is explained solely by optimization-related
quantities, which was missed in studies focusing on the Moore-Penrose
pseudoinverse solution. We believe that our derivation provides an alternative
view compared to existing work, shedding some light on a possible cause of this
phenomenon, at least in the considered least squares setting. We empirically
explore if our predictions hold for neural networks, in particular whether the
covariance of intermediary hidden activations has a similar behavior as the one
predicted by our derivations.
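The shape of the curve can be illustrated with a toy random-features regression (a sketch, not the paper's derivation): the test error of the minimum-norm least squares solution, which gradient descent initialised at zero converges to, peaks at the interpolation threshold p = n, where the smallest non-zero eigenvalue of the feature covariance is closest to zero. All sizes and the feature construction below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_norm_lsq_test_error(n_train=40, n_test=500, d=80,
                            p_values=(10, 20, 40, 80, 160), trials=30):
    """Average test MSE of the minimum-norm least squares solution (the limit
    of gradient descent from zero init) for varying numbers of random features p."""
    errs = {}
    for p in p_values:
        total = 0.0
        for _ in range(trials):
            W = rng.standard_normal((d, p)) / np.sqrt(d)   # random feature map
            beta = rng.standard_normal(d) / np.sqrt(d)     # noiseless linear target
            Xtr = rng.standard_normal((n_train, d))
            Xte = rng.standard_normal((n_test, d))
            ytr, yte = Xtr @ beta, Xte @ beta
            theta = np.linalg.pinv(Xtr @ W) @ ytr          # min-norm solution
            total += np.mean((Xte @ W @ theta - yte) ** 2)
        errs[p] = total / trials
    return errs

# Error rises sharply near p = n_train and descends again past it.
errs = min_norm_lsq_test_error()
```

Note that the data here are noiseless, matching the abstract's point that the peak is driven by optimization-related quantities (conditioning of the feature covariance) rather than label noise.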
Tighter risk certificates for neural networks
This paper presents an empirical study of training probabilistic neural networks using training objectives derived from PAC-Bayes bounds. In the context of probabilistic neural networks, the output of training is a probability distribution over network weights. We present two training objectives, used here for the first time in connection with training neural networks. These two training objectives are derived from tight PAC-Bayes bounds. We also re-implement a previously used training objective based on a classical PAC-Bayes bound, to compare the properties of the predictors learned using the different training objectives. We compute risk certificates for the learnt predictors, based on part of the data used to learn the predictors. We further experiment with different types of priors on the weights (both data-free and data-dependent priors) and neural network architectures. Our experiments on MNIST and CIFAR-10 show that our training methods produce competitive test set errors and non-vacuous risk bounds with much tighter values than previous results in the literature, showing promise not only for guiding the learning algorithm by bounding the risk but also for model selection. These observations suggest that the methods studied here might be good candidates for self-certified learning, in the sense of using the whole data set for learning a predictor and certifying its risk on any unseen data (from the same distribution as the training data), potentially without the need for holding out test data.
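A concrete way to turn a bound into a numeric risk certificate (a sketch of the standard PAC-Bayes-kl recipe with illustrative numbers, not the paper's exact training objectives) is to invert the binary kl divergence by bisection:

```python
import math

def binary_kl(q, p):
    """kl(q || p) between Bernoulli distributions with parameters q and p."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse(emp_risk, rhs, tol=1e-9):
    """Largest p with kl(emp_risk || p) <= rhs, found by bisection on [emp_risk, 1]."""
    lo, hi = emp_risk, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_certificate(emp_risk, kl_div, n, delta=0.05):
    """Risk certificate from the PAC-Bayes-kl bound:
    kl(emp_risk || true_risk) <= (KL(Q||P) + ln(2*sqrt(n)/delta)) / n."""
    rhs = (kl_div + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse(emp_risk, rhs)

# Hypothetical values: 5% empirical 0-1 error on n = 10000 held-out-for-
# certification examples, posterior-prior divergence KL(Q||P) = 10 nats.
cert = pac_bayes_kl_certificate(emp_risk=0.05, kl_div=10.0, n=10_000)
```

A certificate of this kind is non-vacuous whenever it is well below 1, and a tighter KL term (e.g. via data-dependent priors, as in the paper) directly tightens the certified risk.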
Bounds and dynamics for empirical game theoretic analysis
This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for empirical game-theoretic analysis of complex multi-agent interactions. In doing so we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state of the art has only considered evolutionary dynamics of symmetric HPTs, in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture-the-flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.
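The approximation statement can be made concrete in a small simulation (an illustrative sketch, not the paper's construction): when every entry of the estimated payoff matrix lies within ε of the truth, an exact Nash equilibrium of one game can be exploited by at most 2ε in the other. Rock-paper-scissors, whose exact Nash equilibrium is the uniform strategy, serves as the toy meta-game here.

```python
import numpy as np

rng = np.random.default_rng(1)

def exploitability(payoff, strategy):
    """Best-response gain over `strategy` in a symmetric two-player game,
    given the row player's payoff matrix."""
    expected = float(strategy @ payoff @ strategy)
    return float(np.max(payoff @ strategy)) - expected

# Rock-paper-scissors as the "true" meta-game; uniform play is its exact Nash.
true_game = np.array([[0.0, 1.0, -1.0],
                      [-1.0, 0.0, 1.0],
                      [1.0, -1.0, 0.0]])
uniform = np.ones(3) / 3

# Empirical meta-game: every payoff estimated to within eps of the truth,
# as guaranteed by sampling enough episodes per strategy profile.
eps = 0.05
est_game = true_game + rng.uniform(-eps, eps, true_game.shape)

# An exact Nash of the true game is a 2*eps-approximate Nash of the
# estimated game (and vice versa), since payoffs differ by at most eps.
gap = exploitability(est_game, uniform)
```

The number of samples needed to achieve a given ε per entry follows from standard concentration arguments (e.g. Hoeffding), which is the flavour of the sample-complexity bounds the abstract refers to.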
BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback
In this paper, we study the problem of safe online learning to re-rank, where
user feedback is used to improve the quality of displayed lists. Learning to
rank has traditionally been studied in two settings. In the offline setting,
rankers are typically learned from relevance labels created by judges. This
approach has generally become standard in industrial applications of ranking,
such as search. However, this approach lacks exploration and thus is limited by
the information content of the offline training data. In the online setting, an
algorithm can experiment with lists and learn from feedback on them in a
sequential fashion. Bandit algorithms are well-suited for this setting but they
tend to learn user preferences from scratch, which results in a high initial
cost of exploration. This poses an additional challenge of safe exploration in
ranked lists. We propose BubbleRank, a bandit algorithm for safe re-ranking
that combines the strengths of both the offline and online settings. The
algorithm starts with an initial base list and improves it online by gradually
exchanging higher-ranked less attractive items for lower-ranked more attractive
items. We prove an upper bound on the n-step regret of BubbleRank that degrades
gracefully with the quality of the initial base list. Our theoretical findings
are supported by extensive experiments on a large-scale real-world click
dataset.
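The "gradual exchange" idea can be sketched as a bubble sort over noisy click feedback (a toy assuming position-independent click probabilities and a fixed confidence threshold; the actual BubbleRank algorithm randomizes displayed lists and uses properly calibrated confidence intervals):

```python
import random

def bubble_rerank(base_list, click_prob, n_rounds=2000, threshold=5, seed=0):
    """Bubble-sort-style safe re-ranking: repeatedly examine a random pair of
    neighbours and swap them once accumulated click feedback makes it clear
    that the lower-ranked item is more attractive."""
    rng = random.Random(seed)
    ranking = list(base_list)
    net_clicks = {}  # (upper_item, lower_item) -> clicks(lower) - clicks(upper)
    for _ in range(n_rounds):
        i = rng.randrange(len(ranking) - 1)
        upper, lower = ranking[i], ranking[i + 1]
        click_upper = rng.random() < click_prob[upper]
        click_lower = rng.random() < click_prob[lower]
        key = (upper, lower)
        net_clicks[key] = net_clicks.get(key, 0) + int(click_lower) - int(click_upper)
        if net_clicks[key] > threshold:  # confident: promote the lower item
            ranking[i], ranking[i + 1] = lower, upper
            del net_clicks[key]
    return ranking

# Hypothetical attraction probabilities; the base list starts fully reversed.
click_prob = {"a": 0.9, "b": 0.6, "c": 0.3, "d": 0.05}
reranked = bubble_rerank(["d", "c", "b", "a"], click_prob, n_rounds=5000)
```

Because only adjacent items are ever exchanged, a good base list is disturbed very little, which is the "safety" property the regret bound formalizes.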
Scenario trees and policy selection for multistage stochastic programming using machine learning
We propose a hybrid algorithmic strategy for complex stochastic optimization
problems, which combines the use of scenario trees from multistage stochastic
programming with machine learning techniques for learning a policy in the form
of a statistical model, in the context of constrained vector-valued decisions.
Such a policy allows one to run out-of-sample simulations over a large number
of independent scenarios, and obtain a signal on the quality of the
approximation scheme used to solve the multistage stochastic program. We
propose to apply this fast simulation technique to choose the best tree from a
set of scenario trees. A solution scheme is introduced, where several scenario
trees with random branching structure are solved in parallel, and where the
tree from which the best policy for the true problem could be learned is
ultimately retained. Numerical tests show that excellent trade-offs can be
achieved between run times and solution quality.
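The selection loop can be sketched on a toy newsvendor-style instance (hypothetical (state, decision) pairs stand in for the solutions of two scenario trees, and the policy class is simply linear; none of these specifics come from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_linear_policy(states, decisions):
    """Least-squares fit of decision ~ slope * state + intercept
    from (state, decision) pairs read off a solved scenario tree."""
    X = np.column_stack([states, np.ones_like(states)])
    coef, *_ = np.linalg.lstsq(X, decisions, rcond=None)
    return coef

def simulate_cost(coef, demands, overage=1.0, underage=3.0):
    """Newsvendor-style cost of a linear ordering policy on fresh scenarios."""
    orders = coef[0] * demands + coef[1]
    return float(np.mean(overage * np.maximum(orders - demands, 0.0)
                         + underage * np.maximum(demands - orders, 0.0)))

# Hypothetical (state, decision) data standing in for two solved trees
# with different branching structures.
trees = {
    "coarse_tree": (np.array([1.0, 2.0, 3.0]),
                    np.array([1.0, 1.5, 2.0])),
    "fine_tree": (np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0]),
                  np.array([0.7, 1.2, 1.7, 2.2, 2.7, 3.2])),
}
demands = rng.uniform(0.5, 3.0, 10_000)  # out-of-sample simulation scenarios
costs = {name: simulate_cost(fit_linear_policy(s, d), demands)
         for name, (s, d) in trees.items()}
best_tree = min(costs, key=costs.get)    # retain the tree whose policy wins
```

The key point mirrored here is that the learned policy is cheap to evaluate, so a large number of independent out-of-sample scenarios can be simulated to rank the candidate trees.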
Crowd computing as a cooperation problem: an evolutionary approach
Cooperation is one of the socio-economic issues that has received the most attention from the physics community. The problem has mostly been studied through games such as the Prisoner's Dilemma or the Public Goods Game. Here, we take a step forward by studying cooperation in the context of crowd computing. We introduce a model loosely based on principal-agent theory in which people (workers) contribute to the solution of a distributed problem by computing answers and reporting to the problem proposer (master). To go beyond classical approaches involving the concept of Nash equilibrium, we work in an evolutionary framework in which both the master and the workers update their behavior through reinforcement learning. Using a Markov chain approach, we show theoretically that under certain (not very restrictive) conditions, the master can ensure the reliability of the answer resulting from the process. We then study the model by numerical simulations, finding that convergence, meaning that the system reaches a point at which it always produces reliable answers, may in general be much faster than the upper bounds given by the theoretical calculation. We also discuss the effects of the master's level of tolerance to defectors, about which the theory does not provide information. The discussion shows that the system works even with very large tolerances. We conclude with a discussion of our results and possible directions to carry this research further.
This work is supported by the Cyprus Research Promotion Foundation grant TE/HPO/0609(BE)/05, the National Science Foundation (CCF-0937829, CCF-1114930), Comunidad de Madrid grant S2009TIC-1692 and MODELICO-CM, Spanish MOSAICO, PRODIEVO and RESINEE grants and MICINN grant TEC2011-29688-C02-01, and National Natural Science Foundation of China grant 61020106002.
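A minimal flavour of such dynamics can be simulated (a deliberately crude toy with assumed payoffs and a sign-based reinforcement rule, not the paper's Markov chain model): each worker tracks a probability of answering honestly, the master audits a fraction of rounds, and being caught cheating pushes the worker back toward honesty.

```python
import random

def simulate(n_workers=5, rounds=3000, audit_prob=0.3, lr=0.05, seed=3):
    """Toy master-worker reinforcement dynamics: each worker keeps a
    probability of answering honestly; the sign of the payoff received
    for the last action nudges that probability up or down."""
    rng = random.Random(seed)
    honest_prob = [0.5] * n_workers
    for _ in range(rounds):
        audited = rng.random() < audit_prob  # master's verification round
        for i in range(n_workers):
            honest = rng.random() < honest_prob[i]
            if honest:
                payoff = 1.0                       # reward for correct work
            else:
                payoff = -2.0 if audited else 1.5  # fine if caught, else saved effort
            # Sign-based rule: reinforce the action just taken iff it paid off
            # (payoff magnitudes are ignored in this crude sketch).
            target = 1.0 if (honest == (payoff > 0)) else 0.0
            honest_prob[i] += lr * (target - honest_prob[i])
            honest_prob[i] = min(max(honest_prob[i], 0.01), 0.99)
    return honest_prob

honesty = simulate()
```

Even with a modest audit probability (a proxy for the master's tolerance), occasional punishment is enough to pull the population toward reliable behaviour in this toy, echoing the abstract's observation that the system works under large tolerances.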
Spatial Spreading Of Action Values In Q-Learning
One of the problems associated with Q-learning is its inefficient use of training information: experience provides information that affects the learning process only locally both in space and time. We investigate here the use of spreading of action value updates towards regions of the state space different from the one visited by the agent at a certain instant of time. In particular, we consider the case when the strength of spreading decays over time. It is shown that this mechanism still provides convergence to the optimal action values and can generate superior policies provided that the similarity function underlying the spreading mechanism fits the world, at least weakly. In such cases the initial performance of the algorithm can be poor and the performance starts to improve when spreading is almost over. We can interpret the phenomenon as the realisation of a `search-then-converge' method. I. INTRODUCTION Reinforcement learning (RL) has been proposed in the last years as a promi..