8,812 research outputs found

Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms

Author: Derbeko P.
El-Yaniv R.
Meir R.
Publication venue: 'AI Access Foundation'
Publication date: 30/06/2011

Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.
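
The abstract above refers to an exact computation of the hypergeometric tail. As a rough illustration only, the tail probability itself can be computed as in the sketch below. This is not the bound from the paper or Vapnik's routine; the function name `hypergeom_tail` and its parameters are hypothetical, chosen for this example.

```python
from math import comb


def hypergeom_tail(population, errors, sample, k):
    """Return P(X >= k), where X counts the errors that land in a sample.

    Assumed setup (illustrative, not taken from the paper): `population`
    points in total, `errors` of them misclassified, and `sample` points
    drawn uniformly without replacement.
    """
    total = comb(population, sample)
    lower = max(k, 0)
    upper = min(errors, sample)
    # math.comb returns 0 when the second argument exceeds the first,
    # so infeasible terms contribute nothing to the sum.
    favourable = sum(
        comb(errors, i) * comb(population - errors, sample - i)
        for i in range(lower, upper + 1)
    )
    return favourable / total
```

For example, `hypergeom_tail(4, 2, 2, 2)` is the chance that both sampled points are errors when 2 of 4 points are errors.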

arXiv.org e-Print Archive

Crossref

PAC-Bayesian Theory Meets Bayesian Inference

Author: Bach Francis
Germain Pascal
Lacoste Alexandre
Lacoste-Julien Simon
Publication date: 27/05/2016

We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the Bayesian marginal likelihood. That is, for the negative log-likelihood loss function, we show that the minimization of PAC-Bayesian generalization risk bounds maximizes the Bayesian marginal likelihood. This provides an alternative explanation to the Bayesian Occam's razor criteria, under the assumption that the data is generated by an i.i.d. distribution. Moreover, as the negative log-likelihood is an unbounded loss function, we motivate and propose a PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that our approach is sound on classical Bayesian linear regression tasks. Comment: Published at NIPS 2015 (http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet

Author: Hutter Marcus
Publication date: 01/01/2002

Various optimality properties of universal sequence predictors based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, will be studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with the chain rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a countable or continuous class $\mathcal{M}$, one can base one's prediction on the Bayes-mixture $\xi$ defined as a $w_\nu$-weighted sum or integral of distributions $\nu\in\mathcal{M}$. The cumulative expected loss of the Bayes-optimal universal prediction scheme based on $\xi$ is shown to be close to the loss of the Bayes-optimal, but infeasible prediction scheme based on $\mu$. We show that the bounds are tight and that no other predictor can lead to significantly smaller bounds. Furthermore, for various performance measures, we show Pareto-optimality of $\xi$ and give an Occam's razor argument that the choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$. The results are applied to games of chance, defined as a sequence of bets, observations, and rewards. The prediction schemes (and bounds) are compared to the popular predictors based on expert advice. Extensions to infinite alphabets, partial, delayed and probabilistic prediction, classification, and more active systems are briefly discussed. Comment: 34 pages
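
To make the weighted-sum definition of the Bayes-mixture concrete, here is a minimal sketch for a toy case: a finite class of Bernoulli models over a binary alphabet. The function name `bayes_mixture_predict`, the finite class, and the requirement that every parameter lies strictly between 0 and 1 are assumptions of this example, not part of the paper, which treats general countable or continuous classes.

```python
def bayes_mixture_predict(thetas, weights, history):
    """Return the mixture probability that the next symbol is 1.

    Assumptions (illustrative only): each model nu is Bernoulli(theta)
    with 0 < theta < 1, `weights` are the prior weights w_nu summing to 1,
    and `history` is a list of past symbols, each 0 or 1.
    """
    posterior = list(weights)
    for x in history:
        # Multiply each weight by the likelihood its model gives to x.
        posterior = [
            w * (theta if x == 1 else 1.0 - theta)
            for w, theta in zip(posterior, thetas)
        ]
        # Renormalise so the weights stay a probability distribution.
        total = sum(posterior)
        posterior = [w / total for w in posterior]
    # The mixture prediction is the posterior-weighted sum of model predictions.
    return sum(w * theta for w, theta in zip(posterior, thetas))
```

With `thetas = [0.2, 0.8]` and equal prior weights, observing a single 1 shifts the posterior toward the second model, so the predicted probability of another 1 rises above one half.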

arXiv.org e-Print Archive

CiteSeerX

Towards Machine Wald

Author: A. Ben-Tal
A. Dvoretzky
A. Madansky
A. Shapiro
A. Spanos
A. Wald
A.A. Gaivoronski
A.A. Kidane
A.D. Rikun
A.M. Geoffrion
A.M. Stuart
A.S. Nemirovsky
A.W. Marshall
B. Rustem
B.J.K. Kleijn
C. Scovel
C.C. Huang
C.D. Aliprantis
D. Bertsimas
D. Blackwell
D.A. Freedman
D.G. Kendall
E.L. Lehmann
E.S. Pearson
E.T. Jaynes
E.W. Packel
G. Belot
G. Tintner
G. Winkler
G.A. Hanasusanto
G.B. Dantzig
G.W. Platzman
H. Hotelling
H. Joe
H. Leahu
H. Owhadi
H. Strasser
H. Woźniakowski
H.D. Kurz
H.D. Sherali
H.J. Godwin
H.P. Edmundson
I. Castillo
I. Elishakoff
I. Gilboa
I. Olkin
I. Pinelis
J. Kiefer
J. Lenhard
J. von Neumann
J. Neyman
J. Pfanzagl
J. Rojo
J. Wolfowitz
J. Žáčková
J.E. Smith
J.F. Nash Jr
J.R. Birge
J.W. Tukey
K. Frauendorfer
K. Isii
K. Zhou
L. Le Cam
L. Wasserman
L.D. Brown
L.E. Dubins
L.F. Richardson
L.G. Valiant
L.J. Savage
M. Adams
M. Kac
M. Mangel
M. Sniedovich
M. Wilson
M.G. Kreĭn
N.D. Singpurwalla
N.M. Laird
P. Kall
P. Lynch
P. Ressel
P.-H.T. Kamga
P.J. Huber
P.R. Halmos
R. Fisher
R. Leonard
R. von Mises
R.A. Fisher
R.F. Drenick
R.I. Boţ
R.T. Rockafellar
S. Boucheron
S.N. Bernstein
T. Tjur
T.J. Sullivan
T.W. Anderson
V. Bentkus
V.I. Bogachev
V.S. Varadarajan
W. Chen
W. Hoeffding
W. Wiesemann
Y. Ermoliev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2015

The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to think as humans have the ability to do when faced with uncertainty is challenging in several major ways: (1) Finding optimal statistical models remains to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations and a limited description of the distribution of input random variables. (2) The space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. With this purpose, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification and Information Based Complexity. Comment: 37 pages

arXiv.org e-Print Archive

Crossref

Caltech Authors

Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data

Author: Honorio Jean
Ortiz Luis
Publication date: 03/05/2015

We consider learning, from strictly behavioral data, the structure and parameters of linear influence games (LIGs), a class of parametric graphical games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic inference (CSI): Making inferences from causal interventions on stable behavior in strategic settings. Applications include the identification of the most influential individuals in large (social) networks. Such tasks can also support policy-making analysis. Motivated by the computational work on LIGs, we cast the learning problem as maximum-likelihood estimation (MLE) of a generative model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation uncovers the fundamental interplay between goodness-of-fit and model complexity: good models capture equilibrium behavior within the data while controlling the true number of equilibria, including those unobserved. We provide a generalization bound establishing the sample complexity for MLE in our framework. We propose several algorithms including convex loss minimization (CLM) and sigmoidal approximations. We prove that the number of exact PSNE in LIGs is small, with high probability; thus, CLM is sound. We illustrate our approach on synthetic data and real-world U.S. congressional voting records. We briefly discuss our learning framework's generality and potential applicability to general graphical games. Comment: Journal of Machine Learning Research. (accepted, pending publication.) Last conference version: submitted March 30, 2012 to UAI 2012. First conference version: entitled, Learning Influence Games, initially submitted on June 1, 2010 to NIPS 201

arXiv.org e-Print Archive

CiteSeerX