8,812 research outputs found
Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms
Inductive learning is based on inferring a general rule from a finite data
set and using it to label new data. In transduction one attempts to solve the
problem of using a labeled training set to label a set of unlabeled points,
which are given to the learner prior to learning. Although transduction seems
at the outset to be an easier task than induction, there have not been many
provably useful algorithms for transduction. Moreover, the precise relation
between induction and transduction has not yet been determined. The main
theoretical developments related to transduction were presented by Vapnik more
than twenty years ago. One of Vapnik's basic results is a rather tight error
bound for transductive classification based on an exact computation of the
hypergeometric tail. While tight, this bound is given implicitly via a
computational routine. Our first contribution is a somewhat looser but explicit
characterization of a slightly extended PAC-Bayesian version of Vapnik's
transductive bound. This characterization is obtained using concentration
inequalities for the tail of sums of random variables obtained by sampling
without replacement. We then derive error bounds for compression schemes such
as (transductive) support vector machines and for transduction algorithms based
on clustering. The main observation used for deriving these new error bounds
and algorithms is that the unlabeled test points, which in the transductive
setting are known in advance, can be used in order to construct useful data
dependent prior distributions over the hypothesis space
PAC-Bayesian Theory Meets Bayesian Inference
We exhibit a strong link between frequentist PAC-Bayesian risk bounds and the
Bayesian marginal likelihood. That is, for the negative log-likelihood loss
function, we show that the minimization of PAC-Bayesian generalization risk
bounds maximizes the Bayesian marginal likelihood. This provides an alternative
explanation to the Bayesian Occam's razor criteria, under the assumption that
the data is generated by an i.i.d distribution. Moreover, as the negative
log-likelihood is an unbounded loss function, we motivate and propose a
PAC-Bayesian theorem tailored for the sub-gamma loss family, and we show that
our approach is sound on classical Bayesian linear regression tasks.Comment: Published at NIPS 2015
(http://papers.nips.cc/paper/6569-pac-bayesian-theory-meets-bayesian-inference
Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in particular,
will be studied. The probability of observing at time , given past
observations can be computed with the chain rule if the true
generating distribution of the sequences is known. If
is unknown, but known to belong to a countable or continuous class \M
one can base ones prediction on the Bayes-mixture defined as a
-weighted sum or integral of distributions \nu\in\M. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on
is shown to be close to the loss of the Bayes-optimal, but infeasible
prediction scheme based on . We show that the bounds are tight and that no
other predictor can lead to significantly smaller bounds. Furthermore, for
various performance measures, we show Pareto-optimality of and give an
Occam's razor argument that the choice for the weights
is optimal, where is the length of the shortest program describing
. The results are applied to games of chance, defined as a sequence of
bets, observations, and rewards. The prediction schemes (and bounds) are
compared to the popular predictors based on expert advice. Extensions to
infinite alphabets, partial, delayed and probabilistic prediction,
classification, and more active systems are briefly discussed.Comment: 34 page
Towards Machine Wald
The past century has seen a steady increase in the need of estimating and
predicting complex systems and making (possibly critical) decisions with
limited information. Although computers have made possible the numerical
evaluation of sophisticated statistical models, these models are still designed
\emph{by humans} because there is currently no known recipe or algorithm for
dividing the design of a statistical model into a sequence of arithmetic
operations. Indeed enabling computers to \emph{think} as \emph{humans} have the
ability to do when faced with uncertainty is challenging in several major ways:
(1) Finding optimal statistical models remains to be formulated as a well posed
problem when information on the system of interest is incomplete and comes in
the form of a complex combination of sample data, partial knowledge of
constitutive relations and a limited description of the distribution of input
random variables. (2) The space of admissible scenarios along with the space of
relevant information, assumptions, and/or beliefs, tend to be infinite
dimensional, whereas calculus on a computer is necessarily discrete and finite.
With this purpose, this paper explores the foundations of a rigorous framework
for the scientific computation of optimal statistical estimators/models and
reviews their connections with Decision Theory, Machine Learning, Bayesian
Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty
Quantification and Information Based Complexity.Comment: 37 page
Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data
We consider learning, from strictly behavioral data, the structure and
parameters of linear influence games (LIGs), a class of parametric graphical
games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic
inference (CSI): Making inferences from causal interventions on stable behavior
in strategic settings. Applications include the identification of the most
influential individuals in large (social) networks. Such tasks can also support
policy-making analysis. Motivated by the computational work on LIGs, we cast
the learning problem as maximum-likelihood estimation (MLE) of a generative
model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation
uncovers the fundamental interplay between goodness-of-fit and model
complexity: good models capture equilibrium behavior within the data while
controlling the true number of equilibria, including those unobserved. We
provide a generalization bound establishing the sample complexity for MLE in
our framework. We propose several algorithms including convex loss minimization
(CLM) and sigmoidal approximations. We prove that the number of exact PSNE in
LIGs is small, with high probability; thus, CLM is sound. We illustrate our
approach on synthetic data and real-world U.S. congressional voting records. We
briefly discuss our learning framework's generality and potential applicability
to general graphical games.Comment: Journal of Machine Learning Research. (accepted, pending
publication.) Last conference version: submitted March 30, 2012 to UAI 2012.
First conference version: entitled, Learning Influence Games, initially
submitted on June 1, 2010 to NIPS 201
- …