14 research outputs found
On metric entropy, Vapnik-Chervonenkis dimension, and learnability for a class of distributions
Cover title. Includes bibliographical references (p. 13-14). Research supported by the U.S. Army Research Office under DAAL03-86-K-0171 and by the Department of the Navy for SDIO. Sanjeev R. Kulkarni.
Non-uniform Online Learning: Towards Understanding Induction
Can a physicist make only finitely many errors in the endless pursuit of the
law of nature? This age-old question of inductive inference is a fundamental,
yet mysterious, problem in philosophy, lacking rigorous justification. While
classic online learning theory and inductive inference share a similar
sequential decision-making spirit, the former's reliance on an adaptive
adversary and worst-case error bounds limits its applicability to the latter.
In this work, we introduce the concept of non-uniform online learning, which we
argue aligns more closely with the principles of inductive reasoning. This
setting assumes a predetermined ground-truth hypothesis and considers
non-uniform, hypothesis-wise error bounds. In the realizable setting, we
provide a complete characterization of learnability with finite error: a
hypothesis class is non-uniformly learnable if and only if it is a countable union
of Littlestone classes, whether the observations are adaptively chosen or i.i.d.
sampled. Additionally, we propose a necessary condition for the weaker
criterion of consistency, which we conjecture to be tight. To further develop
our theory, we extend our results to the more realistic agnostic setting,
showing that any countable union of Littlestone classes can be learnt with a
non-uniform, hypothesis-wise regret bound. We hope this work offers a new
perspective on the power of induction from an online learning viewpoint.
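To make the non-uniform, hypothesis-wise error bound concrete, here is a minimal Python sketch (not taken from the paper) of the classic first-consistent-hypothesis strategy for a countable class, the special case in which each Littlestone class is a single hypothesis; the enumeration and the threshold functions are illustrative assumptions.

```python
# A minimal sketch: learn a countable hypothesis class online by always
# predicting with the first enumerated hypothesis consistent with the
# history. Hypotheses here are illustrative integer threshold functions.
def make_threshold(t):
    return lambda x: int(x >= t)

def enumerate_hypotheses():
    # An assumed enumeration of a countable class: thresholds at
    # 0, 1, -1, 2, -2, ...
    yield make_threshold(0)
    k = 1
    while True:
        yield make_threshold(k)
        yield make_threshold(-k)
        k += 1

def first_consistent_learner(stream):
    gen = enumerate_hypotheses()
    hypotheses = [next(gen)]
    history, errors = [], 0
    for x, label in stream:
        # Advance to the first hypothesis consistent with the history.
        idx = 0
        while not all(hypotheses[idx](xs) == ys for xs, ys in history):
            idx += 1
            if idx == len(hypotheses):
                hypotheses.append(next(gen))
        errors += int(hypotheses[idx](x) != label)
        history.append((x, label))
    return errors

# Realizable case with ground truth "x >= 2" (index 3 in the enumeration):
# each mistake rules out the current hypothesis forever, so the learner
# errs at most 3 times -- a bound that depends on the hypothesis, not
# uniformly on the class.
print(first_consistent_learner((x, int(x >= 2)) for x in range(-5, 6)))
```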
Let's take the bias out of econometrics
This study exposes the cognitive flaws of ‘endogeneity bias’. It examines how conceptualisation of the bias has evolved to embrace all major econometric problems, despite an extensive lack of hard evidence. It reveals the crux of the bias (an a priori rejection of causal variables as conditionally valid ones) and the crux of its correction by consistent estimators (a modification of those variables by non-uniquely and non-causally generated regressors). It traces these flaws to misconceptions about error terms and estimation consistency. It highlights the need to shake off the bias so that statistical learning can play an active and formal role in econometrics.
JEL classification: B23, B40, C10, C5
Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction
Multi-modal regression is important in forecasting nonstationary processes and
data drawn from a complex mixture of distributions. It can be tackled with
multiple-hypotheses frameworks, though combining the hypotheses efficiently in
a learning model is difficult. A Structured Radial Basis Function Network is presented as an
ensemble of multiple hypotheses predictors for regression problems. The
predictors are regression models of any type that can form centroidal Voronoi
tessellations, which are a function of their losses during training. It is
proved that this structured model can efficiently interpolate this tessellation
and approximate the multiple hypotheses target distribution and is equivalent
to interpolating the meta-loss of the predictors, the loss being a zero set of
the interpolation error. This model has a fixed-point iteration algorithm
between the predictors and the centers of the basis functions. Diversity in
learning can be controlled parametrically by truncating the tessellation
formation with the losses of individual predictors. A closed-form solution with
least-squares is presented which, to the authors' knowledge, is the fastest
solution in the literature for multiple hypotheses and structured predictions.
Superior generalization performance and computational efficiency are achieved
using only two-layer neural networks as predictors, with diversity control as
a key component of success. A gradient-descent approach is introduced which is
loss-agnostic regarding the predictors. The expected loss of the structured
model with Gaussian basis functions is computed, showing that correlation
between predictors is not an appropriate tool for diversification. Experiments
show that the model outperforms the top competitors in the literature.
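As a rough illustration of the fixed-point idea between predictors and cells (not the authors' exact algorithm), the sketch below alternates between assigning each sample to its lowest-loss predictor, which carves the data into a loss-driven Voronoi-style partition, and refitting each predictor on its own cell; the linear predictors, K, and the iteration count are illustrative assumptions.

```python
import numpy as np

def fit_multi_hypothesis(X, y, K=3, iters=20, seed=0):
    # Alternate between (1) assigning every sample to its lowest-loss
    # predictor (a loss-driven Voronoi-style partition) and (2) refitting
    # each predictor on its own cell: a simplified fixed-point iteration
    # in the spirit of the abstract.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])       # add a bias column
    W = 0.1 * rng.standard_normal((K, d + 1))  # K linear predictors
    for _ in range(iters):
        losses = (Xb @ W.T - y[:, None]) ** 2  # (n, K) squared errors
        assign = losses.argmin(axis=1)         # winner-takes-all cells
        for k in range(K):
            mask = assign == k
            if mask.sum() > d:                 # enough points to refit
                W[k] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
    return W

# Example: a two-branch target y = +/-x at every input. Each predictor
# typically locks onto one branch, the diversity effect described above.
x = np.linspace(-1, 1, 100)
X = np.concatenate([x, x])[:, None]
y = np.concatenate([x, -x])
W = fit_multi_hypothesis(X, y, K=2)
```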
Estimation et contrôle des performances en généralisation des réseaux de neurones [Estimation and control of the generalization performance of neural networks]
63 pages. An introduction to the statistical theory of learning.
N-learners problem: Fusion of concepts
We are given N learners, each capable of learning concepts (subsets) of a domain set X in the sense of Valiant, i.e. for any c ∈ C ⊆ 2^X, given a finite set of labelled examples <x_1, c(x_1)>, <x_2, c(x_2)>, ..., <x_m, c(x_m)> generated according to an unknown probability distribution P_X on X, each learner produces a close approximation to c with high probability. We are interested in combining the N learners using a single fuser or consolidator. We consider the paradigm of passive fusion, where each learner is first trained with the sample without the influence of the consolidator. The composite system is constituted by the fuser and the individual learners. We consider two cases: open and closed fusion. In open fusion the fuser is given the sample and the hypotheses of the individual learners; we show that the fusion rule can be obtained by formulating this problem as another learning problem. For the case where all individual learners are trained with the same sample, we show sufficiency conditions that ensure the composite system is better than the best of the individual learners: the hypothesis space of the consolidator (a) satisfies the isolation property of degree at least N, and (b) has Vapnik-Chervonenkis dimension less than or equal to that of every individual learner. If the individual learners are trained on independently generated samples, we obtain a much weaker bound on the VC dimension of the fuser's hypothesis space. Second, in closed fusion the fuser has access to neither the training sample nor the hypotheses of the individual learners. By suitably designing a linear threshold function of the outputs of the individual learners, we show that the composite system can be made better than the best of the learners.
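For intuition, here is a minimal sketch of the closed-fusion rule in the final sentence: a linear threshold function over the learners' binary outputs, which with uniform weights reduces to a majority vote. The weighting scheme is an illustrative assumption, not the paper's specific construction.

```python
import numpy as np

def closed_fusion(learner_outputs, weights=None, threshold=0.5):
    """Linear-threshold fusion of N binary (0/1) learner outputs.

    The fuser sees only the predictions, never the training sample or
    the hypotheses (the closed-fusion setting). With uniform weights
    this is a simple majority vote; weights could instead reflect each
    learner's estimated accuracy.
    """
    outputs = np.asarray(learner_outputs, dtype=float)
    if weights is None:
        weights = np.full(outputs.shape, 1.0 / outputs.size)
    return int(weights @ outputs > threshold)

# Example: five learners vote 1, 1, 0, 1, 0 -> the fused prediction is 1.
print(closed_fusion([1, 1, 0, 1, 0]))
```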