Approximation and learning by greedy algorithms
We consider the problem of approximating a given element from a Hilbert
space by means of greedy algorithms and the application of such
procedures to the regression problem in statistical learning theory. We improve
on the existing theory of convergence rates for both the orthogonal greedy
algorithm and the relaxed greedy algorithm, as well as for the forward stepwise
projection algorithm. For all these algorithms, we prove convergence results
for a variety of function classes and not simply those that are related to the
convex hull of the dictionary. We then show how these bounds for convergence
rates lead to a new theory for the performance of greedy algorithms in
learning. In particular, we build upon the results in [IEEE Trans. Inform.
Theory 42 (1996) 2118--2132] to construct learning algorithms based on greedy
approximations which are universally consistent and provide provable
convergence rates for large classes of functions. The use of greedy algorithms
in the context of learning is very appealing since it greatly reduces the
computational burden when compared with standard model selection using general
dictionaries.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the
Institute of Mathematical Statistics (http://www.imstat.org),
http://dx.doi.org/10.1214/009053607000000631
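As an illustration of the kind of procedure the abstract refers to, the following is a minimal sketch of the orthogonal greedy algorithm (orthogonal matching pursuit) for a finite dictionary in Euclidean space. It is not the paper's construction, only the standard scheme: at each step select the atom most correlated with the residual, then re-project the target onto the span of all atoms selected so far.

```python
import numpy as np

def orthogonal_greedy(y, D, steps):
    """Orthogonal greedy algorithm (orthogonal matching pursuit) sketch.

    y     : target vector, shape (n,)
    D     : dictionary whose columns are (ideally unit-norm) atoms, shape (n, m)
    steps : number of greedy iterations
    Returns (selected atom indices, final approximation of y).
    """
    residual = y.astype(float).copy()
    selected = []
    for _ in range(steps):
        # greedy step: pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in selected:
            selected.append(k)
        # orthogonal step: re-project y onto the span of all selected atoms
        A = D[:, selected]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return selected, y - residual
```

The re-projection step is what distinguishes the orthogonal variant from the relaxed greedy algorithm, which instead forms a convex combination of the previous approximation and the newly selected atom.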
MDL, Penalized Likelihood, and Statistical Risk
Abstract: We determine, for both countable and uncountable collections of functions, information-theoretic conditions on a penalty pen(f) such that the optimizer f̂ of the penalized log-likelihood criterion log(1/likelihood(f)) + pen(f) has risk not more than the index of resolvability corresponding to the accuracy of the optimizer of the expected value of the criterion. If F is the linear span of a dictionary of functions, traditional description-length penalties are based on the number of non-zero terms (the ℓ0 norm of the coefficients). We specialize our general conclusions to show that the ℓ1 norm of the coefficients times a suitable multiplier λ is also an information-theoretically valid penalty.
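To make the criterion concrete, here is a minimal sketch, under an assumed Gaussian regression model with unit variance, of evaluating the penalized log-likelihood with an ℓ1 penalty and selecting among a hypothetical finite set of candidate coefficient vectors. The function names and the candidate-set setup are illustrative, not from the paper.

```python
import numpy as np

def penalized_criterion(y, X, beta, lam):
    """log(1/likelihood) + pen(f) for Gaussian regression (up to constants).

    y    : response vector, shape (n,)
    X    : design matrix whose columns are dictionary functions, shape (n, m)
    beta : candidate coefficient vector, shape (m,)
    lam  : penalty multiplier lambda
    """
    # negative log-likelihood for unit-variance Gaussian errors (constants dropped)
    nll = 0.5 * np.sum((y - X @ beta) ** 2)
    # l1 penalty: lambda times the l1 norm of the coefficients,
    # in place of the traditional l0 (term-counting) penalty
    return nll + lam * np.sum(np.abs(beta))

def select(y, X, candidates, lam):
    """Return the candidate minimizing the penalized criterion."""
    return min(candidates, key=lambda b: penalized_criterion(y, X, b, lam))
```

The risk bound in the abstract compares the risk of this minimizer to the index of resolvability, i.e. the best trade-off between approximation accuracy and penalty over the class.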