Approximation and learning by greedy algorithms
We consider the problem of approximating a given element from a Hilbert
space by means of greedy algorithms and the application of such
procedures to the regression problem in statistical learning theory. We improve
on the existing theory of convergence rates for both the orthogonal greedy
algorithm and the relaxed greedy algorithm, as well as for the forward stepwise
projection algorithm. For all these algorithms, we prove convergence results
for a variety of function classes and not simply those that are related to the
convex hull of the dictionary. We then show how these bounds for convergence
rates lead to a new theory for the performance of greedy algorithms in
learning. In particular, we build upon the results in [IEEE Trans. Inform.
Theory 42 (1996) 2118--2132] to construct learning algorithms based on greedy
approximations which are universally consistent and provide provable
convergence rates for large classes of functions. The use of greedy algorithms
in the context of learning is very appealing since it greatly reduces the
computational burden when compared with standard model selection using general
dictionaries.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the
Institute of Mathematical Statistics (http://www.imstat.org),
http://dx.doi.org/10.1214/009053607000000631
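As an illustration of the kind of procedure the abstract refers to, the following is a minimal sketch of the orthogonal greedy algorithm (orthogonal matching pursuit) for a finite dictionary in Euclidean space. It is not the paper's construction, only the standard scheme: at each step select the atom most correlated with the residual, then re-project the target onto the span of all atoms selected so far.

```python
import numpy as np

def orthogonal_greedy(y, D, steps):
    """Orthogonal greedy algorithm (orthogonal matching pursuit) sketch.

    y     : target vector, shape (n,)
    D     : dictionary whose columns are (ideally unit-norm) atoms, shape (n, m)
    steps : number of greedy iterations
    Returns (selected atom indices, final approximation of y).
    """
    residual = y.astype(float).copy()
    selected = []
    for _ in range(steps):
        # greedy step: pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in selected:
            selected.append(k)
        # orthogonal step: re-project y onto the span of all selected atoms
        A = D[:, selected]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return selected, y - residual
```

The re-projection step is what distinguishes the orthogonal variant from the relaxed greedy algorithm, which instead forms a convex combination of the previous approximation and the newly selected atom.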
MDL, Penalized Likelihood, and Statistical Risk
Abstract: We determine, for both countable and uncountable collections of functions, information-theoretic conditions on a penalty pen(f) such that the optimizer f̂ of the penalized log-likelihood criterion log(1/likelihood(f)) + pen(f) has risk not more than the index of resolvability corresponding to the accuracy of the optimizer of the expected value of the criterion. If F is the linear span of a dictionary of functions, traditional description-length penalties are based on the number of non-zero terms (the ℓ0 norm of the coefficients). We specialize our general conclusions to show that the ℓ1 norm of the coefficients times a suitable multiplier λ is also an information-theoretically valid penalty.
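To make the criterion concrete, here is a minimal sketch, under an assumed Gaussian regression model with unit variance, of evaluating the penalized log-likelihood with an ℓ1 penalty and selecting among a hypothetical finite set of candidate coefficient vectors. The function names and the candidate-set setup are illustrative, not from the paper.

```python
import numpy as np

def penalized_criterion(y, X, beta, lam):
    """log(1/likelihood) + pen(f) for Gaussian regression (up to constants).

    y    : response vector, shape (n,)
    X    : design matrix whose columns are dictionary functions, shape (n, m)
    beta : candidate coefficient vector, shape (m,)
    lam  : penalty multiplier lambda
    """
    # negative log-likelihood for unit-variance Gaussian errors (constants dropped)
    nll = 0.5 * np.sum((y - X @ beta) ** 2)
    # l1 penalty: lambda times the l1 norm of the coefficients,
    # in place of the traditional l0 (term-counting) penalty
    return nll + lam * np.sum(np.abs(beta))

def select(y, X, candidates, lam):
    """Return the candidate minimizing the penalized criterion."""
    return min(candidates, key=lambda b: penalized_criterion(y, X, b, lam))
```

The risk bound in the abstract compares the risk of this minimizer to the index of resolvability, i.e. the best trade-off between approximation accuracy and penalty over the class.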