571 research outputs found
Matching Pursuit Shrinkage in Hilbert Spaces
In this paper, we study a variant of Matching Pursuit named Matching Pursuit Shrinkage. Like Matching Pursuit, it seeks an approximation of a datum living in a Hilbert space by a sparse linear expansion in a countable set of atoms. The difference from the usual Matching Pursuit is that, once an atom has been selected, we do not erase all of the information along the direction of this atom; instead, we evolve slowly along that direction. The goal is to attenuate the negative impact of bad atom selections. We analyse the link between the shrinkage function used by the algorithm and the membership of the result in an ℓp space.
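The idea described above can be sketched in a few lines: standard matching pursuit subtracts the full correlation of the selected atom from the residual, while the shrinkage variant subtracts only a shrunken amount. This is a minimal illustration, assuming soft-thresholding as the shrinkage function; all names and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def soft_threshold(c, t):
    # One possible shrinkage function: soft-thresholding.
    return np.sign(c) * max(abs(c) - t, 0.0)

def matching_pursuit_shrinkage(y, D, n_iter=100, t=0.1):
    """Sketch of Matching Pursuit with shrinkage.

    y      : target vector
    D      : dictionary with unit-norm columns (atoms)
    n_iter : number of greedy iterations
    t      : illustrative shrinkage threshold
    """
    r = y.copy()                    # residual
    x = np.zeros(D.shape[1])        # sparse coefficient vector
    for _ in range(n_iter):
        c = D.T @ r                 # correlations with all atoms
        k = np.argmax(np.abs(c))    # greedy atom selection
        step = soft_threshold(c[k], t)  # shrink instead of taking the full step
        if step == 0.0:
            break
        x[k] += step
        r -= step * D[:, k]         # remove only part of the information
    return x, r
```

Because each update removes only part of the correlation, a wrongly selected atom contributes less error than in plain matching pursuit, at the cost of slower progress along correctly selected directions.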
Boosting for high-dimensional linear models
We prove that boosting with the squared error loss, L2Boosting, is
consistent for very high-dimensional linear models, where the number of
predictor variables is allowed to grow essentially as fast as O(exp(sample
size)), assuming that the true underlying regression function is sparse in
terms of the ℓ1-norm of the regression coefficients. In the language of
signal processing, this means consistency for de-noising using a strongly
overcomplete dictionary if the underlying signal is sparse in terms of the
ℓ1-norm. We also propose here an AIC-based method for tuning,
namely for choosing the number of boosting iterations. This makes L2Boosting
computationally attractive, since it is not required to run the algorithm
multiple times for cross-validation as commonly used so far. We demonstrate
L2Boosting for simulated data, in particular where the predictor dimension
is large in comparison to sample size, and for a difficult tumor-classification
problem with gene expression microarray data.

Comment: Published at http://dx.doi.org/10.1214/009053606000000092 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
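Componentwise L2Boosting for a linear model can be sketched as follows: at each iteration, fit the current residual on the single best predictor and take a small step in that direction. This is a minimal illustration; the step size `nu` and the fixed number of steps are assumptions (the paper chooses the number of iterations with an AIC-based criterion, which is not implemented here).

```python
import numpy as np

def l2_boosting(X, y, n_steps=50, nu=0.1):
    """Sketch of componentwise L2Boosting for a linear model."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                       # residual = y - X @ beta
    col_ss = (X ** 2).sum(axis=0)      # per-column sums of squares
    for _ in range(n_steps):
        coefs = (X.T @ r) / col_ss     # componentwise least-squares fits
        rss_drop = coefs ** 2 * col_ss # RSS reduction offered by each predictor
        j = np.argmax(rss_drop)        # best single predictor
        beta[j] += nu * coefs[j]       # shrunken (small-step) update
        r -= nu * coefs[j] * X[:, j]
    return beta
```

The small step size is what makes the procedure behave like ℓ1-regularized estimation: coefficients grow gradually, and stopping early leaves most of them at zero.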
Improving the Practicality of Model-Based Reinforcement Learning: An Investigation into Scaling up Model-Based Methods in Online Settings
This thesis is a response to the current scarcity of practical model-based control algorithms in the reinforcement learning (RL) framework. As of yet there is no consensus on how best to integrate imperfect transition models into RL whilst mitigating policy improvement instabilities in online settings. Current state-of-the-art policy learning algorithms that surpass human performance often rely on model-free approaches that enjoy unmitigated sampling of transition data. Model-based RL (MBRL) instead attempts to distil experience into transition models that allow agents to plan new policies without needing to return to the environment and sample more data. The initial focus of this investigation is on kernel conditional mean embeddings (CMEs) (Song et al., 2009) deployed in an approximate policy iteration (API) algorithm (Grünewälder et al., 2012a). This existing MBRL algorithm boasts theoretically stable policy updates in continuous state and discrete action spaces. The Bellman operator's value function and (transition) conditional expectation are modelled and embedded respectively as functions in a reproducing kernel Hilbert space (RKHS). The resulting finite-induced approximate pseudo-MDP (Yao et al., 2014a) can be solved exactly in a dynamic programming algorithm with policy improvement suboptimality guarantees. However, model construction and policy planning scale cubically and quadratically, respectively, with the training set size, rendering the CME impractical for sample-abundant tasks in online settings. Three variants of CME API are investigated to strike a balance between stable policy updates and reduced computational complexity. The first variant models the value function and state-action representation explicitly in a parametric CME (PCME) algorithm with favourable computational complexity. However, policy learning oscillations emerge in the planning process, so a soft conservative policy update technique is developed to mitigate them.
The second variant returns to the non-parametric embedding and contributes (along with external work) to the compressed CME (CCME): a sparse and computationally more favourable CME. The final variant is a fully end-to-end differentiable embedding trained with stochastic gradient updates. The value function remains modelled in an RKHS, such that backprop is driven by a non-parametric RKHS loss function. The actively compressed CME (ACCME) satisfies the pseudo-MDP contraction constraint using a sparse softmax activation function. The size of the pseudo-MDP (i.e. the size of the embedding's last layer) is controlled by sparsifying the last-layer weight matrix, extending the truncated gradient method (Langford et al., 2009) with group lasso updates in a novel 'use it or lose it' neuron pruning mechanism. Surprisingly, this technique does not require extensive fine-tuning between control tasks.
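The 'use it or lose it' pruning idea can be sketched as a group soft-thresholding step applied to the rows of a last-layer weight matrix: rows whose group norm is driven to zero correspond to neurons that are pruned, shrinking the effective pseudo-MDP size. This is a hypothetical illustration of a group-lasso update in general; `lam`, `eta`, and the exact rule are assumptions, not details taken from the thesis.

```python
import numpy as np

def group_prune(W, lam, eta):
    """Group soft-thresholding on the rows of W, then prune zeroed rows.

    W   : (neurons, features) last-layer weight matrix
    lam : illustrative group-lasso strength
    eta : illustrative learning rate
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)       # one group per row
    # Shrink each row's norm by eta*lam, clamping at zero (group soft-threshold).
    scale = np.maximum(1.0 - eta * lam / np.maximum(norms, 1e-12), 0.0)
    W_new = W * scale
    keep = norms.ravel() > eta * lam                       # surviving neurons
    return W_new[keep], keep
```

Applied after each gradient step, rows that the task does not "use" shrink toward zero and are eventually "lost", while active rows are only mildly shrunk.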
Sharp Convergence Rates for Matching Pursuit
We study the fundamental limits of matching pursuit, or the pure greedy
algorithm, for approximating a target function by a sparse linear combination
of elements from a dictionary. When the target function is contained in the
variation space corresponding to the dictionary, many impressive works over the
past few decades have obtained upper and lower bounds on the error of matching
pursuit, but they do not match. The main contribution of this paper is to close
this gap and obtain a sharp characterization of the decay rate of matching
pursuit. Specifically, we construct a worst case dictionary which shows that
the existing best upper bound cannot be significantly improved. It turns out
that, unlike other greedy algorithm variants, the convergence rate is suboptimal
and is determined by the solution to a certain non-linear equation. This
enables us to conclude that any amount of shrinkage improves matching pursuit
in the worst case.
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted ℓ2-penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
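The simplest instance of the problems surveyed above is ℓ1-regularized least squares (the lasso), for which the proximal method reduces to iterative soft-thresholding (ISTA). A minimal sketch, with all names and parameter values illustrative:

```python
import numpy as np

def ista(X, y, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch for
    min_w 0.5 * ||y - X w||^2 + lam * ||w||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)            # gradient of the smooth part
        z = w - g / L                    # gradient step
        # Proximal operator of the l1 norm: soft-thresholding.
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return w
```

Other penalties in the survey (group norms, structured sparsity) fit the same template with a different proximal operator in the last line.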
Learning Model-Based Sparsity via Projected Gradient Descent
Several convex formulation methods have been proposed previously for
statistical estimation with structured sparsity as the prior. These methods
often require a carefully tuned regularization parameter, the choice of which
can be a cumbersome or heuristic exercise. Furthermore, the estimate that these methods produce might
not belong to the desired sparsity model, albeit accurately approximating the
true parameter. Therefore, greedy-type algorithms could often be more desirable
in estimating structured-sparse parameters. So far, these greedy methods have
mostly focused on linear statistical models. In this paper we study the
projected gradient descent with non-convex structured-sparse parameter model as
the constraint set. Provided that the cost function has a Stable Model-Restricted
Hessian, the algorithm produces an approximation of the desired minimizer. As
an example, we elaborate on the application of the main results to estimation in
Generalized Linear Models.
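Projected gradient descent with a non-convex sparsity model can be sketched as follows, using plain s-sparsity (hard thresholding) as the simplest model; the structured models considered in the paper would replace the projection step. Names and parameters are illustrative.

```python
import numpy as np

def hard_threshold(w, s):
    # Projection onto s-sparse vectors: keep the s largest-magnitude entries.
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    return out

def projected_gradient(grad, project, w0, eta, n_iter=200):
    """Generic projected gradient descent: a gradient step on the cost,
    followed by projection onto the (possibly non-convex) model set."""
    w = w0
    for _ in range(n_iter):
        w = project(w - eta * grad(w))
    return w
```

With a least-squares cost this is iterative hard thresholding; a GLM would simply supply the gradient of its own (negative log-likelihood) cost.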