Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
We consider the optimization of a quadratic objective function whose
gradients are only accessible through a stochastic oracle that returns the
gradient at any given point plus a zero-mean finite variance random error. We
present the first algorithm that achieves jointly the optimal prediction error
rates for least-squares regression, both in terms of forgetting of initial
conditions in O(1/n^2), and in terms of dependence on the noise and dimension d
of the problem, as O(d/n). Our new algorithm is based on averaged accelerated
regularized gradient descent, and may also be analyzed through finer
assumptions on initial conditions and the Hessian matrix, leading to
dimension-free quantities that may still be small while the "optimal" terms
above are large. In order to characterize the tightness of these new bounds, we
consider an application to non-parametric regression and use the known lower
bounds on the statistical performance (without computational limits), which
happen to match our bounds obtained from a single pass on the data and thus
show optimality of our algorithm in a wide variety of particular trade-offs
between bias and variance.
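To make the stochastic-oracle setting concrete, here is a minimal sketch of single-pass SGD with iterate averaging (Polyak-Ruppert averaging) on a synthetic least-squares problem. It is deliberately the plain averaged scheme, not the accelerated, regularized algorithm the abstract describes; the function names, step size, and data generator are all assumptions made for illustration.

```python
import random

def averaged_sgd(samples, dim, step):
    """One pass of constant-step SGD on 0.5 * (x . theta - y)^2, returning the
    running average of the iterates (Polyak-Ruppert averaging)."""
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for n, (x, y) in enumerate(samples, start=1):
        # Stochastic gradient of the squared loss at the current iterate.
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        theta = [ti - step * residual * xi for xi, ti in zip(x, theta)]
        # Online update of the average of theta_1, ..., theta_n.
        theta_bar = [tb + (ti - tb) / n for tb, ti in zip(theta_bar, theta)]
    return theta_bar

def make_data(n, theta_star, noise_std, seed=0):
    """Hypothetical generator: Gaussian inputs, linear model, additive noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_star]
        y = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data

theta_star = [1.0, -2.0, 0.5]
data = make_data(20000, theta_star, noise_std=0.1)
estimate = averaged_sgd(data, dim=3, step=0.05)
print(estimate)
```

Each sample is visited once, which mirrors the "single pass on the data" regime mentioned in the abstract. The error of the averaged iterate has one part that comes from the starting point and one that comes from the noise, which are the two terms the abstract's rates refer to.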
Non-convex regularization in remote sensing
In this paper, we study the effect of different regularizers and their
implications in high dimensional image classification and sparse linear
unmixing. Although kernelization or sparse methods are globally accepted
solutions for processing data in high dimensions, we present here a study on
the impact of the form of regularization used and its parametrization. We
consider regularization via traditional squared (ℓ2) and sparsity-promoting (ℓ1)
norms, as well as more unconventional nonconvex regularizers (ℓp and Log Sum
Penalty). We compare their properties and advantages on several classification
and linear unmixing tasks and provide advice on the choice of the best
regularizer for the problem at hand. Finally, we also provide a fully
functional toolbox for the community.
Comment: 11 pages, 11 figures
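As a rough illustration of how the regularizers named in this abstract differ, the sketch below evaluates four penalties on a sparse and a dense vector that share the same ℓ1 norm. The exact forms used here (ℓp with p = 0.5, and a log-sum penalty written as the sum of log(1 + |w_i|/eps)) are common textbook choices and are assumptions, not necessarily the parametrization used in the paper or its toolbox.

```python
import math

def l2_squared(w):
    """Squared l2 norm: convex, does not promote sparsity."""
    return sum(v * v for v in w)

def l1(w):
    """l1 norm: convex, sparsity-promoting."""
    return sum(abs(v) for v in w)

def lp(w, p=0.5):
    """l_p pseudo-norm raised to the power p (nonconvex for 0 < p < 1)."""
    return sum(abs(v) ** p for v in w)

def log_sum(w, eps=0.1):
    """Log-sum penalty (nonconvex); eps is an assumed smoothing parameter."""
    return sum(math.log(1.0 + abs(v) / eps) for v in w)

sparse = [1.0, 0.0, 0.0, 0.0]
dense = [0.25, 0.25, 0.25, 0.25]

for name, penalty in [("l2^2", l2_squared), ("l1", l1), ("lp", lp), ("log-sum", log_sum)]:
    print(name, penalty(sparse), penalty(dense))
```

With these two vectors, the ℓ1 penalty is indifferent, the squared ℓ2 penalty favors the dense vector, and both nonconvex penalties favor the sparse one.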
Regularized system identification using orthonormal basis functions
Most existing results on regularized system identification focus on
regularized impulse response estimation. Since the impulse response model is a
special case of orthonormal basis functions, it is natural to ask whether
regularized system identification can be tackled using more compact
orthonormal basis functions. In this paper, we explore two possibilities.
First, we construct a reproducing kernel Hilbert space of impulse responses
from orthonormal basis functions and then use the induced reproducing
kernel for the regularized impulse response estimation. Second, we extend the
regularization method from impulse response estimation to the more general
orthonormal basis functions estimation. For both cases, the poles of the basis
functions are treated as hyperparameters and estimated by the empirical Bayes
method. Then we further show that the former is a special case of the latter,
and more specifically, the former is equivalent to ridge regression of the
coefficients of the orthonormal basis functions.
Comment: 6 pages, final submission of a contribution for European Control
Conference 2015, uploaded on March 20, 201
Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction
It is difficult to find the optimal sparse solution of a manifold learning
based dimensionality reduction algorithm. The lasso or the elastic net
penalized manifold learning based dimensionality reduction is not directly a
lasso penalized least squares problem and thus least angle regression (LARS)
(Efron et al.), one of the most popular algorithms in sparse
learning, cannot be applied. Therefore, most current approaches take indirect
ways or have strict settings, which can be inconvenient for applications. In
this paper, we propose the manifold elastic net, or MEN for short. MEN
incorporates the merits of both the manifold learning based dimensionality
reduction and the sparse learning based dimensionality reduction. By using a
series of equivalent transformations, we show that MEN is equivalent to a lasso
penalized least squares problem, so LARS can be adopted to obtain the optimal
sparse solution of MEN. In particular, MEN has the following advantages for
subsequent classification: 1) the local geometry of samples is well preserved
for low dimensional data representation, 2) both the margin maximization and
the classification error minimization are considered for sparse projection
calculation, 3) the projection matrix of MEN improves the parsimony in
computation, 4) the elastic net penalty reduces the over-fitting problem, and
5) the projection matrix of MEN can be interpreted psychologically and
physiologically. Experimental evidence on face recognition over various popular
datasets suggests that MEN is superior to top level dimensionality reduction
algorithms.
Comment: 33 pages, 12 figures
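The abstract does not spell out the transformations that turn MEN into a lasso problem. As a loosely related illustration only, the sketch below checks the classic data-augmentation identity for the ordinary elastic net: appending sqrt(lam2) times the identity to the design matrix and zeros to the response turns the elastic net objective into a lasso objective with the same ℓ1 weight. All names and the toy data are hypothetical, and this is not claimed to be the paper's derivation.

```python
import math

def elastic_net_objective(X, y, b, lam1, lam2):
    """||y - X b||^2 + lam2 * ||b||_2^2 + lam1 * ||b||_1."""
    resid = [yi - sum(xij * bj for xij, bj in zip(row, b)) for row, yi in zip(X, y)]
    return (sum(r * r for r in resid)
            + lam2 * sum(bj * bj for bj in b)
            + lam1 * sum(abs(bj) for bj in b))

def lasso_objective(X, y, b, lam1):
    """||y - X b||^2 + lam1 * ||b||_1 (elastic net with no ridge term)."""
    return elastic_net_objective(X, y, b, lam1, 0.0)

def augment(X, y, lam2):
    """Stack sqrt(lam2) * I under X and zeros under y."""
    p = len(X[0])
    root = math.sqrt(lam2)
    X_aug = [list(row) for row in X]
    X_aug += [[root if i == j else 0.0 for j in range(p)] for i in range(p)]
    y_aug = list(y) + [0.0] * p
    return X_aug, y_aug

# Toy data, chosen arbitrarily for the check.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 2.0, 3.0]
b = [0.5, -0.25]
lam1, lam2 = 0.3, 0.7

X_aug, y_aug = augment(X, y, lam2)
en_value = elastic_net_objective(X, y, b, lam1, lam2)
lasso_value = lasso_objective(X_aug, y_aug, b, lam1)
print(en_value, lasso_value)
```

Because the two objectives agree for every coefficient vector, any lasso solver, LARS included, applied to the augmented data minimizes the original elastic net objective. That is the general kind of reduction that lets a lasso solver handle an elastic net penalty.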