790 research outputs found
Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example
$l^q$-regularization has been demonstrated to be an attractive technique in
machine learning and statistical modeling. It attempts to improve the
generalization (prediction) capability of a machine (model) through
appropriately shrinking its coefficients. The shape of an $l^q$ estimator
differs with the choice of the regularization order $q$. In particular,
$l^1$ leads to the LASSO estimate, while $l^2$ corresponds to the smooth
ridge regression. This makes the order $q$ a potential tuning parameter in
applications. To facilitate the use of $l^q$-regularization, we
seek a modeling strategy in which an elaborate selection of $q$ is
avoidable. In this spirit, we place our investigation within a general
framework of $l^q$-regularized kernel learning under a sample dependent
hypothesis space (SDHS). For a designated class of kernel functions, we show
that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization
error bounds. These estimated bounds are almost optimal in the sense that up to
a logarithmic factor, the upper and lower bounds are asymptotically identical.
This finding tentatively reveals that, in some modeling contexts, the choice of
$q$ might not have a strong impact in terms of the generalization capability.
From this perspective, $q$ can be arbitrarily specified, or specified merely by
other non-generalization criteria like smoothness, computational complexity,
sparsity, etc.
Comment: 35 pages, 3 figures
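As a rough illustration of the abstract above, the following sketch fits $l^1$- and $l^2$-regularized coefficient estimators of the form f(x) = sum_i a_i K(x_i, x), the sample dependent hypothesis space setting, and compares their test errors. The Gaussian kernel, its width, the synthetic data, the value of `lam`, and all function names are assumptions made for illustration only; the paper's result concerns a designated kernel class that this sketch does not claim to reproduce.

```python
import numpy as np

def gaussian_kernel(x, z, width=0.5):
    # Assumed kernel for illustration; not necessarily the paper's kernel class.
    return np.exp(-((x[:, None] - z[None, :]) ** 2) / width**2)

def fit_l2(K, y, lam):
    # Minimizes (1/m)||K a - y||^2 + lam * sum(a_i^2) in closed form.
    m = len(y)
    return np.linalg.solve(K.T @ K + m * lam * np.eye(m), K.T @ y)

def fit_l1(K, y, lam, n_iter=2000):
    # Minimizes (1/m)||K a - y||^2 + lam * sum(|a_i|) by iterative
    # soft-thresholding (ISTA) with step 1/L, L = (2/m) * sigma_max(K)^2.
    m = len(y)
    step = m / (2.0 * np.linalg.norm(K, 2) ** 2)
    a = np.zeros(m)
    for _ in range(n_iter):
        z = a - step * (2.0 / m) * K.T @ (K @ a - y)
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return a

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 60)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.standard_normal(60)
x_test = np.linspace(-1, 1, 200)
target = np.sin(np.pi * x_test)

K_train = gaussian_kernel(x_train, x_train)
K_test = gaussian_kernel(x_test, x_train)
lam = 1e-3  # hypothetical regularization parameter

a1 = fit_l1(K_train, y_train, lam)
a2 = fit_l2(K_train, y_train, lam)
mse_l1 = np.mean((K_test @ a1 - target) ** 2)
mse_l2 = np.mean((K_test @ a2 - target) ** 2)
print(f"test MSE, q=1: {mse_l1:.4f}   q=2: {mse_l2:.4f}")
```

On a single synthetic run like this, any similarity between the two errors is only suggestive; the abstract's claim is about generalization error bounds, not one experiment.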
Model selection of polynomial kernel regression
Polynomial kernel regression is one of the standard and state-of-the-art
learning strategies. However, as is well known, the choices of the degree of
polynomial kernel and the regularization parameter are still open in the realm
of model selection. The first aim of this paper is to develop a strategy to
select these parameters. On one hand, based on the worst-case learning rate
analysis, we show that the regularization term in polynomial kernel regression
is not necessary. In other words, the regularization parameter can decrease
arbitrarily fast when the degree of the polynomial kernel is suitably tuned. On
the other hand, taking account of the implementation of the algorithm, the
regularization term is required. In summary, the effect of the regularization
term in polynomial kernel regression is only to circumvent the "ill-conditioning"
of the kernel matrix. Based on this, the second purpose of this paper is to
propose a new model selection strategy, and then design an efficient learning
algorithm. Both theoretical and experimental analysis show that the new
strategy outperforms the previous one. Theoretically, we prove that the new
learning strategy is almost optimal if the regression function is smooth.
Experimentally, it is shown that the new strategy can significantly reduce the
computational burden without loss of generalization capability.
Comment: 29 pages, 4 figures
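The ill-conditioning mentioned in this abstract can be seen in a small numerical sketch. The kernel form (1 + x z)^degree, the sample size, the degree, and the value of `lam` are assumptions chosen for illustration, not the paper's settings. With 50 one-dimensional samples and degree 5, the kernel matrix has rank at most 6, so it is numerically singular until a small multiple of the identity is added.

```python
import numpy as np

def polynomial_kernel(x, z, degree):
    # Assumed form of the polynomial kernel for 1-D inputs.
    return (1.0 + np.outer(x, z)) ** degree

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
K = polynomial_kernel(x, x, degree=5)

lam = 1e-6  # hypothetical, deliberately tiny regularization parameter
K_reg = K + lam * len(x) * np.eye(len(x))

rank_plain = np.linalg.matrix_rank(K)
cond_plain = np.linalg.cond(K)
cond_reg = np.linalg.cond(K_reg)
print(f"rank of K: {rank_plain}")
print(f"cond(K) = {cond_plain:.2e}, cond(K + lam*m*I) = {cond_reg:.2e}")
```

Even a very small `lam` makes the linear system solvable in floating point, which is consistent with the abstract's view that the regularization term serves a numerical rather than a statistical purpose here.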
Regularized Regression Problem in hyper-RKHS for Learning Kernels
This paper generalizes the two-stage kernel learning framework, illustrates
its utility for kernel learning and out-of-sample extensions, and proves
asymptotic convergence results for the introduced kernel learning model.
Algorithmically, we extend target alignment by hyper-kernels in the two-stage
kernel learning framework. The associated kernel learning task is formulated as
a regression problem in a hyper-reproducing kernel Hilbert space (hyper-RKHS),
i.e., learning on the space of kernels itself. To solve this problem, we
present two regression models with bivariate forms in this space, including
kernel ridge regression (KRR) and support vector regression (SVR) in the
hyper-RKHS. By doing so, it provides significant model flexibility for kernel
learning with outstanding performance in real-world applications. Specifically,
our kernel learning framework is general, that is, the learned underlying
kernel can be positive definite or indefinite, which adapts to various
requirements in kernel learning. Theoretically, we study the convergence
behavior of these learning algorithms in the hyper-RKHS and derive the learning
rates. Different from the traditional approximation analysis in RKHS, our
analyses need to consider the non-trivial independence of pairwise samples and
the characterisation of hyper-RKHS. To the best of our knowledge, this is the
first work in learning theory to study the approximation performance of
regularized regression problem in hyper-RKHS.
Comment: 25 pages, 3 figures
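For readers unfamiliar with target alignment, which this abstract extends, here is a minimal sketch of the classical uncentered kernel-target alignment score between a kernel matrix and the label matrix y y^T. It is not the paper's hyper-kernel method. The Gaussian kernel, the candidate widths, the synthetic two-cluster data, and the function name `alignment` are all assumptions made for illustration.

```python
import numpy as np

def alignment(K1, K2):
    # Uncentered alignment: <K1, K2>_F / (||K1||_F * ||K2||_F).
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

rng = np.random.default_rng(0)
# Two synthetic clusters with labels -1 and +1 (illustrative data only).
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
target = np.outer(y, y)  # ideal kernel: +1 within a class, -1 across classes

# Pairwise squared distances, reused for each candidate Gaussian width.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
scores = {
    width: alignment(np.exp(-sq_dist / (2.0 * width**2)), target)
    for width in (0.1, 1.0, 10.0)
}
best_width = max(scores, key=scores.get)
print(scores, "best width:", best_width)
```

Picking the width with the highest score is the simplest form of alignment-based kernel selection; the abstract's approach instead learns the kernel by regression in a hyper-RKHS.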
Local polynomial regression for circular predictors
We consider local smoothing of datasets where the design space is the d-dimensional (d >= 1) torus and the response variable is real-valued. Our purpose is to extend least squares local polynomial fitting to this situation. We give both theoretical and empirical results
- …