944 research outputs found
Model selection of polynomial kernel regression
Polynomial kernel regression is one of the standard and state-of-the-art
learning strategies. However, as is well known, the choices of the degree of
polynomial kernel and the regularization parameter are still open in the realm
of model selection. The first aim of this paper is to develop a strategy to
select these parameters. On one hand, based on the worst-case learning rate
analysis, we show that the regularization term in polynomial kernel regression
is not necessary. In other words, the regularization parameter can decrease
arbitrarily fast when the degree of the polynomial kernel is suitable tuned. On
the other hand,taking account of the implementation of the algorithm, the
regularization term is required. Summarily, the effect of the regularization
term in polynomial kernel regression is only to circumvent the " ill-condition"
of the kernel matrix. Based on this, the second purpose of this paper is to
propose a new model selection strategy, and then design an efficient learning
algorithm. Both theoretical and experimental analysis show that the new
strategy outperforms the previous one. Theoretically, we prove that the new
learning strategy is almost optimal if the regression function is smooth.
Experimentally, it is shown that the new strategy can significantly reduce the
computational burden without loss of generalization capability.Comment: 29 pages, 4 figure
Kernel methods in machine learning
We review machine learning methods employing positive definite kernels. These
methods formulate learning and estimation problems in a reproducing kernel
Hilbert space (RKHS) of functions defined on the data domain, expanded in terms
of a kernel. Working in linear spaces of function has the benefit of
facilitating the construction and analysis of learning algorithms while at the
same time allowing large classes of functions. The latter include nonlinear
functions as well as functions defined on nonvectorial data. We cover a wide
range of methods, ranging from binary classifiers to sophisticated methods for
estimation with structured data.Comment: Published in at http://dx.doi.org/10.1214/009053607000000677 the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Big data comes in various ways, types, shapes, forms and sizes. Indeed,
almost all areas of science, technology, medicine, public health, economics,
business, linguistics and social science are bombarded by ever increasing flows
of data begging to analyzed efficiently and effectively. In this paper, we
propose a rough idea of a possible taxonomy of big data, along with some of the
most commonly used tools for handling each particular category of bigness. The
dimensionality p of the input space and the sample size n are usually the main
ingredients in the characterization of data bigness. The specific statistical
machine learning technique used to handle a particular big data set will depend
on which category it falls in within the bigness taxonomy. Large p small n data
sets for instance require a different set of tools from the large n small p
variety. Among other tools, we discuss Preprocessing, Standardization,
Imputation, Projection, Regularization, Penalization, Compression, Reduction,
Selection, Kernelization, Hybridization, Parallelization, Aggregation,
Randomization, Replication, Sequentialization. Indeed, it is important to
emphasize right away that the so-called no free lunch theorem applies here, in
the sense that there is no universally superior method that outperforms all
other methods on all categories of bigness. It is also important to stress the
fact that simplicity in the sense of Ockham's razor non plurality principle of
parsimony tends to reign supreme when it comes to massive data. We conclude
with a comparison of the predictive performance of some of the most commonly
used methods on a few data sets.Comment: 18 pages, 2 figures 3 table
Learning Theory and Approximation
The main goal of this workshop – the third one of this type at the MFO – has been to blend mathematical results from statistical learning theory and approximation theory to strengthen both disciplines and use synergistic effects to work on current research questions. Learning theory aims at modeling unknown function relations and data structures from samples in an automatic manner. Approximation theory is naturally used for the advancement and closely connected to the further development of learning theory, in particular for the exploration of new useful algorithms, and for the theoretical understanding of existing methods. Conversely, the study of learning theory also gives rise to interesting theoretical problems for approximation theory such as the approximation and sparse representation of functions or the construction of rich kernel reproducing Hilbert spaces on general metric spaces. This workshop has concentrated on the following recent topics: Pitchfork bifurcation of dynamical systems arising from mathematical foundations of cell development; regularized kernel based learning in the Big Data situation; deep learning; convergence rates of learning and online learning algorithms; numerical refinement algorithms to learning; statistical robustness of regularized kernel based learning
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of
methods they have inspired in the machine learning literature, namely kernel
methods. We first discuss some properties of positive definite kernels as well
as reproducing kernel Hibert spaces, the natural extension of the set of
functions associated with a kernel defined
on a space . We discuss at length the construction of kernel
functions that take advantage of well-known statistical models. We provide an
overview of numerous data-analysis methods which take advantage of reproducing
kernel Hilbert spaces and discuss the idea of combining several kernels to
improve the performance on certain tasks. We also provide a short cookbook of
different kernels which are particularly useful for certain data-types such as
images, graphs or speech segments.Comment: draft. corrected a typo in figure
Does generalization performance of regularization learning depend on ? A negative example
-regularization has been demonstrated to be an attractive technique in
machine learning and statistical modeling. It attempts to improve the
generalization (prediction) capability of a machine (model) through
appropriately shrinking its coefficients. The shape of a estimator
differs in varying choices of the regularization order . In particular,
leads to the LASSO estimate, while corresponds to the smooth
ridge regression. This makes the order a potential tuning parameter in
applications. To facilitate the use of -regularization, we intend to
seek for a modeling strategy where an elaborative selection on is
avoidable. In this spirit, we place our investigation within a general
framework of -regularized kernel learning under a sample dependent
hypothesis space (SDHS). For a designated class of kernel functions, we show
that all estimators for attain similar generalization
error bounds. These estimated bounds are almost optimal in the sense that up to
a logarithmic factor, the upper and lower bounds are asymptotically identical.
This finding tentatively reveals that, in some modeling contexts, the choice of
might not have a strong impact in terms of the generalization capability.
From this perspective, can be arbitrarily specified, or specified merely by
other no generalization criteria like smoothness, computational complexity,
sparsity, etc..Comment: 35 pages, 3 figure
- …