
    Variable selection for the multicategory SVM via adaptive sup-norm regularization

    The Support Vector Machine (SVM) is a popular classification paradigm in machine learning and has achieved great success in real applications. However, the standard SVM cannot select variables automatically, and its solution therefore typically utilizes all the input variables without discrimination. This makes it difficult to identify important predictor variables, which is often one of the primary goals in data analysis. In this paper, we propose two novel types of regularization in the context of the multicategory SVM (MSVM) for simultaneous classification and variable selection. The MSVM generally requires estimation of multiple discriminating functions and applies the argmax rule for prediction. For each individual variable, we propose to characterize its importance by the sup-norm of its coefficient vector across the different functions, and then minimize the MSVM hinge loss subject to a penalty on the sum of these sup-norms. To further improve the sup-norm penalty, we propose adaptive regularization, which imposes different weights on different variables according to their relative importance. Both types of regularization automate variable selection in the process of building classifiers, and lead to sparse multi-classifiers with enhanced interpretability and improved accuracy, especially for high-dimensional, low-sample-size data. One big advantage of the sup-norm penalty is its easy implementation via standard linear programming. Several simulated examples and one real gene data analysis demonstrate the outstanding performance of the adaptive sup-norm penalty in various data settings.
    Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-EJS122.
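
    The penalized problem described above is linear-programming representable, so a few lines of convex-modelling code suffice to express it. The sketch below is a minimal illustration, not the authors' implementation: the Lee-Lin-Wahba form of the multicategory hinge loss, the sum-to-zero identifiability constraints, and all names (X, y, lam, weights) are assumptions made for the example; the library is the generic modelling package cvxpy.

```python
# Hedged sketch of a sup-norm-penalized MSVM in the spirit of the abstract
# above. The exact hinge-loss variant and all names are illustrative
# assumptions, not the authors' code.
import numpy as np
import cvxpy as cp

def supnorm_msvm(X, y, n_classes, lam=1.0, weights=None):
    """Fit K linear functions f_j(x) = W[:, j] @ x + b[j]; predict argmax_j f_j(x)."""
    n, p = X.shape
    W = cp.Variable((p, n_classes))
    b = cp.Variable(n_classes)
    F = X @ W + np.ones((n, 1)) @ cp.reshape(b, (1, n_classes))
    # Multicategory hinge loss: penalize large scores for the wrong classes.
    mask = np.ones((n, n_classes))
    mask[np.arange(n), y] = 0.0                      # exclude the true class
    loss = cp.sum(cp.multiply(mask, cp.pos(F + 1.0 / (n_classes - 1))))
    # Sum of sup-norms: one max_j |W[d, j]| term per input variable d,
    # optionally weighted (the adaptive variant uses data-driven weights).
    w = np.ones(p) if weights is None else np.asarray(weights)
    penalty = cp.sum(cp.multiply(w, cp.max(cp.abs(W), axis=1)))
    constraints = [cp.sum(W, axis=1) == 0, cp.sum(b) == 0]   # sum-to-zero
    cp.Problem(cp.Minimize(loss + lam * penalty), constraints).solve()
    return W.value, b.value
```

    A variable whose entire coefficient row is driven to zero drops out of every discriminating function at once, which is exactly the grouped selection effect the sup-norm is designed to produce.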

    On Sparsity Inducing Regularization Methods for Machine Learning

    During the past years there has been an explosion of interest in learning methods based on sparsity regularization. In this paper, we discuss a general class of such methods, in which the regularizer can be expressed as the composition of a convex function ω with a linear function. This setting includes several methods such as the group Lasso, the Fused Lasso and multi-task learning, among many others. We present a general approach for solving regularization problems of this kind, under the assumption that the proximity operator of the function ω is available. Furthermore, we comment on the application of this approach to support vector machines, a technique pioneered by Vladimir Vapnik.
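
    The central primitive here, the proximity operator, is easy to demonstrate. The sketch below is an illustration under assumed notation, not the paper's algorithm: it applies plain proximal gradient descent to a least-squares loss with the group-Lasso regularizer, whose proximity operator is block soft-thresholding.

```python
# Minimal numpy sketch of proximal-gradient minimization of
#   f(x) + lam * omega(x),
# assuming prox_{t*omega} is available in closed form. Here omega is the
# group Lasso, omega(x) = sum_g ||x_g||_2; all names are illustrative.
import numpy as np

def prox_group_lasso(v, groups, t):
    """Block soft-thresholding: the proximity operator of t * sum_g ||x_g||_2."""
    out = v.copy()
    for g in groups:
        ng = np.linalg.norm(v[g])
        out[g] = max(0.0, 1.0 - t / ng) * v[g] if ng > 0 else 0.0
    return out

def proximal_gradient(grad_f, prox, x0, step, lam, iters=500):
    """Iterate x <- prox_{step*lam*omega}(x - step * grad_f(x))."""
    x = x0
    for _ in range(iters):
        x = prox(x - step * grad_f(x), step * lam)
    return x

# Usage: sparse group recovery for f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 6)), rng.standard_normal(40)
groups = [np.arange(0, 3), np.arange(3, 6)]
step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1/L, L = Lipschitz constant of grad f
x_hat = proximal_gradient(lambda x: A.T @ (A @ x - b),
                          lambda v, t: prox_group_lasso(v, groups, t),
                          np.zeros(6), step, lam=0.5)
```

    Swapping in a different regularizer only requires swapping in its proximity operator, which is the modularity the paper's general framework exploits.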

    Universal Kernels

    In this paper we investigate conditions on the features of a continuous kernel under which it can approximate an arbitrary continuous target function uniformly on any compact subset of the input space. A number of concrete examples of kernels with this universal approximating property are given.
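
    The Gaussian kernel is a standard concrete example of a universal kernel. The following numpy sketch (with arbitrary illustrative parameters, not taken from the paper) checks the property empirically: interpolating a continuous target at enough centres drives the uniform error on [0, 1] down.

```python
# Empirical illustration (not a proof) of universality for the Gaussian
# kernel k(x, y) = exp(-(x - y)^2 / (2 s^2)); the bandwidth s, the number
# of centres and the 1e-10 jitter are arbitrary choices for this demo.
import numpy as np

def gaussian_gram(x, y, s=0.1):
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * s ** 2))

target = lambda x: np.cos(2 * np.pi * x)          # continuous target on [0, 1]
centres = np.linspace(0, 1, 25)
K = gaussian_gram(centres, centres) + 1e-10 * np.eye(25)   # jitter for stability
alpha = np.linalg.solve(K, target(centres))       # kernel interpolant coefficients
grid = np.linspace(0, 1, 1000)
error = np.max(np.abs(gaussian_gram(grid, centres) @ alpha - target(grid)))
print(f"sup-norm error on [0, 1]: {error:.2e}")   # shrinks as centres are added
```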

    Duality for Neural Networks through Reproducing Kernel Banach Spaces

    Reproducing Kernel Hilbert spaces (RKHS) have been a very successful tool in various areas of machine learning. Recently, Barron spaces have been used to prove bounds on the generalisation error of neural networks. Unfortunately, Barron spaces cannot be understood in terms of an RKHS, due to the strong nonlinear coupling of the weights. This can be resolved by using the more general Reproducing Kernel Banach spaces (RKBS). We show that Barron spaces belong to a class of integral RKBS; this class can also be understood as an infinite union of RKHS. Furthermore, we show that the dual space of such an RKBS is again an RKBS in which the roles of the data and the parameters are interchanged, forming an adjoint pair of RKBSs linked by a reproducing kernel. This allows us to construct the saddle point problem for neural networks, which can then be treated with the tools of primal-dual optimisation.
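
    As a hedged illustration of the objects involved (one common convention for integral representations, not necessarily the paper's exact definitions): an integral RKBS collects the functions obtained by integrating a feature map against a signed measure on the parameter space, normed by the least total variation of a representing measure, and Barron spaces arise from the two-layer-network feature map.

```latex
% One common convention for an integral RKBS (assumed for illustration):
% functions are integrals of a feature map \psi against signed measures
% \mu on a parameter space \Omega, normed by the least total variation.
f_{\mu}(x) = \int_{\Omega} \psi(x, w)\, d\mu(w),
\qquad
\|f\|_{\mathcal{B}} = \inf\bigl\{\, \|\mu\|_{\mathrm{TV}} \;:\; f = f_{\mu} \,\bigr\}.
% For a Barron space, take the two-layer feature map
% \psi(x, (a, v, c)) = a\,\sigma(v^{\top} x + c) for an activation \sigma.
```

    Fixing a probability measure on the parameters and weighting the feature map recovers an RKHS, which is one way to read the "infinite union of RKHS" description above.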