7 research outputs found

    An Exponential Lower Bound on the Complexity of Regularization Paths

    For a variety of regularized optimization problems in machine learning, algorithms that compute the entire solution path have been developed recently. Most of these problems are quadratic programs parameterized by a single regularization parameter, the Support Vector Machine (SVM) being a prime example. Solution path algorithms compute not only the solution for one particular value of the regularization parameter but the entire path of solutions, making the selection of an optimal parameter much easier. It has been assumed that these piecewise linear solution paths have only linear complexity, i.e. linearly many bends. We prove that for the support vector machine this complexity can be exponential in the number of training points in the worst case. More strongly, we construct a single instance of n input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) = \Theta(2^d) distinct subsets of support vectors occur as the regularization parameter changes. (Comment: journal version, 28 pages, 5 figures.)
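    The lower bound above concerns the exact piecewise linear path. As a rough illustration of how the set of support vectors changes with the regularization parameter, one can brute-force a grid of C values with scikit-learn; this is a minimal sketch on a discretized grid with an arbitrary synthetic dataset, not the paper's path-following analysis or its worst-case construction.

```python
# Sketch (not the paper's construction): scan the regularization parameter C
# of a linear SVM on random data and count how many distinct support-vector
# subsets appear along the (discretized) path.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 40, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=n)).astype(int)

seen = []                                # distinct support-vector index sets, in order
for C in np.logspace(-3, 3, 400):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    sv = frozenset(clf.support_)         # indices of the support vectors at this C
    if not seen or sv != seen[-1]:
        seen.append(sv)

print(f"{len(seen)} support-vector subsets observed along the C grid")
```

    The grid scan only detects changes between consecutive grid points, so it undercounts; the paper's point is that the exact path can pass through exponentially many such subsets.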

    On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

    Abstract: Our objective is to develop formulations and algorithms for efficiently computing the feature selection path, i.e. the variation in classification accuracy as the fraction of selected features is varied from zero to one. Multiple Kernel Learning subject to l_p (p >= 1) regularization (l_p-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art l_p-MKL algorithms are too computationally expensive to be invoked thousands of times to determine the entire path. We propose a novel conjecture which states that, for certain l_p-MKL formulations, the number of features selected in the optimal solution monotonically decreases as p is decreased from an initial value to unity. We prove the conjecture for a generic family of kernel target alignment based formulations and show that the feature weights themselves decay (grow) monotonically once they are below (above) a certain threshold at optimality. This allows us to develop a path following algorithm that systematically generates optimal feature sets of decreasing size. The proposed algorithm sets certain feature weights directly to zero for potentially large intervals of p, thereby reducing optimization costs while simultaneously providing approximation guarantees. We empirically demonstrate that our formulation can lead to classification accuracies that are as much as 10% higher on benchmark data sets, compared not only to other l_p-MKL formulations and uniform kernel baselines but also to leading feature selection methods. We further demonstrate that our algorithm reduces training time significantly over other path following algorithms and state-of-the-art l_p-MKL optimizers such as SMO-MKL. In particular, we generate the entire feature selection path for data sets with a hundred thousand features in approximately half an hour on standard hardware; entire path generation for such data sets is well beyond the scaling capabilities of other methods.
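    The monotone-decay behaviour described above can be mimicked on a toy surrogate. The sketch below is not the paper's l_p-MKL formulation: it solves the much simpler problem of maximizing a linear alignment objective over the l_p unit ball, whose closed-form maximizer (via Hoelder's inequality) puts weight a_k^(1/(p-1)) on feature k. Sweeping p toward 1 then shows low-alignment weights shrinking toward zero; the alignment scores and threshold are made up.

```python
# Toy sketch, NOT the paper's formulation: maximize sum_k d_k * a_k subject to
# ||d||_p <= 1, d >= 0, where a_k >= 0 is an alignment score for feature k.
# Hoelder's inequality gives the maximizer d_k proportional to a_k^(1/(p-1)).
import numpy as np

def lp_weights(alignments, p):
    """Maximizer of <d, a> over the nonnegative l_p unit ball, for p > 1."""
    d = alignments ** (1.0 / (p - 1.0))       # exponent from Hoelder's inequality
    return d / np.linalg.norm(d, ord=p)       # rescale to unit l_p norm

alignments = np.array([1.0, 0.8, 0.5, 0.2, 0.05])   # made-up alignment scores
for p in [2.0, 1.5, 1.2, 1.05]:
    d = lp_weights(alignments, p)
    n_active = int(np.sum(d > 1e-3))          # features above a small threshold
    print(f"p={p:4.2f}  weights={np.round(d, 3)}  active={n_active}")
```

    In this surrogate the number of "active" features decreases monotonically as p approaches 1, which is the qualitative effect the paper's path following algorithm exploits to set weights to zero over whole intervals of p.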

    Solution Path for Manifold Regularized Semisupervised Classification


    Sparse convex optimization methods for machine learning

    Diss., Eidgenössische Technische Hochschule ETH Zürich, Nr. 20013, 201

    Two-dimensional solution path for support vector regression

    Recently, a very appealing approach was proposed to compute the entire solution path for support vector classification (SVC) with very low extra computational cost. This approach was later extended to a support vector regression (SVR) model called ε-SVR. However, the method requires that the error parameter ε be set a priori, which is only possible if the desired accuracy of the approximation can be specified in advance. In this paper, we show that the solution path for ε-SVR is also piecewise linear with respect to ε. We further propose an efficient algorithm for exploring the two-dimensional solution space defined by the regularization and error parameters. As opposed to the algorithm for SVC, our proposed algorithm for ε-SVR initializes the number of support vectors to zero and then increases it gradually as the algorithm proceeds. As such, a good regression function possessing the sparseness property can be obtained after only a few iterations.
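    The paper's algorithm tracks the piecewise linear solution path in the two parameters exactly. As a hedged illustration of the two-dimensional (C, ε) space it explores, the sketch below simply refits scikit-learn's SVR on a coarse parameter grid and records the number of support vectors; the data and grids are arbitrary, and this brute-force refitting is exactly what the path algorithm is designed to avoid.

```python
# Grid illustration (not the paper's path algorithm): count support vectors of
# an SVR model over a coarse (C, epsilon) grid on a small synthetic regression task.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 5, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)

for eps in [0.5, 0.2, 0.1, 0.01]:
    counts = []
    for C in [0.1, 1.0, 10.0]:
        model = SVR(kernel="rbf", C=C, epsilon=eps).fit(X, y)
        counts.append(len(model.support_))    # support vectors at this (C, eps)
    print(f"epsilon={eps:4.2f}  #SV over C grid: {counts}")
```

    Larger ε yields sparser models (fewer support vectors), which matches the abstract's observation that starting from zero support vectors and growing the set gradually reaches a sparse, accurate regression function in few iterations.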